ABIS Infor - 2011-11

Object-Oriented Perl

Peter Vanroose (ABIS) - 16 October 2011

Abstract

The programming language Perl allows you to either use procedural logic only, or work in an object-oriented way. In this short technical note I'll try to sketch why one would (or would not) choose "Object-Oriented Perl", and give an example of an OO program in Perl.

What is Perl?

Perl is a programming language (as is Java or COBOL or C++), although it's often referred to as a "scripting language". Originating in the unix world in the late 80s, it became popular in the 90s mainly as a tool for text manipulation (e.g., parsing log files and reporting back error messages) and as a web page (CGI) programming tool (as the predecessor of PHP). Currently Perl is very popular as a system administration support tool in a mixed unix / Windows / Mac world since Perl scripts are platform independent. Its main strengths are its flexibility, its power, and its performance, and especially the possibility of writing compact and still very readable programs for performing fairly complex tasks.

Syntactically Perl borrowed from both unix (Bourne) shell script syntax and the C language, which explains the apparent similarity of a Perl program to e.g. Java: it uses the same brace-delimited blocks and control structures (if...else, while, for, ...), variable assignment, and expressions. As opposed to Java, all Perl variable names start with a "sigil": a dollar sign for scalars, an "at" sign for arrays, and a percent sign for hashes. Since the sigil already tells Perl what kind of variable it is dealing with, variables don't need to be declared (unless you ask for that check yourself with "use strict")!
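As a small illustration of the three sigils, here is a minimal sketch. The variable names are made up for this example, and it declares its variables with "my" under "use strict", which is common practice even though Perl does not insist on it.

```perl
use strict;
use warnings;

# Scalar: a single value (number, string, or reference)
my $year = 2012;

# Array: an ordered list of scalars, indexed from 0
my @weekday = ('Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat');

# Hash: an unordered set of key/value pairs
my %days_in = (Jan => 31, Feb => 29, Mar => 31);

# A single element is itself a scalar, so it is written with a dollar sign
my $first = $weekday[0];       # array element: square brackets
my $feb   = $days_in{Feb};     # hash element: curly braces
my $count = scalar @weekday;   # an array in scalar context yields its length

print "$first $feb $count $year\n";   # prints "Sun 29 7 2012"
```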

The following would be a typical Perl program (amongst many other variants) for printing out all 366 dates of 2012:

@monthnam=('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec');
@days =   ( 31,   29,   31,   30,   31,   30,   31,   31,   30,   31,   30,   31);
@weekday = ('Sun','Mon','Tue','Wed','Thu','Fri','Sat');
for ($m=0; $m<12; ++$m) {
  for ($d=1; $d<=$days[$m]; ++$d,++$w) {
    print "$weekday[$w%=7] $d $monthnam[$m] 2012\n"; } }

OO-Perl: a little bit of history...

Let's first review the origins of objects in Perl. Already since its early days, Perl has had a very large collection of built-in functions, e.g. including all (unix) system calls. Nevertheless, lots of useful functionality, including e.g. a relational database interface (ODBC) or support for a graphical interface, was (and is) never built-in. On the other hand, Perl programmers (that is, people like you and me) have been writing useful functionality in Perl, first for their own use, but they later made their efforts available to everyone on the internet, in the great Open Source spirit which is also Perl's spirit.

The most important obstacle --when trying to use any of those generously distributed Perl programs-- is often their integration into your own program. Even when written as a Perl subroutine, one should be careful with global variables and other side-effects. In short: the Perl language was not designed for reusability of someone else's program fragments.

With version 5 (released in 1994), Perl introduced two important new aspects which greatly helped writing (and using) reusable program fragments: modules and references. Around the same time, a website was set up: www.cpan.org, the "Comprehensive Perl Archive Network", where the above-mentioned Perl programs for general use are now being collected. That time period was also the flourishing time of Object-Orientation, with e.g. C++ and Java gaining popularity. Perl 5 took advantage of the OO insights from that time, which meant an important step forward for the reusability of Perl code fragments. But this also meant an important paradigm switch in the syntax of Perl.

Perl Modules and namespaces

At first sight, a "Perl module" is just an ordinary Perl program file, but with a few particular conventions. First of all, the file name ends in ".pm" instead of the otherwise typical ".pl". More importantly, when looking inside such a file, one should see a declaration of the form "package XXX::YYY" which is Perl for "this file belongs to namespace XXX::YYY". Namespaces hide otherwise global variables and subroutines (a.k.a. functions) by limiting their visibility to the mentioned namespace. Most often, the namespace name will be the module name, hence only the module itself (i.e., the particular .pm file) will see those variables and functions, unless you prefix their names with "XXX::YYY::".
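To make the effect of a "package" declaration a bit more tangible, here is a small single-file sketch. It reuses the placeholder name XXX::YYY from the text; the variable $counter and the function bump() are invented for the illustration.

```perl
use strict;
use warnings;

{
    package XXX::YYY;          # everything in this block lives in namespace XXX::YYY

    our $counter = 0;          # full name: $XXX::YYY::counter
    sub bump { return ++$counter; }
}

# Back in the default namespace: this is a different variable with the same short name
our $counter = 100;

# From outside the namespace, the function needs its prefix
XXX::YYY::bump();
XXX::YYY::bump();

print "here: $counter, in XXX::YYY: $XXX::YYY::counter\n";
# prints "here: 100, in XXX::YYY: 2"
```

The two $counter variables never interfere: the short name always refers to the one in the namespace that is current at that point in the source.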

Since a single Perl program --including all its dependencies-- must be compiled by the runtime environment, all dependencies must reside (at least virtually) in the single Perl file that is your main program. To avoid physical cut-and-paste, Perl 5 added the "use" statement; issuing a "use XXX::YYY" in your Perl program is like a virtual cut-and-paste: you instruct the Perl compiler to include the file XXX/YYY.pm (which should be a module) into your source file before doing anything else. Which means that at that point your program has access to all namespace-scoped variables and (most importantly) functions of that module. But without them polluting your global namespace, of course.
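The mapping from a namespace name to a file name can be tried out in a single script. The sketch below writes a throw-away module XXX/YYY.pm into a temporary directory and then loads it with "require", which performs the same file lookup as "use" but at run time instead of compile time. The function hello() and the directory layout are assumptions made for this illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Create a temporary library directory containing XXX/YYY.pm
my $lib = tempdir(CLEANUP => 1);
make_path("$lib/XXX");
open(my $fh, '>', "$lib/XXX/YYY.pm") or die "cannot write module: $!";
print $fh "package XXX::YYY;\n";
print $fh "sub hello { return 'hello from XXX::YYY'; }\n";
print $fh "1;\n";                      # a module file ends with a true value
close($fh);

# Tell Perl where to look, then load the module:
# the name XXX::YYY is translated into the relative path XXX/YYY.pm
unshift @INC, $lib;
require XXX::YYY;

my $greeting = XXX::YYY::hello();
print "$greeting\n";
print "loaded from: $INC{'XXX/YYY.pm'}\n";   # %INC records every loaded module file
```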

By the way, the global namespace is called "main", so e.g. your global variable $xyz is actually called "$main::xyz". Which explains some weird error messages that you might have seen already, containing this main:: gibberish...
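A very short sketch shows that the short and the "main::" form are indeed two names for one and the same variable ($xyz is the example name used above):

```perl
use strict;
use warnings;

our $xyz = 'hello';          # a global variable in the default namespace "main"

print "$main::xyz\n";        # prints "hello": same variable, full name
$main::xyz = 'changed';
print "$xyz\n";              # prints "changed"

# __PACKAGE__ yields the name of the namespace that is current at this point
my $here = __PACKAGE__;
print "$here\n";             # prints "main"
```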

A simple example of a module

Let's try to construct a Perl module version of the earlier example with dates: we'll create a function (i.e., a Perl subroutine) that returns the list of dates; the function will of course be placed in its own namespace, and the implementation will be placed in a file called "dates.pm" which would e.g. have the following content:

package dates;
@M=('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec');
@D=( 31,   29,   31,   30,   31,   30,   31,   31,   30,   31,   30,   31);
@W=('Sun','Mon','Tue','Wed','Thu','Fri','Sat');
# The above three variables are in namespace "dates", not in the global namespace.
sub datelist { my $w=0; my @r;  # "my" makes these private to each call, so that
                                # calling datelist() twice does not return a doubled list
               for (my $m=0; $m<12; ++$m) {
                 for (my $d=1; $d<=$D[$m]; ++$d,++$w) {
                   push @r, "$W[$w%=7] $d $M[$m] 2012"; } }
               return @r; }
1;  # by convention, a module file ends with a true value

The "main" program now reduces to something like:

use dates;
print join("\n", dates::datelist()), "\n";

This is still not an object oriented program: we are just calling a (name-scoped) function; no big deal.

Objects in Perl

Objects are variables that instantiate a so-called class, which is an encapsulation of a few (private) variables to keep state information, and a few (public) functions, called methods, to be run by the user of the class, thereby automatically passing (or actually: attaching) that object (and hence its state) as the first argument of that method.

In Perl, a class is just a namespace. In order to allow a single scalar variable to behave as an object, Perl 5 added the concept of a reference variable: syntactically scalar, but actually a pointer to a bunch of (structured) data, e.g. all private state info stored inside a namespace.
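For readers who have not yet met references, here is a minimal sketch of how a scalar can point to an array. The three-element array mimics the kind of state used in the dates example; all variable names are chosen for this illustration.

```perl
use strict;
use warnings;

my @state = (0, 1, 0);
my $ref   = \@state;          # a scalar holding a reference to @state

$ref->[1] = 15;               # modifies $state[1] through the reference

my $copy = $ref;              # copying the reference does not copy the data
$copy->[2] = 11;              # so this also changes @state

my ($w, $d, $m) = @$ref;      # @$ref dereferences: it yields the whole array

my $anon = [0, 1, 0];         # square brackets create a new, nameless array
my $size = scalar @$anon;     # and return a reference to it

print "@state\n";             # prints "0 15 11"
print "$w $d $m $size\n";     # prints "0 15 11 3"
```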

This opens an important new possibility in our "dates" example: since an object can keep state, and only a method can use that state to do something useful (like changing the state and also returning a value), our module (a.k.a. package, a.k.a. namespace, a.k.a. class) will be able to return a variable of type "reference" (i.e., a "dates" object) with initial state "1 January 2012" and with the ability to call a method "next" that, when repeatedly called, will allow the user to build the same list of dates as the dates::datelist function.

More specifically, the "main" program for printing all dates of 2012 could then be rewritten as

use dates;
$date_obj = dates::new();
while ($dt = $date_obj->next()) { print "$dt\n"; }

Observe the "->" notation: this makes sure that the method "next()" is called with the object "$date_obj" as the class instance to be used. This notation is borrowed from C++; in Java, one would have to write a dot "." for this purpose.
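What the arrow actually does can be seen by writing the same call in two ways. The class Counter below is purely hypothetical and only serves this illustration; its constructor uses "bless", which is explained further on in the text.

```perl
use strict;
use warnings;

{
    package Counter;    # hypothetical class, for illustration only

    # Constructor: the state is a one-element array holding the running total
    sub new { my $self = [0]; return bless $self; }

    # Method: receives the object as its first argument
    sub add { my ($self, $n) = @_; $self->[0] += $n; return $self->[0]; }
}

my $c = Counter::new();

my $first  = $c->add(5);             # method call: $c is passed as first argument
my $second = Counter::add($c, 5);    # the same call written as a plain function call

print "$first $second\n";            # prints "5 10"
```

The arrow form has one extra advantage: Perl finds the method via the class that the object belongs to, so the caller does not have to spell out the namespace.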

Of course, we have not yet looked at how to implement the new() and next() methods of namespace "dates". Here is a possible way to do this; the following lines would have to be added to dates.pm:

sub new { my $state = [0,1,0];   # weekday index, day of month, month index: Sun 1 Jan
          return bless $state; }
sub next { my $state = shift; my ($w,$d,$m) = @$state;
           if ($d<1 || $m>=12) { return; }   # past 31 December: no more dates
           my $r="$W[$w] $d $M[$m] 2012"; ++$w; $w%=7; if (++$d > $D[$m]) { $d=1;++$m; }
           @$state = ($w,$d,$m); return $r; }

Note the "bless" call on the variable $state: as opposed to @M, @W, and @D, which are shared by the whole class, there should be a version of the state per instance of the dates class. The anonymous array [0,1,0] provides that new, private state at every call of new(); "bless"ing the reference to it marks the data as belonging to the class "dates", so that a later "->" call knows in which namespace to look for the method. There should normally be exactly one such state variable per object (cf. the "this" pointer in C++), and it will be passed back as the first parameter of every method. Typically, every Perl class will have a new() method which just returns a fresh, blessed copy of the state variable; in other words, new() is the constructor.
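That every call of new() really yields an independent state can be checked with two objects side by side. The sketch below puts the class and the test program in one file for convenience, and declares its variables with "our" and "my" so that it also runs under "use strict"; the names $x and $y are made up for the illustration.

```perl
use strict;
use warnings;

{
    package dates;
    our @M = ('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec');
    our @D = ( 31,   29,   31,   30,   31,   30,   31,   31,   30,   31,   30,   31);
    our @W = ('Sun','Mon','Tue','Wed','Thu','Fri','Sat');

    # Constructor: a fresh anonymous array per call, blessed into "dates"
    sub new { my $state = [0, 1, 0]; return bless $state; }

    # Return the current date and advance the state by one day
    sub next {
        my $state = shift;
        my ($w, $d, $m) = @$state;
        return if $d < 1 || $m >= 12;
        my $r = "$W[$w] $d $M[$m] 2012";
        $w = ($w + 1) % 7;
        if (++$d > $D[$m]) { $d = 1; ++$m; }
        @$state = ($w, $d, $m);
        return $r;
    }
}

my $x = dates::new();
my $y = dates::new();

$x->next();                    # $x moves on to 2 January
my $from_x = $x->next();       # "Mon 2 Jan 2012"
my $from_y = $y->next();       # $y is unaffected: "Sun 1 Jan 2012"
my $class  = ref($x);          # "dates": the class that $x was blessed into

print "$from_x | $from_y | $class\n";
```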

The little example given above is just the start of a typical class design (in Perl, but just the same in e.g. Java): private data members (here: day number, month number, and weekday number) store the state, but the public interface (here: methods new() and next()) can be used without having to know those internals. Important advantages of such an object-oriented approach are: (1) choice of state variables and implementation of methods can be freely modified (inside dates.pm) without affecting any user of the dates module, as long as the methods keep functioning as documented; and (2) new documented methods can be easily added; their implementation will automatically have access to the state variable. For example, the following method will just return the current date without incrementing:

sub current { my $state = shift; my ($w,$d,$m)=@$state; return "$W[$w] $d $M[$m] 2012"; }

At this point, it should be easy to implement a previous() method which sets back the state to the previous date; try it!
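For those who want to compare notes afterwards, here is one possible sketch of such a previous() method, again with class and test program in a single file. It rests on a few assumptions of mine that the text does not prescribe: previous() returns the date it steps back to, and it returns nothing (leaving the state untouched) when asked to go before 1 January.

```perl
use strict;
use warnings;

{
    package dates;
    our @M = ('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec');
    our @D = ( 31,   29,   31,   30,   31,   30,   31,   31,   30,   31,   30,   31);
    our @W = ('Sun','Mon','Tue','Wed','Thu','Fri','Sat');

    sub new { my $state = [0, 1, 0]; return bless $state; }

    sub next {
        my $state = shift;
        my ($w, $d, $m) = @$state;
        return if $d < 1 || $m >= 12;
        my $r = "$W[$w] $d $M[$m] 2012";
        $w = ($w + 1) % 7;
        if (++$d > $D[$m]) { $d = 1; ++$m; }
        @$state = ($w, $d, $m);
        return $r;
    }

    # Step the state back by one day and return the date arrived at
    sub previous {
        my $state = shift;
        my ($w, $d, $m) = @$state;
        if (--$d < 1) {              # stepped back past the first of the month
            return if $m == 0;       # nothing before 1 January: state stays as it was
            $d = $D[--$m];           # last day of the preceding month
        }
        $w = ($w + 6) % 7;           # one weekday back, staying within 0..6
        @$state = ($w, $d, $m);
        return "$W[$w] $d $M[$m] 2012";
    }
}

my $obj = dates::new();
$obj->next() for 1 .. 60;            # 31 + 29 days: the state is now 1 March

my $back  = $obj->previous();        # steps back across the month boundary
my $again = $obj->next();            # returns that same date and moves forward again

print "$back | $again\n";            # prints "Wed 29 Feb 2012 | Wed 29 Feb 2012"
```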

Conclusion

This article just scratched the surface of Perl's object design. Perl has more "syntactic sugar" available which gives it a real OO look-and-feel; just search for some Perl programs on the internet to get an idea. On the other hand, for most types of problems for which one typically chooses Perl for the implementation, the use of objects would just be a needless complication. This should be clear from the first example! If reusability is not an issue, objects are often not useful at all.

Perl gives us a plethora of tools: it's up to us, the programmers, to choose the right one for the right task...