Old School to New School: Refactoring Perl

At YAPC::NA I sat in on lots of great talks (I also won Randal Schwartz in the charity auction, and so got to be beaten soundly at pool by him, and learn a few things about Smalltalk and Seaside). In particular, Michael Schwern gave a fantastic talk entitled Skimmable Code: Fast to Read, Fast to Changeā€Ž. This got me thinking about our own code. Webmin is an old codebase, approaching 11 years old, and thus has some pretty old school Perl practices throughout. Coding standards sort of stick to projects over a few years, and as new code comes in, it tends to look like the old code. And, to add to that momentum, Jamie has religiously kept compatibility for module authors throughout the entire life of the project. Modules written ten years ago can, astonishingly, be expected to work identically in todays Webmin, though they might not participate in logging or advanced ACLs or other nifty features that have come to exist in the framework in that time.

So, when I found myself needing to make a modification to oschooser.pl, a small program for detecting the operating system on which Webmin is running (sounds trivial, but when you realize that Webmin runs on hundreds of operating systems and versions, it turns out to be a rather complex problem), I decided to take the opportunity to put into practice some of the niceties of modern Perl. This article is a little different than what I usually write for In the Box, in that it covers a lot of ground fast, and most of it is probably pretty mundane stuff for folks already writing modern Perl. But, I think there’s enough old Perl code running around out there, running the Internet and such, that it’s worth talking about modernization work.

So, let’s go spelunking!

Introduction to oschooser.pl

The code we’ll be picking apart, and putting back together, is probably one of the more heavily used pieces of Perl code, and certainly one of the oldest, in the wild. It’s the OS detection code that Webmin and Usermin use to figure out what system they’re running on during installation. With Webmin having 12 million (give or take several million) downloads over its ten year history, this equals a lot of operating systems successfully detected. Perhaps I should have picked something a little less important for my first stab at modernization, but I’ve rarely been accused of being smart about making sweeping changes! (Jamie will reel me in, before I break actual Webmin code. I manage to break Virtualmin every now and then…but he’s more suspicious when I check code into Webmin, since it happens quite rarely.)

The oschooser.pl program actually loads up a rather complex definitions file called os_list.txt (by default, though it’s configurable, and we use different lists for Virtualmin and Webmin, since they have different requirements for version identification). The definitions file can contain snippets of Perl code, which will be executed via eval, when appropriate. Most of the updates to OS detection over the years have happened in os_list.txt, so oschooser.pl hasn’t seen a lot of grooming over the years, which makes it a prime candidate for modernization. Assuming, of course, that it works identically when I’m done with it.

Where to start?

My end goal with this project is to make oschooser.pl usable as a library from Perl programs, since our new product installer is written in Perl rather than POSIX shell. I also figured it’d be nice to make it testable, since I’ve made several mistakes in the detection code (in os_list.txt, specifically) over the past few years that led to our product being uninstallable on some systems until the bug was tracked down. But, first things first. Almost nothing in Webmin is strict compatible, and even warnings can cause some complaints, so that seems like a good starting point.

The code we’re starting with can be found here, so you can follow along at home.

Enabling warnings reveals the following (don’t worry about the arguments for now):

$ perl -w oschooser.pl os_list.txt outfile 1
Name "main::uname" used only once: possible typo at oschooser.pl line 31.
Name "main::donename" used only once: possible typo at oschooser.pl line 17.

Not too bad, actually. Just a couple of variables that are only seen once, easy enough to fix by giving them a my declaration. Though, in this case, it looks like enabling warnings turns up some unused code. While donename is actually keeping track of what names we’ve seen, so far, and it’s one of several idiomatic ways to build an array of unique values, the uname variable seems to have no purpose. So I’m going to kill that whole line rather than declare it.

Next up in our “low-hanging fruit” exercise is enabling use strict. Turns out this is quite a lot more intimidating:

$ perl -c oschooser.pl
Global symbol "$oslist" requires explicit package name at oschooser.pl line 15.
Global symbol "$out" requires explicit package name at oschooser.pl line 15.
Global symbol "$auto" requires explicit package name at oschooser.pl line 15.
Global symbol "$oslist" requires explicit package name at oschooser.pl line 16.
Global symbol "$oslist" requires explicit package name at oschooser.pl line 16.
Global symbol "@list" requires explicit package name at oschooser.pl line 20.
Global symbol "@names" requires explicit package name at oschooser.pl line 21.
Global symbol "%names_to_real" requires explicit package name at oschooser.pl line 22.
Global symbol "$auto" requires explicit package name at oschooser.pl line 27.
Global symbol "$etc_issue" requires explicit package name at oschooser.pl line 30.
Global symbol "$etc_issue" requires explicit package name at oschooser.pl line 33.
Global symbol "$o" requires explicit package name at oschooser.pl line 36.
Global symbol "@list" requires explicit package name at oschooser.pl line 36.
Global symbol "$o" requires explicit package name at oschooser.pl line 37.
Global symbol "$o" requires explicit package name at oschooser.pl line 37.
Global symbol "$ver" requires explicit package name at oschooser.pl line 39.
Global symbol "$o" requires explicit package name at oschooser.pl line 39.
Global symbol "$ver" requires explicit package name at oschooser.pl line 40.
Global symbol "$ver" requires explicit package name at oschooser.pl line 41.
Global symbol "$o" requires explicit package name at oschooser.pl line 41.
Global symbol "$ver" requires explicit package name at oschooser.pl line 41.
Global symbol "$ver" requires explicit package name at oschooser.pl line 43.
Global symbol "$ver" requires explicit package name at oschooser.pl line 44.
Global symbol "$o" requires explicit package name at oschooser.pl line 44.
Global symbol "$ver" requires explicit package name at oschooser.pl line 44.
Global symbol "$o" requires explicit package name at oschooser.pl line 49.
Global symbol "$ver" requires explicit package name at oschooser.pl line 53.
Global symbol "$auto" requires explicit package name at oschooser.pl line 54.
Global symbol "$auto" requires explicit package name at oschooser.pl line 59.
Global symbol "$rv" requires explicit package name at oschooser.pl line 61.
Global symbol "$auto" requires explicit package name at oschooser.pl line 67.
Global symbol "$auto" requires explicit package name at oschooser.pl line 72.
Global symbol "$auto" requires explicit package name at oschooser.pl line 77.
Global symbol "$cmd" requires explicit package name at oschooser.pl line 80.
Global symbol "$i" requires explicit package name at oschooser.pl line 81.
Global symbol "$i" requires explicit package name at oschooser.pl line 81.
Global symbol "@names" requires explicit package name at oschooser.pl line 81.
Global symbol "$i" requires explicit package name at oschooser.pl line 81.
Global symbol "$cmd" requires explicit package name at oschooser.pl line 82.
Global symbol "$i" requires explicit package name at oschooser.pl line 82.
Global symbol "@names" requires explicit package name at oschooser.pl line 82.
Global symbol "$i" requires explicit package name at oschooser.pl line 82.
Global symbol "$tmp_base" requires explicit package name at oschooser.pl line 84.
Global symbol "$temp" requires explicit package name at oschooser.pl line 85.
Global symbol "$tmp_base" requires explicit package name at oschooser.pl line 85.
Global symbol "$cmd" requires explicit package name at oschooser.pl line 86.
Global symbol "$temp" requires explicit package name at oschooser.pl line 86.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 87.
Global symbol "$temp" requires explicit package name at oschooser.pl line 87.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 88.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 88.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 89.
Global symbol "$name" requires explicit package name at oschooser.pl line 96.
Global symbol "@names" requires explicit package name at oschooser.pl line 96.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 96.
Global symbol "@vers" requires explicit package name at oschooser.pl line 97.
Global symbol "$name" requires explicit package name at oschooser.pl line 97.
Global symbol "@list" requires explicit package name at oschooser.pl line 97.
Global symbol "$cmd" requires explicit package name at oschooser.pl line 98.
Global symbol "$i" requires explicit package name at oschooser.pl line 99.
Global symbol "$i" requires explicit package name at oschooser.pl line 99.
Global symbol "@vers" requires explicit package name at oschooser.pl line 99.
Global symbol "$i" requires explicit package name at oschooser.pl line 99.
Global symbol "$cmd" requires explicit package name at oschooser.pl line 100.
Global symbol "$i" requires explicit package name at oschooser.pl line 100.
Global symbol "$name" requires explicit package name at oschooser.pl line 100.
Global symbol "@vers" requires explicit package name at oschooser.pl line 100.
Global symbol "$i" requires explicit package name at oschooser.pl line 100.
Global symbol "$cmd" requires explicit package name at oschooser.pl line 102.
Global symbol "$temp" requires explicit package name at oschooser.pl line 102.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 103.
Global symbol "$temp" requires explicit package name at oschooser.pl line 103.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 104.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 104.
Global symbol "$temp" requires explicit package name at oschooser.pl line 105.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 106.
Global symbol "$ver" requires explicit package name at oschooser.pl line 110.
Global symbol "@vers" requires explicit package name at oschooser.pl line 110.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 110.
Global symbol "$dashes" requires explicit package name at oschooser.pl line 114.
Global symbol "$dashes" requires explicit package name at oschooser.pl line 115.
Global symbol "$i" requires explicit package name at oschooser.pl line 121.
Global symbol "$i" requires explicit package name at oschooser.pl line 121.
Global symbol "@names" requires explicit package name at oschooser.pl line 121.
Global symbol "$i" requires explicit package name at oschooser.pl line 121.
Global symbol "$i" requires explicit package name at oschooser.pl line 122.
Global symbol "@names" requires explicit package name at oschooser.pl line 122.
Global symbol "$i" requires explicit package name at oschooser.pl line 122.
Global symbol "$i" requires explicit package name at oschooser.pl line 123.
Global symbol "$i" requires explicit package name at oschooser.pl line 125.
Global symbol "$dashes" requires explicit package name at oschooser.pl line 126.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 128.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 129.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 134.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 134.
Global symbol "@names" requires explicit package name at oschooser.pl line 134.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 135.
Global symbol "$name" requires explicit package name at oschooser.pl line 141.
Global symbol "@names" requires explicit package name at oschooser.pl line 141.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 141.
Global symbol "$name" requires explicit package name at oschooser.pl line 142.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 146.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 147.
Global symbol "$ver" requires explicit package name at oschooser.pl line 153.
Global symbol "$name" requires explicit package name at oschooser.pl line 153.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 153.
Global symbol "%names_to_real" requires explicit package name at oschooser.pl line 154.
Global symbol "$name" requires explicit package name at oschooser.pl line 154.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 154.
Global symbol "$out" requires explicit package name at oschooser.pl line 159.
Global symbol "$ver" requires explicit package name at oschooser.pl line 160.
Global symbol "$ver" requires explicit package name at oschooser.pl line 161.
Global symbol "$ver" requires explicit package name at oschooser.pl line 162.
Global symbol "$ver" requires explicit package name at oschooser.pl line 163.
Global symbol "$d" requires explicit package name at oschooser.pl line 170.
Global symbol "$rv" requires explicit package name at oschooser.pl line 172.
Global symbol "$rv" requires explicit package name at oschooser.pl line 174.
Global symbol "$d" requires explicit package name at oschooser.pl line 177.
Global symbol "$d" requires explicit package name at oschooser.pl line 178.
Global symbol "$rv" requires explicit package name at oschooser.pl line 178.
Global symbol "$d" requires explicit package name at oschooser.pl line 178.
Global symbol "$rv" requires explicit package name at oschooser.pl line 181.
oschooser.pl had compilation errors.

Wow! I think that might be more lines than the program itself. Luckily, it’s almost entirely unscoped variables. A quick pass over the code, adding my declarations to the obvious candidates, gets things looking a little better. One tricky bit is the $i loop variables used in for loops. We don’t want those to be declared several times in the code, and we don’t want them to leak out into the scope of the rest of the program. In modern Perl, this is no problem, as you can use the following:

    for(my $i=0; $i<@names; $i++) {
      $cmd .= " ".($i+1)." '$names[$i]'";
      }

And $i will be local to the for loop. I momentarily feared that I’d need to use an outer block to accomplish this, as Webmin needs to be compatible with quite old Perl versions (5.005, for core Webmin, unless Unicode support is needed, in which case 5.8.1 is required), but after downloading and installing Perl 5.005_4, I found that was an unnecessary precaution. The foreach loops can also make use of this convenient feature. If you do happen to be stuck with an even more ancient version that 5.005 (but still higher than 4)–though I can’t imagine how you could, as 5.005 is over nine years old–you can use the following:

  {
  my $i;
    for($i=0; $i<@names; $i++) {
      $cmd .= " ".($i+1)." '$names[$i]'";
      }
  }

Which provides similar private scope for the $i variable, at the cost of three extra lines.

Making it Testable

So far, I haven’t made any changes that are likely to break the code. It’s merely been cleanup and syntax tweaks. But, to accomplish everything I’d like in this exercise, we’ll be doing some refactoring and refining the code. To do that with confidence, it’d be nice to have some tests to insure the code works the same before and after any changes.

Since this is not historically a library, it’s not particularly easy to test. One could write a custom test harness, or use Test::Command, and test its behavior as a whole, but since it’s written in Perl and one of my goals is to make it useful as a library from Perl scripts, I decided instead to make it loadable as a module and use Test::More. A trick that’s very common in the Python world, but doesn’t seem as well-known amongst Perlmongers is a main function which is called if the script is executed independently rather than via use or require. The main function then calls whatever the script would normally do, optionally setting up variables or parsing command line arguments.

So, I added the following near the beginning of the file:

# main
sub main() {
if ($#ARGV < 1) { die "Usage: $0 os_list.txt outfile [0|1|2|3]\n"; }
my ($oslist, $out, $auto) = @ARGV;
oschooser($oslist, $out, $auto);
}
main() unless caller();  # make it testable and usable as a library

I also took this opportunity to add a simple usage message if the command is executed with fewer than two arguments (@ARGV, like all Perl arrays starts counting at 0). I also needed to wrap the main function of the script in a sub block, so that the script doesn’t do anything immediately if loaded as a library.

Make it a Module

Since I want to use this code as a library, I face a choice. The use statement is functionally equivalent to:

BEGIN { require Module; Module->import( LIST ); }

Which means, I suppose, I could keep the name oschooser.pl and use:

require 'oschooser.pl';

We don’t need BEGIN level assurance, since we have no prototypes in this library and only use simple subroutines. But, I find this a bit unsatisfying, since it’s no longer in common use amongst Perl developers, and use provides the ability to export functions explicitly. Test::More has both a use_ok and a require_ok function, so it’s irrelevant from a testing perspective. It’ll probably remain oschooser.pl in Webmin proper, and OsChooser.pm in my Virtualmin installer library, at least for the foreseeable future. Not really a lot of difference between the two.

Some Tests

So, now that we can call the library roughly the way we want, using use, it’s time to write a few tests to be sure things actually work after we begin making more sweeping changes.

We can start with simple compile tests (I usually call these types of tests t/return.t, as they just check to be sure the module returns without error on load and the functions within return the data type that is expected):

#!/usr/bin/perl -w
# These tests just check to be sure all functions return something
# It doesn't care what it is returned...so garbage can still pass,
# as long as the garbage is the right data type.
 
use strict;
use Test::More qw(no_plan);
 
use_ok( 'OsChooser' );
 
isa_ok(\OsChooser::have_tty(), 'SCALAR');
isa_ok(\OsChooser::has_command("cp"), 'SCALAR');

Hmm…OK, so we don’t actually have a lot to test yet, just a couple of utility functions (and I’ve even cheated a little and looked ahead to where I introduced a have_tty function, or this would be an even shorter set of tests). The most important function, oschooser, doesn’t know how to return anything very useful yet. It can only write out its findings to a file. But, since we’re always going to be creating that file, regardless of how nice the module usage becomes, we need to figure out how to test it anyway.

Unsurprisingly, there is already a full-featured module on CPAN for testing the contents of files, called, unlikely though it may seem, Test::Files. So, we’ll just grab that:

$ sudo perl -MCPAN -e shell
 
cpan shell -- CPAN exploration and modules installation (v1.7602)
ReadLine support enabled
 
cpan> install Test::Files
...

And then create as many Operating System definition files as we want in the t directory. We’ll just name them for the OS they represent. This is the kind of testing I love, because the actual test file will be extremely simple, no matter how many operating systems I want to test on:

#!/usr/bin/perl -w
use strict;
use OsChooser;
 
# Get a list of the example OS definition files
opendir(DIR, "t/") || die "can't opendir t/ $!";
my @files = grep { /\.os/ } readdir(DIR);
closedir DIR;
use Test::More qw(no_plan);
use Test::Files;
 
foreach my $file (@files) {
  $file =~ /(.*)\.os$/;
  my $osname = $1;
  my $outfile = "t/outfile";
  OsChooser::oschooser("os_list.txt", $outfile, 1);
  compare_ok("t/$file", $outfile, $osname);
 
  # Cleanup
  unlink $outfile;
}

I love data-driven software, and this is a fun little example of it. We can run as many tests as we want, merely by adding more OS data files–one with the “os” suffix to provide what should be output by oschooser and one to contain the file that oschooser would normally use to identify the OS (/etc/issue, among others), which isn’t yet supported, but I’ll talk about it in the next post. Speaking of being data-driven, I think it’d also be pretty nifty to get the test count from the @files array, rather than using no_plan, but because modules loaded with use are loaded early during compile time (in a BEGIN block, effectively) we don’t actually have anything in @files yet.

However, as mentioned, the oschooser function doesn’t yet allow one to specify the issue file to look at, so no matter how many definitions I provide, it’ll never be able to test anything but the OS the test is running on. Oh, well, for now we’ll just create one OS definition file that matches my current OS, and make it a priority to make the function more testable somehow, possibly via an optional parameter to oschooser.

Alright, so now that we have some rudimentary tests in place, we can break stuff with confidence! We’ll come back to testing again in the near future, since we’re leaving so much untested right now.

Plain Old Documentation

I’m going to take a quick detour now that we’ve got some basic tests in place. Testing is one practice that most developers agree makes for great code, and the other practice that most folks can agree on is documentation.

Since this is such a simple piece of code, and was intended exclusively for use during installation of Webmin and Usermin, Jamie never really documented it. Now that I’m forcing it to be useful in other locations, and having some fun giving it a modern Perl face lift, it’s as good a time as any to add some documentation. POD isn’t the only documentation format usable within Perl code, but it is, by far, the most popular, and it has lots of great tools for processing and testing coverage, so that’s what Jamie recently chose for use in documenting the Virtualmin API. It’s also easy to learn, and results in text that is pretty readable even before processing.

I’m not sure of the recommended practices for documenting scripts that work on both the command line and as a module, but here’s what I came up with:

=head1 OsChooser.pm
 
Attempt to detect operating system and version, or ask the user to select
from a list.  Works from the command line, for usage from shell scripts,
or as a library for use within Perl scripts.
 
=head2 COMMAND LINE USE
 
OsChooser.pm os_list.txt outfile [auto]
 
Where "auto" can be the following values:
 
=over 4
 
=item 0
 
always ask user
 
=item 1
 
automatic, give up if fails
 
=item 2
 
automatic, ask user if fails
 
=item 3
 
automatic, ask user if fails and if a TTY
 
=back
 
=head2 SYNOPSIS
 
    use OsChooser;
    my ($os_type, $version, $real_os_type, $real_os_version) =
       OsChooser->oschooser("os_list.txt", "outfile", $auto, [$issue]);
 
=cut

Pretty simple, but covers the basics.

Next Time

Unfortunately, the code is now longer and probably a little less readable than before! It’s probably more robust to changes, since it now has reasonably scoped variables. And it’s more friendly to others who might want to use it, due to the new documentation and the ability to use it as a library in Perl or as a command in shell scripts.

Next time we’ll start in on the refactoring, and we’ll also write some more tests. This is turning into a real challenge, due to the data-driven nature of the script, and the fact that it’s somewhat hardcoded to look for OS data in very specific locations. Since, a big part of what I want to test is in the os_list.txt file, we don’t have the luxury of just saying, “It’s configuration…we’ll just make a special version for testing purposes.” We’ll have to get far more clever.