Homebrew Package Installation for Servers


One of our customers came to us a couple of weeks ago wanting to run Virtualmin on Mac OS X. I, foolishly, said, “Sure! I can help with that. I installed Virtualmin on OS X several years ago, how much different can it be? It shouldn’t take more than a couple of hours.” It turns out, it can be remarkably different. In the intervening years, Mac OS X has evolved in many interesting directions, mostly positive, some questionable. Mac OS X remains exceedingly weak for server usage, for reasons well out of the scope of this short article. But, it is quite strong for desktop/laptop use, and many people want to be able to develop their web applications on Mac OS X even if they will be deployed to Linux servers.

Enter Homebrew

The last time I set up Virtualmin on Mac OS X systems, the best package management tool, and the best way to install a lot of Open Source software, was Fink (Mac Ports, then Darwin Ports, was still quite new at the time).

Fink is an apt repository of dpkg packages built for Mac OS X. I love apt and dpkg (almost as much as yum and rpm), and Webmin/Virtualmin have great support for apt and dpkg, and so I was all set to choose Fink for this new deployment. But, there are some issues with Fink. First, it installs everything with its own packages, including stuff that is already available from the base system. For Virtualmin, whenever possible, I like to use the system standard packages for the services it manages. Homebrew is designed to work with what Apple provides when possible, which is somewhat more aligned with the Virtualmin philosophy.

Second, and perhaps more important, Fink is seemingly much less popular and much less actively maintained than Homebrew. I’m not sure why. Possibly because the Homebrew website looks good, and the documentation is very well-written, while the Fink home page is a little drab and looks complicated. And, Fink package versions tend to be quite a bit older than the packages provided by the very active Homebrew project. This can be a much more serious issue. Security updates are absolutely vital in a web server, and a package repository that is actively maintained is the best way to ensure you’ll have security updates.

So, I’ve spent the past couple of days experimenting with Homebrew. It’s a pretty nice system, and its community and developers are active, responsive, and helpful. All great things. But, its primary advertised feature is also its biggest weakness and most dangerous mistake.

Installation Without sudo

Or: Homebrew Considered Harmful

One of the major advertised features of Homebrew is that you can install it, and any package, without root or sudo privileges. There are good reasons one might want this, but on a server, it has alarming side effects, and it is one of the first things I would need to correct for our use case (of installing a virtual hosting stack and Virtualmin). The example I’ll use here is that of MySQL. When you install the mysql package from Homebrew, it will be installed with the ownership of all files set to the user that installed the package. And, more dangerously, it will be set up to run as the user that installed it.

This decision was made because Homebrew often builds the software at install time, rather than providing a binary package (there is a new “bottles” feature that installs binary packages, but that wasn’t intended to address the sudo problem). The risk of building software with sudo or root privileges is very real, and in this case it results in the choice to build the software as a non-root user.

Other package managers, like dpkg and rpm, resolve this problem with toolchains designed around building the packages within a chroot so that unwanted behavior is contained. For example, mock on Fedora and CentOS provides an easy-to-use tool for building packages across many distributions and versions inside of a chroot environment with only the dependencies specified by the package. The most popular Linux distributions distribute binary packages that were built in a controlled environment. But, Homebrew generally builds the software at install time, with no chroot to protect the system from broken or hostile build processes. And, so they insist you run it as a non-root user. This is, I suppose, a logical conclusion to come to, based on the premise of a package manager that builds software on the user’s system without being confined in a container or chroot, but it has negative consequences.

For example, when I install MySQL from Homebrew, everything is owned by joe:staff. The provided property list file for starting the server is also designed to start it as that user, when the user logs in. For a development system, this may not be a big deal, and even makes a certain sort of sense (I prefer my development environment to more closely mirror my deployment environment, but I can see reasonable arguments for the way they do it). But, for a server, it is simply untenable.

The most important reason it is a bad decision is that it leads to many, possibly all, of the services running with the privileges of the user that installed them. Which, in most cases, is probably a powerful user (mine is an administrative user with sudo privileges, for example). So, in the event any of the services are compromised, all of the services will be compromised, and likely so will the user account in question. The security implications of this really cannot be overstated. This is a huge problem.

This is why Linux and UNIX systems (and even Apple, who aren’t historically renowned for their strong multi-user security practices) run all services as different users, and with restricted privileges. On the average LAMP system, there will be an apache or www user that runs Apache, a mysql user that runs MySQL, a postfix user that runs Postfix, and web applications will usually be run as yet another user still. These special users often have very restricted accounts, and may not even have a shell associated with them, further limiting the damage that can occur in the event of an exploit of any one service. Likewise, they may be further restricted by SELinux or other RBACL-based security. Any one of these services or applications being compromised through any means won’t generally compromise other services or users. Homebrew throws that huge security benefit away to avoid having to sudo during installation.

It’s probably too late to convince the Homebrew folks to backtrack on that decision. But, it’s not terribly difficult to fix for one-off installations, and many do consider it a valuable feature of Homebrew. Fixing the installed services as I’ve done has some side effects that may also be dangerous, which I’ll go into at the end of the article, but since I figured out how to do it, I thought I’d document it. During my research I found that an alarming number of users are using Homebrew in server environments and I found a number of users asking similar questions about various services, so, maybe this will help some folks avoid a dangerous situation.

So, let’s get started. After installation of MySQL (using the command brew install mysql), here are the changes you’ll want to make.

Update the Property List File

The way Homebrew recommends running MySQL after installation is to link the provided plist file in /usr/local/opt/mysql/homebrew.mxcl.mysql.plist into your ~/Library/LaunchAgents directory, and add it using the launchctl load command. This sets it up to run at all times when your user is logged in, which is great if you’re developing and only need it running when you’re logged in and working. But, we want it to run during system boot without having any users logged in, and even more importantly we want it to run as the _mysql user.

So, instead of linking it into your local LaunchAgents directory, as the documentation suggests, copy it into your system /Library/LaunchDaemons directory.

$ sudo cp /usr/local/opt/mysql/homebrew.mxcl.mysql.plist /Library/LaunchDaemons

Then edit the file (you’ll have to use sudo) to add user and group information, add a --user option, and change the command to mysqld and the WorkingDirectory to /usr/local/var/mysql. Mine looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>UserName</key>
  <string>_mysql</string>
  <key>GroupName</key>
  <string>_mysql</string>
  <key>KeepAlive</key>
  <true/>
  <key>Label</key>
  <string>homebrew.mxcl.mysql</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/opt/mysql/bin/mysqld</string>
    <string>--bind-address=127.0.0.1</string>
    <string>--datadir=/usr/local/var/mysql</string>
    <string>--user=_mysql</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>WorkingDirectory</key>
  <string>/usr/local/var/mysql</string>
</dict>
</plist>

Notice the addition of the UserName and GroupName keys, both set to _mysql, as well as several other altered lines.

Note: I am not a Mac OS X or launchd expert. There are a number of aspects of the Mac OS X privilege model that I do not understand. I would welcome comments about how the security of this configuration might be improved. Also, the entirety of my experience with launchd is the several hours I spent playing with it and reading about it to convince MySQL to run as a different user. But, I’m pretty much certain that the way Homebrew does servers is worse than what I’ve done here.

Change Ownership of MySQL Installation and Databases

The _mysql user does not have permissions to read things owned by the joe user (or the user you used to install MySQL with Homebrew), so you’ll need to change ownership of all MySQL data files to _mysql:wheel.

$ sudo chown -R _mysql:wheel /usr/local/var/mysql/

Change Ownership of the Property List File

Property list files in /Library/LaunchDaemons (or /Library/LaunchAgents) must belong to root, for security reasons. So, you’ll need to update that, as well.

$ sudo chown root:wheel /Library/LaunchDaemons/homebrew.mxcl.mysql.plist

Load and Start the MySQL Daemon

The launchctl command manages agents and daemons, and we can add our new service by loading the property list:

$ sudo launchctl load /Library/LaunchDaemons/homebrew.mxcl.mysql.plist

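Because RunAtLoad is set to true, loading the property list should also start MySQL. If it isn’t running, start it by its label (launchctl start takes the job’s Label, not the plist file name):

$ sudo launchctl start homebrew.mxcl.mysql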

If anything goes wrong, check the system.log:

$ sudo tail -f /var/log/system.log

I found the documentation at launchd.info particularly helpful when working out how to use launchd.
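For repeat installs, the post-edit steps above can be collected into one script. This is a minimal sketch, untested on a live system: it assumes the hand-edited plist is already in /Library/LaunchDaemons (the copy and the edit come first), uses the paths from this article, and adds a plutil -lint syntax check that isn’t one of the steps above. By default it only prints the commands; set APPLY=1 to run them.

```shell
#!/bin/bash
# Minimal sketch: the ownership and launchd steps from this article in one place.
# Assumes the edited plist is already at $PLIST. The plutil -lint check is an
# extra step. Commands are only printed unless APPLY=1 is set.

PLIST=/Library/LaunchDaemons/homebrew.mxcl.mysql.plist
DATADIR=/usr/local/var/mysql

run() {
  if [ "${APPLY:-0}" = 1 ]; then
    "$@" || exit 1   # stop at the first failing step
  else
    echo "$*"        # dry run: show what would be executed
  fi
}

mysql_daemon_steps() {
  run sudo chown -R _mysql:wheel "$DATADIR/"   # data files owned by _mysql
  run sudo chown root:wheel "$PLIST"           # daemon plists must belong to root
  run plutil -lint "$PLIST"                    # catch XML typos before loading
  run sudo launchctl load "$PLIST"             # register (RunAtLoad then starts mysqld)
}

mysql_daemon_steps
```

Running it once without APPLY is a cheap way to review exactly what will be done as root before committing to it.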

Future Concerns

Since this corrects the big security issue with the Homebrew installation of MySQL, and this technique could reasonably easily be applied across every Homebrew-installed service, why aren’t we happy?

Updates are part of security, too.

The most important reason to use a package manager is not to make it easy to install software, contrary to popular belief, and if all the package manager does is provide easy to install packages, it is not an effective package manager. The most important reason to use a package manager is to make it easy to update software.

Homebrew makes installation and upgrades reasonably easy, but the steps I’ve taken in this article to make MySQL run as its own user seem likely to break updates, since some files created during installation have changed ownership. A newer version of MySQL isn’t yet available in the Homebrew repository, so I can’t test whether it does break upgrades or not. Nonetheless, fixing this issue compounded across many services (the Virtualmin installation process normally installs: Apache, MySQL/MariaDB, PostgreSQL, Postfix, Dovecot, BIND, Procmail, SpamAssassin, ClamAV, Mailman, and a bunch more) will likely prove to be a maintenance challenge that isn’t worth the effort.

So, despite having figured out how to make this work, I’m now going to spend the same amount of time and effort giving Mac Ports a thorough test drive. I have a pretty strong suspicion it will be a better fit for server usage. This way of working feels like fighting against the way Homebrew wants to operate, and when you find yourself having to work so hard against the tool, it’s probably the wrong one for the job.

And What Does All This Mean for Virtualmin on Mac OS X?

Well, I’m a sucker for the sunk cost fallacy, so I’m planning to spend another couple of days working out a basic install script for Virtualmin on Mac OS X, probably using Mac Ports (I can see a way forward using Homebrew, but I don’t like the terrain). I’ll likely never recommend Mac OS X for a production server deployment, but it’s certainly not the worst OS out there for the purpose.

Multi-account Twittering from the command line

While working on our migration from Joomla to Drupal over at Virtualmin.com, I’ve been keeping folks apprised of what I’m up to by posting the highlights to Twitter on our virtualmin Twitter account. So, I’ve had to switch back and forth quite a bit from my personal account. It also means I have to have another tab open quite frequently for Twitter. I looked around for a standalone Twitter client that supports multiple accounts, and found the pickings were pretty slim on Linux (a few Air clients run on Linux, but not very well, unfortunately). But, I got to thinking that a command line client would be ideal…so I searched the web, and found a brief article here with a simple one-liner to submit a tweet to Twitter. The fact that Twitter works with a one-line shell script is awesome, and explains why there are so many Twitter clients.

Anyway, I first thought, “I could just make copies of that for each account.” Then I could call “twitter-swelljoe” for my personal account and “twitter-virtualmin” for my business account. But that seemed wrong, somehow.

So, I wrote a version that accepts a command line argument of the username, and then sends the rest as a tweet.

#!/bin/bash
USER=$1
shift
TWIT=$@
PASS=password
curl -u $USER:$PASS -d status="$TWIT" http://twitter.com/statuses/update.xml>/dev/null

Nothing really fancy going on here, but the “shift” is saying, “drop the first token from the arguments list”, and then $@ says “give me the rest”. So, no quotes required for the tweet. I saved the file as tw, and use it like so:
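The shift and $@ behavior is easy to see in isolation. This sketch drops the curl call and just prints what tw would send; parse_tw_args is a hypothetical name, and it uses "$*", which with the default IFS joins the remaining words with single spaces, the same result TWIT=$@ gives in an assignment.

```shell
#!/bin/bash
# Sketch of tw's argument handling without the network call:
# first word = account name, everything after it = tweet text.
parse_tw_args() {
  local account=$1
  shift              # drop the account name from the positional parameters
  local text="$*"    # join the remaining words with spaces
  printf 'account=%s\ntext=%s\n' "$account" "$text"
}

parse_tw_args swelljoe Tweeting from the command line
```

Because the words pass through the shell before tw ever sees them, characters such as *, ?, ; and ’ are still interpreted (an apostrophe in “don’t” opens a quoted string, for example), so skipping the quotes only works for plain words.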

tw swelljoe Tweeting from the command line!
tw virtualmin Business-y kinds of tweets from the command line.

This assumes one password for all Twitter accounts, which is exactly what I do anyway (I use SuperGenPass, which always generates the same password when given the same URL and passphrase). You could instead store the passwords in an associative array (gotta be using Bash 4.0+ for that), something like this (I can’t test this, as I don’t have bash 4.0!):

#!/bin/bash
USER=$1
shift
TWIT=$@

declare -A PASS
PASS[swelljoe]="passwd"
PASS[virtualmin]="notapasswd"

# Quote the credentials so passwords containing spaces survive intact
curl -u "$USER:${PASS[$USER]}" -d status="$TWIT" http://twitter.com/statuses/update.xml>/dev/null

There are ways to fake associative arrays in older versions of bash, or use other tools, but since I don’t need multiple passwords, I’m going to just hand-wave that away.
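For anyone on a Bash older than 4.0, one common way to fake the lookup is a case statement that maps each account name to its password. A minimal sketch, reusing the placeholder passwords from the Bash 4 version; lookup_pass is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical per-account password lookup for Bash older than 4.0,
# using a case statement instead of an associative array.
lookup_pass() {
  case "$1" in
    swelljoe)   echo "passwd" ;;
    virtualmin) echo "notapasswd" ;;
    *)          return 1 ;;   # unknown account: no password, non-zero status
  esac
}

lookup_pass virtualmin   # prints notapasswd
```

In the script, PASS=$(lookup_pass "$USER") || exit 1 would take the place of the declare -A block, with curl using "$USER:$PASS" as in the first version.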

I find it amusing that the first version of this twitter client, at 137 characters, is a valid tweet. It makes me want to go golfing and see how small I can make an even more functional version…at the very least, I can’t resist:

tw swelljoe `cat tw`

And then I have to go change my password after realizing I tweeted the real version rather than the demonstration version. Good thing nobody follows my personal tweets.

Creating an iPhone UI for Virtualmin

Introduction and History

Around the start of the year, I created a theme for Webmin designed to make it easier to access from mobile devices like smartphones and cellphones with simple built-in browsers. At the time I had a Treo 650, which had a very basic browser – certainly not powerful enough to render the standard Virtualmin or Webmin framed themes.

By using Webmin’s theming features, I was able to create a UI that used multiple levels of menus to access global and domain-level settings, instead of frames. The theme also changed the layouts of forms to be more vertical than horizontal, used fewer tables, and removed all use of Javascript and CSS for hiding sections.

This was released in the virtual-server-mobile theme package version 1.6, and all was good in the world. Anyone using it could now access all the features of Virtualmin from a very basic browser, and read mail in Usermin without having to rely on the awful IMAP implementations in most smartphones.

This shows what Virtualmin looked like on a Treo:

Then I bought an iPhone.

It has a much more capable browser, technically the equal of any desktop browser like Firefox or IE. The regular Webmin themes actually worked fine, although a fair bit of zooming is needed to use them. The mobile theme looked like crap, as it didn’t use any of the browser features like CSS and Javascript that the iPhone supports. Plus the layout rendered poorly due to the use of long lines of text that didn’t get wrapped at the browser’s screen width.

On the iPhone, the Create Alias page in the mobile theme looked like this:

And in the regular Virtualmin theme, the Create Alias page looked like this:

I mentioned this to Joe, and he pointed me at iUI, an awesome library of CSS and Javascript that allows developers to create websites that mimic the look of native iPhone applications. After trying out the demos and looking at their source code, it was clear that iUI would be perfect for creating an iPhone-specific theme.

It wasn’t quite as simple as I first thought, but after some hacking on both the theme code and iUI itself I was able to come up with a pretty good layout, as you can see in this screenshot of the Create Alias page:

Menu Implementation

Actually getting iUI to play nicely with the Webmin theming system was slightly more complex than I originally expected, though. For example, an iPhone-style multi-level menu that slides to the left is implemented in iUI with HTML like:

<ul id='main' title='My Menu' selected='true'>
<li><a href='#menu1'>Submenu One</a></li>
<li><a href='#menu2'>Submenu Two</a></li>
</ul>
<ul id='menu1' title='Submenu One'>
<li><a href='foo.cgi'>Foo</a></li>
<li><a href='bar.cgi'>Bar</a></li>
</ul>
<ul id='menu2' title='Submenu Two'>
<li><a href='quux.cgi'>Quux</a></li>
<li><a href='#page'>Some page</a></li>
</ul>
<div id='page' class='panel' title='Some page'>
Any HTML can go here.
</div>

As you might guess, CSS and Javascript are used to show only one menu or div at a time, even though they are all in the same HTML file. This is quite different to the way menus are usually created in Webmin.

To get this kind of HTML from the theme, I created an index.cgi that generates a large set of <ul> lists and <div> blocks containing all the Virtualmin domains, global settings, Webmin categories and modules. This is loaded by the iPhone when a user logs in, and allows quick navigation to any of them without additional page loads. For example, these screenshots show the path down to the Users and Groups module. Only the last requires an extra page load:

The index.cgi script is able to fetch all Webmin modules and categories with the functions get_visible_module_infos and list_categories, which are part of the core API. It also fetches Virtualmin domains with virtual_server::list_domains and global actions with virtual_server::get_all_global_links.

For example, the code that generates the menus of modules and categories looks roughly like:

my @modules = &get_visible_module_infos();
my %cats = &list_categories(\@modules);
# Top-level menu: one entry per category
print "<ul id='modules' title='Webmin Modules'>\n";
foreach my $c (sort { $b cmp $a } (keys %cats)) {
    print "<li><a href='#cat_$c'>$cats{$c}</a></li>\n";
    }
print "</ul>\n";
# One <ul> per category, as siblings of the top-level menu (like the iUI example above)
foreach my $c (sort { $b cmp $a } (keys %cats)) {
    my @incat = grep { $_->{'category'} eq $c } @modules;
    print "<ul id='cat_$c' title='$cats{$c}'>\n";
    foreach my $m (sort { lc($a->{'desc'}) cmp lc($b->{'desc'}) } @incat) {
        print "<li><a href='$m->{'dir'}/' target=_self>$m->{'desc'}</a></li>\n";
        }
    print "</ul>\n";
    }

The actual iUI styling and menu navigation comes from CSS and Javascript files which are referenced on every page in the <head> section, generated by the theme’s theme_header function which overrides the Webmin header call.

Other Pages

Other pages within Webmin are generated using the regular CGI scripts, but with their HTML modified by the theme. This is done by overriding many of the ui_ family of functions, in particular those that generate forms with labels and input fields. Because the iPhone screen is relatively narrow, it is more suited to a layout in which all labels and inputs are arranged vertically, rather than the Webmin default that uses multiple columns.

For example, the theme_ui_table_row override function contains code like:

if ($label =~ /\S/) {
    $rv .= "<div class='webminTableName'>$label</div>\n";
    }
$rv .= "<div class='webminTableValue'>$value</div>\n";

The label and value variables are the field label and input HTML respectively. The actual styling is done using CSS classes that were added to iUI for the theme. The same thing is done in functions that render multi-column tables, tabs and other input elements generated with ui_ family functions.

The only downside to this approach is that not all Webmin modules have yet been converted to use the functions in ui-lib.pl, and so do not get the iPhone-style theming. However, I am working on a long-term project to convert all modules from manually generated HTML to using the UI library functions.

Headers and Footers

In most Webmin themes, there are links at the bottom of each page back to previous pages in the hierarchy; for example, when editing a Unix group there is a link back to the list of all groups.

However, iUI puts the back link at the top of the page next to the title, as in native iPhone applications. Fortunately, CSS absolute positioning allows the theme to place this link at the top, even though it is only generated at the end of the HTML. The generated HTML for this looks like:

<div class='toolbar'>
<h1 id='pageTitle'></h1>
<a class='button indexButton' href='/useradmin/index.cgi?mode=groups' target=_self>Back</a>
<a class='button' href='/help.cgi/useradmin/edit_group'>Help</a>
</div>

The toolbar CSS class contains the magic attributes needed to position it at the top of the page, even though the theme outputs it last.

Old School to New School: Refactoring Perl (part 2)

When I left off, I’d made an old chunk of Perl code (most of which predates widespread availability of Perl 5) compliant with warnings and strict, converted it to be easily usable as both a module and a command line script, added some POD documentation, and built a couple of rudimentary tests to make it possible to change the code without fearing breakage. Now we can get rough with it.

Refactoring for Clarity and Brevity

Despite the changes so far, the code is pretty much as it was when we started. It’s a little more verbose due to the changes for strict compliance, so that’s a negative, and the addition of a main function and the oschooser wrapper just adds even more lines of code. The main code block was already a little bit long for comfort at several pages’ worth of 80-column text, assuming a 50-row editor window. Jamie seems capable of holding a lot of code in his head at once…me, I’m kinda slow, so I like small digestible chunks. So, let’s start digesting.

Looking through the code, this bit jumped right out. It’s a little bit unwieldy:

    if ($auto == 1) {
      # Failed .. give up
      print "Failed to detect operating system\n";
      exit 1;
      }
    elsif ($auto == 3) {
      # Do we have a tty?
      local $rv = system("tty >/dev/null 2>&1");
      if ($?) {
        print "Failed to detect operating system\n";
        exit 1;
        }
      else {
        $auto = 0;
        }
      }
    else {
      # Ask the user
      $auto = 0;
      }

It seemed like this could be shortened a little bit by making a have_tty function, and using && in the elsif. Not a huge difference, but if we then flip the tests over (there are only four possible values of $auto, and they only result in two possible outcomes) and add an || we can lose a few more lines, one conditional, and cut it down to:

    if (($auto == 3 && have_tty()) || $auto == 2) {
      $auto = 0;
      }
    else {
      # Failed .. give up
      print "Failed to detect operating system\n";
      exit 1;
      }

That’s a lot less code to read. I also think it makes more sense to have a single failure block and a single block setting $auto. I’m still trying to figure out the purpose of auto=2, since it seems like it would only be possible to fall back to asking a question, if there’s actually a TTY. But Jamie knows the quirks of various systems far better than I do, so we’ll leave it alone, for now, and keep the same behavior. I bet it’s to accommodate something funny on Windows!
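To double-check that the shorter version keeps the same behavior, the two conditionals can be compared value by value. This is a sketch in shell rather than Perl: have_tty uses the same tty probe as the original code, while old_logic, new_logic, and the “ask”/“fail” labels are made up for the comparison, and TTY presence is passed in as a flag so the result doesn’t depend on the terminal it runs in. It covers $auto values 1 to 3. With $auto at 0 the two versions differ (the original falls through to asking, the new one fails), so the refactor is only equivalent if this block is never reached with $auto at 0; the excerpt doesn’t show whether it can be.

```shell
#!/bin/bash
# Shell analogue of a have_tty helper, using the same probe as the Perl code.
# The real code would call this where the comparison below uses a flag.
have_tty() {
  tty >/dev/null 2>&1
}

# $1 = value of $auto, $2 = 1 if a TTY is present, 0 otherwise.
old_logic() {
  if [ "$1" -eq 1 ]; then
    echo fail
  elif [ "$1" -eq 3 ]; then
    if [ "$2" -eq 1 ]; then echo ask; else echo fail; fi
  else
    echo ask
  fi
}

new_logic() {
  if { [ "$1" -eq 3 ] && [ "$2" -eq 1 ]; } || [ "$1" -eq 2 ]; then
    echo ask      # corresponds to $auto = 0 (ask the user)
  else
    echo fail     # "Failed to detect operating system"
  fi
}

for auto in 1 2 3; do
  for tty in 0 1; do
    old=$(old_logic "$auto" "$tty")
    new=$(new_logic "$auto" "$tty")
    [ "$old" = "$new" ] || echo "MISMATCH auto=$auto tty=$tty: $old vs $new"
    echo "auto=$auto tty=$tty -> $new"
  done
done
```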

I’ve also made a few tweaks during this process, adding a package OsChooser; statement to the beginning of the file. I also discovered that $uname actually is being used in this code! It’s hidden inside the os_list.txt definitions. There are a couple of eval statements being used to execute arbitrary code on each line found in the OS list. Jamie must have been a Lisp hacker in a former life, with all this willy-nilly mixing of code and data. This is a pretty clever bit of code, though it took me a little while to grok. Now that I know what’s going on, we can solve some of the problems we had with testing detection of systems other than the one we’re running on.

In the meantime, just know that $uname has returned as an our variable, so that it can be “global” without ticking off strict.

More Tests

So, automated testing of an OS detection program isn’t a whole lot of good if I can only test detection of one operating system (the one it happens to be running on right now). So, we need to introduce a bit more flexibility in where the OS-related data comes from. This is trickier than it sounds. UNIX and Linux have never standardized on one single location for identifying the OS. Many systems identify themselves in /etc/issue, while those from Red Hat use /etc/redhat-release (they also have a reasonable issue file, but I’m guessing it’s not reliably present or reliably consistent in its contents, as Jamie has chosen to use the release file instead), and Debian has a /etc/debian_version file. Sun Solaris and the BSD-based systems seem to all use uname, and that’s just the popular ones. Webmin also supports a few dozen more branches of the UNIX tree, plus most modern Windows versions!

So, looking at oschooser.pl you’re probably wondering where the heck all of that extra stuff happens, because it doesn’t really have any detection code of its own. The answer is in os_list.txt, which is a file with lines like the following:

Fedora Linux     "Fedora $1" fedora  $1    `cat /etc/fedora-release 2>/dev/null` =~ /Fedora.*\s([0-9\.]+)\s/i || `cat /etc/fedora-release 2>/dev/null` =~ /Fedora.*\sFC(\S+)\s/i

This is a tab-delimited file. Why tabs? I have no idea, and it’s been a source of errors for me several times…even Jamie isn’t sure why he chose tabs as the delimiter, but that’s the way it is. It is plain text, plus numbered match variables, plus an optional snippet of Perl that will be executed via eval if it exists. This makes for an extremely flexible and powerful tool, if a wee bit intimidating on first glimpse.
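To make the “numbered match variables” part concrete, here is one entry’s resolution sketched in shell. sed -E stands in for the first regex of the Fedora condition above, and its capture group fills the “Fedora $1” name template. resolve_fedora is a hypothetical helper, and unlike the /i in the original, this match is case-sensitive.

```shell
#!/bin/bash
# Sketch: the condition's capture group becomes $1, which is then
# substituted into the display-name template "Fedora $1".
resolve_fedora() {
  local release_text=$1 ver
  ver=$(printf '%s\n' "$release_text" |
        sed -nE 's/.*Fedora.*[[:space:]]([0-9.]+)[[:space:]].*/\1/p')
  [ -n "$ver" ] || return 1   # condition failed: no match for this entry
  echo "Fedora $ver"          # template "Fedora $1" with $1 filled in
}

resolve_fedora 'Fedora release 7 (Moonshine)'   # prints Fedora 7
```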

So, that last field is the tricky bit, and it’s what I’ll have to contend with if I want to be able to test every OS that Webmin supports, rather than just the one that happens to be sitting under the code while the tests are running. For starters, I’ll need a new argument to our oschooser function, called $issue, which will generically contain whatever it is that os_list.txt uses to recognize a particular OS. On my Fedora 7 desktop system, that’s /etc/redhat-release, which contains:

Fedora release 7 (Moonshine)

So, oschooser now contains:

sub oschooser {
my ($oslist, $out, $auto, $issue) = @_;
...
}

Next, we need to make sure we keep the provided $issue if we got it, so we change this:

  # Try to guess the OS name and version
  if (-r "/etc/.issue") {
    $etc_issue = `cat /etc/.issue`;
    }
  elsif (-r "/etc/issue") {
    $etc_issue = `cat /etc/issue`;
    }
  $uname = `uname -a`;

Into:

# Try to guess the OS name and version
my $etc_issue;
if ($issue) {
  $etc_issue = `cat $issue`;
  $uname = $etc_issue; # Strangely, I think this will work fine.
  }
elsif (-r "/etc/.issue") {
  $etc_issue = `cat /etc/.issue`;
  }
elsif (-r "/etc/issue") {
  $etc_issue = `cat /etc/issue`;
  }

Note that $uname is defined earlier in the code now…and merely gets overwritten if we’ve set the $issue variable in our function call.
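The same precedence (an explicit $issue first, then /etc/.issue, then /etc/issue) looks like this as a shell sketch. read_issue is a hypothetical name, and the ETC_DOT_ISSUE and ETC_ISSUE overrides are hypothetical additions so a test can point the fallbacks at fixture files; the Perl version hard-codes those paths.

```shell
#!/bin/bash
# Sketch of the issue-file precedence: explicit file, then /etc/.issue,
# then /etc/issue. Fallback paths can be overridden for testing.
read_issue() {
  local issue=$1
  local dot=${ETC_DOT_ISSUE:-/etc/.issue}
  local plain=${ETC_ISSUE:-/etc/issue}
  if [ -n "$issue" ]; then
    cat "$issue"          # caller-supplied file always wins
  elif [ -r "$dot" ]; then
    cat "$dot"
  elif [ -r "$plain" ]; then
    cat "$plain"
  fi
}
```

For example, read_issue t/fedora-7.issue prints the fixture’s contents no matter which system runs it, which is the property the tests rely on.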

And then we have to do something about the contents of the last field in os_list.txt before it gets evaluated. This is where it gets a little hairy. In the foreach that iterates through each line in the file testing whether we have a match or not, I’ve added a new first condition, so it now looks like:

foreach my $o (@list) {
  if ($issue && $o->[4]) {
    $o->[4] =~ s#cat [/a-zA-Z\-]*#cat $issue#g;
    } # Testable, but this regex substitution is dumb.XXX
  if ($o->[4] && eval "$o->[4]") {
    # Got a match! Resolve the versions
    $ver_ref = $o;
    if ($ver_ref->[1] =~ /\$/) {
      $ver_ref->[1] = eval "($o->[4]); $ver_ref->[1]";
      }
    if ($ver_ref->[3] =~ /\$/) {
      $ver_ref->[3] = eval "($o->[4]); $ver_ref->[3]";
      }
    last;
    }
  if ($@) {
    print STDERR "Error parsing $o->[4]\n";
    }
  }
  return $ver_ref;
}

This performs a substitution on the last field, if it contains a cat command, replacing the file name with the issue file we’ve provided in the $issue variable. Thus, we can now pass in t/fedora-7.issue, containing a copy of the /etc/redhat-release file mentioned above, and we’ll be able to test detection of Fedora 7, no matter what operating system the test is actually running on. I suspect we may run into trouble when we expand our os_list.txt to the full Webmin list, since I’m working with just the limited subset of systems the Virtualmin installer supports (or that I might support in the next year or so). I’ve marked the line with XXX (merely a convention used in the Webmin codebase; any odd sequence of characters that you’ll remember works fine, and many folks use FIXME) so that, if I do run into problems later, I’ll remember this is the first place to look.
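The substitution is easy to try outside Perl. This shell sketch applies the same character class with sed; retarget_cat is a hypothetical helper, and the Debian-style condition is a made-up example built around the /etc/debian_version file mentioned earlier. It also shows one concrete way the XXX suspicion could come true: an underscore isn’t in [/a-zA-Z-], so only part of that path gets replaced.

```shell
#!/bin/bash
# Sketch of the "point cat at a fixture" rewrite using sed instead of
# Perl's s###g. Character class mirrors the Perl one: slash, letters, hyphen.
retarget_cat() {
  local cond=$1 issue=$2
  # Caveat: "&", "#" and "\" in $issue would need escaping for sed.
  printf '%s\n' "$cond" | LC_ALL=C sed "s#cat [/a-zA-Z-]*#cat $issue#g"
}

fedora='`cat /etc/fedora-release 2>/dev/null` =~ /Fedora.*\sFC(\S+)\s/i'
retarget_cat "$fedora" t/fedora-7.issue
# -> `cat t/fedora-7.issue 2>/dev/null` =~ /Fedora.*\sFC(\S+)\s/i

# Hypothetical Debian-style condition: "_" stops the match early.
debian='`cat /etc/debian_version 2>/dev/null` =~ /^([0-9.]+)/'
retarget_cat "$debian" t/debian.issue
# -> `cat t/debian.issue_version 2>/dev/null` =~ /^([0-9.]+)/
```

Adding _ and digits to the class would fix this particular case, but any path containing other characters would still slip through, which is why the regex deserves its XXX.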

After these changes, it’s possible to get serious about testing. So, I’ve added tests for a couple dozen systems, which was more Googling than coding due to the data-driven nature of my tests, and confirmed the new code is behaving identically to the old. Which means it’s time for…

More Refactoring

If you’ve been following along, you know that oschooser is still awfully long. A good tactic in such situations is to look for bits of functionality that can be pushed down into their own subroutines. One good choice is the parsing of the patterns file at the very beginning of the function:

my @list;
my @names;
my %donename;
open(OS, $oslist) || die "failed to open $oslist : $!";
while(<OS>) {
  chop;
  if (/^([^\t]+)\t+([^\t]+)\t+([^\t]+)\t+([^\t]+)\t*(.*)$/) {
    push(@list, [ $1, $2, $3, $4, $5 ]);
    push(@names, $1) if (!$donename{$1}++);
    $names_to_real{$1} ||= $3;
    }
  }
close(OS);

This is a good place to start, because it depends on only one variable from outside the block: $oslist, the name of the OS definitions file. And, of course, file access is always a good candidate for abstraction…what if, some day, we want to pull these definitions from a database or a __DATA__ section? Having it all in one obvious location might be a win. For now, I just want that bloody long oschooser function to be a little bit shorter, so we’ll create this parse_patterns function:

sub parse_patterns {
my ($oslist) = @_;
my @list;
my @names;
my %donename;
# Parse the patterns file
open(OS, $oslist) || die "failed to open $oslist : $!";
while(<OS>) {
  chop;
  if (/^([^\t]+)\t+([^\t]+)\t+([^\t]+)\t+([^\t]+)\t*(.*)$/) {
    push(@list, [ $1, $2, $3, $4, $5 ]);
    push(@names, $1) if (!$donename{$1}++);
    $NAMES_TO_REAL{$1} ||= $3;
    }
  }
close(OS);
return (\@list, \@names);
}

That’s not too bad, and it shaves about 13 lines off of oschooser at a cost of 3 or 4 more lines of function baggage in the whole file. The biggest irritant might be that I’m now returning two array refs, and one of them points to an array whose elements are themselves array references, so now we’ve got a reference to an array of references. I get confused when I use too many references, because I’m addle-brained that way, but these are only a little bit nested and so not too complicated, so I think future readers of the code should be fine. At least, no worse than they were before I got ahold of this script.

I’ve also converted %names_to_real to %NAMES_TO_REAL, as it has become a package scoped global variable, and it’s considered good form to warn folks when they’ve come upon a global by shouting at them. Of course, I have another global, $uname, which I haven’t renamed to all caps, because the Perl snippets in os_list.txt refer to it by that name, and one of my self-imposed mandates on this project is to require no changes to the Webmin os_list.txt. As I write this, I’m beginning to have second thoughts about $uname needing to be a global…so we’ll come back to that later.

Capturing the results of parse_patterns and dereferencing them into @names and @list lets us run our tests again.
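As a minimal, self-contained sketch of that capture step, here is parse_patterns exercised against a throwaway definitions file. The two "Example Linux" entries are made up for illustration and are not real Webmin definitions; the function body follows parse_patterns above, except that it uses chomp rather than chop.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempfile);

my %NAMES_TO_REAL;

sub parse_patterns {
  my ($oslist) = @_;
  my (@list, @names, %donename);
  open(OS, $oslist) || die "failed to open $oslist : $!";
  while (<OS>) {
    chomp;   # unlike chop, never removes a real character from the last line
    if (/^([^\t]+)\t+([^\t]+)\t+([^\t]+)\t+([^\t]+)\t*(.*)$/) {
      push(@list, [ $1, $2, $3, $4, $5 ]);
      push(@names, $1) if (!$donename{$1}++);
      $NAMES_TO_REAL{$1} ||= $3;
      }
    }
  close(OS);
  return (\@list, \@names);
}

# Hypothetical definitions file: two versions of one made-up OS.
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "Example Linux\t1.0\texample-linux\t1.0\t\$etc_issue =~ /Example 1/\n";
print $fh "Example Linux\t2.0\texample-linux\t2.0\t\$etc_issue =~ /Example 2/\n";
close($fh);

# Capture both references, then dereference them into plain arrays.
my ($list_ref, $names_ref) = parse_patterns($path);
my @list  = @$list_ref;
my @names = @$names_ref;
print scalar(@list), " entries, ", scalar(@names), " unique name(s)\n";
# prints: 2 entries, 1 unique name(s)
```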

And More Refactoring Still

Things have improved a little in oschooser. It almost fits into two screenfuls on my 20″ monitor. But I think we can do better. I’m aiming for one page or less per function, in this exercise, so we’ve gotta keep moving. The next distinct piece of functionality I see is the automatic OS detection code, so I’ll add a new auto_detect function, something like this:

sub auto_detect {
my ($oslist, $issue, $list_ref) = @_;
my $ver_ref;
my @list = @$list_ref;
my $uname = `uname -a`;

# Try to guess the OS name and version
my $etc_issue;

if ($issue) {
  $etc_issue = `cat $issue`;
  $uname = $etc_issue; # Strangely, I think this will work fine.
  }
elsif (-r "/etc/.issue") {
  $etc_issue = `cat /etc/.issue`;
  }
elsif (-r "/etc/issue") {
  $etc_issue = `cat /etc/issue`;
  }

foreach my $o (@list) {
  if ($issue && $o->[4]) {
    $o->[4] =~ s#cat [/a-zA-Z\-]*#cat $issue#g;
    } # XXX Testable, but this regex substitution is dumb.
  if ($o->[4] && eval "$o->[4]") {
    # Got a match! Resolve the versions
    $ver_ref = $o;
    if ($ver_ref->[1] =~ /\$/) {
      $ver_ref->[1] = eval "($o->[4]); $ver_ref->[1]";
      }
    if ($ver_ref->[3] =~ /\$/) {
      $ver_ref->[3] = eval "($o->[4]); $ver_ref->[3]";
      }
    last;
    }
  if ($@) {
    print STDERR "Error parsing $o->[4]\n";
    }
  }
return $ver_ref;
}

You may note that I did rethink the globalization of $uname and found that it fit comfortably into this block, so I’ve killed a global introduced earlier in this process. Now I’m not even sure why I thought I needed it somewhere else, which is a nice thing about refactoring: you realize how little you understood what was going on when you first looked at the code. This is also where we make use of the @list built during parse_patterns. I copy it into a local array before using it, though that’s probably more verbosity than needed; I could just as easily loop over @{$list_ref} directly.
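One detail worth knowing about that copy: my @list = @$list_ref duplicates only the outer array, so each element is still a reference to the same inner array the caller holds. The short sketch below, using a made-up entry, suggests that the in-place s### on $o->[4] in auto_detect changes the caller's data whether or not the list is copied, and that looping over @$list_ref directly behaves the same without the copy.

```perl
#!/usr/bin/perl -w
use strict;

# A made-up entry in the same five-field layout parse_patterns produces.
my $list_ref = [ [ 'Example Linux', '1.0', 'example-linux', '1.0',
                   '`cat /etc/issue` =~ /Example/' ] ];

# Copying the outer array duplicates only the top level...
my @list = @$list_ref;
$list[0][4] =~ s#cat [/a-zA-Z\-]*#cat t/example.issue#g;

# ...so the caller's inner array sees the substitution too.
print $list_ref->[0][4], "\n";   # `cat t/example.issue` =~ /Example/

# Looping over the dereferenced array directly skips the copy.
foreach my $o (@{$list_ref}) {
  print "$o->[0] $o->[1]\n";     # Example Linux 1.0
}
```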

Finally, I’m returning $ver_ref, which contains a reference to an array of fields describing the operating system detected. Now that I’ve split this out, I realize that this OS version array could quite easily be mapped into a hash and turned into an object, but that’s an exercise for another day. For now, I just want to feel confident that I’ve made a functionally identical clone of oschooser.pl that I can use and extend painlessly and without fear of breakage. So, let’s keep going.
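As a rough illustration of that hash mapping, and not part of OsChooser, a hypothetical ver_to_hash helper could use a hash slice. The field names are an assumption, taken from the order in which ask_user builds its result ([ $name, $vnum, $NAMES_TO_REAL{$name}, $vnum ]); the fifth field, the detection snippet, is simply dropped.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical field names, in the order ask_user builds its result.
my @FIELDS = qw(name version real_name real_version);

# ver_to_hash is a hypothetical helper, not part of OsChooser.
sub ver_to_hash {
  my ($ver_ref) = @_;
  return unless $ver_ref;                       # nothing was detected
  my %ver;
  @ver{@FIELDS} = @{$ver_ref}[0 .. $#FIELDS];   # hash slice from array slice
  return \%ver;
}

my $ver = ver_to_hash([ 'Example Linux', '1.0', 'example-linux', '1.0',
                        '$etc_issue =~ /Example/' ]);
print "$ver->{real_name} $ver->{real_version}\n";   # example-linux 1.0
```

Wrapping that hash in a blessed object would be the next small step, if it ever seems worth it.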

A Few More New Functions, and Killing Unused Code Softly

As with auto_detect, there is a big chunk of code used specifically for asking the user to choose the operating system and version from a list of options. It is triggered in two cases: when $auto is 0 (or any other false value), or when auto-detection has failed and $auto is set to one of the modes that falls back to asking the user (2, or 3 when running on a TTY) rather than giving up. So, we can easily break out this whole bunch of functionality into its own function, called ask_user. Like auto_detect, it requires the $list_ref array reference, and it also needs the $names_ref, since it will be interacting with the end user and they’ll be more comfortable seeing the “real names” of the available operating systems. Also like auto_detect, it returns a $ver_ref which points to the array containing the full description of the OS.
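The decision between the two functions can be sketched as a small dispatcher. This is a hypothetical outline, not the actual body of oschooser: choose_os is an invented name, $detect and $ask are code refs standing in for auto_detect and ask_user, and -t STDIN stands in for have_tty(). The mode numbers follow the "auto" values documented in the module's POD.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical dispatcher: $detect and $ask stand in for auto_detect and
# ask_user, and -t STDIN stands in for have_tty().
sub choose_os {
  my ($auto, $detect, $ask) = @_;
  return $ask->() if (!$auto);                  # 0: always ask the user
  my $ver_ref = $detect->();
  return $ver_ref if ($ver_ref);                # detection succeeded
  return $ask->() if ($auto == 2);              # 2: ask if detection fails
  return $ask->() if ($auto == 3 && -t STDIN);  # 3: ask only on a TTY
  return;                                       # 1 (or 3 without a TTY): give up
}

my $fails = sub { return };                        # detection finds nothing
my $asks  = sub { return [ 'Asked OS', '1.0' ] };  # pretend the user answered
my $ver = choose_os(2, $fails, $asks);
print "$ver->[0]\n";   # Asked OS
```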

When I got to this function, I noticed a huge block of unused code, which provides support for the dialog command on systems that support it (mostly just Red Hat based Linux distributions). dialog is a simple tool for adding attractive ncurses interfaces to shell scripts. I’m not sure why the code is being skipped with an if (0) statement, but if my goal is to simplify this script and make it more robust, I have only two choices for what to do about it: enable it and fix whatever problems it has, possibly making it into its own reusable and independently testable function; or simply remove the code altogether. Webmin and the installer libraries for Virtualmin are both in SVN, so if I decide to remove the code, it won’t be lost forever…I could pull it back in the future. I could even tag the current version with “pre-dialog-removal” before stripping it out. After consulting with Jamie, I’ve chosen the second option. So, we can kill not just the dialog code itself but also the has_command function, since it is only used in that part of the code. Big win!

So, I’ll make a tagged copy before ripping stuff out:

svn cp lib tags/lib/pre-dialog-removal

Now I know I can always go back and refer to that code if I want to. It’s not really particularly precious, but it’s a good practice to get into, since copies in Subversion are cheap and fast (likewise for git, and most other modern distributed revision control systems), and I never know when I might want to go back and see how something was done before. I’ll do the same in the Webmin tree before I make the changes needed to merge the new OsChooser.pm in place of the old oschooser.pl.

So, after killing the dialog pieces of the code, and converting the user interaction to its own function, we have:

# ask for the operating system name ourselves
sub ask_user {
my ($names_ref, $list_ref) = @_;
my @names = @$names_ref;
my @list = @$list_ref;
my $vnum;
my $osnum;
my $dashes = "-" x 75;
print <<EOF;
For Webmin to work properly, it needs to know which operating system
type and version you are running. Please select your system type by
entering the number next to it from the list below
$dashes
EOF
{
my $i;
for($i=0; $i<@names; $i++) {
  printf " %2d) %-20.20s ", $i+1, $names[$i];
  print "\n" if ($i%3 == 2);
  }
print "\n" if ($i%3);
}
print $dashes,"\n";
print "Operating system: ";
chop($osnum = <STDIN>);
if ($osnum !~ /^\d+$/) {
  print "ERROR: You must enter the number next to your operating\n";
  print "system, not its name or version number.\n\n";
  exit 9;
  }
if ($osnum < 1 || $osnum > @names) {
  print "ERROR: $osnum is not a valid operating system number.\n\n";
  exit 10;
  }
print "\n";

# Ask for the operating system version
my $name = $names[$osnum-1];
print <<EOF;
Please enter the version of $name you are running
EOF
print "Version: ";
chop($vnum = <STDIN>);
if ($vnum !~ /^\S+$/) {
  print "ERROR: An operating system version cannot contain\n";
  print "spaces. It must be like 2.1 or ES4.0.\n\n";
  exit 10;
  }
print "\n";
return [ $name, $vnum,
    $NAMES_TO_REAL{$name}, $vnum ];
}

Not too bad. It just fits into one 50 row editor window without scrolling, so that’s a small enough bite for me. We make use of the %NAMES_TO_REAL global in this function, to convert from the short names to the longer human-friendly names, and I’m beginning to get a vague feeling something could be done to encapsulate that functionality, even without making this an Object Oriented library (which seems like overkill for such a simple program), so I’ll probably be coming back to that global in a later post (and I thought I would have a hard time getting two full posts worth out of this exercise!).
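One way to encapsulate that mapping without going object oriented is a closure: hide the hash in a bare block so that only a couple of accessor subs can touch it. This is a hypothetical sketch; remember_real_name and real_name_for are invented names, not functions in OsChooser.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sketch: the hash lives in a bare block, so only the two
# subs below can see it (a closure, no object required).
{
  my %names_to_real;

  sub remember_real_name {
    my ($name, $real) = @_;
    $names_to_real{$name} ||= $real;   # first definition wins, as in parse_patterns
  }

  sub real_name_for {
    my ($name) = @_;
    return $names_to_real{$name};
  }
}

remember_real_name('Example Linux', 'example-linux');
remember_real_name('Example Linux', 'something-else');   # ignored
print real_name_for('Example Linux'), "\n";               # example-linux
```

parse_patterns would then call remember_real_name where it currently writes to the global, and ask_user would call real_name_for.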

Wrapping Up

I’m feeling pretty good about the code now. I think it’s more readable than before I started messing with it, it’s certainly shorter thanks to a bit of refactoring and the removal of some dead or redundant code, and it’s got quite a few tests. All of its variables are reasonably scoped to the areas where they are used, except for %NAMES_TO_REAL, which is a file-scoped my variable (it turns out a string eval is compiled within the scope of the block containing it, so it can see the surrounding lexicals, and the variable doesn’t need to be an our variable as I’d first assumed).
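The same rule is what lets the os_list.txt snippets see $uname and $etc_issue now that they are lexicals inside auto_detect. A small sketch, with a hypothetical try_snippet function and made-up values, shows a string eval reading lexicals declared in the enclosing sub.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical stand-in for auto_detect's inner loop: the snippet is a
# string, as if read from os_list.txt, and eval compiles it at run time.
sub try_snippet {
  my ($snippet) = @_;
  my $uname     = "ExampleOS host 1.0 x86_64";   # made-up values
  my $etc_issue = "Example Linux release 1.0\n";
  # The string eval is compiled inside this sub, so $uname and
  # $etc_issue are visible to the snippet even though they are lexicals.
  my $result = eval $snippet;
  print STDERR "Error parsing $snippet\n" if $@;
  return $result;
}

print try_snippet('$etc_issue =~ /release (\S+)/ ? $1 : ""'), "\n";   # 1.0
print try_snippet('$uname =~ /^ExampleOS/ ? "yes" : "no"'), "\n";      # yes
```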

The various utility functions aren’t very useful to outsiders and may change…the only function I really want to be public is oschooser, so I can see several opportunities for further enhancements, like encapsulating the rest into private methods within an OsChooser object. But that’ll be a project for another day. You can see the current code, plus an example os_list.txt. Next time, I’ll begin work on wrapping this up for CPAN, and releasing a large OS definition list (Webmin’s list is incredibly long and detects hundreds of systems and versions, but needs a bit of massaging to be generically useful, due to its own internal version requirements).

Who knew one simple script could present so much interesting work? It’s been so much fun, I think I’ll start a Perl Neighborhood Watch and do this to every little Perl script I come across. Who’s with me!? (Or maybe I should just focus on our own code for a little while longer, since we’ve got quite a few nooks and crannies that haven’t seen any attention in years. Perl makes us lazy with its peskily perfect backward compatibility.)

Old School to New School: Refactoring Perl

At YAPC::NA I sat in on lots of great talks (I also won Randal Schwartz in the charity auction, and so got to be beaten soundly at pool by him, and learn a few things about Smalltalk and Seaside). In particular, Michael Schwern gave a fantastic talk entitled Skimmable Code: Fast to Read, Fast to Change. This got me thinking about our own code. Webmin is an old codebase, approaching 11 years old, and thus has some pretty old school Perl practices throughout. Coding standards sort of stick to projects over a few years, and as new code comes in, it tends to look like the old code. And, to add to that momentum, Jamie has religiously kept compatibility for module authors throughout the entire life of the project. Modules written ten years ago can, astonishingly, be expected to work identically in today’s Webmin, though they might not participate in logging or advanced ACLs or other nifty features that have come to exist in the framework in that time.

So, when I found myself needing to make a modification to oschooser.pl, a small program for detecting the operating system on which Webmin is running (sounds trivial, but when you realize that Webmin runs on hundreds of operating systems and versions, it turns out to be a rather complex problem), I decided to take the opportunity to put into practice some of the niceties of modern Perl. This article is a little different than what I usually write for In the Box, in that it covers a lot of ground fast, and most of it is probably pretty mundane stuff for folks already writing modern Perl. But, I think there’s enough old Perl code running around out there, running the Internet and such, that it’s worth talking about modernization work.

So, let’s go spelunking!

Introduction to oschooser.pl

The code we’ll be picking apart, and putting back together, is probably one of the more heavily used pieces of Perl code, and certainly one of the oldest, in the wild. It’s the OS detection code that Webmin and Usermin use to figure out what system they’re running on during installation. With Webmin having 12 million (give or take several million) downloads over its ten year history, this equals a lot of operating systems successfully detected. Perhaps I should have picked something a little less important for my first stab at modernization, but I’ve rarely been accused of being smart about making sweeping changes! (Jamie will reel me in, before I break actual Webmin code. I manage to break Virtualmin every now and then…but he’s more suspicious when I check code into Webmin, since it happens quite rarely.)

The oschooser.pl program actually loads up a rather complex definitions file called os_list.txt (by default, though it’s configurable, and we use different lists for Virtualmin and Webmin, since they have different requirements for version identification). The definitions file can contain snippets of Perl code, which will be executed via eval, when appropriate. Most of the updates to OS detection have happened in os_list.txt, so oschooser.pl hasn’t seen a lot of grooming over the years, which makes it a prime candidate for modernization. Assuming, of course, that it works identically when I’m done with it.

Where to start?

My end goal with this project is to make oschooser.pl usable as a library from Perl programs, since our new product installer is written in Perl rather than POSIX shell. I also figured it’d be nice to make it testable, since I’ve made several mistakes in the detection code (in os_list.txt, specifically) over the past few years that led to our product being uninstallable on some systems until the bug was tracked down. But, first things first. Almost nothing in Webmin is strict compatible, and even warnings can cause some complaints, so that seems like a good starting point.

The code we’re starting with can be found here, so you can follow along at home.

Enabling warnings reveals the following (don’t worry about the arguments for now):

$ perl -w oschooser.pl os_list.txt outfile 1
Name "main::uname" used only once: possible typo at oschooser.pl line 31.
Name "main::donename" used only once: possible typo at oschooser.pl line 17.

Not too bad, actually. Just a couple of variables that are only seen once, easy enough to fix by giving them a my declaration. In this case, though, enabling warnings also turns up some unused code. While donename is actually keeping track of which names we’ve seen so far (one of several idiomatic ways to build an array of unique values), the uname variable seems to have no purpose. So I’m going to kill that whole line rather than declare it.
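For anyone who hasn't met the donename idiom before, here it is in isolation, on a made-up list of first-column values. It keeps the first occurrence of each name, in order.

```perl
#!/usr/bin/perl -w
use strict;

# Made-up first-column values, with repeats as in a definitions file that
# lists several versions of the same OS.
my @rows = ('Example Linux', 'Example Linux', 'Other OS', 'Example Linux');

my (@names, %donename);
foreach my $name (@rows) {
  # Post-increment returns the old count: 0 (false) the first time a name
  # is seen, so the negation is true exactly once per name.
  push(@names, $name) if (!$donename{$name}++);
}
print join(", ", @names), "\n";   # Example Linux, Other OS
```

The same thing is often written as a one-liner: my %seen; my @unique = grep { !$seen{$_}++ } @rows;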

Next up in our “low-hanging fruit” exercise is enabling use strict. Turns out this is quite a lot more intimidating:

$ perl -c oschooser.pl
Global symbol "$oslist" requires explicit package name at oschooser.pl line 15.
Global symbol "$out" requires explicit package name at oschooser.pl line 15.
Global symbol "$auto" requires explicit package name at oschooser.pl line 15.
Global symbol "$oslist" requires explicit package name at oschooser.pl line 16.
Global symbol "$oslist" requires explicit package name at oschooser.pl line 16.
Global symbol "@list" requires explicit package name at oschooser.pl line 20.
Global symbol "@names" requires explicit package name at oschooser.pl line 21.
Global symbol "%names_to_real" requires explicit package name at oschooser.pl line 22.
Global symbol "$auto" requires explicit package name at oschooser.pl line 27.
Global symbol "$etc_issue" requires explicit package name at oschooser.pl line 30.
Global symbol "$etc_issue" requires explicit package name at oschooser.pl line 33.
Global symbol "$o" requires explicit package name at oschooser.pl line 36.
Global symbol "@list" requires explicit package name at oschooser.pl line 36.
Global symbol "$o" requires explicit package name at oschooser.pl line 37.
Global symbol "$o" requires explicit package name at oschooser.pl line 37.
Global symbol "$ver" requires explicit package name at oschooser.pl line 39.
Global symbol "$o" requires explicit package name at oschooser.pl line 39.
Global symbol "$ver" requires explicit package name at oschooser.pl line 40.
Global symbol "$ver" requires explicit package name at oschooser.pl line 41.
Global symbol "$o" requires explicit package name at oschooser.pl line 41.
Global symbol "$ver" requires explicit package name at oschooser.pl line 41.
Global symbol "$ver" requires explicit package name at oschooser.pl line 43.
Global symbol "$ver" requires explicit package name at oschooser.pl line 44.
Global symbol "$o" requires explicit package name at oschooser.pl line 44.
Global symbol "$ver" requires explicit package name at oschooser.pl line 44.
Global symbol "$o" requires explicit package name at oschooser.pl line 49.
Global symbol "$ver" requires explicit package name at oschooser.pl line 53.
Global symbol "$auto" requires explicit package name at oschooser.pl line 54.
Global symbol "$auto" requires explicit package name at oschooser.pl line 59.
Global symbol "$rv" requires explicit package name at oschooser.pl line 61.
Global symbol "$auto" requires explicit package name at oschooser.pl line 67.
Global symbol "$auto" requires explicit package name at oschooser.pl line 72.
Global symbol "$auto" requires explicit package name at oschooser.pl line 77.
Global symbol "$cmd" requires explicit package name at oschooser.pl line 80.
Global symbol "$i" requires explicit package name at oschooser.pl line 81.
Global symbol "$i" requires explicit package name at oschooser.pl line 81.
Global symbol "@names" requires explicit package name at oschooser.pl line 81.
Global symbol "$i" requires explicit package name at oschooser.pl line 81.
Global symbol "$cmd" requires explicit package name at oschooser.pl line 82.
Global symbol "$i" requires explicit package name at oschooser.pl line 82.
Global symbol "@names" requires explicit package name at oschooser.pl line 82.
Global symbol "$i" requires explicit package name at oschooser.pl line 82.
Global symbol "$tmp_base" requires explicit package name at oschooser.pl line 84.
Global symbol "$temp" requires explicit package name at oschooser.pl line 85.
Global symbol "$tmp_base" requires explicit package name at oschooser.pl line 85.
Global symbol "$cmd" requires explicit package name at oschooser.pl line 86.
Global symbol "$temp" requires explicit package name at oschooser.pl line 86.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 87.
Global symbol "$temp" requires explicit package name at oschooser.pl line 87.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 88.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 88.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 89.
Global symbol "$name" requires explicit package name at oschooser.pl line 96.
Global symbol "@names" requires explicit package name at oschooser.pl line 96.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 96.
Global symbol "@vers" requires explicit package name at oschooser.pl line 97.
Global symbol "$name" requires explicit package name at oschooser.pl line 97.
Global symbol "@list" requires explicit package name at oschooser.pl line 97.
Global symbol "$cmd" requires explicit package name at oschooser.pl line 98.
Global symbol "$i" requires explicit package name at oschooser.pl line 99.
Global symbol "$i" requires explicit package name at oschooser.pl line 99.
Global symbol "@vers" requires explicit package name at oschooser.pl line 99.
Global symbol "$i" requires explicit package name at oschooser.pl line 99.
Global symbol "$cmd" requires explicit package name at oschooser.pl line 100.
Global symbol "$i" requires explicit package name at oschooser.pl line 100.
Global symbol "$name" requires explicit package name at oschooser.pl line 100.
Global symbol "@vers" requires explicit package name at oschooser.pl line 100.
Global symbol "$i" requires explicit package name at oschooser.pl line 100.
Global symbol "$cmd" requires explicit package name at oschooser.pl line 102.
Global symbol "$temp" requires explicit package name at oschooser.pl line 102.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 103.
Global symbol "$temp" requires explicit package name at oschooser.pl line 103.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 104.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 104.
Global symbol "$temp" requires explicit package name at oschooser.pl line 105.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 106.
Global symbol "$ver" requires explicit package name at oschooser.pl line 110.
Global symbol "@vers" requires explicit package name at oschooser.pl line 110.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 110.
Global symbol "$dashes" requires explicit package name at oschooser.pl line 114.
Global symbol "$dashes" requires explicit package name at oschooser.pl line 115.
Global symbol "$i" requires explicit package name at oschooser.pl line 121.
Global symbol "$i" requires explicit package name at oschooser.pl line 121.
Global symbol "@names" requires explicit package name at oschooser.pl line 121.
Global symbol "$i" requires explicit package name at oschooser.pl line 121.
Global symbol "$i" requires explicit package name at oschooser.pl line 122.
Global symbol "@names" requires explicit package name at oschooser.pl line 122.
Global symbol "$i" requires explicit package name at oschooser.pl line 122.
Global symbol "$i" requires explicit package name at oschooser.pl line 123.
Global symbol "$i" requires explicit package name at oschooser.pl line 125.
Global symbol "$dashes" requires explicit package name at oschooser.pl line 126.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 128.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 129.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 134.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 134.
Global symbol "@names" requires explicit package name at oschooser.pl line 134.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 135.
Global symbol "$name" requires explicit package name at oschooser.pl line 141.
Global symbol "@names" requires explicit package name at oschooser.pl line 141.
Global symbol "$osnum" requires explicit package name at oschooser.pl line 141.
Global symbol "$name" requires explicit package name at oschooser.pl line 142.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 146.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 147.
Global symbol "$ver" requires explicit package name at oschooser.pl line 153.
Global symbol "$name" requires explicit package name at oschooser.pl line 153.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 153.
Global symbol "%names_to_real" requires explicit package name at oschooser.pl line 154.
Global symbol "$name" requires explicit package name at oschooser.pl line 154.
Global symbol "$vnum" requires explicit package name at oschooser.pl line 154.
Global symbol "$out" requires explicit package name at oschooser.pl line 159.
Global symbol "$ver" requires explicit package name at oschooser.pl line 160.
Global symbol "$ver" requires explicit package name at oschooser.pl line 161.
Global symbol "$ver" requires explicit package name at oschooser.pl line 162.
Global symbol "$ver" requires explicit package name at oschooser.pl line 163.
Global symbol "$d" requires explicit package name at oschooser.pl line 170.
Global symbol "$rv" requires explicit package name at oschooser.pl line 172.
Global symbol "$rv" requires explicit package name at oschooser.pl line 174.
Global symbol "$d" requires explicit package name at oschooser.pl line 177.
Global symbol "$d" requires explicit package name at oschooser.pl line 178.
Global symbol "$rv" requires explicit package name at oschooser.pl line 178.
Global symbol "$d" requires explicit package name at oschooser.pl line 178.
Global symbol "$rv" requires explicit package name at oschooser.pl line 181.
oschooser.pl had compilation errors.

Wow! I think that might be more lines than the program itself. Luckily, it’s almost entirely unscoped variables. A quick pass over the code, adding my declarations to the obvious candidates, gets things looking a little better. One tricky bit is the $i variable used in for loops. We don’t want it to be declared several times in the code, and we don’t want it to leak out into the scope of the rest of the program. In modern Perl, this is no problem, as you can use the following:

    for(my $i=0; $i<@names; $i++) {
      $cmd .= " ".($i+1)." '$names[$i]'";
      }

And $i will be local to the for loop. I momentarily feared that I’d need to use an outer block to accomplish this, as Webmin needs to be compatible with quite old Perl versions (5.005 for core Webmin, unless Unicode support is needed, in which case 5.8.1 is required), but after downloading and installing Perl 5.005_04, I found that was an unnecessary precaution. The foreach loops can also make use of this convenient feature. If you do happen to be stuck with a version even more ancient than 5.005 (but still newer than Perl 4), though I can’t imagine how you could be, as 5.005 is over nine years old, you can use the following:

  {
  my $i;
    for($i=0; $i<@names; $i++) {
      $cmd .= " ".($i+1)." '$names[$i]'";
      }
  }

Which provides similar private scope for the $i variable, at the cost of three extra lines.
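A quick way to convince yourself that the loop's $i really is private is to give an unrelated outer variable the same name and check it afterward. The names in this small sketch are made up.

```perl
#!/usr/bin/perl -w
use strict;

my @names = ('Example Linux', 'Other OS');   # made-up names
my $cmd = "";
my $i = "outer";                             # an unrelated variable named $i

# The $i declared in the for(...) header exists only inside the loop.
for (my $i = 0; $i < @names; $i++) {
  $cmd .= " ".($i+1)." '$names[$i]'";
}

print "$i\n";     # outer: the loop's $i never leaked out
print "$cmd\n";   #  1 'Example Linux' 2 'Other OS'
```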

Making it Testable

So far, I haven’t made any changes that are likely to break the code. It’s merely been cleanup and syntax tweaks. But, to accomplish everything I’d like in this exercise, we’ll be doing some refactoring and refining the code. To do that with confidence, it’d be nice to have some tests to ensure the code works the same before and after any changes.

Since this is not historically a library, it’s not particularly easy to test. One could write a custom test harness, or use Test::Command, and test its behavior as a whole, but since it’s written in Perl and one of my goals is to make it useful as a library from Perl scripts, I decided instead to make it loadable as a module and use Test::More. A trick that’s very common in the Python world, but doesn’t seem as well known amongst Perlmongers, is a main function which is called only if the script is executed directly rather than loaded via use or require. The main function then does whatever the script would normally do, optionally setting up variables or parsing command line arguments.

So, I added the following near the beginning of the file:

# main
sub main {
if ($#ARGV < 1) { die "Usage: $0 os_list.txt outfile [0|1|2|3]\n"; }
my ($oslist, $out, $auto) = @ARGV;
oschooser($oslist, $out, $auto);
}
main() unless caller();  # make it testable and usable as a library

I also took this opportunity to add a simple usage message if the command is executed with fewer than two arguments ($#ARGV is the index of the last element of @ARGV, which, like all Perl arrays, starts counting at 0, so $#ARGV < 1 means fewer than two arguments). I also needed to wrap the main body of the script in a sub block, so that the script doesn’t do anything immediately if loaded as a library.
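To see the caller() check do its job, the sketch below writes a tiny stand-in module (Demo.pm, a hypothetical name) to a temporary directory and loads it with require. At the top level of a file that is being required, caller() returns true, so main() is skipped; running the same file directly with perl Demo.pm would call it.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempdir);

# Write a tiny, hypothetical modulino (Demo.pm) into a temporary directory.
my $dir = tempdir(CLEANUP => 1);
open(MOD, "> $dir/Demo.pm") || die "can't write $dir/Demo.pm: $!";
print MOD <<'EOF';
package Demo;
$Demo::ran_main = 0;
sub main { $Demo::ran_main = 1; }
main() unless caller();   # caller() is true when loaded via require or use
1;
EOF
close(MOD);

# Loading it as a library must not run main().
unshift(@INC, $dir);
require Demo;
print "main ran: $Demo::ran_main\n";   # main ran: 0
```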

Make it a Module

Since I want to use this code as a library, I face a choice. The use statement is functionally equivalent to:

BEGIN { require Module; Module->import( LIST ); }

Which means, I suppose, I could keep the name oschooser.pl and use:

require 'oschooser.pl';

We don’t need BEGIN level assurance, since we have no prototypes in this library and only use simple subroutines. But, I find this a bit unsatisfying, since it’s no longer in common use amongst Perl developers, and use provides the ability to export functions explicitly. Test::More has both a use_ok and a require_ok function, so it’s irrelevant from a testing perspective. It’ll probably remain oschooser.pl in Webmin proper, and OsChooser.pm in my Virtualmin installer library, at least for the foreseeable future. Not really a lot of difference between the two.
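For the OsChooser.pm flavor, a minimal module skeleton might look like the following. It is a sketch, not the real file: the oschooser body is elided, the $VERSION value is hypothetical, and the optional fourth $issue argument comes from the POD synopsis. Since Webmin targets Perl 5.005, it declares package variables with use vars rather than our (which arrived in 5.6), and it exports oschooser only on request via @EXPORT_OK.

```perl
#!/usr/bin/perl -w
package OsChooser;
use strict;
use vars qw(@ISA @EXPORT_OK $VERSION);   # 'our' needs 5.6; 'use vars' works on 5.005
require Exporter;
@ISA       = qw(Exporter);
@EXPORT_OK = qw(oschooser);    # exported only when asked for
$VERSION   = '0.01';           # hypothetical version number

sub oschooser {
  my ($oslist, $out, $auto, $issue) = @_;
  # ... detection and output logic elided in this sketch ...
  return "would read $oslist";
}

package main;
# In a separate script this would simply be: use OsChooser qw(oschooser);
OsChooser->import(qw(oschooser));
my $result = oschooser("os_list.txt", "outfile", 1);
print "$result\n";   # would read os_list.txt
```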

Some Tests

So, now that we can call the library roughly the way we want, using use, it’s time to write a few tests to be sure things actually work after we begin making more sweeping changes.

We can start with simple compile tests (I usually call these types of tests t/return.t, as they just check to be sure the module returns without error on load and the functions within return the data type that is expected):

#!/usr/bin/perl -w
# These tests just check to be sure all functions return something.
# They don't care what is returned...so garbage can still pass,
# as long as the garbage is the right data type.

use strict;
use Test::More qw(no_plan);

use_ok( 'OsChooser' );

isa_ok(\OsChooser::have_tty(), 'SCALAR');
isa_ok(\OsChooser::has_command("cp"), 'SCALAR');

Hmm…OK, so we don’t actually have a lot to test yet, just a couple of utility functions (and I’ve even cheated a little and looked ahead to where I introduce a have_tty function, or this would be an even shorter set of tests). The most important function, oschooser, doesn’t know how to return anything very useful yet. It can only write out its findings to a file. But, since we’re always going to be creating that file, regardless of how nice the module usage becomes, we need to figure out how to test it anyway.

Unsurprisingly, there is already a full-featured module on CPAN for testing the contents of files, called, unlikely though it may seem, Test::Files. So, we’ll just grab that:

$ sudo perl -MCPAN -e shell

cpan shell -- CPAN exploration and modules installation (v1.7602)
ReadLine support enabled

cpan> install Test::Files
...

And then create as many Operating System definition files as we want in the t directory. We’ll just name them for the OS they represent. This is the kind of testing I love, because the actual test file will be extremely simple, no matter how many operating systems I want to test on:

#!/usr/bin/perl -w
use strict;
use OsChooser;

# Get a list of the example OS definition files
opendir(DIR, "t/") || die "can't opendir t/ $!";
my @files = grep { /\.os$/ } readdir(DIR);
closedir DIR;
use Test::More qw(no_plan);
use Test::Files;

foreach my $file (@files) {
  $file =~ /(.*)\.os$/;
  my $osname = $1;
  my $outfile = "t/outfile";
  OsChooser::oschooser("os_list.txt", $outfile, 1);
  compare_ok("t/$file", $outfile, $osname);

  # Cleanup
  unlink $outfile;
}

I love data-driven software, and this is a fun little example of it. We can run as many tests as we want, merely by adding more pairs of OS data files: one with the “.os” suffix containing what oschooser should output, and one containing the file that oschooser would normally use to identify the OS (/etc/issue, among others). Using that second file isn’t supported yet, but I’ll talk about it in the next post. Speaking of being data-driven, I think it’d also be pretty nifty to get the test count from the @files array, rather than using no_plan, but because modules loaded with use are loaded early during compile time (in a BEGIN block, effectively), we don’t actually have anything in @files yet when use Test::More would need the count.
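One way around that, sketched below under the assumption that nothing else needs the plan at compile time: load Test::More without an import list and call its plan function at run time, once @files has been filled in. The file names here are made-up stand-ins for what readdir would return from t/.

```perl
#!/usr/bin/perl -w
use strict;
use Test::More;   # no import list, so no plan is fixed at compile time

# Made-up stand-ins for the names readdir() would return from t/.
my @files = grep { /\.os$/ } ('fedora-7.os', 'example-1.0.os', 'fedora-7.issue');

# plan() can be called at run time, once @files has been filled in.
plan tests => scalar(@files);

foreach my $file (@files) {
  my ($osname) = $file =~ /^(.*)\.os$/;
  ok(defined $osname, "found an OS name in $file");
}
```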

However, as mentioned, the oschooser function doesn’t yet allow one to specify the issue file to look at, so no matter how many definitions I provide, it’ll never be able to test anything but the OS the test is running on. Oh, well, for now we’ll just create one OS definition file that matches my current OS, and make it a priority to make the function more testable somehow, possibly via an optional parameter to oschooser.

Alright, so now that we have some rudimentary tests in place, we can break stuff with confidence! We’ll come back to testing again in the near future, since we’re leaving so much untested right now.

Plain Old Documentation

I’m going to take a quick detour now that we’ve got some basic tests in place. Testing is one practice that most developers agree makes for great code, and the other practice that most folks can agree on is documentation.

Since this is such a simple piece of code, and was intended exclusively for use during installation of Webmin and Usermin, Jamie never really documented it. Now that I’m forcing it to be useful in other locations, and having some fun giving it a modern Perl face lift, it’s as good a time as any to add some documentation. POD isn’t the only documentation format usable within Perl code, but it is, by far, the most popular, and it has lots of great tools for processing and testing coverage, so that’s what Jamie recently chose for use in documenting the Virtualmin API. It’s also easy to learn, and results in text that is pretty readable even before processing.

I’m not sure of the recommended practices for documenting scripts that work on both the command line and as a module, but here’s what I came up with:

=head1 OsChooser.pm
 
Attempt to detect operating system and version, or ask the user to select
from a list.  Works from the command line, for usage from shell scripts,
or as a library for use within Perl scripts.
 
=head2 COMMAND LINE USE
 
OsChooser.pm os_list.txt outfile [auto]
 
Where "auto" can be the following values:
 
=over 4
 
=item 0
 
always ask user
 
=item 1
 
automatic, give up if fails
 
=item 2
 
automatic, ask user if fails
 
=item 3
 
automatic, ask user if fails and if a TTY
 
=back
 
=head2 SYNOPSIS
 
    use OsChooser;
    my ($os_type, $version, $real_os_type, $real_os_version) =
       OsChooser::oschooser("os_list.txt", "outfile", $auto, $issue);  # $issue is optional
 
=cut

Pretty simple, but covers the basics.
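The four auto values reduce to one decision: whether to prompt the user. Below is a Python sketch of that logic as we read the POD above. `should_prompt` is a hypothetical name. Treating a failed detection without a TTY in mode 3 as "give up" is our interpretation.

```python
def should_prompt(auto, detected, is_tty):
    """Decide whether to ask the user to pick an OS.

    auto     -- 0, 1, 2 or 3, with the meanings listed in the POD
    detected -- True if automatic detection succeeded
    is_tty   -- True if input comes from an interactive terminal
    """
    if auto == 0:
        return True        # always ask user
    if detected:
        return False       # modes 1-3: detection worked, nothing to ask
    if auto == 1:
        return False       # automatic, give up if detection fails
    if auto == 2:
        return True        # automatic, ask user if detection fails
    if auto == 3:
        return is_tty      # ask only if someone is there to answer
    raise ValueError(f"unknown auto mode: {auto!r}")
```

A caller would typically pass `sys.stdin.isatty()` as `is_tty`.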

Next Time

Unfortunately, the code is now longer and probably a little less readable than before! It’s probably more robust to changes, since it now has reasonably scoped variables. And it’s more friendly to others who might want to use it, due to the new documentation and the ability to use it as a library in Perl or as a command in shell scripts.

Next time we’ll start in on the refactoring, and we’ll also write some more tests. This is turning into a real challenge, due to the data-driven nature of the script and the fact that it’s somewhat hardcoded to look for OS data in very specific locations. Since a big part of what I want to test is in the os_list.txt file, we don’t have the luxury of just saying, “It’s configuration…we’ll just make a special version for testing purposes.” We’ll have to get far more clever.

Extending Virtualmin with plugins

Plugins can be big

Not many people know that Virtualmin’s already extensive list of built-in features can be extended by writing plugins, which are basically Webmin modules that export a special API. Why would you want to do this, you may ask? Let’s say there is a mailing list application, log analyzer, database, or source code control system that you want to make available on a per-domain basis. If so, a plugin is the way to do it.

A plugin is typically used to add a new feature to Virtualmin. In its parlance, a feature is something that is enabled on a per-domain basis, such as a website, DNS domain, or MySQL database. Let’s say you have discovered an awesome new log analysis program that you want to run on each domain’s log files: a plugin would be the way to implement it.

A plugin can also add options to mailbox users. The most common use of this is to grant access on a per-user basis to some resource, such as statistics, an application, or a database. Plugins can also create new database types, add links to the left menu in the Virtualmin framed theme, and add sections to its system information page.

Some of the existing plugins give you an idea of what’s possible:

  1. The DAV plugin adds a feature which makes a virtual server’s web pages editable from applications that support the protocol, such as Windows and OSX. It also lets you enable DAV logins for each mailbox in the domain.
  2. The Bootup Actions plugin allows domain owners to have their long-running server processes started when the system boots.
  3. The Mail Relay plugin lets you forward email for a domain to another server, which can be configured by the domain owner.
  4. The Admin Notes feature adds a new section to the right-hand frame for entering comments about the system, for sharing status between master admins.

To see a full list of plugins that exist, check out the third-party modules database.

If you know Perl, have written a regular Webmin module, and want to write your own plugin, check out the extensive documentation on the API.

Webmin::API: Using Webmin as a library

Webmin is perhaps the largest bundle of system administration related Perl code in existence (outside of CPAN, of course), much of which is unavailable anywhere else.  I often find myself wishing for a function or two from Webmin in my day-to-day Perl scripting.  Historically, one could use Webmin functions by first pulling in all of the bits and pieces manually, and running a few of the helper functions.  For example, at Virtualmin, Inc. we use this bit of code to start up the configuration stage of our install scripts:

#!/usr/bin/perl
$|=1;
# Setup Webmin environment
$no_acl_check++;
$ENV{'WEBMIN_CONFIG'} ||= "/etc/webmin";
$ENV{'WEBMIN_VAR'} ||= "/var/webmin";
$ENV{'MINISERV_CONFIG'} = $ENV{'WEBMIN_CONFIG'}."/miniserv.conf";
open(CONF, "$ENV{'WEBMIN_CONFIG'}/miniserv.conf") || die "Failed to open miniserv.conf";
while(<CONF>) {
  if (/^root=(.*)/) {
    $root = $1;
    }
  }
close(CONF);
$root ||= "/usr/libexec/webmin";
chdir($root);
require './web-lib.pl';
init_config();

Wow.  That’s a lot of extraneous crap just to make use of Webmin functions.  Not all of that is necessary in every script that wants to use Webmin functions, but it’s always something I have to refer to the documentation for.
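Most of that boilerplate exists to find the Webmin root directory in miniserv.conf. For comparison, here is the same lookup as a Python sketch. The environment variable and default paths are the ones used in the Perl above. `webmin_root` is our own name.

```python
import os


def webmin_root(config_dir=None, default="/usr/libexec/webmin"):
    """Return the Webmin root directory named by root= in miniserv.conf.

    Like the Perl snippet: use WEBMIN_CONFIG if set, else /etc/webmin, and
    fall back to the default if no (non-empty) root= line is found.
    """
    config_dir = config_dir or os.environ.get("WEBMIN_CONFIG", "/etc/webmin")
    path = os.path.join(config_dir, "miniserv.conf")
    root = None
    with open(path) as fh:          # raises OSError, much like the Perl die()
        for line in fh:
            if line.startswith("root="):
                # Last match wins, as in the Perl while loop.
                root = line[len("root="):].rstrip("\n")
    return root or default
```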

So, I’ve been bugging Jamie for some time to make a simpler way to get at the Webmin API, and he’s just released the Webmin::API Perl module. To use it, you’ll first need Webmin installed. There’s an RPM, deb, tarball, and Solaris pkg, so it’s easy no matter what UNIX-like OS you run (it’ll also run on Windows, but only in a relatively limited fashion). Then you can install it like any other Perl module:

# tar xvzf Webmin-API-1.0.tar.gz
# cd Webmin-API
# perl Makefile.PL
# make install

Once that’s done, you can make use of everything in web-lib.pl, plus the libraries for all of the Webmin modules. For example, one could access all of the Webmin variables, like %gconfig, as well as all of the web-lib.pl functions, such as ftp_download (pure Perl FTP client), kill_byname (like killall), nice_size (return a number in GB, MB, etc.), running_in_zone (detects whether it's running in a Solaris Zone), etc.

So, for example, writing an application that downloads remote files and does something with them is trivial. But probably more interesting is that, once Webmin::API has been loaded, you can use the foreign_require function, which gives you access to the function library of any available Webmin module.

For example, if I wanted to make sure Postfix was configured to use Maildir mail spools, I could do the following:

foreign_require("postfix", "postfix-lib.pl");
postfix::set_current_value("home_mailbox", "Maildir/", 1);
postfix::reload_postfix();

That’s it.  No need to worry about parsing the file and no regex needed.  You don’t need to figure out where the Postfix main.cf is located (assuming Webmin is configured correctly), or what the proper way to restart the service is.
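To see what set_current_value spares you, consider doing the edit by hand. The following is a rough Python illustration only, not Webmin's implementation. It ignores Postfix continuation lines (lines starting with whitespace), and `set_main_cf_value` is a hypothetical helper.

```python
import re


def set_main_cf_value(lines, name, value):
    """Return a copy of main.cf lines with `name = value` set.

    Replaces the first active setting for name (commented-out lines are left
    alone), or appends a new setting if there is none.
    """
    # main.cf settings start in column 1; match "name =" or "name=".
    pattern = re.compile(re.escape(name) + r"\s*=")
    new_line = f"{name} = {value}\n"
    result = list(lines)
    for i, line in enumerate(result):
        if pattern.match(line):
            result[i] = new_line
            return result
    result.append(new_line)
    return result
```

You would still have to write the file back and reload Postfix afterwards, which is exactly what reload_postfix handles in the Webmin version.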

One common, and surprisingly complicated, task is setting up initscripts to start on boot.  It seems like every Linux distribution uses a slightly different directory layout, slightly different scripts, and different tools for managing the rc directories and files.  Webmin knows about the vast majority of those quirks, and provides a uniform interface to all of them, and this functionality is exposed to scripts via the init module.  For example, I could enable Postfix on boot with the following:

foreign_require("init", "init-lib.pl");
init::enable_at_boot("postfix");

There is one unfortunate caveat to this: you have to know the name of the initscript. On all of the systems I work with, this is pretty consistent across most services, with the exception of Apache. On Red Hat based systems the Apache service is called httpd, while on Debian/Ubuntu systems it is apache2. Some systems also call it apache.
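If a script has to cope with those naming differences itself, one simple approach is to probe for each known name and use whichever initscript actually exists. Here is a Python sketch. The candidate order and the /etc/init.d directory are assumptions (Red Hat keeps the scripts in /etc/rc.d/init.d, usually with /etc/init.d as a symlink to it). `find_initscript` is our own name.

```python
import os

# The Apache initscript names mentioned above; the order is an assumption.
APACHE_INITSCRIPTS = ("httpd", "apache2", "apache")


def find_initscript(candidates=APACHE_INITSCRIPTS, init_dir="/etc/init.d"):
    """Return the first candidate that exists in init_dir, or None."""
    for name in candidates:
        if os.path.exists(os.path.join(init_dir, name)):
            return name
    return None
```

The result could then be handed to init::enable_at_boot in place of a hardcoded name.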

Working With the Linux Firewall

One of the most powerful Webmin modules is the Linux Firewall module, which manages an iptables firewall. It is nearly comprehensive, covering many of the advanced stateful capabilities, as well as logging and the creation and management of arbitrary chains. We can make use of the basic functionality of the module by importing the firewall library.

foreign_require("firewall", "firewall-lib.pl");

Once imported, we have access to the get_iptables_save function, which imports any existing rules from the system default iptables save file into an array.  You can then work with them using standard Perl data management tools like push and splice.
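To make the shape of that data concrete, here is a simplified Python sketch that reads iptables-save output into a list of tables. Each table has a name, its chain policies, and a list of rules, loosely mirroring the `name`/`rules` hashes the Perl works with. Unlike Webmin, it keeps each rule's options as one raw string instead of splitting them apart. `parse_iptables_save` is a hypothetical name.

```python
def parse_iptables_save(text):
    """Parse iptables-save output into [{'name', 'policies', 'rules'}, ...].

    Rules are kept as (chain, args) tuples, with args as the raw option string.
    Lines this sketch does not recognise are skipped.
    """
    tables = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                          # blank lines and comments
        if line.startswith("*"):              # start of a table, e.g. *filter
            current = {"name": line[1:], "policies": {}, "rules": []}
            tables.append(current)
        elif line == "COMMIT":                # end of the current table
            current = None
        elif current is None:
            raise ValueError(f"line outside a table: {line!r}")
        elif line.startswith(":"):            # e.g. :INPUT ACCEPT [0:0]
            chain, policy = line[1:].split()[:2]
            current["policies"][chain] = policy
        elif line.startswith("-A "):          # e.g. -A INPUT -p tcp ... -j ACCEPT
            chain, _, args = line[3:].partition(" ")
            current["rules"].append((chain, args))
    return tables
```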

Say you want to open ports 10000 and 20000 (for Webmin and Usermin, of course).  Maybe you also want to make sure ssh (port 22) is available for those times when you need to hit the command line.  The simplest is probably to drop them into an array (so you can add new ports later without having to read code):

#!/usr/bin/perl

use Webmin::API;
foreign_require("firewall", "firewall-lib.pl");
use warnings;

my @tcpports = qw(ssh 10000 20000);
my @tables = &firewall::get_iptables_save();
(my $filter) = grep { $_->{'name'} eq 'filter' } @tables;
if (!$filter) {
  # No saved filter table yet, so start an empty one
  # (assign to the outer $filter; "my" here would create a new variable)
  $filter = { 'name' => 'filter',
              'rules' => [ ] };
}

foreach ( @tcpports ) {
  print "  Allowing traffic on TCP port: $_\n";
  my $newrule = { 'chain' => 'INPUT',
                  'p' => [ [ '', 'tcp' ] ],
                  'dport' => [ [ '', $_ ] ],
                  'j' => [ [ '', 'ACCEPT' ] ],
                };
  # Insert at the top of the rule list, like iptables -I
  splice(@{$filter->{'rules'}}, 0, 0, $newrule);
}
firewall::save_table($filter);
firewall::apply_configuration();

This reads the existing rules, adds new ones, saves the table, and applies the new configuration. The rules that this creates are identical to what you would get if you’d entered the following on the command line on a Red Hat based system:

iptables -I INPUT -p tcp --dport ssh -j ACCEPT
iptables -I INPUT -p tcp --dport 10000 -j ACCEPT
iptables -I INPUT -p tcp --dport 20000 -j ACCEPT
service iptables save

Now, of course, you could do all of that with backticks and substitution, but you’d have to add a bunch of additional logic to figure out whether to use iptables-save, service iptables save, or some variant of the former with an option or two (Debian and Ubuntu have a rather complex set of firewall configuration files, so the appropriate iptables save file may not be immediately obvious). Dealing with the rules programmatically is also more difficult that way, if you want to do something interesting like “only add a rule if these two other rules already exist, otherwise add the following two rules”. And reading and parsing the rather complex save file and writing it back out yourself can be a challenge (feel free to steal the Webmin code for it, if you prefer not to need all of Webmin).
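To illustrate the kind of logic that is easier with data than with backticks, here is a Python sketch that models rules as dictionaries. The dictionaries are shaped loosely like the Perl hashes (chain, protocol, destination port, target), though the keys and helper names are our own, not Webmin's. The sketch inserts a rule only if an identical one is not already present, and renders a rule as the equivalent `iptables -I` command.

```python
def tcp_accept_rule(port, chain="INPUT"):
    """Build a rule dict that accepts TCP traffic to port."""
    return {"chain": chain, "p": "tcp", "dport": str(port), "j": "ACCEPT"}


def insert_if_missing(rules, rule):
    """Insert rule at the top of the list (like iptables -I) unless an
    identical rule is already present. Returns True if it was added."""
    if rule in rules:
        return False
    rules.insert(0, rule)
    return True


def to_command(rule):
    """Render a rule dict as the equivalent iptables command line."""
    return (f"iptables -I {rule['chain']} -p {rule['p']} "
            f"--dport {rule['dport']} -j {rule['j']}")
```

Running the port loop twice with `insert_if_missing` leaves the list unchanged the second time, which is the kind of check that is awkward to get right with raw iptables commands.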

Known Issues

This Perl module is new, so it’s pretty safe to say there is room for improvement. The biggest issue is that only the core Webmin web-lib.pl and ui-lib.pl functions are documented, so for the vast majority of the functionality found in Webmin you’ll have to dig through the relevant modules yourself. I plan to spend some time adding POD documentation to each of those libraries in the not too distant future, but in the meantime, the best documentation is the source itself. Luckily, every library has an accompanying working example application in the form of the module that it is part of.

Another issue is that Webmin is full of old code. It’s a ten-year-old codebase…and much of it isn’t “use strict” or even “use warnings” compliant. You can, of course, turn on warnings after “use Webmin::API”, and it works fine; see my iptables example above for that kind of usage. Strict is only usable, even after the import of Webmin, if you disable many types of checks. This is another issue I’ll spend some time on in the future.

In the meantime, there’s a lot of great functionality that’s just been made a little easier to make use of.  I’ll be writing several more articles with examples of using this API in the near future.  Specifically, the next installment of my series on Analysis and Reporting of System Data  will make use of the Webmin System and Server Status module to build a flexible ping monitoring and reporting tool in just a few lines of code.

Sharing JavaScript Code in Webmin

I posted a while back on my personal blog about some UI enhancement work that I’ve been doing in Webmin using the ExtJS JavaScript toolkit. Several folks had questions about whether Webmin was getting a new “official” JavaScript toolkit (it has some ancient and ugly API calls to generate a few JavaScript helpers for things like field graying and validation and such, but they ain’t got that AJAX religion), and, if not, how one could add a JavaScript library to Webmin to cleanly share it across modules and themes.

So, the answer to the first question is that Webmin is not getting an “official” JavaScript toolkit at this time. Webmin has as one of its core goals that it can be used by anyone anywhere with any browser. AJAX and heavy JavaScript usage makes that goal far more complicated. For example, we consider it a serious bug if a blind user using a screen reader can’t use Webmin. That said, we also recognize that AJAX is the best way to handle huge classes of user interaction problems, and with our commercial offering we have a strong interest in having the best looking, and most pleasant to use, UI in the field. So, I’ve begun to build a “semi-official” Webmin module that contains ExtJS and some helper functions and classes. The first example usage of this will be our new TheJAX Virtualmin theme, and soon after a few new modules.

For the second question, I’d just like to show how I’ve created this new ExtJS module for Webmin, and how one can use it. It only takes a few minutes to wrap something up into a module, and since most AJAX frameworks are making use of good JavaScript design practices and using their own namespaces, you can actually mix and match without too much pain.

Hidden Modules

So, Webmin has a very powerful module system that allows you to package code for easy distribution and installation. A Webmin module is simply a directory with some files in it. Only one file is mandatory to make the directory into a “module”: module.info.

So, we create a directory named extjs within the Webmin directory (/usr/libexec/webmin on my system), and make a file called module.info with the following contents:

name=ExtJS
desc=ExtJS AJAX Toolkit
depends=1.360
version=0.1
hidden=1

Here I’ve given it a name and a short description, noted the version of Webmin it depends on, given it a version (I’m going to start it at 0.1, though the contained ExtJS version is 2.0b), and set it to be hidden. The hidden option means that users won’t be able to see this module in the UI, but other modules can make calls to it. Later, if I decide to add configurable options to this library that I do want users to be able to see, I can make it visible and add an icon and a UI.
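Because a module is just a directory containing module.info, you can find modules, and tell hidden ones apart, by scanning the Webmin root and reading those files. Here is a Python sketch under that assumption. It treats module.info as simple key=value lines, and `read_key_value_file` and `list_modules` are hypothetical helper names.

```python
import os


def read_key_value_file(path):
    """Parse simple key=value lines, skipping blanks and # comments."""
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return values


def list_modules(webmin_root, include_hidden=False):
    """Return {directory: module.info dict} for every module under webmin_root."""
    modules = {}
    for entry in sorted(os.listdir(webmin_root)):
        info_path = os.path.join(webmin_root, entry, "module.info")
        if not os.path.isfile(info_path):
            continue                  # no module.info, so not a module
        info = read_key_value_file(info_path)
        if info.get("hidden") == "1" and not include_hidden:
            continue                  # hidden modules are skipped by default
        modules[entry] = info
    return modules
```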

Now, I can start dropping in my files. I merely unzipped the ExtJS bundle, deleted the extraneous files, and dropped it into an ext directory within the module directory. That’s just to make it easy to update ExtJS components separately from the helper functions that I write in Perl in the top-level directory.

Helper Functions

The simplest thing to automate away is the inclusion of the script tags that load the library. So, I’ll create a header_text function in a file called extjs-lib.pl (Webmin has a convention of calling function libraries modulename-lib.pl), which looks like this:

# extjs-lib.pl

do '../web-lib.pl';
&init_config();

my $debug=''; # Set to '-debug' to use non-stripped library

# header_text()
# Text to load JavaScript and CSS for use of extjs
# (the 'theme' config key name below is an assumption)
sub header_text {
  return <<EOF;
<script src="/extjs/ext/adapter/ext/ext-base.js" type="text/javascript"></script>
<script src="/extjs/ext/ext-all.js" type="text/javascript"></script>
<link href="/extjs/ext/resources/css/ext-all.css" rel="stylesheet" type="text/css" />
<link href="/extjs/ext/resources/css/xtheme-$config{'theme'}.css" rel="stylesheet" type="text/css" />
EOF
}

1;

Here we pull in the Webmin core library, pull in the configuration for this module (which I’ll cover in a couple of days when I’ve completed the configuration code for this module), and build the function to return the bits of text we need to properly load ExtJS and its stylesheets.
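The same idea, generating the tags in one place so every caller gets identical paths, works in any language. Here is a Python sketch for comparison. The `theme` and `debug` parameters are assumptions: the Perl keeps a `$debug` variable and reads the theme from the module configuration. The `ext-all-debug.js` name assumes the usual ExtJS non-minified build, and `gray` is only an example theme name.

```python
def header_text(theme="gray", debug=False, base="/extjs/ext"):
    """Return the <script>/<link> tags needed to load ExtJS and a theme.

    theme and debug are assumed parameters; debug=True assumes the
    non-minified library is named ext-all-debug.js.
    """
    suffix = "-debug" if debug else ""
    tags = [
        f'<script src="{base}/adapter/ext/ext-base.js" type="text/javascript"></script>',
        f'<script src="{base}/ext-all{suffix}.js" type="text/javascript"></script>',
        f'<link href="{base}/resources/css/ext-all.css" rel="stylesheet" type="text/css" />',
        f'<link href="{base}/resources/css/xtheme-{theme}.css" rel="stylesheet" type="text/css" />',
    ]
    return "\n".join(tags) + "\n"
```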

Using It

Believe it or not, we’ve now got a library that can be used by other Webmin modules or by themes. Webmin has a foreign_require function that will pull libraries like this in under their own namespace. So, when I need to use ExtJS, I can do this:

foreign_require("extjs", "extjs-lib.pl");
print extjs::header_text();

All done! In a few days I’ll be finished with the first full-featured version of this library, and will wrap it up for distribution, along with some proof-of-concept modules that show how to use a full-featured AJAX interface without breaking text-mode browsers and readers, among other things.

One config file to rule them all

Configuration files are a boring necessity in software development. Parsing existing configuration files is a necessary aspect of almost any systems automation task. I regularly need to read and write configuration files from different languages, as I have simple maintenance, startup, and installation scripts written in BASH, larger Webmin-related tools in Perl, and stuff related to our website written in PHP. Of course, there are some great configuration file parsers for Perl in CPAN, but if you need a highly portable script and you don’t want your user to have to know anything about CPAN, it makes sense to build your own.

Luckily, in all three of these languages, plus Ruby and Python (other favorites of mine), simple configuration files can be easy, if you choose the right format.

Start from the Least Common Denominator

The least capable language in this story, at least with regard to data structures, is probably BASH, so we’ll start by creating a configuration file that’s easy to use with BASH. The obvious choice is a file filled with simple variable assignments, like so:

apache.config

# A comment
show_order=0
start_cmd=/etc/rc.d/init.d/httpd start
mime_types=/etc/mime.types
apachectl_path=/usr/sbin/apachectl
stop_cmd=/etc/rc.d/init.d/httpd stop
emptyvalue=
# A blank line too..
 
max_servers=100
test_config=1
apply_cmd=/etc/rc.d/init.d/httpd restart
httpd_path=/usr/sbin/httpd
httpd_dir=/etc/httpd
#  A comment with an=sign

This file is very nearly valid BASH syntax. You could run it directly with /bin/sh apache.config, though it wouldn’t do anything useful, because the variables exist only in the shell process that runs the file, and that process exits a split second later. The one catch is lines whose values contain spaces. To the shell, start_cmd=/etc/rc.d/init.d/httpd start is not a plain assignment: it runs a command named start with start_cmd set in its environment. Such values would need quoting for the shell to read them as intended. Because it’s BASH syntax, empty lines are ignored, and any line that starts with a # is a comment and also ignored. Empty values are also legal, so we need to accommodate lines that have only a key and no value. Also, because this is (nearly) a valid BASH script, we can make use of these variables in our scripts easily by sourcing this file. In shell scripts this is done using the dot operator ( . ), like so:

. apache.config

After this, each of the values in the apache.config file is accessible by its name. There are some caveats that make this a less than ideal practice for anything more complicated than a small script. The variables pollute the namespace when pulled in this way. So, if you later wanted to use $apachectl_path as a variable for some other purpose, for example, you would overwrite the existing assignment and could cause errors that are difficult to diagnose. BASH doesn’t have support for complex data structures, so there isn’t much we can do about this without introducing quite a lot of complexity. We’ll take our chances and keep our scripts short and simple.
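If a program ever writes a file like this for the shell to source, values containing spaces or shell metacharacters need quoting. Here is a Python sketch using the standard library's `shlex.quote`. `to_shell_assignments` is our own name, and the sketch assumes keys must be valid shell variable names. Keep in mind that a plain key=value reader, like the ones that follow, would then see the quotes as part of the value.

```python
import re
import shlex

# Shell variable names: a letter or underscore, then letters, digits, underscores.
VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def to_shell_assignments(config):
    """Render a dict as name=value lines that are safe to source with '.'."""
    lines = []
    for key, value in config.items():
        if not VALID_NAME.match(key):
            raise ValueError(f"not a valid shell variable name: {key!r}")
        lines.append(f"{key}={shlex.quote(value)}")
    return "\n".join(lines) + "\n"
```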

Getting the values into a Perl data structure

While our configuration file is not valid Perl syntax, Perl still has plenty of tools for working with this kind of file. After all, Perl was born to pick up the ball where shell scripts fumbled (and eventually evolved into a hodgepodge of every great, and some not so great, idea in programming languages from the past couple of decades), so it’s natural that it would have the ability to do the same sorts of things as a shell script.

But, since our configuration file is not valid Perl syntax, we can’t simply call do apache.config; as we would to import another Perl script. We’ll have to parse it into a data structure (which is better programming practice, anyway, as mentioned above). One way to do this would be a while loop, like so:

my $file = "apache.config";
my %config;
open(CONFIG, "< $file") or die "can't open $file: $!";
while (<CONFIG>) {
    chomp;
    s/#.*//; # Remove comments
    s/^\s+//; # Remove opening whitespace
    s/\s+$//;  # Remove closing whitespace
    next unless length;
    my ($key, $value) = split(/\s*=\s*/, $_, 2);
    $config{$key} = $value;
}
close(CONFIG);

# Print it out
use Data::Dumper;
print Dumper(\%config);

Now, we can access the values in our configuration file from the %config hash, such as $config{'apachectl_path'}. Another option, if you’re feeling particularly idiomatic, is to use map:

my $file = "apache.config";
open(CONFIG, "< $file") or die "can't open $file: $!";
my %config = map {
      s/#.*//; # Remove comments
      s/^\s+//; # Remove opening whitespace
      s/\s+$//;  # Remove closing whitespace
      m/(.*?)=(.*)/; # (key, value) on a match, empty list otherwise
   } <CONFIG>;
close(CONFIG);

# Print it out
use Data::Dumper;
print Dumper(\%config);

So, what’s the benefit to this latter example? Nothing major, it’s just another way to approach the problem. It’s a couple of lines shorter, but more importantly it has fewer temporary variables, which can be a source of errors in large programs. The multiple substitution regular expressions I’ve shown above in either example could be reduced to a single line, but I believe this is more readable, and according to the Perl documentation breaking the tests out into single tests is faster than having multiple possible tests in a single substitution. Some folks also find long regular expressions difficult to scan.

But, I only like Ruby!

OK, so you want to do it in Ruby. Ruby has a lot in common with Perl, so it’s actually pretty similar, though a bit more verbose. Ruby fans seem to discourage regular expressions, though it is a core part of the language and it has roughly the same regex capabilities as Perl, so I’ve only used one (I guess I could have gotten rid of it somehow…but I got tired of searching for the non-regex answer and punted):

config = {}
 
File.foreach("apache.config") do |line|
  line.strip!
  # Skip comments and whitespace
  if (line[0] != ?# and line =~ /\S/ )
    i = line.index('=')
    if (i)
      config[line[0..i - 1].strip] = line[i + 1..-1].strip
    else
      config[line] = ''
    end
  end
end
 
# Print it out
config.each do |key, value|
  print key + " = " + value
  print "\n"
end

Same end result as the Perl versions above: A config hash containing all of the elements in our configuration file.

What about those web applications written in PHP?

Two of the websites I maintain (Virtualmin.com, and this site) are written in PHP. One is a Joomla application with numerous extensions and custom modules and components, the other is a mildly customized WordPress site. In the case of Virtualmin.com, we’re developing a number of applications that have both Perl components for the back end work and PHP components for the web front end, so sharing configuration files can be useful. Webmin, conveniently enough, already uses shell variable key=value style configuration files, so everything we do is already in this format.

So, let’s see about getting these configuration files into a PHP data structure. PHP isn’t quite as rich as Perl in its data manipulation capabilities, but it did inherit quite a few of the same tools from Perl, so our solution in PHP looks pretty similar to the while loop version above, though it is a bit more verbose due to the keyword heavy nature of PHP (Perl is often accused of having too much syntax, and PHP has way too many keywords):

$file="apache.config";
$lines = file($file);
$config = array();
 
foreach ($lines as $line_num => $line) {
  # Comment? (first non-blank character is #)
  if ( ! preg_match("/^\s*#/", $line) ) {
    # Contains non-whitespace?
    if ( preg_match("/\S/", $line) ) {
      list( $key, $value ) = explode( "=", trim( $line ), 2);
      $config[$key] = $value;
    }
  }
}
 
// Print it out
print_r($config);

Hey, what about snake handlers?

Of course, it can also be done in Python. As with the Ruby implementation, I’m not certain this is the best way to do it, but it works on my test file.

config = {}

file_name = "apache.config"
config_file = open(file_name, 'r')
for line in config_file:
    # Get rid of \n
    line = line.rstrip()
    # Empty?
    if not line:
        continue
    # Comment?
    if line.startswith("#"):
        continue
    # Split on the first "=" only, so values may contain "="
    (name, value) = line.split("=", 1)
    name = name.strip()
    config[name] = value
config_file.close()

print(config)

Or, as dysmas suggested on Reddit, a more idiomatic version would be:

config = {}

file_name = "apache.config"

with open(file_name) as config_file:
    for line in config_file:
        line = line.strip()
        # Compare strings with startswith/==, not "is"; keep empty values
        if line and not line.startswith("#") and "=" in line:
            var, val = line.split("=", 1)
            config[var.strip()] = val.strip()

print(config)

So, now we’ve got a config associative array filled with all of our values in all of our favorite languages (except BASH, which gets straight variables). Assuming we use a common file locking mechanism, or always open them read only, we could even begin to use the same configuration files across our BASH, Perl, Ruby, Python, and PHP scripts independently but simultaneously.
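On the locking point: one common choice on UNIX-like systems is advisory flock(2) locks, with readers taking a shared lock and writers an exclusive one. Perl, Ruby, PHP, and Python all expose flock, and most Linux systems ship a flock(1) utility for shell scripts. Advisory locks only protect against programs that also use them. Here is a Python sketch of a locked reader that follows the same parsing rules as the examples above. `read_config_locked` is our own name.

```python
import fcntl


def read_config_locked(path):
    """Read a key=value config file while holding a shared (read) lock.

    Writers should take fcntl.LOCK_EX on the same file. The lock is advisory,
    so it only helps if every reader and writer uses flock.
    """
    config = {}
    with open(path) as fh:
        fcntl.flock(fh.fileno(), fcntl.LOCK_SH)   # waits while a writer holds LOCK_EX
        try:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, value = line.split("=", 1)
                config[key.strip()] = value.strip()
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)
    return config
```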

What’s the point?

This isn’t just an academic exercise. The simple examples above are the beginnings of a cross-language set of tools for systems management.

With these simple parsers, we can build tools that use the best language for the job, while still leveraging some interesting knowledge contained in Webmin’s configuration files (which are in this key=value format). Webmin supports dozens of Operating Systems and hundreds of services and configuration files, so the config files in a Webmin installation (usually found in /etc/webmin) contain a huge array of compatibility information that would take ages to gather. If you need to know how to stop or start Apache on Debian 4.0, or on Solaris, or on Red Hat Enterprise Linux, you’d have to check an installation of those systems or search the web or ask someone who has one of those systems handy. Or, you could check the Webmin configuration file, and get the same data for all of the Operating Systems Webmin supports. It’s a pretty valuable pile of data. Imagine writing a script for your own favorite OS, and then being able to hand it to anyone that happens to have Webmin installed, regardless of their OS and version. Or, if they don’t have Webmin installed, you could provide a template configuration file that they could fix for their OS and version, addressing both situations as simply as possible.

Not the only configuration file format

Of course, this isn’t the only configuration file format out there, or even the best. Python users really like INI files, and I can’t argue with them. When I was writing Perl and Python predominantly, I used the Config::INI::Simple module from CPAN and ConfigParser for Python so I could share configuration between my various programs easily (I was generally writing a Webmin front end in Perl to a Python back end application). That worked great. So, I’m not arguing you ought to be using key=value configuration files for everything. But being able to read them makes a lot of portability data available to you for free.
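The two formats are close enough that Python's standard configparser can read a key=value file if you prepend a section header yourself. Here is a sketch. The section name `top` is arbitrary. Interpolation is disabled so a `%` in a value is left alone, and `optionxform = str` stops configparser from lowercasing keys, which it does by default. configparser also accepts `:` as a separator, which a strict key=value reader would not.

```python
import configparser


def read_key_value_with_configparser(text):
    """Parse section-less key=value text by prepending a dummy [top] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str          # keep keys exactly as written
    parser.read_string("[top]\n" + text)
    return dict(parser["top"])
```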

Next time I’ll wrap a couple of these routines up into friendly libraries for easy use, and add some tests to be sure we’re doing what we think we’re doing.

Analysis and Reporting of System Data Part 1

There are a few basic elements to maintaining and administering systems: configuration, software management, data integrity and availability, and monitoring and reporting. This article introduces a number of tools for the last of those components and presents some simple ways to create custom tools to report on data specific to your environment. There are dozens of great Open Source tools for gathering and presenting data, and so this series merely scratches the surface, but it provides a good introduction to some of the major system data analysis problems and presents some solutions.

Before trouble starts

Who, What, When, Where, Why and How

The six W’s (yeah, I’m not sure why “how” is one of the Ws, either) of reporting also apply to systems data. You want to know:

Who has been interacting with your server and services.

What they did.

When they did it, so you can determine if something they did is related to problems on the system.

Where they were coming from, just in case they aren’t who they claim to be.

Why? OK, so systems data probably can’t tell you why someone did something. You’ll have to ask them. But, with the right tools you’ll know who to ask and what to ask them, if anything funny does happen on your systems.

And, how any problems came about, so you can prevent them in the future. In short, the goal of all of this analysis and reporting on systems data is to keep your sysadmin house in order.


The Basics

In the spirit of starting from first principles, we’ll begin this little exercise with the rudimentary tools that every system administrator ought to know a bit about: grep and tail.

While there are lots of automatic tools that provide graphs and charts and doohickeys that you can click or drag or hover over for hours of fun, odds are very good that some day you’ll need to find out something very specific about a service on your system. Do you really want to schlep all over the Internet looking for just the right log analysis tool to find out whether that important message your boss sent to your company’s biggest client was actually delivered? Of course not! Your boss is breathing down your neck right now. This is a job for grep!

grep is a search tool. It finds lines in a text file that match a regular expression¹ and prints them to STDOUT. Like all UNIX command line tools, it can easily be combined with other tools for maximum awesomeness. So, let’s see grep in action, eh?

Find the boss’ email to badass@superhappymegacorp.com. Your boss (wimpy@thefacelesscorp.com) sent it out yesterday and he still hasn’t gotten a reply!

grep "to=<badass@superhappymegacorp.com>" /var/log/maillog

Assuming your boss actually sent the message, this will print out something along the lines of:

Sep 24 23:04:52 www postfix/smtp[3208]: 93498290E97: to=<badass@superhappymegacorp.com>, relay=none, delay=42281, 
status=deferred (connect to mail.superhappymegacorp.com[192.168.1.100]: Connection timed out)

Aha! The superhappymegacorp.com mail server isn’t responding. The message didn’t go through yet, but it’s not our fault! Ass covered. Rest easy and reward yourself with another one of those delicious cupcakes that cute secretary brought in this morning.

Just when you begin to think the rest of your day is going to be easy, in comes the web designer. She’s thoroughly in a panic because one of her off-shore contractors got the syntax wrong in an .htaccess file and exposed a directory filled with sensitive files. It’s now been fixed, but she needs to know if anyone outside of the company accessed those files during the couple of days while they were exposed. Hmmm…sounds like another job for grep. But, we need to find entries that don’t match a particular pattern. We’ll use the “-v” option to negate the pattern.

grep -v '^192\.168\.1\.' /var/log/httpd/access_log

This assumes 192.168.1.0/24 is our local company subnet. The “^” anchors the pattern to the beginning of a line, which in the Apache common log format is where the client IP appears. Because grep uses regular expressions, and the period “.” has a special meaning (“match any single character”), I’ve used a backslash “\” to escape each period in the IP, and wrapped the whole pattern in single quotes so the shell hands those backslashes to grep instead of stripping them. The unescaped pattern would still match our local addresses, but it would also match 192.168.100.1 (the “.” after “192.168.1” happily matches the “0”), even though that address isn’t in the 192.168.1.0/24 network. With “-v”, requests from such a machine would be silently left out of the results: a false negative.
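To see why the escaping and quoting matter, here is a small sketch you can paste into a shell. It feeds three trimmed-down, made-up access_log lines (hypothetical sample data; 203.0.113.7 is a documentation-only address) to both versions of the pattern. The function names sample, loose, and strict are just illustrative.

```shell
# Hypothetical, trimmed-down access_log lines; only the leading client IP matters.
sample() {
  printf '%s\n' \
    '192.168.1.25 - - "GET /private/report.pdf HTTP/1.1" 200 512' \
    '192.168.100.1 - - "GET /private/report.pdf HTTP/1.1" 200 512' \
    '203.0.113.7 - - "GET /private/report.pdf HTTP/1.1" 200 512'
}

# Unescaped dots match any character, so 192.168.100.1 is treated as "local"
# and hidden by -v (a false negative).
loose() { sample | grep -v '^192.168.1.' | cut -d' ' -f1; }

# Escaped dots (kept intact by the single quotes) only match literal periods,
# so only genuine 192.168.1.x clients are filtered out.
strict() { sample | grep -v '^192\.168\.1\.' | cut -d' ' -f1; }

echo "loose:";  loose
echo "strict:"; strict
```

The loose version reports only 203.0.113.7; the strict version also reports 192.168.100.1, which is the outside machine you actually wanted to know about.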

Next up, tail, a nifty little tool that I use many times every day. In its simplest form, it displays the last 10 lines of a file (the “-n” option changes that number). Because programs on a UNIX system append new log entries to the end of the file, this shows the most recent items in the log, which makes it very useful for interactively debugging problems.
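A quick way to check what tail shows by default is to feed it numbered lines. In this sketch, seq simply stands in for a log file with 30 entries, oldest first.

```shell
# seq stands in for a log file: 30 numbered "entries", oldest first.
# With no options, tail prints the last 10 lines (21 through 30).
seq 1 30 | tail

# -n chooses how many lines to show; here, the 3 most recent entries (28, 29, 30).
seq 1 30 | tail -n 3
```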

Even better, modern tail implementations include the “-f” option (spelled “--follow” in GNU tail), which prints new log entries as they are added. So, if I were debugging a particularly ornery mail problem, I might watch the maillog with “tail -f” while making requests. Of course, if I’m looking at the logs of a very active server, I might want to see only very specific entries. Say I’m not sure why a particular mailbox isn’t receiving mail. We can combine tail and grep, like so:

tail -f /var/log/maillog | grep info@thefacelesscorp.com

Now, when I send an email to info@thefacelesscorp.com, I’ll see the related entries in the maillog. (Of course, not every related entry contains the address. Postfix tags all of a message’s entries with a queue ID, like the 93498290E97 in the earlier example, so you may need to pick out that ID and grep the whole log for it.)
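The two-step lookup mentioned above (find the ID Postfix prints at the start of each entry, such as 93498290E97 in the earlier example, then grep the whole log for it) can be scripted. This is a minimal sketch, not a standard tool: the function name mail_trail is hypothetical, and it assumes the usual Postfix syslog layout, where the sixth whitespace-separated field of each line is the queue ID followed by a colon. Other log layouts will need a different field.

```shell
# mail_trail ADDRESS LOGFILE
# Hypothetical helper (not a standard tool). Assumes the usual Postfix syslog
# layout, where the sixth whitespace-separated field is "QUEUEID:".
mail_trail() {
  _addr=$1
  _log=$2
  _ids=$(mktemp) || return 1

  # Step 1: find lines that mention <ADDRESS> and collect their queue IDs
  # (skipping NOQUEUE entries, which never received an ID).
  grep -F "<$_addr>" "$_log" | awk '$6 != "NOQUEUE:" { print $6 }' | sort -u > "$_ids"

  # Nothing matched: clean up and report failure.
  if [ ! -s "$_ids" ]; then
    rm -f "$_ids"
    return 1
  fi

  # Step 2: print every log line carrying one of those IDs, in log order.
  grep -F -f "$_ids" "$_log"
  _rc=$?
  rm -f "$_ids"
  return $_rc
}
```

Usage would look something like “mail_trail info@thefacelesscorp.com /var/log/maillog”. Keeping the trailing colon in each ID makes accidental matches elsewhere in a line less likely.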

Next week, we’ll cover using Perl to extract useful information from your system and build time series graphs from the data.

See also

grep documentation

grep at Wikipedia

tail documentation

tail at Wikipedia

[1] Regular expressions, or regexes, are a syntax for advanced pattern matching. egrep (extended grep) style regexes became a de facto standard, and these later evolved into Perl style regexes, which many other languages and tools use via the PCRE (Perl Compatible Regular Expressions) library. The Perl regex documentation is among the best on the subject. Jeffrey Friedl’s Mastering Regular Expressions takes the subject to the next level, and covers grep, egrep, sed, Perl, and much more.