One config file to rule them all

Configuration files are a boring necessity in software development. Parsing existing configuration files is a necessary aspect of almost any systems automation task. I regularly need to read and write configuration files from different languages, as I have simple maintenance, startup, and installation scripts written in BASH, larger Webmin-related tools in Perl, and stuff related to our website written in PHP. Of course, there are some great configuration file parsers for Perl in CPAN, but if you need a highly portable script and you don’t want your user to have to know anything about CPAN, it makes sense to build your own.

Luckily, in all three of these languages, plus Ruby and Python (other favorites of mine), simple configuration files can be easy, if you choose the right format.

Start from the Least Common Denominator

The least capable language in this story, at least with regard to data structures, is probably BASH, so we’ll start by creating a configuration file that’s easy to use with BASH. The obvious choice is a file filled with simple variable assignments, like so:

apache.config

# A comment
show_order=0
start_cmd=/etc/rc.d/init.d/httpd start
mime_types=/etc/mime.types
apachectl_path=/usr/sbin/apachectl
stop_cmd=/etc/rc.d/init.d/httpd stop
emptyvalue=
# A blank line too..
 
max_servers=100
test_config=1
apply_cmd=/etc/rc.d/init.d/httpd restart
httpd_path=/usr/sbin/httpd
httpd_dir=/etc/httpd
#  A comment with an=sign

This file is valid BASH syntax–you could run this directly with /bin/sh apache.config and it would return no errors (though it wouldn’t do anything, because the values are not exported, so they are only in scope for the split second it takes BASH to parse the file. Because it’s BASH syntax, empty lines are ignored, and any line that starts with a # is a comment and also ignored. Empty values are also legal, so we need to accommodate lines that have only a key and no value. Also because this is a valid BASH script, we can make use of these variables in our scripts easily by sourcing this file. In shell scripts this is done using the dot operator ( . ), like so:

. apache.config

After this, each of the values in the apache.config file are accessible by their names. There are some caveats that make this a less than ideal practice for anything more complicated than a small script. The variables pollute the namespace when pulled in this way. So, if you later wanted to use $apachectl_path as a variable for some other purpose, for example, you would overwrite the existing assignment, and cause possibly difficult to diagnose errors. BASH doesn’t have support for complex data structures, so there isn’t much we can do about this, without introducing quite a lot of complexity, so we’ll take our chances and keep our scripts short and simple.

Getting the values into a Perl data structure

While our configuration file is not valid Perl syntax, Perl still has plenty of tools for working with this kind of file. After all, Perl was born to pick up the ball where shell scripts fumbled (and eventually evolved into a hodge podge of every great, and some not so great, ideas in programming languages from the past couple of decades), so it’s natural that it would have the ability to do the same sorts of things as a shell script.

But, since our configuration file is not valid Perl syntax, we can’t simply call do apache.config; as we would to import another Perl script. We’ll have to parse it into a data structure (which is better programming practice, anyway, as mentioned above). One way to do this would be a while loop, like so:

my $file = "apache.config";
my %config;
open(CONFIG, "< $file") or die "can't open $file: $!";
while () {
    chomp;
    s/#.*//; # Remove comments
    s/^\s+//; # Remove opening whitespace
    s/\s+$//;  # Remove closing whitespace
    next unless length;
    my ($key, $value) = split(/\s*=\s*/, $_, 2);
    $config{$key} = $value;
}
 
# Print it out
use Data::Dumper;
print Dumper(\%config);

Now, we can access the values in our configuration file from the %config hash, such as $config{‘apachectl_path’}. Another option, if you’re feeling particularly idiomatic, is to use map:

my $file = "apache.config";
open(CONFIG, "\< $file") or die "can't open $file: $!";
my %config = map {
      s/#.*//; # Remove comments
      s/^\s+//; # Remove opening whitespace
      s/\s+$//;  # Remove closing whitespace
      m/(.*?)=(.*)/; }
      ;
 
# Print it out
use Data::Dumper;
print Dumper(\%config);

So, what’s the benefit to this latter example? Nothing major, it’s just another way to approach the problem. It’s a couple of lines shorter, but more importantly it has fewer temporary variables, which can be a source of errors in large programs. The multiple substitution regular expressions I’ve shown above in either example could be reduced to a single line, but I believe this is more readable, and according to the Perl documentation breaking the tests out into single tests is faster than having multiple possible tests in a single substitution. Some folks also find long regular expressions difficult to scan.

But, I only like Ruby!

OK, so you want to do it in Ruby. Ruby has a lot in common with Perl, so it’s actually pretty similar, though a bit more verbose. Ruby fans seem to discourage regular expressions, though it is a core part of the language and it has roughly the same regex capabilities as Perl, so I’ve only used one (I guess I could have gotten rid of it somehow…but I got tired of searching for the non-regex answer and punted):

config = {}
 
File.foreach("apache.config") do |line|
  line.strip!
  # Skip comments and whitespace
  if (line[0] != ?# and line =~ /\S/ )
    i = line.index('=')
    if (i)
      config[line[0..i - 1].strip] = line[i + 1..-1].strip
    else
      config[line] = ''
    end
  end
end
 
# Print it out
config.each do |key, value|
  print key + " = " + value
  print "\n"
end

Same end result as the Perl versions above: A config hash containing all of the elements in our configuration file.

What about those web applications written in PHP?

Two of the websites I maintain (Virtualmin.com, and this site) are written in PHP. One is a Joomla application with numerous extensions and custom modules and components, the other is a mildly customized WordPress site. In the case of Virtualmin.com, we’re developing a number of applications that have both Perl components for the back end work and PHP components for the web front end, so sharing configuration files can be useful. Webmin, conveniently enough, already uses shell variable key=value style configuration files, so everything we do is already in this format.

So, let’s see about getting these configuration files into a PHP data structure. PHP isn’t quite as rich as Perl in its data manipulation capabilities, but it did inherit quite a few of the same tools from Perl, so our solution in PHP looks pretty similar to the while loop version above, though it is a bit more verbose due to the keyword heavy nature of PHP (Perl is often accused of having too much syntax, and PHP has way too many keywords):

$file="apache.config";
$lines = file($file);
$config = array();
 
foreach ($lines as $line_num=>$line) {
  # Comment?
  if ( ! preg_match("/#.*/", $line) ) {
    # Contains non-whitespace?
    if ( preg_match("/\S/", $line) ) {
      list( $key, $value ) = explode( "=", trim( $line ), 2);
      $config[$key] = $value;
    }
  }
}
 
// Print it out
print_r($config);

Hey, what about snake handlers?

Of course, it can also be done in Python. As with the Ruby implementation, I’m not certain this is the best way to do it, but it works on my test file.

import sys
config = {}
 
file_name = "apache.config"
config_file = open(file_name, 'r')
for line in config_file:
    # Get rid of \n
    line = line.rstrip()
    # Empty?
    if not line:
        continue
    # Comment?
    if line.startswith("#"):
        continue
    (name, value) = line.split("=")
    name = name.strip()
    config[name] = value
 
print config

Or, as dysmas suggested on Reddit, a more idiomatic version would be:

config = {}
 
file_name = "apache.config"
config_file= open(file_name)
 
for line in config_file:
    line = line.strip()
    if line and line[0] is not "#" and line[-1] is not "=":
        var,val = line.rsplit("=",1)
        config[var.strip()] = val.strip()
 
print config

So, now we’ve got a config associative array filled with all of our values in all of our favorite languages (except BASH, which gets straight variables). Assuming we use a common file locking mechanism, or always open them read only, we could even begin to use the same configuration files across our BASH, Perl, Ruby, Python, and PHP scripts independently but simultaneously.

What’s the point?

This isn’t just an academic exercise. The simple examples above make up the early start of a cross-language set of tools for systems management.

With these simple parsers, we can build tools that use the best language for the job, while still leveraging some interesting knowledge contained in Webmin’s configuration files (which are in this key=value format). Webmin supports dozens of Operating Systems and hundreds of services and configuration files, so the config files in a Webmin installation (usually found in /etc/webmin) contain a huge array of compatibility information that would take ages to gather. If you need to know how to stop or start Apache on Debian 4.0, or on Solaris, or on Red Hat Enterprise Linux, you’d have to check an installation of those systems or search the web or ask someone who has one of those systems handy. Or, you could check the Webmin configuration file, and get the same data for all of the Operating Systems Webmin supports. It’s a pretty valuable pile of data. Imagine writing a script for your own favorite OS, and then being able to hand it to anyone that happens to have Webmin installed, regardless of their OS and version. Or, if they don’t have Webmin installed, you could provide a template configuration file that they could fix for their OS and version, addressing both situations as simply as possible.

Not the only configuration file format

Of course, this isn’t the only configuration file format out there, or even the best. Python users really like INI files, and I can’t argue with them. When I was writing Perl and Python predominantly, I used the Config::INI::Simple module from CPAN and ConfigParser for Python so I could share configuration between my various software easily (I was generally writing a Webmin front end in Perl to a Python back end application). That worked great. So, I’m not arguing you ought to be using key=value configuration files for everything. But being able to read them makes a lot of portability data available to you for free.

Next time I’ll wrap a couple of these routines up into friendly libraries for easy use, and add some tests to be sure we’re doing what we think we’re doing.

5 Responses

  1. paul October 10, 2007 / 9:25 pm

    man, i read the entire thing thinking about webmin, probably because it was in the intro. didn’t even put 2 and 2 together.

  2. Christian November 25, 2007 / 2:48 am

    Hi,

    the perl should read:

    ###
    my $file = “apache.config”;
    open(CONFIG, “< $file”) or die “can’t open $file: $!”;
    my %config = map {
    s/#.*//; # Remove comments
    s/^\s+//; # Remove opening whitespace
    s/\s+$//; # Remove closing whitespace
    m/(.*?)\s*=\s*(.*)/; }
    ;

    # Print it out
    use Data::Dumper;
    print Dumper(\%config);
    ###

    ###
    my $file = “apache.config”;
    open(CONFIG, “< $file”) or die “can’t open $file: $!”;
    while () {
    chomp;
    s/#.*//; # Remove comments
    s/^\s+//; # Remove opening whitespace
    s/\s+$//; # Remove closing whitespace
    next unless length;
    my ($key, $value) = split(/\s*=\s*/, $_, 2);
    $config{$key} = $value;
    }

    # Print it out
    use Data::Dumper;
    print Dumper(\%config);
    ###

    Christian

  3. Joe Cooper December 12, 2007 / 4:12 pm

    You’re right, Christian. The editor in WordPress is hateful and “corrects” certain characters, even when they’re wrapped in pre tags.

  4. mrxinu September 29, 2008 / 12:41 am

    This was a great read. Thanks for writing it!

Comments are closed.