There are a few basic elements to maintaining and administering systems: configuration, software management, data integrity and availability, and monitoring and reporting. This article introduces a number of tools for the last of those components and presents some simple ways to create custom tools that report on data specific to your environment. There are dozens of great Open Source tools for gathering and presenting data, so this series merely scratches the surface, but it provides a good introduction to some of the major system data analysis problems and presents some solutions.
Who, What, When, Where, Why and How
The six Ws (yeah, I’m not sure why “how” is one of the Ws, either) of reporting also apply to systems data. You want to know:
Who has been interacting with your server and services.
What they did.
When they did it, so you can determine if something they did is related to problems on the system.
Where they were coming from, just in case they aren’t who they claim to be.
Why? OK, so systems data probably can’t tell you why someone did something. You’ll have to ask them. But, with the right tools you’ll know who to ask and what to ask them, if anything funny does happen on your systems.
And, how any problems came about, so you can prevent them in the future. In short, the goal of all of this analysis and reporting on systems data is to keep your sysadmin house in order.
In the spirit of starting from first principles, we’ll begin this little exercise with the rudimentary tools that every system administrator ought to know a bit about: grep and tail.
While there are lots of automatic tools that provide graphs and charts and doohickeys that you can click or drag or hover over for hours of fun, odds are very good that some day, you’ll need to find out something very specific about a service on your system. Do you really want to schlep all over the Internet looking for just the right log analysis tool to find out whether that important message your boss sent to your company’s biggest client was actually delivered? Of course not! Your boss is breathing down your neck right now. This is a job for grep!
grep is a search tool. It finds lines in a text file that match a regular expression [1] and prints them to STDOUT. Like all UNIX command line tools, it can easily be combined with other tools for maximum awesomeness. So, let’s see grep in action, eh?
Find the boss’ email to firstname.lastname@example.org. Your boss (email@example.com) sent it out yesterday and he still hasn’t gotten a reply!
grep "to=<firstname.lastname@example.org>" /var/log/maillog
Assuming your boss actually sent the message, this will print out something along the lines of:
Sep 24 23:04:52 www postfix/smtp: 93498290E97: to=<firstname.lastname@example.org>, relay=none, delay=42281, status=deferred (connect to mail.superhappymegacorp.com[192.168.1.100]: Connection timed out)
Aha! The superhappymegacorp.com mail server isn’t responding. The message didn’t go through yet, but it’s not our fault! Ass covered. Rest easy and reward yourself with another one of those delicious cupcakes that cute secretary brought in this morning.
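If the boss then wants to know whether other messages are stuck too, grep combines nicely with sort and uniq. Here is a minimal sketch that runs against a made-up sample log rather than your real /var/log/maillog. The addresses, queue IDs, and relay host are all invented for illustration, and it assumes a grep that supports the common “-o” option (GNU and BSD grep both do).

```shell
# Create a small sample mail log (every entry is invented for illustration).
log=$(mktemp)
cat > "$log" <<'EOF'
Sep 24 23:04:52 www postfix/smtp[2210]: 93498290E97: to=<a@example.org>, relay=none, delay=42281, status=deferred (connection timed out)
Sep 24 23:05:10 www postfix/smtp[2211]: 1A2B3C4D5E6: to=<b@example.net>, relay=mx.example.net[203.0.113.5]:25, delay=2, status=sent (250 OK)
Sep 24 23:09:52 www postfix/smtp[2212]: 77F0A1B2C3D: to=<a@example.org>, relay=none, delay=300, status=deferred (connection timed out)
EOF

# How many delivery attempts were deferred? -c counts matching lines.
deferred_count=$(grep -c 'status=deferred' "$log")

# Which recipients are affected, most frequent first?
# -o prints only the part of each line that matched (the to=<...> field).
summary=$(grep 'status=deferred' "$log" | grep -o 'to=<[^>]*>' | sort | uniq -c | sort -rn)

echo "deferred deliveries: $deferred_count"
echo "$summary"

rm -f "$log"
```

On a real server you would point the same pipeline at the actual mail log instead of the temporary file.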
Just when you begin to think the rest of your day is going to be easy, in comes the web designer. She’s thoroughly in a panic because one of her off-shore contractors got the syntax wrong in an .htaccess file and exposed a directory filled with sensitive files. It’s now been fixed, but she needs to know if anyone outside of the company accessed those files during the couple of days while they were exposed. Hmmm…sounds like another job for grep. But, we need to find entries that don’t match a particular pattern. We’ll use the “-v” option to negate the pattern.
grep -v '^192\.168\.1\.' /var/log/httpd/access_log
This assumes 192.168.1. matches our local company subnet. The “^” anchors the pattern to the beginning of a line, which in the Apache common log format is where the client IP appears. Because grep uses regular expressions, and the period “.” has special meaning (it means “match any single character”), I’ve used a backslash “\” to escape the periods in the IP. The pattern is wrapped in single quotes so that the shell passes those backslashes through to grep instead of stripping them. An unescaped pattern would still match our local addresses, because a period matches “any single character”, but it would also match an address such as 192.168.100.1, which isn’t in the 192.168.1.0/24 network. With “-v” in play, that outside request would silently drop out of the results.
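To see the difference the escaping makes, here is a small sketch that runs both patterns against a three-line sample log. The log entries, dates, and the /private/ path are made up for the demonstration; only the two grep patterns come from the discussion above.

```shell
# Build a tiny sample access log (made-up entries in common log format).
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.1.15 - - [24/Sep/2024:10:01:00 -0400] "GET /private/a.pdf HTTP/1.1" 200 1024
192.168.100.1 - - [24/Sep/2024:10:02:00 -0400] "GET /private/b.pdf HTTP/1.1" 200 2048
10.0.0.7 - - [24/Sep/2024:10:03:00 -0400] "GET /private/c.pdf HTTP/1.1" 200 512
EOF

# Unescaped dots: 192.168.100.1 is wrongly treated as local and hidden.
unescaped=$(grep -v '^192.168.1.' "$log" | cut -d' ' -f1)

# Escaped dots: only the genuine 192.168.1.x client is filtered out.
escaped=$(grep -v '^192\.168\.1\.' "$log" | cut -d' ' -f1)

printf 'unescaped pattern reports:\n%s\n' "$unescaped"
printf 'escaped pattern reports:\n%s\n' "$escaped"

rm -f "$log"
```

The unescaped pattern reports only 10.0.0.7, while the escaped one also reports 192.168.100.1, which is exactly the visitor we would otherwise have missed. In practice you would probably pipe the result through a second grep for the exposed directory's path, so that only requests for the sensitive files remain.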
Next up, tail, a nifty little tool that I use many times every day. In its simplest form it displays the last 10 lines of a file (the “-n” option changes that count). Because log files on a UNIX system always append new entries to the end of the file, this will always show the most recent items in the log. It’s very useful for interactively debugging problems.
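A quick sketch of that behavior, using a throwaway 25-line file in place of a real log (the file contents are invented for the demonstration):

```shell
# Write 25 numbered lines to a temporary stand-in for a log file.
log=$(mktemp)
i=1
while [ "$i" -le 25 ]; do
  echo "entry $i"
  i=$((i + 1))
done > "$log"

# With no options, tail prints the last 10 lines.
last_default=$(tail "$log")

# -n asks for a specific number of lines instead.
last_three=$(tail -n 3 "$log")

printf 'default:\n%s\n' "$last_default"
printf 'last three:\n%s\n' "$last_three"

rm -f "$log"
```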
Even better, modern tail implementations include the “-f”, or “--follow”, option, which prints the log entries as they are added. So, if I were debugging a particularly ornery mail problem, I might watch the maillog with “tail -f” while making requests. Of course, if I’m looking at the logs of a very active server, I might want to only see very specific entries. Say, I’m not sure why a particular mailbox isn’t receiving mail. We can combine tail and grep, like so:
tail -f /var/log/maillog | grep firstname.lastname@example.org
Now, when I send an email to email@example.com, I’ll see the related entries in the maillog (of course, in some cases, it won’t show all related entries…you might then need to pick out a message ID and grep the whole log based on that ID).
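That follow-up step, picking out an ID and grepping the whole log for it, might look something like the sketch below. It assumes Postfix-style log lines in which the queue ID is the sixth whitespace-separated field, and it runs against an invented sample log. Every entry except the queue ID and the deferred status line from the earlier example is made up.

```shell
# Sample mail log (invented entries; only one message goes to our recipient).
log=$(mktemp)
cat > "$log" <<'EOF'
Sep 24 11:20:11 www postfix/smtpd[2201]: 93498290E97: client=localhost[127.0.0.1]
Sep 24 11:20:11 www postfix/cleanup[2204]: 93498290E97: message-id=<20240924112011.1@www>
Sep 24 11:20:11 www postfix/qmgr[1801]: 93498290E97: from=<boss@example.com>, size=1204, nrcpt=1 (queue active)
Sep 24 11:20:12 www postfix/qmgr[1801]: A1B2C3D4E5F: from=<someone@example.com>, size=900, nrcpt=1 (queue active)
Sep 24 23:04:52 www postfix/smtp[2210]: 93498290E97: to=<firstname.lastname@example.org>, relay=none, delay=42281, status=deferred (connect to mail.superhappymegacorp.com[192.168.1.100]: Connection timed out)
EOF

# Step 1: find the queue ID(s) of messages addressed to the recipient.
# -F treats the pattern as a fixed string, so the dots need no escaping.
qid=$(grep -F 'to=<firstname.lastname@example.org>' "$log" | awk '{print $6}' | tr -d ':' | sort -u)

# Step 2: pull every line that mentions that queue ID, whatever the field.
related=$(grep -F "$qid" "$log")

echo "queue ID: $qid"
echo "$related"

rm -f "$log"
```

The second grep turns up the smtpd, cleanup, and qmgr lines that never mention the recipient's address, which is why the address-only search misses them.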
Next week, we’ll cover using Perl to extract useful information from your system and build time series graphs from the data.
[1] Regular expressions, or regexes, are a syntax for advanced pattern matching. There is a de facto standard known as egrep, or extended grep, style regexes. This further evolved into Perl style regexes, which are used by many other languages and tools, via the pcre (Perl Compatible Regular Expressions) library. The Perl regex documentation is among the best on the subject. Jeffrey Friedl’s Mastering Regular Expressions takes the subject to the next level, and covers grep, egrep, sed, Perl, and much more.