Choosing a Linux Distribution for Your Web Server


Several years ago, I wrote a post about choosing a Linux distribution for a web server. It’s been so long that I don’t even remember where I posted it (so I, unfortunately, can’t link to it), so it’s probably time to revisit the subject, as it does come up pretty frequently in our forums and in conversations with customers. The choice is somewhat more obvious today than it was back then; I recall covering at least five distributions (plus, I believe, Solaris and FreeBSD) in that previous article. Today, the leaders in the server operating system market are pretty clear, at least for Open Source platform web deployments, such as Node.js, Ruby, Python, PHP, Perl, or Go. Because there are clear market leaders, I’m going to focus my attention on just three Linux distributions: CentOS, Debian, and Ubuntu.

I will briefly explain why there are only three distributions most people should be considering for server deployment, and I’ll also briefly mention some situations where you might want to branch out and consider other options.

So, let’s get on with it, and pick out the right Linux distribution for your new web deployment!

Lifecycle Is Really, Really, Incredibly Important

The average server remains in service for over 36 months. I have a couple of machines that have been in use for over six years without an OS upgrade! Upgrading the operating system on a production server, even when a remote or in-place upgrade option is available, is prone to breaking existing services in unpredictable ways, or at least in ways that are difficult to predict without a long, time-consuming audit of all of the software running on the system, how the pieces interact, and how they will change when upgraded to newer versions.

Thus, one goal when selecting an OS for your server should be to ensure you have plenty of time between mandatory upgrades. Of course, nothing stops you from upgrading earlier than you need to; if you want newer packages and have the time to perform the upgrade, or to migrate to a new server before the OS reaches its end-of-life date, you are free to do so. What we are more concerned about is how soon that decision will be forced on us.

With regard to the lifecycle of the major Linux server distributions, CentOS (and RHEL) is, by far, the king, with a 10-year support period. Ubuntu LTS is second with a 5-year cycle. Debian is somewhat unpredictable, but always has at least a 3-year lifecycle; sometimes there may be an LTS repository that continues support for a given version.

Non-LTS Ubuntu releases should not be considered for server usage under any circumstances, as the lifecycle of ~18 months is simply too short. Likewise, Fedora Linux should not be considered for any server deployment.

The end-of-life for current CentOS releases is as follows:

CentOS 5 March 31, 2017
CentOS 6 November 30, 2020
CentOS 7 June 30, 2024

For Ubuntu LTS:

Ubuntu 10.04 LTS April 2015
Ubuntu 12.04 LTS April 2017
Ubuntu 14.04 LTS April 2019

For Debian:

Debian 6 (with LTS support) February 2016
Debian 7 ~late 2016 (estimated)

Lifecycle Winner
CentOS by a five year landslide. If you don’t know when you’ll have the time and inclination to upgrade your server OS or move to a new server, CentOS may be the best choice for your deployment, if the other deciding factors don’t sway you to something else. Not having to think about server upgrades until 2024 is pretty cool.

Package Management

The reason a long lifecycle for your server operating system is so important is that you need to be able to count on your OS to provide security updates for the useful life of your server. And the method by which software updates, particularly security updates, are provided is vitally important. It needs to be easy, reliable, and preferably something you can automate without risk.
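Both major package families can be automated in this spirit. A minimal sketch (the schedule and log path are illustrative; in practice, the dedicated tools for this job, yum-cron on CentOS and unattended-upgrades on Debian/Ubuntu, are preferable because they handle locking and notification):

```
# crontab entry: apply package updates nightly at 3:00 and log the result
0 3 * * * yum -y update >> /var/log/auto-update.log 2>&1
```

On Debian or Ubuntu, the equivalent cron entry would invoke apt-get instead of yum.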

All of the distributions in this comparison have excellent package management tools and infrastructure. In fact, they are all so excellent that I was tempted to ignore this factor altogether. But, there are some subtle differences, particularly in the available package selection. And, if you’re considering going outside of the Big Three Linux distributions covered here, or are considering a BSD or Windows for your deployment, you should definitely consider how updates will be handled, as the picture is not nearly as pleasant on every distribution and OS, and many cannot be reliably automated.

apt

The package manager invented for Debian, and also found on Ubuntu, is called apt. It is a very capable, fast, and efficient package manager that handles dependency resolution, and downloads and installs packages from both the OS-standard repositories and third-party repositories. It is easy to use, has numerous GUIs for searching and installing packages, and can be automated relatively reliably. apt installs and manages .deb packages. It is reasonably well-documented, though it has some surprising edge cases.

yum

Yum, aka the Yellow Dog Updater, Modified, began life as the Yellow Dog Updater, initially developed for the Yellow Dog Linux distribution (a special build of Red Hat/Fedora for Macintosh hardware), and was then forked and enhanced by Seth Vidal. yum installs and manages RPM packages, and is found on CentOS, Fedora, RHEL, and several other RPM-based distributions. There are both command line and GUI utilities for working with yum, and it is well-documented.

Which is better?

Choosing between package managers is difficult, as both mostly have the same basic capabilities, and both are reasonably reliable. They both have been in use for many years, and have received significant development attention, so they are quite stable. I believe you could easily find fans of both package managers, and I wouldn’t really want to argue too strongly either way.

I’ve worked extensively with both, and the only time I had a preference was when I was creating my own repositories of packages and when I needed to customize the package manager, and in both cases yum was much more hacker-friendly. Creating yum repositories is as simple as putting some files on a webserver, and running the createrepo command. Creating apt repositories is much more time-consuming, and requires learning a number of disparate tools, and creating scripts to automate management of the repositories.
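To illustrate, publishing a yum repository amounts to running createrepo against a directory of RPMs served by any web server, then pointing clients at it with a small file in /etc/yum.repos.d/. A sketch (the repository name and URLs are hypothetical):

```
# /etc/yum.repos.d/myrepo.repo -- client configuration for a self-hosted
# repository; the server side is just "createrepo /path/to/rpm/directory"
[myrepo]
name=My Package Repository
baseurl=http://packages.example.com/myrepo/
enabled=1
gpgcheck=1
gpgkey=http://packages.example.com/RPM-GPG-KEY-myrepo
```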

Package Management Winner
yum on CentOS, by a small margin, if you plan to host your own package repositories. If you have no need for your own repos, or are already familiar with apt, either as a user or developer, it is a tie.

Package Selection

Closely related to package management is package selection. In other words, how many packages are readily available for your OS from the system standard repositories, and how new are those packages? Here, there are some interesting differences in philosophy between the various systems, and those differences may help you choose.

CentOS

CentOS package selection is the smallest, by far, of these three distributions, in the standard OS repositories. In the Virtualmin repositories, we have to fill in the gaps by providing a number of packages we consider core to hosting services. CentOS is missing things like the ClamAV virus scanner for email processing, the ProFTPd FTP server (among the most popular and more feature-filled FTP servers available), and others. This is an annoyance which the other two distros do not make you endure. CentOS has about 6,000 packages in the standard repository.

On the other hand, CentOS has the Fedora EPEL repositories, which provide Fedora packages rebuilt for CentOS. This expands the selection of available packages on CentOS with a couple thousand extra packages. One thing to keep in mind is that EPEL is not subject to the lifecycle promises of the official CentOS repositories, and relies on volunteer contributions to keep the packages up to date (much like Debian). Most of the popular packages are pretty well-maintained, but I have occasionally seen security updates fall behind in the EPEL repos for some packages on older versions of CentOS, which can be worrying. I generally advise selectively enabling EPEL repositories by using the includepkgs or exclude options within the repo configuration file. In this way, you’ll know exactly which packages have come from EPEL, and which ones need extra caution as time passes to ensure they are kept up to date and secure.
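A minimal sketch of what that looks like in practice (the package globs here are examples; list only what you actually want pulled from EPEL):

```
# /etc/yum.repos.d/epel.repo (fragment)
[epel]
name=Extra Packages for Enterprise Linux
enabled=1
# only packages matching these globs will ever come from EPEL:
includepkgs=clamav* proftpd*
```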

CentOS packages in the latest release also tend to be older than those found in the latest Ubuntu release. Often this merely depends on who has had a more recent major version release, and for the moment CentOS 7 has some newer packages than the latest Ubuntu 14.04 LTS release. But, the latter also has newer versions of some important packages despite being released earlier.

CentOS is particularly strong (or weak, depending on how you look at it) about keeping the same version of packages throughout the entire lifecycle of the OS release. Thus, CentOS 7 will have Apache version 2.4.6 throughout the entire ten-year life of the OS. Security updates will be applied as patches to that version of Apache, rather than by adding new versions to the repository. This ensures compatibility throughout the entire lifecycle, and makes it much more predictable that your server will continue to function through security updates. However, it also ensures that in five years you’ll be wishing for newer versions of PHP, Ruby, Perl, Python, MySQL or MariaDB, and Apache. It is a double-edged sword, and for some people the cost is too high.

In addition to the EPEL package repositories, there is also the Software Collections (SCL) repository. This repository includes updated versions of popular software, mostly programming languages and databases. There is currently SCL support for CentOS 6, and it will likely become available for CentOS 7 as the packages found in CentOS 7 become more dated. This can allow you to continue to use an older OS version while still utilizing modern language and database versions. You can read more about the Software Collections in the CentOS Wiki.

Ubuntu

Ubuntu, with all repositories enabled (including universe), has about 23,000 packages. As you can see, there are a lot more packages available for Ubuntu than CentOS. But, many of the less popular packages are considerably less well-maintained. Sticking to the core repositories (main and security) may be advisable, in the same way that avoiding general use of EPEL on CentOS is advised. It’s best to know your packages are well cared for and that lots of other people are using them, so bugs are found quickly.

In our Virtualmin repositories for Ubuntu, we don’t have to maintain any binary packages aside from our own programs, which is indicative of how well-equipped the standard Ubuntu repositories are for web hosting deployments. It is possible to install nearly anything you could want or need, and in a relatively recent version, on the latest Ubuntu release. Ubuntu is also less strict about keeping the same version, and more likely to provide multiple versions, of common packages, like Apache, PHP, and MySQL or MariaDB. This makes Ubuntu a favorite among developers who like to stay on the bleeding edge of web development tools like PHP, Ruby on Rails, Perl Dancer, Python Django, etc.

In short, Ubuntu has far more packages, and generally more recent packages, than CentOS. Ubuntu usually has more recent packages than Debian stable releases as well, and a better update policy in terms of stability. Ubuntu’s update policy is not as strict or predictable as that of CentOS, but even though minor version changes to some of the core hosting software can happen on Ubuntu, it is unlikely you will run into compatibility problems because of them.

Debian

Debian has the most packages in its standard repositories, with something along the lines of 23,000 packages. The popular packages tend to be well-maintained by a veritable army of volunteers and using excellent infrastructure to assure quality. However, many of the packages will be quite old, at any given time. And there is less assurance of compatibility between updates in Debian than in CentOS, or even the core Ubuntu repositories.

Given Debian’s short lifecycle vs CentOS, and Ubuntu’s ability to tap into the universe repository for access to roughly the same number and quality of packages as Debian, it is hard to argue that Debian leads in this category, even though historically its huge selection of packages was hard to beat. Debian’s stable release also tends to have somewhat older packages, even in the beginning of its lifecycle, which can be a negative for some deployments.

Package Selection Winner
Ubuntu, if sheer number and newness of packages is most important. Or, possibly CentOS, by a small margin, if you prefer stability over newness, and prefer to ensure your software never stops working due to incompatible changes in software running on the system.

Upgrading

I recommend not upgrading servers to entirely new versions of the OS frequently, generally speaking, since it can be time-consuming and it can introduce subtle malfunctions that can be hard to identify and fix. If you do need to upgrade, a valuable feature is the ability to upgrade without physical access to the system. This can be somewhat nerve-wracking, for servers you don’t have easy hands-on access to, but some distributions are better at it than others.

Debian and Ubuntu

apt has been an accepted method of performing an OS upgrade on Debian since long before Ubuntu even existed. The apt-get dist-upgrade command handles not just dependency resolution, but also packages that have been made obsolete by newer packages, and situations where various libraries have moved to new packages. This allows a system to be upgraded to a new version with very little disruption, and because it has been in use for many years, it is generally pretty reliable and a well-supported method of upgrading the system.
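In outline, the process looks something like this (a sketch, assuming a hypothetical upgrade from wheezy to jessie; substitute the codenames for your actual upgrade path, and review the edited file before committing to the upgrade):

```shell
# A stand-in sources.list so the edit can be shown safely; on a real
# system you would edit /etc/apt/sources.list itself.
cat > sources.list.example <<'EOF'
deb http://deb.debian.org/debian wheezy main
deb http://security.debian.org/ wheezy/updates main
EOF

# Step 1: point every repository line at the new release
sed -i 's/wheezy/jessie/g' sources.list.example
cat sources.list.example

# Steps 2-4, run after installing the edited file as /etc/apt/sources.list:
#   apt-get update          # fetch the new release's package lists
#   apt-get upgrade         # upgrade within existing dependencies first
#   apt-get dist-upgrade    # then allow installs/removals for the new release
```

The two-stage upgrade (upgrade before dist-upgrade) keeps the riskier package removals and replacements to a separate, reviewable step.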

The process of upgrading Debian or Ubuntu using apt is quite similar, though in my experience Debian upgrades have historically been smoother than Ubuntu upgrades. This is for a variety of reasons, but mostly because of the more conservative nature of Debian development, and the fact that more Debian users run various mixtures of newer and older software together (mixing and matching of repositories on Debian is more commonly done to get newer packages, and for development purposes), so community testing of various package versions within each system version is broader, if not deeper. This is a historic difference, based on my own experiences with Debian and Ubuntu upgrades, and may be alleviated by the much larger number of Ubuntu users today.

The important thing here, however, is that upgrades on Debian or Ubuntu are a relatively painless affair, at least when compared to CentOS.

CentOS

Upgrading a CentOS system is more cumbersome. While it is possible to perform an OS upgrade with yum, it is not currently recommended or supported by the CentOS developers, so remote upgrades are very challenging. In fact, there isn’t even a very clear path for upgrading from CentOS 6 to CentOS 7 while sitting at the console. There are new tools in development for handling OS upgrades using yum (fedup on Fedora and redhat-upgrade-tool on RHEL/CentOS), which will likely eventually provide a reasonable upgrade process. That said, I have never seen an upgrade using this process work without significant manual correction of issues after the upgrade completes. I would not trust this method to upgrade a remote system unless I had KVM access, and remote hands available in the data center to handle inserting a rescue CD, should it come to that.

In short, CentOS should be considered a “cannot upgrade” OS for servers in remote locations. The only tools for performing remote upgrades are very early alpha quality at best and are not recommended by their developers for production systems.

Upgrade Winner
Debian, because of its long history of users upgrading via apt and its ideology of mixing and matching packages from various repositories, relying on the dependency metadata of the packages to allow them to reliably interoperate. Ubuntu provides a reasonable upgrade path using the same mechanism, and is a very close second. CentOS isn’t even in the game, and cannot be upgraded remotely via any reasonable mechanism.

Popularity

Ordinarily, I don’t recommend looking to popularity as a major deciding factor in choosing software, though for a variety of reasons, it does make sense to choose tools that are used by a reasonably large community. This is especially true for Open Source software.

Popular software will have more people using it, more people asking and answering questions about it online, and more people who are experts or at least comfortable working with it. This ensures you can get help when you need it, you’ll be able to find plentiful documentation, and you’ll be able to hire people with expertise if you get stuck in a situation that’s over your head.

On this front, things have shifted quite a bit in the past several years. CentOS once ruled the web server market, with a huge market share advantage. Among our many thousands of Virtualmin installations, CentOS accounted for approximately 85%. Today, CentOS is still the most popular web server OS, with about 50% market share (depending on who you ask and which specific niche you’re talking about, this may vary quite a bit), with Ubuntu following closely behind at 30% (and in some niches it may even hold a larger share than CentOS), and Debian trailing with about 15%.

For the majority of users, any of these three systems has achieved the minimum level of popularity necessary to ensure you have a large and vibrant community of developers, users, authors, and freelancers available to make the system work well in a wide variety of use-cases. I would not hesitate to recommend any of these systems, but would caution against going outside of these three, because the user base of everything else is so very small.

Popularity Winner
CentOS, but it probably doesn’t matter all that much. With a 50% market share, you’re most likely to find the help you need when problems or questions arise. But, Ubuntu and Debian also have very large and active communities, and you’re likely to find all the help and documentation you need for any of them.

Your Experience Level

This one won’t have a winner that I can choose for you, and simply has to be decided based on your own experience level. And, it may even be the most important single factor. If you are an expert on one distribution, but a novice on the others, you would almost certainly want to choose the one you know over the ones you don’t (unless others on your team have different expertise).

If you use Ubuntu on your desktop or laptop machine, you may find that using an Ubuntu LTS release on the server provides the least friction; you can develop in roughly the same environment you’ll be deploying into. Likewise, if you are a Fedora user on the desktop, CentOS is an obvious choice, because they share the same philosophy, package manager, and many of the same packages (Fedora can be seen as the rapidly moving development version of CentOS, and most packages and policies that find their way into CentOS began by being introduced into Fedora a year or more before).

Of course, if you have no strong existing preference, it would be wise to consider your needs for your systems and compare the other factors in this article.

Experience Winner
You! You get to choose from some of the most amazing feats of software engineering ever to exist, representing millions of person-hours of development, and they’re all free and Open Source. We live in amazing times, don’t we?

Some Final Thoughts

If you’ve made it this far, congratulations! You now know I like all three of the most popular web server Linux distributions quite a bit, and think you will probably be pretty happy with any of them. You also know that CentOS is possibly the “safest” choice for new users, by virtue of being so popular on servers, but that Ubuntu is also a fine choice, especially if you use Ubuntu on the desktop.

But, let’s talk about the other distributions out there for a moment. There are some excellent, but less popular, distributions; some of them even have a reasonable lifecycle, a good package manager, good package selection, and a sane upgrade process. I won’t start naming them here, as the list could grow quite long. I do think that if you have a Linux distribution that you are extremely fond of, and more importantly, extremely familiar with, and the rest of your team shares that enthusiasm and experience, you may be best off choosing what you know, as long as you do the research and make sure the lifecycle is reasonable (three years is a little short, but most folks would be OK with a 5-year lifecycle, especially if upgrading is reasonably painless).

There are also a variety of special-purpose distributions out there that may play a role in your deployment, if your server’s purpose matches that of the distribution. Some good examples of this include CoreOS or Boot2Docker, which are very small distributions designed just for launching Docker containers; those containers would typically include a more standard Linux distribution. These are outside of the scope of this particular article, but I’ll talk more about them in a future post.

And, if you’ll be installing the Virtualmin control panel on the system (and I think you should, because it’s the most powerful Open Source control panel and also has a well-supported commercial version), you’ll want to make sure it’s one of our Grade A Supported operating systems.

Virtualmin Memory Usage (and Other Tales of Wonder and Woe!)

I’ve noticed over the years that one of the most common sources of confusion for new Virtualmin users, or users who are new to Linux and web hosting in general, is memory usage. I’ve written up documentation about Virtualmin on Low Memory Systems in the past, but it focuses mostly on helping folks with low-memory systems reduce the memory usage of their Virtualmin installation (and all of its related packages, like Apache, PHP, MySQL, and Postfix). It goes into interesting detail about Webmin memory usage, library caching in Virtualmin, etc., but doesn’t go into things like the memory usage of the various services on a Virtualmin (or any LAMP stack) system. This article will briefly address each of these subjects and provide real-world numbers for how much memory one should expect a Virtualmin installation to require.

A side story in all of this is how Virtualmin compares to other web hosting control panels. Somehow, this is considered interesting data for some folks, though I can’t really fathom why, given the huge differences in functionality available, particularly when comparing control panels with extremely limited web-focused functionality against full-featured control panels (like Virtualmin, cPanel, or Plesk) that provide mail processing with anti-virus and spam filtering, database management, etc. But, it comes up a lot. So, let’s get some hard numbers for Virtualmin and talk about where those numbers come from. If anyone happens to have data about memory usage of other control panels, feel free to post them in the comments (though, I doubt any control panel will use vastly more or less memory than Virtualmin, unless it’s written in Java, or something similar).

Where does the memory go‽

The first thing I want to do is break down memory usage in a production Virtualmin system, and talk about which components require large amounts of memory, and which ones can be reduced through options or disabling features.


top sorted by memory usage on a very busy 8GB server

The above image is the output of the top command on a Virtualmin system that has several active websites, including a large Drupal deployment (the one for Virtualmin.com, which has ~30,000 registered users, ~100,000 posts and comments, and receives about 100,000 visitors a month, at time of writing) and all of our software download repositories. As you can see, the system has 8GB of RAM and 2GB of swap. Here’s what is using the majority of memory on this system, in order of size:

  • mysqld – This is the MySQL server process. It is configured with quite large buffers and cache settings, in order to maximize performance for our Drupal instance and other applications that access the database, such as the Virtualmin Technical Support module (which can create tickets in our issue tracker). This is the largest single process on the system, which is likely to be true on most systems with large database-backed websites. It has 2.3GB of virtual memory allocated, though only 418MB of that is resident in physical RAM. See the note below about virtual vs. resident size.
  • clamd – This one always surprises people, and folks often forget about it when calculating their expected memory usage. ClamAV is very demanding, because it loads a large database of virus signatures into memory. Virtualmin allows it to be configured as either a daemon or a standalone executable…but the standalone version is extremely slow to start, and causes a spike of both CPU and disk activity when starting. So, if you plan to process mail (on any system, regardless of whether Virtualmin is involved), you should expect to give up a few hundred megabytes to the spam and AV filtering stack. The ClamAV server has 305MB resident size.
  • php-cgi – There are several of these, and they represent the pool of mod_fcgid spawned PHP processes that are serving the Drupal website. They are owned by user “virtualmin”, because we use suexec on this system, and the site in question is virtualmin.com, and the username for that account is virtualmin. The PHP process is quite large here, larger than most, for a few reasons. Primarily, it is because we make use of a large number of Drupal modules, and some of those modules are quite demanding, so we’ve had to increase PHP memory limits for this user. These processes have ~135MB resident size, and much larger virtual size, but all of the virtual memory usage is shared across every php-cgi process for every user.
  • lookup-domain-daemon.pl – This is part of the mail processing stack, and is a server provided by Virtualmin. It allows SpamAssassin and ClamAV to have user-level and domain-level configuration settings, and allows some types of configuration for these services to be modified by end users safely. This process is 55MB with another ~40MB shared with other processes.
  • spamd – The SpamAssassin server. See, I told you mail processing was heavy! At ~50MB for each of the SpamAssassin child processes, this adds up on a heavily loaded system.
  • perl – Finally, this is actually the Webmin/Virtualmin process! My system currently has library caching fully enabled, and the total virtual process size is ~135MB (this would be smaller on a 32 bit system), with a resident size of 46MB. If I were on a low-memory system, I would disable pre-caching, and Virtualmin would shrink to about 15MB (less on a 32 bit system). This can be set in Virtualmin->System Settings->Preload Virtualmin libraries at startup? The options are “All Libraries”, “Only Core”, and “No”, which will cause the Webmin process to be 40-45MB, 20-25MB, or 12-17MB resident, respectively, depending on whether the system is 32 or 64 bit.
  • named – This is the BIND DNS server. Its memory usage is quite modest compared to a lot of the other services on this system, and is probably never something one would worry about tuning, unless you serve a very high volume of DNS requests. One thing to bear in mind, however, is that if you have enabled the caching nameserver features of BIND, and many users are using it for DNS service, the process size could grow quite large. We recommend only enabling recursive lookups for the Virtualmin server itself (or, possibly even better, forwarding those recursive lookups to another server).
  • httpd – This is the pool of Apache web server processes. Notice the virtual size is quite large, while the resident size is quite small. Much of the memory usage of these processes is shared across all of them (of which there are probably 100+ on my system at any given time, due to the number of concurrent users). The size of these processes is determined mostly by the number of modules you have installed. But, even on this system, with a number of modules enabled and actively used, the resident size is only 9MB per process. Given my 3.4GB of currently free memory, Apache could spawn over 300 additional processes (beyond the 100 or more already running) without bumping into the memory limitations of this system. Apache often gets accused of being a memory hog compared to other web servers, but that’s often an unfair comparison between an Apache with a bunch of large modules (like mod_php or mod_perl, neither of which is needed for most Virtualmin systems) and a stripped-down lightweight server, like nginx, that simply doesn’t have any large modules that can be enabled.

Note: VIRT and RES are indicators of the type of memory that has been allocated; VIRT includes the resident memory, as well as memory-mapped files on disk, shared libraries which share RAM with other processes, etc., while RES is the resident memory usage, which roughly reflects how much RAM is dedicated to the process.
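If you want to take the same kind of inventory on your own system without top’s interactive display, a quick pipeline over ps does the job (a sketch; ps reports VSZ and RSS in kilobytes):

```shell
# Total virtual (VSZ) and resident (RSS) size per command name,
# sorted with the largest resident consumers first.
ps -eo comm,vsz,rss --no-headers \
  | awk '{v[$1]+=$2; r[$1]+=$3}
         END {for (c in v) printf "%s %d %d\n", c, v[c], r[c]}' \
  | sort -k3 -rn | head
```

The third column (resident) is the number that matters for capacity planning; the second can look alarmingly large for process pools like Apache’s, for exactly the sharing reasons described above.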

There are many other processes on this system, including the rest of the httpd processes, but these few processes already explain where the vast majority of memory on the system is going, and so we won’t dig any deeper into it for this story.

Just for fun, let’s see a somewhat smaller system’s memory usage:


Memory-sorted top output on a moderately loaded 4GB Virtualmin server

This is a ~4GB virtual machine, and I’ve temporarily disabled library pre-caching in Virtualmin, which makes the process size about 17MB (it is a 64 bit system). Since it’s so small, it doesn’t even show up in the list when sorted by memory usage. In this case, the large processes start with MySQL, once again configured with somewhat large buffers and caches. Java shows up here, which is uncommon for me, since Java is such a beast to work with, but I have a Jenkins CI instance running on this box. Then comes the mail filtering stack, slightly smaller than on the above system. I don’t have ClamAV running on this box, since the only email it processes is received by people running Linux, and we don’t worry so much about viruses in email. And then comes php-cgi, which is much smaller on this system, since it only runs moderately small WordPress instances and a pretty hard-working MediaWiki installation for doxfer.webmin.com.

It’s also possible to run Virtualmin in a very small amount of memory, particularly if you don’t need to process mail on the system. We recommend at least 256MB for a 32 bit system, and 384MB for a 64 bit system, even if you won’t be running a mail stack. While Virtualmin itself doesn’t need more memory, the performance of most web applications would be pretty abysmal on anything less. MySQL performance is directly correlated with the amount of memory you can devote to it. Using nginx (which is also supported by Virtualmin) may help in reducing the needed memory, though a minimal Apache configuration won’t be much larger.
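On a memory-constrained box, MySQL is the first place to economize. A sketch of the relevant my.cnf knobs (the values shown are illustrative starting points for a small server, not recommendations):

```
# /etc/my.cnf fragment -- the buffer pool is usually the single biggest
# memory knob on a LAMP server
[mysqld]
innodb_buffer_pool_size = 64M
key_buffer_size = 16M
max_connections = 50
```

The trade-off is exactly the one described above: the less memory MySQL gets, the worse your database-backed applications will perform.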

tl;dr

Virtualmin uses somewhere between 12MB and 46MB resident memory, and up to ~150MB virtual, depending on whether library caching is enabled and whether it is a 32 or 64 bit system.

If you’re processing mail with spam and antivirus, Virtualmin will, by default, also run a 45-55MB daemon for assisting with processing mail.

All of this is dwarfed by the actual services being managed by Virtualmin, like Apache, MySQL, ClamAV, SpamAssassin, Postfix, etc.

If you need to run Virtualmin on a very small-memory system, the best thing you can do is off-load email to some other system or service, since the full mail processing stack with SpamAssassin, ClamAV, Postfix, and Dovecot can easily add up to a few hundred MB.

Interesting Links

My favorite site to refer people to when they’re wondering about what memory usage information means on a Linux system is Linux Ate My RAM!