Advanced monit: Keep track of daemons, websites, RAIDs and partitions

Introduction

Are you already hosting your own mail or web server, and do you enjoy the flexibility, control and freedom that self-hosting gives you? Besides personal advantages like better privacy and the power to customize everything, you can also offer your services to other people. Even though there is a large number of budget hosting companies, many customers are willing to pay for better support or the comfort of having you around for questions.

But what makes the difference between a hobbyist server operation for yourself and a professional hosting business you can charge for? I would argue that the key difference is reliability. If your personal blog or email goes offline for a day because you don’t have time to fix it, it’s only you who will suffer. On the other hand, when you’re responsible for a company’s email or webshop, an outage costs them real money, and they will probably pick a different service soon.

Prerequisites

In this article I will introduce the advanced features of monit and its bigger brother, m/monit. While there are definitely newer and maybe more advanced monitoring solutions for different use cases, monit hits a sweet spot in terms of functionality for small and medium-sized server operations. I will assume you have already set up monit and use it to monitor the most basic system properties and a few processes. If not, just work through one of the many introductory articles on monit and then come back to this page.

Web services: the sum of their working parts

For users, the websites and services they consume on a daily basis are abstract things that magically descend from a fluffy cloud, sometimes called the internet. As system administrators, we know that web services are the sum of many interconnected programs running on ordinary computers. As long as all these programs do their job, they go mostly unnoticed, but once any of them breaks, the whole service is rendered unusable. As admins, it’s our job to keep everything running smoothly 99.9% or more of the time.

Before starting to configure your local monit agents, you should make a list of services and resources required to run your web services. It’s a good idea to start at the ‘bottom’ and work your way to the top:

  • CPU and memory are available
  • all physical hard drives in your RAID are working
  • partitions are mounted
  • enough space is available on partitions
  • critical system files are available and have the right permissions
  • all processes are running and respond to requests
  • website is available with correct content
  • scheduled jobs are successfully executed

Once you have a rough idea of what can (and will) go wrong, you can start adding appropriate rules to monit.

System checks

This class of checks monitors the whole system rather than an individual process. In many cases it will alert you to a problem even if it didn’t occur in a resource you are explicitly monitoring. A fairly complete system check looks like this:

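As a minimal sketch of such a check, assuming monit 5.x syntax; the host name and all thresholds are illustrative placeholders and should be tuned to your hardware:

```
# Hypothetical host name and thresholds; adjust to your system.
check system myhost.example.com
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 2 then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert
    if memory usage > 75% then alert
    if swap usage > 25% then alert
```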
This will give an alert whenever CPU usage is dangerously high or the system is running out of memory and swap space. Instead of alert, one could also use restart.

Filesystem checks

Once your system has enough CPU cycles and memory, it makes sense to add some data partitions to hold websites, emails or databases. It’s good practice to have a small system partition and a bigger data partition. That way your server will stay responsive, even after users fill up the data partition.

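A sketch of two such checks, assuming monit 5.x syntax. The service names, the 80% thresholds and the mdadm path are assumptions; only /dev/md3 is taken from the text:

```
# Root partition: alert before it fills up (thresholds are assumptions).
check filesystem rootfs with path /
    if space usage > 80% then alert
    if inode usage > 80% then alert

# RAID device: if /dev/md3 does not exist, monit's default reaction is a
# restart, which runs the start program and assembles the array.
check filesystem md3 with path /dev/md3
    start program = "/sbin/mdadm --assemble /dev/md3"
    stop program  = "/sbin/mdadm --stop /dev/md3"
    if space usage > 80% then alert
```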
The first block will check the root partition. The second block checks a RAID. If the RAID is not currently active, it will be assembled by monit. For this to work, md3 must be defined in /etc/mdadm.conf.

To make sure the partitions are mounted properly, use the directory statement:

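A minimal sketch, assuming a hypothetical data partition mounted at /srv/data. The trick is to watch a directory that lives on the partition, because the bare mount point exists even when nothing is mounted:

```
# Hypothetical path: /srv/data/www only exists while the data
# partition is mounted at /srv/data.
check directory www_data with path /srv/data/www
    if does not exist then alert
```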
The directory statement can also be used to monitor backups. If your backup script should ever break down, you will be notified.

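One way to sketch this, assuming a hypothetical nightly job that creates a new archive in /srv/backup/nightly. Creating a file updates the directory's timestamp, so a stale timestamp suggests the job did not run:

```
# Hypothetical backup target and schedule (nightly, with 2 hours of slack).
check directory nightly_backup with path /srv/backup/nightly
    if does not exist then alert
    if timestamp > 26 hours then alert
```

Note that this only works if the job adds or removes files in the directory itself; overwriting an existing file in place does not change the directory's timestamp.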

Check network devices

In some cases you will also need to take care of other network devices, like routers, access points or switches. Monit can do this as well. If the Wi-Fi ever goes offline, you will know before your clients do:

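A minimal sketch, assuming an access point at the made-up address 192.168.1.2 that answers ICMP echo requests:

```
# Hypothetical name and address of the access point.
# Ping tests usually require monit to run as root.
check host wifi_ap with address 192.168.1.2
    if failed ping then alert
```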

Check configuration files

To detect tampering in case your server is hacked, it makes sense to monitor critical configuration files.

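As a sketch, assuming the SSH daemon configuration is the file worth guarding; the path, mode and owner below are common defaults, not values from this article:

```
# Alert when the file content, permissions or owner change.
check file sshd_config with path /etc/ssh/sshd_config
    if changed checksum then alert
    if failed permission 644 then alert
    if failed uid root then alert
```

Expect an alert after every legitimate edit as well; that is the price of noticing the illegitimate ones.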

Process checks

Checking processes is at the very heart of monit. This includes MySQL, Apache, SSHD and many others. Let’s start with Apache:

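A sketch modeled on the Apache example in the sample monitrc that ships with monit. The pidfile, init-script path, process name and all thresholds are assumptions for a Debian-style system:

```
check process apache with pidfile /var/run/apache2/apache2.pid
    start program = "/etc/init.d/apache2 start"
    # killall instead of the init script, so a hung server still goes down
    stop program  = "/usr/bin/killall apache2"
    if cpu > 60% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if totalmem > 500 MB for 5 cycles then restart
    if children > 250 then restart
    if loadavg(5min) greater than 10 for 8 cycles then stop
    # restart when the server no longer answers HTTP requests
    if failed host 127.0.0.1 port 80 protocol http then restart
```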
This will check CPU usage, memory, children and load. If your webserver should ever become unresponsive, monit will automatically restart it. Note that I’m using killall to stop it, because in some extreme cases the init-script won’t work.

For MySQL and Memcache the checks are a bit simpler:

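A minimal sketch for both services, assuming Debian-style pidfiles and init scripts and the default ports; monit's built-in mysql and memcache protocol tests do the actual probing:

```
# Paths and ports are assumptions; adjust to your distribution.
check process mysqld with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program  = "/etc/init.d/mysql stop"
    if failed host 127.0.0.1 port 3306 protocol mysql then restart

check process memcached with pidfile /var/run/memcached.pid
    start program = "/etc/init.d/memcached start"
    stop program  = "/etc/init.d/memcached stop"
    if failed host 127.0.0.1 port 11211 protocol memcache then restart
```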
SSHD will rarely fail, but still needs checking:

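A sketch under the same assumptions (Debian-style paths, default port 22):

```
check process sshd with pidfile /var/run/sshd.pid
    start program = "/etc/init.d/ssh start"
    stop program  = "/etc/init.d/ssh stop"
    if failed port 22 protocol ssh then restart
```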
These are the configurations I use most often. For other services you can generally find templates in the official Configuration examples.

Check remote hosts and services

There are cases where an essential service is not fully under your control. Think of a corporate website or an external SMTP server. If your own services (or users) depend on those, you can still monitor them with a local monit instance:

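A sketch of such a check, assuming a reasonably recent monit 5.x release with HTTP content tests. The host names and the expected string "Welcome" are placeholders; pick a phrase that only appears when the page renders correctly:

```
# Hypothetical site: fail if the page is unreachable OR the expected
# text is missing (database error page, defacement, ...).
check host corporate_site with address www.example.com
    if failed port 80 protocol http
        request "/"
        content = "Welcome"
    then alert

# Hypothetical external mail relay.
check host external_smtp with address smtp.example.com
    if failed port 25 protocol smtp then alert
```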
This check will alert you not only if the site goes offline, but also if a database error renders the site dysfunctional or hackers deface it.

Check file contents or command output

Monitoring the content of files is another great monit feature. I use it to keep track of my RAID or even remote ZFS-systems.

This first rule looks into /proc/mdstat and will alert you if one of the drives is offline. You could achieve the same with mdadm directly, but I prefer to have all my monitoring in one system.

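A sketch of such a rule, relying on the fact that /proc/mdstat shows a healthy two-disk array as [UU] and marks a missing member with an underscore, for example [U_]. The service name is an assumption, and recent monit releases also offer this test under the name content:

```
# Alert when any array status contains "_" (a failed or missing member).
check file mdstat with path /proc/mdstat
    if match "\[.*_.*\]" then alert
```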
The rule below looks at the contents of /tmp/FreeNAS_move.txt. This file is created by a cron job that simply saves the output of a remote command. If the command reports a problem with the ZFS array, you will be alerted by monit.

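A sketch, assuming the cron job stores the output of a command such as zpool status and runs at least once per hour. The matched keywords are the pool states ZFS reports for trouble; the timestamp test catches a cron job that silently stopped:

```
# Only the file path is taken from the text; name, interval and
# keywords are assumptions.
check file zfs_status with path /tmp/FreeNAS_move.txt
    if timestamp > 2 hours then alert
    if match "DEGRADED|FAULTED|UNAVAIL" then alert
```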

Big brother: m/monit

No write-up on monit would be complete without mentioning m/monit. This handy, but commercial program can collect the output of multiple monit instances. If you have multiple servers to manage, it might be worth the investment.

To configure your local monit instance for m/monit, add this statement to monitrc:

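A minimal sketch, assuming m/monit runs on the made-up address 192.168.1.10 with its default port 8080 and the default monit:monit credentials (change these in production). The httpd part lets m/monit talk back to the agent; skip it if your monitrc already contains a set httpd statement and only add the allow line there:

```
# Send status and events to the m/monit collector (hypothetical address).
set mmonit http://monit:monit@192.168.1.10:8080/collector

# Allow the m/monit host to reach this agent's built-in web interface.
set httpd port 2812
    allow 192.168.1.10
```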

Conclusion and further resources

If you are looking to bring your server game to the next level, the keys are reliability and scalability. Monit can greatly help you with both.

If you want to dive deeper into the topic, I recommend the following sites:

Monit documentation

More configuration examples

Disclaimer: I’m not affiliated with the provider of m/monit in any way and don’t get a commission from them.