First steps with Nagios
In a previous post I described the installation of Nagios on a Linux machine. In this post I’ll note the steps I took to monitor a couple of services.
First of all, I must point out that the official Nagios site has some excellent documentation to get you started. Secondly, even though in one of my previous posts I gave an excellent link on how to install Nagios on Ubuntu, that guide produces a very “vanilla” install, which lacked two features I needed:
- The ability to monitor https URLs
- The ability to use SNMP to monitor units
To get these two working, you need to download and install several packages. For HTTPS, you need both openssl and the OpenSSL development libraries (openssl-dev; on Ubuntu the package is called libssl-dev). For SNMP, on Ubuntu you need both the snmpd and the snmp packages, since these include the net-snmp libraries which are needed.
Now that the dependencies have been sorted, we need to recompile the Nagios plugins in order to support HTTPS and SNMP. So we navigate to the folder where the plugins were downloaded to, in my case /var/nagios/nagios-plugins-1.4.14, and we reconfigure the plugins with an extra switch pointing to the openssl files like so:
./configure --with-nagios-user=nagios --with-nagios-group=nagios --with-openssl=/usr/bin/openssl
Let the configuration run its course, and at the very end of the long output you should see a summary similar to:
config.status: creating po/Makefile
--with-ping-command: /bin/ping -n -U -w %d -c %d %s
Note that https (openssl) is now enabled. SNMP unfortunately doesn’t give you such a clear confirmation; I usually just search for ‘check_snmp’ and see if there is a compiled file present. Once configuration has been done, run the usual make and make install to finalise the compilation.
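The "is there a compiled file present" check can be scripted. This is only a minimal sketch: `plugin_built` and `report_plugins` are hypothetical helper names, and the directory in the usage comment is an assumption based on the folder mentioned above, so adjust it to wherever your plugins were actually built.

```shell
#!/bin/sh
# plugin_built DIR NAME: succeed if DIR/NAME exists and is executable.
# Hypothetical helper, not part of Nagios.
plugin_built() {
    [ -x "$1/$2" ]
}

# report_plugins DIR: print one line for each of the two plugins we care about.
report_plugins() {
    for name in check_http check_snmp; do
        if plugin_built "$1" "$name"; then
            echo "$name: built"
        else
            echo "$name: missing"
        fi
    done
}

# Example (assumed path, adjust to where your plugins were compiled):
# report_plugins /var/nagios/nagios-plugins-1.4.14/plugins
```

A "missing" line for check_snmp usually means the net-snmp packages were not found when configure ran.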
Now that we have SSL and SNMP support compiled in, we go on to actually testing the plugins. Again, we navigate to the Nagios plugins directory. Run the command ls check_*. The command lists all the enabled plugins; you should see “check_snmp” and “check_http” listed among them. First we’ll check the HTTPS URLs. Try running the command:
./check_http -S -H www.verisign.com
This will enable SSL mode (-S) and check the host www.verisign.com (the -H switch takes a host name, not a full URL). Hopefully it will return an HTTP OK similar to:
HTTP OK: HTTP/1.1 200 OK - 40849 bytes in 2.015 second response time |time=2.014922s;;;0.000000 size=40849B;;;0
which means the site responded successfully to the probe. Second, we’ll check the SNMP module. The Nagios documentation gives us a very good example of returning the uptime value for an SNMP node by running the command:
./check_snmp -C public -o sysUpTime.0 -H 192.168.168.168
The above will check the system uptime of host 192.168.168.168 using the SNMP read community string “public”. It should return the uptime similar to:
SNMP OK - Timeticks: (176656773) 20 days, 10:42:47.73 |
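Both plugins print the same kind of line: a human-readable status, then a "|" followed by performance data. Plugins also report their state through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below splits the sample check_http line from above and maps exit codes to state names; `state_name` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Sample line based on the check_http run shown earlier in this post.
output='HTTP OK: HTTP/1.1 200 OK - 40849 bytes in 2.015 second response time |time=2.014922s;;;0.000000 size=40849B;;;0'

# Everything before the first "|" is the human-readable status text.
status_text=${output%%"|"*}
# Everything after it is performance data (space-separated label=value items).
perfdata=${output#*"|"}

# state_name CODE: translate a plugin exit code into its Nagios state.
# Anything outside 0-2 is treated as UNKNOWN (hypothetical helper).
state_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

echo "status: $status_text"
# Print just label=value, dropping the warn/crit/min/max fields after ";".
for pair in $perfdata; do
    echo "perf:   ${pair%%;*}"
done
```

When running a plugin by hand, `echo $?` straight afterwards shows the exit code that Nagios itself would act on.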
At this point we’ve confirmed the plugins themselves work. We now need to configure the hosts on which we want to run the tests. Nagios has a series of configuration files (e.g. nagios.cfg) which define which services to monitor on which hosts. All the configuration files follow the same syntax; they are split into multiple files only for modularity. The main .cfg file is nagios.cfg, which by default is located in /usr/local/nagios/etc/. This file contains a list of other .cfg files which Nagios will read in order to compile its configuration:
# OBJECT CONFIGURATION FILE(S)
# These are the object configuration files in which you define hosts,
# host groups, contacts, contact groups, services, etc.
# You can split your object definitions across several config files
# if you wish (as shown below), or keep them all in a single config file.
# You can specify individual object config files as shown below:
Note the list of “cfg_file” entries. Each of the .cfg files listed contains a piece of the Nagios configuration. As an example, let’s again first focus on defining an HTTPS site to monitor. I first edited the command.cfg file in the objects folder, which contains “macros”. By macros I simply mean giving a command a name. For example, in order to monitor an HTTPS site I added:
# check https command definition
define command{
    command_name check_https
    command_line $USER1$/check_http -S -H $ARG1$ -p 443
    }
The above defines a command named “check_https” and gives the command line to run whenever we use “check_https”. Using the same syntax as I used when testing the HTTPS plugin by hand, the command_line entry activates SSL mode (-S) on port 443 (-p 443) and checks the host that will be passed in a variable called $ARG1$. Now Nagios knows what to do, but it doesn’t know where to apply this. So we next define the host. Before we do so, have a quick look in the objects folder and open templates.cfg. This file contains several entries, each enclosed in a define block such as define host{ ... } or define service{ ... }.
Each one of these is not a host per se, but allows other configuration entries to reuse its attributes. So instead of typing the same attributes for each host, we use a template, and the attributes listed in the template are inherited by the actual host. For example, I would like to apply all the attributes in the “linux-server” template to my monitored host. A quick scan shows this template in turn uses another template, “generic-host”, as evidenced by the use directive. The host will be pinged, as shown by the check_command check-host-alive (this check_command references a definition from command.cfg). Also note, a bit further down, the template for generic-service, which I will also be using. Note that the checks on the services are run every 10 minutes (normal_check_interval).
Now that we have those in mind, we go on to the hosts.cfg file and add a new host entry like so:
define host{
    use linux-server ; Inherit default values from a template
    host_name Aventail ; The name we’re giving to this host
    alias Aventail_Case_01157550 ; A longer name associated with the host
    address 192.168.100.2 ; IP address of the host
    }
Here we simply gave a name to an IP address and asked Nagios to apply the linux-server template to it. So now, for all intents and purposes, 192.168.100.2 = Aventail. Next we define which services we want to check on this host. Within the same hosts.cfg we define a new service:
use generic-service ; Inherit default values from a template
host_name Aventail ; Apply this service to this host
check_command check_https!192.168.100.2 ; the actual command to run
We instructed Nagios to run check_https (as we defined it in command.cfg) against “Aventail”. The !192.168.100.2 passes the IP address to $ARG1$ in check_https.
Note: be aware there are more elegant ways of doing this, by using built-in variables such as $HOSTADDRESS$, which automatically expands to the IP address of the host as defined in hosts.cfg.
Nagios seems to me like a large daisy chain of configuration files. Since this can be really confusing, we can verify the Nagios configuration as described in the official Nagios documentation by running:
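To make the "!" argument passing concrete, here is a rough imitation of the substitution in plain shell. It is only a sketch for illustration: `replace_first` and `expand_check_command` are hypothetical helper names, the real expansion happens inside Nagios, and /usr/local/nagios/libexec is assumed as the value of $USER1$.

```shell
#!/bin/sh
# Rough imitation of Nagios macro substitution, for illustration only.

# replace_first STRING PATTERN REPLACEMENT: swap the first literal match.
replace_first() {
    case $1 in
        *"$2"*)
            before=${1%%"$2"*}
            after=${1#*"$2"}
            printf '%s\n' "$before$3$after"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

# expand_check_command COMMAND_LINE CHECK_COMMAND USER1_DIR
# Fills $USER1$ and then $ARG1$, $ARG2$, ... from the "!"-separated values.
expand_check_command() {
    line=$(replace_first "$1" '$USER1$' "$3")
    rest=$2
    # Drop the command name; what remains are the "!"-separated arguments.
    case $rest in
        *!*) rest=${rest#*!} ;;
        *)   rest= ;;
    esac
    i=1
    while [ -n "$rest" ]; do
        arg=${rest%%!*}
        line=$(replace_first "$line" "\$ARG$i\$" "$arg")
        case $rest in
            *!*) rest=${rest#*!} ;;
            *)   rest= ;;
        esac
        i=$((i + 1))
    done
    printf '%s\n' "$line"
}

# The check_https definition combined with the service's check_command value.
expand_check_command '$USER1$/check_http -S -H $ARG1$ -p 443' \
    'check_https!192.168.100.2' /usr/local/nagios/libexec
```

With the values from this post, the final call prints `/usr/local/nagios/libexec/check_http -S -H 192.168.100.2 -p 443`, which is the same command we ran by hand earlier, just with an explicit port.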
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
This will verify the configuration and output any problems it encounters. If everything checks out OK, then just restart the Nagios server with /etc/init.d/nagios restart and you’re off 🙂 There are more details on monitoring public-facing services in the official Nagios documentation.
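The verify and restart steps can be chained so that a broken configuration never reaches the running server. A minimal sketch, assuming the verification command exits non-zero when it finds errors; `verify_then_restart` is a hypothetical helper name, and the paths in the usage comment are the ones used in this post.

```shell
#!/bin/sh
# verify_then_restart NAGIOS_BIN CONFIG RESTART_CMD...
# Hypothetical helper: run the restart command only if "-v" succeeds.
verify_then_restart() {
    nagios_bin=$1
    config=$2
    shift 2
    if "$nagios_bin" -v "$config"; then
        # Verification passed: run whatever restart command was given.
        "$@"
    else
        echo "Configuration check failed, not restarting" >&2
        return 1
    fi
}

# Example with the paths used in this post (run as root):
# verify_then_restart /usr/local/nagios/bin/nagios \
#     /usr/local/nagios/etc/nagios.cfg /etc/init.d/nagios restart
```

Taking the binary and the restart command as arguments keeps the helper easy to try out with stand-in commands before pointing it at a live server.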
Monitoring a unit over SNMP follows the same three steps:
1. Define the command (what to execute)
2. Define the host (where to execute)
3. Define the service (when to execute)
By extension, the above three steps mean checking the appropriate templates.
Step 1: Define the command:
– Open the command.cfg file and create the new command:
# ‘check_snmp’ command definition
define command{
    command_name check_snmp
    command_line $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
    }
Step 2: Define the host:
– Open the switch.cfg file (in this case the host is a switch):
define host{
    use generic-switch ; Inherit default values from a template
    host_name UK-TL-Gateway ; The name we’re giving to this switch
    alias NSA Gateway ; A longer name associated with the switch
    address 10.1.1.254 ; IP address of the switch
    }
Note the host uses the generic-switch template.
Step 3: Define the service:
– Within the switch.cfg file, define the service that runs on the host:
use generic-service ; Inherit values from a template
check_command check_snmp!-C public -o sysUpTime.0
Note that the actual check_command parameters depend on the Nagios plug-in being used. For example, the check_snmp command parameters are defined in more detail here