We currently run a fairly large Nagios + Centreon setup with approximately 1500 checks, distributed over two pollers. This puts a heavy load on both servers, with top regularly reporting load averages of 10-13. This in turn causes high service check latency (approximately 5 seconds).
The first improvement would be to use the external command file and related solutions (such as NSCA and other passive check solutions) so that the end server runs the check itself and just sends the result to the Centreon / Nagios server, which only has to record it. This way the processing is offloaded to the other servers.
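To make the passive check idea more concrete, here is a minimal sketch of how a result can be written into the Nagios external command file on the monitoring server. The line format is the standard PROCESS_SERVICE_CHECK_RESULT external command; the default command file path is an assumption based on the /usr/local/nagios prefix used later in this article, and the host and service names are hypothetical.

```shell
#!/bin/sh
# Sketch: submit a passive service check result through the Nagios
# external command file. The default path is an assumption, adjust it
# to match the command_file setting in your nagios.cfg.

submit_passive_result() {
    # $1 host name, $2 service description, $3 return code (0-3),
    # $4 plugin output, $5 command file (optional)
    cmd_file="${5:-/usr/local/nagios/var/rw/nagios.cmd}"
    now=$(date +%s)
    # Format: [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$now" "$1" "$2" "$3" "$4" >> "$cmd_file"
}

# Example call (hypothetical host and service names):
# submit_passive_result web01 "Disk Usage" 0 "DISK OK - 42% used"
```

In practice the result is produced on the remote server and carried to the Nagios host by a transport such as NSCA, which then writes a line like this on its behalf.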
While looking for further solutions to the performance issue, I ran across mod_gearman. A very good description of what exactly mod_gearman does is provided on their website here:
In a nutshell, the expected benefits of implementing it are reduced service latency and reduced load on the server.
I set out to implement this module in our current Centreon setup, and this article describes the steps taken to implement the solution on Nagios + Centreon on CentOS 5.6. The commands used are heavily based on the official quickstart guide, with some corrections to the init.d scripts, some additional steps for clarity, and a different installation path to avoid some failures later on. The official quickstart guide is located here:
1. First, install the EPEL repository on CentOS
2. Secondly, install the pre-requisites for building gearman and mod_gearman from source:
#> yum install libevent-devel libgearman autoconf automake libtool boost141-devel boost141-program-options libtool-ltdl-devel.x86_64 ncurses-devel.x86_64
Please note that the above assumes a 64-bit system (.x86_64). If your system uses a 32-bit architecture, please remember to adjust the above.
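If you want to script this step, the architecture suffix can be derived from `uname -m`. This is only a sketch: the helper name is made up for illustration, and mapping every i?86 machine to `.i386` is an assumption that you should verify against the package names your repositories actually offer.

```shell
#!/bin/sh
# Sketch: pick a yum package architecture suffix.
# pkg_arch_suffix is a hypothetical helper, not part of any package.

pkg_arch_suffix() {
    # $1: machine type (optional, defaults to the output of uname -m)
    case "${1:-$(uname -m)}" in
        x86_64) echo ".x86_64" ;;
        i?86)   echo ".i386" ;;   # assumption: 32-bit packages are named .i386
        *)      echo "" ;;        # unknown: let yum choose
    esac
}

# Example:
# yum install "libtool-ltdl-devel$(pkg_arch_suffix)" "ncurses-devel$(pkg_arch_suffix)"
```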
3. We now install the first of two packages, “gearman”. In this case we build from source, and leave the default prefix install path of /usr/local:
#> cd /tmp
#> wget "http://launchpad.net/gearmand/trunk/0.14/+download/gearmand-0.14.tar.gz"
#> tar zxf gearmand-0.14.tar.gz
#> cd gearmand-0.14
#> ./configure
#> make
#> make install
4. Next we move to installing the second package “mod_gearman” which hooks in directly to the nagios core:
#> cd /tmp
#> wget "http://labs.consol.de/wp-content/uploads/2010/09/mod_gearman-1.2.0.tar.gz"
#> tar zxf mod_gearman-1.2.0.tar.gz
#> cd mod_gearman-1.2.0
#> ./configure --with-user=nagios --with-init-dir=/etc/init.d
#> make install
#> make install-config
#> cp ./extras/gearmand-init /etc/init.d/gearmand
5. As the quickstart guide states, make sure the nagios user has a valid login shell:
#> chsh nagios
6. Modify the default init.d script so that it starts successfully:
#> vi /etc/init.d/gearmand
Change the line:
CMD="$DAEMON -p $PORT -P $PIDFILE $OPTIONS --log-file=$LOGFILE --verbose=2 --listen=$LISTEN"
to:
CMD="$DAEMON -p $PORT -P $PIDFILE $OPTIONS --log-file=$LOGFILE --verbose --listen=$LISTEN"
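The same change can be made without opening an editor. The sketch below assumes GNU sed (as shipped with CentOS) and that the init script spells its options with plain double hyphens; the function name is hypothetical. It keeps a backup copy next to the script before touching it.

```shell
#!/bin/sh
# Sketch: replace --verbose=2 with --verbose in the gearmand init script.
# fix_gearmand_verbose is a hypothetical helper name.

fix_gearmand_verbose() {
    # $1: path to the init script (optional)
    script="${1:-/etc/init.d/gearmand}"
    # Keep a backup so the original can be restored if needed
    cp -p "$script" "$script.bak" || return 1
    # GNU sed in-place edit
    sed -i 's/--verbose=2/--verbose/' "$script"
}

# Example:
# fix_gearmand_verbose /etc/init.d/gearmand
```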
7. Now we start gearmand:
#> /etc/init.d/gearmand start
8. Create the mod_gearman logfile with the appropriate permissions, else the mod_gearman init script will fail:
#> touch /usr/local/var/log/mod_gearman/mod_gearman_worker.log
#> chmod 666 /usr/local/var/log/mod_gearman/mod_gearman_worker.log
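If the log directory does not exist yet, the touch above will fail, so a slightly more defensive version of this step might look like the sketch below. The function name is hypothetical. With no owner given it reproduces the chmod 666 used above; passing an owner (for example nagios, assuming that is the user the worker runs as) is a tighter alternative that I have not tested on this setup.

```shell
#!/bin/sh
# Sketch: create the mod_gearman worker log file and its directory.
# prepare_worker_log is a hypothetical helper name.

prepare_worker_log() {
    # $1: log file path (optional), $2: owner (optional)
    log_file="${1:-/usr/local/var/log/mod_gearman/mod_gearman_worker.log}"
    owner="${2:-}"
    mkdir -p "$(dirname "$log_file")" || return 1
    touch "$log_file" || return 1
    if [ -n "$owner" ]; then
        # Tighter alternative: only the owner can write
        chown "$owner" "$log_file" && chmod 644 "$log_file"
    else
        # Same permissions as in the step above
        chmod 666 "$log_file"
    fi
}

# Example:
# prepare_worker_log
# prepare_worker_log /usr/local/var/log/mod_gearman/mod_gearman_worker.log nagios
```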
9. Start the mod_gearman_worker:
#> /etc/init.d/mod_gearman_worker start
10. Finally, we come to integrating the new module with Nagios. With Centreon this is very simple.
Using the centreon menus navigate to:
configuration > nagios > nagios.cfg > Data tab > + add a new broker module
In the dialog box enter the following:
Then hit save and restart Nagios.
Make sure that the module initialised successfully by using the below grep command. This example shows a successful startup:
#> grep -i gearman /usr/local/nagios/var/nagios.log
 mod_gearman: initialized version 1.2.0 (libgearman 0.14)
 Event broker module '/usr/local/lib/mod_gearman/mod_gearman.o' initialized successfully.
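For a scripted check (for example after an automated restart), the same grep can be turned into a yes/no test. This is a sketch only: the function name is hypothetical, and the pattern assumes the log message has the wording shown in the example output above.

```shell
#!/bin/sh
# Sketch: succeed only if nagios.log reports that the mod_gearman
# broker module initialized. gearman_module_loaded is a hypothetical name.

gearman_module_loaded() {
    # $1: path to nagios.log (optional)
    log="${1:-/usr/local/nagios/var/nagios.log}"
    grep -qi "mod_gearman.*initialized successfully" "$log"
}

# Example:
# if gearman_module_loaded; then
#     echo "mod_gearman is loaded"
# else
#     echo "mod_gearman did not initialize, check nagios.log" >&2
# fi
```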
Also, you can use the built-in gearman_top command to make sure jobs are running successfully.
Once implemented, we didn’t see any particular decrease in load (maybe it’s a bit too early to tell; I will update this post with more info if it does change). However, the best improvement came in the service latency times. From 5 seconds this was slashed to under 2 seconds.
Coupled with the above, you can also use mod_gearman as an NSCA replacement, so we will definitely be keeping this solution around.
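If you want to compare latency before and after on your own setup, the nagiostats utility that ships with Nagios can print the average active service check latency in milliseconds through its MRTG output mode. The sketch below wraps that call and converts the value to seconds. The nagiostats path is an assumption based on the /usr/local/nagios prefix, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: report average active service check latency in seconds.
# Assumption: nagiostats lives under the /usr/local/nagios prefix.
NAGIOSTATS="${NAGIOSTATS:-/usr/local/nagios/bin/nagiostats}"

ms_to_seconds() {
    # $1: value in milliseconds, printed as seconds with 3 decimals
    awk -v ms="$1" 'BEGIN { printf "%.3f\n", ms / 1000 }'
}

avg_service_latency() {
    # AVGACTSVCLAT is the average active service check latency (ms)
    ms=$("$NAGIOSTATS" --mrtg --data=AVGACTSVCLAT) || return 1
    ms_to_seconds "$ms"
}

# Example:
# avg_service_latency
```

Running it a few times before and after enabling the module gives a rough comparison that does not depend on the web interface.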