Troubleshooting Centreon graphs

Symptom: Centreon stopped graphing performance data completely.

There are many reasons why this can happen, and a quick Google search turns up some very good articles. The ones I found useful were:

http://en.doc.centreon.com/Setup:Graphs#Perfdata_activation_in_Nagios

http://en.doc.centreon.com/Troubleshooting:Graphs

http://felipeferreira.net/?p=1019

Unfortunately, none of the points mentioned in the above articles brought back my graphs. During the course of troubleshooting, we noted the following points:

  • Graphs stopped at almost exactly 2am (suspicion immediately fell on some sort of scheduled / cron job, since they normally run between 2am and 4am)
  • The .RRD files in the default path where Centreon stores its metrics (/var/lib/centreon/metrics) had all stopped being updated at 2am
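A quick way to confirm the second point is to compare the modification time of each RRD file against the current time. The following is a minimal sketch, not part of Centreon: the function name `stale_rrd_files` and the 15 minute threshold are assumptions chosen for illustration, and only the metrics path comes from the setup described here.

```python
import time
from pathlib import Path


def stale_rrd_files(metrics_dir, max_age_seconds=900, now=None):
    """Return (path, age_in_seconds) for each .rrd file not modified recently.

    max_age_seconds defaults to 15 minutes, an arbitrary threshold
    picked for this sketch; adjust it to your check interval.
    """
    now = time.time() if now is None else now
    stale = []
    for path in sorted(Path(metrics_dir).glob("*.rrd")):
        age = now - path.stat().st_mtime
        if age > max_age_seconds:
            stale.append((path, age))
    return stale


if __name__ == "__main__":
    # Default metrics path mentioned in this post.
    for path, age in stale_rrd_files("/var/lib/centreon/metrics"):
        print(f"{path}: last updated {age / 60:.0f} minutes ago")
```

If every file shows roughly the same age, as it did for us, the problem is upstream of rrdtool rather than with individual services.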

However, rrdtool (/usr/bin/rrdtool), the program that creates those RRD files, was still responding properly.

For those lacking some background, the Centreon procedure for graphing metrics follows this flow:

[Figure: centreon_graphing]

Following the above flow, one thing we noticed in our case was that /usr/local/nagios/var/service-perfdata was quite big (150MB). This was unusual because centstorage normally reads this file about every 5 minutes and, when doing so, empties service-perfdata into service-perfdata_read, so the former file should never grow very large.
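This check is easy to automate. The sketch below reports the size of the perfdata file and flags it when it exceeds a threshold. The function name `perfdata_backlog` and the 10 MB default are assumptions for illustration only; pick a limit that suits the volume of checks on your own system.

```python
import os


def perfdata_backlog(path, max_bytes=10 * 1024 * 1024):
    """Return (size_in_bytes, is_suspicious) for the perfdata file.

    A missing file is treated as an empty backlog. The 10 MB default
    is an arbitrary threshold chosen for this sketch.
    """
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return 0, False
    return size, size > max_bytes


if __name__ == "__main__":
    # Path as used on the system described in this post.
    size, suspicious = perfdata_backlog("/usr/local/nagios/var/service-perfdata")
    print(f"service-perfdata is {size / 1024 / 1024:.1f} MB, suspicious={suspicious}")
```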

This pointed towards a centstorage issue. After checking the logs under /usr/local/centreon/log/, we noticed the following entries at exactly the time our problem started occurring:

Waiting for centstorage to exit .. done.
17/2/2012 02:00:13 – Begin centstorage.data_bin purge
17/2/2012 02:02:00 – Finishing centstorage.data_bin purge
Starting centstorage Collector : centstorage
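To correlate log entries with the time the graphs stopped, the timestamped lines can be filtered by a time window. This is a rough sketch that assumes the day/month/year format seen in the excerpt above and accepts either a hyphen or an en dash as the separator; `events_near` is a hypothetical helper, not a Centreon tool.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "17/2/2012 02:00:13 - Begin centstorage.data_bin purge".
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{4} \d{2}:\d{2}:\d{2})\s+[-\u2013]\s+(.*)$"
)


def events_near(lines, when, window=timedelta(minutes=5)):
    """Return (timestamp, message) for log lines within `window` of `when`.

    Lines without a leading timestamp are skipped.
    """
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%d/%m/%Y %H:%M:%S")
        if abs(stamp - when) <= window:
            hits.append((stamp, match.group(2)))
    return hits
```

Calling `events_near(open(logfile), datetime(2012, 2, 17, 2, 0))` on the relevant log file would return the two purge entries shown above.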

Checking the script that logs the above (centreonPurge.sh), it appears that it creates a backup file of the performance data (service-perfdata.bckp). In turn, it seems that centstorage will not read the service perfdata while the .bckp file is present.

So the solution was to rerun centreonPurge.sh and delete the file /usr/local/nagios/var/service-perfdata.bckp. Once this file was deleted, Centreon again started processing its graph data.
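The cleanup step can be scripted so that a leftover backup file is detected before it causes another outage. The sketch below only handles the file itself, it does not rerun the purge script, and it assumes the purge is not running at the time. The helper name `remove_stale_backup` is hypothetical, and it defaults to a dry run so that nothing is deleted by accident.

```python
from pathlib import Path


def remove_stale_backup(perfdata_path, dry_run=True):
    """Look for '<perfdata_path>.bckp' and optionally delete it.

    Returns the backup Path if it was found, otherwise None.
    With dry_run=True (the default) the file is only reported.
    """
    backup = Path(str(perfdata_path) + ".bckp")
    if not backup.exists():
        return None
    if not dry_run:
        backup.unlink()
    return backup


if __name__ == "__main__":
    # Path as used on the system described in this post.
    found = remove_stale_backup("/usr/local/nagios/var/service-perfdata")
    if found:
        print(f"Leftover backup found: {found} (rerun with dry_run=False to delete)")
```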

One thought on “Troubleshooting Centreon graphs”

  1. Your diagram is very enlightening but I wonder if you could expand this picture to show the case for the distributed Nagios poller configuration. That is, in the upper left corner the Nagios that runs the service check is a remote poller, while Centreon resides on a separate host computer.

    In this case the data flow would involve the transfer of data via the ndomod=>ndo2db API interface, and then subsequently be inserted into the Centstatus SQL DB table named servicechecks. What I have not been able to trace is how the data gets out of this DB table and back into the application processing stream after ndo2db has inserted it there.

    Does the central Nagios server then input data from the DB table, or is there some Centreon component that does this? Is there a cron job or other daemon that processes the Centstatus historical tables? If you have any insights to share I would really appreciate it!

    Fabulous web site!
