Update: The method I go on to describe is overcomplicated for the objective. Matt, in the comments section, pointed out a much easier, OpsView-native method of doing this. Still, the below is a good exercise in bash / gnuplot scripting. So, OpsView has a way around its own limitation of not graphing host checks. Thanks Matt!
I recently had the opportunity to set up and use a brilliant network monitoring and reporting tool called OpsView. I’ve blogged before about Nagios and Cacti; I usually use a combination of the two for my monitoring needs. However, OpsView offers an optimized Nagios core and very good reporting features, bundled with some other modules like MRTG.
On the surface the first attraction is the fact that almost everything can be controlled via the WebUI. Nagios can be a bit clunky with having to edit the template files and so on. OpsView makes it a bit easier to manage. I still had to dig into the CLI every so often, but it’s still a massive improvement.
I’m currently testing it out with their ready-made virtual appliance, so installation was an absolute breeze. Even setting up basic monitoring and reporting was very easy. The main thing to note here is that we wanted to monitor the latency of a particular HTTPS service, i.e. the time taken for the HTTPS server to answer a SYN packet. OpsView can in fact monitor this, as a “host check” command.
However, it turns out that only “monitors” can be graphed by OpsView; “host checks” cannot. This was a pity, because the host check was reporting exactly what we needed:
TCP OK – 0.007 second response time on port 443
So I needed a way of automatically graphing that time value (0.007). Here’s roughly how I did it, using a bash script and a gnuplot script. For those of you who are not familiar with it, gnuplot is the de facto Linux program for graphing. One other note: what tied all this together so well is the OpsView API, which is brilliant since it can be accessed over HTTP. (PS: there definitely are more elegant and simple ways of doing this; it’s just an exercise in scripting and extension.)
1. Retrieve the host check information via the OpsView API
I used a normal wget command for this, and some formatting from the OpsView API documentation:
wget --header="Content-Type: text/xml" --header="X-Username: admin" --header="X-password: initial" http://10.91.2.38/api/status/service
Note the inclusion of three headers:
- X-Username, X-password: used for authentication
- Content-Type: make sure to include this else the returned data is not formatted at all which makes it difficult to parse
The retrieved data comes in XML format. I then used awk to “cut out” the value that I needed:
awk -F " " '/BCWF/ {print $16}' service
Breaking down the above:
- -F " " sets the delimiter to a space
- /BCWF/ looks for the line containing the string “BCWF”
- print $16, prints the 16th word in the line
- service is the filename downloaded from wget
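Counting fields is a bit brittle: if the XML layout shifts, $16 points at something else. Here is a minimal sketch of a pattern-based alternative, assuming the plugin output keeps the "N second response time" wording shown earlier. The function name extract_latency is hypothetical, and the sample line is typed in by hand instead of coming from the API.

```shell
#!/bin/bash
# extract_latency (hypothetical helper): reads text on stdin and prints the
# first number that is directly followed by " second response time".
extract_latency() {
    grep -oE '[0-9]+(\.[0-9]+)? second response time' | head -n 1 | cut -d' ' -f1
}

# Hand-written sample of the host check output, not real API data
sample='TCP OK - 0.007 second response time on port 443'
latency=$(printf '%s\n' "$sample" | extract_latency)
echo "$latency"
```

In the pipeline this would take the place of the print $16 step, for example: awk '/BCWF/' service | extract_latency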
So at a basic level we now have a way of recording the value of our host check. To tie everything together and include a timestamp, here’s the final, quick bash script:
#record the date in time format HH:MM:SS, and store in variable 'a'
a=`date +%T`
#wget command
wget --header="Content-Type: text/xml" --header="X-Username: admin" --header="X-password: initial" http://10.91.2.38/api/status/service
#AWK command, and record the value to a variable 'b'
awk -F " " '/BCWF/ {print $16}' service > value.txt
b=`cat value.txt`
#store the results in CSV format to a text file
echo $a,$b >>/tmp/results.txt
#remove the temporary files to save disk space
rm serv*
rm value.txt
#graph the data
/tmp/test1.pg > /tmp/test.png
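One weakness of the script above: if awk finds nothing (say the API call failed), a row with a time but no latency still ends up in results.txt. A rough sketch of a guard, assuming we only ever want plain decimal values in the file. The helper name append_sample is hypothetical, and the demo writes to a temporary file so it does not touch /tmp/results.txt.

```shell
#!/bin/bash
# append_sample VALUE CSV (hypothetical helper): appends "HH:MM:SS,VALUE" to
# CSV, but only when VALUE is non-empty and made up of digits and dots.
append_sample() {
    local value=$1 csv=$2
    case $value in
        ''|*[!0-9.]*)
            echo "skipping non-numeric value: '$value'" >&2
            return 1
            ;;
    esac
    printf '%s,%s\n' "$(date +%T)" "$value" >> "$csv"
}

# Demo against a throwaway file
csv=$(mktemp)
append_sample "0.007" "$csv"
append_sample "" "$csv" || true      # rejected, nothing is written
append_sample "error" "$csv" || true # rejected, nothing is written
cat "$csv"
```

In the script, the echo line would then become something like: append_sample "$b" /tmp/results.txt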
The actual results.txt looks something like:
12:07:27,0.007
12:07:32,0.007
12:10:02,0.007
12:15:01,0.007
12:20:01,0.010
12:25:01,0.007
12:30:01,0.006
12:35:01,0.006
12:40:01,0.007
12:45:02,0.007
12:50:01,0.007
12:55:01,0.007
13:00:01,0.006
13:05:01,0.014
The left column is the time; the right column is the latency of the service in seconds.
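Since the file is just time,latency pairs, a quick sanity check before graphing is easy. Below is a small sketch that prints the sample count, minimum, maximum and average; summarize_latency is a hypothetical helper name, and the three rows in the demo are copied from the listing above.

```shell
#!/bin/bash
# summarize_latency CSV (hypothetical helper): prints count, min, max and
# average of the second column, ignoring rows that are not "time,number".
summarize_latency() {
    awk -F',' '
        NF == 2 && $2 ~ /^[0-9]+(\.[0-9]+)?$/ {
            v = $2 + 0          # force numeric comparison
            n++
            sum += v
            if (n == 1 || v < min) min = v
            if (n == 1 || v > max) max = v
        }
        END {
            if (n > 0)
                printf "samples=%d min=%.3f max=%.3f avg=%.3f\n", n, min, max, sum / n
        }
    ' "$1"
}

# Demo with three rows taken from the results shown above
demo=$(mktemp)
printf '%s\n' '12:07:27,0.007' '12:20:01,0.010' '13:05:01,0.014' > "$demo"
summary=$(summarize_latency "$demo")
echo "$summary"
```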
2. Graph the data
Here I needed a separate gnuplot script, which I called test1.pg:
#!/usr/bin/gnuplot
reset
set terminal png
set datafile separator ","
set xdata time
set timefmt "%H:%M:%S"
set format x "%H:%M"
set xlabel "time"
set ylabel "total latency"
set title "BCWF Latency Time"
set key reverse Left outside
set grid
set style data linespoints
plot "/tmp/results.txt" using 1:2 title "latency"
Important: note the shebang on the first line pointing to the gnuplot interpreter! It is what allows the bash script to execute test1.pg directly (the file also needs to be executable).
- set terminal png : output should be a PNG-formatted file
- set datafile separator "," : allows us to use the CSV file I created (results.txt)
- set xdata time, set timefmt "%H:%M:%S", set format x "%H:%M", set xlabel "time" : define the format and label of the X axis
- set ylabel "total latency" : sets the label of the Y axis
- set style data linespoints : sets the style of the graph line (lines with point markers)
- plot "/tmp/results.txt" using 1:2 title "latency" : where /tmp/results.txt is the input data file to be graphed, and "using 1:2" means using the first element before the comma as the X value, and the second element after the comma as the Y value (in other words, plot time vs latency)
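My script relies on the shebang and on redirecting gnuplot's output into the PNG. A variation, sketched below, generates the gnuplot commands from bash and lets gnuplot write the file itself via set output. The helper name write_plot_script and the temporary paths are hypothetical; the gnuplot settings are a subset of the ones listed above. gnuplot is only invoked if it is installed and the data file exists.

```shell
#!/bin/bash
# write_plot_script DATA PNG SCRIPT (hypothetical helper): writes a gnuplot
# script to SCRIPT that plots the CSV file DATA into the image file PNG.
write_plot_script() {
    local data=$1 png=$2 script=$3
    cat > "$script" <<EOF
set terminal png
set output "$png"
set datafile separator ","
set xdata time
set timefmt "%H:%M:%S"
set format x "%H:%M"
set xlabel "time"
set ylabel "total latency"
set grid
set style data linespoints
plot "$data" using 1:2 title "latency"
EOF
}

workdir=$(mktemp -d)
write_plot_script /tmp/results.txt "$workdir/test.png" "$workdir/plot.gp"

# Only draw the graph when gnuplot is available and there is data to plot
if command -v gnuplot >/dev/null 2>&1 && [ -s /tmp/results.txt ]; then
    gnuplot "$workdir/plot.gp"
fi
```

The upside of set output is that a failed run does not leave a half-written file behind from the shell redirect, and the data and image paths become parameters instead of being hard-coded.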
The resulting picture:
Not fancy, but automated and good enough! 🙂 🙂
As you are running checks against the host anyway could you not just assume the host is up and have a service check that uses the same parameters – check_tcp -H $HOSTADDRESS$ -p 443 -w 9 -c 9 -t 15
You should then get Perfdata back with this and Opsview will generate a graph based around this?
Quite right, I figured somebody would point out how overcomplicated my solution was.
Thanks for the heads up Matt. Indeed, all I needed to do was define a custom service check (configuration > service checks) and add the syntax as you described. It returned the graph exactly as needed.
Cheers Matt! 🙂 I’ll update the article to show this (much easier) method
Hey Dvas – the article is good though and an interesting way to adapt the data in Opsview/Nagios in general.
In the latest OV you can export the CSV data from Opsview to create your own graphs in excel/other apps if you want to.
True we noticed that… but we wanted it automated for our lazy admins 🙂
That said now with the graphing by OpsView, it can create a link / embeddable object for the graph so everything is automated and easily accessible
Thanks for the comments Matt!
One other way of making it a bit more automated is to enable ODW and then use a MySQL ODBC connection to link to Excel; you can then query ODW for the relevant perfdata and use it in a graph as well. I do something similar to automate a systems check document for 40 servers on a site.
Nice… never checked ODW, but I definitely will 🙂