OpsView, bash and GNUplot

Update: The method I go on to describe is overcomplicated for the objective. Matt pointed out in the comments section a much easier, built-in OpsView method of doing this, so OpsView does have a way around its own limitation of not graphing host checks. Still, the below is a good exercise in bash / GNUplot scripting. Thanks, Matt!

I recently had the opportunity to set up and use a brilliant network monitoring and reporting tool called OpsView. I’ve blogged before about Nagios and Cacti; I usually use a combination of the two for my monitoring needs. However, OpsView offers an optimized Nagios core and very good reporting features, bundled with some other modules like MRTG.

On the surface the first attraction is the fact that almost everything can be controlled via the WebUI. Nagios can be a bit clunky with having to edit the template files and so on. OpsView makes it a bit easier to manage. I still had to dig into the CLI every so often, but it’s still a massive improvement.

I’m currently testing it out with their ready-made virtual appliance, so installation was an absolute breeze. Even setting up basic monitoring and reporting was very easy. The main thing to note here is that we wanted to monitor the latency of a particular HTTPS service, i.e. the time taken for the HTTPS server to answer a SYN packet. OpsView can in fact monitor this, as a “host check” command.

However, it turns out that only “monitors” can be graphed by OpsView; “host checks” cannot. That was a pity, because the host check was reporting exactly what we needed:

TCP OK - 0.007 second response time on port 443

So I needed a way of automatically graphing that time value (0.007). Here’s roughly how I did it, using a bash script and a GNUplot script. For those of you who are not familiar with it, GNUplot is the de facto graphing program on Linux. One other note: what ties all this together so well is the OpsView API, which is brilliant since it can be accessed over HTTP. (PS: there definitely are more elegant and simple ways of doing this; it’s just an exercise in scripting and extension.)

1. Retrieve the host check information via the OpsView API

I used a normal wget command for this, with some formatting taken from the OpsView API documentation:

wget --header="Content-Type: text/xml" --header="X-Username: admin" --header="X-password: initial" http://10.91.2.38/api/status/service

Note the inclusion of three headers:

  • X-Username, X-password: used for authentication
  • Content-Type: make sure to include this, otherwise the returned data is not formatted at all, which makes it difficult to parse

The retrieved data comes back in XML format. I then used awk to “cut out” the value that I needed:

awk -F " " '/BCWF/ {print $16}' service

Breaking down the above:

  • -F " " sets the field delimiter to whitespace
  • /BCWF/ matches every line containing the string “BCWF”
  • print $16 prints the 16th word of the matching line
  • service is the name of the file downloaded by wget
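As a quick sanity check of the field-splitting idea, here is a minimal sketch that runs the same kind of awk command on the host check output quoted earlier (written with a plain ASCII hyphen). The field number differs from the real API response, where the value happened to be the 16th word. The `extract_latency` helper is hypothetical, not part of the original script; it assumes the output always contains the text "TCP OK - <number> second" and matches on that instead of counting fields.

```shell
#!/bin/bash
# The host check output quoted in the post, with a plain ASCII hyphen.
line='TCP OK - 0.007 second response time on port 443'

# Same idea as the awk command above: split on whitespace, print one field.
# In this bare line the value is the 4th field.
echo "$line" | awk -F " " '{print $4}'

# Hypothetical helper: match the text around the number instead of
# relying on its position in the line.
extract_latency() {
    sed -n 's/.*TCP OK - \([0-9.][0-9.]*\) second.*/\1/p'
}

# Works even when extra words shift the field positions.
echo "status: $line (extra words here)" | extract_latency
```

Matching on the surrounding text is a little more forgiving if the layout of the response ever changes, at the cost of a slightly less readable command.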

So at a basic level we now have a way of recording the value of our host check. To tie everything together and include a timestamp, here’s the final, quick bash script:

# record the time in HH:MM:SS format and store it in variable 'a'
a=$(date +%T)

# wget command; the response is saved to a file called "service"
wget --header="Content-Type: text/xml" --header="X-Username: admin" --header="X-password: initial" http://10.91.2.38/api/status/service

# awk command; record the value in variable 'b'
awk -F " " '/BCWF/ {print $16}' service > value.txt
b=$(cat value.txt)

# store the results in CSV format in a text file
echo "$a,$b" >> /tmp/results.txt

# remove the temporary files to save disk space
rm serv*
rm value.txt

# graph the data
/tmp/test1.pg > /tmp/test.png
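One thing the script does not guard against is a failed download or an awk match that returns nothing, in which case a row with an empty value ends up in the CSV. A minimal sketch of a guard, assuming the value should consist only of digits and dots; the `append_sample` helper and the temporary demo file are hypothetical, not part of the original script:

```shell
#!/bin/bash
# Hypothetical helper: append "time,value" to a CSV file, but only when
# the value consists solely of digits and dots (e.g. 0.007).
append_sample() {
    local ts=$1 value=$2 out=$3
    case $value in
        ''|*[!0-9.]*)
            echo "skipping non-numeric value: '$value'" >&2
            return 1
            ;;
    esac
    echo "$ts,$value" >> "$out"
}

# Usage sketch with a throwaway file instead of /tmp/results.txt
demo=$(mktemp)
append_sample "$(date +%T)" "0.007" "$demo"
append_sample "$(date +%T)" "" "$demo" || true   # rejected, nothing appended
cat "$demo"
rm -f "$demo"
```

In the script above, this would stand in for the plain echo line, called as append_sample "$a" "$b" /tmp/results.txt.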

The actual results.txt looks something like:

12:07:27,0.007
12:07:32,0.007
12:10:02,0.007
12:15:01,0.007
12:20:01,0.010
12:25:01,0.007
12:30:01,0.006
12:35:01,0.006
12:40:01,0.007
12:45:02,0.007
12:50:01,0.007
12:55:01,0.007
13:00:01,0.006
13:05:01,0.014

On the left is the time; on the right is the latency of the service, in seconds.
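Before graphing, the CSV can be sanity-checked straight from the shell. The following is a minimal sketch, assuming the two-column time,latency layout shown above; `latency_stats` is a hypothetical helper name, and the sample file is made up for the demonstration.

```shell
#!/bin/bash
# Hypothetical helper: print min, max and average of the second CSV column.
latency_stats() {
    awk -F, '
        NR == 1 || $2 + 0 < min { min = $2 + 0 }   # track the smallest value
        NR == 1 || $2 + 0 > max { max = $2 + 0 }   # track the largest value
        { sum += $2 }                              # running total for the average
        END { if (NR) printf "min=%.3f max=%.3f avg=%.3f\n", min, max, sum / NR }
    ' "$1"
}

# Usage sketch with a small made-up sample file
sample=$(mktemp)
printf '%s\n' '12:00:00,0.006' '12:05:00,0.010' '12:10:00,0.014' > "$sample"
latency_stats "$sample"
rm -f "$sample"
```

Pointing it at /tmp/results.txt gives a quick summary of the recorded values without opening the graph.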

2. Graph the data

Here I needed a separate GNUplot script, which I called test1.pg:

#!/usr/bin/gnuplot
reset
set terminal png
set datafile separator ","

set xdata time
set timefmt "%H:%M:%S"
set format x "%H:%M"
set xlabel "time"

set ylabel "total latency"

set title "BCWF Latency Time"
set key reverse Left outside
set grid

set style data linespoints

plot "/tmp/results.txt" using 1:2 title "latency"

Important: note the shebang on the first line, pointing to the GNUplot interpreter! Because the bash script runs /tmp/test1.pg directly, the file also needs to be executable.

  • set terminal png : the output should be a PNG-formatted image
  • set datafile separator "," : allows us to use the CSV file I created (results.txt)
  • set xdata time, set timefmt "%H:%M:%S", set format x "%H:%M", set xlabel "time" : define the format of the X axis (time values read as HH:MM:SS, tick labels shown as HH:MM)
  • set ylabel "total latency" : set the label of the Y axis
  • set style data linespoints : draw the data as a line with a point at each sample
  • plot "/tmp/results.txt" using 1:2 title "latency" : /tmp/results.txt is the input data file to be graphed, and “using 1:2” means the first element (before the comma) is the X value and the second element (after the comma) is the Y value; in other words, plot time vs latency
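As an alternative to the shebang plus shell redirection, gnuplot can write the image file itself via `set output`. Below is a minimal sketch that generates such a script from the shell, so the input and output paths can be passed in as arguments. The `write_plot_script` helper is hypothetical; the settings are the same ones explained above, minus the title and key lines.

```shell
#!/bin/bash
# Hypothetical helper: print a gnuplot script that reads CSV data from $1
# and writes a PNG to $2 using "set output".
write_plot_script() {
    cat <<EOF
set terminal png
set output "$2"
set datafile separator ","
set xdata time
set timefmt "%H:%M:%S"
set format x "%H:%M"
set xlabel "time"
set ylabel "total latency"
set grid
set style data linespoints
plot "$1" using 1:2 title "latency"
EOF
}

# Print the generated script for the paths used in this post.
write_plot_script /tmp/results.txt /tmp/test.png
```

Assuming gnuplot is installed, the output can be saved to a file and run with gnuplot followed by the file name, or piped straight into gnuplot.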

The resulting picture:

[Image: test.png, the generated latency graph]

Not fancy, but automated and good enough! 🙂