David Vassallo's Blog

If at first you don't succeed; call it version 1.0

AlienVault: Monitoring individual sensor Events Per Second [EPS]

In a distributed AlienVault environment, it is important to be able to monitor each individual sensor's output. In our case, the requirements were to:

  • Monitor each sensor’s generated events over a configurable interval
  • If the number of generated events of a sensor goes below a configured threshold, then notify the user via email

There are several sensor monitoring options built right into AlienVault, including monitoring /var/log/alienvault/agent/agent.log, which contains "EPS" information. However, in this case we use the database to calculate the number of generated events for each sensor. We are not interested in exactly which AlienVault plugin generated an event, just the global number of events a particular sensor has generated.

The custom script can be used without installing any prerequisites on the central SIEM server into which the sensors feed information. The script (EPS_Script.py) depends on the following two configuration files:

  • /etc/ossim/ossim_setup.conf: this file already exists in a default AlienVault installation and should not be changed. EPS_Script.py uses it only to look up the database settings
  • /etc/ossim/eps_monitor.conf: this file must be created and is used to store settings specific to EPS_Script.py. A sample eps_monitor.conf can be found below:
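
A minimal sketch of such a file, with assumed section and option names (not necessarily the script's exact keys), might look like this:

    [smtp]
    server = smtp.example.com
    from = eps-monitor@example.com
    to = soc-team@example.com

    [defaults]
    # interval (in minutes) over which events are counted
    interval = 10
    # alert if a sensor generated fewer events than this over the interval
    threshold = 100

    [thresholds]
    # optional per-sensor overrides, keyed by sensor name
    sensor-dmz = 50
    sensor-lan = 500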

The config file is simple. It contains the SMTP server details used to send emails, and default settings such as the interval over which to query the number of events and the default threshold (number of events). If the number of events generated is below the threshold, the user gets alerted. In the [thresholds] section a user can also define thresholds for individual sensors, enabling further flexibility.

The actual EPS_Script.py code works as follows:

The script starts off by parsing its configuration files. Next, it retrieves a list of configured sensors from the sensor table in the alienvault database. The IDs of these sensors are stored in binary form, so we convert them into hex. The main work of the script is performed in the for loop starting at line 45. Each sensor can have multiple "devices" bound to it. From what we can tell, these devices correspond roughly to each OSSEC agent installed on the network (Environment > Detection > HIDS > Agents).


Note: in light of the above, the script may only be monitoring events generated by OSSEC agents. Further testing is required to confirm or deny this.

For each sensor retrieved by the previous query, the script retrieves the list of associated "devices", or agents. In a distributed environment with multiple sensors, different agents/devices can point to different sensors, so this step is important to "map" agents to the appropriate sensors; otherwise the EPS count will be skewed.

The script then counts all entries made by devices mapped to a specific sensor in the specified time interval and, if the count is below the specified threshold, sends an email to the appropriate recipients.
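
A rough, condensed sketch of that flow is shown below. This is not the original EPS_Script.py: the table and column names (alienvault.sensor, alienvault_siem.device, alienvault_siem.acid_event), the ossim_setup.conf keys, and the eps_monitor.conf options are assumptions based on the description above.

    # eps_monitor sketch -- illustrative only, not the original EPS_Script.py
    import ConfigParser
    import smtplib
    from email.mime.text import MIMEText

    import MySQLdb

    # database credentials come from the standard AlienVault setup file
    ossim = ConfigParser.ConfigParser()
    ossim.read('/etc/ossim/ossim_setup.conf')
    cfg = ConfigParser.ConfigParser()
    cfg.read('/etc/ossim/eps_monitor.conf')

    db = MySQLdb.connect(host=ossim.get('database', 'db_ip'),
                         user=ossim.get('database', 'user'),
                         passwd=ossim.get('database', 'pass'))
    cur = db.cursor()

    interval = cfg.getint('defaults', 'interval')          # minutes
    default_threshold = cfg.getint('defaults', 'threshold')

    # sensor IDs are stored in binary form, so convert them to hex
    cur.execute("SELECT HEX(id), name FROM alienvault.sensor")
    for sensor_id, name in cur.fetchall():
        # map each "device" (roughly, an OSSEC agent) to its parent sensor
        cur.execute("SELECT id FROM alienvault_siem.device "
                    "WHERE HEX(sensor_id) = %s", (sensor_id,))
        device_ids = [row[0] for row in cur.fetchall()]
        if not device_ids:
            continue

        # count events generated by those devices over the last interval
        marks = ','.join(['%s'] * len(device_ids))
        cur.execute("SELECT COUNT(*) FROM alienvault_siem.acid_event "
                    "WHERE device_id IN (" + marks + ") "
                    "AND timestamp > NOW() - INTERVAL %s MINUTE",
                    device_ids + [interval])
        count = cur.fetchone()[0]

        # a per-sensor entry in [thresholds] overrides the default
        threshold = (cfg.getint('thresholds', name)
                     if cfg.has_option('thresholds', name)
                     else default_threshold)

        if count < threshold:
            body = ('Sensor %s generated %d events in the last %d minutes '
                    '(threshold: %d)' % (name, count, interval, threshold))
            msg = MIMEText(body)
            msg['Subject'] = 'EPS alert for sensor %s' % name
            msg['From'] = cfg.get('smtp', 'from')
            msg['To'] = cfg.get('smtp', 'to')
            smtp = smtplib.SMTP(cfg.get('smtp', 'server'))
            smtp.sendmail(msg['From'], [msg['To']], msg.as_string())
            smtp.quit()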


All that's left is to run the script periodically as a cron job. Normally the cron job interval should be the same as the interval specified in the EPS monitoring script's configuration file.
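
For example, assuming a 10-minute interval and that the script lives at /usr/local/bin/EPS_Script.py (an illustrative path), a root crontab entry could look like this:

    */10 * * * * /usr/bin/python /usr/local/bin/EPS_Script.py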

Bringing reliability to OSSEC

As we saw in a previous blog post, OSSEC is UDP-based. This is great for performance, and can scale to thousands of nodes. However, it means there is an inherent problem of reliability. UDP is a connection-less protocol, so the OSSEC agent has no guaranteed way of knowing that a particular event has been delivered to the OSSEC server. Instead, the architecture relies on heartbeats and keepalives. However, there is still a potential for lost events, no matter how short the interval between keepalives. In this article we explore a simple Python-based broker solution that introduces some (but not complete) reliability into the OSSEC architecture, at the cost of performance.

The first requirement of the broker solution is that it absolutely does not touch any existing code from the current OSSEC solution. It must interfere as little as possible with the current solution, so that if there are any updates or changes in OSSEC the broker can either continue to work as normal, or at least be removed and allow OSSEC to work as originally intended. To achieve this, the broker is split into two components: a TCP server which is installed on the same machine as the OSSEC server, and a proxy-like client which is installed on the same machine as the OSSEC client.

The general idea is that the OSSEC client is configured to send its traffic to 127.0.0.1 rather than directly to the server. The broker client intercepts the UDP packets (which are kept encrypted and compressed, maintaining end-to-end security) and, before sending them on to the OSSEC server, checks via TCP (reliably) that the broker server is still reachable and that the ossec-remoted process is still alive. If the broker server responds, the broker client "releases" the packets and forwards them on to the original OSSEC server. If no answer is received from the broker server, the broker client assumes the server is down and buffers the original UDP packets into a queue. After a while, the OSSEC agent will realise the server is down and pause operations (other than keepalives). When the server comes back online, the broker client replays all the packets that have been buffered, so no events are lost. The general architecture is as follows:

 

[Diagram: Proposed OSSEC Broker architecture]

 

Starting from the client: a simplified sketch of the broker client logic is shown below, commented so one can follow along.
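
In this sketch, the server address, the broker TCP port, and the PING/OK handshake are illustrative assumptions; relaying the OSSEC server's UDP responses back to the agent is omitted for brevity:

    # ossec_broker_client.py -- simplified sketch of the broker client
    # described above; addresses, ports and the PING/OK handshake are
    # illustrative assumptions
    import socket
    from collections import deque

    OSSEC_SERVER = '192.168.1.100'   # address of the real OSSEC server (assumed)
    OSSEC_PORT = 1514                # default OSSEC server UDP port
    BROKER_PORT = 9999               # TCP port of the broker server (assumed)

    buffered = deque()               # events held while the server is unreachable

    def server_alive():
        # reliably (over TCP) check that the broker server is reachable
        # and that it reports ossec-remoted as running
        try:
            s = socket.create_connection((OSSEC_SERVER, BROKER_PORT), timeout=2)
            s.sendall(b'PING')
            alive = s.recv(16).startswith(b'OK')
            s.close()
            return alive
        except socket.error:
            return False

    # listen on localhost:1514, where the OSSEC agent has been pointed
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(('127.0.0.1', OSSEC_PORT))

    # upstream UDP socket towards the real OSSEC server
    upstream = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    while True:
        # the agent's packets stay encrypted and compressed end to end
        data, addr = listener.recvfrom(65535)
        buffered.append(data)
        if server_alive():
            # "release" the queue: replay any buffered packets, oldest
            # first, ending with the packet just received
            while buffered:
                upstream.sendto(buffered.popleft(), (OSSEC_SERVER, OSSEC_PORT))
        # otherwise keep buffering until the broker server answers again
        # (note: server-to-agent UDP responses are not relayed in this sketch)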

 

 

The server is significantly simpler; a sketch follows.
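
Again, this is a minimal sketch, assuming the same port and handshake as the client above, with a pgrep-based check (an assumption) to decide whether ossec-remoted is alive:

    # ossec_broker_server.py -- simplified sketch; the pgrep-based
    # liveness check for ossec-remoted is an assumption
    import os
    import socket
    import subprocess

    BROKER_PORT = 9999  # must match the port the broker client connects to

    def remoted_running():
        # exit status 0 from pgrep means an ossec-remoted process exists
        with open(os.devnull, 'w') as devnull:
            return subprocess.call(['pgrep', 'ossec-remoted'],
                                   stdout=devnull) == 0

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(('0.0.0.0', BROKER_PORT))
    srv.listen(5)

    while True:
        conn, addr = srv.accept()
        try:
            conn.recv(16)  # read the client's PING
            conn.sendall(b'OK' if remoted_running() else b'DOWN')
        finally:
            conn.close()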

Kicking the tires and testing

We use the same troubleshooting techniques we used in the previous blog post.

First we set up the server, which is quite straightforward. We just run the ossec_broker_server.py file, and of course ensure that the OSSEC process is actually running properly. Next, the client. We start off by starting the Python client on the Windows machine (assuming Python is installed), and pointing the OSSEC agent to 127.0.0.1:

[Screenshot: OSSEC agent configured to point to 127.0.0.1]

We immediately see some output on the ossec broker client, something like the following:

 

[Screenshot: ossec broker client console output]

 

We should also check the OSSEC agent logs to make sure it connected successfully to 127.0.0.1:

[Screenshot: OSSEC agent log showing a successful connection to 127.0.0.1]

So far so good… we have communication between the OSSEC agent and the OSSEC server, through the broker. Now, time to test a network interruption. If we simply stop the ossec broker server (simulating such an interruption), we should see the OSSEC agent fail to keep communicating with the OSSEC server:

 

[Screenshot: OSSEC agent log after the broker server is stopped]

Now, during this interruption (but before the agent keepalives force a lock on the event viewer, so within a minute in default installs…) we generate some events:

[Screenshot: test events generated during the interruption]

These events would normally be lost, because the agent has not yet had time to realise there is a disconnection. So we now turn the server back on, and check the OSSEC archive logs to see whether the above events were delivered anyway:

[Screenshot: OSSEC archive logs showing the buffered events were delivered]

Success! :) There are some improvements to be made, but the principle is sound, if one can look past the added overhead introduced to accommodate reliability.
