David Vassallo's Blog

If at first you don't succeed; call it version 1.0

Category Archives: Open Source

AlienVault: Monitoring individual sensor Events Per Second [EPS]

In a distributed AlienVault environment, it is important to be able to monitor each individual sensor’s output. In our case, the requirements were to:

  • Monitor each sensor’s generated events over a configurable interval
  • If the number of events generated by a sensor falls below a configured threshold, notify the user via email

There are several sensor monitoring options built right into AlienVault, including monitoring /var/log/alienvault/agent/agent.log, which contains “EPS” information. However, in this case we use the database to calculate the number of events generated by each sensor. We are not interested in exactly which AlienVault plugin generated an event, just the total number of events a particular sensor has generated.

The custom script runs on the central SIEM server that the sensors feed information into, and does not require installing any prerequisites. The script (EPS_Script.py) depends on the following two configuration files:

  • /etc/ossim/ossim_setup.conf: this file already exists in a default AlienVault installation and should not be changed. EPS_Script.py uses it only to look up the database settings
  • /etc/ossim/eps_monitor.conf: this file must be created and is used to store settings specific to EPS_Script.py. A sample eps_monitor.conf file can be found below:

The config file is simple. It contains the SMTP server details used to send emails, plus default settings such as the interval over which to count events and the default threshold (number of events). If the number of events generated is below the threshold, the user gets alerted. In the [thresholds] section a user can also define thresholds for individual sensors, which allows per-sensor tuning.
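As a rough illustration of the layout described above, the sketch below parses an example configuration with Python's configparser. Only the [thresholds] section name comes from the description; the [smtp] and [defaults] section names, every key name and every value are assumptions made for this example and may not match the real eps_monitor.conf.

```python
import configparser

# Hypothetical eps_monitor.conf contents. Only the [thresholds] section
# name is taken from the article; everything else is illustrative.
SAMPLE_CONF = """
[smtp]
server = smtp.example.com
port = 25
sender = alienvault@example.com
recipients = soc@example.com

[defaults]
interval_minutes = 60
threshold = 100

[thresholds]
sensor-dmz = 500
"""


def load_settings(text):
    """Return (interval, default_threshold, per_sensor_overrides)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    interval = parser.getint("defaults", "interval_minutes")
    default_threshold = parser.getint("defaults", "threshold")
    # configparser lowercases option names, so sensor names are
    # compared in lowercase below.
    overrides = {name: int(value) for name, value in parser.items("thresholds")}
    return interval, default_threshold, overrides


def threshold_for(sensor_name, default_threshold, overrides):
    """Use the per-sensor threshold if one is defined, else the default."""
    return overrides.get(sensor_name.lower(), default_threshold)


interval, default_threshold, overrides = load_settings(SAMPLE_CONF)
print(threshold_for("sensor-dmz", default_threshold, overrides))
print(threshold_for("sensor-lan", default_threshold, overrides))
```

A sensor listed under [thresholds] gets its own value, while any other sensor falls back to the default.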

The actual EPS_Script.py code is shown below:

The script starts off by parsing its configuration files. Next, the script retrieves a list of configured sensors from the sensor table in the alienvault database. The IDs of these sensors are stored in binary form, so we convert them to hex. The main work of the script is performed in the for loop starting at line 45. Each sensor can have multiple “devices” bound to it. From what we can tell, these devices correspond roughly to each OSSEC agent installed on the network (Environment > Detection > HIDS > Agents).

[Screenshot: Selection_137]

Note: In light of the above, the script may only be monitoring events generated by OSSEC agents. Further testing is required to confirm or rule this out.
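The binary-to-hex conversion of sensor IDs mentioned earlier can be sketched as follows. This is a minimal Python 3 illustration, not the code from EPS_Script.py; the function name and the sample ID bytes are made up for the example.

```python
def sensor_id_to_hex(raw_id: bytes) -> str:
    """Convert a binary ID, as returned by the database driver, to hex text."""
    return raw_id.hex().upper()


# Hypothetical 16-byte sensor ID, used only to demonstrate the conversion.
example_id = bytes.fromhex("11e3a1b2c3d4e5f6a7b8000c29abcdef")
print(sensor_id_to_hex(example_id))
```

The hex form is easier to log and to compare against values shown elsewhere, which is presumably why the script converts it.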

For each sensor retrieved from the previous query, the script retrieves the list of associated “devices” or agents. In a distributed environment with multiple sensors, different agents/devices can point to different sensors, so this step is needed to “map” agents to the appropriate sensors. Otherwise the EPS count will be skewed.

The script then counts all entries made by devices mapped to a specific sensor in the specified time interval and, if the count is below the specified threshold, sends an email to the appropriate recipients.

[Screenshot: Selection_136]
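The count-and-alert logic described above can be sketched roughly as shown here. To keep the example self-contained it uses an in-memory SQLite database as a stand-in for the AlienVault MySQL database. The table names (device, event), column names, sensor names and addresses are all hypothetical simplifications, not the real schema or the original script.

```python
import sqlite3
from email.message import EmailMessage

# In-memory stand-in database. Schema and data are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE device (id INTEGER PRIMARY KEY, sensor_id TEXT);
CREATE TABLE event (device_id INTEGER, ts INTEGER);
INSERT INTO device VALUES (1, 'SENSOR_A'), (2, 'SENSOR_A'), (3, 'SENSOR_B');
INSERT INTO event VALUES (1, 100), (2, 150), (2, 900), (3, 950);
""")


def count_events(conn, sensor_id, since_ts):
    """Count events from all devices mapped to one sensor since a timestamp."""
    row = conn.execute(
        "SELECT COUNT(*) FROM event e JOIN device d ON d.id = e.device_id "
        "WHERE d.sensor_id = ? AND e.ts >= ?",
        (sensor_id, since_ts),
    ).fetchone()
    return row[0]


def build_alert(sensor_id, count, threshold, sender, recipient):
    """Build (but do not send) the notification email."""
    msg = EmailMessage()
    msg["Subject"] = f"Low event count on sensor {sensor_id}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Sensor {sensor_id} generated {count} events; threshold is {threshold}."
    )
    return msg


def check_sensor(conn, sensor_id, since_ts, threshold):
    """Return an alert message if the sensor is below threshold, else None."""
    count = count_events(conn, sensor_id, since_ts)
    if count < threshold:
        return build_alert(sensor_id, count, threshold,
                           "alienvault@example.com", "soc@example.com")
    return None


alert = check_sensor(db, "SENSOR_A", since_ts=500, threshold=2)
print(alert["Subject"] if alert else "no alert")
```

In a real deployment the message would then be handed to smtplib, for example with `smtplib.SMTP(host).send_message(msg)`, using the SMTP details from the configuration file.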

All that’s left is to run the script periodically as a cron job. Normally the cron job interval should match the interval specified in the EPS monitoring script configuration file.

OSSEC event loss troubleshooting

There is a general consensus that OSSEC will lose events if the main OSSEC server goes offline for whatever reason ([1], [2]), whether the service is stopped, the network is disconnected, or anything in between. However, there doesn’t seem to be much information on when exactly event loss can occur, for how long, and how the OSSEC agent recovers. In this article we explore some troubleshooting steps taken in order to answer these questions.

Test Environment:

  • Windows Server 2012, installed as a VMware guest.

  • VMware ESX v5

  • AlienVault All-in-one (commercial, not OSSIM) acting as OSSEC server

The OSSEC agent was deployed to the Windows server using the AlienVault GUI, and the agent was confirmed to be active:

[Screenshot 1]

The OSSEC server was placed in debug logging mode (using the <logall> global directive [3]). Next, we tested whether application event logs were being sent from the agent to the server. On the client, which is the Windows Server 2012 VM, we do the following:

[Screenshot 2]

The above screenshot shows the “eventcreate” command being used to generate two application logs:

  • before disconnecting 1

  • before disconnecting 2
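Generating these marker events could also be scripted. The sketch below builds an eventcreate command line for each marker. The exact switches used in the original test are not recoverable from the screenshot, so the event type, event ID and source name here are assumptions chosen for illustration.

```python
import subprocess


def eventcreate_command(description, event_id=100, source="OSSEC-Test"):
    """Build the argument list for one eventcreate call.

    event_id and source are illustrative defaults, not values from the
    original test. eventcreate accepts IDs from 1 to 1000.
    """
    return [
        "eventcreate",
        "/T", "INFORMATION",   # event type
        "/ID", str(event_id),  # event ID
        "/L", "APPLICATION",   # target event log
        "/SO", source,         # event source name
        "/D", description,     # event description (our marker text)
    ]


def create_events(descriptions):
    """Run eventcreate once per marker. Only meaningful on Windows."""
    for text in descriptions:
        subprocess.run(eventcreate_command(text), check=True)


markers = ["before disconnecting 1", "before disconnecting 2"]
for marker in markers:
    print(" ".join(eventcreate_command(marker)))
```

Calling `create_events(markers)` on the Windows VM would write the events; the loop above only prints the commands so the sketch is safe to run anywhere.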

We then confirmed that these were being received by the OSSEC server as expected:

[Screenshot 3]

Of note in the screenshot above is that the “logall” directive of OSSEC logs to the /var/ossec/logs/archives/archives.log path.
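Checking the archive log for the marker events can be automated with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and the sample log lines are invented for the example and do not reproduce the real archives.log format.

```python
ARCHIVES_LOG = "/var/ossec/logs/archives/archives.log"


def find_markers(lines, markers):
    """Return a dict mapping each marker string to whether any line contains it."""
    found = {marker: False for marker in markers}
    for line in lines:
        for marker in markers:
            if marker in line:
                found[marker] = True
    return found


# Invented sample lines standing in for the real log contents.
sample_lines = [
    "... WinEvtLog: Application: INFORMATION(100): before disconnecting 1",
    "... WinEvtLog: Application: INFORMATION(100): before disconnecting 2",
]
markers = ["before disconnecting 1", "before disconnecting 2",
           "after disconnecting 1"]
print(find_markers(sample_lines, markers))

# On the OSSEC server one would read the real file instead, for example:
# with open(ARCHIVES_LOG, errors="replace") as fh:
#     print(find_markers(fh, markers))
```

Any marker reported as False did not reach the server, which is the signal used throughout the rest of this test.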

Now we disconnect the NIC from the VM as shown below:

[Screenshot 4]

This simulates a general network failure, as shown by the ping results below. Immediately afterwards, we generate a further three application logs (after disconnecting 1, after disconnecting 2, after disconnecting 3):

[Screenshot 5]

These events never make it through to the OSSEC server. However, if we wait for 240 seconds – the configured timeout value – we now see in the OSSEC agent logs:

[Screenshot 6]

So now we generate three more application events:

[Screenshot 8]

These are after 240 seconds 1, after 240 seconds 2 and after 240 seconds 3. For now, they still do not appear on the OSSEC server since the NIC is still disconnected.

We then re-enable the NIC and reconnect it, reversing the step shown earlier. We also generate some application events (after reconnect 1, after reconnect 2, after reconnect 3) and monitor the OSSEC server logs. After a few seconds we see the logs being sent to the server, correctly timestamped, with no intervention or restart on our end:

[Screenshot 7]

So in conclusion, caching does occur, but only once the 240 seconds have elapsed. Events generated during those 240 seconds will be lost. Note that this 240-second timeout is configurable on the OSSEC agent via the <time-reconnect> directive, shown below:

[Screenshot 9]
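The behaviour observed in this test can be summarised as a simple model: events generated before the reconnect timeout expires are lost, and events generated afterwards are cached until the server is reachable again. The sketch below reads the timeout from an example <client> block and applies that model. The server address is a placeholder, and the model is a simplification of what was seen in this one test, not a description of OSSEC internals.

```python
import xml.etree.ElementTree as ET

# Illustrative agent configuration fragment. The server address is a
# placeholder; 240 is the timeout observed in this article's test.
CLIENT_BLOCK = """
<client>
  <server-ip>192.0.2.10</server-ip>
  <time-reconnect>240</time-reconnect>
</client>
"""


def reconnect_timeout(xml_text):
    """Read the time-reconnect value (in seconds) from a <client> block."""
    return int(ET.fromstring(xml_text).findtext("time-reconnect"))


def classify(seconds_after_disconnect, timeout):
    """Simplified model of the test result: lost before timeout, cached after."""
    return "lost" if seconds_after_disconnect < timeout else "cached"


timeout = reconnect_timeout(CLIENT_BLOCK)
for t in (5, 120, 300):
    print(t, classify(t, timeout))
```

Under this model, lowering the timeout shrinks the window in which events are silently dropped.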

One can of course reduce the values of notify_time and time_reconnect. In the source code for the OSSEC agent, the default time_reconnect value is set to three times (3x) the notify_time value, which is a sane default in my opinion. There’s a trade-off here between performance and reliability. Since OSSEC is UDP based, the only way for the agent to know that the OSSEC server is offline is to send periodic keepalives (notifications). Setting notify_time lower means more network traffic and processing on the OSSEC agents and server, but fewer logs lost in the event of a disconnection.
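To put rough numbers on this trade-off, the sketch below derives time_reconnect from notify_time using the 3x relationship mentioned above and estimates how many keepalives the agents would send per hour. The function names and the sample values are illustrative only, and the estimate assumes exactly one keepalive per notify_time interval per agent.

```python
def default_time_reconnect(notify_time):
    """Default reconnect timeout as 3x notify_time, per the relationship above."""
    return 3 * notify_time


def keepalives_per_hour(notify_time, agents=1):
    """Approximate keepalive messages per hour, one per interval per agent."""
    return agents * 3600 / notify_time


# Sample notify_time values in seconds, chosen only for comparison.
for notify_time in (600, 80, 20):
    print(
        notify_time,
        default_time_reconnect(notify_time),
        keepalives_per_hour(notify_time, agents=100),
    )
```

Shorter intervals reduce the potential loss window but multiply the keepalive traffic across every agent.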

So unless OSSEC gets converted to TCP (at the cost of performance due to TCP overhead compared to UDP), it seems there will always be the possibility of log loss.

References

[1] https://groups.google.com/forum/#!topic/ossec-list/F_izIq3zEi4

[2] https://groups.google.com/forum/#!topic/ossec-list/mQr3L_sqJ-Q

[3] http://ossec-docs.readthedocs.org/en/latest/syntax/head_ossec_config.global.html#element-logall
