What do Smartphone Predictive Text and Cybersecurity have in common?

Maybe the link between your smartphone keyboard and current machine learning research in cybersecurity is not apparent at first glance, but the technology behind both is extremely similar: both leverage deep learning architectures called Recurrent Neural Networks [RNNs], specifically a type of RNN called Long Short-Term Memory [LSTM].

One of the main advantages of LSTMs is how well they deal with sequences. Thanks to the way their building blocks are composed, these RNNs can predict the next step in a sequence given the previous steps, taking into account not only the statistical properties of the sequence in question (e.g. frequency) but also its temporal properties. To give a practical illustration of “temporal properties”, say an LSTM has been trained with sequences similar to the following:

previous steps -> next step

“1 1 1” -> 2

“4 4 4” -> 5

Given the never-before-seen sequence “8 8 8”, the LSTM is well able to predict “9” correctly. This may seem simplistic, and in practice a neural network deals with thousands or millions of different sequences, but the LSTM is still capable of learning the intuitive rule in our example: if you see three repeated numbers, the next number is simply the last one plus 1. This is different from spatial or frequency-based machine learning techniques (such as One-Class SVMs), where a never-before-seen sequence gets classified as an anomaly precisely because it has never been seen before.
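To make the distinction concrete, here is a minimal Python sketch using made-up toy data, with the learned “+1” rule hard-coded rather than actually trained, contrasting the set-membership view with the temporal rule an LSTM could learn from such examples:

```python
# Toy training pairs mirroring the example above (hypothetical data).
train = [([1, 1, 1], 2), ([4, 4, 4], 5), ([7, 7, 7], 8)]

# Frequency/set-membership view: anything never seen before is an anomaly.
seen = {tuple(steps) for steps, _ in train}

def is_anomaly_frequency(steps):
    return tuple(steps) not in seen

# Temporal view: the rule an LSTM could learn from such data,
# hard-coded here for illustration (no actual training happens).
def predict_next_temporal(steps):
    if len(set(steps)) == 1:  # repeated numbers...
        return steps[-1] + 1  # ...so the next one is simply +1
    return None

print(is_anomaly_frequency([8, 8, 8]))   # True: flagged only for being new
print(predict_next_temporal([8, 8, 8]))  # 9: the rule generalizes
```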

Your smartphone keyboard is actually powered by deep learning

You probably use LSTMs every day without realizing it, in the form of the predictive text suggestions that appear whenever you type something on your smartphone. As we just explained, LSTMs are very good with sequences, and sequences can just as well be made of letters rather than numbers. So given enough training, and given a previous sequence of letters, an LSTM gets very good at suggesting the next letter, the next few letters, or even the whole word.

The screenshot above is familiar to all of you… start typing and, given a sequence of characters, the LSTM will predict the most probable next few characters. These “predictions” are what we call suggestions.
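Conceptually, the keyboard’s suggestions are just the highest-probability entries of the model’s next-character distribution. A minimal sketch, assuming an invented distribution in place of a real trained LSTM’s output:

```python
# Hypothetical next-character probabilities after the user types "th"
# (values are invented for illustration, not real model output).
next_char_probs = {"e": 0.62, "a": 0.18, "i": 0.09, "o": 0.06, "r": 0.05}

def top_suggestions(probs, k=3):
    """Return the k most probable next characters: the keyboard's 'suggestions'."""
    return [ch for ch, _ in sorted(probs.items(), key=lambda kv: -kv[1])[:k]]

print(top_suggestions(next_char_probs))  # ['e', 'a', 'i']
```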



Where things get interesting for cybersecurity analysts is what happens when we feed an LSTM a sequence of characters which are abnormal.



An example of doing this on your smartphone is shown above. When we feed the LSTM an abnormal sequence of characters, it cannot predict with any certainty what the next character is. This manifests itself in very limited suggestions. In the screenshot, note how the keyboard suggestions are limited to the sequence itself: the LSTM could not predict the next character, so it simply echoes the sequence or prepends common characters.
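One common way to quantify this “confusion” is the entropy of the model’s next-token distribution: a confident prediction has low entropy, a near-uniform one has high entropy. A minimal sketch with invented probability vectors:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

confident = [0.90, 0.04, 0.03, 0.02, 0.01]  # model is sure of the next token
confused  = [0.21, 0.20, 0.20, 0.20, 0.19]  # near-uniform: limited suggestions

print(entropy(confident) < entropy(confused))  # True
```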


The cybersecurity tie-in

One man’s trash is another man’s gold. While the above might not seem very useful to the smartphone user, it is very useful to a cybersecurity analyst who is looking for anomalies within the millions of logs generated by security devices.

For example, let’s consider CyberSift’s Docker anomaly detection engine. The concept is pretty simple: detect anomalous sequences of system calls. Any operating system’s activity can be characterized as a stream of system calls like so:


We can imagine each system call as being a character or number in a longer sequence, which is exactly what an LSTM is designed to handle. To give a practical example, let’s imagine we are using an LSTM that has been trained on common sequences of system calls. Next, we see how the LSTM reacts when asked to predict the next system call, given a sequence of syscalls which is relatively common. The LSTM output could look similar to this:



The above graph shows that the LSTM is 90% certain that the next syscall is going to be “open”. Similar to what we saw before with the smartphone keyboard, the LSTM network has a good chance of being correct.
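The idea of treating each system call as a token in a sequence can be sketched as follows (the trace and window size are invented for illustration; a real engine would train an LSTM on many such windows):

```python
# Hypothetical syscall trace (names are illustrative, not a real capture).
trace = ["open", "read", "read", "write", "close", "open", "read"]

# Map each syscall name to an integer token, as a sequence model expects.
vocab = {name: idx for idx, name in enumerate(sorted(set(trace)))}
tokens = [vocab[s] for s in trace]

def windows(seq, size):
    """Yield (previous steps, next step) training pairs of fixed window size."""
    for i in range(len(seq) - size):
        yield seq[i:i + size], seq[i + size]

pairs = list(windows(tokens, 3))
print(pairs[0])  # ([1, 2, 2], 3): three tokens and the syscall that followed
```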

Contrast this to what happens when we feed the LSTM network an unusual syscall sequence. Just like before, the LSTM network will get confused and give very uncertain predictions:


The above graph still shows “open” as the most probable next system call, but the network is a lot less certain about it (16% vs the 90% we had previously).
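A simple way to turn this drop in confidence into an alert is to score each step by how uncertain the model is about its top prediction, then threshold the score. A minimal sketch, with invented distributions mirroring the two cases above and an arbitrary threshold (not a CyberSift value):

```python
def anomaly_score(next_syscall_probs):
    """1 minus the confidence in the top prediction; higher = more anomalous."""
    return 1.0 - max(next_syscall_probs.values())

# Invented distributions mirroring the two graphs discussed above.
normal  = {"open": 0.90, "read": 0.05, "close": 0.05}
unusual = {"open": 0.16, "read": 0.15, "mmap": 0.14, "close": 0.14,
           "stat": 0.14, "write": 0.14, "fork": 0.13}

THRESHOLD = 0.5  # assumption: a tunable parameter
print(anomaly_score(normal) > THRESHOLD)   # False: common sequence
print(anomaly_score(unusual) > THRESHOLD)  # True: flag for the analyst
```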



This is exactly how CyberSift leverages deep learning to help detect anomalies in your Docker environment, or within your logs, highlighting those sequences that are different or unusual and therefore more worthy of your limited time.

These types of protections are becoming increasingly important as novel attacks are discovered against Docker and other systems which do not necessarily trigger signatures, but definitely generate anomalous behavior.


Source: https://threatpost.com/attack-uses-docker-containers-to-hide-persist-plant-malware/126992/


Consider the attack presented at Black Hat just last month, where hackers were able to spin up a Docker container just by having a target visit a specially crafted webpage. Their attack consists of leveraging the Docker API to start a container and then using it to attack the network laterally. In a busy Docker environment, where containers are started and stopped many times within a short period, keeping an eye on every container being started may be too much to handle. But as we can see from CyberSift’s anomaly detection engine output below, starting a container that performs unusual actions shows up as a highly anomalous period:


Note the significantly higher anomaly score for the time period where a docker container was spun up and performed a range of lateral attacks and data exfiltration. For further information about the test environment used to capture the above results, please have a quick read here


For more posts like this, written in my capacity as CTO of CyberSift, please follow us on Medium! We publish technical, marketing, and management articles, all relating to InfoSec.



The importance of data mining in the field of cybersecurity

In a very interesting article on TechCrunch, Michael Schiebel writes about the various ways in which security analysts can learn from data scientists. He makes a couple of points that are worth highlighting.

Today, hacking is a much more complex art than it used to be: It no longer only involves just scanning and penetrating the network via a vulnerability. Yet the traditional security tools used by most companies are often inadequate because they still focus on this

As any security professional can attest, hacking nowadays has become easier than ever. Just a few years ago, script kiddies were relegated to using the venerable Nmap and brute-force programs like THC Hydra. Nowadays it’s a different story. There is a plethora of highly sophisticated (and effective) exploit tools such as Metasploit, the Social Engineering Toolkit and PowerShell Empire. These tools are easy to learn, easy to extend, and excellent at what they do. Not only that: most of them are free and open source. At any stage of the attack lifecycle, hackers can find amazing tools to help them do their job.

Yet we as cybersecurity vendors are lagging behind, especially when it comes to toolsets. As Michael states:

Most tools are still role-based, with signatures, detection and response rules. That’s their downfall.

Again, we couldn’t agree more. Signature based tools still play an important part in cyber defense, but the defense-in-depth principle requires us to deploy tools which can mitigate those threats which pass through our outer rings of defense. Luckily, cyber defense tools are evolving, with the help of open-source innovation in both security and big data fields.

Focus on the abnormalities

This is what it’s all about. Effectively finding abnormalities in your network has a couple of very important benefits to your organization:

  • It forces you to be more aware of your networks and systems. You are required to investigate abnormalities and effectively determine if an abnormality is expected or malicious. The more aware you are of your environment, the less time it takes you to realize when something goes horribly wrong (like in the event of a hack…)
  • With the proliferation of advanced attack vectors (like the steganographic attacks I recently wrote about) and cloud computing, it’s very easy for hackers to use legitimate services to carry out their attacks in a way that avoids tripping signature-based alarms. Signatures that target AWS or Twitter would be triggered so many times that they would be ignored, even though they are potential avenues of attack already being exploited by hackers. Abnormality detection systems can flag connections which use these services in weird ways (too much data being transferred, too many connections being made, periodic connections to previously unused endpoints, and so on…)
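As a toy illustration of that last point, even a simple statistical baseline can flag a legitimate service being used in a weird way, here via a z-score on hourly connection counts (the data and threshold are invented, and a production system would use far richer models):

```python
import statistics

# Hypothetical hourly connection counts to a legitimate cloud service.
baseline = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15]
mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)

def is_abnormal(count, threshold=3.0):
    """Flag counts more than `threshold` standard deviations above normal."""
    return (count - mean) / stdev > threshold

print(is_abnormal(14))   # False: within the usual pattern
print(is_abnormal(480))  # True: same service, wildly unusual volume
```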

At this stage it’s important to note that abnormalities do not automatically mean malicious activity… an anomaly-based system highlights those events that deviate from the norm. There are several examples of genuine anomalies which are not malicious:

  • Marketing executes a successful campaign resulting in a flood of connections to your webservers
  • A misconfiguration is introduced during one of your changes to a backup system which causes high volume traffic to flow through the wrong network path
  • Your organization engages with customers in new markets, leading to your network having new traffic patterns to previously non-contacted countries and Autonomous Systems

These are practical examples of how an anomaly-based system increases your team’s awareness of the environment. This leads me to prefer referring to anomaly-based systems as cyber-awareness platforms rather than simple “cyber-defense” tools.

The real problem in most organizations is that too much security alert data is coming in too fast.

Michael again hit the nail on the head here. If your security analysts are investigating too much data, then no wonder we’re seeing alarming headlines such as:

Most companies take over six months to detect data breaches (by ZDNet)

Anomaly-based IDS help your analysts focus on those alarms that matter, reducing mitigation time and improving efficiency, and at the end of the day this is what translates into cost savings for the organization.

Here at CyberSift we are building next-generation anomaly detection systems which are based on the above principles and add an effective layer of defense which counters new threats as they emerge without the need for signatures or rules, all while increasing your team’s cyber-awareness of their systems and networks. Stay tuned for exciting developments…

Read the full article “What your security scientists can learn from your data scientists to improve cybersecurity” here.