What do Smartphone Predictive Text and Cybersecurity have in common?

The link between your smartphone keyboard and current machine learning research in cybersecurity may not be apparent at first glance, but the technology behind both is very similar: both leverage a deep learning architecture called the Recurrent Neural Network [RNN], specifically a variant known as Long Short-Term Memory [LSTM].

One of the main advantages of LSTMs is how well they handle sequences. Thanks to the composition of their building blocks, these RNNs can predict the next step in a sequence given the previous steps, taking into account not only the statistical properties of the sequence in question (e.g. frequency) but also its temporal properties. To give a practical example of “temporal properties”, imagine an LSTM that has been trained with sequences similar to the following:

previous steps -> next step

“1 1 1” -> 2

“4 4 4” -> 5

Given the never-before-seen sequence “8 8 8”, the LSTM is well able to predict “9” correctly. This may seem simplistic, and a neural network typically deals with thousands or millions of different sequences, but the LSTM is still capable of learning the intuitive rule in our example: if you see three repeated numbers, the next number is simply the last one plus one. This is different from spatial or frequency-based machine learning techniques (such as One-Class SVMs), where a never-before-seen sequence gets classified as an anomaly precisely because it has never been seen before.
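Those “building blocks” are the LSTM’s gates. As a rough illustration of what one cell computes at each step, here is a single-unit LSTM step in plain Python. The weights are arbitrary placeholders rather than trained values, so the output is not a meaningful prediction, but the gate structure is the standard one:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a single-unit LSTM cell (real layers use weight
    matrices, but the gate structure is identical)."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])        # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])        # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])        # output gate
    c_hat = math.tanh(w["wc"] * x + w["uc"] * h_prev + w["bc"])  # candidate memory
    c = f * c_prev + i * c_hat  # cell state: the long-term memory
    h = o * math.tanh(c)        # hidden state: the short-term output
    return h, c

# Placeholder weights (a trained network would have learned these).
weights = {k: 0.5 for k in
           ("wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wc", "uc", "bc")}

# Feed the sequence "8 8 8" through the cell one step at a time.
h, c = 0.0, 0.0
for x in (8, 8, 8):
    h, c = lstm_step(x, h, c, weights)
```

The forget gate decides how much of the running cell state to keep, which is what lets the network carry information across many steps of a sequence instead of reacting only to the latest input.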

Your smartphone keyboard is actually powered by deep learning

You probably use LSTMs every day without realizing it, in the form of the predictive text suggestions that appear whenever you type something on your smartphone. As we just explained, LSTMs are very good with sequences, and sequences can just as well be letters rather than numbers. So, given enough training and a previous sequence of letters, an LSTM becomes very good at suggesting the next letter, the next couple of letters, or the whole word.

The screenshot above should be familiar to all of you: start typing and, given a sequence of characters, the LSTM predicts the most probable next few characters. These “predictions” are what we call suggestions.



Where things get interesting for cybersecurity analysts is what happens when we feed an LSTM a sequence of characters which are abnormal.



An example of doing this on your smartphone is shown above. When we feed the LSTM an abnormal sequence of characters, it cannot predict with any certainty what the next character is. This manifests itself in very limited suggestions. In the screenshot, note how the keyboard suggestions are limited to the sequence itself (the LSTM could not predict the next character, or it simply prepended common characters).
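To make this “confident vs. confused” behaviour concrete without a trained network, here is a toy stand-in: a character bigram model built from a tiny corpus. A real keyboard uses an LSTM over much longer contexts, but the failure mode on unseen input is the same idea:

```python
from collections import Counter, defaultdict

# Tiny training corpus; a real keyboard model is trained on far more text.
corpus = "the quick brown fox jumps over the lazy dog the cat sat on the mat"

# Count, for each character, which character tends to follow it.
counts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

def suggest(prev_char, k=3):
    """Return up to k (character, probability) suggestions for the next character."""
    c = counts.get(prev_char)
    if not c:  # never-before-seen context: nothing confident to suggest
        return []
    total = sum(c.values())
    return [(ch, n / total) for ch, n in c.most_common(k)]

normal = suggest("t")    # strong suggestion: "h" follows "t" most often here
abnormal = suggest("7")  # unseen character: no suggestions at all
```

`suggest("t")` returns a strong top suggestion (“h”), while a character the model has never seen yields no suggestions, mirroring the near-empty suggestion bar in the screenshot.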


The cybersecurity tie-in

One man’s trash is another man’s gold. While the above might not seem very useful to the smartphone user — it is to a cybersecurity analyst who is looking for anomalies within the millions of logs that are generated by security devices.

For example, let’s consider CyberSift’s Docker anomaly detection engine. The concept is pretty simple: detect anomalous sequences of system calls. Any operating system’s activity can be characterized as a stream of system calls like so:


We can imagine each system call as a character or number in a longer sequence: exactly what an LSTM is designed to handle. To give a practical example, imagine an LSTM that has been trained on common sequences of system calls. Next, we see how the LSTM reacts when we ask it to predict the next system call, given a sequence of syscalls which is relatively common. The LSTM output could look similar to this:



The above graph shows that the LSTM is 90% certain that the next syscall is going to be “open”. Similar to what we saw before with the smartphone keyboard, the LSTM network has a good chance of being correct.

Contrast this to what happens when we feed the LSTM network an unusual syscall sequence. Just like before, the LSTM network will get confused and give very uncertain predictions:


The above graph still shows “open” as being the next most probable system call, but the network is a lot less certain about it (16% vs the 90% we had previously).
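One simple way to turn these two graphs into a single number is to measure the entropy of the predicted distribution: a confident prediction has low entropy, a confused one high entropy. The distributions below are made up to mirror the two graphs, and the scoring is an illustration rather than CyberSift’s actual method:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a predicted next-syscall distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical distributions over a six-syscall vocabulary, shaped like the
# two graphs: one confident prediction, one confused prediction.
confident = [0.90, 0.04, 0.03, 0.01, 0.01, 0.01]  # "open" at 90%
confused  = [0.16, 0.15, 0.15, 0.18, 0.18, 0.18]  # "open" at only 16%

# The confused distribution carries far more entropy, so the sequence that
# produced it can be scored as more anomalous.
```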



This is exactly how CyberSift leverages deep learning to help detect anomalies in your docker environment — or to detect anomalies within your logs, highlighting those sequences that are different or unusual and therefore are more worthy of your limited time.

These types of protections are becoming increasingly important as novel attacks are discovered against docker and other systems which do not necessarily trigger signatures, but definitely generate anomalous behavior.


Source: https://threatpost.com/attack-uses-docker-containers-to-hide-persist-plant-malware/126992/


Consider the attack presented at Black Hat just last month, where hackers were able to spin up a Docker container just by having a target visit a specially crafted webpage. Their attack leverages the Docker API to start a container and then uses it to attack the network laterally. In a busy Docker environment, where containers are started and stopped many times within a short period, keeping your eye on every container being launched may be a bit too much to handle. But as we can see from CyberSift’s anomaly detection engine output below, starting a container that performs unusual actions shows up as a highly anomalous period:


Note the significantly higher anomaly score for the time period where a docker container was spun up and performed a range of lateral attacks and data exfiltration. For further information about the test environment used to capture the above results, please have a quick read here.


For more posts like this, written in my capacity as CTO of CyberSift, please follow us on Medium! We publish technical, marketing, and management articles, all relating to InfoSec.



Threat hunting using DNS indicators

DNS is a great source of information for security analysts. If you’re not already monitoring DNS activity in your network, you should start as soon as possible, for the reasons we’ll explore in this article.

DNS is one of the major workhorses that powers the Internet. Everything uses DNS — browsers, apps, updates… and malware. Almost all malware needs to “phone home” to receive instructions, exfiltrate data, or otherwise communicate with its attackers. Malware authors utilize a variety of DNS tricks to control the malware they spread, such as:

  • Domain Generation Algorithms
  • Fast Flux Domains
  • DNS Covert Channels
  • Familiar or misspelt domains

Domain Generation Algorithms

Once malware is installed on a target system, it usually needs to communicate back to its C&C server for instructions. Hardcoding a domain into the malware would make it very short-lived: malicious activity is banned by many hosting providers, who are quick to revoke DNS records involved in such activity. So, authors program malware with Domain Generation Algorithms [DGAs] to generate domains on the fly, usually seeded by the current date. This increases the chances of the malware being able to reach a DNS domain that is still active, extending its useful lifespan. Wikipedia has a very interesting and easy-to-read article on DGAs, including examples from Conficker, CryptoLocker, and others.
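To make this concrete, here is a toy date-seeded DGA in the spirit of the published algorithms. Each real family uses its own scheme, so the seeding and generator below are purely illustrative:

```python
import datetime

def generate_domain(date, length=16):
    """Toy date-seeded DGA: deterministic pseudo-random letters derived
    from the current date. Real families (Conficker, CryptoLocker, ...)
    each use their own algorithm; this is purely illustrative."""
    seed = date.year * 10000 + date.month * 100 + date.day
    letters = []
    for _ in range(length):
        seed = (seed * 1103515245 + 12345) % (2 ** 31)  # simple LCG
        letters.append(chr(ord("a") + seed % 26))
    return "".join(letters) + ".com"

# Malware and its operator run the same algorithm, so both agree on
# today's rendezvous domain without hardcoding anything blockable.
domain = generate_domain(datetime.date(2017, 4, 1))
```

The attacker only needs to register a handful of the day’s candidate domains, while defenders would have to block all of them, every day.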

Typically these domains are quite random in nature, for example “intgmxdeadnxuyla” and “axwscwsslmiagfah”, which makes them quite easy for a human to pick out. The downside is that there are far too many DNS records to check manually, and automated malware domain lists don’t usually include DGA-generated domains; there are simply too many, and they change every day.


CyberSift helps out by performing a number of language-structure checks on visited domains. In the example below we investigate a DGA-generated domain that has quite a high anomaly score of 22.853 (anything above 10.0 warrants some investigation). As you can see in the highlighted portion of our output, a model named “Score” contributed quite a large number of points (13.515) to the anomaly score. The Score model uses statistical analysis to determine how likely a domain is to have a given structure, and how often that structure is used. Since DGAs output random domains that don’t usually look anything like proper language, they tend to trigger this model.
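The Score model’s internals aren’t spelled out here, but one simple language-structure check of the kind described is character entropy, which tends to separate random DGA output from real words:

```python
import math
from collections import Counter

def char_entropy(domain):
    """Shannon entropy of the characters in the domain's first label.
    Random DGA output tends to score higher than real words. This is a
    simple stand-in for a language-structure check, not CyberSift's model."""
    label = domain.split(".")[0]
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in Counter(label).values())

# A DGA-looking label carries noticeably more entropy than a real word:
dga_score = char_entropy("intgmxdeadnxuyla.com")
real_score = char_entropy("google.com")
```

On its own this is a crude signal (short real words can score oddly too), so a production model would combine it with n-gram frequencies and other features.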

CyberSift DGA Detection

Fast Flux Domains

Fast flux domains are a tried and tested malware and phishing technique. In its simplest form, malware authors register hundreds (or thousands) of IP addresses for a given domain. The DNS records are given a short time-to-live, so that infected victims connect to different IP addresses for any given malware domain, reducing the chances of an IP address being blocked. Sometimes fast flux domains are used in combination with DGAs to further increase the chances of infected machines communicating back to their handlers.

Some advertising networks (usually shady ones…) tend to use fast-flux domains, which return a high number of IP addresses. In our example below we see one such advertising network returning an abnormally high number of IP addresses, and contributing quite a large chunk to the anomaly score.
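Detecting this pattern from passive DNS logs can be as simple as counting distinct IP addresses per domain and checking TTLs. The records and thresholds below are illustrative, not CyberSift’s actual values:

```python
from collections import defaultdict

def flag_fast_flux(records, ip_threshold=50, ttl_threshold=300):
    """Flag domains resolving to many distinct IPs, all with short TTLs.
    Each record is (domain, resolved_ip, ttl); thresholds are illustrative."""
    ips, ttls = defaultdict(set), defaultdict(list)
    for domain, ip, ttl in records:
        ips[domain].add(ip)
        ttls[domain].append(ttl)
    return [d for d in ips
            if len(ips[d]) >= ip_threshold and max(ttls[d]) <= ttl_threshold]

# Synthetic passive-DNS log: one fluxing ad domain, one ordinary domain.
records = [("ads.flux-example.net", "10.0.%d.%d" % (i // 256, i % 256), 60)
           for i in range(300)]
records.append(("www.example.com", "10.1.1.1", 86400))

flagged = flag_fast_flux(records)
```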

DNS Covert Channels

DNS covert channels are particularly interesting. We mentioned that practically any program needing to connect to the Internet requires DNS to function properly, so two things are pretty much guaranteed:

  1. Outbound DNS requests are not blocked
  2. Since DNS proxies are not as prevalent as HTTP/S proxies, DNS traffic is very probably not monitored

This creates a perfect medium for attackers to tunnel data through DNS traffic. As recently as last month (March 2017), security researchers observed malware leveraging this concept to hide its communication traffic in plain sight:

Covert Channels and Poor Decisions: The Tale of DNSMessenger

DNSMessenger used TXT records to create a covert channel that would be difficult to detect. Looking at CyberSift’s output during testing, we see an alert that there was anomalous activity in our environment’s DNS traffic:

CyberSift detecting anomalous DNS activity

CyberSift uses intelligent clustering techniques to detect anomalous behavior in the volume of DNS queries being sent by your environment, catching issues such as misconfiguration or, as in the example above, covert malicious activity.
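Query-volume clustering is one signal; a complementary per-query heuristic is to flag unusually long, high-entropy labels, since encoded payloads (like DNSMessenger’s TXT traffic) tend to produce them. A sketch, with illustrative thresholds rather than CyberSift’s actual logic:

```python
import base64
import math
from collections import Counter

def looks_like_tunnel(qname, min_label_len=40, entropy_threshold=3.5):
    """Heuristic DNS-tunnel check: encoded payloads show up as unusually
    long, high-entropy labels. Thresholds here are illustrative."""
    label = qname.split(".")[0]
    if len(label) < min_label_len:
        return False
    n = len(label)
    ent = -sum((c / n) * math.log2(c / n) for c in Counter(label).values())
    return ent >= entropy_threshold

normal = looks_like_tunnel("www.example.com")  # short, ordinary label

# Simulate an exfiltration query: a base64-encoded chunk used as a subdomain.
payload = base64.b64encode(b"exfiltrated secret data, chunk 1 of 42").decode().rstrip("=")
covert = looks_like_tunnel(payload + ".evil.example")
```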

This makes threat hunting quite easy. A sample security threat hunter’s workflow using CyberSift would look as follows:

CyberSift vs DNSMessenger Summary

Familiar or misspelt domains

You’ve definitely seen this age-old but extremely popular technique in scam and phishing campaigns. Attackers love to use familiar-looking domains such as “service-portal-paypal.com” (or “papyal.com”) and similar domains that most analysts wouldn’t give a second glance, since they’d assume it’s a legitimate service. This has become quite a problem, especially with free SSL certificate authorities:

An example of this technique in use in the wild can be seen in the “OilRig” campaign:

OilRig Campaign Analysis from AlienVault OTX: https://otx.alienvault.com/pulse/58de329c88c71500d0e660b8/

From the section “Indicator of Compromise” in the above screenshot, we see a domain which certainly looks familiar: main-google-resolver.com. An analyst is certainly forgiven if they don’t investigate this domain more thoroughly — it does after all look like something Google would really use.

But a closer look with CyberSift reveals a different story:

Again, the “Score” model comes to the rescue. The model realizes that the word “google” is usually not seen in this context, and adds just enough points to make this domain anomalous (remember, anything over 10.0 is considered worth investigating). Over and above that, the site in this case did not resolve, which caused the DNS anomaly engine to push the anomaly score up even further.
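A simple complementary check for lookalike domains is string similarity against a watch-list of brands. The sketch below uses `difflib` as a stand-in for heavier string-distance or language checks; the brand list and threshold are illustrative:

```python
from difflib import SequenceMatcher

BRANDS = ["paypal", "google", "microsoft", "apple"]  # illustrative watch-list

def lookalike_of(domain, threshold=0.8):
    """Return the watched brand that a domain's first label most
    resembles, or None if nothing is close enough."""
    label = domain.split(".")[0]
    best, score = None, 0.0
    for brand in BRANDS:
        # Compare both the whole label and each hyphen-separated token,
        # so "service-portal-paypal" matches on its "paypal" token.
        for token in [label] + label.split("-"):
            s = SequenceMatcher(None, token, brand).ratio()
            if s > score:
                best, score = brand, s
    return best if score >= threshold else None
```

With this check, both “papyal.com” and “service-portal-paypal.com” map back to the paypal brand, while an unrelated domain returns nothing.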

Interested in learning more, have feedback, or would like to try out CyberSift? Contact us for more information.