3 uses for random decision trees / forests you (maybe) didn’t know about

Decision tree forests rightly get a lot of attention for their robustness, support for high-dimensional data and easy interpretability. The best-known uses of decision tree forests are: Classification - given a set of samples with certain features, assign each sample to one of the discrete classes the model has been trained on. Regression … Continue reading 3 uses for random decision trees / forests you (maybe) didn’t know about

Encrypting traffic in transit to Apache NiFi

In this article we'll explain how to encrypt traffic going to an HTTP handler in NiFi, which is then forwarded to a backend HTTP server - in other words, an SSL offloading reverse proxy. Encrypting traffic in transit to NiFi involves the following steps: creating a keystore containing a CA certificate; creating a truststore, which contains … Continue reading Encrypting traffic in transit to Apache NiFi

Is it Elastalert? No – it’s NiFi!!

One common requirement for users of Elasticsearch is to have automatic alerts sent out whenever a query is matched or some other condition is satisfied. In fact, Yelp have come up with a Python-based solution for this in the form of Elastalert, which, at the time of writing, is extremely popular with over 5.5K stars … Continue reading Is it Elastalert? No – it’s NiFi!!

Consuming Netflow using NiFi

The problem: Several network devices (especially Cisco) tend to use netflow for auditing network connections. It would be useful to log these connections in a structured data store (Elasticsearch is my data store of choice). Alternative solutions: Using the elasticsearch netflow module: https://www.elastic.co/guide/en/logstash/current/netflow-module.html This works well right out of the box, and supports all netflow versions. … Continue reading Consuming Netflow using NiFi

Analyzing credit card transactions using machine learning techniques – 3

Introduction: In a previous article, we explored how PCA can be used to plot credit card transactions into a 2D space, and we proceeded to visually analyse the results. In this article, we take this process one step further and use hierarchical clustering to automate parts of our analysis, making it even easier for our … Continue reading Analyzing credit card transactions using machine learning techniques – 3

Analyzing credit card transactions using machine learning techniques – 2

Principal Component Analysis - Introduction and Data Preparation: Principal Component Analysis [PCA] is a widely used unsupervised algorithm which reduces dimensionality. A good visual explanation can be found here: http://setosa.io/ev/principal-component-analysis/ As mentioned in our previous article, Correspondence Analysis works exclusively on categorical data. In contrast, PCA accepts only numerical data. This means our data … Continue reading Analyzing credit card transactions using machine learning techniques – 2

Data mining firewall logs : Principal Component Analysis

In this article we'll explore how Principal Component Analysis [PCA] [1] - a popular data reduction technique - can help a busy security or network administrator. Any such administrator has often faced a daunting problem: going through reams of firewall or router connection logs trying to figure out if any of the thousands … Continue reading Data mining firewall logs : Principal Component Analysis