We’re big fans of NiFi at CyberSift – we use it as our primary platform to ingest data from a wide variety of sources, process the data and POST it to an Elasticsearch back-end. During our time spent with NiFi, we built a basic, but useful load-balancing processor:
The readme is hopefully quite clear, but in a nutshell, the processor lets you define multiple downstream destinations to send flow files to.
Each destination is checked every 5 seconds by running a user-specified command. If the command returns an exit code of 0, the destination is assumed to be “alive” and therefore accepting flow files. If the command returns a non-zero exit code, the destination is considered “dead” and flow files are no longer routed to it.
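To make the health check concrete, here is a minimal Python sketch of the idea. It is not the processor's actual code (the processor itself runs inside NiFi), and the names `is_alive`, `live_destinations` and `timeout_seconds` are made up for illustration. Treating a command that times out as “dead” is also our assumption here, not something the readme promises.

```python
import subprocess


def is_alive(check_command: str, timeout_seconds: float = 5.0) -> bool:
    """Run a user-specified shell command; exit code 0 means "alive"."""
    try:
        result = subprocess.run(
            check_command,
            shell=True,  # the check is given as a single shell command string
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # Assumption: a check that hangs is treated the same as a failed check.
        return False
    return result.returncode == 0


def live_destinations(checks: dict) -> list:
    """Return the names of destinations whose check command exited with 0."""
    return [name for name, command in checks.items() if is_alive(command)]
```

A scheduler would call `live_destinations` every 5 seconds and route only to the names it returns. The check command can be anything that sets an exit code, for example a `curl` or `nc` call against the destination.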
The processor currently allows for three different load-balancing strategies:
- Attribute hash – this strategy ensures “stickiness”, i.e. all flow files containing an attribute with the same value are sent to the same destination. This is useful in some scenarios, e.g. sending all flow files from a particular user to the same destination.
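The attribute hash idea can be sketched in a few lines of Python. This is only an illustration of the general technique (hash the attribute value, then take it modulo the number of live destinations), not the processor's real implementation. The function name `pick_destination` and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from typing import Sequence


def pick_destination(attribute_value: str, destinations: Sequence[str]) -> str:
    """Map an attribute value to one destination, always the same one."""
    if not destinations:
        raise ValueError("no live destinations to route to")
    # Use a stable hash: Python's built-in hash() of a string changes
    # between interpreter runs, which would break stickiness.
    digest = hashlib.sha256(attribute_value.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(destinations)
    return destinations[index]
```

With this simple modulo approach, the mapping stays stable only while the list of destinations stays the same. If a destination drops out, some attribute values will move to a different destination.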
We also prepared a couple of YouTube videos to give you a flavor of how the processor works: