The security site “DarkReading” had a couple of interesting articles on the use of steganography in malware:
The idea of a stealthy malware command and control center is very intriguing. As some researchers point out in the above articles, an ideal malware command and control center can deliver instructions to malware bots around the world with minimal risk of detection. The researchers in the first article above reason that Instagram is a perfect medium for a hidden C&C. Social sites are rarely blocked (they are too popular, even in the workplace), they are encrypted via HTTPS, and if one manages to hide instructions within images, then even traffic pattern anomaly detection will have a hard time spotting this type of activity.
The problem is that most social network sites resize, compress, or watermark images, which typically destroys data hidden within an image using steganography. In this article we’ll explore a very simple proof-of-concept C&C that uses Python and the popular StegHide to embed data in images, with Twitter as the medium for distributing the modified images.
StegHide is a popular open source steganography tool, and it’s worthwhile going through StegHide’s documentation: it uses a clever graph-theoretic approach to make minimal changes to the original image.
Without further ado, here’s the python code:
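A minimal sketch of what the decoding side could look like, assuming the python-twitter library is installed and steghide is on the PATH. The credentials are placeholders, and the helper names (build_extract_command, fetch_and_decode) are illustrative, not part of any library:

```python
import subprocess

def build_extract_command(image_path, passphrase):
    # steghide extract -sf <stego file> -xf /dev/stdout -p <passphrase>
    # writing to /dev/stdout lets us capture the hidden data directly
    return ["steghide", "extract", "-sf", image_path,
            "-xf", "/dev/stdout", "-p", passphrase]

def fetch_and_decode(screen_name, passphrase):
    # the third-party library and network calls are only needed at run time
    import tempfile
    import urllib.request
    import twitter  # pip install python-twitter

    api = twitter.Api(consumer_key="...", consumer_secret="...",
                      access_token_key="...", access_token_secret="...")
    for status in api.GetUserTimeline(screen_name=screen_name):
        for media in (status.media or []):
            # save each attached image to a temporary file
            with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
                tmp.write(urllib.request.urlopen(media.media_url).read())
                image_path = tmp.name
            # ask steghide to extract any hidden payload
            result = subprocess.run(build_extract_command(image_path, passphrase),
                                    capture_output=True)
            if result.returncode == 0:
                print(result.stdout.decode())

# usage: fetch_and_decode("davevassallo6", "helloWorld")
```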
It’s a short program with some very simple steps. We use the popular python-twitter library to ease communication with Twitter. After the proper setup, we download any new status updates of the user ‘davevassallo6’.
The code doesn’t include any error checking, but in essence it downloads any media in the status updates and saves it to a temporary file (note the proper use of twitter.twitter_utils.parse_media_file). We then use steghide to decode the image and print out the results.
- We use the passphrase option with steghide so that the data is encrypted as well (the “-p” argument)
- We force the output of steghide to /dev/stdout so that Python can read the results; this would need to change on Windows platforms
So much for the hard part – decoding the data. Hiding C&C instructions within an image is easy, simply fire up a terminal and:
echo "Hello World. This could be valuable information (it's not....)" > embed.txt
steghide embed -cf Cjn5w8UWYAApAkc.jpg -ef embed.txt -p helloWorld -z 9
In the first line we put our C&C “instruction” into a text file called embed.txt. In the next line we use steghide to hide this information in a normal JPG file. Some notes:
- Again note the use of the passphrase option (“-p”). This obviously has to match what you have in the Python program
- Note that we set the “-z” flag to 9, which tells steghide to compress the embedded data at the highest level before hiding it. We do this on purpose to keep the payload, and therefore the changes to the image, as small as possible, so that Twitter’s own image processing doesn’t corrupt the hidden data. This is very important
The original JPG image had the following MD5 hash:
While the steganography cover image obviously has a different MD5 hash:
We upload the stego image to twitter (https://twitter.com/davevassallo6/status/785367148524953600) and then run our python program… it works!
Now how to stop this… (hint: https://drive.google.com/open?id=0B133ZFctNVtsN2NmSGN6M0t2SVU)
Notes based on some feedback:
- Elasticsearch seems to be pushing the REST client rather than the native Java client… to future-proof your code you may be better off going down this route.
- Why not just use the Reindex API? Although it’s still considered experimental, it may be a good option if you don’t have to munge your data. In other words, if you don’t need to modify or perform many operations while re-indexing your data, this API is the way to go. The Reindex API does allow you to use “script”, but this is rather limited and doesn’t perform as well. In my use case presented below, I build the basis for a framework to add/edit more fields while re-indexing. Maybe a better term for this operation would be “context addition” rather than “re-indexing”… in this use case, you’ll have more flexibility and better performance with the approach below
When it comes to re-indexing Elasticsearch documents, there is no doubt that the Java API is the way to go when performance is important. Using the Java API can have significant performance gains over using other APIs such as the Python Elasticsearch client.
In this article we will present tips on writing the fastest Java Elasticsearch re-indexer possible.
- Use the scroll API
Unlike the Python Elasticsearch client, the Java API does not abstract the scroll API into a higher-level “scan” helper function. This means you should implement the scroll API yourself when querying large amounts of data.
This is done as follows:
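A sketch of what this could look like with the (pre-5.x) transport client; the index name, query, scroll keep-alive, and batch size are placeholders:

```java
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

public class ScrollExample {

    static void scrollAll(Client client) {
        // open the scroll: keep each view alive for 60s, 100 docs per shard per batch
        SearchResponse scrollResp = client.prepareSearch("logs")   // placeholder index
                .setScroll(new TimeValue(60000))
                .setQuery(QueryBuilders.matchAllQuery())
                .setSize(100)
                .execute().actionGet();

        while (true) {
            for (SearchHit hit : scrollResp.getHits().getHits()) {
                // process each hit here
            }
            // refresh the scroll with the latest scroll ID to fetch the next batch
            scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                    .setScroll(new TimeValue(60000))
                    .execute().actionGet();
            if (scrollResp.getHits().getHits().length == 0) {
                break; // scroll exhausted
            }
        }
    }
}
```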
Above we see the exact syntax for setting up a scrolled search, including some queries and filters. In particular, note the use of .setScroll() and .setSize().
.setSize() simply controls the number of documents that will be returned per shard for each scroll request.
.setScroll() is important for long-running operations: it sets how long a particular scroll view should remain valid. Failing to set this appropriately will result in “no search context” errors when updating the scroll ID.
Next we see how to “operate” the prepared scroll. As the documentation explains, each scroll request returns a batch of hits, which we process in the “for” loop. Once a batch is processed, we update the scroll with the new scroll ID (this marks how far the client has read) and loop again in the “while” loop until the scroll is exhausted.
- Leverage multithreading, specifically using Java’s Thread Pools
One area for performance optimization is the use of threads. In the snippet above, a very good candidate for threading is the processing of each individual hit.
The easiest way to do this is to leverage Java’s ExecutorService as follows:
ExecutorService executor = Executors.newFixedThreadPool(8); // this produces a pool of 8 worker threads.
Now that the thread pool is defined, we can start queueing runnable tasks. Within the “for” loop over the hits, we process each hit in a new thread, for example:
Runnable worker = new WorkerThread(hit);
executor.execute(worker);
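Putting the pieces together, the scroll loop hands each hit to the pool, and the pool is drained once the scroll runs out. This is a fragment rather than a complete class, and the pool size and timeout are arbitrary choices:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

ExecutorService executor = Executors.newFixedThreadPool(8);

for (SearchHit hit : scrollResp.getHits().getHits()) {
    executor.execute(new WorkerThread(hit)); // queue each hit for processing
}

// once the whole scroll has been queued, stop accepting work and wait for it to finish
executor.shutdown();
executor.awaitTermination(1, TimeUnit.HOURS);
```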
The WorkerThread class is basically a Java Runnable which processes each hit in the manner you require. The below sample shows a WorkerThread example:
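A reconstructed sketch of such a class, assuming the (pre-5.x) Java client. The “SourceAddress” and “bulk_index” field names come from the article; the App.addToBulkRequest parent method is the one described in the text, and everything else is illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.search.SearchHit;

public class WorkerThread implements Runnable {

    private final SearchHit hit;

    public WorkerThread(SearchHit hit) {
        this.hit = hit;
    }

    @Override
    public void run() {
        // read a field from the original document
        String sourceAddress = (String) hit.getSource().get("SourceAddress");

        // build a partial document containing the new, hypothetical "bulk_index" field
        Map<String, Object> doc = new HashMap<>();
        doc.put("bulk_index", sourceAddress);

        UpdateRequest request = new UpdateRequest(hit.getIndex(), hit.getType(), hit.getId())
                .doc(doc);

        // hand the request over to the parent's bulk processor
        App.addToBulkRequest(request);
    }
}
```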
The above example doesn’t do anything other than read the “SourceAddress” field and insert a hypothetical “bulk_index” field. The new field is added using our next performance tip: the BulkProcessor. The worker thread creates an “update request” which is later fed into a bulk processor for execution; to do so, it calls a method in the parent class (“App.addToBulkRequest”).
- Use the Elasticsearch Java BulkProcessor API
Unlike the Python Elasticsearch client, the Java bulk API has not yet been abstracted into a higher-level parallel function like Python’s “parallel_bulk”. However, we can still leverage the BulkProcessor class, as mentioned above.
We first must prepare a bulk processor:
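A sketch of that setup, following the shape of the Elasticsearch BulkProcessor builder; the specific thresholds mirror the ones discussed below, and `client` is assumed to be an already-connected Java client:

```java
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

BulkProcessor bulkProcessor = BulkProcessor.builder(client,
        new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) {
                // called just before each bulk request is executed
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                // called after a successful bulk request
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                // called when a bulk request fails outright
            }
        })
        .setBulkActions(10000)                               // flush after 10,000 requests...
        .setBulkSize(new ByteSizeValue(1, ByteSizeUnit.GB))  // ...or 1 GB of queued data...
        .setFlushInterval(TimeValue.timeValueSeconds(5))     // ...or every 5 seconds
        .setConcurrentRequests(3)                            // up to 3 bulk requests in flight
        .build();
```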
We first define the Elasticsearch client, and then the bulk processor itself. The bulk processor has three listener methods that can be overridden to take action before a bulk update, and after a successful or unsuccessful one.
Of particular note are the settings at the end, where we configure the bulk processor. In this example, we set the number of concurrent requests to 3, and we also set several limits (1 GB of data, 5 seconds, or 10,000 documents) which, when any one is reached, will trigger a bulk update action.
In our thread example above, the threads feed update requests into a method called “addToBulkRequest”, which has two responsibilities:
- Check if the update request is null. This is an important check. If the update request is null, the entire BulkProcessor will fail.
- Add the update request to the bulk processor we defined above.
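Those two responsibilities amount to only a few lines. A sketch, assuming `bulkProcessor` is the processor built earlier and held as a static field on the App class:

```java
public static synchronized void addToBulkRequest(UpdateRequest request) {
    // adding a null request would bring down the whole bulk operation,
    // so guard against it before touching the processor
    if (request == null) {
        return;
    }
    bulkProcessor.add(request);
}
```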
With regards to bulk updates in elasticsearch, it’s worth going through the following issue discussion: