Python Pickling in the cloud (or how to get Python to execute code it hasn’t seen yet)

Problem – What are we trying to solve?

Let’s assume you have the beginnings of a simple distributed system:

  • You use Redis as your event queue
  • You have a “master” Python script that pickles an instance of a Python class and sends it to your Redis queue
  • You have a “worker” Python script that waits for a new event, unpickles the object, and executes a particular method on it (hereafter, the “main” method)

The code is rather simple, and looks something like this:
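A minimal sketch of such a master and worker pair, not the original listing. The queue name, the strings returned by “main”, and the InMemoryQueue stand-in are all assumptions made so the example runs without a Redis server; the stand-in only mimics the rpush and blpop calls that a redis-py client provides.

```python
import pickle

QUEUE_NAME = "commands"  # hypothetical list name, not taken from the original post


class SampleCommand1:
    def main(self):
        return "SampleCommand1 ran"


class SampleCommand2:
    def main(self):
        return "SampleCommand2 ran"


def send_command(client, command):
    """Master side: pickle the instance and append it to the queue list."""
    client.rpush(QUEUE_NAME, pickle.dumps(command))


def run_next_command(client):
    """Worker side: take the next payload, unpickle it, and call main()."""
    # blpop hands back a (list_name, payload) pair, hence the unpacking.
    _list_name, payload = client.blpop(QUEUE_NAME)
    sample_command = pickle.loads(payload)
    return sample_command.main()


class InMemoryQueue:
    """Stand-in for a Redis client (assumption: only rpush/blpop are needed)."""

    def __init__(self):
        self._lists = {}

    def rpush(self, name, value):
        self._lists.setdefault(name, []).append(value)

    def blpop(self, name):
        # A real client would block until an item arrives;
        # this stand-in assumes one is already queued.
        return name, self._lists[name].pop(0)


queue = InMemoryQueue()
send_command(queue, SampleCommand2())
result = run_next_command(queue)
```

With a running Redis server, a redis.Redis() client from the redis-py package could be passed in place of the stand-in, since it exposes rpush and blpop with compatible call shapes.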

The code doesn’t do anything special. We define two sample classes (SampleCommand1 and SampleCommand2), and depending on the script arguments we instantiate one of them, pickle it, and send it to Redis. The worker picks up the object from the Redis list, unpickles it, and runs the “main” method. Run side by side (master on the left, worker on the right), the scripts work pretty well:

However, note that the worker also has the definitions of the SampleCommand1 and SampleCommand2 classes (lines 6-15). What happens if we don’t know what the code coming from the master is going to look like, and all we mandate is that any class coming through the queue has a valid “main” method? If we try removing the class definitions from the worker like so:

then upon receiving a command the worker complains:

python3 worker.py 
Waiting for command...
Traceback (most recent call last):
  File "worker.py", line 15, in <module>
    main()
  File "worker.py", line 10, in main
    sample_command = pickle.loads(command_string[1])
AttributeError: Can't get attribute 'SampleCommand2' on <module '__main__' from 'worker.py'>
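The error comes from how the standard pickle module treats classes: it stores only the module and class name, and looks the class up again when loading. The sketch below reproduces the failure in a single process. The class body and the trick of deleting the definition to imitate the stripped-down worker are illustrative assumptions.

```python
import pickle
import sys


class SampleCommand2:
    def main(self):
        return "this string lives in the method body"


payload = pickle.dumps(SampleCommand2())

# The payload names the class but carries none of its code.
has_class_name = b"SampleCommand2" in payload
has_method_body = b"this string lives" in payload

# Imitate the worker without definitions by removing the class
# from the module that pickle will search when loading.
delattr(sys.modules[SampleCommand2.__module__], "SampleCommand2")

try:
    pickle.loads(payload)
    error = None
except (AttributeError, pickle.UnpicklingError) as exc:
    # Same failure as the worker traceback: the name cannot be resolved.
    error = exc
```

Here has_class_name is true and has_method_body is false, which is why the worker needs its own copy of the class to unpickle the object.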

The solution – cloudpickle

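Cloudpickle solves exactly this problem, and is used in several well-known projects such as PySpark and Dask (notably, two Python libraries dealing with parallel computing and passing arbitrary functions between different computers). Where the standard pickle module stores only a reference to a class, cloudpickle can serialize the class definition itself, for example when the class is defined in the script being run.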

After using pip to install cloudpickle, let’s change the pickle import statement in both our master and worker from

import pickle as pickle

-to-

import cloudpickle as pickle

Now, everything works as expected, without the worker needing to know what code is going to be run beforehand:
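One way to check this without Redis is to hand the payload to a second Python process that has no class definitions at all. This is only a sketch: the worker source, the pipe used in place of the queue, and the string returned by “main” are assumptions, and cloudpickle must be installed for the interpreter that runs it.

```python
import subprocess
import sys

import cloudpickle

# Hypothetical worker: note that it defines no command classes.
WORKER_SOURCE = """
import sys
import cloudpickle as pickle

sample_command = pickle.loads(sys.stdin.buffer.read())
print(sample_command.main())
"""


class SampleCommand2:
    def main(self):
        return "SampleCommand2 ran on the worker"


# The class is defined in the script being run, so cloudpickle
# ships its definition along with the instance.
payload = cloudpickle.dumps(SampleCommand2())

# Stand-in for the queue: pipe the payload to a fresh interpreter.
completed = subprocess.run(
    [sys.executable, "-c", WORKER_SOURCE],
    input=payload,
    capture_output=True,
    check=True,
)
worker_output = completed.stdout.decode().strip()
```

Swapping cloudpickle.dumps for pickle.dumps in this sketch should bring back the AttributeError from earlier, since the second process would then have to look the class up by name.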

Massive kudos to the cloudpickle team for a job well done. This library has yet to fail me – even when using monkey patching!

A word of warning: this technique is essentially remote code execution by design. Anyone who can write to the queue can run arbitrary code on the worker, so be sure to filter ports, secure your infra and so on.
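Beyond network-level measures, the Python documentation for pickle suggests signing data with hmac when you need to know it has not been tampered with. A minimal sketch of that idea follows; the key, the message layout (signature followed by payload), and the function names are assumptions, and a real deployment would need proper key management.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-shared-secret"  # hypothetical shared key
DIGEST_SIZE = hashlib.sha256().digest_size


def sign(payload: bytes) -> bytes:
    """Master side: prepend an HMAC-SHA256 signature to the pickled payload."""
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload


def verify(message: bytes) -> bytes:
    """Worker side: return the payload only if its signature checks out."""
    tag, payload = message[:DIGEST_SIZE], message[DIGEST_SIZE:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return payload


message = sign(pickle.dumps({"task": "demo"}))
restored = pickle.loads(verify(message))
```

The worker calls verify before it ever unpickles, so a payload pushed by someone without the key is rejected instead of executed. This does not protect against a master that is itself compromised.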
