Pentesting gRPC / Protobuf: Decoding First Steps

Protocol Buffers (a.k.a. ProtoBuf) and other binary serialization formats are gaining popularity, especially for communication between microservices. Unlike JSON or plain-text HTTP, ProtoBuf messages are not human readable (hence the “binary” part of binary serialization), but that buys less overhead on the wire, which improves performance, and the ability to code against a fixed schema, which acts as a kind of “type checker” when programming your APIs. From a penetration testing point of view, however, the lack of human readability makes it harder to use our existing tools such as proxies to manually or automatically fuzz for injection points within the data stream.

One popular RPC framework built on ProtoBuf is gRPC, and during the course of a pentest you might end up capturing binary traffic generated by such a framework. A packet capture of an RPC exchange could look similar to the following:

[Screenshot: packet capture of a UDP packet carrying the ProtoBuf payload]

The above shows a pcap of a simple UDP packet; the data portion contains only a few printable characters, which don’t give much information. ProtoBuf does not have to be transported over UDP either: this could just as well have been TCP or HTTP (Google Maps does this, for example).
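If the traffic turns out to be gRPC carried over HTTP/2 rather than a bare ProtoBuf payload in a UDP datagram, each message in the request or response body is preceded by a 5-byte prefix: a 1-byte compressed flag followed by a 4-byte big-endian length. That prefix has to be stripped before the bytes are handed to a ProtoBuf decoder. The following is a minimal sketch, assuming the HTTP/2 DATA payloads have already been reassembled into a single byte string; the function name `split_grpc_messages` and the sample payload are hypothetical and not part of any library.

```python
import struct


def split_grpc_messages(body):
    """Return a list of (compressed_flag, message_bytes) from a gRPC body."""
    messages = []
    pos = 0
    while pos < len(body):
        if len(body) - pos < 5:
            raise ValueError("truncated gRPC message prefix")
        # 1-byte compressed flag, then 4-byte big-endian message length
        compressed, length = struct.unpack_from(">BI", body, pos)
        pos += 5
        if pos + length > len(body):
            raise ValueError("truncated gRPC message")
        messages.append((compressed, body[pos:pos + length]))
        pos += length
    return messages


# Hypothetical example: one uncompressed message wrapped in a gRPC prefix
payload = bytes.fromhex("1203596f75")
frame = b"\x00" + len(payload).to_bytes(4, "big") + payload
messages = split_grpc_messages(frame)
```

A compressed flag of 1 means the message bytes must be decompressed (according to the `grpc-encoding` header) before they can be decoded.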

So assume you have access to the data shown in the pcap above and you’ve figured out that it’s ProtoBuf data, but you don’t have the .proto file that defines the schema/format of the binary data. What now? Ideally you’d decode the data well enough to work out its schema, which increases the chances of a successful injection attack.

It turns out this is possible using nothing other than a single-line bash command, assuming an Ubuntu/Debian system.

  • Install some pre-requisites:
sudo apt install protobuf-compiler xxd
  • Copy the data from the pcap as a hex stream (in Wireshark, right-click on the highlighted data portion and click “… as hex stream”).
  • Use the copied data in the following command:
echo 0d1c0000001203596f751a024d65202b2a0a0a066162633132331200 | xxd -r -p | protoc --decode_raw

In the above command, we pass the hex stream we just copied through xxd to convert the hex back into binary, then pipe the binary result into the protobuf compiler (protoc) to decode the raw data stream. The result is as follows:

1: 0x0000001c
2: "You"
3: "Me"
4: 43
5 {
  1: "abc123"
  2: ""
}

This now gives you a much better idea of what is being sent over the wire. There are no “field names” to provide more context (unfortunately, for that you’d need the schema .proto files), but the field numbers, nesting and values are enough to fuzz the data stream far more effectively.
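To see why no schema is needed for this step, it helps to walk the wire format by hand: every field starts with a varint key holding the field number and a wire type, and the wire type alone says how many bytes follow. The sketch below is a minimal illustration of that idea, not how protoc is implemented; the names `read_varint` and `decode_raw` are made up for this example. Unlike `protoc --decode_raw`, it does not guess whether a length-delimited field is a string or a nested message, so the caller decides which fields to decode again.

```python
def read_varint(buf, pos):
    """Read a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def decode_raw(buf):
    """Split a message into (field_number, wire_type, value) tuples."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit, little-endian
            value = int.from_bytes(buf[pos:pos + 8], "little")
            pos += 8
        elif wire_type == 2:  # length-delimited: string, bytes or sub-message
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        elif wire_type == 5:  # fixed 32-bit, little-endian
            value = int.from_bytes(buf[pos:pos + 4], "little")
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError("truncated field")
        fields.append((field_number, wire_type, value))
    return fields


raw = bytes.fromhex("0d1c0000001203596f751a024d65202b2a0a0a066162633132331200")
top = decode_raw(raw)
# Field 5 is length-delimited; here we choose to treat it as a nested message
nested = decode_raw(top[4][2])
```

Printing `top` and `nested` gives the same field numbers and values as the protoc output above, with the wire type included as an extra hint about each field’s possible types.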

Let’s double-check the result against the schema .proto files that generated the above traffic. Normally you wouldn’t have access to these in a blackbox pentest unless you somehow got access to the source code or reverse engineered / hooked into the binary. For the sake of discussion, let’s say you do have access to the .proto files; here’s the part that’s of interest to us:

[Screenshot: the .proto schema for the captured message]

Note how each field is assigned an integer identifier (e.g. =1, =2, =3 and so on). These match the numbers in our decoded result, so taking the two together we know that our decoded result actually looks something like this:

length: 0x0000001c
to: "You"
from: "Me"
transid: 43
query {
   1: "abc123"
   2: ""
}
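The relabelling is mechanical once the field numbers are known. As a small sketch, the mapping below uses the field names shown in the listing above; in a real blackbox test these names would be your own guesses, and the helper name `label` is hypothetical. Using a dict assumes no field number repeats, which holds for this capture but not for repeated fields in general.

```python
# Field number -> name, taken from the labelled listing above
MESSAGE_FIELDS = {1: "length", 2: "to", 3: "from", 4: "transid", 5: "query"}

# Values keyed by field number, copied from the protoc --decode_raw output
decoded = {1: 0x0000001C, 2: "You", 3: "Me", 4: 43, 5: {1: "abc123", 2: ""}}


def label(fields, names):
    """Replace known field numbers with names; unknown numbers stay numeric."""
    return {names.get(number, number): value for number, value in fields.items()}


labelled = label(decoded, MESSAGE_FIELDS)
```

The nested message under `query` keeps its numeric keys because its field numbers belong to a different message type and need their own mapping.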

Note how field “5” points to another schema, “query.QueryRequest”, so we need to look up the .proto file for that as well:

[Screenshot: the .proto schema for query.QueryRequest]

This leads to the final result:

length: 0x0000001c
to: "You"
from: "Me"
transid: 43
query {
   dataSet: "abc123"
   fields: ""
}
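With the layout known, the natural next step for fuzzing is to rebuild the message with a modified value and send it back. The sketch below re-encodes the five fields by hand; the helper names (`encode_varint`, `build_message` and so on) are made up for this example. It also assumes that field 1 holds the total size of the message in bytes including the field itself, which matches the capture (0x1c = 28 bytes) but is not confirmed by anything else here.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def key(field_number, wire_type):
    return encode_varint((field_number << 3) | wire_type)


def fixed32(field_number, value):
    return key(field_number, 5) + value.to_bytes(4, "little")


def varint(field_number, value):
    return key(field_number, 0) + encode_varint(value)


def length_delimited(field_number, payload):
    return key(field_number, 2) + encode_varint(len(payload)) + payload


def build_message(to, sender, transid, data_set, fields):
    query = length_delimited(1, data_set) + length_delimited(2, fields)
    body = (length_delimited(2, to) + length_delimited(3, sender)
            + varint(4, transid) + length_delimited(5, query))
    # Assumption: "length" is the whole message size, including the
    # 5 bytes (1 key byte + 4 value bytes) of the length field itself.
    return fixed32(1, 5 + len(body)) + body


original = build_message(b"You", b"Me", 43, b"abc123", b"")
fuzzed = build_message(b"You", b"Me", 43, b"abc123' OR '1'='1", b"")
```

`original.hex()` reproduces the captured hex stream, which is a quick sanity check that the encoder is right before sending `fuzzed` (or any other mutation) to the target.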

Material provided by: 

https://github.com/128technology/protobuf_dissector/tree/master/test
