Deserialize Google Protobuf binary file

10,298

I am not sure if I am answering your question at all, but I'll give you an example. Once you have your Protocol_pb2.py, you can parse the messages quite easily with the protobuf Python API (and possibly struct). I am not an expert in protobuf, but I have at least parsed Mumble messages. I don't know exactly how or for what you are using protobuf, but this example shows how to parse Mumble messages (which use protobuf) and will hopefully give you some insight. So:

Import the protoc-generated module (you create it as described in my answer to your last question) and struct:

import struct
import Mumble_pb2 as mumble_protobuf

The mumble_protobuf module contains the message types defined in your .proto file, converted to Python classes. You can store the message types in, for example, a dict keyed by packet type:

PACKET_TYPES = {
    0: mumble_protobuf.Version,
    1: mumble_protobuf.UDPTunnel,
    2: mumble_protobuf.Authenticate,
    # ... (remaining message types omitted)
}

I'm skipping some things here, but once you receive the binary data of a packet you can parse it. Each application frames its messages differently; Mumble, for example, prefixes each protobuf message with 2 bytes holding the message type and 4 bytes holding the payload length. Your application probably does this differently. Whatever the case, you must somehow know what kind of message you are parsing (assuming your protocol has multiple message types). As an example:

# The header format matches Mumble's framing: a 6-byte prefix sent before each protobuf message
HEADER_FORMAT = ">HI"  # big endian: unsigned short (type) + unsigned int (length)
packet_type, packet_length = struct.unpack_from(HEADER_FORMAT, buffer)
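To make the 6-byte header concrete, here is a worked example with hand-made bytes (the values are hypothetical, not captured Mumble traffic; type 2 is what the dict above maps to Authenticate). It also shows why the `>` (big-endian) prefix matters: the same bytes read as little endian give nonsense.

```python
import struct

HEADER_FORMAT = ">HI"  # big endian: unsigned short + unsigned int, no padding

# Hypothetical header: message type 2, payload length 5
header = bytes([0x00, 0x02, 0x00, 0x00, 0x00, 0x05])

packet_type, packet_length = struct.unpack_from(HEADER_FORMAT, header)
print(packet_type, packet_length)  # 2 5

# Reading the same bytes as little endian gives very different numbers
wrong_type, wrong_length = struct.unpack_from("<HI", header)
print(wrong_type, wrong_length)  # 512 83886080

# Building a header is the inverse operation
assert struct.pack(HEADER_FORMAT, 2, 5) == header
print(struct.calcsize(HEADER_FORMAT))  # 6
```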

If you have multiple messages in a buffer, you need to extract each packet's data from the buffer based on the packet length. Once you know the message type, you can parse it: look up the class corresponding to the message type from the pb2 module and parse the message with it.

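HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 6 bytes
packet_data = buffer[HEADER_SIZE:HEADER_SIZE + packet_length]  # this packet's payload
MessageClass = PACKET_TYPES[packet_type]  # generated class for this message type
message = MessageClass()
message.ParseFromString(packet_data)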
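For the multi-message case, the splitting loop can be sketched with the standard library alone, assuming the same Mumble-style framing (2-byte type, 4-byte length, big endian). The helper names `iter_packets` and `frame` are hypothetical, made up for illustration; each yielded payload is what you would hand to `PACKET_TYPES[packet_type]().ParseFromString(...)`.

```python
import struct
from typing import Iterator, Tuple

# Assumed framing, as in the Mumble example: 2-byte type + 4-byte length, big endian
HEADER_FORMAT = ">HI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 6 bytes


def iter_packets(buffer: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (packet_type, packet_data) for every complete packet in buffer.

    A trailing partial packet (header or payload cut off) is not yielded;
    a real socket reader would keep those bytes and wait for more data.
    """
    offset = 0
    while offset + HEADER_SIZE <= len(buffer):
        packet_type, packet_length = struct.unpack_from(HEADER_FORMAT, buffer, offset)
        start = offset + HEADER_SIZE
        end = start + packet_length
        if end > len(buffer):
            break  # payload not fully received yet
        yield packet_type, buffer[start:end]
        offset = end


def frame(packet_type: int, payload: bytes) -> bytes:
    """Build one framed packet (the inverse of the parsing above)."""
    return struct.pack(HEADER_FORMAT, packet_type, len(payload)) + payload


# Usage: two complete packets followed by a truncated third one
buf = frame(0, b"\x08\x01") + frame(2, b"abc") + frame(1, b"xyz")[:4]
packets = list(iter_packets(buf))
print(packets)  # [(0, b'\x08\x01'), (2, b'abc')]
```

Stopping at an incomplete packet instead of raising is a design choice for streaming input, where the rest of the bytes may simply not have arrived yet.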

Now message contains the parsed protobuf message, and you can use it just as you would when constructing one, e.g. by accessing its fields as attributes: message.UserName or whatever your field is called.

Author: Mihir Patel


Updated on June 04, 2022

Comments

  • Mihir Patel, almost 2 years ago

    Google Protobuf has confused me more than ever and I am trying to understand how things work.

    1. Please help me understand whether I am understanding this correctly. The .proto file defines the message structure, and protoc is a compiler. The data is compiled into a binary file (.pb). Correct? If not, can you please help me understand? I can't make sense of the Google Protobuf docs; they are quite confusing. And I haven't had any luck with Stack Overflow or other blogs.

    2. Importantly, I can't modify my C++ code where the logic is defined. That being said, I'd like to deserialize filename.pb (a binary file) and parse the results in Python. Is this possible?

    Thanks for your help in advance!!

  • Mihir Patel, over 7 years ago
    Thanks for the detailed explanation, but this really doesn't solve my problem. I don't have much control over an external server and can't modify code to generate the Protocol_pb2.py file. The .pb binary file generated by C++ code needs to be deserialized so I can parse it.
  • Teemu Risikko, over 7 years ago
    Oh, so I completely misunderstood you. You do not have access to the .proto file? Actually, I might not understand what you want at all :S What is your goal? Do you want to somehow generate the pb2.py from the .pb binary?
  • Mihir Patel, over 7 years ago
    Well, your answer helped me move in the right direction. Let's just say protobuf is not intuitive to understand, and it was difficult for me to explain. I have the .proto files but I don't have the _pb2.py files yet. I just figured out that I need to compile my .proto files for Python so the output is _pb2.py files. Then I can figure out how to parse. I have a complex app, so I need to gather all the .proto files, which get compiled into a single output.
  • Teemu Risikko, over 7 years ago
    I thought I explained how to convert *.proto to *_pb2.py with protoc in your last question :P
  • Mihir Patel, over 7 years ago
    Yes, it works on my local machine, but not on an external server. I didn't understand the compilation process. All good now. Both answers helped!
  • Tanny, almost 7 years ago
    This much info is readily available; in fact, the first few links from a Google search point to what you have posted. The questioner wants more details.
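Regarding the question in the comments (deserializing filename.pb in Python when no _pb2.py module is at hand): the bytes can still be inspected at the wire-format level, similar in spirit to `protoc --decode_raw`, which reads a message from stdin. The sketch below is stdlib only and assumes the file holds exactly one serialized message with no extra framing; if the C++ side writes several length-prefixed messages, split them first. The function names `read_varint` and `decode_raw` are hypothetical. Field names and exact types are not recoverable this way; for those you still need the .proto, `protoc --python_out`, and `ParseFromString` on the file's bytes.

```python
from typing import List, Tuple, Union

Value = Union[int, bytes]


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry payload
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_raw(data: bytes) -> List[Tuple[int, int, Value]]:
    """Split one serialized message into (field_number, wire_type, value) triples.

    Length-delimited values (wire type 2) stay raw bytes: without the .proto
    there is no way to tell strings, nested messages and packed fields apart.
    """
    fields: List[Tuple[int, int, Value]] = []
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 1:  # 64-bit fixed, little endian
            value, pos = int.from_bytes(data[pos:pos + 8], "little"), pos + 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:  # 32-bit fixed, little endian
            value, pos = int.from_bytes(data[pos:pos + 4], "little"), pos + 4
        else:  # groups (3, 4) are deprecated and not handled here
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(data):
            raise ValueError("truncated field")
        fields.append((field_number, wire_type, value))
    return fields


# Field 1 = varint 150, field 2 = the string "testing"
print(decode_raw(b"\x08\x96\x01\x12\x07testing"))  # [(1, 0, 150), (2, 2, b'testing')]
```

To try it on a real file, read it in binary mode (`open("filename.pb", "rb").read()`) and pass the bytes to `decode_raw`; any wire-type-2 value that looks like a nested message can be fed to `decode_raw` again.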