Protocol buffers python - unicode decode error

Solution 1

I had exactly the same problem with RabbitMQ and Protocol Buffers. Protocol Buffers expects its input to be of type str (a byte string in Python 2), whereas RabbitMQ seems to decode the message as unicode in some cases (when the byte array contains bytes greater than 127). The same may happen with Tornado. So far, the following piece of code seems to solve the problem:

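body = self.request.body
# The transport may hand back a unicode object instead of a byte string (Python 2).
if isinstance(body, unicode):
    # Re-encode as UTF-8 to get a plain byte string that protobuf can parse.
    body = body.encode("utf-8")
message = whatever.FromString(body)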

This code turns the unicode string into a Python bytes object, which Protocol Buffers messages can parse. There may be a better way to do this, but this at least seems to work.
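The same idea can be written so that it runs on both Python 2 and Python 3. This is only a minimal sketch: `ensure_bytes` is a hypothetical helper name (it is not part of protobuf, RabbitMQ, or Tornado), and it assumes the transport decoded the payload as UTF-8, so that re-encoding with the same codec recovers the original bytes.

```python
def ensure_bytes(body, encoding="utf-8"):
    """Return body as a byte string, encoding it first if it is text.

    Hypothetical helper; assumes text bodies were decoded with `encoding`.
    """
    if isinstance(body, bytes):
        # Already a byte string (str on Python 2, bytes on Python 3).
        return body
    # Text object (unicode on Python 2, str on Python 3): re-encode it.
    return body.encode(encoding)
```

It could then be used as `message = whatever.FromString(ensure_bytes(self.request.body))`.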

Solution 2

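I encountered the same problem.

Here is a reference [1].

In this case we should use the bytes field type instead of string: per the reference, a string field must always contain UTF-8 encoded (or 7-bit ASCII) text, while a bytes field may contain any arbitrary sequence of bytes.

Thanks.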

[1] https://developers.google.com/protocol-buffers/docs/proto#scalar
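To illustrate why arbitrary binary data fails in a string field, the sketch below checks whether a payload is valid UTF-8. `is_valid_utf8` is a hypothetical helper, and the sample byte strings are made up to resemble the three errors in the question; they are not taken from real messages.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Properly encoded text is fine in a string field.
print(is_valid_utf8(u"caf\u00e9".encode("utf-8")))  # True

# 0xce opens a two-byte sequence but is not followed by a continuation byte.
print(is_valid_utf8(b"a\xce\x41"))  # False

# 0xbf is a continuation byte and cannot start a sequence.
print(is_valid_utf8(b"abc\xbf"))  # False

# 0xe7 opens a three-byte sequence that is cut off by the end of the data.
print(is_valid_utf8(b"abc\xe7"))  # False
```

If a payload like the last three can legitimately occur in a field, that field probably holds binary data rather than text, which is the case the bytes type is meant for.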

Author: Aditya Singh

Updated on June 21, 2022

Comments

  • Aditya Singh almost 2 years

    I need to receive a Protocol Buffers message on my Python Tornado server and extract the fields from the binary message.

    postContent = self.request.body
    message = prototemp.ReqMessage()
    message.ParseFromString(postContent)
    

    It works perfectly using a test tool. When I run it in a sandbox environment and simulate 1000 requests from my client, it works in some cases, but for most of the requests it throws an exception:

      File "server1.py", line 21, in post
        message.ParseFromString(postContent)
      File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/message.py", line 179, in ParseFromString
        self.MergeFromString(serialized)
      File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/python_message.py", line 755, in MergeFromString
        if self._InternalParse(serialized, 0, length) != length:
      File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/python_message.py", line 782, in InternalParse
        pos = field_decoder(buffer, new_pos, end, self, field_dict)
      File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/decoder.py", line 544, in DecodeField
        if value._InternalParse(buffer, pos, new_pos) != new_pos:
      File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/python_message.py", line 782, in InternalParse
        pos = field_decoder(buffer, new_pos, end, self, field_dict)
      File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/decoder.py", line 410, in DecodeField
        field_dict[key] = local_unicode(buffer[pos:new_pos], 'utf-8')
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xce in position 1: invalid continuation byte
    

    In some other cases it gives these errors:

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 3: invalid start byte
    
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xe7 in position 3: unexpected end of data
    

    What could be the reason?