How to include picture bytes in JSON with Python? (encoding issue)

Solution 1

JSON is a text format, so it can only carry Unicode text. Binary image data is not text, so when the json.dumps() function tries to decode the bytestring to unicode using UTF-8 (the default), that decoding fails.

You'll have to wrap your binary data in a text-safe encoding first, such as Base-64:

json.dumps({'picture' : data.encode('base64')})

Of course, this then assumes that the receiver expects your data to be wrapped so.
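A minimal sketch of that round trip, using the standard base64 module instead of the Python 2-only data.encode('base64') spelling. It is written for Python 3 as an assumption, and the sample bytes are a stand-in for the downloaded image, not the real file:

```python
import base64
import json

# Stand-in for the image bytes: the PNG signature followed by every byte value.
data = b'\x89PNG\r\n\x1a\n' + bytes(range(256))

# Sender: Base64 yields ASCII-only bytes, so decoding them to text is safe.
payload = json.dumps({'picture': base64.b64encode(data).decode('ascii')})

# Receiver: reverse the two steps to recover the original bytes.
restored = base64.b64decode(json.loads(payload)['picture'])
```

The receiver has to know that the 'picture' field is Base64 text, which is the assumption mentioned above.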

If your API endpoint is so badly designed that it expects your image bytes to be passed in as text, then the alternative is to pretend that your bytes really are text; if you first decode the data as Latin-1, each byte maps straight to the Unicode codepoint with the same value:

json.dumps({'picture' : data.decode('latin-1')})

With the data already a unicode object the json library will then proceed to treat it as text. This does mean that it can replace non-ASCII codepoints with \uhhhh escapes.
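To illustrate the Latin-1 route, here is a small sketch (Python 3 assumed; the eight sample bytes are the PNG signature, used as a stand-in for real image data). It shows the escaping mentioned above and how a receiver could get the bytes back:

```python
import json

# Stand-in for image data: the 8-byte PNG signature.
data = b'\x89PNG\r\n\x1a\n'

# Latin-1 maps byte 0xNN to code point U+00NN, so decoding never fails.
text = data.decode('latin-1')
payload = json.dumps({'picture': text})

# With the default ensure_ascii=True, byte 0x89 is written as the
# six-character escape \u0089, so the payload can grow considerably.
restored = json.loads(payload)['picture'].encode('latin-1')
```

This only works if the receiver applies the same Latin-1 mapping in reverse.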

Solution 2

Space-wise, the best solution that comes to mind for this situation is base85 encoding, which represents every four bytes as five characters (base64 needs four characters for every three bytes). You could also map every byte to the corresponding character in the U+0000-U+00FF range and then dump that in the JSON. Still, those methods may be overkill here, and for ease of use base64 would be the winner.
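As a rough size comparison, the sketch below encodes the same bytes both ways. It assumes Python 3.4 or later, where base64.b85encode is available; the 1200 sample bytes are made up for illustration:

```python
import base64
import json

# 1200 arbitrary bytes standing in for image data.
data = bytes(range(240)) * 5

b64_text = base64.b64encode(data).decode('ascii')  # 4 characters per 3 bytes
b85_text = base64.b85encode(data).decode('ascii')  # 5 characters per 4 bytes

b64_len = len(b64_text)
b85_len = len(b85_text)

# The base85 text still round-trips through JSON like the base64 one.
payload = json.dumps({'picture': b85_text})
restored = base64.b85decode(json.loads(payload)['picture'])
```

For these 1200 bytes, base64 produces 1600 characters and base85 produces 1500, so the saving is real but modest.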

Author: Thom

Updated on July 29, 2022

Comments

  • Thom almost 2 years

    I would like to include picture bytes in a JSON document, but I am struggling with an encoding issue:

    import urllib
    import json
    
    data = urllib.urlopen('https://www.python.org/static/community_logos/python-logo-master-v3-TM-flattened.png').read()
    json.dumps({'picture' : data})
    

    UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: invalid start byte

    I don't know how to deal with this, since I am handling an image rather than text, so the encoding error confuses me. I am using Python 2.7. Can anyone help me? :)