json.dump - UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 0: invalid start byte


The exception is caused by the contents of your data dictionary: at least one of the keys or values is not UTF-8 encoded.

You'll have to replace this value, either by substituting a value that is UTF-8 encoded, or by decoding just that value to a unicode object with whatever encoding is correct for it. For example:

data['142'] = data['142'].decode('latin-1')

decodes that string as a Latin-1-encoded value instead.
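As a rough sketch of the same idea applied to a whole dictionary: the snippet below is written for Python 3, where the question's byte strings become `bytes` objects (the question itself uses Python 2). The helper name `decode_value` and the sample entry `"141"` are made up for illustration, and Latin-1 as the fallback encoding is an assumption; use whatever encoding your data really has.

```python
import json


def decode_value(raw, fallback="latin-1"):
    """Return text for raw bytes: try UTF-8 first, then the fallback encoding.

    Hypothetical helper, not part of the json module.
    """
    if not isinstance(raw, bytes):
        return raw  # already text, nothing to decode
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: values that are not valid UTF-8 are Latin-1.
        return raw.decode(fallback)


# Sample data: "142" holds the byte 0xbf from the question, "141" is invented.
data = {"141": b"plain", "142": b"\xbf/ANCT25"}

clean = {key: decode_value(value) for key, value in data.items()}
encoded = json.dumps(clean, indent=4)
print(encoded)
```

Latin-1 maps every byte to a character, so the fallback never raises; the trade-off is that it will silently produce the wrong characters if the data was actually in some other encoding.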

Author: Belphegor

Updated on July 09, 2022

Comments

  • Belphegor
    Belphegor almost 2 years

    I have a dictionary data where I have stored:

    • key - ID of an event

    • value - the name of this event, where value is a UTF-8 string

    Now, I want to write this map to a JSON file. I tried this:

    with open('events_map.json', 'w') as out_file:
        json.dump(data, out_file, indent = 4)
    

    but this gives me the error:

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 0: invalid start byte

    Now, I also tried with:

    with io.open('events_map.json', 'w', encoding='utf-8') as out_file:
       out_file.write(unicode(json.dumps(data, encoding="utf-8")))
    

    but this raises the same error:

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 0: invalid start byte

    I also tried with:

    with io.open('events_map.json', 'w', encoding='utf-8') as out_file:
        out_file.write(unicode(json.dumps(data, encoding="utf-8", ensure_ascii=False)))
    

    but this raises the error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xbf in position 3114: ordinal not in range(128)

    Any suggestions on how I can solve this problem?

    EDIT: I believe this is the line that is causing me the problem:

    > data['142']
    '\xbf/ANCT25'
    

    EDIT 2: The data variable is read from a file. So, after reading it from a file:

    data_file_lines = io.open(file_name, 'r', encoding='utf8').readlines()
    

    I then do:

    with io.open('data/events_map.json', 'w', encoding='utf8') as json_file:
        json.dump(data, json_file, ensure_ascii=False)
    

    Which gives me the error:

    TypeError: must be unicode, not str

    Then, I try to do this with the data dictionary:

    # the `data` variable is initialized from these tuples
    for tuple in sorted_tuples:
        data[str(tuple[1])] = json.dumps(tuple[0], ensure_ascii=False, encoding='utf8')
    

    which is, again, followed by:

    with io.open('data/events_map.json', 'w', encoding='utf8') as json_file:
        json.dump(data, json_file, ensure_ascii=False)
    

    but again, the same error:

    TypeError: must be unicode, not str
    

    I get the same error when I use the simple open function for reading from the file:

    data_file_lines = open(file_name, "r").readlines()
    
  • Belphegor
    Belphegor over 9 years
    I read these values from a file. You were correct about the inverted question mark, so I replaced that character with another UTF-8 character (the letter "é"). With your solution data['142'].decode('latin-1') it doesn't raise any errors, but in the final JSON file I have "142": "\u00e9ANCT25" instead of the expected "142": "éANCT25". I tried to read the file with codecs.open(file_name, "r", "utf-8"), but then I get: UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 2526468: invalid continuation byte. How do I solve this problem so that the real characters are written to the JSON file?
  • Martijn Pieters
    Martijn Pieters over 9 years
    \u00e9 is a valid JSON escape sequence; do you absolutely have to have the Unicode character instead of the JSON \uxxxx escape sequence?
  • Martijn Pieters
    Martijn Pieters over 9 years
    @Belphegor: see Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence for how to produce such data.
  • Belphegor
    Belphegor over 9 years
    Thanks for the help, but this didn't help me. It still doesn't work. I edited my question where I describe what else I've tried (in "Edit 2"). Any other suggestion?
  • Belphegor
    Belphegor over 9 years
    Never mind, I've solved it finally! I got the answer from here: stackoverflow.com/questions/12309269/… (the code for Python 2.x). Anyway, @Martijn Pieters , I wouldn't have done it without you, so I am accepting your answer. But, please add the answer from the link I've provided in your answer, so it would be clearer if someone else bumps into the same problem. Cheers!
  • Belphegor
    Belphegor over 9 years
    FYI: I already edited your answer with the final version of my code, but I don't know if it's going to be approved by the moderators. Anyway, tnx for the help!
  • Blairg23
    Blairg23 over 8 years
    Thanks, that answer at stackoverflow.com/questions/12309269/… worked for me too!
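For the follow-up in the comments (getting "éANCT25" in the file rather than "\u00e9ANCT25"), here is a minimal sketch. It assumes Python 3 and that the values have already been decoded to text; the temporary directory is only there to keep the example self-contained. On Python 2, the question reports that this same `json.dump` call fails with `TypeError: must be unicode, not str`, so treat this as an illustration of `ensure_ascii=False`, not as the code from the linked answer.

```python
import io
import json
import os
import tempfile

# Assumption: the value is already text (unicode), not raw bytes.
data = {"142": "\u00e9ANCT25"}  # the text "éANCT25"

# Hypothetical output location, used only so the example can run anywhere.
path = os.path.join(tempfile.mkdtemp(), "events_map.json")

# ensure_ascii=False keeps non-ASCII characters as-is instead of \uXXXX escapes;
# the file's encoding then decides how they are stored on disk.
with io.open(path, "w", encoding="utf-8") as out_file:
    json.dump(data, out_file, ensure_ascii=False, indent=4)

# Read the file back to inspect what was actually written.
with io.open(path, "r", encoding="utf-8") as in_file:
    written = in_file.read()

print(written)
```

Both forms are valid JSON and load back to the same value, so the escape sequences only matter if the file needs to be readable by people.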