json.dump - UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 0: invalid start byte

python json unicode encoding utf-8

The exception is caused by the contents of your data dictionary: at least one of the keys or values is not UTF-8 encoded.

You'll have to replace this value, either by substituting a value that is UTF-8 encoded, or by decoding just that value to a unicode object using whatever encoding is correct for it:

data['142'] = data['142'].decode('latin-1')

to decode that string as a Latin-1-encoded value instead.
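
To see why the Latin-1 decode succeeds where UTF-8 fails, here is a minimal sketch. It assumes Python 3 syntax, where the Python 2 `str` from the question corresponds to `bytes`. The variable names are illustrative only, and whether Latin-1 is really the right encoding for the data is itself an assumption.

```python
# The byte string from the question; in Python 2 this would be a plain str.
raw = b'\xbf/ANCT25'

try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    # 0xbf is a UTF-8 continuation byte, so it cannot start a character.
    utf8_ok = False

# Latin-1 maps every byte to a code point, so this decode cannot fail.
# It yields U+00BF (inverted question mark) followed by '/ANCT25'.
text = raw.decode('latin-1')

print(utf8_ok)
print(ascii(text))
```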

Author by

Belphegor

Updated on July 09, 2022

Comments

Belphegor almost 2 years
I have a dictionary data where I have stored:
- key - ID of an event
- value - the name of this event, where value is a UTF-8 string
Now I want to write this map to a JSON file. I tried this:
```
with open('events_map.json', 'w') as out_file:
    json.dump(data, out_file, indent = 4)
```
but this gives me the error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 0: invalid start byte

Now, I also tried with:
```
with io.open('events_map.json', 'w', encoding='utf-8') as out_file:
    out_file.write(unicode(json.dumps(data, encoding="utf-8")))
```
but this raises the same error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 0: invalid start byte

I also tried with:
```
with io.open('events_map.json', 'w', encoding='utf-8') as out_file:
    out_file.write(unicode(json.dumps(data, encoding="utf-8", ensure_ascii=False)))
```
but this raises the error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xbf in position 3114: ordinal not in range(128)

Any suggestions on how I can solve this problem?

EDIT: I believe this is the line that is causing me the problem:
```
> data['142']
'\xbf/ANCT25'
```
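
When it is not obvious which entry is at fault, a small scan over the dictionary can help. The sketch below is a hypothetical helper (`find_non_utf8` is not part of the `json` module), written in Python 3 syntax, and it only checks byte-string values, not keys. The sample data is invented, apart from the `'142'` entry taken from the question.

```python
def find_non_utf8(mapping):
    """Return the keys whose byte-string values are not valid UTF-8."""
    bad = []
    for key, value in mapping.items():
        if isinstance(value, bytes):
            try:
                value.decode('utf-8')
            except UnicodeDecodeError:
                bad.append(key)
    return sorted(bad)


# Invented sample data; only the '142' entry mirrors the question.
sample = {
    '141': b'CONCERT',
    '142': b'\xbf/ANCT25',
    '143': 'caf\u00e9'.encode('utf-8'),
}

print(find_non_utf8(sample))
```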
EDIT 2: The data variable is read from a file. So, after reading it from a file:
```
data_file_lines = io.open(file_name, 'r', encoding='utf8').readlines()
```
I then do:
```
with io.open('data/events_map.json', 'w', encoding='utf8') as json_file:
    json.dump(data, json_file, ensure_ascii=False)
```
Which gives me the error:

TypeError: must be unicode, not str

Then, I try to do this with the data dictionary:
```
for tuple in sorted_tuples:  # the `data` variable is initialized from these tuples
    data[str(tuple[1])] = json.dumps(tuple[0], ensure_ascii=False, encoding='utf8')
```
which is, again, followed by:
```
with io.open('data/events_map.json', 'w', encoding='utf8') as json_file:
    json.dump(data, json_file, ensure_ascii=False)
```
but again, the same error:
```
TypeError: must be unicode, not str
```
I get the same error when I use the simple open function for reading from the file:
```
data_file_lines = open(file_name, "r").readlines()
```
Belphegor over 9 years

I read these values from a file. You were correct about the inverted question mark, so I replaced that value with another UTF-8 character (the letter "é"). With your solution data['142'].decode('latin-1') it doesn't raise any errors, but in the final JSON file I have "142": "\u00e9ANCT25" instead of the expected "142": "éANCT25". I tried to read the file with codecs.open(file_name, "r", "utf-8"), but then I get: UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 2526468: invalid continuation byte. How do I solve this problem so that the real characters are written to the JSON file?
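
The "invalid continuation byte" error for 0xe9 is what you would expect if the input file is stored as Latin-1 (or a similar single-byte encoding) rather than UTF-8. That is an assumption about the file, not something the thread confirms. A minimal Python 3 sketch, using a throwaway temporary file with invented contents:

```python
import os
import tempfile

# Create a small Latin-1 encoded file to stand in for the real input.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write('142\t\u00e9ANCT25\n'.encode('latin-1'))

# Reading it as UTF-8 fails: 0xe9 announces a multi-byte sequence,
# but the byte that follows is not a continuation byte.
try:
    with open(path, encoding='utf-8') as f:
        f.read()
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Reading it with the encoding it was written in succeeds.
with open(path, encoding='latin-1') as f:
    line = f.read()

os.remove(path)
print(utf8_failed)
print(ascii(line))
```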
Martijn Pieters over 9 years

\u00e9 is a valid JSON escape sequence; do you absolutely have to have the Unicode character instead of the JSON \uxxxx escape sequence?
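
To illustrate the point about escape sequences: both spellings decode to the same string, so any JSON parser will give back "é" either way. A short Python 3 sketch with an invented sample entry:

```python
import json

entry = {'142': '\u00e9ANCT25'}

escaped = json.dumps(entry)                       # default: ensure_ascii=True
literal = json.dumps(entry, ensure_ascii=False)   # keeps the character as-is

# The two documents differ as text but parse to identical data.
same_data = json.loads(escaped) == json.loads(literal) == entry

print(escaped)
print(same_data)
```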
Martijn Pieters over 9 years

@Belphegor: see "Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence" for how to produce such data.
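
For readers on Python 3, a rough sketch of writing the real characters to the file looks like this. It is not the code from the linked question, just one way to do it, assuming the values have already been decoded to text. The temporary path and sample entry are invented for illustration.

```python
import io
import json
import os
import tempfile

data = {'142': '\u00e9ANCT25'}  # sample entry, already decoded to text

# Hypothetical output location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'events_map.json')

# ensure_ascii=False keeps non-ASCII characters unescaped;
# the file's encoding argument decides how they are stored on disk.
with io.open(path, 'w', encoding='utf-8') as out_file:
    json.dump(data, out_file, ensure_ascii=False, indent=4)

# Read the raw bytes back to check what was written.
with io.open(path, 'rb') as f:
    raw = f.read()
```

In Python 2, `json.dump` with `ensure_ascii=False` can hand byte-string chunks to a text-mode file, which would explain the `TypeError: must be unicode, not str` in the question. Building the string with `json.dumps` first and writing it in one call avoids that.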
Belphegor over 9 years

Thanks for the help, but this didn't help me. It still doesn't work. I edited my question where I describe what else I've tried (in "Edit 2"). Any other suggestion?
Belphegor over 9 years

Never mind, I've finally solved it! I got the answer from here: stackoverflow.com/questions/12309269/… (the code for Python 2.x). Anyway, @Martijn Pieters, I wouldn't have done it without you, so I am accepting your answer. But please add the answer from the link I've provided to your answer, so it is clearer if someone else bumps into the same problem. Cheers!
Belphegor over 9 years

FYI: I already edited your answer with the final version of my code, but I don't know if it's going to be approved by the moderators. Anyway, thanks for the help!
Blairg23 over 8 years

Thanks, that answer at stackoverflow.com/questions/12309269/… worked for me too!