How to fix json.dumps error "'utf8' codec can't decode byte 0xe0 in position 2"?

Any byte string (in Python 2, any string that is not a unicode string is a byte string) is decoded to Unicode first when creating the JSON output. By default, json.dumps() uses UTF-8 for that, but your input data is not UTF-8.
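To see why the default fails, here is a minimal sketch that decodes the string from the question with both codecs. It uses a `bytes` literal so that it should behave the same on Python 2 and Python 3; the variable names are only illustrative.

```python
# The byte string from the question; 0xe0 is "à" in Latin-1.
raw = b'Tr\xe0n V?n H\xf9ng'

try:
    raw.decode('utf-8')
    error = None
except UnicodeDecodeError as exc:
    # In UTF-8, 0xe0 starts a three-byte sequence, but the next byte ("n")
    # is not a continuation byte, so decoding fails at position 2.
    error = exc

print(error.start)

# Latin-1 maps every possible byte to a code point, so this cannot fail.
text = raw.decode('latin1')
print(repr(text))
```

Note that Latin-1 decoding succeeding does not prove the data really is Latin-1, since that codec accepts any byte sequence. It only matches what the bytes in the question look like.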

Tell json.dumps() what encoding to use instead, or decode your strings to unicode yourself. Here, you appear to be using Latin-1 strings, so use that:

json.dumps(a, encoding='latin1')

Demo:

>>> import json
>>> a = {'code': 'exam', 'list': [{'note': '2', 'right': '2', 'question': 'Tr\xe0n V?n H\xf9ng', 'answers': ['etreetetetetret', 'reteretet', 'tedtetetet', 'etetetet']}], 'id': 1, 'level': 1}
>>> json.dumps(a, encoding='latin1')
'{"code": "exam", "list": [{"note": "2", "right": "2", "question": "Tr\\u00e0n V?n H\\u00f9ng", "answers": ["etreetetetetret", "reteretet", "tedtetetet", "etetetet"]}], "id": 1, "level": 1}'
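The other option mentioned above, decoding the strings yourself, could look roughly like the sketch below. `decode_bytes` is a hypothetical helper, not part of the json module, and Latin-1 is assumed as in the rest of this answer. Because it does not rely on the `encoding` keyword, which only exists in Python 2's json.dumps(), the same code should also work on Python 3.

```python
import json


def decode_bytes(obj, encoding='latin1'):
    """Recursively decode byte strings inside dicts and lists (hypothetical helper)."""
    if isinstance(obj, bytes):  # on Python 2, bytes is an alias for str
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {decode_bytes(key, encoding): decode_bytes(value, encoding)
                for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [decode_bytes(item, encoding) for item in obj]
    return obj  # numbers, None, and existing unicode strings pass through


# A shortened version of the data from the question, written as byte strings.
a = {b'code': b'exam', b'list': [{b'question': b'Tr\xe0n V?n H\xf9ng'}], b'id': 1}

result = json.dumps(decode_bytes(a), sort_keys=True)
print(result)
```

Decoding up front keeps the choice of encoding next to the code that reads the data, so json.dumps() only ever sees unicode strings.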
Author by

langiac

Updated on June 05, 2022

Comments

  • langiac about 2 years

    I have

    import json 
    a = {'code': 'exam', 'list': [{'note': '2', 'right': '2', 'question': 'Tr\xe0n V?n H\xf9ng', 'answers': ['etreetetetetret', 'reteretet', 'tedtetetet', 'etetetet']}], 'id': 1, 'level': 1}
    
    json.dumps(a)
    

    ===> error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 2: invalid continuation byte

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
        sort_keys=sort_keys, **kw).encode(obj)
      File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
        chunks = self.iterencode(o, _one_shot=True)
      File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
        return _iterencode(o, 0)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 2: invalid continuation byte