UnicodeDecodeError: 'gbk' codec can't decode byte when reading JSON that contains Chinese


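You need to specify the correct encoding when you open the file. In Python 3, open() without an encoding argument decodes the file with the locale's preferred encoding, which on a Chinese-locale Windows system is GBK. In Python 2, open() returned raw bytes and json.load decoded them as UTF-8 by default, which is why the same code used to work. If the JSON is encoded with UTF-8, you can do this: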

import json

fname = "test.json"
# Decode the file as UTF-8 instead of the locale's default encoding (GBK here).
with open(fname, encoding='utf-8') as data_file:
    data = json.load(data_file)

print(data)

Output:

[{'name': 'Daybreakers', 'detail_url': 'http://www.movieinsider.com/m4120/daybreakers/', 'movie_tt_id': '中文'}]
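A minimal sketch, assuming Python 3.6 or later, that recreates the question's test.json in a temporary directory (the temporary path is only for illustration) and loads it two ways: with an explicit encoding as in the answer, and by reading raw bytes and letting json.loads() detect the UTF encoding itself. The first print shows the encoding open() falls back to when none is given; on a Chinese-locale Windows machine this is typically cp936, Python's alias for GBK.

```python
import json
import locale
import os
import tempfile

# The encoding open() uses when no encoding= argument is given.
print(locale.getpreferredencoding(False))

# Recreate the question's test.json as a UTF-8 file (illustrative temp path).
records = [{"name": "Daybreakers",
            "detail_url": "http://www.movieinsider.com/m4120/daybreakers/",
            "movie_tt_id": "中文"}]
path = os.path.join(tempfile.mkdtemp(), "test.json")
with open(path, "w", encoding="utf-8") as f:
    # ensure_ascii=False writes the Chinese characters as real UTF-8 bytes.
    json.dump(records, f, ensure_ascii=False)

# Option 1: name the encoding explicitly.
with open(path, encoding="utf-8") as data_file:
    data = json.load(data_file)

# Option 2: read raw bytes; json.loads() accepts bytes (Python 3.6+)
# and detects UTF-8, UTF-16 or UTF-32 on its own.
with open(path, "rb") as data_file:
    data_from_bytes = json.loads(data_file.read())

print(data == data_from_bytes == records)  # True
```

Option 1 is the more general fix, since the same encoding='utf-8' argument works for any text file; option 2 only helps for JSON, and only when the file is in one of the UTF encodings.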
Author: cqcn1991

Updated on June 11, 2022

Comments

  • cqcn1991

    I'm switching from Python 2 to Python 3.

    In my Jupyter notebook, the code is:

    import json

    file = "./data/test.json"
    with open(file) as data_file:
        data = json.load(data_file)
    

    It worked fine with Python 2, but now that I've switched to Python 3, it gives me this error:

    UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 123: illegal multibyte sequence
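The error can be reproduced without a GBK locale by decoding the UTF-8 bytes of "中文" as GBK, which is roughly what open() does implicitly when the locale encoding is GBK. In this small sketch (variable names are illustrative), the first two bytes happen to form a valid GBK character, so decoding only fails at the third byte, 0xad, the same byte named in the traceback.

```python
# UTF-8 encodes each of these two Chinese characters as three bytes.
raw = "中文".encode("utf-8")
print(raw)  # b'\xe4\xb8\xad\xe6\x96\x87'

error = None
try:
    # Decode UTF-8 bytes with the wrong codec, as a GBK-locale open() would.
    raw.decode("gbk")
except UnicodeDecodeError as err:
    error = err  # keep a reference; 'err' is cleared after the except block

# Prints the codec name, the offending byte and the decoder's reason.
print(error.encoding, hex(raw[error.start]), error.reason)
```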
    

    The test.json file looks like this:

    [{
        "name": "Daybreakers",
        "detail_url": "http://www.movieinsider.com/m4120/daybreakers/",
        "movie_tt_id": "中文"
      }]
    

    If I delete the Chinese characters, there is no error.
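This is expected: ASCII bytes mean the same thing in GBK and UTF-8, so a file containing only ASCII characters decodes identically either way. Only the multibyte UTF-8 sequences for the Chinese characters trip up the GBK decoder. A sketch using a shortened, hypothetical version of the JSON (decodes_as is an illustrative helper, not a library function):

```python
ascii_only = '[{"name": "Daybreakers", "movie_tt_id": ""}]'.encode("utf-8")
with_chinese = '[{"name": "Daybreakers", "movie_tt_id": "中文"}]'.encode("utf-8")


def decodes_as(data: bytes, encoding: str) -> bool:
    """Hypothetical helper: True if data decodes cleanly with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# ASCII-only bytes produce the same text under both codecs.
print(ascii_only.decode("gbk") == ascii_only.decode("utf-8"))  # True
# UTF-8 bytes for Chinese characters are valid UTF-8 but not valid GBK.
print(decodes_as(with_chinese, "utf-8"), decodes_as(with_chinese, "gbk"))  # True False
```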

    So what should I do?

    There are a lot of similar questions on SO, but I didn't find a good solution for my case. If you find an applicable one, please tell me and I'll close this one.

    Thanks a lot!