UnicodeDecodeError: 'gbk' codec can't decode byte when reading JSON that contains Chinese
You need to specify the correct encoding when you open the file. If the JSON is encoded with UTF-8 you can do this:
import json

fname = "test.json"
# Pass the encoding explicitly; otherwise Python 3 uses the locale default (GBK here).
with open(fname, encoding='utf-8') as data_file:
    data = json.load(data_file)
print(data)
Output:
[{'name': 'Daybreakers', 'detail_url': 'http://www.movieinsider.com/m4120/daybreakers/', 'movie_tt_id': '中文'}]
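Some background on why the explicit encoding matters: in Python 3, open() in text mode without an encoding argument decodes with the platform's locale encoding, which is typically GBK (cp936) on a Chinese-language Windows install. Here is a minimal sketch for checking that default and reading the file independently of it. The helper name load_json_utf8 is hypothetical, the sample record is trimmed from the question, and the file is assumed to be UTF-8 encoded.

```python
import json
import locale
import os
import tempfile


def load_json_utf8(path):
    """Hypothetical helper: read a JSON file that is assumed to be UTF-8."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# The encoding open() falls back to when none is given (e.g. 'cp936' for GBK).
print(locale.getpreferredencoding(False))

# Write a small UTF-8 sample so the sketch is self-contained.
sample = [{"name": "Daybreakers", "movie_tt_id": "中文"}]
path = os.path.join(tempfile.mkdtemp(), "test.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False)

data = load_json_utf8(path)
print(len(data), "record(s) loaded")
```

If the file turns out not to be UTF-8 (for example, it was saved as GBK by an editor), the same helper would need that encoding instead, so it is worth confirming how the file was written.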
Author: cqcn1991
Updated on June 11, 2022

cqcn1991, over 1 year ago:
I'm switching from Python 2 to Python 3. In my Jupyter notebook the code is:
file = "./data/test.json"
with open(file) as data_file:
    data = json.load(data_file)
It worked fine with Python 2, but now that I have switched to Python 3 it gives me this error:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 123: illegal multibyte sequence
The test.json file looks like this:
[{
    "name": "Daybreakers",
    "detail_url": "http://www.movieinsider.com/m4120/daybreakers/",
    "movie_tt_id": "中文"
}]
If I delete the Chinese text, the error goes away.
So what should I do?
There are a lot of similar questions on SO, but I didn't find a good solution for my case. If you find an applicable one, please tell me and I'll close this one.
Thanks a lot!
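A note on why deleting the Chinese text makes the error disappear: ASCII characters are encoded as the same bytes in UTF-8 and GBK, so an ASCII-only file decodes correctly under either codec, while the UTF-8 bytes for "中文" are not a valid GBK sequence. The sketch below reproduces this in isolation, assuming the file was saved as UTF-8; the variable names are only illustrative.

```python
# UTF-8 bytes for the Chinese value from the question's test.json.
utf8_bytes = "中文".encode("utf-8")  # b'\xe4\xb8\xad\xe6\x96\x87'

# Decoding those bytes as GBK is what open() without an encoding
# argument effectively does on a GBK-locale system.
try:
    utf8_bytes.decode("gbk")
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print("gbk failed:", exc)

# ASCII-only text round-trips, because both encodings share the ASCII range.
ascii_roundtrip = "Daybreakers".encode("utf-8").decode("gbk")
print(ascii_roundtrip)
```

The offending byte reported in the question, 0xad, is the third byte of "中" in UTF-8, which is consistent with the file being UTF-8 and the reader assuming GBK.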