How to solve UnicodeDecodeError in Python 3.6?
Solution 1
It sounds like your locale is broken and that you have another bytes->Unicode issue. What you did for Python 2.7 was a hack that only masked the real problem (there's a reason why you have to reload sys to make it work).
To fix your locale, try typing locale at the command line. The output should look something like:
LANG=en_GB.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_ALL=
locale depends on LANG being set properly. Python effectively uses locale to work out which encoding to use when writing to stdout. If it can't work it out, it defaults to ASCII.
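To see what Python has actually picked up from the locale, a small diagnostic along these lines may help. It is only a sketch: the helper name describe_io_encoding is made up for illustration, and the values it reports depend entirely on the environment it runs in (an interactive terminal and a cron job can give different answers).

```python
import locale
import os
import sys


def describe_io_encoding():
    """Collect the settings that influence how print() encodes text."""
    return {
        # Encoding Python chose for stdout (None if stdout was replaced
        # by an object without an encoding attribute).
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the current locale settings.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Environment variables that feed into the two values above.
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    for name, value in describe_io_encoding().items():
        print("%s = %r" % (name, value))
```

Running the same script from the terminal and from cron and comparing the two outputs is a quick way to spot which variable is missing.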
You should first attempt to fix your locale. If locale reports errors, make sure you've installed the correct language pack for your region.
If all else fails, you can always fix Python by setting PYTHONIOENCODING=UTF-8. This should be used as a last resort, as you'll be masking problems once again.
If Python is still throwing an error after setting PYTHONIOENCODING, then please update your question with the stack trace. Chances are you've got an implicit conversion going on.
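One way to check whether PYTHONIOENCODING is really reaching the interpreter is to launch a child interpreter with it set and ask which encoding it ended up using. A rough sketch, where child_stdout_encoding is a hypothetical helper and not part of any library:

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Start a fresh interpreter with extra_env applied and return its stdout encoding."""
    env = dict(os.environ)
    env.update(extra_env)
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return output.decode("ascii").strip()


if __name__ == "__main__":
    # The child should report UTF-8 regardless of the locale settings.
    print(child_stdout_encoding({"PYTHONIOENCODING": "UTF-8"}))
```

If the child reports UTF-8 here but your real script still fails, the variable is probably not set in the environment the script runs in. Cron, for instance, starts jobs with a minimal environment, so the variable has to be set in the crontab or in a wrapper script.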
Solution 2
I had this issue when using Python inside a Docker container based on Ubuntu 18.04. It appeared to be a locale issue, which was solved by adding the following to the Dockerfile:
ENV LANG C.UTF-8
Solution 3
To everyone using pickle to load a file previously saved in Python 2 and getting a UnicodeDecodeError: try setting pickle's encoding parameter:
import pickle
with open("./data.pkl", "rb") as data_file:
    # 'latin1' lets Python 3 decode str objects that were pickled by Python 2
    samples = pickle.load(data_file, encoding='latin1')
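To see why the encoding argument matters, here is a small self-contained sketch. The byte string is what Python 2 would typically write for the str value 'caf\xe9' with pickle protocol 2. It is hard-coded purely for illustration, so no data file is needed.

```python
import pickle

# Protocol-2 pickle of the Python 2 byte string 'caf\xe9' (hard-coded for illustration).
PY2_PICKLE = b"\x80\x02U\x04caf\xe9q\x00."

try:
    # The default encoding is ASCII, so the 0xe9 byte cannot be decoded.
    pickle.loads(PY2_PICKLE)
except UnicodeDecodeError as exc:
    print("default encoding failed:", exc.reason)

# latin1 maps every byte to a code point, so decoding always succeeds.
as_text = pickle.loads(PY2_PICKLE, encoding="latin1")

# encoding="bytes" skips decoding and returns the raw bytes instead.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")
```

Note that latin1 never fails, but it can silently produce garbled text if the original data was actually UTF-8. In that case, loading with encoding="bytes" and decoding explicitly may be the safer route.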
Solution 4
For a Python-only solution you will have to recreate your sys.stdout object:
import sys, codecs
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.detach())
After this, a normal print("hello world") should be encoded to UTF-8 automatically.
But you should try to find out why your terminal is set to such a strange encoding (which Python just tries to adapt to). Maybe your operating system is misconfigured somehow.
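What the wrapper does can be seen without touching the real sys.stdout. The sketch below points the same codecs.getwriter('utf-8') wrapper at an in-memory buffer (a stand-in chosen for illustration) and inspects the bytes that print produces:

```python
import codecs
import io

# In-memory byte stream standing in for the terminal's underlying buffer.
raw = io.BytesIO()

# Same kind of wrapper as above: text written to it is encoded as UTF-8 first.
writer = codecs.getwriter("utf-8")(raw)

# "\u00e9" is a non-ASCII character (e with acute accent).
print("h\u00e9llo", file=writer)

# The non-ASCII character has become the two-byte UTF-8 sequence 0xC3 0xA9.
encoded = raw.getvalue()
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') achieves the same effect more directly, but that method is not available in 3.6.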
EDIT: In my tests, unsetting the environment variable LANG produced this strange setting for the stdout encoding:
LANG= python3
import sys
sys.stdout.encoding
printed 'ANSI_X3.4-1968'.
So I guess you might want to set your LANG to something like en_US.UTF-8. Your terminal program doesn't seem to do this.
Umair Ayub
Updated on July 05, 2022
Umair Ayub almost 2 years
I switched from Python 2.7 to Python 3.6.
I have scripts that deal with some non-English content.
I usually run the scripts via cron and also in the terminal.
I had a UnicodeDecodeError in my Python 2.7 scripts and I solved it with this:
# encoding=utf8
import sys
reload(sys)
sys.setdefaultencoding('utf8')
Now in Python 3.6, it doesn't work. I have print statements like
print("Here %s" % (myvar))
and they throw an error. I can solve this by replacing myvar with myvar.encode("utf-8"), but I don't want to do that in every print statement.
I set PYTHONIOENCODING=utf-8 in my terminal and I still have the issue.
Is there a cleaner way to solve the UnicodeDecodeError issue in Python 3.6? Is there any way to tell Python 3 to print everything in UTF-8, just like I did in Python 2?