UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
Solution 1
Python 2 is trying to be helpful. You cannot decode Unicode data; it is already decoded. So when you call .decode() on a Unicode object, Python first encodes the data (using the ASCII codec) to get bytes that it can then decode. It is this implicit encoding that fails.
If you have Unicode data, it only makes sense to encode to UTF-8, not decode:
>>> print u'\u041e\u043b\u044c\u0433\u0430'
Ольга
>>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8')
'\xd0\x9e\xd0\xbb\xd1\x8c\xd0\xb3\xd0\xb0'
If you wanted a Unicode value, then using a Unicode literal (u'...') is all you needed to do. No further decoding is necessary.
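As a minimal sketch of which direction each method works in (the variable names `text`, `data` and `round_tripped` are illustrative, not from the question): `encode` turns Unicode text into bytes, and `decode` turns bytes back into Unicode text. Written this way, it should behave the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Illustrative names; the literal is the one from the question.
text = u'\u041e\u043b\u044c\u0433\u0430'  # Unicode text, already decoded

# encode: Unicode text -> UTF-8 bytes
data = text.encode('utf-8')

# decode: UTF-8 bytes -> Unicode text
round_tripped = data.decode('utf-8')

print(len(text))  # 5 characters
print(len(data))  # 10 bytes, two per Cyrillic letter
```

The round trip returns the original value, which is why calling `decode` on something that is already Unicode has nothing useful left to do.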
The same implicit conversion takes place in the other direction; if you tried to encode a bytestring you'd trigger an implicit decoding:
>>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8').encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
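The implicit step from the question can also be reproduced explicitly. This sketch performs by hand the ASCII encode that Python 2 attempts behind the scenes when `.decode()` is called on a Unicode object, and inspects the resulting exception. The names `text` and `error` are illustrative.

```python
# -*- coding: utf-8 -*-
text = u'\u041e\u043b\u044c\u0433\u0430'
error = None

try:
    # What Python 2 does implicitly before it can decode: encode with ASCII.
    text.encode('ascii')
except UnicodeEncodeError as exc:
    # Keep a reference; Python 3 clears `exc` when the except block ends.
    error = exc
    # exc.end is exclusive, so the last failing index is exc.end - 1.
    print('%s codec failed on positions %d-%d'
          % (exc.encoding, exc.start, exc.end - 1))
```

The reported span covers all five characters of the name, matching the "position 0-4" in the traceback from the question.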
Solution 2
You can set the default encoding to UTF-8 (Python 2 only):
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
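Commenters on this page caution against changing the interpreter-wide default. One alternative that leaves the default alone is to encode explicitly at the point where text leaves the program. The following is a minimal sketch, assuming the goal is to write the text to a file. The helper `write_text` and the file name `name.txt` are hypothetical. `io.open` accepts an `encoding` argument on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_text(path, text):
    """Write unicode text to `path` as UTF-8 (hypothetical helper)."""
    # io.open encodes the unicode text on write; no global default is touched.
    with io.open(path, 'w', encoding='utf-8') as handle:
        handle.write(text)


name = u'\u041e\u043b\u044c\u0433\u0430'
path = os.path.join(tempfile.mkdtemp(), 'name.txt')
write_text(path, name)

# Read the file back as raw bytes to confirm what was stored.
with io.open(path, 'rb') as handle:
    raw = handle.read()
```

Here the encoding is stated once, where the file is opened, so third-party code that relies on the normal ASCII default is unaffected.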
Author: Serhii Matrunchyk
Updated on July 09, 2022
Comments
-
Serhii Matrunchyk almost 2 years
I'm simply trying to decode a \uXXXX\uXXXX\uXXXX-like string. But I get an error:
$ python
Python 2.7.6 (default, Sep 9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print u'\u041e\u043b\u044c\u0433\u0430'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
I'm a Python newbie. What's the problem? Thanks!
-
Serhii Matrunchyk about 9 years
Actually I use this code:
msg = msg + u"@id%s (%s)\n" % (u["id"], u["first_name"].encode('utf8'))
print msg
And it gives me an error in the print statement.
-
Martijn Pieters about 9 years
@SergiiMatrunchyk: That's not what your question was asking about though. Is your terminal or console correctly configured to handle the characters that you are trying to print?
-
Martijn Pieters about 9 years
@SergiiMatrunchyk: also, why are you encoding then interpolating into a unicode string? You are putting those values into a u'...' unicode object, you should not be encoding the values you are interpolating.
-
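Martijn Pieters' point about interpolation can be sketched as follows. The `user` dict is a hypothetical stand-in for the `u` record in the asker's comment, and its values are assumed to be Unicode already. The value goes into the Unicode template unencoded, and the finished message is encoded once, only if bytes are actually needed.

```python
# -*- coding: utf-8 -*-
# Hypothetical record standing in for the `u` dict from the comment.
user = {"id": 42, "first_name": u'\u041e\u043b\u044c\u0433\u0430'}

msg = u""
# Interpolate the unicode value directly; no .encode() at this point.
msg = msg + u"@id%s (%s)\n" % (user["id"], user["first_name"])

# Encode once, at the boundary where bytes are required.
payload = msg.encode('utf-8')
```

Keeping everything as Unicode until the last step avoids mixing bytes into a Unicode string, which is what triggers Python 2's implicit ASCII decoding.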
Alastair McCormack over 8 years
Bad idea. It's a nasty, nasty hack for people who don't understand encoding: anonbadger.wordpress.com/2015/06/16/…
-
Ranvijay Sachan over 8 years
Thanks Alastair McCormack for your suggestion
-
Martijn Pieters over 7 years
Do not use this cargo cult solution. sys.setdefaultencoding is removed from the module for a reason; changing the implicit default encoding of Python 2 can break 3rd-party libraries that rely on the normal behaviour.
-
AKA over 6 years
Too many attempts and finally this answer saved me! Thanks. :)
-
Kristian K over 2 years
Depending on what process you are running your string through you might want to clean it up first. You can do this by quickly running it through encode and decode with:
string = string.encode("ascii","ignore")
string = string.decode("ascii")
With this the string is "clean" from unwanted chars.
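This cleanup is lossy: the "ignore" error handler silently drops every character that ASCII cannot represent. A small sketch of the effect, wrapping the two lines above in a helper (the name `strip_non_ascii` is hypothetical):

```python
# -*- coding: utf-8 -*-
def strip_non_ascii(text):
    """Drop every character outside ASCII (hypothetical helper name)."""
    return text.encode("ascii", "ignore").decode("ascii")


# Mixed input: only the ASCII characters survive.
mixed = strip_non_ascii(u'caf\u00e9 \u041e\u043b\u044c\u0433\u0430')

# The name from the question contains no ASCII characters at all.
cyrillic_only = strip_non_ascii(u'\u041e\u043b\u044c\u0433\u0430')

print(repr(mixed))
print(len(cyrillic_only))  # 0
```

For the name in the question the result is an empty string, so this approach only suits cases where discarding non-ASCII text is acceptable.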