UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

python python-2.7 utf-8 decode

61,314

Solution 1

Python 2 is trying to be helpful. You cannot decode Unicode data; it is already decoded. So Python first implicitly encodes the data (using the ASCII codec) to get bytes that it can then decode. It is this implicit encoding that fails.
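To make the hidden step visible, here is a minimal sketch that spells out by hand what Python 2 does implicitly. The helper name decode_like_python2 is made up for illustration. The code also runs on Python 3, where str has no decode method at all and the implicit step no longer exists.

```python
text = u'\u041e\u043b\u044c\u0433\u0430'


def decode_like_python2(value, encoding='utf-8'):
    """Hypothetical helper imitating unicode.decode() in Python 2."""
    # Step 1 (implicit in Python 2): turn the text into bytes with ASCII.
    raw = value.encode('ascii')
    # Step 2: the decode that was actually requested.
    return raw.decode(encoding)


try:
    decode_like_python2(text)
    error_name = None
except UnicodeEncodeError as exc:
    # Step 1 fails, which is why an *encode* error shows up on a decode call.
    error_name = type(exc).__name__
    print('%s: %s' % (error_name, exc.reason))
```

Pure ASCII text passes through both steps unchanged, which is why the problem tends to stay hidden until the first non-ASCII character appears.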

If you have Unicode data, it only makes sense to encode to UTF-8, not decode:

>>> print u'\u041e\u043b\u044c\u0433\u0430'
Ольга
>>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8')
'\xd0\x9e\xd0\xbb\xd1\x8c\xd0\xb3\xd0\xb0'

If you wanted a Unicode value, then using a Unicode literal (u'...') is all you needed to do. No further decoding is necessary.
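As a sketch of both cases: a Unicode value round-trips through UTF-8 with one encode and one decode, and if the \uXXXX sequences arrive as literal backslash text (an assumption here, e.g. bytes read from a file rather than a literal in source code), the standard unicode_escape codec can turn them into a Unicode value.

```python
text = u'\u041e\u043b\u044c\u0433\u0430'

# Text -> bytes -> text: encode once, decode once.
utf8_bytes = text.encode('utf-8')
round_tripped = utf8_bytes.decode('utf-8')

# Assumed input: ASCII bytes that contain literal backslash escapes.
escaped = b'\\u041e\\u043b\\u044c\\u0433\\u0430'
from_escapes = escaped.decode('unicode_escape')
```

Note that unicode_escape is only appropriate when the input really contains backslash escapes; it is not a general replacement for decoding UTF-8.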

The same implicit conversion takes place in the other direction; if you tried to encode a bytestring you'd trigger an implicit decoding:

>>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8').encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
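One way to avoid the double encode is to check the type before encoding. The sketch below first reproduces the implicit ASCII decode explicitly, then uses a hypothetical helper, to_utf8_bytes, that encodes only when it is handed text.

```python
text = u'\u041e\u043b\u044c\u0433\u0430'
utf8_bytes = text.encode('utf-8')

# What Python 2 does implicitly before it can "encode" a bytestring:
try:
    utf8_bytes.decode('ascii')
    implicit_decode_failed = False
except UnicodeDecodeError:
    implicit_decode_failed = True


def to_utf8_bytes(value):
    """Hypothetical helper: return UTF-8 bytes, encoding only text."""
    if isinstance(value, bytes):
        return value  # already bytes, leave untouched
    return value.encode('utf-8')


once = to_utf8_bytes(text)
twice = to_utf8_bytes(once)  # safe: no second encode happens
```

The helper assumes that any bytes it receives are already UTF-8; it does not verify that.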

Solution 2

You can set the default encoding to UTF-8 (Python 2 only; several commenters below advise against this):

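import sys
reload(sys)  # Python 2 only: restores sys.setdefaultencoding, which site.py removes at startup
sys.setdefaultencoding('utf-8')  # changes the implicit codec for the whole process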
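A possible alternative that leaves the interpreter-wide default alone is to name the encoding explicitly at the point where text leaves the program. This is only a sketch: the in-memory BytesIO buffer stands in for a real file or socket.

```python
import io

text = u'\u041e\u043b\u044c\u0433\u0430'

# Stand-in for a binary file or socket (an assumption for this example).
buffer = io.BytesIO()

# The wrapper encodes text as UTF-8 on the way out; nothing global changes.
writer = io.TextIOWrapper(buffer, encoding='utf-8')
writer.write(text)
writer.flush()

written = buffer.getvalue()
```

For real files, io.open(path, 'w', encoding='utf-8') gives the same behaviour without the manual wrapping.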

Serhii Matrunchyk

Updated on July 09, 2022

Comments

Serhii Matrunchyk almost 2 years

I'm simply trying to decode a \uXXXX\uXXXX\uXXXX-like string, but I get an error:

$ python
Python 2.7.6 (default, Sep  9 2014, 15:04:36) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print u'\u041e\u043b\u044c\u0433\u0430'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

I'm a Python newbie. What's the problem? Thanks!

Serhii Matrunchyk about 9 years

Actually, my code is msg = msg + u"@id%s (%s)\n" % (u["id"], u["first_name"].encode('utf8')) followed by print msg, and the error is raised at the print statement.
Martijn Pieters about 9 years

@SergiiMatrunchyk: That's not what your question was asking about though. Is your terminal or console correctly configured to handle the characters that you are trying to print?
Martijn Pieters about 9 years

@SergiiMatrunchyk: Also, why are you encoding and then interpolating into a unicode string? You are putting those values into a u'...' unicode object, so you should not be encoding the values you are interpolating.
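A minimal sketch of that advice, assuming the values are already Unicode: interpolate them as they are and encode the finished message once, only when bytes are actually needed. The user dictionary and its contents are made-up sample data.

```python
# Hypothetical sample record; the real data is not shown in the question.
user = {"id": 42, "first_name": u'\u041e\u043b\u044c\u0433\u0430'}

msg = u""
# Interpolate the Unicode value directly, with no .encode() here.
msg = msg + u"@id%s (%s)\n" % (user["id"], user["first_name"])

# Encode the complete message once, at the output boundary.
encoded = msg.encode('utf-8')
```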
Alastair McCormack over 8 years

Bad idea. It's a nasty, nasty hack for people who don't understand encoding: anonbadger.wordpress.com/2015/06/16/…
Ranvijay Sachan over 8 years

Thanks Alastair McCormack for your suggestion
Martijn Pieters over 7 years

Do not use this cargo cult solution. sys.setdefaultencoding is removed from the module for a reason: changing the implicit default encoding of Python 2 can break 3rd-party libraries that rely on the normal behaviour.
AKA over 6 years

Too many attempts and finally this answer saved me! Thanks. :)
Kristian K over 2 years

Depending on what process you are running your string through, you might want to clean it up first. You can do this by quickly running it through encode and decode:

string = string.encode("ascii", "ignore")
string = string.decode("ascii")

With this, the string is "clean" of unwanted chars. Note that every non-ASCII character is dropped.
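To show what this cleanup actually does, here is a small sketch with a made-up sample string. The "ignore" error handler silently discards every character that ASCII cannot represent, so the operation is lossy.

```python
# Hypothetical sample: Latin text with an accent, then a Cyrillic name.
original = u'caf\xe9 \u041e\u043b\u044c\u0433\u0430'

# Encode to ASCII, skipping anything that does not fit, then decode back.
cleaned = original.encode("ascii", "ignore").decode("ascii")

# A purely non-ASCII string is reduced to nothing at all.
name_only = u'\u041e\u043b\u044c\u0433\u0430'
cleaned_name = name_only.encode("ascii", "ignore").decode("ascii")
```

Here cleaned keeps only "caf" and the space, and cleaned_name ends up empty, so this approach only suits cases where losing those characters is acceptable.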