UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)


Solution 1

Python is trying to be helpful here. You cannot decode Unicode data, because it is already decoded. In Python 2, calling .decode() on a unicode object therefore first encodes it implicitly (using the ASCII codec) to get bytes that it can then decode. It is this implicit encoding that fails.

If you have Unicode data, it only makes sense to encode to UTF-8, not decode:

>>> print u'\u041e\u043b\u044c\u0433\u0430'
Ольга
>>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8')
'\xd0\x9e\xd0\xbb\xd1\x8c\xd0\xb3\xd0\xb0'

If you wanted a Unicode value, then using a Unicode literal (u'...') is all you needed to do. No further decoding is necessary.
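One way to avoid decoding a value that is already Unicode is to check its type first. The helper below is only a minimal sketch and not part of the original answer: to_text is a hypothetical name, and it assumes that any bytestring it receives is UTF-8 encoded. It is written so that it should run on both Python 2.7 and Python 3.

```python
def to_text(value, encoding='utf-8'):
    """Return value as Unicode text, decoding only if it is a bytestring."""
    if isinstance(value, bytes):      # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value                      # already decoded, nothing to do


name = u'\u041e\u043b\u044c\u0433\u0430'
encoded = name.encode('utf-8')        # Unicode text -> UTF-8 bytes

decoded_again = to_text(encoded)      # bytes are decoded
untouched = to_text(name)             # Unicode passes through unchanged
```

Because the Unicode value is returned as-is, no implicit ASCII encoding step is ever triggered.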

The same implicit conversion takes place in the other direction; if you tried to encode a bytestring you'd trigger an implicit decoding:

>>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8').encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
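The mirror-image guard can be sketched the same way: encode only when the value is still Unicode text. to_bytes is a hypothetical helper name, UTF-8 is an assumed target encoding, and the code is meant to run on both Python 2.7 and Python 3.

```python
def to_bytes(value, encoding='utf-8'):
    """Return value as a bytestring, encoding only if it is Unicode text."""
    if isinstance(value, bytes):      # already encoded, leave it alone
        return value
    return value.encode(encoding)


name = u'\u041e\u043b\u044c\u0433\u0430'
once = to_bytes(name)                 # Unicode text -> UTF-8 bytes
twice = to_bytes(once)                # second call is a no-op, no implicit decode
```

Calling the helper twice is harmless, whereas chaining two .encode('utf8') calls raises the UnicodeDecodeError shown above on Python 2.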

Solution 2

You can set the default encoding to UTF-8 (Python 2 only):

import sys
# site.py removes sys.setdefaultencoding at startup; reload(sys) restores it
reload(sys)
sys.setdefaultencoding('utf-8')
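An alternative that leaves the interpreter-wide default alone is to name the encoding explicitly wherever text crosses an I/O boundary. The following is a rough sketch under stated assumptions, not the answer author's method: it uses the standard io.open function with a temporary file, and the variable names are illustrative. It should run on both Python 2.7 and Python 3.

```python
import io
import os
import tempfile

name = u'\u041e\u043b\u044c\u0433\u0430'

# Create a scratch file; only its path is needed here.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    # Text mode with an explicit encoding: Unicode in, UTF-8 bytes on disk.
    with io.open(path, 'w', encoding='utf-8') as handle:
        handle.write(name + u'\n')
    # Reading with the same encoding gives Unicode text back.
    with io.open(path, 'r', encoding='utf-8') as handle:
        round_tripped = handle.read().strip()
    # Binary mode shows the raw bytes that were actually stored.
    with io.open(path, 'rb') as handle:
        raw = handle.read()
finally:
    os.remove(path)
```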

Author: Serhii Matrunchyk

Updated on July 09, 2022

Comments

  • Serhii Matrunchyk
    Serhii Matrunchyk almost 2 years

    I'm simply trying to decode \uXXXX\uXXXX\uXXXX-like string. But I get an error:

    $ python
    Python 2.7.6 (default, Sep  9 2014, 15:04:36) 
    [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print u'\u041e\u043b\u044c\u0433\u0430'.decode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

    I'm a Python newbie. What's the problem? Thanks!

  • Serhii Matrunchyk
    Serhii Matrunchyk about 9 years
    Actually, my code is msg = msg + u"@id%s (%s)\n" % (u["id"], u["first_name"].encode('utf8')) followed by print msg, and the error is raised by the print statement.
  • Martijn Pieters
    Martijn Pieters about 9 years
    @SergiiMatrunchyk: That's not what your question was asking about though. Is your terminal or console correctly configured to handle the characters that you are trying to print?
  • Martijn Pieters
    Martijn Pieters about 9 years
    @SergiiMatrunchyk: also, why are you encoding then interpolating into a unicode string? You are putting those values into a u'...' unicode object, you should not be encoding the values you are interpolating.
  • Alastair McCormack
    Alastair McCormack over 8 years
    Bad idea. It's a nasty, nasty hack for people who don't understand encoding: anonbadger.wordpress.com/2015/06/16/…
  • Ranvijay Sachan
    Ranvijay Sachan over 8 years
    Thanks Alastair McCormack for your suggestion
  • Martijn Pieters
    Martijn Pieters over 7 years
    Do not use this cargo cult solution. sys.setdefaultencoding is removed from the module for a reason, changing the implicit default encoding of Python 2 can break 3rd-party libraries that rely on the normal behaviour.
  • AKA
    AKA over 6 years
    Too many attempts and finally this answer saved me! Thanks. :)
  • Kristian K
    Kristian K over 2 years
    Depending on what process you are running your string through, you might want to clean it up first. You can do this by running it through encode and decode: string = string.encode("ascii", "ignore") and then string = string.decode("ascii"). Note that this strips every non-ASCII character from the string.
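The points raised in the comments can be pulled together in one hedged sketch. It assumes the question's input is the literal text \u041e\u043b... held in a bytestring (in which case the standard unicode_escape codec applies), uses a hypothetical user dict in place of the u variable from the comment, and shows what the ASCII "clean up" actually discards. It should run on both Python 2.7 and Python 3.

```python
# Literal backslash-u escapes stored as bytes (assumed input, e.g. read from a file).
escaped = b'\\u041e\\u043b\\u044c\\u0433\\u0430'
name = escaped.decode('unicode_escape')   # turns the escapes into real characters

# Interpolate Unicode values directly into a Unicode template; do not encode first.
user = {u'id': 42, u'first_name': name}   # hypothetical data
msg = u"@id%s (%s)\n" % (user[u'id'], user[u'first_name'])

# The ASCII round trip silently drops every non-ASCII character.
ascii_only = (u'Caf\u00e9 ' + name).encode('ascii', 'ignore').decode('ascii')
```

Here ascii_only ends up as u'Caf ', so this clean-up is only appropriate when losing the non-ASCII text is acceptable.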