UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1

Solution 1

This happens because your terminal's encoding is not set to UTF-8. Here is my terminal:

$ echo $LANG
en_GB.UTF-8
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
(。・ω・。)ノ
>>> 

On my terminal the example above works, but if I unset LANG it fails:

$ unset LANG
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)
>>> 

Consult the docs for your Linux distribution to find out how to set LANG permanently.
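To check what Python has derived from the environment before changing anything, you can ask it directly. This is a minimal sketch; the helper name console_encodings is made up for illustration. On Python 2 with LANG unset, both values typically come back as an ASCII variant rather than UTF-8.

```python
import locale
import sys


def console_encodings():
    """Report the encodings Python derived from the environment."""
    return {
        # Encoding used when printing text to stdout. It can be None,
        # for example when Python 2 output is piped to another program.
        'stdout': getattr(sys.stdout, 'encoding', None),
        # Encoding implied by the locale settings (LANG, LC_ALL, ...).
        'locale': locale.getpreferredencoding(False),
    }


info = console_encodings()
print(info)
```

If either value is not UTF-8, printing the decoded string is likely to fail in the way shown in the second transcript.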

Solution 2

Try:

string.decode('utf-8')  # or:
unicode(string, 'utf-8')

Edit:

'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'.decode('utf-8') gives u'(\uff61\uff65\u03c9\uff65\uff61)\uff89', which is correct.

So your problem must be somewhere else, possibly where you do something with the string that triggers an implicit conversion (printing, writing to a stream, ...).

To say more, we'll need to see some code.
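The two implicit conversions can be made explicit. In Python 2, calling encode on a byte string first decodes it with the ascii codec, and printing a unicode string to an ASCII console encodes it with the ascii codec. The sketch below performs both steps by hand (the helper attempt is hypothetical) and should reproduce the two errors from the question.

```python
raw = b'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'


def attempt(operation):
    """Run operation() and return its result, or the UnicodeError it raised."""
    try:
        return operation()
    except UnicodeError as exc:
        return exc


# What Python 2 does implicitly before string.encode('utf-8') on a byte string.
decode_error = attempt(lambda: raw.decode('ascii'))

# The explicit, correct decode.
text = raw.decode('utf-8')

# What print does implicitly when the console encoding is ASCII.
encode_error = attempt(lambda: text.encode('ascii'))

print(decode_error)
print(encode_error)
```

The first error is a UnicodeDecodeError and the second a UnicodeEncodeError, which is why the two tracebacks in the question look similar but are not the same problem.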

Solution 3

My +1 to mata's comment at https://stackoverflow.com/a/10561979/1346705 and to Nick Craig-Wood's demonstration. You have decoded the string correctly. The problem is with the print command: it converts the Unicode string to the console encoding, and the console cannot display the string. Try writing the string to a file and viewing the result in a decent editor that supports Unicode:

import codecs

s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
s1 = s.decode('utf-8')
f = codecs.open('out.txt', 'w', encoding='utf-8')
f.write(s1)
f.close()

Then you will see (。・ω・。)ノ.
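If no Unicode-capable editor is at hand, the same check can be done in code by reading the file back as bytes. This is a sketch that uses io.open (available on Python 2.6+ and Python 3) and a temporary file instead of out.txt; both choices are assumptions made so the example cleans up after itself.

```python
import io
import os
import tempfile

raw = b'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
text = raw.decode('utf-8')

# Create a temporary file path and release the OS-level handle.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    # Write the Unicode text, encoding it as UTF-8 on the way out.
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    # Read the file back in binary mode to see the bytes actually stored.
    with io.open(path, 'rb') as f:
        written = f.read()
finally:
    os.remove(path)

print(written == raw)
```

If the bytes on disk equal the original byte string, the decode step was correct and only the console output is at fault.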

Solution 4

Try setting the system default encoding to utf-8 at the start of the script, so that implicit conversions between str and unicode use it instead of ascii. This applies to Python 2 only.

# coding: utf-8
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
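A narrower alternative, offered here as a sketch rather than as part of this answer, is to leave the global default alone and attach an explicit UTF-8 encoder to the stream you write to. The example uses an in-memory io.BytesIO as a stand-in for the real output stream; on Python 2 the same wrapper is commonly applied to sys.stdout.

```python
import codecs
import io

raw = b'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
text = raw.decode('utf-8')

# Stand-in for a byte-oriented output stream.
buffer = io.BytesIO()

# Wrap the stream so that every write is encoded as UTF-8.
writer = codecs.getwriter('utf-8')(buffer)
writer.write(text)

encoded = buffer.getvalue()
print(encoded == raw)
```

Only writes that go through the wrapper are affected, so other implicit conversions in the program still fail loudly instead of being silently changed.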

Solution 5

If you are working on a remote host, look at /etc/ssh/ssh_config on your local PC.

When this file contains a line:

SendEnv LANG LC_*

comment it out by adding # at the start of the line. It might help.

With this line, ssh sends the language-related environment variables of your PC to the remote host. This causes problems when the remote host does not have those locales installed.
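To see which of the forwarded variables actually decides the character encoding on the remote host, the POSIX lookup order (LC_ALL, then LC_CTYPE, then LANG, skipping empty values) can be sketched as follows. The function name effective_ctype_locale is hypothetical.

```python
import os


def effective_ctype_locale(env):
    """Return (variable, value) for the setting that selects the character type locale."""
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for name in ('LC_ALL', 'LC_CTYPE', 'LANG'):
        value = env.get(name)
        if value:  # unset and empty variables are both skipped
            return name, value
    # Nothing set: the default C locale applies.
    return None, 'C'


print(effective_ctype_locale(os.environ))
```

Running this in the ssh session shows whether a value sent from your PC is the one in effect on the remote side.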

Author: Markum

Hi! I am Mark, a somewhat newcomer to the world of programming. I have really gained an interest in Python, and it is what I write the most of my code in. Currently, I'm working on creating various features for a small but interesting IRC bot, learning all different aspects of Python as I go along.

Updated on June 25, 2020

Comments

  • Markum
    Markum almost 4 years

    I'm having a few issues trying to encode a string to UTF-8. I've tried numerous things, including using string.encode('utf-8') and unicode(string), but I get the error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1: ordinal not in range(128)

    This is my string:

    (。・ω・。)ノ
    

    I don't see what's going wrong, any idea?

    Edit: The problem is that printing the string as it is does not show properly. Also, this error when I try to convert it:

    Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
    [GCC 4.5.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
    >>> s1 = s.decode('utf-8')
    >>> print s1
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)
    
    • Markum
      Markum almost 12 years
      It's just a normally inserted string. The same happens when I just try printing it.
    • BollMose
      BollMose over 10 years
      I hit the same error during pip install and fixed it by following this answer (install some devel packages): stackoverflow.com/questions/17931726/…
  • Markum
    Markum almost 12 years
    Both return UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-5: character maps to <undefined>
  • Markum
    Markum almost 12 years
    '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
  • Markum
    Markum almost 12 years
    Printing the original string as is gives (´¢í´¢Ñ¤ë´¢Ñ´¢í)´¥ë, I want it to encode properly.
  • Markum
    Markum almost 12 years
    All I'm trying to do is print the original string in its original format, but I get (´¢í´¢Ñ¤ë´¢Ñ´¢í)´¥ë.
  • mata
    mata almost 12 years
    The string is UTF-8 encoded. If you print it, Python just writes the bytes to the output stream, and if your terminal doesn't interpret them as UTF-8 you end up with garbage. With decode you convert it to unicode; then you can encode it again to an encoding your terminal understands.
  • Maximiliano Rios
    Maximiliano Rios almost 10 years
    This is wrong, you're forcing your encoding lambda function to ignore the encoding itself which means you're losing characters.
  • nobody
    nobody almost 10 years
    @rayryeng Could you explain the reason for your edit? It appears to completely change the meaning of what the OP wrote, from recommending a particular setting to recommending against it.
  • rayryeng
    rayryeng almost 10 years
    @AndrewMedico - My apologies. I saw that this post was very similar to another one so I believed that they were the same. I will revert back.
  • Non
    Non over 7 years
    Missing locales could also be a reason. To install them, run sudo apt-get install language-pack-de or sudo locale-gen de_DE.UTF-8 (for German locales).
  • Edhowler
    Edhowler about 7 years
    This solved my problem, where I did not know the original encoding and I did not care about losing some characters.
  • Andrei Krasutski
    Andrei Krasutski about 6 years
    Using #coding: utf-8 rather than # -*- coding: utf-8 -*- is easier to remember. It works out of the box, as described in PEP 263 (Defining Python Source Code Encodings).
  • Maritza Esparza
    Maritza Esparza about 6 years
    Thanks! These solved the problem that I had installing pip packages with ansible and vagrant
  • fallingdog
    fallingdog over 4 years
    why do we need the reload in this case?
  • Robin Winslow
    Robin Winslow about 4 years
    For me, the missing environment variable is LC_ALL, and the simplest value that would fix it is C.UTF-8
  • hygull
    hygull almost 4 years
    Thanks for the suggestion. Will try out at my end and update it in the answer.
  • Piyush Goel
    Piyush Goel almost 4 years
    This does not work in Python 3 as explained here. For me, Tsutomu's answer below did the trick.