UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1

python unicode utf-8

Solution 1

This happens because your terminal's encoding is not set to UTF-8. Here is my terminal, where it is:

$ echo $LANG
en_GB.UTF-8
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
(｡･ω･｡)ﾉ
>>>

With LANG set as above the example works on my terminal, but if I unset LANG it stops working:

$ unset LANG
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)
>>>

Consult the docs for your Linux distribution to find out how to make the LANG setting permanent.
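The two sessions above can be approximated without touching the locale. The following is a minimal sketch, not part of the original answer: it reports which encodings Python derived from the environment, then encodes the decoded string by hand with UTF-8 and with ASCII, which is roughly what print does under each setting. The helper name `describe_output_encoding` is made up for illustration.

```python
import locale
import sys


def describe_output_encoding():
    """Return the encodings Python derived from the environment (hypothetical helper)."""
    return {
        'stdout': getattr(sys.stdout, 'encoding', None),
        'preferred': locale.getpreferredencoding(False),
    }


# The decoded string from the answer, written with escapes so this file stays ASCII.
text = u'(\uff61\uff65\u03c9\uff65\uff61)\uff89'

# With a UTF-8 terminal, print effectively performs this encode, which succeeds.
utf8_bytes = text.encode('utf-8')

# With LANG unset the output encoding falls back to ASCII, and the encode fails.
try:
    text.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

print(describe_output_encoding())
print(repr(utf8_bytes))
print(ascii_error)
```

If the reported stdout encoding is ASCII (or ANSI_X3.4-1968, its formal name), the locale variables are the first thing to check.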

Solution 2

Try:

string.decode('utf-8')  # or:
unicode(string, 'utf-8')

Edit:

'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'.decode('utf-8') gives u'(\uff61\uff65\u03c9\uff65\uff61)\uff89', which is correct.

So your problem must be somewhere else, possibly where you do something with the string that triggers an implicit conversion (this could be printing, writing to a stream...).

To say more, we'll need to see some code.
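To make the implicit conversion concrete: in Python 2, calling string.encode('utf-8') on a byte string first decodes it with the ASCII codec behind the scenes, and that hidden step is what raises the UnicodeDecodeError from the question. The sketch below is written for Python 3, where the decode has to be spelled out, so it only imitates the implicit step; `decode_with` is a hypothetical helper name.

```python
# The UTF-8 encoded bytes from the question.
raw = b'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'


def decode_with(codec):
    """Try to decode `raw` with `codec`; return (text, error), one of which is None."""
    try:
        return raw.decode(codec), None
    except UnicodeDecodeError as exc:
        return None, exc


# Explicit UTF-8 decode: what the answer recommends.
text, utf8_error = decode_with('utf-8')

# ASCII decode: stands in for the implicit conversion Python 2 performs.
_, ascii_error = decode_with('ascii')

print(utf8_error)
print(ascii_error)
```

The ASCII attempt fails on the first non-ASCII byte, 0xef at position 1, which matches the error in the title.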

Solution 3

+1 to mata's comment at https://stackoverflow.com/a/10561979/1346705 and to Nick Craig-Wood's demonstration. You have decoded the string correctly. The problem is with the print statement: it converts the Unicode string to the console encoding, and the console cannot display the string. Try writing the string to a file and viewing the result in an editor that supports Unicode:

import codecs

s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
s1 = s.decode('utf-8')
f = codecs.open('out.txt', 'w', encoding='utf-8')
f.write(s1)
f.close()

Then you will see (｡･ω･｡)ﾉ.
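For readers not on Python 2, the same check can be sketched with io.open, which is available in both Python 2.6+ and Python 3. This is an illustrative variant rather than the answer's own code; the temporary directory is an assumption made so the example does not overwrite anything, and the file is read back to confirm that the bytes on disk are the UTF-8 sequence from the question.

```python
import io
import os
import tempfile

# The decoded string, written with escapes so this file stays ASCII.
text = u'(\uff61\uff65\u03c9\uff65\uff61)\uff89'

# Hypothetical scratch location; the answer above writes to 'out.txt' directly.
path = os.path.join(tempfile.mkdtemp(), 'out.txt')

# Write the text, letting the file object do the UTF-8 encoding.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Read the raw bytes back to see what actually landed on disk.
with io.open(path, 'rb') as f:
    raw = f.read()

# Read it back as text to confirm the round trip.
with io.open(path, 'r', encoding='utf-8') as f:
    round_tripped = f.read()

print(repr(raw))
print(round_tripped == text)
```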

Solution 4

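Try setting the default encoding to UTF-8 at the start of the script, so that implicit conversions between str and unicode use UTF-8 instead of ASCII. This is a Python 2-only workaround: sys.setdefaultencoding does not exist in Python 3, and the reload(sys) call is needed because site.py removes the function from sys at startup.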

# coding: utf-8
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
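A narrower alternative, offered here as a sketch and not as part of the original answer, is to leave the process-wide default alone and give the output stream an explicit encoding. The example wraps an in-memory io.BytesIO, which stands in for a real binary stream such as sys.stdout.buffer in Python 3; `make_utf8_writer` is a hypothetical helper name.

```python
import io


def make_utf8_writer(binary_stream):
    """Wrap a binary stream so text written to it is encoded as UTF-8 (hypothetical helper)."""
    return io.TextIOWrapper(binary_stream, encoding='utf-8', errors='strict')


# In-memory stand-in for a real binary output stream.
buffer = io.BytesIO()
writer = make_utf8_writer(buffer)

# Write the decoded string; the wrapper encodes it regardless of the locale.
writer.write(u'(\uff61\uff65\u03c9\uff65\uff61)\uff89')
writer.flush()

encoded = buffer.getvalue()
print(repr(encoded))
```

Because the encoding is attached to one stream, the rest of the program keeps its normal behaviour, and the same code works when LANG is unset.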

Solution 5

If you are working on a remote host, look at /etc/ssh/ssh_config on your local PC.

If this file contains the line:

SendEnv LANG LC_*

comment it out by adding # at the start of the line. It might help.

With this line present, ssh sends the language-related environment variables of your PC to the remote host, which can cause problems when the remote host does not have the same locales.

Markum

Hi! I am Mark, something of a newcomer to the world of programming. I have really gained an interest in Python, and it is what I write most of my code in. Currently, I'm working on creating various features for a small but interesting IRC bot, learning all the different aspects of Python as I go along.

Updated on June 25, 2020

Comments

Markum almost 4 years
I'm having a few issues trying to encode a string to UTF-8. I've tried numerous things, including using string.encode('utf-8') and unicode(string), but I get the error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1: ordinal not in range(128)

This is my string:
```
(｡･ω･｡)ﾉ
```
I don't see what's going wrong, any idea?

Edit: The problem is that printing the string as it is does not display it properly. Also, I get this error when I try to convert it:
```
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)
```
- Markum almost 12 years
  
  It's just a normally inserted string. The same happens when I just try printing it.
- BollMose over 10 years
  
  I hit the same error when running pip install, and fixed it by following this: install some devel (stackoverflow.com/questions/17931726/…)
Markum almost 12 years

Both return UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-5: character maps to <undefined>
Markum almost 12 years

'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
Markum almost 12 years

Printing the original string as is gives (´¢í´¢Ñ¤ë´¢Ñ´¢í)´¥ë, I want it to encode properly.
Markum almost 12 years

All I'm trying to do is print the original string in its original format, but I get (´¢í´¢Ñ¤ë´¢Ñ´¢í)´¥ë.
mata almost 12 years

The string is UTF-8 encoded. If you print it, it just writes the bytes to the output stream, and if your terminal doesn't interpret them as UTF-8 you end up with garbage. With decode you convert it to unicode; then you can encode it again to an encoding your terminal understands.
Maximiliano Rios almost 10 years

This is wrong, you're forcing your encoding lambda function to ignore the encoding itself which means you're losing characters.
nobody almost 10 years

@rayryeng Could you explain the reason for your edit? It appears to completely change the meaning of what the OP wrote, from recommending a particular setting to recommending against it.
rayryeng almost 10 years

@AndrewMedico - My apologies. I saw that this post was very similar to another one so I believed that they were the same. I will revert back.
Non over 7 years

Missing locales could also be a reason. To install them, run sudo apt-get install language-pack-de or sudo locale-gen de_DE.UTF-8 (for German locales).
Edhowler about 7 years

This solved my problem, where I did not know the original encoding and I did not care about losing some characters.
Andrei Krasutski about 6 years

Use # coding: utf-8 rather than # -*- coding: utf-8 -*-; it is easier to remember and works out of the box, as it matches the pattern in PEP 263 -- Defining Python Source Code Encodings.
Maritza Esparza about 6 years

Thanks! This solved the problem I had installing pip packages with Ansible and Vagrant.
fallingdog over 4 years

why do we need the reload in this case?
Robin Winslow about 4 years

For me, the missing environment variable is LC_ALL, and the simplest value that would fix it is C.UTF-8
hygull almost 4 years

Thanks for the suggestion. Will try out at my end and update it in the answer.
Piyush Goel almost 4 years

This does not work in Python 3 as explained here. For me, Tsutomu's answer below did the trick.