UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1
Solution 1
This happens because the encoding of your terminal is not set to UTF-8. Here is my terminal:
$ echo $LANG
en_GB.UTF-8
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
(。・ω・。)ノ
>>>
On my terminal the example works with the setting above, but if I unset LANG
it stops working:
$ unset LANG
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)
>>>
Consult the documentation for your Linux distribution to find out how to set LANG to a UTF-8 locale permanently.
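To check which encodings Python has picked up from the environment before changing anything, a small sketch like the following may help. It uses only the standard sys and locale modules; the helper name report_encodings is made up for this example.

```python
from __future__ import print_function

import locale
import sys


def report_encodings():
    """Return the encodings Python derived from the current environment."""
    return {
        # Encoding print uses for stdout; can be None in Python 2 when piped.
        'stdout': getattr(sys.stdout, 'encoding', None),
        # Preferred encoding reported by the active locale settings.
        'locale': locale.getpreferredencoding(),
    }


if __name__ == '__main__':
    for name, value in sorted(report_encodings().items()):
        print(name, '=', value)
```

If either value reports an ASCII-style codec, the print in the failing session above is expected to raise UnicodeEncodeError.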
Solution 2
Try:
string.decode('utf-8') # or:
unicode(string, 'utf-8')
Edit:
'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'.decode('utf-8')
gives u'(\uff61\uff65\u03c9\uff65\uff61)\uff89', which is correct.
So your problem must be somewhere else, possibly where you do something with the string that triggers an implicit conversion (printing, writing to a stream, ...).
To say more, we'll need to see some code.
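One place where such an implicit conversion happens in Python 2 is calling encode on a byte string: Python first decodes it with the default codec (normally ASCII), which produces the UnicodeDecodeError from the title. The sketch below performs that hidden step explicitly, so it behaves the same on Python 2 and 3. The variable names are illustrative only.

```python
from __future__ import print_function

raw = b'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'

# In Python 2, raw.encode('utf-8') implicitly decodes raw as ASCII first.
# Doing that step by hand shows where the error comes from.
try:
    raw.decode('ascii')
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

# Decoding with the codec the bytes were actually written in succeeds.
text = raw.decode('utf-8')

if __name__ == '__main__':
    print(failure)    # message names byte 0xef at position 1
    print(len(text))  # number of decoded characters
```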
Solution 3
My +1 to mata's comment at https://stackoverflow.com/a/10561979/1346705 and to Nick Craig-Wood's demonstration. You have decoded the string correctly. The problem is with the print
command: it converts the Unicode string to the console encoding, and the console cannot display the string. Try writing the string to a file and look at the result in a decent editor that supports Unicode:
import codecs
s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
s1 = s.decode('utf-8')
f = codecs.open('out.txt', 'w', encoding='utf-8')
f.write(s1)
f.close()
Then you will see (。・ω・。)ノ.
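If the text still has to go to a console that cannot represent these characters, one option is to encode it explicitly and let unsupported characters be replaced instead of raising. This is only a sketch; the helper name encode_for_console is hypothetical.

```python
import sys


def encode_for_console(text, encoding=None):
    """Encode text for the console, replacing unsupported characters."""
    # Fall back to ASCII when the stream does not report an encoding.
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    return text.encode(encoding, 'replace')


s = b'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
s1 = s.decode('utf-8')

# With an ASCII console, each unsupported character becomes '?'.
ascii_bytes = encode_for_console(s1, 'ascii')
```

In Python 2 the returned byte string can be passed to print directly; in Python 3 it would be written to sys.stdout.buffer instead.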
Solution 4
Try setting the system default encoding to utf-8
at the start of the script, so that implicit conversions between byte strings and Unicode strings use it instead of ASCII. This applies to Python 2 only.
# coding: utf-8
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
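A less invasive alternative to changing the interpreter-wide default, offered here as an assumption rather than as part of the answer above, is to wrap the output stream in a UTF-8 writer with codecs.getwriter. The helper name utf8_writer is hypothetical, and io.BytesIO stands in for a real byte stream so the result can be inspected.

```python
import codecs
import io


def utf8_writer(byte_stream):
    """Wrap a byte stream so unicode text written to it is encoded as UTF-8."""
    return codecs.getwriter('utf-8')(byte_stream)


buf = io.BytesIO()
out = utf8_writer(buf)
out.write(u'(\uff61\uff65\u03c9\uff65\uff61)\uff89\n')

# The buffer now holds the UTF-8 bytes for the text plus the newline.
encoded = buf.getvalue()
```

For real output, the stream passed in would be sys.stdout in Python 2 or sys.stdout.buffer in Python 3.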
Solution 5
If you are working on a remote host, look at /etc/ssh/ssh_config
on your local PC.
If this file contains the line:
SendEnv LANG LC_*
comment it out by adding #
at the start of the line. It might help.
With this line, ssh
sends the language-related environment variables of your PC to the remote host, which can cause problems, for example when the remote host does not have that locale installed.
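To see which of these variables actually arrived on the remote host, a short check like this can be run there. The helper name locale_variables is made up for this example.

```python
from __future__ import print_function

import os


def locale_variables(environ=None):
    """Return LANG and every LC_* variable found in the given environment."""
    environ = os.environ if environ is None else environ
    return dict(
        (name, value) for name, value in environ.items()
        if name == 'LANG' or name.startswith('LC_')
    )


if __name__ == '__main__':
    for name, value in sorted(locale_variables().items()):
        print(name, '=', value)
```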
Markum
Hi! I am Mark, a somewhat newcomer to the world of programming. I have really gained an interest in Python, and it is what I write the most of my code in. Currently, I'm working on creating various features for a small but interesting IRC bot, learning all different aspects of Python as I go along.
Updated on June 25, 2020
Comments
-
Markum almost 4 years
I'm having a few issues trying to encode a string to UTF-8. I've tried numerous things, including using string.encode('utf-8') and unicode(string), but I get the error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1: ordinal not in range(128)
This is my string:
(。・ω・。)ノ
I don't see what's going wrong, any idea?
Edit: The problem is that printing the string as it is does not display it properly. Also, I get this error when I try to convert it:
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)
-
Markum almost 12 years
It's just a normally inserted string. The same happens when I just try printing it.
-
BollMose over 10 years
I hit the same error during pip install and fixed it with this: [install some devel][1] [1]: stackoverflow.com/questions/17931726/…
-
Markum almost 12 years
Both return
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-5: character maps to <undefined>
-
Markum almost 12 years
'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
-
Markum almost 12 years
Printing the original string as is gives (´¢í´¢Ñ¤ë´¢Ñ´¢í)´¥ë, but I want it to be encoded properly.
-
Markum almost 12 years
All I'm trying to do is print the original string in its original format, but I get (´¢í´¢Ñ¤ë´¢Ñ´¢í)´¥ë.
-
mata almost 12 years
The string is UTF-8 encoded. If you print it, it just writes the bytes to the output stream, and if your terminal doesn't interpret them as UTF-8 you end up with garbage. With decode you convert it to unicode, then you can encode it again to an encoding your terminal understands.
-
Maximiliano Rios almost 10 years
This is wrong, you're forcing your encoding lambda function to ignore the encoding itself which means you're losing characters.
-
nobody almost 10 years
@rayryeng Could you explain the reason for your edit? It appears to completely change the meaning of what the OP wrote, from recommending a particular setting to recommending against it.
-
rayryeng almost 10 years
@AndrewMedico - My apologies. I saw that this post was very similar to another one so I believed that they were the same. I will revert back.
-
Non over 7 years
Missing locales could also be a reason. To install them, run sudo apt-get install language-pack-de or sudo locale-gen de_DE.UTF-8 (for German locales).
-
Edhowler about 7 years
This solved my problem, where I did not know the original encoding and I did not care about losing some characters.
-
Andrei Krasutski about 6 years
Using #coding: utf-8 rather than # -*- coding: utf-8 -*- is easier to remember. It works out of the box according to PEP 263 -- Defining Python Source Code Encodings.
-
Maritza Esparza about 6 years
Thanks! These solved the problem that I had installing pip packages with ansible and vagrant
-
fallingdog over 4 years
Why do we need the reload in this case?
-
Robin Winslow about 4 years
For me, the missing environment variable is LC_ALL, and the simplest value that would fix it is C.UTF-8
-
hygull almost 4 years
Thanks for the suggestion. Will try out at my end and update it in the answer.
-
Piyush Goel almost 4 years
This does not work in Python 3 as explained here. For me, Tsutomu's answer below did the trick.