Python Unicode in Mac OS X terminal
Solution 1
>>> print 'абвгд'
абвгд
When you type in some characters, your terminal decides how these characters are represented to the application. Your terminal might give the characters to the application encoded as UTF-8, ISO-8859-5 or even something that only your terminal understands. Python gets these characters as some sequence of bytes. Python then prints out these bytes as they are, and your terminal interprets them in some way to display characters. Since your terminal usually interprets the bytes the same way as it encoded them before, everything is displayed as you typed it in.
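The round trip described above can be sketched with explicit bytes. This is only an illustration, written so that it also runs on Python 3 (the session above is Python 2), and it assumes a UTF-8 terminal; the helper name terminal_round_trip is made up for this example.

```python
# -*- coding: utf-8 -*-
def terminal_round_trip(text, terminal_encoding="utf-8"):
    """Simulate typing text into a terminal and having Python echo it back.

    terminal_encoding is an assumption; a real terminal may use another charset.
    """
    # The terminal turns the typed characters into bytes for the application.
    raw = text.encode(terminal_encoding)
    # A plain byte string just carries those bytes; printing passes them through unchanged.
    echoed = raw
    # The terminal interprets the bytes with the same encoding it used for input.
    return raw, echoed.decode(terminal_encoding)


raw, shown = terminal_round_trip(u"абвгд")
print(len(raw))             # 10: each of the 5 Cyrillic letters takes 2 bytes in UTF-8
print(shown == u"абвгд")    # True
```

Python never needs to know the encoding here, which is why the plain string prints correctly.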
>>> u'абвгд'
Here you type in some characters that arrive at the Python interpreter as a sequence of bytes, possibly encoded in some way by the terminal. With the u prefix, Python tries to convert this data to unicode. To do this correctly, Python has to know what encoding your terminal uses. In your case it looks like Python guesses that your terminal's encoding is ASCII, but the received data doesn't match that, so you get an encoding error.
The straightforward way to create unicode strings in an interactive session would therefore be something like this:
>>> us = 'абвгд'.decode('my-terminal-encoding')
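To make the difference between a wrong and a right codec concrete, here is a small sketch. It assumes the terminal delivered the letters as UTF-8 bytes, which may not match your setup, and it is written so that it also runs on Python 3.

```python
# -*- coding: utf-8 -*-
# Bytes as a UTF-8 terminal would deliver them for the five letters (an assumption).
raw = u"абвгд".encode("utf-8")

try:
    raw.decode("ascii")      # wrong guess: ASCII cannot represent bytes >= 0x80
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

us = raw.decode("utf-8")     # the matching codec recovers the five characters
print(ascii_ok)              # False
print(len(us))               # 5
```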
In files you can also specify the encoding of the file with a special mode line:
# -*- encoding: ISO-8859-5 -*-
us = u'абвгд'
For other ways to set the default input encoding you can look at sys.setdefaultencoding(...) or sys.stdin.encoding.
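Before changing anything, it can help to see what Python currently reports. The following is a minimal sketch; the function name describe_io_encodings is hypothetical, and the values depend on your terminal and Python version.

```python
import sys


def describe_io_encodings():
    """Return the encodings Python associates with its standard streams.

    A value may be None, for example when a stream is not attached to a terminal.
    """
    return {
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "default": sys.getdefaultencoding(),
    }


for name, value in sorted(describe_io_encodings().items()):
    print("%s: %s" % (name, value))
```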
Solution 2
As of Python 2.6, you can use the environment variable PYTHONIOENCODING to tell Python that your terminal is UTF-8 capable. The easiest way to make this permanent is by adding the following line to your ~/.bash_profile:
export PYTHONIOENCODING=utf-8
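One way to check that the variable is picked up is to start a child interpreter with it set and ask for its sys.stdout.encoding. This is a sketch under the assumption that sys.executable points at the interpreter you care about; the helper name stdout_encoding_with is made up for this example.

```python
import os
import subprocess
import sys


def stdout_encoding_with(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set and report its sys.stdout.encoding."""
    # Copy the current environment and add (or override) the variable.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return out.decode("ascii").strip()


print(stdout_encoding_with("utf-8"))
```

Because the child's output goes to a pipe rather than a terminal, this also shows that the variable applies when no terminal encoding can be detected.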
Solution 3
In addition to ensuring your OS X terminal is set to UTF-8, you may wish to set your Python sys default encoding to UTF-8. Create a file called sitecustomize.py in /Library/Python/2.5/site-packages. In this file put:
import sys
sys.setdefaultencoding('utf-8')
The setdefaultencoding method is available only to the site module, and is removed from the sys namespace once startup has completed. As such, you'll need to start a new Python interpreter for the change to take effect. You can verify the current default encoding at any time after startup with sys.getdefaultencoding().
If the characters aren't already unicode and you need to convert them, use the decode method on a string to decode the text from some other charset into unicode. It is best to specify which charset:
s = 'абвгд'.decode('some_cyrillic_charset') # makes the string unicode
print s.encode('utf-8') # transform the unicode into utf-8, then print it
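The same decode-then-encode step can be written as a small helper. This sketch uses ISO-8859-5 as the Cyrillic charset purely as an assumption (substitute whatever your data actually uses), and the name recode is hypothetical. It is written so that it also runs on Python 3.

```python
# -*- coding: utf-8 -*-
def recode(raw, source_charset, target_charset="utf-8"):
    """Decode bytes from source_charset to unicode, then encode them as target_charset."""
    text = raw.decode(source_charset)     # bytes -> unicode
    return text.encode(target_charset)    # unicode -> bytes in the target charset


# Build sample input: in ISO-8859-5 each Cyrillic letter is a single byte.
cyrillic_bytes = u"абвгд".encode("iso-8859-5")
utf8_bytes = recode(cyrillic_bytes, "iso-8859-5")

print(len(cyrillic_bytes))   # 5
print(len(utf8_bytes))       # 10
```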
Solution 4
Also, make sure the terminal encoding is set to Unicode/UTF-8 (and not ASCII, which seems to be your setting).
Comments
-
disc0dancer almost 2 years
Can someone explain to me this odd thing:
When in python shell I type the following Cyrillic string:
>>> print 'абвгд'
абвгд
but when I type:
>>> print u'абвгд'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)
Since the first string came out correctly, I reckon my OS X terminal can represent unicode, but it turns out it can't in the second case. Why?
-
disc0dancer almost 15 years
I figured that one, but what bugs me is that my terminal DOES show unicode properly if it's typed as a normal string, e.g. 'уникоде', but throws an error if I try to print the same string as u'уникоде'
-
disc0dancer almost 15 years
This solved my problems, although the repr() explanation is not correct. I made a mistake in my question (sorry), which I have now fixed: I WAS actually printing the u'абвгд' string, so it's not a repr() error. In fact, I do not get the error if I omit the print statement; I just get u'\xd0\xb0\xd0\xb1\xd0\xb2\xd0\xb3\xd0\xb4'. My guess would be that the default encoding, mac-roman, is somehow able to represent Cyrillic chars (which, on the other hand, doesn't make sense ...), but not Cyrillic in unicode. I really don't get this :)
-
Dima Tisnek about 10 years
Nice example, especially considering that OSX Python builds come with a meager sys.maxunicode == 0xffff
-
Martijn Pieters almost 10 years
Don't change the system default encoding; fix your Unicode values instead. Changing the default encoding can break libraries that rely on the, you know, default behaviour. There is a reason you have to force a module reload before you can do this.
-
Pouya almost 10 years
I had a problem with sympy pretty print and your trick solved it. Thank you.
-
xApple about 8 years
python -c 'print(u"\U0001F46F")'
-
GeekHades over 7 years
It works for me. Just do it once and it is resolved for good!