Python unicode in Mac os X terminal

Solution 1

>>> print 'абвгд'
абвгд

When you type in some characters, your terminal decides how these characters are represented to the application. Your terminal might give the characters to the application encoded as UTF-8, ISO-8859-5 or even something that only your terminal understands. Python gets these characters as some sequence of bytes. Then Python prints out these bytes as they are, and your terminal interprets them in some way to display characters. Since your terminal usually interprets the bytes the same way as it encoded them before, everything is displayed like you typed it in.
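
As a rough illustration of that round trip (a sketch only, assuming the terminal sends UTF-8; the variable names are made up for this example), the same bytes read back correctly with the encoding that produced them and turn into different characters under another one:

```python
# The five Cyrillic letters from the question, written with escapes so the
# example does not depend on the encoding of this source text.
text = u'\u0430\u0431\u0432\u0433\u0434'  # 'абвгд'

# What a UTF-8 terminal would hand to the application: two bytes per letter.
sent = text.encode('utf-8')

# Interpreting the bytes with the same encoding restores the original text.
same = sent.decode('utf-8')

# Interpreting them as ISO-8859-5 also "works", but yields ten unrelated characters.
other = sent.decode('iso-8859-5')
```

A plain byte string passed to print is never interpreted by Python 2 at all, which is why the first example displays correctly whatever Python believes about the terminal.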

>>> u'абвгд'

Here you type in some characters that arrive at the Python interpreter as a sequence of bytes, possibly encoded in some way by the terminal. With the u prefix, Python tries to convert this data to unicode. To do this correctly, Python has to know what encoding your terminal uses. In your case it looks like Python guesses that your terminal's encoding is ASCII, but the received data doesn't match that, so you get an encoding error.
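
The failure can be reproduced without a terminal. This is only a sketch of one plausible reading of the traceback in the question: it assumes the terminal sent UTF-8 and that the interpreter turned each incoming byte into one character (Latin-1 style), which would match the u'\xd0\xb0...' value reported in the comments.

```python
text = u'\u0430\u0431\u0432\u0433\u0434'  # 'абвгд'
raw = text.encode('utf-8')                # bytes assumed to come from a UTF-8 terminal

# Decoding those bytes as ASCII fails, because every byte is above 127.
try:
    raw.decode('ascii')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Assumption: the interpreter mapped each byte to one code point instead.
misread = raw.decode('latin-1')           # ten characters, u'\xd0\xb0\xd0\xb1...'

# Printing that value to an ASCII stdout amounts to encoding it as ASCII.
try:
    misread.encode('ascii')
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc                    # covers positions 0 through 9
```

Under that assumption the error spans ten positions rather than five, which agrees with "position 0-9" in the reported message.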

The straightforward way to create unicode strings in an interactive session would therefore be something like this:

>>> us = 'абвгд'.decode('my-terminal-encoding')
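
If you don't know what to put in place of 'my-terminal-encoding', sys.stdin.encoding is one place to look. The helpers below are hypothetical names for this sketch, not part of any library, and the UTF-8 fallback is an assumption:

```python
import sys

def guess_terminal_encoding(fallback='utf-8'):
    """Return what Python reports for stdin, or the fallback if it reports nothing."""
    # sys.stdin.encoding can be None (for example when input is piped).
    return getattr(sys.stdin, 'encoding', None) or fallback

def to_text(raw, encoding=None):
    """Decode terminal bytes into a unicode string using an explicit or guessed encoding."""
    return raw.decode(encoding or guess_terminal_encoding())

# Stand-in for bytes typed at a UTF-8 terminal.
typed = u'\u0430\u0431\u0432\u0433\u0434'.encode('utf-8')
us = to_text(typed, 'utf-8')
```

Passing the encoding explicitly is the safer choice when Python's guess is known to be wrong, as it appears to be in the question.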

In files you can also specify the encoding of the file with a special mode line on the first or second line of the file:

# -*- encoding: ISO-8859-5 -*-
us = u'абвгд'

For other ways to set the default input encoding you can look at sys.setdefaultencoding(...) or sys.stdin.encoding.

Solution 2

As of Python 2.6, you can use the environment variable PYTHONIOENCODING to tell Python that your terminal is UTF-8 capable. The easiest way to make this permanent is by adding the following line to your ~/.bash_profile:

export PYTHONIOENCODING=utf-8
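
To check that the variable is picked up, you can ask a freshly started interpreter what it thinks its output encoding is. This is a small sketch; the function name is made up, and it launches a child process with whatever interpreter is running the script:

```python
import os
import subprocess
import sys

def stdout_encoding_with(ioencoding):
    """Start a child interpreter with PYTHONIOENCODING set and report its sys.stdout.encoding."""
    env = dict(os.environ, PYTHONIOENCODING=ioencoding)
    out = subprocess.check_output(
        [sys.executable, '-c', 'import sys; sys.stdout.write(sys.stdout.encoding)'],
        env=env,
    )
    return out.decode('ascii')

reported = stdout_encoding_with('utf-8')
```

The same check can be done by hand in a new shell after editing ~/.bash_profile, by printing sys.stdout.encoding.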

Solution 3

In addition to ensuring your OS X terminal is set to UTF-8, you may wish to set Python's default encoding to UTF-8. Create a file in /Library/Python/2.5/site-packages called sitecustomize.py. In this file put:

import sys
sys.setdefaultencoding('utf-8')

The setdefaultencoding function is available only to the site module, and is removed from the sys namespace once startup has completed. As such, you'll need to start a new Python interpreter for the change to take effect. You can verify the current default encoding at any time after startup with sys.getdefaultencoding().
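
A quick way to see both points from a running interpreter (a minimal sketch; the variable names are only for illustration):

```python
import sys

# The encoding Python uses for implicit str/unicode conversions.
default = sys.getdefaultencoding()

# After startup the setter should no longer be reachable from the sys module.
setter_visible = hasattr(sys, 'setdefaultencoding')
```

On an unmodified Python 2 install, default is normally 'ascii'; if the sitecustomize.py change took effect it should read 'utf-8' instead.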

If the characters aren't already unicode and you need to convert them, use the decode method on the string to decode the text from some other charset into unicode. It is best to specify the charset explicitly:

s = 'абвгд'.decode('some_cyrillic_charset') # makes the string unicode
print s.encode('utf-8') # transform the unicode into utf-8, then print it
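
For a concrete case, here is the same two-step conversion with ISO-8859-5 standing in for 'some_cyrillic_charset' (an assumption for this sketch; use whatever charset your data is really in):

```python
# Pretend this byte string arrived from a source that uses ISO-8859-5.
raw_iso = u'\u0430\u0431\u0432\u0433\u0434'.encode('iso-8859-5')  # 'абвгд', one byte per letter

# Step 1: decode the bytes into a unicode string, naming the charset explicitly.
s = raw_iso.decode('iso-8859-5')

# Step 2: encode the unicode string as UTF-8 before writing it to a UTF-8 terminal.
raw_utf8 = s.encode('utf-8')  # two bytes per letter
```

The unicode string in the middle is the same regardless of which byte encoding it came from or goes to; only the byte strings differ.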

Solution 4

Also, make sure the terminal encoding is set to Unicode/UTF-8 (and not ascii, which seems to be your setting):

http://www.rift.dk/news.php?item.7.6

Author by

disc0dancer

I do python, javascript and web development.

Updated on July 26, 2022

Comments

  • disc0dancer
    disc0dancer almost 2 years

    Can someone explain to me this odd thing:

    When in python shell I type the following Cyrillic string:

    >>> print 'абвгд'
    абвгд
    

    but when I type:

    >>> print u'абвгд'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)
    

    Since the first string came out correctly, I reckon my OS X terminal can represent unicode, but it turns out it can't in the second case. Why?

  • disc0dancer
    disc0dancer almost 15 years
    I figured that one, but what bugs me is that my terminal DOES show unicode properly if it's typed as a normal string - e.g. 'уникоде', but throws an error if I try to print the same string as u'уникоде'
  • disc0dancer
    disc0dancer almost 15 years
    This solved my problems, although the repr() explanation is not correct. I made a mistake in my question (sorry) which I have now fixed - I WAS printing the u'абвгд' string actually, so it's not a repr() error. In fact, I do not get the error if I omit the print statement - I just get u'\xd0\xb0\xd0\xb1\xd0\xb2\xd0\xb3\xd0\xb4'. My guess would be that the default encoding, mac-roman, is somehow able to represent Cyrillic chars (which, on the other hand, doesn't make sense ...), but not Cyrillic in unicode. I really don't get this :)
  • Dima Tisnek
    Dima Tisnek about 10 years
    Nice example, especially considering that OS X Python builds come with a meager sys.maxunicode == 0xffff
  • Martijn Pieters
    Martijn Pieters almost 10 years
    Don't change the system default encoding; fix your Unicode values instead. Changing the default encoding can break libraries that rely on the, you know, default behaviour. There is a reason you have to force a module reload before you can do this.
  • Pouya
    Pouya almost 10 years
    I had a problem with sympy pretty print and your trick solved it. Thank you.
  • xApple
    xApple about 8 years
    python -c 'print(u"\U0001F46F")'
  • GeekHades
    GeekHades over 7 years
    It works for me. Just do it once and it's resolved forever!