Why can't I display a unicode character in the Python Interpreter on Mac OS X Terminal.app?

unicode('\xc2\xb7') means to decode the byte string in question with the default codec, which is ascii -- and that of course fails (trying to set a different default encoding has never worked well, and in particular doesn't apply to "pasted literals" -- that would require a different setting anyway). You could instead use u'\xc2\xb7', and see:

>>> print(u'\xc2\xb7')
Â·

since those are two unicode characters of course. While:

>>> print(u'\uc2b7')
슷

gives you a single unicode character (a Korean Hangul syllable, U+C2B7). BTW, neither of these is the "middle dot" you were looking for. Maybe you mean

>>> print('\xc2\xb7'.decode('utf8'))
·

which is the middle dot. BTW, for me (python 2.6.4 from python.org on a Mac Terminal.app):

>>> print('슷')
슷

which kind of surprised me (I expected an error...!-).
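
To make the difference between the codecs concrete, here is a minimal sketch of what happens to the two bytes from the question under three decodings. It is written with b'...' literals so that it should run on Python 2.6+ as well as Python 3; the variable names are made up for illustration, and latin-1 is used only as a stand-in that reproduces what u'\xc2\xb7' spells out (one code point per byte).

```python
from __future__ import print_function

raw = b'\xc2\xb7'  # the two bytes from the question

# Roughly what unicode('\xc2\xb7') attempts when the default codec is ascii.
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

as_utf8 = raw.decode('utf-8')      # one character: the middle dot
as_latin1 = raw.decode('latin-1')  # two characters, same as u'\xc2\xb7'

print(ascii_error)
print(len(as_utf8), len(as_latin1))
```

Only the utf-8 decoding yields the single middle dot; the ascii attempt raises the same kind of UnicodeDecodeError shown in the question.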

Author by

Bjorn

Updated on July 20, 2022

Comments

  • Bjorn
    Bjorn almost 2 years

    If I try to paste a unicode character such as the middle dot:

    ·

    in my python interpreter it does nothing. I'm using Terminal.app on Mac OS X and when I'm simply in bash I have no trouble:

    :~$ ·
    

    But in the interpreter:

    :~$ python
    Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) 
    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 
    

    ^^ I get nothing; it just ignores the character I pasted. If I use the \xNN\xNN escape representation of the middle dot, '\xc2\xb7', and try to convert it to unicode, the interpreter throws an error:

    >>> unicode('\xc2\xb7')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
    

    I have set up 'utf-8' as my default encoding in sitecustomize.py so:

    >>> sys.getdefaultencoding()
    'utf-8'
    

    What gives? It's not the Terminal. It's not Python. What am I doing wrong?!

    This question is not related to this question, as that individual is able to paste unicode into his Terminal.

  • Bjorn
    Bjorn about 14 years
    Wow, if there's someone I want to answer a question about Python when I have one, it's Alex Martelli! Thank you! I own all of your Python books.
  • Bjorn
    Bjorn about 14 years
    Hrm, all of that worked for me and cleared up some confusion I had on unicode vs utf-8, but I am still not able to paste a unicode character in the python interpreter on Mac Terminal.app. Neither can my co-worker when he uses the default apple shell, but he can with the port version of python. I guess it is an application or clipboard issue.
  • Chris Johnsen
    Chris Johnsen about 14 years
    u'\xc2\xb7' is not the same thing as '\xc2\xb7'.decode('utf8')/unicode('\xc2\xb7','UTF-8'). The former is a Unicode string of two code points (U+00C2 (LATIN CAPITAL LETTER A WITH CIRCUMFLEX) and U+00B7 (MIDDLE DOT)), the latter evaluates to a Unicode string with a single code point (U+00B7 (MIDDLE DOT); its UTF-8 encoding requires two bytes). u'\uc2b7' is (as illustrated) something completely different: U+C2B7 (HANGUL SYLLABLE SEUS).
  • polarise
    polarise about 10 years
    I think those are Korean characters (someone correct me if I'm wrong). They sound like 'sis'.
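
The code points named in the comments above can be checked with the standard unicodedata module. This is only a sketch: describe is a hypothetical helper invented for illustration, and the literals are spelled so the snippet should run on Python 2.6+ and Python 3 alike.

```python
from __future__ import print_function
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [('U+%04X' % ord(ch), unicodedata.name(ch)) for ch in text]


samples = [
    ("u'\\xc2\\xb7'", u'\xc2\xb7'),                          # two code points
    ("'\\xc2\\xb7'.decode('utf8')", b'\xc2\xb7'.decode('utf-8')),  # one code point
    ("u'\\uc2b7'", u'\uc2b7'),                               # one Hangul syllable
]

for label, text in samples:
    utf8_length = len(text.encode('utf-8'))  # bytes a UTF-8 terminal would exchange
    print(label, describe(text), utf8_length)
```

The Hangul syllable takes three bytes in UTF-8. Under Python 2 a plain '...' literal typed at a UTF-8 terminal would hold just those bytes, so printing it plausibly echoes them back unchanged, which may be why print('슷') did not raise an error.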