Why doesn't Python recognize my utf-8 encoded source file?


The encoding your terminal is using doesn't support that character:

>>> '\xdf'.encode('cp866')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/encodings/cp866.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\xdf' in position 0: character maps to <undefined>

Python is handling the character just fine; it is your output encoding that cannot represent it.
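
To show that the limit lies in the target encoding rather than in Python's string handling, here is a minimal sketch. The helper name can_encode is made up for illustration and only wraps str.encode; cp866 stands in for whatever code page your console happens to use.

```python
def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# can_encode is a hypothetical helper, not part of the standard library.
for name in ("cp866", "utf-8"):
    print(name, can_encode("\xdf", name))
```

The same string is rejected by cp866, as in the traceback above, and accepted by UTF-8.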

You can try running chcp 65001 in the Windows console before starting Python to switch your code page to UTF-8; chcp is a Windows command-line command that changes the active code page.
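
If you want to confirm which encoding Python picked up for the console, a small check like the one below may help. This is only a sketch: stream_encoding is a hypothetical helper, and the value it reports depends on your Python version and console settings.

```python
import sys


def stream_encoding(stream):
    """Return the encoding a text stream reports, or None if it has none."""
    # stream_encoding is a hypothetical helper for illustration only.
    return getattr(stream, "encoding", None)


print("stdout encoding:", stream_encoding(sys.stdout))
```

Running it once before and once after changing the code page shows whether Python noticed the switch.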

My terminal on OS X (using UTF-8) handles it just fine:

>>> print('\xdf')
ß
Author by

Londerson Araújo

I am a curious person with both academic and industry experience in software engineering. I started my career as a full-stack web developer, then moved on to do my PhD. During my PhD years I designed and implemented algorithms for massively parallel supercomputers (up to 300,000 cores and terabytes of RAM). Currently I do back-end development in an advertising company (RTB).

Updated on July 09, 2022

Comments

  • Londerson Araújo
    Londerson Araújo almost 2 years

    Here is a little tmp.py with a non-ASCII character:

    if __name__ == "__main__":
        s = 'ß'
        print(s)
    

    Running it I get the following error:

    Traceback (most recent call last):
      File ".\tmp.py", line 3, in <module>
        print(s)
      File "C:\Python32\lib\encodings\cp866.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\xdf' in position 0: character maps to <undefined>
    

    The Python docs say:

    By default, Python source files are treated as encoded in UTF-8...

    My way of checking the encoding is to use Firefox (maybe someone can suggest something more obvious). I open tmp.py in Firefox, and if I select View->Character Encoding->Unicode (UTF-8) it looks OK, that is, the way it looks above in this question (with the ß symbol).

    If I put:

    # -*- encoding: utf-8 -*-
    

    as the first line in tmp.py, it does not change anything: the error persists.

    Could someone help me figure out what I am doing wrong?

  • Esailija
    Esailija over 11 years
    He should be fine on Windows if he runs chcp 65001 before he runs the program, assuming Python detects that.
  • Martijn Pieters
    Martijn Pieters over 11 years
    @Esailija: I've had feedback that that doesn't always work. I think fonts need switching too.
  • Esailija
    Esailija over 11 years
    For a ß, probably not. But for more exotic characters the default Windows cmd prompt font probably won't do :P
  • Londerson Araújo
    Londerson Araújo over 11 years
    You're right: it is a terminal thing. If I do with open('tmp.txt', 'w', encoding='utf-8') as f: f.write(s) it works fine. Can you elaborate on "try using chcp 65001"? That does not mean anything to me.
  • Esailija
    Esailija over 11 years
    @mezhaka you can fix the terminal too. I just installed Python 3 and tested that chcp 65001 works. Run chcp 65001 in your terminal before running the Python file.
  • martineau
    martineau over 11 years
    @mezhaka: chcp 65001 is a Windows command to change the code page (encoding) being used in the command-line window. If you issue it before starting Python 3 it will carry over to the Python console. Doing this with Python 2.7.3 will result in an error.
  • Martijn Pieters
    Martijn Pieters over 11 years
    @mezhaka: I expanded the sentence a little; it was indeed not very clear.
  • Londerson Araújo
    Londerson Araújo over 11 years
    @Esailija I tried running chcp 65001 before running the script. It now gives me no error, but the non-ASCII characters are still either not printed or printed as the wrong symbols. But I am fine with that, I'll just write what I need directly to a file. (Btw, if I redirect the output via > I get that encoding error again.)
  • Martijn Pieters
    Martijn Pieters over 11 years
    @mezhaka: yes, redirecting to a file means there is no encoding set for printing (writing to sys.stdout). Encode manually in that case. And your terminal font doesn't support the characters you are trying to print, so they are not displayed correctly.
  • Londerson Araújo
    Londerson Araújo over 11 years
    @MartijnPieters Indeed, I've changed the font to Lucida Console and I see the ß!
  • jfs
    jfs about 8 years
  • Martijn Pieters
    Martijn Pieters about 8 years
    @J.F.Sebastian: I agree; the number of questions about Windows console printing I've duped to that post is rather long now.
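
The "encode manually" suggestion from the comments can be sketched roughly as follows. The helper write_encoded is hypothetical, and UTF-8 is an assumed default that you would replace with whatever encoding the consumer of the output expects. An in-memory io.BytesIO stands in for the real binary stream so that the example runs anywhere.

```python
import io


def write_encoded(binary_stream, text, encoding="utf-8"):
    """Encode `text` explicitly and write the bytes to a binary stream.

    write_encoded is a hypothetical helper for illustration only.
    Returns the number of bytes produced.
    """
    data = text.encode(encoding)
    binary_stream.write(data)
    return len(data)


# Demonstrate with an in-memory buffer instead of a real file or console.
demo = io.BytesIO()
write_encoded(demo, "\xdf")
print(demo.getvalue())
```

In Python 3 the same call can target sys.stdout.buffer, the binary layer underneath sys.stdout, so the bytes are written without going through the console's code page. Opening a file with an explicit encoding='utf-8', as in the comment above, achieves the same effect for file output.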