Why doesn't Python recognize my utf-8 encoded source file?
The encoding your terminal is using doesn't support that character:
>>> '\xdf'.encode('cp866')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/encodings/cp866.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\xdf' in position 0: character maps to <undefined>
Python is handling the source file just fine; it is your output encoding that cannot represent the character.
You can try running chcp 65001 in the Windows console before starting Python. chcp is the Windows command-line command for changing the active code page, and 65001 is the UTF-8 code page.
My terminal, on OS X (using UTF-8), handles it just fine:
>>> print('\xdf')
ß
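A minimal sketch of the point above, assuming Python 3: it checks whether a given output encoding can represent a string before you print it. The helper name can_encode is hypothetical and not part of the standard library.

```python
import sys

def can_encode(text, encoding):
    """Return True if every character in text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# cp866 has no slot for U+00DF (ß); UTF-8 can encode any code point.
print(can_encode('\xdf', 'cp866'))   # False
print(can_encode('\xdf', 'utf-8'))   # True

# The encoding print() uses for this process (None if stdout has no such attribute).
print(getattr(sys.stdout, 'encoding', None))

# errors='replace' substitutes '?' instead of raising.
print('\xdf'.encode('cp866', errors='replace'))   # b'?'
```

If the third line prints cp866 or another legacy code page, the error in the traceback is expected for characters outside that code page.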
Londerson Araújo
Updated on July 09, 2022
Comments
-
Londerson Araújo almost 2 years
Here is a little tmp.py with a non-ASCII character:
if __name__ == "__main__":
    s = 'ß'
    print(s)
Running it I get the following error:
Traceback (most recent call last):
  File ".\tmp.py", line 3, in <module>
    print(s)
  File "C:\Python32\lib\encodings\cp866.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xdf' in position 0: character maps to <undefined>
The Python docs say:
By default, Python source files are treated as encoded in UTF-8...
My way of checking the encoding is to use Firefox (maybe someone can suggest something more obvious). I open tmp.py in Firefox, and if I select View->Character Encoding->Unicode (UTF-8) it looks OK; that is, it looks the way it does above in this question (with the ß symbol).
If I put:
# -*- encoding: utf-8 -*-
as the first line in tmp.py, it does not change anything; the error persists.
Could someone help me figure out what I am doing wrong?
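As a more direct alternative to opening the file in a browser, here is a rough sketch that checks whether a file's bytes decode cleanly as UTF-8. The function name is_valid_utf8 is hypothetical, and the temporary tmp.py files are created only for the demonstration. A clean decode does not prove the author intended UTF-8 (pure ASCII passes too), but a failure does show the file is not UTF-8.

```python
import os
import tempfile

def is_valid_utf8(path):
    """Return True if the raw bytes of the file decode cleanly as UTF-8."""
    with open(path, 'rb') as f:
        data = f.read()
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True

# Demonstration on a throwaway file saved in two different encodings.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'tmp.py')

    with open(path, 'w', encoding='utf-8') as f:
        f.write("s = 'ß'\n")
    print(is_valid_utf8(path))   # True

    with open(path, 'w', encoding='latin-1') as f:
        f.write("s = 'ß'\n")
    print(is_valid_utf8(path))   # False
```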
-
Esailija over 11 years
He should be fine on Windows if he runs chcp 65001 before he runs the program, assuming Python detects that.
-
Martijn Pieters over 11 years
@Esailija: I've had feedback that that doesn't always work. I think fonts need switching too.
-
Esailija over 11 years
For a ß, probably not. But for more exotic characters the default Windows cmd prompt font probably won't do :P
-
Londerson Araújo over 11 years
You're right: it is the terminal thing. If I do
with open('tmp.txt', 'w', encoding='utf-8') as f:
    f.write(s)
it works fine. Can you elaborate on "try using chcp 65001"? That does not mean anything to me.
-
Esailija over 11 years
@mezhaka you can fix the terminal too; I just installed Python 3 and tested that chcp 65001 works. Run chcp 65001 in your terminal before running the Python file.
-
martineau over 11 years
@mezhaka: chcp 65001 is a Windows command to change the code page (encoding) being used in the command-line window. If you issue it before starting Python 3 it will carry over to the Python console. Doing this with Python 2.7.3 will result in an error.
-
Martijn Pieters over 11 years
@mezhaka: I expanded the sentence a little; it was indeed not very clear.
-
Londerson Araújo over 11 years
@Esailija I tried running chcp 65001 before running the script. It now gives me no error, but the non-ASCII characters are either not printed or printed as the wrong symbols. I am fine with that, though; I'll just write what I need directly to a file. (By the way, if I redirect the output via > I get that encoding error again.)
-
Martijn Pieters over 11 years
@mezhaka: yes, redirecting to a file means there is no encoding set for printing (writing to sys.stdout). Encode manually in that case. And your terminal font doesn't support the characters you are trying to print, so they are not displayed correctly.
-
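One way to read the "encode manually" advice, sketched under the assumption of Python 3: encode the text yourself and write the bytes to a binary stream, so the result no longer depends on whatever encoding the redirected stdout picked. The helper name write_utf8 is hypothetical; in a real script you would pass sys.stdout.buffer, and an io.BytesIO object stands in for it here so the sketch runs anywhere.

```python
import io

def write_utf8(text, binary_stream):
    """Encode text as UTF-8 and write the resulting bytes to a binary stream."""
    binary_stream.write(text.encode('utf-8'))
    binary_stream.flush()

# Stand-in for sys.stdout.buffer; in a script: write_utf8(s + '\n', sys.stdout.buffer)
demo = io.BytesIO()
write_utf8('\xdf\n', demo)
print(demo.getvalue())   # b'\xc3\x9f\n'
```

Setting the PYTHONIOENCODING environment variable to utf-8 before running the script is another option that leaves the print() calls unchanged.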
Londerson Araújo over 11 years
@MartijnPieters Indeed, I've changed the font to Lucida Console and I see the ß!
-
jfs about 8 years
The correct solution is to leave chcp alone and use the Unicode API on Windows.
-
Martijn Pieters about 8 years
@J.F.Sebastian: I agree; the list of questions about Windows console printing I've duped to that post is rather long now.