How to print utf-8 to console with Python 3.4 (Windows 8)?
Solution 1
What I'm trying to do is print utf-8 card symbols (♠,♥,♦,♣) from a python module to a windows console
UTF-8 is a byte encoding of Unicode characters. ♠♥♦♣ are Unicode characters which can be reproduced in a variety of encodings and UTF-8 is one of those encodings—as a UTF, UTF-8 can reproduce any Unicode character. But there is nothing specifically “UTF-8” about those characters.
Other encodings that can reproduce the characters ♠♥♦♣ are Windows code page 850 and 437, which your console is likely to be using under a Western European install of Windows. You can print ♠ in these encodings but you are not using UTF-8 to do so, and you won't be able to use other Unicode characters that are available in UTF-8 but outside the scope of these code pages.
print(u'♠')
UnicodeEncodeError: 'charmap' codec can't encode character '\u2660'
In Python 3 this is the same as the print('♠')
test you did above, so there is something different about how you are invoking the script containing this print
, compared to your py -3.4
. What does sys.stdout.encoding
give you from the script?
To get print
working correctly you would have to make sure Python picks up the right encoding. If it is not doing that adequately from the terminal settings you would indeed have to set PYTHONIOENCODING
to cp437
.
>>> text = '♠'
>>> print(text.encode('utf-8'))
b'\xe2\x99\xa0'
print
can only print Unicode strings. For other types including the bytes
string that results from the encode()
method, it gets the literal representation (repr
) of the object. b'\xe2\x99\xa0'
is how you would write a Python 3 bytes literal containing a UTF-8 encoded ♠.
If what you want to do is bypass print
's implicit encoding to PYTHONIOENCODING and substitute your own, you can do that explicitly:
>>> import sys
>>> sys.stdout.buffer.write('♠'.encode('cp437'))
This will of course generate wrong output for any consoles not running code page 437 (eg non-Western-European installs). Generally, for apps using the C stdio, like Python does, getting non-ASCII characters to the Windows console is just too unreliable to bother with.
Solution 2
Since Python 3.7.x, You can reconfigure stdout :
import sys
sys.stdout.reconfigure(encoding='utf-8')
Solution 3
Do not encode to utf-8; print Unicode directly instead:
print(u'♠')
See how to print Unicode to Windows console.
Comments
-
Austin A over 2 years
I'm trying to print utf-8 card symbols (♠,♥,♦︎︎,♣) from a python module to a windows console. The console that I'm using is git bash and I'm using console2 as a front-end. I've tried/read a number of approaches below and nothing has worked so far.
-
Made sure the console can handle utf-8 characters. These two tests make me believe that the console isn't the problem.
-
Attempt the same thing from the python module.
When I execute the .py, this is the result.print(u'♠') UnicodeEncodeError: 'charmap' codec can't encode character '\u2660' in position 0: character maps to <undefined>
-
Attempt to encode ♠. This gives me back the unicode set encoded in utf-8, but still no spade symbol.
text = '♠' print(text.encode('utf-8')) b'\xe2\x99\xa0'
I feel like I'm missing a step or not understanding the whole encode/decode process. I've read this, this, and this. The last of the pages suggests wrapping the sys.stdout into the code but this article says using stdout is unnecessary and points to another page using the codecs module.
-
-
jfs almost 9 yearsIt is not how it works. As
win_unicode_console
demonstrates you can write any Unicode character (though only BMP characters will be display by Windows console). -
user87690 almost 9 years@J.F.Sebastian: What do you mean? With what I say you don't agree?
-
jfs almost 9 yearsyour claim is essentially that all OS I/O interfaces are bytes based.
WriteConsoleW()
(used bywin_unicode_console
) is a counter-example -
user87690 almost 9 years@J.F.Sebastian: I view
WriteConsoleW
as accepting a UTF-16-LE encoded bytes, not a string. -
jfs almost 9 yearsit is incorrect. You should separate the abstraction e.g.,
unicode
type in Python 2 and its implementation UCS-2, UCS-4 (narrow, wide builds). The implementation can be improved e.g,. Python 3 uses flexible string representation but the abstraction stays the same. In particular,WriteConsoleW()
might have started as UCS-2 but it is utf-16le now. Windows console itself is still UCS-2 i.e., you can write (and copy/paste) utf-16le but only BMP characters can be displayed (even the font support corresponding astral characters) -
user87690 almost 9 years@J.F.Sebastian: That's what I'm doing – saying that a string is an abstraction, represented by
unicode
type in Python 2, while when communicating with the OS environment, you cannot use this abstraction directly, you have to encode it somehow. Of course, the implementation of theunicode
also uses come internal encoding. -
jfs almost 9 yearsAgain, don't mix the abstraction and how it happens to be implemented. You can use Unicode string type directly. Compare how
os.listdir(unicode_string)
works on Unix (OS uses bytes interfaces) and on Windows (Unicode API). One of the improvements of Python 3 is that it uses Unicode API more on Windows. -
user87690 almost 9 years@J.F.Sebastian: By “you cannot use string directly” I meant the fact that when you want to communicate with the OS environment, you are leaving the Python realm, so you cannot use Python abstractions. So some kind of encoding is needed under the hoods. Using this obvious truth I wanted to explain what is the encoding process good for. By no means I meant that you should encode before calling
print
oros.listdir
. In fact, I meant the opposite. As I said “the encoding is done under the hoods”, “you should just callprint(a_string)
”, which is exactly what you suggest. -
jfs almost 9 yearswrong. Unicode is not Python abstraction. Maybe an example would help: let's take the number
1.5
. We can store it in memory and on disk using binary32 format:b'\x00\x00\xc0\x3f'
. If we apply your logic then there are no numeric interfaces there are only bytes interfaces. Integers are encoded as bytes too. Do you considerpid_t getpid()
to be bytes interface? Byte itself is an abstraction though it is so successful and common that is indistinguishable from reality for some people. -
user87690 almost 9 years@J.F.Sebastian: I don't claim that Unicode is a Python abstraction. I also don't claim that there are only bytes interfaces. Well, it actually depeneds on what we exactly mean. Yes, integers are encoded as bytes too. And calling a C function e.g. via ctypes needs encoding of Python integers, which are a base for abstract intergers, into bytes (of particular width and endianess), which are a level lower.
-
user87690 almost 9 years@J.F.Sebastian: Note that none of your objections is against anything I recommend to actually do in my answer. I think there is just some misunderstanding between us. Maybe we should contiue with the discussion somewhere else.
-
jfs almost 9 yearsthese messages are for your benefit. If you don't want to learn. It is fine by me.
-
user87690 almost 9 years@J.F.Sebastian: I was addressing the fact that extended discussions should be avoided in comments. And suggesting to move our discussion somewhere else.
-
JinSnow over 7 yearsPython 3.6 print('♠') (strings are UTF8 by default )
-
jfs over 7 years@Guillaume wrong. Do not confuse text represented as Unicode strings and text represented as bytes using utf-8 encoding. Pep 528 and 529 has nothing to do with it.
-
JinSnow over 7 yearsthanks for your correction! Do you mean that the default mode of python 3.6 is unicode (and not UTF8) right?
-
jfs over 7 years@Guillaume u'♠' == '♠' in Python 3.3+ or if
from __future__ import unicode_literals
is used. -
JinSnow over 7 yearsfor the (other) rookies: PEP 528 -- Change Windows console encoding to UTF-8 // PEP 529 -- Change Windows filesystem encoding to UTF-8
-
jfs over 7 years@Guillaume: to be crystal clear: utf-8 is NOT synonym for Unicode. It is just one of many character encodings. The fact that utf-8 is mentioned in both the question and the peps is a coincidence e.g.,
UnicodeEncodeError
may be fixed by installingwin-unicode-console
package that does not use utf-8 anywhere (it works with Unicode strings directly). Follow the link in the answer. -
JinSnow over 7 years@sebastian thanks to your patience I finally understood that point. I still don't understand why I can't print "ç" (french letter) in the console (U+00E7) but If I understood well, it can't be fixed (without risking to get some weird bugs). Do you confirm? (I'm running python 3.6)
-
jfs over 7 years@Guillaume no. You should be able to print any Unicode character (and even to display it correctly if you've configured the font (BMP-only)) Have you read the linked answer? If it doesn't help; create a minimal code example such as
print('\xe7')
and post it as a new Stack Overflow question with the full traceback. -
JinSnow over 7 yearsPython 3.6: "the default console on Windows accept all Unicode characters with that version" (well, most of it for me) BUT you need to configure the console: right click on the top of the windows (of the cmd or the python IDLE), in default/font choose the "Lucida console".
-
jfs over 7 years@Guillaume: if you click the only link in the answer then you should see the more detail answer. The same comment applies (to address your comment).
-
michelek about 7 yearsThank you for the final line: "Generally, for apps using the C stdio, like Python does, getting non-ASCII characters to the Windows console is just too unreliable to bother with."
-
michelek about 7 yearsIf you ever meet in person - wish I could be present :)