How to set sys.stdout encoding in Python 3?
Solution 1
Since Python 3.7 you can change the encoding of standard streams with reconfigure():
import sys
sys.stdout.reconfigure(encoding='utf-8')
You can also modify how encoding errors are handled by adding an errors parameter.
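A minimal sketch of both points. It assumes a plain io.TextIOWrapper over an in-memory io.BytesIO buffer as a stand-in for sys.stdout, so the resulting bytes can be inspected; the names out, utf8_bytes and escaped_bytes are illustrative only. With the real stream you would call sys.stdout.reconfigure(...) the same way.

```python
import io

# Stand-in for a sys.stdout that was started with the wrong (ASCII) encoding.
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding="ascii")

# Switch the encoding to UTF-8 (reconfigure() exists on io.TextIOWrapper since 3.7).
out.reconfigure(encoding="utf-8")
out.write("日本語")
out.flush()
utf8_bytes = buf.getvalue()  # b'\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e'

# Keep ASCII but change the error handler: unencodable characters become escapes.
buf2 = io.BytesIO()
out2 = io.TextIOWrapper(buf2, encoding="ascii")
out2.reconfigure(errors="backslashreplace")
out2.write("日本語")
out2.flush()
escaped_bytes = buf2.getvalue()  # b'\\u65e5\\u672c\\u8a9e'
```

If sys.stdout has been replaced by an object that is not an io.TextIOWrapper (some IDEs and notebook kernels do this), it may have no reconfigure() method at all, so checking hasattr(sys.stdout, "reconfigure") first avoids an AttributeError.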
Solution 2
Python 3.1 added io.TextIOBase.detach(), with a note in the documentation for sys.stdout:
The standard streams are in text mode by default. To write or read binary data to these, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc'). Using io.TextIOBase.detach() streams can be made binary by default. This function sets stdin and stdout to binary:
def make_streams_binary():
    sys.stdin = sys.stdin.detach()
    sys.stdout = sys.stdout.detach()
Therefore, the corresponding idiom for Python 3.1 and later is:
import codecs, sys
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
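To see what the detach() idiom does without detaching the real stdout, the sketch below applies it to an io.TextIOWrapper over an io.BytesIO buffer, an assumption standing in for sys.stdout. detach() hands back the underlying binary buffer and leaves the old text wrapper unusable; the codecs StreamWriter then encodes each str to UTF-8 bytes before writing them to that buffer.

```python
import codecs
import io

buf = io.BytesIO()
fake_stdout = io.TextIOWrapper(buf, encoding="ascii")  # stand-in for sys.stdout

raw = fake_stdout.detach()               # returns the underlying binary buffer
assert raw is buf
writer = codecs.getwriter("utf-8")(raw)  # StreamWriter: takes str, writes UTF-8 bytes

writer.write("ûnicöde\n")
print("日本語", file=writer)             # print() works too, since writer has write()
result = buf.getvalue()

# The detached text wrapper can no longer be used.
detached_error = False
try:
    fake_stdout.write("x")
except ValueError:
    detached_error = True
```

A closely related variant is io.TextIOWrapper(sys.stdout.detach(), encoding="utf-8"), which gives back a full text stream (with .buffer, and reconfigure() on 3.7+) instead of a codecs.StreamWriter.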
Solution 3
I found this thread while searching for solutions to the same error.
An alternative to the solutions already suggested is to set the PYTHONIOENCODING environment variable before Python starts. For my use, this is less trouble than swapping sys.stdout after Python is initialized:
PYTHONIOENCODING=utf-8:surrogateescape python3 somescript.py
This has the advantage of not requiring any edits to the Python code.
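One way to check what PYTHONIOENCODING does is to start a child interpreter with the variable set and ask it for sys.stdout's encoding and error handler. The sketch below uses subprocess and sys.executable for that; the helper name child_stdout_settings is hypothetical.

```python
import os
import subprocess
import sys


def child_stdout_settings(pythonioencoding):
    """Run a child Python with PYTHONIOENCODING set; return its stdout (encoding, errors)."""
    env = dict(os.environ, PYTHONIOENCODING=pythonioencoding)
    code = "import sys; print(sys.stdout.encoding, sys.stdout.errors)"
    out = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    ).stdout
    encoding, errors = out.decode("ascii").split()
    return encoding, errors


print(child_stdout_settings("utf-8:surrogateescape"))
```

The part after the colon selects the error handler, so surrogateescape here affects how unencodable characters are handled, not which encoding is used.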
Solution 4
Other answers seem to recommend using codecs, but open works for me:
import sys
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)
print("日本語")
# Also works with other methods of writing to stdout:
sys.stdout.write("日本語\n")
sys.stdout.buffer.write("日本語\n".encode())
This works even when I run it with PYTHONIOENCODING="ascii".
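One detail worth knowing about open(sys.stdout.fileno(), ...): by default the new file object owns the descriptor and closes it when the object is closed or garbage-collected. Passing closefd=False avoids that. The sketch below shows this on a pipe from os.pipe() rather than on the real stdout, so it can run anywhere; the helper name reopen_fd_as_utf8 is hypothetical.

```python
import os


def reopen_fd_as_utf8(fd):
    """Open a line-buffered UTF-8 text stream on an existing file descriptor.

    closefd=False: closing this stream does not close fd itself.
    """
    return open(fd, mode="w", encoding="utf-8", buffering=1, closefd=False)


read_fd, write_fd = os.pipe()
stream = reopen_fd_as_utf8(write_fd)
stream.write("日本語\n")   # line buffering flushes at the newline
stream.close()             # flushes; write_fd stays open because of closefd=False
os.write(write_fd, b"ok")  # would fail with OSError (EBADF) if write_fd had been closed
os.close(write_fd)

with open(read_fd, "rb") as pipe_in:  # read everything the writer sent
    data = pipe_in.read()
print(data)
```

With the real stream, the equivalent would be open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1, closefd=False).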
Solution 5
"Setting the default output encoding in Python 2 is a well-known idiom"
Eek! Is that a well-known idiom in Python 2? It looks like a dangerous mistake to me.
It'll certainly mess up any script that tries to write binary to stdout (which you'll need if you're a CGI script returning an image, for example). Bytes and chars are quite different animals; it's not a good idea to monkey-patch an interface that is specified to accept bytes with one that only takes chars.
CGI and HTTP in general explicitly work with bytes. You should only be sending bytes to sys.stdout. In Python 3 that means using sys.stdout.buffer.write to send bytes directly. Encoding page content to match its charset parameter should be handled at a higher level in your application (in cases where you are returning textual content, rather than binary). This also means print is no good for CGI any more.
(To add to the confusion, wsgiref's CGIHandler was broken in py3k until very recently, making it impossible to deploy WSGI to CGI that way. With PEP 3333 and Python 3.2 this is finally workable.)
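A minimal sketch of that split, assuming a CGI-style response: the application encodes the page itself to match the charset it declares, and only bytes reach the binary layer. The function name write_cgi_response and its parameters are hypothetical; a real script would pass sys.stdout.buffer, while the usage below passes an io.BytesIO so the output can be checked. Binary content such as image data would be written to the same stream unchanged.

```python
import io


def write_cgi_response(out, body, content_type="text/html", charset="utf-8"):
    """Write CGI headers plus a text body to a binary stream such as sys.stdout.buffer."""
    payload = body.encode(charset)  # encode in the application, matching the declared charset
    header = f"Content-Type: {content_type}; charset={charset}\r\n\r\n"
    out.write(header.encode("ascii"))  # header lines are plain ASCII
    out.write(payload)
    out.flush()


buf = io.BytesIO()
write_cgi_response(buf, "<p>日本語</p>")
response = buf.getvalue()
```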
Comments
-
Greg Hewgill almost 2 years
Setting the default output encoding in Python 2 is a well-known idiom:
sys.stdout = codecs.getwriter("utf-8")(sys.stdout)
This wraps the sys.stdout object in a codec writer that encodes output in UTF-8.
However, this technique does not work in Python 3 because sys.stdout.write() expects a str, but the result of encoding is bytes, and an error occurs when codecs tries to write the encoded bytes to the original sys.stdout.
What is the correct way to do this in Python 3?
-
dan_waterworth over 13 years
If the new code doesn't work, then I'd suggest you report this as a bug.
-
Tobias Kienzler over 10 years
Wow, this causes a lot of fun in an interactive shell. Try sys.stdout = codecs.getwriter("hex")(sys.stdout) in ipython to see what I mean...
-
Greg Hewgill over 13 years
My context was running the Python script as a CGI under Apache, where the default output encoding wasn't what I needed (I needed UTF-8). I think it's better for the script to ensure that its output is in the correct encoding, rather than relying on external settings (such as environment variables like PYTHONIOENCODING).
-
Lennart Regebro over 13 years
Yet another proof that using stdout for process communication is a big mistake. I realize you may have had no choice other than to use CGI in this case, though, so that's not your fault. :-)
-
theeggman85 about 7 years
Thumbs-upping mainly because PYTHONIOENCODING=utf-8 solved my problem, after many hours of searching.
-
Maggyero about 6 years
While it is true that sys.stdout is a binary file in Python 2 and a text file in Python 3, I think your Python 2 example fails because the unicode string u"ûnicöde" that gets implicitly encoded in the sys.stdout.write method has characters outside the ASCII range. If you change your LC_CTYPE, LANG or PYTHONIOENCODING environment variables to an encoding that has all the characters in the unicode string, you should not get any error. (I have tried on Python 2.7.)
-
Dan about 4 years
What if you are trying to maintain compatibility with Python 3.6?
-
sth about 4 years
@Dan Then you can't use this.
-
Dan about 4 years
I had inferred as much. Do you know of an alternative solution, then?
-
sth about 4 years
@Dan Well, on this very page there are a lot of other answers with alternative solutions. My answer is not the only answer to this question; there are other answers with other approaches from before Python 3.7. Isn't that what you are looking for?
-
Dan about 4 years
I appreciate your responses. I did see several other answers on the page, but I considered your answer the most straightforward and elegant, so I was wondering if you had a similarly straightforward and elegant alternative for Python 3.6 and earlier. Thanks for your time.
-
Marc B. Hankin over 2 years
I'm running Anaconda Python 3.8, and the statement "sys.stdout.reconfigure(encoding='utf-8')" generates an exception: "AttributeError: 'OutStream' object has no attribute 'reconfigure'". What am I missing?
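To make the question's failure concrete, and to sketch one possible answer to the Python 3.6 compatibility question in the comments, here is a small example. It assumes an io.StringIO and an io.TextIOWrapper over io.BytesIO as stand-ins for sys.stdout; the helper name utf8_text_stream is hypothetical. It prefers reconfigure() when the stream has it (3.7+) and otherwise detaches the binary buffer and wraps it in a new io.TextIOWrapper (3.1+). Replacement streams such as the OutStream in the last comment may support neither method.

```python
import codecs
import io

# The Python 2 idiom on a Python 3 text stream: the StreamWriter encodes str to
# bytes, and the text stream then rejects those bytes with a TypeError.
try:
    codecs.getwriter("utf-8")(io.StringIO()).write("ûnicöde")
except TypeError as exc:
    python2_idiom_error = exc
else:
    python2_idiom_error = None


def utf8_text_stream(stream):
    """Hypothetical helper: return a UTF-8 text stream writing where `stream` wrote."""
    if hasattr(stream, "reconfigure"):  # Python 3.7+
        stream.reconfigure(encoding="utf-8")
        return stream
    # Python 3.1 to 3.6: put a new UTF-8 text layer on the detached binary buffer
    line_buffering = stream.line_buffering
    return io.TextIOWrapper(stream.detach(), encoding="utf-8",
                            line_buffering=line_buffering)


buf = io.BytesIO()
fake_stdout = io.TextIOWrapper(buf, encoding="ascii")
fake_stdout = utf8_text_stream(fake_stdout)  # e.g. sys.stdout = utf8_text_stream(sys.stdout)
fake_stdout.write("日本語")
fake_stdout.flush()
```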