How to set sys.stdout encoding in Python 3?


Solution 1

Since Python 3.7 you can change the encoding of standard streams with reconfigure():

import sys

sys.stdout.reconfigure(encoding='utf-8')

You can also modify how encoding errors are handled by adding an errors parameter.
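
As a minimal sketch of the errors parameter, the following runs reconfigure() on an in-memory io.TextIOWrapper instead of the real sys.stdout, so it can be tried without changing the console. The helper name demo_reconfigure and the choice of 'backslashreplace' are illustrative assumptions, not part of the answer above.

```python
import io


def demo_reconfigure():
    """Switch a text stream to ASCII with escaped fallbacks and return the bytes written."""
    raw = io.BytesIO()
    # Stand-in for sys.stdout; newline="" turns off newline translation
    # so the bytes are the same on every platform.
    stream = io.TextIOWrapper(raw, encoding="utf-8", errors="strict", newline="")
    # Change both the encoding and the error handler in place (Python 3.7+).
    stream.reconfigure(encoding="ascii", errors="backslashreplace")
    # Characters that ASCII cannot represent are written as escape sequences
    # instead of raising UnicodeEncodeError.
    stream.write("naïve 日本語")
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    print(demo_reconfigure())
```

On the real stream the equivalent call would be sys.stdout.reconfigure(errors='backslashreplace'), which keeps the current encoding and only changes how unencodable characters are handled.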

Solution 2

Python 3.1 added io.TextIOBase.detach(), with a note in the documentation for sys.stdout:

The standard streams are in text mode by default. To write or read binary data to these, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc'). Using io.TextIOBase.detach() streams can be made binary by default. This function sets stdin and stdout to binary:

def make_streams_binary():
    sys.stdin = sys.stdin.detach()
    sys.stdout = sys.stdout.detach()

Therefore, the corresponding idiom for Python 3.1 and later is:

import codecs
import sys

sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
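
The idiom can be tried safely on an in-memory stream before applying it to sys.stdout. This is only a sketch: the io.BytesIO object and the function name demo_detach_and_wrap are stand-ins introduced for illustration.

```python
import codecs
import io


def demo_detach_and_wrap():
    """Detach the binary layer from a text stream and wrap it in a UTF-8 writer."""
    raw = io.BytesIO()
    # Stand-in for sys.stdout, deliberately created with a non-UTF-8 encoding.
    text_stream = io.TextIOWrapper(raw, encoding="ascii")
    # detach() returns the underlying binary stream; text_stream itself
    # can no longer be used after this call.
    binary_stream = text_stream.detach()
    # The codecs writer accepts str and writes UTF-8 bytes to the binary stream.
    writer = codecs.getwriter("utf-8")(binary_stream)
    writer.write("日本語\n")
    return raw.getvalue()


if __name__ == "__main__":
    print(demo_detach_and_wrap())
```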

Solution 3

I found this thread while searching for solutions to the same error.

An alternative to the solutions already suggested is to set the PYTHONIOENCODING environment variable before Python starts. For my use, this is less trouble than swapping sys.stdout after Python is initialized:

PYTHONIOENCODING=utf-8:surrogateescape python3 somescript.py

This has the advantage of not having to edit the Python code.
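
One way to check the effect of the variable is to start a child interpreter with PYTHONIOENCODING set and have it report its own stdout settings. The sketch below assumes Python 3.7 or later (for capture_output); the helper name child_stdout_settings is hypothetical.

```python
import os
import subprocess
import sys


def child_stdout_settings(io_encoding):
    """Run a child Python with PYTHONIOENCODING set and return what it reports."""
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c",
         "import sys; print(sys.stdout.encoding, sys.stdout.errors)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(child_stdout_settings("utf-8:surrogateescape"))
```

The value has the form encoding:errors, so the part after the colon selects the error handler for the child's standard streams.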

Solution 4

Other answers seem to recommend using codecs, but open works for me:

import sys
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)
print("日本語")
# Also works with other methods of writing to stdout:
sys.stdout.write("日本語\n")
sys.stdout.buffer.write("日本語\n".encode())

This works even when I run it with PYTHONIOENCODING="ascii".
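
The same open() call works on any file descriptor, which makes it easy to experiment with. The sketch below uses os.pipe() as a stand-in for file descriptor 1 and passes closefd=False, an addition that is not in the answer above, so that closing the wrapper does not close the descriptor itself. The function name demo_reopen_fd is hypothetical.

```python
import os


def demo_reopen_fd():
    """Reopen a file descriptor as UTF-8 text and return the bytes that reach it."""
    read_fd, write_fd = os.pipe()
    # Same arguments as the sys.stdout example, applied to the pipe's write end.
    # closefd=False leaves write_fd open when the wrapper is closed.
    with open(write_fd, mode="w", encoding="utf-8", buffering=1,
              closefd=False) as out:
        out.write("日本語\n")
    # Close the descriptor explicitly so the reader sees end-of-file.
    os.close(write_fd)
    with open(read_fd, "rb") as src:
        return src.read()


if __name__ == "__main__":
    print(demo_reopen_fd())
```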

Solution 5

"Setting the default output encoding in Python 2 is a well-known idiom"

Eek! Is that a well-known idiom in Python 2? It looks like a dangerous mistake to me.

It'll certainly mess up any script that tries to write binary to stdout (which you'll need if you're a CGI script returning an image, for example). Bytes and chars are quite different animals; it's not a good idea to monkey-patch an interface that is specified to accept bytes with one that only takes chars.

CGI and HTTP in general explicitly work with bytes. You should only be sending bytes to sys.stdout. In Python 3 that means using sys.stdout.buffer.write to send bytes directly. Encoding page content to match its charset parameter should be handled at a higher level in your application (in cases where you are returning textual content, rather than binary). This also means print is no good for CGI any more.
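
A minimal sketch of that approach, assuming an HTML response: the text is encoded explicitly to match the charset declared in the header, and only bytes are written to the binary stream. The function name write_cgi_response and its parameters are hypothetical; by default it writes to sys.stdout.buffer.

```python
import io
import sys


def write_cgi_response(body_text, out=None, charset="utf-8"):
    """Write a CGI-style response as bytes, encoding the body to the declared charset."""
    if out is None:
        # The binary layer underneath the text-mode sys.stdout.
        out = sys.stdout.buffer
    # Encode the body first so an encoding error surfaces before any output.
    body = body_text.encode(charset)
    header = "Content-Type: text/html; charset={}\r\n\r\n".format(charset)
    out.write(header.encode("ascii"))
    out.write(body)
    out.flush()


if __name__ == "__main__":
    # Demonstrate on an in-memory binary stream instead of the real stdout.
    demo = io.BytesIO()
    write_cgi_response("<p>日本語</p>", out=demo)
    print(demo.getvalue())
```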

(To add to the confusion, wsgiref's CGIHandler has been broken in py3k until very recently, making it impossible to deploy WSGI to CGI that way. With PEP 3333 and Python 3.2 this is finally workable.)


Author: Greg Hewgill

Software geek. Twitter: @ghewgill

Updated on July 05, 2022

Comments

  • Greg Hewgill
    Greg Hewgill almost 2 years

    Setting the default output encoding in Python 2 is a well-known idiom:

    sys.stdout = codecs.getwriter("utf-8")(sys.stdout)
    

    This wraps the sys.stdout object in a codec writer that encodes output in UTF-8.

    However, this technique does not work in Python 3 because sys.stdout.write() expects a str, but the result of encoding is bytes, and an error occurs when codecs tries to write the encoded bytes to the original sys.stdout.

    What is the correct way to do this in Python 3?

    • dan_waterworth
      dan_waterworth over 13 years
      If the new code doesn't work then I'd suggest you add this as a bug.
    • Tobias Kienzler
      Tobias Kienzler over 10 years
      Wow, this causes a lot of fun in an interactive shell - try sys.stdout = codecs.getwriter("hex")(sys.stdout) in ipython to see what I mean...
  • Greg Hewgill
    Greg Hewgill over 13 years
    My context was running the Python script as a CGI under Apache, where the default output encoding wasn't what I needed (I needed UTF-8). I think it's better for the script to ensure that its output is in the correct encoding, rather than relying on external settings (such as environment variables like PYTHONIOENCODING).
  • Lennart Regebro
    Lennart Regebro over 13 years
    Yet another proof that using stdout for process communication is a big mistake. I realize you may have no choice other than to use CGI in this case, though, so that's not your fault. :-)
  • theeggman85
    theeggman85 about 7 years
    Thumbs-upping mainly because PYTHONIOENCODING=utf-8 solved my problem, after many hours of searching.
  • Maggyero
    Maggyero about 6 years
    While it is true that sys.stdout is a binary file in Python 2 and a text file in Python 3, I think your Python 2 example fails because the unicode string u"ûnicöde" that gets implicitly encoded in the sys.stdout.write method has characters outside the ASCII range. If you change your LC_CTYPE, LANG or PYTHONIOENCODING environment variables to an encoding that has all the characters in the unicode string you should not get any error. (I have tried on Python 2.7.)
  • Dan
    Dan about 4 years
    What about if you are trying to maintain compatibility with Python 3.6?
  • sth
    sth about 4 years
    @Dan Then you can't use this
  • Dan
    Dan about 4 years
    I had inferred as much. Is there not then in your knowledge an alternative solution?
  • sth
    sth about 4 years
    @Dan Well on this very page there are a lot of other answers with alternative solutions. My answer is not the only answer on this question, there are other answers with other approaches from times before Python 3.7. Isn't that what you are looking for?
  • Dan
    Dan about 4 years
    I appreciate your responses. I did see several other answers on the page, but I considered your answer the most straightforward and elegant, and so I was wondering if you had a similarly straightforward/elegant alternative for python < 3.6. Thanks for your time.
  • Marc B. Hankin
    Marc B. Hankin over 2 years
    I'm running Anaconda Python 3.8, and the statement "sys.stdout.reconfigure(encoding='utf-8')" generates an exception: "AttributeError: 'OutStream' object has no attribute 'reconfigure'" What am I missing?
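
The AttributeError in the last comment is what happens when sys.stdout has been replaced by an object that is not an io.TextIOWrapper. The name 'OutStream' suggests an IPython or Jupyter kernel stream, but that is an assumption here. A defensive sketch that only calls reconfigure() when the stream provides it (the helper name set_encoding_if_possible is hypothetical):

```python
import io


def set_encoding_if_possible(stream, encoding="utf-8"):
    """Reconfigure the stream's encoding if supported; return True on success."""
    reconfigure = getattr(stream, "reconfigure", None)
    if callable(reconfigure):
        reconfigure(encoding=encoding)
        return True
    # Replacement streams without reconfigure() are left untouched.
    return False


if __name__ == "__main__":
    wrapper = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
    print(set_encoding_if_possible(wrapper), wrapper.encoding)
```

In a script this would be called as set_encoding_if_possible(sys.stdout), with a fallback to one of the other solutions when it returns False.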
