Python 3: How to specify stdin encoding

25,894

Python 3 does not expect ASCII from sys.stdin. It'll open stdin in text mode and make an educated guess as to what encoding is used. That guess may come down to ASCII, but that is not a given. See the sys.stdin documentation on how the codec is selected.

Like other file objects opened in text mode, the sys.stdin object derives from the io.TextIOBase base class; it has a .buffer attribute pointing to the underlying buffered IO instance (which in turn has a .raw attribute).

Wrap the sys.stdin.buffer attribute in a new io.TextIOWrapper() instance to specify a different encoding:

import io
import sys

input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
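
As a minimal sketch of the same wrapping technique, the snippet below uses an in-memory io.BytesIO object as a stand-in for sys.stdin.buffer. That stand-in is an assumption made so the example runs without piped input, and decode_lines is a hypothetical helper name, not part of the standard library.

```python
import io


def decode_lines(binary_stream, encoding="utf-8"):
    """Wrap a binary stream in a TextIOWrapper and return its lines without newlines."""
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding)
    return [line.rstrip("\n") for line in text_stream]


# Stand-in for sys.stdin.buffer (assumption: real input would arrive via a pipe).
fake_stdin = io.BytesIO("Seppo\nnaïve café\n".encode("utf-8"))
lines = decode_lines(fake_stdin)
```

In a real script you would pass sys.stdin.buffer in place of fake_stdin.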

Alternatively, set the PYTHONIOENCODING environment variable to the desired codec when running python.
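
One way to see the effect of PYTHONIOENCODING is to launch a child interpreter with the variable set and feed it raw bytes. The sketch below assumes sys.executable points at a usable Python 3 interpreter; run_child and CHILD_CODE are hypothetical names used only for illustration.

```python
import os
import subprocess
import sys

# Child program: decode everything on stdin and print the resulting code points.
CHILD_CODE = "import sys; print([ord(ch) for ch in sys.stdin.read()])"


def run_child(raw_input, codec):
    """Run a child Python with PYTHONIOENCODING=codec and return what it printed."""
    env = dict(os.environ, PYTHONIOENCODING=codec)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=raw_input,          # raw bytes written to the child's stdin
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


# The same character, encoded two different ways, decodes correctly
# as long as the environment variable names the matching codec.
utf8_result = run_child("é".encode("utf-8"), "utf-8")
latin1_result = run_child("é".encode("latin-1"), "latin-1")
```

From a shell, the equivalent is to prefix the command with the variable, for example PYTHONIOENCODING=utf-8 python script.py (script.py being a placeholder name).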

From Python 3.7 onwards, you can also reconfigure the existing std* wrappers, provided you do it at the start (before any data has been read):

# Python 3.7 and newer
sys.stdin.reconfigure(encoding='utf-8')
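
A small sketch of reconfigure() in use, assuming Python 3.7 or newer. An io.TextIOWrapper over io.BytesIO stands in for sys.stdin here, and force_encoding is a hypothetical helper. It also guards against replacement stdin objects that lack the method, as some IDEs install.

```python
import io


def force_encoding(stream, encoding="utf-8"):
    """Reconfigure the stream in place if it supports it; return True on success."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # Not an io.TextIOWrapper (for example, an IDE's replacement object).
        return False
    reconfigure(encoding=encoding)
    return True


# Stand-in for a sys.stdin that was opened with the wrong codec.
fake_stdin = io.TextIOWrapper(
    io.BytesIO("naïve\n".encode("utf-8")), encoding="ascii"
)
changed = force_encoding(fake_stdin)  # must happen before any data is read
text = fake_stdin.read()

# io.StringIO has no reconfigure() method, so the helper reports False.
unsupported = force_encoding(io.StringIO("plain"))
```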
Author: Seppo Enarvi

Updated on July 09, 2022

Comments

  • Seppo Enarvi
    Seppo Enarvi almost 2 years

    While porting code from Python 2 to Python 3, I ran into this problem when reading UTF-8 text from standard input. In Python 2, this works fine:

    for line in sys.stdin:
        ...
    

    But Python 3 expects ASCII from sys.stdin, and if there are non-ASCII characters in the input, I get the error:

    UnicodeDecodeError: 'ascii' codec can't decode byte .. in position ..: ordinal not in range(128)

    For a regular file, I would specify the encoding when opening the file:

    with open('filename', 'r', encoding='utf-8') as file:
        for line in file:
            ...
    

    But how can I specify the encoding for standard input? Other SO posts (e.g. How to change the stdin encoding on python) have suggested using

    input_stream = codecs.getreader('utf-8')(sys.stdin)
    for line in input_stream:
        ...
    

    However, this doesn't work in Python 3. I still get the same error message. I'm using Ubuntu 12.04.2 and my locale is set to en_US.UTF-8.

  • bukzor
    bukzor over 10 years
    What's the nearest equivalent for python2.6?
  • Martijn Pieters
    Martijn Pieters over 10 years
    @bukzor: Next option: open the file descriptor directly with io.open(); 0 is stdin: io.open(0) returns a TextIOWrapper() object.
  • bukzor
    bukzor over 10 years
    @MartijnPieters: That works pretty great! Thanks! Whole script: paste.pound-python.org/show/xoUPpsfFhtKssXBzLxBd Deleting my previous failures.
  • jfs
    jfs almost 8 years
    You could call sys.stdin.detach() instead of using sys.stdin.buffer. A preferable solution, though, is to leave the source code alone and configure the environment instead (locale, PYTHONIOENCODING).
  • Irshad Bhat
    Irshad Bhat almost 8 years
    If I use io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8') to read from stdin in Python 2.7, it says AttributeError: 'file' object has no attribute 'buffer'. How can one make reading from stdin compatible with both Python 2 and 3?
  • Martijn Pieters
    Martijn Pieters almost 8 years
    @IrshadBhat: Did you see Wrap an open stream with io.TextIOWrapper?
  • alvas
    alvas over 6 years
    Is there a way to use sys.stdin.buffer in Python2 too?
  • Martijn Pieters
    Martijn Pieters over 6 years
    @alvas: to read binary? See Reading binary data from stdin
  • alvas
    alvas over 6 years
    I'm reading text but would like the code to support Python2 and Python3 without doing if sys version =( Specifically stackoverflow.com/questions/47425695/…
  • CMCDragonkai
    CMCDragonkai almost 6 years
    In order to change only the encoding, one should preserve the other stream parameters: io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8', errors=sys.stdin.errors, newline=sys.stdin.newlines, line_buffering=sys.stdin.line_buffering). However, I haven't found a way to acquire the write_through parameter.
  • Martijn Pieters
    Martijn Pieters almost 6 years
    @CMCDragonkai: Python 3.7 adds a write_through attribute, and more importantly, lets you reconfigure the wrapper.
  • David Epstein
    David Epstein over 5 years
    @Martijn Pieters: What happens under your suggestions if, say, one in 1000 bytes has somehow been corrupted and causes a local deviation from UTF-8?
  • Martijn Pieters
    Martijn Pieters over 5 years
    @DavidEpstein: that's a rather hypothetical situation. The default error handler is strict, so an exception will be raised when you try to read data from stdin that is not proper UTF-8. Set the errors option to a different error handler to change that behaviour, but corrupted data is corrupted data.
  • David Epstein
    David Epstein over 5 years
    @Martijn Pieters: Articles about UTF-8 point out that one of its advantages over many other encodings is that it is easy to recover from a rare error in a byte stream. So what you say sounds very useful in some circumstances (depending on the source of the input).
  • Martijn Pieters
    Martijn Pieters over 5 years
    @DavidEpstein: yes, and setting an error handler other than strict will let you skip corrupted bytes until a new valid start byte for a sequence is found. None of that is specific to this answer, though; it applies to Python's encoding handling in general.
  • Suncatcher
    Suncatcher over 3 years
    sys.stdin.reconfigure(encoding='utf-8') gives AttributeError: 'StdInputFile' object has no attribute 'reconfigure' in Python 3.9.0
  • Martijn Pieters
    Martijn Pieters over 3 years
    @Suncatcher: StdInputFile is not a standard library type. The reconfigure() method only exists on the io.TextIOWrapper() class. It appears you are using Python in an IDE or other specialised environment. You'd have that issue in any Python version.
  • Suncatcher
    Suncatcher over 3 years
    I import sys, put this line into a script, and run the .py file in IDLE (based on 3.9.0) like this: exec(open('C:\\script.py').read()). So I believe it is a clean, standard setup; I am too much of a newbie to use any additional modules or libraries on top of standard Python.
  • Martijn Pieters
    Martijn Pieters over 3 years
    @Suncatcher: IDLE is the IDE here, and has replaced the standard sys.stdin object with a custom object. That class is part of the IDLE internal implementation, not a standard library class.
  • Or b
    Or b over 2 years
    So when you reconfigure, does it apply only to the current file?
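
The error-handler discussion in the comments above can be sketched as follows. The byte string is made-up sample data with one deliberately invalid byte, io.BytesIO again stands in for sys.stdin.buffer, and read_with_handler is a hypothetical helper.

```python
import io

# UTF-8 text with one corrupted byte: 0xff can never appear in valid UTF-8.
corrupted = b"caf\xc3\xa9 \xff ok\n"


def read_with_handler(data, errors):
    """Decode bytes as UTF-8 through a TextIOWrapper using the given error handler."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", errors=errors)
    return stream.read()


replaced = read_with_handler(corrupted, "replace")  # bad byte becomes U+FFFD
ignored = read_with_handler(corrupted, "ignore")    # bad byte is dropped

try:
    read_with_handler(corrupted, "strict")          # the default behaviour
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

With real input, the same errors argument can be passed when wrapping sys.stdin.buffer, or to sys.stdin.reconfigure() on Python 3.7 and newer.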