Python 3: How to specify stdin encoding
Python 3 does not expect ASCII from sys.stdin. It'll open stdin in text mode and make an educated guess as to what encoding is used. That guess may come down to ASCII, but that is not a given. See the sys.stdin documentation on how the codec is selected.
Like other file objects opened in text mode, the sys.stdin object derives from the io.TextIOBase base class; it has a .buffer attribute pointing to the underlying buffered IO instance (which in turn has a .raw attribute).
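The layering described above can be inspected on any file opened in text mode. The following is a minimal sketch that uses the read end of a pipe as a stand-in for sys.stdin (an assumption made so the example does not depend on how the interpreter was started); the variable names are illustrative only.

```python
import io
import os

# Create a pipe and open its read end in text mode, as a stand-in
# for the file descriptor behind sys.stdin.
read_fd, write_fd = os.pipe()
os.close(write_fd)

text_stream = open(read_fd, "r", encoding="utf-8")

# Text layer -> buffered binary layer -> raw, unbuffered layer.
is_text_base = isinstance(text_stream, io.TextIOBase)
layers = (
    type(text_stream).__name__,
    type(text_stream.buffer).__name__,
    type(text_stream.buffer.raw).__name__,
)
text_stream.close()
```

On a regular console session, sys.stdin, sys.stdin.buffer and sys.stdin.buffer.raw should form the same kind of stack, although IDEs and other embedding environments may substitute their own objects.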
Wrap the sys.stdin.buffer attribute in a new io.TextIOWrapper() instance to specify a different encoding:
import io
import sys
input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
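To try the wrapping technique without piping real input, an io.BytesIO object can stand in for sys.stdin.buffer. This is a sketch under that assumption; decode_lines and fake_stdin_buffer are names made up for the illustration.

```python
import io

def decode_lines(binary_stream, encoding="utf-8"):
    """Wrap a binary stream in a text layer and return its lines."""
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding)
    return [line.rstrip("\n") for line in text_stream]

# Stand-in for sys.stdin.buffer: UTF-8 bytes with non-ASCII characters.
fake_stdin_buffer = io.BytesIO("Hyvää päivää\nGrüße\n".encode("utf-8"))
lines = decode_lines(fake_stdin_buffer)
```

In a real script, passing sys.stdin.buffer instead of the BytesIO object should give the same behaviour for piped input.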
Alternatively, set the PYTHONIOENCODING environment variable to the desired codec when running python.
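One way to check the effect of PYTHONIOENCODING is to start a child interpreter with the variable set and ask it which codec its sys.stdin uses. The helper below is a hypothetical sketch, not part of the original answer; it assumes Python 3.7 or newer for capture_output.

```python
import codecs
import os
import subprocess
import sys

def child_stdin_encoding(codec):
    """Run a child interpreter with PYTHONIOENCODING set and return
    the normalised name of the codec its sys.stdin reports."""
    env = dict(os.environ, PYTHONIOENCODING=codec)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdin.encoding)"],
        input="",
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # codecs.lookup() maps aliases such as "latin-1" to one canonical name.
    return codecs.lookup(result.stdout.strip()).name

reported = child_stdin_encoding("latin-1")
```

From a POSIX shell, the equivalent would be along the lines of PYTHONIOENCODING=utf-8 python script.py, where script.py is a placeholder for your own program.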
From Python 3.7 onwards, you can also reconfigure the existing std* wrappers, provided you do it at the start (before any data has been read):
# Python 3.7 and newer
sys.stdin.reconfigure(encoding='utf-8')
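The same call works on any io.TextIOWrapper, which makes it easy to test. In this sketch a wrapper deliberately created with the ASCII codec stands in for a misconfigured sys.stdin (an assumption for illustration) and is switched to UTF-8 before the first read.

```python
import io

# Stand-in for sys.stdin: a text wrapper created with the wrong codec
# (ASCII) over UTF-8 encoded bytes.
stream = io.TextIOWrapper(
    io.BytesIO("naïve café\n".encode("utf-8")),
    encoding="ascii",
)

# Switch the codec before anything has been read (Python 3.7+).
stream.reconfigure(encoding="utf-8")
first_line = stream.readline()
```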
Seppo Enarvi
Updated on July 09, 2022
Comments
-
Seppo Enarvi almost 2 years
While porting code from Python 2 to Python 3, I ran into this problem when reading UTF-8 text from standard input. In Python 2, this works fine:
for line in sys.stdin: ...
But Python 3 expects ASCII from sys.stdin, and if there are non-ASCII characters in the input, I get the error:
UnicodeDecodeError: 'ascii' codec can't decode byte .. in position ..: ordinal not in range(128)
For a regular file, I would specify the encoding when opening the file:
with open('filename', 'r', encoding='utf-8') as file:
    for line in file:
        ...
But how can I specify the encoding for standard input? Other SO posts (e.g. How to change the stdin encoding on python) have suggested using
input_stream = codecs.getreader('utf-8')(sys.stdin)
for line in input_stream:
    ...
However, this doesn't work in Python 3. I still get the same error message. I'm using Ubuntu 12.04.2 and my locale is set to en_US.UTF-8.
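The codecs.getreader() call in the question wraps sys.stdin, which in Python 3 is already a text stream that decodes with its own codec first; that is presumably why the same error comes back. A stream reader expects bytes, so handing it a binary stream should work. Below is a minimal sketch with io.BytesIO standing in for sys.stdin.buffer (an assumption for the sake of a self-contained example).

```python
import codecs
import io

# Stand-in for sys.stdin.buffer.
fake_stdin_buffer = io.BytesIO("äiti\nöljy\n".encode("utf-8"))

# The stream reader decodes bytes, so it must wrap a binary stream.
reader = codecs.getreader("utf-8")(fake_stdin_buffer)
lines = [line for line in reader]
```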
-
bukzor over 10 years
What's the nearest equivalent for python2.6?
-
Martijn Pieters over 10 years
@bukzor: Next option: open the file descriptor directly with io.open(); 0 is stdin: io.open(0) returns a TextIOWrapper() object.
-
bukzor over 10 years
@MartijnPieters: That works pretty great! Thanks! Whole script: paste.pound-python.org/show/xoUPpsfFhtKssXBzLxBd Deleting my previous failures.
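The io.open() suggestion above relies on the fact that open() accepts an integer file descriptor. A sketch of the idea, using the read end of a pipe instead of descriptor 0 so that it runs anywhere (that substitution is an assumption made for the example):

```python
import io
import os

# Prepare a file descriptor that has UTF-8 bytes waiting on it.
read_fd, write_fd = os.pipe()
os.write(write_fd, "smörgåsbord\n".encode("utf-8"))
os.close(write_fd)

# Opening a descriptor in text mode yields a TextIOWrapper.
with io.open(read_fd, encoding="utf-8") as stream:
    stream_type = type(stream).__name__
    lines = stream.readlines()
```

For standard input the call would be io.open(0, encoding='utf-8'). Closing that object also closes descriptor 0 unless closefd=False is passed.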
-
jfs almost 8 years
You could call sys.stdin.detach() instead of sys.stdin.buffer. Though a preferable solution is to leave the source code alone and to configure the environment instead (locale, PYTHONIOENCODING).
-
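The difference between .buffer and detach() is that detach() separates the binary layer from the old wrapper, which becomes unusable afterwards. A small sketch, again with an in-memory stream standing in for sys.stdin (an assumption for illustration):

```python
import io

# Stand-in for sys.stdin, created with the wrong codec.
old_wrapper = io.TextIOWrapper(
    io.BytesIO("señor\n".encode("utf-8")),
    encoding="ascii",
)

# detach() returns the binary layer and disconnects the old wrapper.
binary_layer = old_wrapper.detach()
new_wrapper = io.TextIOWrapper(binary_layer, encoding="utf-8")
line = new_wrapper.readline()

# The detached wrapper can no longer be read from.
try:
    old_wrapper.readline()
    old_wrapper_usable = True
except ValueError:
    old_wrapper_usable = False
```
-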
Irshad Bhat almost 8 years
If I use io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8') to read from stdin in python 2.7, it says AttributeError: 'file' object has no attribute 'buffer'. How can one make reading from stdin compatible with both python 2 and 3?
-
Martijn Pieters almost 8 years
@IrshadBhat: Did you see Wrap an open stream with io.TextIOWrapper?
-
alvas over 6 years
Is there a way to use sys.stdin.buffer in Python2 too?
-
Martijn Pieters over 6 years
@alvas: to read binary? See Reading binary data from stdin
-
alvas over 6 years
I'm reading text but would like the code to support Python2 and Python3 without doing if sys version =( Specifically stackoverflow.com/questions/47425695/…
-
CMCDragonkai almost 6 years
In order to only change the encoding, one should preserve the other stream parameters:
io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8', errors=sys.stdin.errors, newline=sys.stdin.newlines, line_buffering=sys.stdin.line_buffering)
However I haven't found a way to acquire the write_through parameter.
-
Martijn Pieters almost 6 years
@CMCDragonkai: Python 3.7 adds a write_through attribute, and more importantly, lets you reconfigure the wrapper.
-
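With reconfigure(), settings that are not passed keep their current value, with one documented exception: if encoding is given and errors is not, errors falls back to 'strict'. The sketch below uses in-memory wrappers as stand-ins for sys.stdin (an assumption) and passes the existing error handler explicitly to keep it.

```python
import io

def make_stream():
    """Create a stand-in wrapper with non-default settings."""
    return io.TextIOWrapper(
        io.BytesIO(),
        encoding="ascii",
        errors="replace",
        line_buffering=True,
    )

# Changing only the encoding resets the error handler to 'strict'.
reset_stream = make_stream()
reset_stream.reconfigure(encoding="utf-8")
errors_after_plain_call = reset_stream.errors

# Passing the current handler along preserves it (Python 3.7+).
kept_stream = make_stream()
kept_stream.reconfigure(encoding="utf-8", errors=kept_stream.errors)
settings = (
    kept_stream.encoding,
    kept_stream.errors,
    kept_stream.line_buffering,
    kept_stream.write_through,
)
```
-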
David Epstein over 5 years
@Martijn Peters: What happens under your suggestions, if, say, one in 1000 bytes has been somehow corrupted, and causes a local deviation from utf-8?
-
Martijn Pieters over 5 years
@DavidEpstein: that's a rather hypothetical situation. The default error handler is strict, so an exception will be raised when you try to read data from stdin that is not proper UTF-8. Set the errors option to a different error handler to change that behaviour, but corrupted data is corrupted data.
-
David Epstein over 5 years
@Martin Pieters In articles about utf-8 it is pointed out that an advantage of utf-8 over many other encodings is that it is easy to recover from a rare error in a byte stream. So what you say sounds very useful in some circumstances (depending on the source of the input).
-
Martijn Pieters over 5 years
@DavidEpstein: yes, and setting an error handler other than strict will let you skip corrupted bytes until a new valid start byte for a sequence is found. None of which has much to do with this specific answer; it applies universally to Python's encoding handling.
-
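The error handlers discussed in the comments above can be compared on a deliberately damaged byte string. In this sketch the input contains a three-byte euro sign cut short after two bytes, followed by valid text; the byte string and the read_with helper are made up for the illustration.

```python
import io

# A 3-byte euro sign truncated after two bytes, then valid UTF-8.
damaged = b"a\xe2\x82b" + "€c\n".encode("utf-8")

def read_with(errors):
    """Decode the damaged bytes as UTF-8 using the given error handler."""
    stream = io.TextIOWrapper(
        io.BytesIO(damaged), encoding="utf-8", errors=errors
    )
    return stream.read()

# 'strict' (the default) raises on the first invalid sequence.
try:
    read_with("strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'ignore' drops the bad bytes; 'replace' marks them with U+FFFD.
skipped = read_with("ignore")
marked = read_with("replace")
```

In both lenient modes the decoder picks up again at the next valid byte, so the intact euro sign after the damage is still decoded.
-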
Suncatcher over 3 years
sys.stdin.reconfigure(encoding='utf-8') gives AttributeError: 'StdInputFile' object has no attribute 'reconfigure' in Python 3.9.0
-
Martijn Pieters over 3 years
@Suncatcher: StdInputFile is not a standard library type. The reconfigure() method only exists on the io.TextIOWrapper() class. It appears you are using Python in an IDE or other specialised environment. You'd have that issue in any Python version.
-
Suncatcher over 3 years
I import sys, put this line into a script, and call this .py file in an IDLE based on 3.9.0 like this: exec(open('C:\\script.py').read()), so I believe it is clean-green standard. I am too much of a newbie to use any additional modules or libraries on top of standard Python.
-
Martijn Pieters over 3 years
@Suncatcher: IDLE is the IDE here, and has replaced the standard sys.stdout object with a custom object. That class is part of the IDLE internal implementation, not a standard library class.
-
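Code that may run inside IDLE or a similar environment can check for the method before calling it. The helper below is a hypothetical sketch, not something the standard library provides; FakeIdeStdin is an invented stand-in for an IDE replacement object.

```python
import io

def with_utf8(stream):
    """Return a UTF-8 text stream for `stream`, or `stream` unchanged
    when it offers neither reconfigure() nor a .buffer to rewrap."""
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
        return stream
    if hasattr(stream, "buffer"):
        return io.TextIOWrapper(stream.buffer, encoding="utf-8")
    return stream

class FakeIdeStdin:
    """Hypothetical stand-in for an IDE's custom stdin object."""
    def readline(self):
        return ""

# An object without reconfigure() or .buffer is passed through as-is.
ide_stream = FakeIdeStdin()
same_object = with_utf8(ide_stream) is ide_stream

# A regular text wrapper is reconfigured in place.
real_stream = io.TextIOWrapper(io.BytesIO(b""), encoding="ascii")
reconfigured = with_utf8(real_stream).encoding
```

In a script this would be used as input_stream = with_utf8(sys.stdin), called before anything is read.
-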
Or b over 2 years
So when you reconfigure, does it apply only to the current file?