Wrap an open stream with io.TextIOWrapper
Solution 1
Based on multiple suggestions in various forums, and on experimenting with the standard library to meet the criteria, my current conclusion is that this can't be done with the library and types as we currently have them.
Solution 2
Use codecs.getreader to produce a wrapper object:
text_stream = codecs.getreader("utf-8")(bytes_stream)
Works on Python 2 and Python 3.
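A minimal sketch of this on an in-memory test double (variable names illustrative); note that the question below points out the resulting StreamReader does not provide the full io.TextIOWrapper API:

```python
import codecs
import io

# A test double: in-memory bytes, no file descriptor involved.
bytes_stream = io.BytesIO("Lorem ipsum".encode("utf-8"))

# codecs.getreader("utf-8") returns a StreamReader class for that codec;
# instantiating it wraps the byte stream without re-opening anything.
text_stream = codecs.getreader("utf-8")(bytes_stream)

print(text_stream.read())  # Lorem ipsum
```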
Solution 3
It turns out you just need to wrap your io.BytesIO in io.BufferedReader, which exists on both Python 2 and Python 3.
import io
reader = io.BufferedReader(io.BytesIO("Lorem ipsum".encode("utf-8")))
wrapper = io.TextIOWrapper(reader)
wrapper.read() # returns Lorem ipsum
This answer originally suggested using os.pipe, but the read-side of the pipe would have to be wrapped in io.BufferedReader on Python 2 anyway to work, so this solution is simpler and avoids allocating a pipe.
Solution 4
Okay, this seems to be a complete solution, for all cases mentioned in the question, tested with Python 2.7 and Python 3.5. The general solution ended up being re-opening the file descriptor, but instead of io.BytesIO you need to use a pipe for your test double so that you have a file descriptor.
import io
import subprocess
import os
# Example function, re-opens a file descriptor for UTF-8 decoding,
# reads until EOF and prints what is read.
def read_as_utf8(fileno):
    fp = io.open(fileno, mode="r", encoding="utf-8", closefd=False)
    print(fp.read())
    fp.close()
# Subprocess
gpg = subprocess.Popen(["gpg", "--version"], stdout=subprocess.PIPE)
read_as_utf8(gpg.stdout.fileno())
# Normal file (contains "Lorem ipsum." as UTF-8 bytes)
normal_file = open("loremipsum.txt", "rb")
read_as_utf8(normal_file.fileno()) # prints "Lorem ipsum."
# Pipe (for test harness - write whatever you want into the pipe)
pipe_r, pipe_w = os.pipe()
os.write(pipe_w, "Lorem ipsum.".encode("utf-8"))
os.close(pipe_w)
read_as_utf8(pipe_r) # prints "Lorem ipsum."
os.close(pipe_r)
Solution 5
I needed this as well, but based on the thread here, I determined that it was not possible using just Python 2's io module. While this breaks your "Special treatment for file" rule, the technique I went with was to create an extremely thin wrapper for file (code below) that could then be wrapped in an io.BufferedReader, which can in turn be passed to the io.TextIOWrapper constructor. It will be a pain to unit test, as obviously the new code path can't be tested on Python 3.
Incidentally, the reason the result of an open() can be passed directly to io.TextIOWrapper in Python 3 is that a binary-mode open() actually returns an io.BufferedReader instance to begin with (at least on Python 3.4, which is where I was testing at the time).
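That claim is easy to verify on Python 3 with a scratch file (a sketch; the temporary file and its contents are illustrative):

```python
import io
import os
import tempfile

# Write some bytes to a real temporary file, then re-open it in
# binary mode; on Python 3 the object open() returns is an
# io.BufferedReader, exactly what io.TextIOWrapper wants as a buffer.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write("Lorem ipsum".encode("utf-8"))
    name = f.name

binary = open(name, "rb")
print(isinstance(binary, io.BufferedReader))   # True

text = io.TextIOWrapper(binary, encoding="utf-8")
print(text.read())                             # Lorem ipsum
text.close()
os.remove(name)
```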
import io
import six # for six.PY2
if six.PY2:
    class _ReadableWrapper(object):
        def __init__(self, raw):
            self._raw = raw

        def readable(self):
            return True

        def writable(self):
            return False

        def seekable(self):
            return True

        def __getattr__(self, name):
            return getattr(self._raw, name)

def wrap_text(stream, *args, **kwargs):
    # Note: order important here, as 'file' doesn't exist in Python 3
    if six.PY2 and isinstance(stream, file):
        stream = io.BufferedReader(_ReadableWrapper(stream))
    return io.TextIOWrapper(stream, *args, **kwargs)
At least this is small, so hopefully it minimizes the exposure for parts that cannot easily be unit tested.
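As a quick check of the approach on a test double, here is a self-contained sketch of the same technique using sys.version_info in place of six (in case six isn't installed); on Python 3 the file branch short-circuits and the stream goes straight to io.TextIOWrapper:

```python
import io
import sys

class _ReadableWrapper(object):
    """Minimal shim giving a Python 2 ``file`` the methods that
    io.BufferedReader probes for; never instantiated on Python 3."""
    def __init__(self, raw):
        self._raw = raw

    def readable(self):
        return True

    def writable(self):
        return False

    def seekable(self):
        return True

    def __getattr__(self, name):
        return getattr(self._raw, name)

def wrap_text(stream):
    # 'file' only exists on Python 2; the version check short-circuits
    # first, so this line is safe to evaluate on Python 3.
    if sys.version_info[0] == 2 and isinstance(stream, file):  # noqa: F821
        stream = io.BufferedReader(_ReadableWrapper(stream))
    return io.TextIOWrapper(stream)

wrapper = wrap_text(io.BytesIO("Lorem ipsum".encode("utf-8")))
print(isinstance(wrapper, io.TextIOWrapper))  # True
print(wrapper.read())                         # Lorem ipsum
```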
Comments
- bignose almost 2 years: How can I wrap an open binary stream – a Python 2 file, a Python 3 io.BufferedReader, an io.BytesIO – in an io.TextIOWrapper? I'm trying to write code that will work unchanged:
- Running on Python 2.
- Running on Python 3.
- With binary streams generated from the standard library (i.e. I can't control what type they are)
- With binary streams made to be test doubles (i.e. no file handle, can't re-open).
- Producing an io.TextIOWrapper that wraps the specified stream.
The io.TextIOWrapper is needed because its API is expected by other parts of the standard library. Other file-like types exist, but don't provide the right API.

Example
Wrapping the binary stream presented as the subprocess.Popen.stdout attribute:

import subprocess
import io

gnupg_subprocess = subprocess.Popen(
    ["gpg", "--version"], stdout=subprocess.PIPE)
gnupg_stdout = io.TextIOWrapper(
    gnupg_subprocess.stdout, encoding="utf-8")
In unit tests, the stream is replaced with an io.BytesIO instance to control its content without touching any subprocesses or filesystems.

gnupg_subprocess.stdout = io.BytesIO("Lorem ipsum".encode("utf-8"))
That works fine on the streams created by Python 3's standard library. The same code, though, fails on streams generated by Python 2:
[Python 2]
>>> type(gnupg_subprocess.stdout)
<type 'file'>
>>> gnupg_stdout = io.TextIOWrapper(gnupg_subprocess.stdout, encoding="utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'file' object has no attribute 'readable'
Not a solution: Special treatment for file

An obvious response is to have a branch in the code which tests whether the stream actually is a Python 2 file object, and handle that differently from io.* objects.

That's not an option for well-tested code, because it makes a branch that unit tests – which, in order to run as fast as possible, must not create any real filesystem objects – can't exercise.
The unit tests will be providing test doubles, not real file objects. So creating a branch which won't be exercised by those test doubles is defeating the test suite.

Not a solution: io.open

Some respondents suggest re-opening (e.g. with io.open) the underlying file handle:

gnupg_stdout = io.open(
    gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
That works on both Python 3 and Python 2:
[Python 3]
>>> type(gnupg_subprocess.stdout)
<class '_io.BufferedReader'>
>>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
>>> type(gnupg_stdout)
<class '_io.TextIOWrapper'>
[Python 2]
>>> type(gnupg_subprocess.stdout)
<type 'file'>
>>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
>>> type(gnupg_stdout)
<type '_io.TextIOWrapper'>
But of course it relies on re-opening a real file from its file handle. So it fails in unit tests when the test double is an io.BytesIO instance:

>>> gnupg_subprocess.stdout = io.BytesIO("Lorem ipsum".encode("utf-8"))
>>> type(gnupg_subprocess.stdout)
<type '_io.BytesIO'>
>>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: fileno
Not a solution: codecs.getreader

The standard library also has the codecs module, which provides wrapper features:

import codecs
gnupg_stdout = codecs.getreader("utf-8")(gnupg_subprocess.stdout)
That's good because it doesn't attempt to re-open the stream. But it fails to provide the io.TextIOWrapper API. Specifically, it doesn't inherit io.IOBase and doesn't have the encoding attribute:

>>> type(gnupg_subprocess.stdout)
<type 'file'>
>>> gnupg_stdout = codecs.getreader("utf-8")(gnupg_subprocess.stdout)
>>> type(gnupg_stdout)
<type 'instance'>
>>> isinstance(gnupg_stdout, io.IOBase)
False
>>> gnupg_stdout.encoding
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/codecs.py", line 643, in __getattr__
    return getattr(self.stream, name)
AttributeError: '_io.BytesIO' object has no attribute 'encoding'
So codecs doesn't provide objects which substitute for io.TextIOWrapper.

What to do?

So how can I write code that works for both Python 2 and Python 3, with both the test doubles and the real objects, which wraps an io.TextIOWrapper around the already-open byte stream?

- Dima Tisnek over 8 years: re: io.open, you could change the unit tests, you know, e.g. to a tempfile.TemporaryFile(); that's a hammer of a solution, of course...
- Martijn Pieters over 8 years: This is a rather too limited set of restrictions. Unit tests can open files if that is absolutely the only way to properly test something, for example. So a wrapper function that special-cases file objects to grab the file descriptor can be tested with a unit test just fine.
- bignose over 8 years: Thanks for the suggestion. That object doesn't provide enough of the io.TextIOWrapper API though, so isn't a solution.
- jbg over 8 years: Ah, too bad. I guess you could put your test data in a file… :/
- bignose over 8 years: Addressed in the question already: this needs to work also with test doubles that are not real files.
- bignose over 8 years: A Python 2 file object (as created by many standard library functions) does not work when passed to the io.BufferedReader constructor: AttributeError: 'file' object has no attribute 'readable'.
. -
jbg over 8 yearsRight, I read a few more branches of the question and see what you're getting at now. As you've determined in your own answer, I don't think you can do this for Py2 and Py3 without some tests of the type of object and branching.
- bignose over 8 years: Already addressed in the question: the test doubles are not real files. io.open won't work because the test doubles can't be re-opened by path nor file handle.
- jbg over 8 years: As stated in the answer, I'm addressing that by using pipes instead of BytesIO for the test doubles… or is there some reason you're constrained to use BytesIO? It occurs to me that the very fact that BytesIO (on Python 2) isn't enough "like" the objects you use in your real code is a good reason not to use it as a test double…
- bignose over 8 years: The whole unit test suite is using io.StringIO and io.BytesIO for test doubles of a great many file operations. I'm ruling out "make a special set of test doubles just for this case" as a solution; I'm looking for a solution that works with the normal fake files (those that inherit from io.IOBase) and the normal real files of both Python versions.
- jbg over 8 years: You could use the pipe code path for all situations, then. Write the contents of your file, file pointer, BytesIO, etc. to the pipe, and attach your reader to the read side, which will always be a file object. It might be the only solution that works the right way, for all fake and real files, on both Py2 and Py3, with only one code path.
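That suggestion could be sketched like this (illustrative only; pump_through_pipe is a made-up helper name, and a production version would need a writer thread or size check, since an unread pipe's buffer is limited and a large payload would block the writer):

```python
import io
import os

def pump_through_pipe(data):
    """Write ``data`` (bytes) into a fresh pipe and return the read-side
    file descriptor, so the reader always sees a real fd regardless of
    where the bytes originally came from (file, BytesIO, ...)."""
    pipe_r, pipe_w = os.pipe()
    os.write(pipe_w, data)
    os.close(pipe_w)  # closing the write side gives the reader an EOF
    return pipe_r

fd = pump_through_pipe(io.BytesIO(b"Lorem ipsum").getvalue())
reader = io.open(fd, mode="r", encoding="utf-8")
print(reader.read())  # Lorem ipsum
reader.close()
```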
- bignose over 8 years: Eventually I used a custom solution based on this. It doesn't address the requirements fully, so is not a solution; but I'm awarding the bounty as thanks for the help.
- killthrush almost 7 years: Worked like a charm for me. Used this technique in concert with the csv package and boto to stream CSV files from S3.
- Ben over 5 years: Given how ill-advised wrapping the GnuPG binary in subprocess and similar calls is in the first place, that's probably a good thing. Especially in something allegedly meant to be stable, production code. Now granted, the GPGME bindings hadn't been merged with GPGME's master branch when you asked this question originally, but they have now and you've still got my email B1, so if this is a thing and the focus is actually GPG rather than data streams in general, it's time to get in touch. Regards, B2. ;)