Wrap an open stream with io.TextIOWrapper

Solution 1

Based on suggestions in multiple forums, and on experimenting with the standard library to meet the criteria, my current conclusion is that this can't be done with the library and types as they currently stand.

Solution 2

Use codecs.getreader to produce a wrapper object:

text_stream = codecs.getreader("utf-8")(bytes_stream)

Works on Python 2 and Python 3.
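For example, with an io.BytesIO test double (a sketch; output shown for Python 3, where read() returns str):

```python
import codecs
import io

# codecs.getreader returns a StreamReader class for the codec;
# instantiating it with the byte stream yields a decoding wrapper.
bytes_stream = io.BytesIO("Lorem ipsum".encode("utf-8"))
text_stream = codecs.getreader("utf-8")(bytes_stream)
print(text_stream.read())  # Lorem ipsum
```

As the question notes, though, the result is not a full io.TextIOWrapper substitute: it doesn't inherit from io.IOBase and has no encoding attribute.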

Solution 3

It turns out you just need to wrap your io.BytesIO in io.BufferedReader which exists on both Python 2 and Python 3.

import io

reader = io.BufferedReader(io.BytesIO("Lorem ipsum".encode("utf-8")))
wrapper = io.TextIOWrapper(reader, encoding="utf-8")
wrapper.read()  # returns "Lorem ipsum" (unicode on Python 2)

This answer originally suggested using os.pipe, but the read-side of the pipe would have to be wrapped in io.BufferedReader on Python 2 anyway to work, so this solution is simpler and avoids allocating a pipe.
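The resulting wrapper also satisfies the pieces of the io.TextIOWrapper API that the question calls out (it inherits from io.IOBase and exposes encoding); a quick check, assuming the encoding is passed explicitly:

```python
import io

# Build the wrapper as above, passing the encoding explicitly so the
# result doesn't depend on the locale default.
reader = io.BufferedReader(io.BytesIO("Lorem ipsum".encode("utf-8")))
wrapper = io.TextIOWrapper(reader, encoding="utf-8")

print(isinstance(wrapper, io.IOBase))  # True
print(wrapper.encoding)                # utf-8
print(wrapper.read())                  # Lorem ipsum
```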

Solution 4

Okay, this seems to be a complete solution for all cases mentioned in the question, tested with Python 2.7 and Python 3.5. The general solution ended up being to re-open the file descriptor; but instead of io.BytesIO, you need to use a pipe for your test double so that a file descriptor exists.

import io
import subprocess
import os

# Example function, re-opens a file descriptor for UTF-8 decoding,
# reads until EOF and prints what is read.
def read_as_utf8(fileno):
    fp = io.open(fileno, mode="r", encoding="utf-8", closefd=False)
    print(fp.read())
    fp.close()

# Subprocess
gpg = subprocess.Popen(["gpg", "--version"], stdout=subprocess.PIPE)
read_as_utf8(gpg.stdout.fileno())

# Normal file (contains "Lorem ipsum." as UTF-8 bytes)
normal_file = open("loremipsum.txt", "rb")
read_as_utf8(normal_file.fileno())  # prints "Lorem ipsum."

# Pipe (for test harness - write whatever you want into the pipe)
pipe_r, pipe_w = os.pipe()
os.write(pipe_w, "Lorem ipsum.".encode("utf-8"))
os.close(pipe_w)
read_as_utf8(pipe_r)  # prints "Lorem ipsum."
os.close(pipe_r)
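The pipe trick also extracts nicely into a test helper; `bytes_as_fd` below is a hypothetical name, sketching how a test suite could turn arbitrary bytes into a real file descriptor for `read_as_utf8` to consume:

```python
import io
import os

def bytes_as_fd(data):
    # Hypothetical test helper: write the bytes into a pipe and
    # return the read end, which is a real file descriptor.
    pipe_r, pipe_w = os.pipe()
    os.write(pipe_w, data)
    os.close(pipe_w)
    return pipe_r

fd = bytes_as_fd("Lorem ipsum.".encode("utf-8"))
fp = io.open(fd, mode="r", encoding="utf-8")
print(fp.read())  # Lorem ipsum.
fp.close()
```

Note that a pipe's buffer is limited, so writing everything before reading only works for small test payloads.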

Solution 5

I needed this as well, but based on the thread here I determined that it was not possible using Python 2's io module alone. While this breaks your "special treatment for file" rule, the technique I went with was to create an extremely thin wrapper around file (code below), which can then be wrapped in an io.BufferedReader and in turn passed to the io.TextIOWrapper constructor. It will be a pain to unit test, because obviously the new code path cannot be exercised on Python 3.

Incidentally, the reason the results of an open() can be passed directly to io.TextIOWrapper in Python 3 is because a binary-mode open() actually returns an io.BufferedReader instance to begin with (at least on Python 3.4, which is where I was testing at the time).

import io
import six  # for six.PY2

if six.PY2:
    class _ReadableWrapper(object):
        def __init__(self, raw):
            self._raw = raw

        def readable(self):
            return True

        def writable(self):
            return False

        def seekable(self):
            return True

        def __getattr__(self, name):
            return getattr(self._raw, name)

def wrap_text(stream, *args, **kwargs):
    # Note: order is important here, since 'file' doesn't exist in Python 3
    if six.PY2 and isinstance(stream, file):
        stream = io.BufferedReader(_ReadableWrapper(stream))

    # Pass any extra arguments (e.g. encoding) through to the wrapper
    return io.TextIOWrapper(stream, *args, **kwargs)

At least this is small, so hopefully it minimizes the exposure for parts that cannot easily be unit tested.
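For what it's worth, on Python 3 the file branch never triggers (there is no file builtin), so wrap_text reduces to a plain io.TextIOWrapper call; a minimal self-contained sketch of that path:

```python
import io

def wrap_text(stream, *args, **kwargs):
    # Python 3 path only: the binary stream already supports the
    # readable()/writable()/seekable() protocol io.TextIOWrapper needs.
    return io.TextIOWrapper(stream, *args, **kwargs)

wrapper = wrap_text(io.BytesIO("Lorem ipsum".encode("utf-8")), encoding="utf-8")
print(wrapper.read())    # Lorem ipsum
print(wrapper.encoding)  # utf-8
```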

Author by bignose

Updated on July 09, 2022

Comments

  • bignose
    bignose almost 2 years

    How can I wrap an open binary stream – a Python 2 file, a Python 3 io.BufferedReader, an io.BytesIO – in an io.TextIOWrapper?

    I'm trying to write code that will work unchanged:

    • Running on Python 2.
    • Running on Python 3.
    • With binary streams generated from the standard library (i.e. I can't control what type they are)
    • With binary streams made to be test doubles (i.e. no file handle, can't re-open).
    • Producing an io.TextIOWrapper that wraps the specified stream.

    The io.TextIOWrapper is needed because its API is expected by other parts of the standard library. Other file-like types exist, but don't provide the right API.

    Example

    Wrapping the binary stream presented as the subprocess.Popen.stdout attribute:

    import subprocess
    import io
    
    gnupg_subprocess = subprocess.Popen(
            ["gpg", "--version"], stdout=subprocess.PIPE)
    gnupg_stdout = io.TextIOWrapper(gnupg_subprocess.stdout, encoding="utf-8")
    

    In unit tests, the stream is replaced with an io.BytesIO instance to control its content without touching any subprocesses or filesystems.

    gnupg_subprocess.stdout = io.BytesIO("Lorem ipsum".encode("utf-8"))
    

    That works fine on the streams created by Python 3's standard library. The same code, though, fails on streams generated by Python 2:

    [Python 2]
    >>> type(gnupg_subprocess.stdout)
    <type 'file'>
    >>> gnupg_stdout = io.TextIOWrapper(gnupg_subprocess.stdout, encoding="utf-8")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'file' object has no attribute 'readable'
    

    Not a solution: Special treatment for file

    An obvious response is to have a branch in the code which tests whether the stream actually is a Python 2 file object, and handle that differently from io.* objects.

    That's not an option for well-tested code, because it makes a branch that unit tests – which, in order to run as fast as possible, must not create any real filesystem objects – can't exercise.

    The unit tests will be providing test doubles, not real file objects. So creating a branch which won't be exercised by those test doubles is defeating the test suite.

    Not a solution: io.open

    Some respondents suggest re-opening (e.g. with io.open) the underlying file handle:

    gnupg_stdout = io.open(
            gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
    

    That works on both Python 3 and Python 2:

    [Python 3]
    >>> type(gnupg_subprocess.stdout)
    <class '_io.BufferedReader'>
    >>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
    >>> type(gnupg_stdout)
    <class '_io.TextIOWrapper'>
    
    [Python 2]
    >>> type(gnupg_subprocess.stdout)
    <type 'file'>
    >>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
    >>> type(gnupg_stdout)
    <type '_io.TextIOWrapper'>
    

    But of course it relies on re-opening a real file from its file handle. So it fails in unit tests when the test double is an io.BytesIO instance:

    >>> gnupg_subprocess.stdout = io.BytesIO("Lorem ipsum".encode("utf-8"))
    >>> type(gnupg_subprocess.stdout)
    <type '_io.BytesIO'>
    >>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    io.UnsupportedOperation: fileno
    

    Not a solution: codecs.getreader

    The standard library also has the codecs module, which provides wrapper features:

    import codecs
    
    gnupg_stdout = codecs.getreader("utf-8")(gnupg_subprocess.stdout)
    

    That's good because it doesn't attempt to re-open the stream. But it fails to provide the io.TextIOWrapper API. Specifically, it doesn't inherit from io.IOBase and doesn't have the encoding attribute:

    >>> type(gnupg_subprocess.stdout)
    <type 'file'>
    >>> gnupg_stdout = codecs.getreader("utf-8")(gnupg_subprocess.stdout)
    >>> type(gnupg_stdout)
    <type 'instance'>
    >>> isinstance(gnupg_stdout, io.IOBase)
    False
    >>> gnupg_stdout.encoding
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/codecs.py", line 643, in __getattr__
        return getattr(self.stream, name)
    AttributeError: '_io.BytesIO' object has no attribute 'encoding'
    

    So codecs doesn't provide objects which substitute for io.TextIOWrapper.

    What to do?

    So how can I write code that works for both Python 2 and Python 3, with both the test doubles and the real objects, which wraps an io.TextIOWrapper around the already-open byte stream?

    • Dima Tisnek
      Dima Tisnek over 8 years
      re: io.open you could change unit tests, you know, e.g. a tempfile.TemporaryFile(); That's a hammer of a solution of course...
    • Martijn Pieters
      Martijn Pieters over 8 years
      This is a rather too limited set of restrictions. Unit tests can open files if that is absolutely the only way to properly test something, for example. So a wrapper function that special-cases file objects to grab the file descriptor can be tested with a unit test just fine.
  • bignose
    bignose over 8 years
    Thanks for the suggestion. That object doesn't provide enough of the io.TextIOWrapper API though, so isn't a solution.
  • jbg
    jbg over 8 years
    Ah, too bad. I guess you could put your test data in a file… :/
  • bignose
    bignose over 8 years
    Addressed in the question already: this needs to work also with test doubles that are not real files.
  • bignose
    bignose over 8 years
    A Python 2 file object (as created by many standard library functions) does not work when passed to the io.BufferedReader constructor: AttributeError: 'file' object has no attribute 'readable'.
  • jbg
    jbg over 8 years
    Right, I read a few more branches of the question and see what you're getting at now. As you've determined in your own answer, I don't think you can do this for Py2 and Py3 without some tests of the type of object and branching.
  • bignose
    bignose over 8 years
    Already addressed in the question: The test doubles are not real files. io.open won't work because the test doubles can't be re-opened by path nor file handle.
  • jbg
    jbg over 8 years
    As stated in the answer, I’m addressing that by using pipes instead of BytesIO for the test doubles… or is there some reason you’re constrained to use BytesIO? It occurs to me that the very fact that BytesIO (on Python 2) isn’t enough “like” the objects you use in your real code is a good reason not to use it as a test double…
  • bignose
    bignose over 8 years
    The whole unit test suite is using io.StringIO and io.BytesIO for test doubles of a great many file operations. I'm ruling out “make a special set of test doubles just for this case” as a solution; I'm looking for a solution that works with the normal fake files (those that inherit from io.IOBase) and the normal real files of both Python versions.
  • jbg
    jbg over 8 years
    You could use the pipe code path for all situations then. Write the contents of your file, file pointer, bytesio etc to the pipe, and attach your reader to the read side, which will always be a file object.. It might be the only solution that works the right way, for all fake and real files, on both Py2 and Py3 with only one code path for all.
  • bignose
    bignose over 8 years
    Eventually I used a custom solution based on this. It doesn't address the requirements fully, so is not a solution; but I'm awarding the bounty as thanks for the help.
  • killthrush
    killthrush almost 7 years
    Worked like a charm for me. Used this technique in concert with the csv package and boto to stream CSV files from S3.
  • Ben
    Ben over 5 years
    Given how ill-advised wrapping the GnuPG binary in subprocess and similar calls is in the first place, that's probably a good thing. Especially in something allegedly meant to be stable, production code. Now granted, the GPGME bindings hadn't been merged with GPGME's master branch when you asked this question originally, but they have now and you've still got my email B1, so if this is a thing and the focus is actually GPG rather than data streams in general; it's time to get in touch. Regards, B2. ;)