Python: subprocess.call, stdout to file, stderr to file, display stderr on screen in real time


Solution 1

You can do this with subprocess, but it's not trivial. If you look at the Frequently Used Arguments in the docs, you'll see that you can pass PIPE as the stderr argument, which creates a new pipe, passes one side of the pipe to the child process, and makes the other side available to use as the stderr attribute.*

So, you will need to service that pipe, writing to the screen and to the file. In general, getting the details right for this is very tricky.** In your case, there's only one pipe, and you're planning on servicing it synchronously, so it's not that bad.

import subprocess
import sys

# file_out and log_file are the already-open files from the question;
# on Python 3 the pipe yields bytes, so log_file should be opened in
# binary mode and we write to sys.stdout's underlying binary buffer.
proc = subprocess.Popen(['path_to_tool', '-option1', 'option2'],
                        stdout=file_out, stderr=subprocess.PIPE)
for line in proc.stderr:
    sys.stdout.buffer.write(line)
    sys.stdout.flush()
    log_file.write(line)
proc.wait()

(Note that there are some issues using for line in proc.stderr:—basically, if what you're reading turns out not to be line-buffered for any reason, you can sit around waiting for a newline even though there's actually half a line worth of data to process. You can read chunks at a time with, say, read(128), or even read(1), to get the data more smoothly if necessary. If you need to actually get every byte as soon as it arrives, and can't afford the cost of read(1), you'll need to put the pipe in non-blocking mode and read asynchronously.)
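For instance, here's a minimal sketch of chunked reading, reusing the proc and log_file names from the example above; os.read returns as soon as any data is available, up to the requested size, instead of waiting for a full line:

import os
import sys

fd = proc.stderr.fileno()
while True:
    chunk = os.read(fd, 128)        # returns as soon as any data arrives
    if not chunk:                   # empty bytes means EOF
        break
    sys.stdout.buffer.write(chunk)  # pipe data is bytes on Python 3
    sys.stdout.flush()
    log_file.write(chunk)
proc.wait()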


But if you're on Unix, it might be simpler to use the tee command to do it for you.

For a quick&dirty solution, you can have the shell do the piping for you. Something like this (bash-only, because it relies on process substitution):

subprocess.call('path_to_tool -option1 option2 2> >(tee log_file >&2)',
                shell=True, executable='/bin/bash', stdout=file_out)

But shell piping like this is easy to get wrong (see the comments); let's do it in Python, as shown in the docs:

tool = subprocess.Popen(['path_to_tool', '-option1', 'option2'],
                        stdout=file_out, stderr=subprocess.PIPE)
tee = subprocess.Popen(['tee', 'log_file'], stdin=tool.stderr)
tool.stderr.close()   # so the tool gets SIGPIPE if tee exits early
tee.communicate()
tool.wait()

Finally, there are a dozen or more higher-level wrappers around subprocesses and/or the shell on PyPI—sh, shell, shell_command, shellout, iterpipes, sarge, cmd_utils, commandwrapper, etc. Search for "shell", "subprocess", "process", "command line", etc. and find one you like that makes the problem trivial.
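For example, with the sh library (just an assumption that it fits your case; pip install sh and check its docs), the _out and _err special keyword arguments accept file objects or callbacks:

import sys
import sh

with open('file_out', 'w') as out, open('log_file', 'w') as log:
    def tee_err(line):
        # sh invokes this callback for each piece of stderr as it arrives
        sys.stderr.write(line)
        log.write(line)
    sh.path_to_tool('-option1', 'option2', _out=out, _err=tee_err)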


What if you need to gather both stderr and stdout?

The easy way to do it is to just redirect one to the other, as Sven Marnach suggests in a comment. Just change the Popen parameters like this:

tool = subprocess.Popen(['path_to_tool', '-option1', 'option2'],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

And then everywhere you used tool.stderr, use tool.stdout instead—e.g., for the last example:

tee = subprocess.Popen(['tee', 'log_file'], stdin=tool.stdout)
tool.stdout.close()   # so the tool gets SIGPIPE if tee exits early
tee.communicate()
tool.wait()

But this has some tradeoffs. Most obviously, mixing the two streams together means you can't log stdout to file_out and stderr to log_file, or copy stdout to your stdout and stderr to your stderr. It also makes the ordering non-deterministic: if the subprocess always writes two lines to stderr before writing anything to stdout, you might end up with a bunch of stdout between those two lines once the streams are mixed. And it means stderr has to share stdout's buffering mode, so if you were relying on the fact that Linux/glibc leaves stderr unbuffered (unless the subprocess explicitly changes it), that may no longer be true.
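If the tool uses stdio's default buffering and you're on Linux, one hedge is to wrap it in coreutils' stdbuf, which forces the child's stdout into line-buffered mode so the merged stream arrives a line at a time. A sketch (it won't help for tools that set their own buffering or check isatty):

import subprocess
import sys

tool = subprocess.Popen(
    ['stdbuf', '-oL', 'path_to_tool', '-option1', 'option2'],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
for line in tool.stdout:
    sys.stdout.buffer.write(line)   # screen
    log_file.write(line)            # log (opened in binary mode)
tool.wait()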


If you need to handle the two streams separately, it gets more difficult. Earlier, I said that servicing the pipe on the fly is easy as long as you only have one pipe and can service it synchronously. If you have two pipes, that's obviously no longer true. Imagine you're waiting on tool.stdout.read(), and new data comes in from tool.stderr. If there's too much data, the pipe's buffer fills up and the subprocess blocks writing to it. But even if that doesn't happen, you obviously won't be able to read and log the stderr data until something comes in from stdout.

If you use the pipe-through-tee solution, that avoids the initial problem… but only by creating a new problem that's just as bad. You have two tee processes, and while you're calling communicate on one, the other one is sitting around waiting forever.

So, either way, you need some kind of asynchronous mechanism. You can do this with threads, a select reactor, something like gevent, etc.

Here's a quick and dirty example:

import subprocess
import sys
import threading

proc = subprocess.Popen(['path_to_tool', '-option1', 'option2'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)

def tee_pipe(pipe, f1, f2):
    # Copy every line from the pipe to both destinations.
    for line in pipe:
        f1.write(line)
        f2.write(line)

t1 = threading.Thread(target=tee_pipe, args=(proc.stdout, file_out, sys.stdout))
t2 = threading.Thread(target=tee_pipe, args=(proc.stderr, log_file, sys.stderr))
t3 = threading.Thread(target=proc.wait)
t1.start(); t2.start(); t3.start()
t1.join(); t2.join(); t3.join()

However, there are some edge cases where that won't work. (The problem is the order in which SIGCHLD and SIGPIPE/EPIPE/EOF arrive. I don't think any of that will affect us here, since we're not sending any input… but don't trust me on that without thinking it through and/or testing.) The Popen.communicate implementation from 3.3+ gets all the fiddly details right. But you may find it a lot simpler to use one of the async-subprocess wrapper implementations you can find on PyPI and ActiveState, or even the subprocess support in a full-fledged async framework like Twisted.
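On modern Python (3.7+), the standard library's asyncio machinery also handles this; here's a rough sketch, with hypothetical log file names, of teeing both pipes concurrently:

import asyncio
import sys

async def tee_stream(stream, *sinks):
    # Copy the pipe to every sink, line by line, as data arrives.
    while True:
        line = await stream.readline()
        if not line:
            break
        for sink in sinks:
            sink.write(line)
            sink.flush()

async def main():
    proc = await asyncio.create_subprocess_exec(
        'path_to_tool', '-option1', 'option2',
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE)
    with open('out.log', 'wb') as out_log, open('err.log', 'wb') as err_log:
        await asyncio.gather(
            tee_stream(proc.stdout, out_log, sys.stdout.buffer),
            tee_stream(proc.stderr, err_log, sys.stderr.buffer))
    await proc.wait()

asyncio.run(main())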


* The docs don't really explain what pipes are, almost as if they expect you to be an old Unix C hand… But some of the examples, especially in the Replacing Older Functions with the subprocess Module section, show how they're used, and it's pretty simple.
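If it helps, a pipe is just a pair of connected file descriptors managed by the kernel; bytes written to one end come out the other:

import os

r, w = os.pipe()                  # r reads whatever is written to w
os.write(w, b'hello through a pipe\n')
os.close(w)                       # close the write end so reads see EOF
print(os.read(r, 1024))           # b'hello through a pipe\n'
os.close(r)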

** The hard part is sequencing two or more pipes properly. If you wait on one pipe, the other may overflow and block, preventing your wait on the first from ever finishing. The only easy way around this is to create a thread to service each pipe. (On most *nix platforms, you can use a select or poll reactor instead, but making that cross-platform is amazingly difficult.) The source to the subprocess module, especially communicate and its helpers, shows how to do it. (I linked to 3.3, because in earlier versions communicate itself gets some important things wrong…) This is why, whenever possible, you want to use communicate if you need more than one pipe. In your case, you can't use communicate, but fortunately you don't need more than one pipe.
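For the curious, here's a rough sketch of what a select-style reactor for two pipes looks like with the standard selectors module (a sketch for *nix; select on Windows only works on sockets):

import os
import selectors
import subprocess
import sys

proc = subprocess.Popen(['path_to_tool', '-option1', 'option2'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)

sel = selectors.DefaultSelector()
sel.register(proc.stdout, selectors.EVENT_READ, data=sys.stdout.buffer)
sel.register(proc.stderr, selectors.EVENT_READ, data=sys.stderr.buffer)

while sel.get_map():              # loop until both pipes hit EOF
    for key, _ in sel.select():
        chunk = os.read(key.fileobj.fileno(), 4096)
        if not chunk:             # EOF: stop watching this pipe
            sel.unregister(key.fileobj)
            continue
        key.data.write(chunk)     # copy to the stream stashed in data
        key.data.flush()

proc.wait()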

Solution 2

I had to make a few changes to @abarnert's answer for Python 3. This seems to work:

import subprocess
import sys
import threading

def tee_pipe(pipe, f1, f2):
    for line in pipe:
        f1.write(line)
        f2.write(line)

proc = subprocess.Popen(["/bin/echo", "hello"],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)

# Open the log files for stdout/stderr in unbuffered binary mode.
out_file = open("stdout.log", "wb", 0)
err_file = open("stderr.log", "wb", 0)

stdout = sys.stdout
stderr = sys.stderr

# On Python 3, sys.stdout/sys.stderr are text streams (io.TextIOWrapper);
# we want the underlying binary buffers, since the pipes yield bytes.
if sys.version_info[0] >= 3:
    stdout = stdout.buffer
    stderr = stderr.buffer

# Start threads to duplicate the pipes.
out_thread = threading.Thread(target=tee_pipe,
                              args=(proc.stdout, out_file, stdout))
err_thread = threading.Thread(target=tee_pipe,
                              args=(proc.stderr, err_file, stderr))

out_thread.start()
err_thread.start()

# Wait for the command to finish.
proc.wait()

# Join the pipe threads.
out_thread.join()
err_thread.join()

Solution 3

I think what you are looking for is something like:

import subprocess
import sys

# This just makes the child write straight to the same stdout/stderr
# we inherited (typically the terminal).
p = subprocess.Popen(cmdline,
                     stdout=sys.stdout,
                     stderr=sys.stderr)

To have the output/log written to a file, I would modify cmdline to include the usual redirects, as you would in a plain Linux bash/shell. For instance, I would append tee to the command line: cmdline += ' | tee -a logfile.txt'. This only takes effect if the command is run through a shell, as in the sketch below.
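A minimal sketch of that, assuming cmdline is a single shell command string; note that | tee captures only stdout, so stderr still goes straight to the terminal:

import subprocess

subprocess.call(cmdline + ' | tee -a logfile.txt', shell=True)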

Hope that helps.

Author by Ben S.

Updated on November 17, 2020

Comments

  • Ben S.
    Ben S. over 3 years

    I have a command line tool (actually, several) that I am writing a wrapper for in Python.

    The tool is generally used like this:

     $ path_to_tool -option1 -option2 > file_out
    

    The user gets the output written to file_out, and is also able to see various status messages of the tool as it is running.

    I want to replicate this behavior, while also logging stderr (the status messages) to a file.

    What I have is this:

    from subprocess import call
    call(['path_to_tool','-option1','option2'], stdout = file_out, stderr = log_file)
    

    This works fine EXCEPT that stderr is not written to the screen. I can add code to print the contents of the log_file to the screen of course, but then the user will see it after everything is done rather than while it is happening.

    To recap, desired behavior is:

1. use call() or another subprocess function
    2. direct stdout to a file
    3. direct stderr to a file, while also writing stderr to the screen in real time as if the tool had been called directly from the command line.

    I have a feeling I'm either missing something really simple, or this is much more complicated than I thought...thanks for any help!

    EDIT: this only needs to work on Linux.

  • abarnert
    abarnert almost 11 years
    @user2063292: Sorry, that's tool and tee. Following the sample code a little too closely. :) Thanks for catching it.
  • Sven Marnach
    Sven Marnach almost 11 years
    Is 2| supposed to pipe stderr? It doesn't in POSIX shell.
  • Ben S.
    Ben S. almost 11 years
    Great thanks for this very complete answer. I'll check it off as soon as I have it working. If you have a moment, do you mind explaining what "stderr=subprocess.PIPE" does?
  • Sven Marnach
    Sven Marnach almost 11 years
    Is stderr always line-buffered? Otherwise, you would need to explicitly disable buffering to get real-time output.
  • abarnert
    abarnert almost 11 years
    @SvenMarnach: I deliberately wrote the shell=True code without thinking about it, hoping (although not verifying) there would be multiple errors in it, as a segue to showing how to do it readably in Python. Hence the "I don't want to debug shell piping" line.
  • abarnert
    abarnert almost 11 years
    @SvenMarnach: I don't know what your question means. We're not even touching sys.stderr. What we see in proc.stderr is a pipe. And what the subprocess does to its stderr is entirely up to the subprocess; we have no control over it. Which of those are you asking about?
  • Sven Marnach
    Sven Marnach almost 11 years
    You are right, I was confused (though I vaguely remember some LD_PRELOAD stuff you can use to change the buffering the subprocess uses). I was thinking of the bufsize parameter to Popen(), but this of course doesn't affect the subprocess' stderr.
  • Sven Marnach
    Sven Marnach almost 11 years
    To restate my question: The C library usually makes stdout line-buffered if it's a terminal and fully buffered if not. This means that by piping stdout to tee, you usually lose real-time output. Do you happen to know whether the same is true for stderr?
  • abarnert
    abarnert almost 11 years
    @SvenMarnach: I'm not positive. And of course there are plenty of tools that check isatty explicitly and behave differently, too. I'd suggest testing the actual tools on whatever platform(s) you care about. If worst comes to worst, you may need to create a pty, which is always fun to do in a cross-platform (I mean just cross-Unix; obviously it's not even meaningful on Windows…) way in Python.
  • Sven Marnach
    Sven Marnach almost 11 years
    Thanks for taking the time to answer my question. I actually found the answer myself now -- in man stderr: "The stream stderr is unbuffered."
  • Ben S.
    Ben S. almost 11 years
    ...and, it works. (I am using the last solution presented by abarnert to be clear).
  • abarnert
    abarnert almost 11 years
    @SvenMarnach: Yeah, I see that now in the linux/glibc stderr(3) page. I don't know if that's specified anywhere, or if it's guaranteed true in BSD or other platforms/libs, but if both you and the OP only care about linux, good enough, right?
  • Ben S.
    Ben S. almost 11 years
    Turns out I have one special case where I want to send both stdout and stderr to the logfile. Is this trivial? I gather I should have both stderr = subprocess.PIPE and stdout = subprocess.PIPE in the first Popen(), but not sure what is next...
  • Sven Marnach
    Sven Marnach almost 11 years
    @user2063292: use stdout=subprocess.PIPE and stderr=subprocess.STDOUT. Note that mixing two streams results in nondeterministic output, and stdout will probably become fully buffered. If you have control over the subprocess you call, you can disable buffering there.
  • abarnert
    abarnert almost 11 years
    @user2063292: You can create two separate pipes and read from them separately… but then you run into the problems I mentioned dealing with two pipes. I can add a bit more about that to the answer. But if acceptable to just redirect stderr to stdout as Sven Marnach suggests, it's a lot easier.
  • jfs
    jfs almost 11 years
    data = proc.stderr.read() in the first code example blocks until all data is read.
  • abarnert
    abarnert almost 11 years
    @J.F.Sebastian: Thanks for catching that. I had a read(1) originally, and an explanation of the tradeoffs for the different options… I'm not sure why I changed it. Anyway, explaining the different options and the tradeoffs is more important than picking one for the OP with no explanation…
  • Ben S.
    Ben S. almost 11 years
    @SvenMarnach and abanert The simple solution (add stderr=subprocess.STDOUT) worked very nicely, thanks again.