How can I send the stdout of one process to multiple processes using (preferably unnamed) pipes in Unix (or Windows)?


Solution 1

Editor's note:
- >(…) is a process substitution that is a nonstandard shell feature of some POSIX-compatible shells: bash, ksh, zsh.
- As originally written, this answer accidentally sent the output process substitution's output through the pipeline too: echo 123 | tee >(tr 1 a) | tr 1 b.
- Output from the process substitutions will be unpredictably interleaved, and, except in zsh, the pipeline may terminate before the commands inside >(…) do.

In Unix (or on a Mac), use the tee command:

$ echo 123 | tee >(tr 1 a) >(tr 1 b) >/dev/null
b23
a23

Usually you would use tee to redirect output to multiple files, but using >(...) you can redirect to another process. So, in general,

$ proc1 | tee >(proc2) ... >(procN-1) >(procN) >/dev/null

will do what you want.

Under Windows, I don't think the built-in shell has an equivalent. Microsoft's Windows PowerShell has a tee command, though.

Solution 2

Like dF said, bash allows you to use the >(…) construct, which runs a command in place of a filename. (There is also the <(…) construct, which substitutes the output of another command in place of a filename, but that is irrelevant here; I mention it just for completeness.)

If you don't have bash, or are running on a system with an older version of bash, you can do manually what bash does by making use of FIFO files.

The generic way to achieve what you want is as follows (a consolidated sketch is shown after the steps):

  • decide how many processes should receive the output of your command, and create as many FIFOs, preferably on a global temporary folder:
    subprocesses="a b c d"
    mypid=$$
    for i in $subprocesses # this way we are compatible with all sh-derived shells  
    do
        mkfifo /tmp/pipe.$mypid.$i
    done
  • start all your subprocesses waiting input from the FIFOs:
    for i in $subprocesses
    do
        tr 1 $i </tmp/pipe.$mypid.$i & # background!
    done
  • execute your command teeing to the FIFOs:
    proc1 | tee $(for i in $subprocesses; do echo /tmp/pipe.$mypid.$i; done)
  • finally, remove the FIFOs:
    for i in $subprocesses; do rm /tmp/pipe.$mypid.$i; done

NOTE: for compatibility reasons, I would write the $(…) with backquotes, but I couldn't do so while writing this answer (the backquote is used for code formatting on SO). Normally, $(…) is old enough to work even in old versions of ksh, but if it doesn't, enclose that part in backquotes.
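
Putting the steps above together: here is a minimal, self-contained sketch of the approach (not part of the original answer; it uses echo 123 as the producer and tr 1 a / tr 1 b as stand-in consumers, mirroring the question):

#!/bin/sh
# FIFO-based fan-out: one producer, several consumers (sketch).
subprocesses="a b"
mypid=$$

# 1) Create one FIFO per consumer.
for i in $subprocesses; do
    mkfifo "/tmp/pipe.$mypid.$i"
done

# 2) Start the consumers in the background, each reading its own FIFO.
for i in $subprocesses; do
    tr 1 "$i" < "/tmp/pipe.$mypid.$i" &
done

# 3) Fan the producer's output out to all FIFOs (discarding tee's own stdout).
echo 123 | tee $(for i in $subprocesses; do echo "/tmp/pipe.$mypid.$i"; done) >/dev/null

# 4) Wait for the consumers to finish, then clean up.
wait
for i in $subprocesses; do rm "/tmp/pipe.$mypid.$i"; done

As with the process-substitution variant, the order in which a23 and b23 appear is not guaranteed.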

Solution 3

Unix (bash, ksh, zsh)

dF.'s answer contains the seed of an answer based on tee and output process substitutions
(>(...)) that may or may not work, depending on your requirements:

Note that process substitutions are a nonstandard feature that (mostly) POSIX-features-only shells such as dash (which acts as /bin/sh on Ubuntu, for instance) do not support. Shell scripts targeting /bin/sh should not rely on them.

echo 123 | tee >(tr 1 a) >(tr 1 b) >/dev/null

The pitfalls of this approach are:

  • unpredictable, asynchronous output behavior: the output streams from the commands inside the output process substitutions >(...) interleave in unpredictable ways.

  • In bash and ksh (as opposed to zsh - but see exception below):

    • output may arrive after the command has finished.
    • subsequent commands may start executing before the commands in the process substitutions have finished - bash and ksh do not wait for the output process substitution-spawned processes to finish, at least by default.
    • jmb puts it well in a comment on dF.'s answer:

be aware that the commands started inside >(...) are dissociated from the original shell, and you can't easily determine when they finish; the tee will finish after writing everything, but the substituted processes will still be consuming the data from various buffers in the kernel and file I/O, plus whatever time is taken by their internal handling of data. You can encounter race conditions if your outer shell then goes on to rely on anything produced by the sub-processes.

  • zsh is the only shell that does by default wait for the processes run in the output process substitutions to finish, except if it is stderr that is redirected to one (2> >(...)).

  • ksh (at least as of version 93u+) allows use of argument-less wait to wait for the output process substitution-spawned processes to finish.
    Note that in an interactive session that could result in waiting for any pending background jobs too, however.

  • bash v4.4+ can wait for the most recently launched output process substitution with wait $!, but argument-less wait does not work, making this unsuitable for a command with multiple output process substitutions (see the sketch after this list).

  • However, bash and ksh can be forced to wait by piping the command to | cat (demonstrated after the illustration below), but note that this makes the command run in a subshell. Caveats:

    • ksh (as of ksh 93u+) doesn't support sending stderr to an output process substitution (2> >(...)); such an attempt is silently ignored.

    • While zsh is (commendably) synchronous by default with the (far more common) stdout output process substitutions, even the | cat technique cannot make them synchronous with stderr output process substitutions (2> >(...)).

  • However, even if you ensure synchronous execution, the problem of unpredictably interleaved output remains.
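
To illustrate the bash wait $! point above, a small sketch (requires bash v4.4+; the sleep merely simulates a slow command inside the substitution):

echo 123 > >(sleep 1; tr 1 a)  # $! is set to the process substitution's PID
wait $!                        # blocks until the substituted command finishes
echo AFTER                     # now reliably prints after a23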

The following command, when run in bash or ksh, illustrates the problematic behaviors (you may have to run it several times to see both symptoms): AFTER will typically print before the output from the output substitutions, and the output from the latter can be interleaved unpredictably.

printf 'line %s\n' {1..30} | tee >(cat -n) >(cat -n) >/dev/null; echo AFTER
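
By contrast, appending | cat, as mentioned above, forces bash and ksh to wait for the process substitutions (a sketch; the output from the two cat -n commands can still interleave):

printf 'line %s\n' {1..30} | tee >(cat -n) >(cat -n) >/dev/null | cat; echo AFTER

AFTER now prints last, because cat's stdin (which the process substitutions inherit as their stdout) stays open until both commands have finished.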

In short:

  • Guaranteeing a particular per-command output sequence:

    • Neither bash nor ksh nor zsh support that.
  • Synchronous execution:

    • Doable, except with stderr-sourced output process substitutions:
      • In zsh, they're invariably asynchronous.
      • In ksh, they don't work at all.

If you can live with these limitations, using output process substitutions is a viable option (e.g., if all of them write to separate output files).
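
For instance, the following variation sidesteps the interleaving problem, because each substitution writes to its own output file (the filenames are illustrative):

echo 123 | tee >(tr 1 a > out-a.txt) >(tr 1 b > out-b.txt) >/dev/null

Note that the synchronous-execution caveat still applies: in bash and ksh, the files may not be fully written by the time the next command starts.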


Note that tzot's much more cumbersome, but potentially POSIX-compliant solution also exhibits unpredictable output behavior; however, by using wait you can ensure that subsequent commands do not start executing until all background processes have finished.
See bottom for a more robust, synchronous, serialized-output implementation.


The only straightforward bash solution with predictable output behavior is the following, which, however, is prohibitively slow with large input sets, because shell loops are inherently slow.
Also note that this alternates the output lines from the target commands.

while IFS= read -r line; do 
  tr 1 a <<<"$line"
  tr 1 b <<<"$line"
done < <(echo '123')

Unix (using GNU Parallel)

Installing GNU parallel enables a robust solution with serialized (per-command) output that additionally allows parallel execution:

$ echo '123' | parallel --pipe --tee {} ::: 'tr 1 a' 'tr 1 b'
a23
b23

parallel by default ensures that output from the different commands doesn't interleave (this behavior can be modified - see man parallel).
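
For instance, the following sketch (assuming a GNU parallel version whose --line-buffer option can be combined with --pipe --tee) relaxes the default grouping so that lines are emitted as soon as they are complete:

seq 5 | parallel --pipe --tee --line-buffer {} ::: 'tr 0-9 a' 'tr 0-9 b'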

Note: Some Linux distros come with a different parallel utility, which won't work with the command above; use parallel --version to determine which one, if any, you have.


Windows

Jay Bazuzi's helpful answer shows how to do it in PowerShell. That said: his answer is the analog of the looping bash answer above; it will be prohibitively slow with large input sets, and it also alternates the output lines from the target commands.



bash-based, but otherwise portable Unix solution with synchronous execution and output serialization

The following is a simple, but reasonably robust implementation of the approach presented in tzot's answer that additionally provides:

  • synchronous execution
  • serialized (grouped) output

While not strictly POSIX-compliant (it is a bash script), it should be portable to any Unix platform that has bash.

Note: You can find a more full-fledged implementation released under the MIT license in this Gist.

If you save the code below as script fanout, make it executable, and put it in your PATH, the command from the question would work as follows:

$ echo 123 | fanout 'tr 1 a' 'tr 1 b'
# tr 1 a
a23
# tr 1 b
b23

fanout script source code:

#!/usr/bin/env bash

# This script's filename, used below to form the temp.-directory name.
kTHIS_NAME=${BASH_SOURCE##*/}

# The commands to pipe to, passed as a single string each.
aCmds=( "$@" )

# Create a temp. directory to hold all FIFOs and captured output.
tmpDir="${TMPDIR:-/tmp}/$kTHIS_NAME-$$-$(date +%s)-$RANDOM"
mkdir "$tmpDir" || exit
# Set up a trap that automatically removes the temp dir. when this script
# exits.
trap 'rm -rf "$tmpDir"' EXIT 

# Determine the number padding for the sequential FIFO / output-capture names, 
# so that *alphabetic* sorting, as done by *globbing* is equivalent to
# *numerical* sorting.
maxNdx=$(( $# - 1 ))
fmtString="%0${#maxNdx}d"

# Create the FIFO and output-capture filename arrays
aFifos=() aOutFiles=()
for (( i = 0; i <= maxNdx; ++i )); do
  printf -v suffix "$fmtString" $i
  aFifos[i]="$tmpDir/fifo-$suffix"
  aOutFiles[i]="$tmpDir/out-$suffix"
done

# Create the FIFOs.
mkfifo "${aFifos[@]}" || exit

# Start all commands in the background, each reading from a dedicated FIFO.
for (( i = 0; i <= maxNdx; ++i )); do
  fifo=${aFifos[i]}
  outFile=${aOutFiles[i]}
  cmd=${aCmds[i]}
  printf '# %s\n' "$cmd" > "$outFile"
  eval "$cmd" < "$fifo" >> "$outFile" &
done

# Now tee stdin to all FIFOs.
tee "${aFifos[@]}" >/dev/null || exit

# Wait for all background processes to finish.
wait

# Print all captured stdout output, grouped by target command, in sequence.
cat "${aOutFiles[@]}"

Solution 4

Since @dF. mentioned that PowerShell has tee, I thought I'd show a way to do this in PowerShell.

PS > "123" | % { 
    $_.Replace( "1", "a"), 
    $_.Replace( "2", "b" ) 
}

a23
1b3

Note that each object coming out of the first command is processed before the next object is created. This can allow scaling to very large inputs.

Solution 5

You can also save the output in a variable and use that for the other processes:

out=$(proc1); echo "$out" | proc2; echo "$out" | proc3

However, that works only if

  1. proc1 terminates at some point :-)
  2. proc1 doesn't produce too much output (I don't know exactly what the limits are, but it's probably bounded by your available RAM)

But it is easy to remember, and it leaves you with more options for processing the output of the processes you spawned there, e.g.:

out=$(proc1); echo $(echo "$out" | proc2) / $(echo "$out" | proc3) | bc

I had difficulties doing something like that with the | tee >(proc2) >(proc3) >/dev/null approach.
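
A closely related variant (not part of this answer originally) that avoids limitation 2 is to buffer the output in a temporary file instead of a shell variable; limitation 1 still applies:

tmp=$(mktemp)    # buffer proc1's output on disk instead of in RAM
proc1 > "$tmp"
proc2 < "$tmp"
proc3 < "$tmp"
rm -f "$tmp"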


Comments

  • secr
    secr almost 4 years

    I'd like to redirect the stdout of process proc1 to two processes proc2 and proc3:

             proc2 -> stdout
           /
     proc1
           \ 
             proc3 -> stdout
    

    I tried

     proc1 | (proc2 & proc3)
    

    but it doesn't seem to work, i.e.

     echo 123 | (tr 1 a & tr 1 b)
    

    writes

     b23
    

    to stdout instead of

     a23
     b23
    
  • secr
    secr over 15 years
    Thank you very much. That >(...) concept is new to me. Note: It doesn't seem to work under Windows' cmd or PowerShell, and not even in Cygwin's bash.
  • dF.
    dF. over 15 years
    I'm not sure how to do it in PowerShell... It doesn't work in Cygwin's bash for me either; I wonder whether it's a known limitation or a bug!
  • pixelbeat
    pixelbeat over 15 years
    This is not a POSIX construct and requires bash or ksh. You're out of luck with tcsh and dash etc.
  • Ishbir
    Ishbir over 15 years
    @pixelbeat: …but it can be broken down to POSIX constructs (see my answer :)
  • Harvey
    Harvey almost 15 years
    I tried applying this where one of the destinations is standard output. I came up with this: "command | tee >(cat - > ./log.txt)". I wonder if there is a better way.
  • Dejay Clayton
    Dejay Clayton over 12 years
    This doesn't do exactly what @secr requested. tee will append the output of a process redirection onto stdout before sending it through the pipe, which is distinctly different than piping the same instance of stdout to multiple commands. @dF, for example, echo 123 | tee >(tr 1 a) | tr 2 b will result in 1b3 ab3, which makes no sense in the context of the original question.
  • Matt Curtis
    Matt Curtis about 12 years
    @Harvey If I understand your question: you could use command | tee log.txt and it will send output to stdout and log.txt. tee writes to stdout and files, the >() trick is to send to a process.
  • jmb
    jmb over 11 years
    While very handy, be aware that the commands started inside >(...) are dissociated from the original shell, and you can't easily determine when they finish; the tee will finish after writing everything, but the substituted processes will still be consuming the data from various buffers in the kernel and file I/O, plus whatever time is taken by their internal handling of data. You can encounter race conditions if your outer shell then goes on to rely on anything produced by the sub-processes.
  • naumcho
    naumcho about 11 years
    Doesn't work when I tried it with a bigger example: cat /usr/share/dict/words | tee >(head -1) | tail -1 prints tee: /dev/fd/63: Broken pipe followed by ZZZ.
  • ack
    ack almost 11 years
    @Dejay Clayton: You can discard the original input by using inproc | tee >(outproc1) >(outproc2) > /dev/null | outproc. outproc will see only the output produced by outproc1 and outproc2. The original input is 'gone'.
  • ack
    ack almost 11 years
    Adding to the above: This works because process substitution happens before redirection (undocumented AFAICT), so outproc1 and outproc2 get what used to be tee's 'own' stdout before it gets redirected to the bit bucket.
  • Jerry Jeremiah
    Jerry Jeremiah about 10 years
    What shell does this work on? All it does is eval echo 123 |{tr 1 a,tr 1 b}, which complains that {tr doesn't exist, and if you put in extra spaces it waits for extra input because of the comma, and if you change the comma to a semicolon or ampersand you get just the first one printed - not both.
  • mklement0
    mklement0 almost 7 years
    @JerryJeremiah: It works in shells that support brace expansion (bash, ksh, zsh) by creating the command line echo '&& echo 123 |'{'tr 1 a','tr 1 b'} | sed -n 's/^&&//gp' in a string and then passing that string to eval. That said, it (a) creates 3 subshells in the process of constructing the string (1 for the `...`, and 2 for the segments of the embedded pipeline), and (b), more importantly, it duplicates the input command so that a separate copy is run for each target tr command. Aside from inefficiency, the same command run twice doesn't necessarily yield the same output twice.
  • mklement0
    mklement0 almost 7 years
    Yes, but this is the equivalent of doing while IFS= read -r line; do tr 1 a <<<"$line"; tr 1 b <<<"$line"; done < <(echo '123') in Bash, which scales well memory-wise, but not performance-wise.
  • mklement0
    mklement0 almost 7 years
    ++ for a great approach, but you should use mkfifo rather than mknod, because only the former is POSIX-compliant. Also, the use of unquoted command substitutions is brittle, and there's potential for using globbing for efficiency. I took the liberty of implementing a more robust solution - albeit bash-based - in my answer. Note that $(…) has been part of POSIX for a long time now, so I'd stay away from the less predictable `…` (and SO definitely allows use of ` in code blocks and even in-line code spans (at least now :)).
  • Russell Speight
    Russell Speight over 4 years
    It seems the write side will block if one of the read-side processes stops consuming (e.g., fails to start, dies, etc.). Something to consider when thinking about the necessary resilience of your solution.