How do you use the command coproc in various shells?


Solution 1

Co-processes are a ksh feature (already in ksh88). zsh has had the feature from the start (early 90s), while it was only added to bash in 4.0 (2009).

However, the behaviour and interface differ significantly between the three shells.

The idea is the same, though: it allows you to start a job in the background and send it input and read its output, without having to resort to named pipes.

That is done with unnamed pipes in most shells, and with socketpairs in recent versions of ksh93 on some systems.

In a | cmd | b, a feeds data to cmd and b reads its output. Running cmd as a co-process allows the shell to be both a and b.

ksh co-processes

In ksh, you start a coprocess as:

cmd |&

You feed data to cmd by doing things like:

echo test >&p

or

print -p test

And read cmd's output with things like:

read var <&p

or

read -p var

cmd is started like any other background job. You can use fg, bg, and kill on it, and refer to it by %job-number or via $!.
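For instance (a minimal sketch, with sleep as a stand-in co-process):

sleep 100 |&    # started like any other background job
pid=$!          # its PID
jobs            # it shows up in the job list
kill "$pid"     # or: kill %1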

To close the writing end of the pipe cmd is reading from, you can do:

exec 3>&p 3>&-

And to close the reading end of the other pipe (the one cmd is writing to):

exec 3<&p 3<&-

You cannot start a second co-process unless you first save the pipe file descriptors to some other fds. For instance:

tr a b |&
exec 3>&p 4<&p
tr b c |&
echo aaa >&3
echo bbb >&p
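To continue that example and read both results back (a sketch; each tr needs to see end-of-file before it flushes its output, as discussed further down):

exec 3>&-       # close the first co-process' stdin: tr a b writes out "bbb"
read -u4 var1   # read it from the saved fd 4
exec 5>&p 5>&-  # close the second co-process' stdin the same way
read -p var2    # var2 is now "ccc"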

zsh co-processes

In zsh, co-processes are nearly identical to those in ksh. The only real difference is that zsh co-processes are started with the coproc keyword.

coproc cmd
echo test >&p
read var <&p
print -p test
read -p var

Doing:

exec 3>&p

doesn't move the coproc file descriptor to fd 3 (as it does in ksh), but duplicates it. So there's no explicit way to close the feeding or reading pipe, other than starting another coproc.

For instance, to close the feeding end:

coproc tr a b
echo aaaa >&p # send some data

exec 4<&p     # preserve the reading end on fd 4
coproc :      # start a new short-lived coproc (runs the null command)

cat <&4       # read the output of the first coproc

In addition to pipe-based co-processes, zsh (since 3.1.6-dev19, released in 2000) has pseudo-tty-based constructs, as in expect. To interact with most programs, ksh-style co-processes won't work, since programs start buffering their output when it goes to a pipe.

Here are some examples.

Start the co-process x:

zmodload zsh/zpty
zpty x cmd

(Here, cmd is a simple command. But you can do fancier things with eval or functions.)

Feed a co-process data:

zpty -w x some data

Read co-process data (in the simplest case):

zpty -r x var

Like expect, it can wait for some output from the co-process matching a given pattern.
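For instance (a small sketch; bc is just an arbitrary line-oriented command here, and the reply may include the pty's echo of the input):

zmodload zsh/zpty
zpty x bc          # bc behaves interactively on a pseudo-tty
zpty -w x '1 + 2'
zpty -r x answer   # collect bc's reply into $answer
print -r -- "$answer"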

bash co-processes

The bash syntax is a lot newer, and builds on top of a feature recently added to ksh93, bash, and zsh: a syntax for handling dynamically allocated file descriptors above 10.

bash offers a basic coproc syntax, and an extended one.

Basic syntax

The basic syntax for starting a co-process looks like zsh's:

coproc cmd

In ksh or zsh, the pipes to and from the co-process are accessed with >&p and <&p.

But in bash, the file descriptors of the pipe from the co-process and of the other pipe to the co-process are returned in the $COPROC array (respectively, ${COPROC[0]} and ${COPROC[1]}). So:

Feed data to the co-process:

echo xxx >&"${COPROC[1]}"

Read data from the co-process:

read var <&"${COPROC[0]}"

With the basic syntax, you can start only one co-process at a time.
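Putting the basic syntax together (a sketch; the exec close needs bash 4.3 or newer, as noted below):

coproc tr a b
echo aaa >&"${COPROC[1]}"
exec {COPROC[1]}>&-         # close tr's stdin so it flushes and exits
read var <&"${COPROC[0]}"   # var is now "bbb"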

Extended syntax

In the extended syntax, you can name your co-processes (as with zsh's zpty co-processes):

coproc mycoproc { cmd; }

The command has to be a compound command. (Notice how the example above is reminiscent of function f { ...; }.)
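Beware that the name is only recognised in front of a compound command; before a simple command, the word is taken as the command to execute (cmd here is a placeholder, as above):

coproc mycoproc { cmd; }   # compound command: "mycoproc" is the co-process name
coproc mycoproc cmd        # simple command: this runs "mycoproc cmd" under
                           # the default name COPROC, probably not what you meant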

This time, the file descriptors are in ${mycoproc[0]} and ${mycoproc[1]}.

You can start more than one co-process at a time—but you do get a warning when you start a co-process while one is still running (even in non-interactive mode).
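For instance (a small sketch; the exec close syntax is shown just below and needs bash 4.3 or newer):

coproc up { tr a-z A-Z; }
coproc low { tr A-Z a-z; }   # triggers the "still running" warning
echo Foo >&"${up[1]}";  exec {up[1]}>&-
echo Bar >&"${low[1]}"; exec {low[1]}>&-
cat <&"${up[0]}"             # FOO
cat <&"${low[0]}"            # bar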

You can close the file descriptors when using the extended syntax.

coproc tr { tr a b; }
echo aaa >&"${tr[1]}"

exec {tr[1]}>&-

cat <&"${tr[0]}"

Note that closing the file descriptor that way doesn't work in bash versions prior to 4.3, where you instead have to write:

fd=${tr[1]}
exec {fd}>&-

As in ksh and zsh, those pipe file descriptors are marked as close-on-exec.

But in bash, the only way to pass those to executed commands is to duplicate them to fds 0, 1, or 2. That limits the number of co-processes you can interact with for a single command. (See below for an example.)

yash process and pipeline redirection

yash doesn't have a co-process feature per se, but the same concept can be implemented with its pipeline and process redirection features. yash has an interface to the pipe() system call, so this kind of thing can be done relatively easily by hand there.

You'd start a co-process with:

exec 5>>|4 3>(cmd >&5 4<&- 5>&-) 5>&-

That first creates a pipe(4,5) (5 being the writing end, 4 the reading end), then redirects fd 3 to a pipe to a process that runs with its stdin at the other end and its stdout going to the pipe created earlier. Then we close the writing end of that pipe in the parent, as we won't need it. So now, in the shell, we have fd 3 connected to cmd's stdin and fd 4 connected to cmd's stdout, both over pipes.

Note that the close-on-exec flag is not set on those file descriptors.

To feed data:

echo data >&3 4<&-

To read data:

read var <&4 3>&-

And you can close fds as usual:

exec 3>&- 4<&-
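Putting the pieces together with a concrete command (same constructs as above):

exec 5>>|4 3>(tr a b >&5 4<&- 5>&-) 5>&-
echo aaa >&3 4<&-   # feed the co-process
exec 3>&-           # close its stdin so tr flushes and exits
read var <&4        # var is now "bbb"
exec 4<&-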

hardly any benefit over using named pipes

Co-processes can easily be implemented with standard named pipes. I don't know exactly when named pipes were introduced, but it's possible it was after ksh came up with co-processes (probably in the mid 80s; ksh88 was "released" in 88, but I believe ksh was used internally at AT&T a few years before that), which would explain why.

cmd |&
echo data >&p
read var <&p

Can be written with:

mkfifo in out

cmd <in >out &
exec 3> in 4< out
echo data >&3
read var <&4

Interacting with those is more straightforward—especially if you need to run more than one co-process. (See examples below.)

The only benefit of using coproc is that you don't have to clean up those named pipes after use.
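That clean-up is easy enough to automate, though (a sketch using mktemp -d, available on most modern systems, and an EXIT trap):

dir=$(mktemp -d) || exit
trap 'rm -rf "$dir"' EXIT
mkfifo "$dir/in" "$dir/out"
cmd <"$dir/in" >"$dir/out" &
exec 3>"$dir/in" 4<"$dir/out"
echo data >&3
read var <&4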

deadlock-prone

Shells use pipes in a few constructs:

  • shell pipes: cmd1 | cmd2,
  • command substitution: $(cmd),
  • and process substitution: <(cmd), >(cmd).

In those, the data flows in only one direction between different processes.

With co-processes and named pipes, though, it's easy to run into deadlock. You have to keep track of which command has which file descriptor open, to prevent one staying open and holding a process alive. Deadlocks can be tricky to investigate, because they may occur non-deterministically; for instance, only when enough data to fill one pipe up has been sent.

works worse than expect at what it was designed for

The main purpose of co-processes was to provide the shell with a way to interact with commands. However, it does not work so well.

The simplest form of deadlock mentioned above is:

tr a b |&
echo a >&p
read var<&p

Because its output doesn't go to a terminal, tr buffers its output. So it won't output anything until either it sees end-of-file on its stdin, or it has accumulated a buffer-full of data to output. So above, after the shell has output a\n (only 2 bytes), the read will block indefinitely because tr is waiting for the shell to send it more data.

In short, pipes aren't good for interacting with commands. Co-processes can only be used to interact with commands that don't buffer their output, or commands which can be told not to buffer their output; for example, by using stdbuf with some commands on recent GNU or FreeBSD systems.
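For instance, the deadlocking tr example above can be made to work that way (assuming stdbuf from GNU coreutils or FreeBSD is available):

stdbuf -oL tr a b |&   # run tr with line-buffered output
echo a >&p
read var <&p           # returns straight away with var=b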

That's why expect or zpty use pseudo-terminals instead. expect is a tool designed for interacting with commands, and it does it well.

file descriptor handling is fiddly and hard to get right

Co-processes can be used to do some more complex plumbing than what simple shell pipes allow.

That other Unix.SE answer has an example of coproc usage.

Here's a simplified example: imagine you want a function that feeds a copy of a command's output to 3 other commands, and then concatenates the output of those 3 commands.

All using pipes.

For instance: feed the output of printf '%s\n' foo bar to tr a b, sed 's/./&&/g', and cut -b2- to obtain something like:

foo
bbr
ffoooo
bbaarr
oo
ar

First, it's not necessarily obvious, but there’s a possibility for deadlock there, and it will start to happen after only a few kilobytes of data.

Then, depending on your shell, you'll run into a number of different problems that have to be addressed differently.

For instance, with zsh, you'd do it with:

f() (
  coproc tr a b
  exec {o1}<&p {i1}>&p
  coproc sed 's/./&&/g' {i1}>&- {o1}<&-
  exec {o2}<&p {i2}>&p
  coproc cut -c2- {i1}>&- {o1}<&- {i2}>&- {o2}<&-
  tee /dev/fd/$i1 /dev/fd/$i2 >&p {o1}<&- {o2}<&- &
  exec cat /dev/fd/$o1 /dev/fd/$o2 - <&p {i1}>&- {i2}>&-
)
printf '%s\n' foo bar | f

Above, the co-process fds have the close-on-exec flag set, but not the ones that are duplicated from them (as in {o1}<&p). So, to avoid deadlocks, you’ll have to make sure they're closed in any processes that don't need them.

Similarly, we have to use a subshell and exec cat at the end, to ensure there's no shell process lying around holding a pipe open.

With ksh (here ksh93), that would have to be:

f() (
  tr a b |&
  exec {o1}<&p {i1}>&p
  sed 's/./&&/g' |&
  exec {o2}<&p {i2}>&p
  cut -c2- |&
  exec {o3}<&p {i3}>&p
  eval 'tee "/dev/fd/$i1" "/dev/fd/$i2"' >&"$i3" {i1}>&"$i1" {i2}>&"$i2" &
  eval 'exec cat "/dev/fd/$o1" "/dev/fd/$o2" -' <&"$o3" {o1}<&"$o1" {o2}<&"$o2"
)
printf '%s\n' foo bar | f

(Note: That won’t work on systems where ksh uses socketpairs instead of pipes, and where /dev/fd/n works like on Linux.)

In ksh, fds above 2 are marked with the close-on-exec flag unless they're passed explicitly on the command line. That's why we don't have to close the unused file descriptors as with zsh, but it's also why we have to do {i1}>&$i1 and use eval so that the new value of $i1 is passed to tee and cat.

In bash this cannot be done, because you can't avoid the close-on-exec flag.

Above, it's relatively simple, because we use only simple external commands. It gets more complicated when you want to use shell constructs in there instead, and you start running into shell bugs.

Compare the above with the same using named pipes:

f() {
  mkfifo p{i,o}{1,2,3}
  tr a b < pi1 > po1 &
  sed 's/./&&/g' < pi2 > po2 &
  cut -c2- < pi3 > po3 &

  tee pi{1,2} > pi3 &
  cat po{1,2,3}
  rm -f p{i,o}{1,2,3}
}
printf '%s\n' foo bar | f

Conclusion

If you want to interact with a command, use expect, or zsh's zpty, or named pipes.

If you want to do some fancy plumbing with pipes, use named pipes.

Co-processes can do some of the above, but be prepared to do some serious head scratching for anything non-trivial.

Solution 2

Co-processes were first introduced in a shell scripting language with the ksh88 shell (1988), and later in zsh at some point before 1993.

The syntax to launch a co-process under ksh is command |&. From there, you can write to the command's standard input with print -p and read its standard output with read -p.

More than a couple of decades later, bash, which was lacking this feature, finally introduced it in its 4.0 release. Unfortunately, an incompatible and more complex syntax was selected.

Under bash 4.0 and newer, you can launch a co-process with the coproc command, e.g.:

$ coproc awk '{print $2;fflush();}'

You can then pass something to the command's stdin that way:

$ echo one two three >&${COPROC[1]}

and read awk output with:

$ read -ru ${COPROC[0]} foo
$ echo $foo
two

Under ksh, that would have been:

$ awk '{print $2;fflush();}' |&
$ print -p "one two three"
$ read -p foo
$ echo $foo
two

Solution 3

Here is another good (and working) example: a simple server written in bash. Note that you need OpenBSD's netcat; the classic one won't work. Of course, you could use an inet socket instead of a unix one.

server.sh:

#!/usr/bin/env bash

SOCKET=server.sock
PIDFILE=server.pid

(
    # detach from the terminal
    exec </dev/null
    exec >/dev/null
    exec 2>/dev/null
    # nc listens on the unix socket; its stdin/stdout become the coproc pipes
    coproc SERVER {
        exec nc -l -k -U $SOCKET
    }
    echo $SERVER_PID > $PIDFILE
    # answer every line received with "pong <line>"
    {
        while read ; do
            echo "pong $REPLY"
        done
    } <&${SERVER[0]} >&${SERVER[1]}
    # nc exited: clean up
    rm -f $PIDFILE
    rm -f $SOCKET
) &
disown $!

client.sh:

#!/usr/bin/env bash

SOCKET=server.sock

coproc CLIENT {
    exec nc -U $SOCKET
}

{
    echo "$@"
    read
} <&${CLIENT[0]} >&${CLIENT[1]}

echo $REPLY

Usage:

$ ./server.sh
$ ./client.sh ping
pong ping
$ ./client.sh 12345
pong 12345
$ kill $(cat server.pid)
$


Comments

  • slm
    slm over 1 year

    Can someone provide a couple of examples on how to use coproc?

  • slm
    slm almost 11 years
    Can you add some of the article to your answer? I was trying to get this topic covered in U&L since it seemed under represented. Thanks for your answer! Also notice I set the tag as Bash, not zsh.
  • Stéphane Chazelas
    Stéphane Chazelas almost 11 years
    They're not special kinds of pipes; they are the same pipes as used with |. (That is, pipes in most shells, and socketpairs in ksh93.) Pipes and socketpairs are first-in, first-out; they're all FIFOs. mkfifo makes named pipes; coprocesses don't use named pipes.
  • cblemuel
    cblemuel almost 11 years
    @slm sorry for zsh... actually I work on zsh. I tend to go with the flow sometimes. It works fine in Bash too...
  • cblemuel
    cblemuel almost 11 years
    @Stephane Chazelas I am pretty sure that I read somewhere that its I/O is connected with special kinds of pipes called FIFOs...
  • slm
    slm almost 11 years
    @MunaiDasUdasin - if you can find the source add it as a reference.
  • jlliagre
    jlliagre almost 11 years
    @MunaiDasUdasin as already stated, all pipes are FIFO by design. Otherwise they wouldn't have been called pipes in the first place.
  • cblemuel
    cblemuel almost 11 years
    @jlliagre as I said, "I read it somewhere", and I included the reference for it... and Stephane Chazelas already mentioned that pipes are FIFO by design... thank you for stating it again... And for future reference, I got the fact that pipes are FIFO... I am including the link again ...
  • Thomas Nyman
    Thomas Nyman almost 11 years
    @MunaiDasUdasin In Unix, named pipes are commonly called FIFOs, although all pipes are first-in-first-out by design. Named pipes differ from traditional, unnamed pipes in that named pipes can persist beyond the lifetime of the process that created them, while unnamed pipes persist only for the lifetime of the process. It seems that the zsh implementation of coprocesses involves a named pipe p, but bash coprocesses, which the question is explicitly about, use traditional unnamed pipes.
  • Stéphane Chazelas
    Stéphane Chazelas almost 11 years
    @ThomasNyman, zsh or ksh co-processes don't use named pipes. It's >&p, not >p. The >&p is a special syntax that means: redirect to the co-process (using unnamed pipes). Unnamed pipes can persist after the lifetime of the process (in its children, or in a process that has opened /proc/pid/fd/n on Linux, for instance).
  • Thomas Nyman
    Thomas Nyman almost 11 years
    @StephaneChazelas You are of course correct, although whereas unnamed pipes are mostly used for parent-child communication, opening them through /proc/pid/fd/ is perhaps not typical usage. A better choice of words would probably have been that named pipes persist as long as the corresponding file system object exists, or the pipe is kept open by the processes accessing it, whereas unnamed pipes have no associated file system object (apart from the file descriptor in /proc/pid/fd/), and persist only as long as they are kept open by processes.
  • mklement0
    mklement0 about 9 years
    Great answer indeed. I don't know when specifically it was fixed, but as of at least bash 4.3.11, you can now close coproc file descriptors directly, without the need for an aux. variable; in terms of the example in your answer exec {tr[1]}<&- would now work (to close the coproc's stdin; note that your code (indirectly) tries to close {tr[1]} using >&-, but {tr[1]} is the coproc's stdin, and must be closed with <&-). The fix must have come somewhere between 4.2.25, which still exhibits the problem, and 4.3.11, which doesn't.
  • Stéphane Chazelas
    Stéphane Chazelas almost 9 years
    @mklement0, thanks. exec {tr[1]}>&- does indeed seem to work with newer versions and is referenced in a CWRU/changelog entry (allow words like {array[ind]} to be valid redirection... 2012-09-01). exec {tr[1]}<&- (or the more correct >&- equivalent though that makes no difference as that just calls close() for both) doesn't close the coproc's stdin, but the writing end of the pipe to that coproc.
  • mklement0
    mklement0 almost 9 years
    Thanks for researching, clarifying, and thanks for updating your answer. Can I suggest that you mention the 4.3 thing not only in the code comment, but also in the text above, such as "a bit cumbersome in Bash versions before 4.3"?
  • Otheus
    Otheus over 7 years
    One advantage over mkfifo is that you don't have to worry about race conditions and security for pipe access. You still have to worry about deadlock with FIFOs.
  • shub
    shub about 7 years
    About deadlocks: the stdbuf command can help to prevent at least some of them. I used it under Linux and bash. Anyway I believe @StéphaneChazelas is right in the Conclusion: the "head scratching" phase ended for me only when I switched back to named pipes.
  • mosvy
    mosvy almost 5 years
    As to why they didn't use named pipes in ksh: named pipes were introduced very early on, long before ksh (I have a unix system III running in an emulator, and it does have them), but they were adopted very late in bsd (in 44bsd), so they weren't a portable solution.
  • maxschlepzig
    maxschlepzig almost 5 years
    'But in bash, the only way to pass those to executed commands is to duplicate them to fds 0, 1, or 2. That limits the number of co-processes you can interact with for a single command.' Really? This works for me exec 3<&${p[0]}; cat /proc/$$/fd/3 in bash 4.4.23 (for a coproc named p). Also, I don't understand: 'In bash this cannot be done, because you can't avoid the close-on-exec flag.' - I mean can't you avoid it by duplicating the file descriptor (because the close-on-exec flag isn't duplicated ...)?
  • maxschlepzig
    maxschlepzig almost 5 years
    Ok, regarding the 2nd part: I've straced a bash script and bash directly calls FD_CLOEXEC on the duplicated file descriptor with fcntl() after the dup2() call.
  • Miles Rout
    Miles Rout almost 4 years
    Truly a brilliant answer
  • ddekany
    ddekany about 3 years
    There should be a huge warning telling you that if the co-process exits, it will concurrently unset the variables through which you access the file descriptors and PID. So, you must ensure that the consumer controls when the co-process will exit, or else the script will "randomly" fail (depending on timing). At least bash (4.2.46) does this.