How to understand pipes

Solution 1

Regarding your performance question: pipes are more efficient than files because no disk I/O is needed. So cmd1 | cmd2 is more efficient than cmd1 > tmpfile; cmd2 < tmpfile. (This might not hold if tmpfile lives on a RAM disk or is some other memory-backed object such as a named pipe; but if it is a named pipe, cmd1 should be run in the background, because its output can block when the pipe becomes full.) If you need to keep the result of cmd1 and still send its output to cmd2, use cmd1 | tee tmpfile | cmd2, which lets cmd1 and cmd2 run in parallel and spares cmd2 from reading the data back from disk.
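As a rough sketch of the three variants described above, with printf standing in for cmd1 and sort standing in for cmd2 (both stand-ins are assumptions chosen only for illustration):

```shell
tmpfile=$(mktemp)

# Variant 1: temporary file. The second command starts only after the first is done.
printf 'b\na\nc\n' > "$tmpfile"
via_file=$(sort < "$tmpfile")

# Variant 2: pipe. Both commands run concurrently and no file is written.
via_pipe=$(printf 'b\na\nc\n' | sort)

# Variant 3: tee. A copy is kept in the file while the data still flows on.
via_tee=$(printf 'b\na\nc\n' | tee "$tmpfile" | sort)
saved=$(cat "$tmpfile")

rm -f "$tmpfile"
echo "$via_pipe"
```

All three variants should give the same sorted result; only the third also leaves the unsorted output of the first command in the file.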

Named pipes are useful when several processes read from or write to the same pipe. They are also useful when a program is not designed to use stdin/stdout for its I/O and needs to be given files. I put files in italics because named pipes are not exactly files from a storage point of view: their data resides in memory in a buffer of fixed size, even though they have a filesystem entry (for reference purposes). Other things in UNIX have filesystem entries without being files: just think of /dev/null or other entries in /dev or /proc.
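A minimal sketch of the "program that wants a file name" case, assuming mkfifo and mktemp -d are available. The path log.fifo and the sample data are made up for the example:

```shell
dir=$(mktemp -d)
fifo="$dir/log.fifo"
mkfifo "$fifo"

# Writer in the background: opening a FIFO for writing blocks
# until some process opens it for reading.
printf 'ok\nerror: disk\nok\nerror: net\n' > "$fifo" &

# The reader is handed a file name, not stdin, yet no data touches the disk.
error_count=$(grep -c 'error' "$fifo")

wait            # reap the background writer
rm -r "$dir"    # the FIFO entry persists until it is removed
echo "$error_count"
```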

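As pipes (named and unnamed) have a fixed buffer size, read/write operations on them can block, putting the reading or writing process to sleep until data or buffer space becomes available. Also, when do you receive an EOF when reading from a memory buffer? A read returns EOF once every write end of the pipe has been closed and the buffer has been drained. The rules for this behavior are well defined and can be found in the manual pages (for example pipe(7) on Linux).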
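The EOF rule can be observed with a named pipe. This is only a sketch: the file names are hypothetical, and descriptor 3 is an arbitrary choice for holding the write end open across several writes:

```shell
dir=$(mktemp -d)
fifo="$dir/demo.fifo"
mkfifo "$fifo"

# Reader: cat copies data until it sees EOF on the FIFO.
cat "$fifo" > "$dir/out" &
reader_pid=$!

# Keep one write end open on descriptor 3 for two separate writes.
exec 3> "$fifo"
echo "first" >&3
echo "second" >&3   # no EOF is seen between the two writes
exec 3>&-           # closing the last write end is what delivers EOF

wait "$reader_pid"  # returns once cat has seen EOF and exited
received=$(cat "$dir/out")
rm -r "$dir"
echo "$received"
```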

One thing you cannot do with pipes (named and unnamed) is seek back in the data. As they are implemented using a memory buffer, this is understandable.

About "everything in Linux/Unix is a file": I do not agree. Named pipes have filesystem entries, but are not exactly files. Unnamed pipes do not have filesystem entries (except maybe in /proc). However, most I/O operations on UNIX are done using the read/write functions, which take a file descriptor, and this includes unnamed pipes (and sockets). I do not think that we can say that "everything in Linux/Unix is a file", but we can surely say that "most I/O in Linux/Unix is done using a file descriptor".

Solution 2

Two of the fundamentals of the UNIX philosophy are:

To make small programs that do one thing well,
and to expect the output of every program to become the input to another, as yet unknown, program.

The use of pipes lets you leverage these two design fundamentals to create extremely powerful chains of commands that achieve your desired result.
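For instance, a chain of small standard tools can find the most frequent word in a text. The sample sentence below is made up, and this is only one of many ways to build such a chain:

```shell
text='the quick brown fox jumps over the lazy dog the end'

top_word=$(printf '%s\n' "$text" |
    tr -s ' ' '\n' |     # put one word on each line
    sort |               # bring identical words together
    uniq -c |            # count each group of identical lines
    sort -rn |           # highest count first
    head -n 1 |          # keep only the top entry
    awk '{print $2}')    # drop the count and keep the word

echo "$top_word"
```

None of these programs knows anything about word counting; the result comes from connecting them.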

Most command-line programs that operate on files can also accept input on standard input (by default the keyboard) and write output to standard output (by default the screen).

Some commands are designed to read only from standard input and cannot open files themselves.

For example, the tr command:

  ls -C | tr 'a-z' 'A-Z'
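A small sketch of what that means in practice: tr takes no file name argument, so a file's contents have to reach it through a redirection or a pipe. The temporary file and its contents are invented for the example:

```shell
tmpfile=$(mktemp)
printf 'hello pipes\n' > "$tmpfile"

# Input redirection: the shell opens the file and hands it to tr as stdin.
from_redirect=$(tr 'a-z' 'A-Z' < "$tmpfile")

# Pipe: another program produces the data and tr reads it from stdin.
from_pipe=$(cat "$tmpfile" | tr 'a-z' 'A-Z')

rm -f "$tmpfile"
echo "$from_pipe"
```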

    cmd1 | cmd2

Sends STDOUT of cmd1 to STDIN of cmd2 instead of the screen.
STDERR is not forwarded across pipes unless it is redirected explicitly (e.g. cmd1 2>&1 | cmd2).
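A short sketch of the STDERR behavior, using a hypothetical helper function named produce that writes one line to each stream:

```shell
# Hypothetical helper: one line on stdout, one line on stderr.
produce() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Only stdout enters the pipe; the stderr line bypasses tr untouched.
only_stdout=$(produce | tr 'a-z' 'A-Z')

# With 2>&1, stderr is pointed at the pipe as well, so tr sees both lines.
both=$(produce 2>&1 | tr 'a-z' 'A-Z')

echo "$only_stdout"
echo "$both"
```

In the first case the line "to stderr" still shows up on the terminal, in lower case, because it never passed through tr.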

In short, the pipe character (|) connects commands.

Any command that writes to STDOUT can be used on the left-hand side of a pipe.
```
   ls /etc | less
```
Any command that reads from STDIN can be used on the right-hand side of a pipe.
```
   echo "test print" | lpr 
```
A traditional pipe is "unnamed" because it exists anonymously and persists only for as long as the process is running. A named pipe is system-persistent and exists beyond the life of the process and must be deleted once it is no longer being used. Processes generally attach to the named pipe (usually appearing as a file) to perform inter-process communication (IPC).

source : http://en.wikipedia.org/wiki/Named_pipe

Solution 3

To supplement the other answers...

stdin and stdout are file descriptors and are read and written as if they were files. Therefore you can do echo hi | grep hi: the shell replaces echo's stdout with one end of a pipe and grep's stdin with the other end of the same pipe.
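One way to see this from inside a command is to ask what kind of object sits behind its stdin. The sketch below assumes the system provides /dev/stdin, and describe_stdin is a made-up helper name:

```shell
# Hypothetical helper: report what descriptor 0 is connected to.
describe_stdin() {
    if [ -p /dev/stdin ]; then
        echo "pipe"        # the shell wired stdin to a pipe
    elif [ -t 0 ]; then
        echo "terminal"    # stdin is an interactive terminal
    else
        echo "other"       # a regular file, a device, and so on
    fi
}

from_pipe=$(echo hi | describe_stdin)
from_file=$(describe_stdin < /dev/null)

echo "$from_pipe"
echo "$from_file"
```

The function reads its input the same way in both cases; only the object behind the descriptor differs.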

Solution 4

Everything is a file.

If we take the phrase too literally, we end up with a meaning of “we only have files, and nothing else”. This is not the correct interpretation, so what is?

When we say “Everything is a file”, we are not saying that everything is stored on a disk. We are saying that everything looks like a file, can be read, can be written.

In Unix, once a file or non-file is open, it can be treated like a file. However, not all files support all operations. E.g. some files (that are not files) do not support seek: they must be read or written in sequence (this is true of pipes and sockets).

Everything has a filename (on some systems, e.g. Debian GNU/Linux and many other GNU/Linux distributions).

All open files get a filename: see /proc/self/fd/…
Network sockets can be opened with a filename: see /dev/tcp (provided by shells such as bash),
e.g. cat </dev/tcp/towel.blinkenlights.nl/23
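As a sketch of the /proc/self/fd point, the name that Linux gives to an anonymous pipe can be read back with readlink. This assumes a Linux-style /proc filesystem and falls back to a placeholder message elsewhere:

```shell
if [ -d /proc/self/fd ]; then
    # readlink inspects its own descriptor 0, which the shell connected to the pipe.
    link=$(true | readlink /proc/self/fd/0)
else
    link="(no /proc/self/fd on this system)"
fi
echo "$link"
```

On Linux the result has the form pipe:[N], where N is an inode number: a name, but not a path to anything stored on disk.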

Author by

Tim

Updated on September 18, 2022

Comments

Tim almost 2 years

When I only used pipes in bash, I didn't think much about this. But when I read some C code examples using the system call pipe() together with fork(), I began to wonder how to understand pipes, both anonymous pipes and named pipes.

It is often heard that "everything in Linux/Unix is a file". I wonder if a pipe is actually a file, so that one of the parts it connects writes to the pipe file and the other part reads from the pipe file? If yes, where is the pipe file for an anonymous pipe created? In /tmp, /dev, or ...?

However, from examples of named pipes, I also learned that using pipes has a space and time performance advantage over explicitly using temporary files, probably because no files are involved in the implementation of pipes. Also, pipes do not seem to store data as files do. So I doubt that a pipe is actually a file.
Tim almost 13 years

Thanks! Are the two commands connected by a pipe running in parallel, rather than the second starting only after the first finishes?
jfg956 almost 13 years

Yes, the 2 commands are run in parallel. If they were not, and the 1st wrote more output than the buffer can hold, it would be blocked. You can try it by running cmd1 > fifo and cmd2 < fifo in 2 different shells, after creating the named pipe with mkfifo fifo.
jfg956 almost 13 years

Another test you can do is to kill cmd2 while cmd1 is still running: cmd1 will probably stop, reporting a broken pipe message.
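A variation of that test that needs no manual kill, sketched with yes as the endless writer and head as a reader that quits early. The temporary status file is just a device for capturing the writer's exit status:

```shell
status_file=$(mktemp)

# head exits after one line; the next write by yes hits a pipe with no reader.
first_line=$( (yes; echo "$?" > "$status_file") | head -n 1 )

writer_status=$(cat "$status_file")
rm -f "$status_file"

echo "reader got: $first_line"
# Typically 141 on Linux (128 + 13, SIGPIPE); some environments may differ.
echo "writer exit status: $writer_status"
```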
Tim almost 13 years

Thanks! What do you mean by "would be blocked"? If this happens, does it mean the data in the stream after the block will be lost?
jfg956 almost 13 years

Data is not lost. If the pipe buffer is full, cmd1's write to the pipe returns only once cmd2 has read data from the pipe. In the same way, cmd2's read from a pipe blocks while the buffer is empty, until cmd1 writes to the pipe.
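A small experiment along these lines, assuming the pipe buffer is smaller than 1 MiB (the Linux default is 64 KiB): the writer produces 1 MiB while the reader deliberately waits one second before reading.

```shell
bytes=$(
    {
        dd if=/dev/zero bs=1024 count=1024 2>/dev/null
        echo "writer finished" >&2
    } | {
        echo "reader sleeping for 1 second" >&2
        sleep 1
        wc -c
    } | tr -d ' '
)
echo "bytes received: $bytes"
```

Under that assumption the "writer finished" message appears only after the reader wakes up, because the writer was blocked on a full buffer in the meantime, and the byte count shows that nothing was dropped.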
CMCDragonkai about 9 years

Is there a way to change the fixed size buffer of pipes? Where can I find out what the fixed sized buffer is? Also on another note, it seems that fuser can't find the processes trying to write to a named pipe? I just tried it. Is that because the file descriptor does not actually exist on the named pipe file?
tripleee over 7 years

Maybe also include rm tmpfile after cmd1 >tmpfile; cmd2 <tmpfile to make it more obvious that the file is indeed temporary and does not persist beyond the lifetime of the pipeline.
Kusalananda almost 6 years

That last part is only valid on systems with a /proc filesystem, and on systems (or shells) that provide a /dev/tcp file structure.