Why use a named pipe instead of a file?

85,186

Solution 1

Almost everything in Linux can be treated as a file. The main difference between a regular file and a named pipe is that a named pipe is a special kind of file that has no contents on the filesystem.

Here is a quote from man fifo:

A FIFO special file (a named pipe) is similar to a pipe, except that it is accessed as part of the filesystem. It can be opened by multiple processes for reading or writing. When processes are exchanging data via the FIFO, the kernel passes all data internally without writing it to the filesystem. Thus, the FIFO special file has no contents on the filesystem; the filesystem entry merely serves as a reference point so that processes can access the pipe using a name in the filesystem.

The kernel maintains exactly one pipe object for each FIFO special file that is opened by at least one process. The FIFO must be opened on both ends (reading and writing) before data can be passed. Normally, opening the FIFO blocks until the other end is opened also.

So a named pipe does nothing until one process writes to it and another reads from it. It takes no space on the hard disk (apart from a little metadata), and it uses no CPU while idle.
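The blocking behaviour described in the man page can be seen in a minimal sketch like the one below. The scratch directory, the FIFO name demo.fifo and the message text are made up for the illustration.

```shell
# Hypothetical scratch directory and FIFO name, used only for this demo.
demo_dir=$(mktemp -d)
fifo="$demo_dir/demo.fifo"
mkfifo "$fifo"

# The writer blocks while opening the FIFO until a reader shows up,
# so it has to run in the background.
echo "hello through the pipe" > "$fifo" &
writer_pid=$!

# Opening the FIFO for reading unblocks the writer; the kernel hands the
# data over in memory without storing it in the filesystem.
read -r received < "$fifo"
wait "$writer_pid"

echo "received: $received"

# The filesystem entry is only a name; remove it like any other file.
rm -r "$demo_dir"
```

If the background writer were started without any reader ever opening the FIFO, it would simply stay blocked, which is the idle state discussed above.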

You can check it by doing this:

Create a named pipe

$ mkfifo /tmp/testpipe

Go to some directory, for example /home/user/Documents, and gzip everything inside it, sending the output to the named pipe.

$ cd /home/user/Documents
$ tar cvf - . | gzip > /tmp/testpipe &
[1] 28584

Here you should see the PID of the gzip process. In our example it was 28584.

Now check what this PID is doing

$ ps u -p 28584
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
c0rp     28584  0.0  0.0  29276  7800 pts/8    S    00:08   0:00 bash

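You will see that it is using almost no resources: 0% CPU and 0% memory. The process is blocked opening the pipe until a reader appears, which is also why ps still shows bash rather than gzip.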

Now verify how much disk space the pipe uses:

$ du -h /tmp/testpipe
0   /tmp/testpipe

Again 0: the pipe stores nothing on disk. The same testpipe can be used again if needed.
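As a rough illustration of that reuse, the sketch below pushes two separate messages through one FIFO and checks that the entry is still a pipe afterwards. The paths and message contents are assumptions made for the demo.

```shell
# Hypothetical scratch location; the original example used /tmp/testpipe.
demo_dir=$(mktemp -d)
fifo="$demo_dir/testpipe"
mkfifo "$fifo"

# First use: one writer, one reader.
printf 'first\n' > "$fifo" &
first=$(cat "$fifo")
wait

# Second use: the same filesystem entry works again, nothing was left behind.
printf 'second\n' > "$fifo" &
second=$(cat "$fifo")
wait

# [ -p ] is true only for FIFOs, so this confirms the entry is still a pipe.
if [ -p "$fifo" ]; then kind="fifo"; else kind="other"; fi
echo "$first / $second / $kind"

rm -r "$demo_dir"
```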

Don't forget to kill gzip with kill -15 28584, and to remove the named pipe with rm /tmp/testpipe.

Example Usages

You can redirect almost anything through a named pipe. As an example, see this one-line proxy.

Here is one more nice explanation of named pipe usage. You can configure two processes on one server to communicate through a named pipe instead of the TCP/IP stack. This is usually faster, since the data never passes through the network stack, and it does not load network resources. For example, your web server can communicate with the database directly through a named pipe, instead of using the localhost address or listening on some port.
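A minimal sketch of two local processes talking over named pipes might look like this. The "server" here is just a background subshell that upper-cases one request; the FIFO names and the request text are invented for the example and do not correspond to any real web server or database configuration.

```shell
# Hypothetical FIFO names: one pipe per direction.
demo_dir=$(mktemp -d)
req="$demo_dir/request.fifo"
resp="$demo_dir/response.fifo"
mkfifo "$req" "$resp"

# "Server": waits for one request line and answers it in upper case.
(
    read -r line < "$req"
    printf '%s\n' "$line" | tr '[:lower:]' '[:upper:]' > "$resp"
) &
server_pid=$!

# "Client": sends a request, then reads the reply. No socket or port involved.
printf 'select version\n' > "$req"
read -r reply < "$resp"
wait "$server_pid"

echo "reply: $reply"
rm -r "$demo_dir"
```

Using one FIFO per direction keeps the request and the reply from mixing, which is easier to reason about than sharing a single pipe for both.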

Solution 2

It is true that you won't use system memory, but the only reason you don't use CPU in your example is that nothing reads from the pipe, so the process is waiting.

Consider the following example:

mkfifo /tmp/testpipe
tar cvf - / | gzip > /tmp/testpipe

Now open a new console and run:

watch -n 1 'ps u -p $(pidof tar)'

And in a third console:

cat /tmp/testpipe > /dev/null

If you look at the watch command (second terminal), it will show an increase in CPU consumption!
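The same effect can be reproduced in a single script, as a rough sketch. The 1 MiB of zeros stands in for the tar stream, and the FIFO path is hypothetical; the ps call is only an optional peek and its output format depends on the system.

```shell
# Hypothetical scratch FIFO; the size of the test data is arbitrary.
demo_dir=$(mktemp -d)
fifo="$demo_dir/testpipe"
mkfifo "$fifo"

# Producer: gzip 1 MiB of zeros into the FIFO. With no reader attached,
# the redirection blocks while opening the FIFO, so nothing is compressed yet.
head -c 1048576 /dev/zero | gzip > "$fifo" &
producer_pid=$!

sleep 1
# Optional peek while nobody reads; the CPU share should be close to zero.
ps -o pcpu= -p "$producer_pid" 2>/dev/null

# Attach a reader: only now does the producer start doing real work.
bytes=$(gzip -dc < "$fifo" | wc -c)
wait "$producer_pid"

echo "bytes that travelled through the pipe: $bytes"
rm -r "$demo_dir"
```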

Solution 3

Here is a use case where named pipes can save you a lot of time by removing I/O.

Let's suppose you have a BigFile, for example 10G.

You also have this BigFile split into 1G pieces, BigFileSplit_01 to BigFileSplit_10.

Now you have a doubt about the correctness of BigFileSplit_05.

Naively, without named pipes, you would create a new split from BigFile and compare:

dd if=BigFile of=BigFileSplitOrig_05 bs=1G skip=4 count=1
diff -s BigFileSplitOrig_05 BigFileSplit_05
rm BigFileSplitOrig_05

With named pipes you would do

mkfifo BigFileSplitOrig_05
dd if=BigFile of=BigFileSplitOrig_05 bs=1G skip=4 count=1 &
diff -s BigFileSplitOrig_05 BigFileSplit_05
rm BigFileSplitOrig_05

At first sight that may not seem like a big difference, but in terms of time the difference is huge!

Option 1:

  • dd: read 1G / write 1G (1)
  • diff: read 2G
  • rm: free allocated clusters / remove directory entry

Option 2:

  • dd: read 1G / no disk write (the output goes to the named pipe)
  • diff: read 1G from disk plus 1G from the pipe (memory)
  • rm: no allocated cluster to manage (we didn't actually write anything to the filesystem) / remove directory entry

So here the named pipe saves you a read and a write of 1G, plus some filesystem cleanup (since we wrote nothing to the filesystem except the empty FIFO node).
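The named pipe variant can be tried at a small scale with a sketch like the following. It assumes 1 KiB blocks instead of 1G, builds a throwaway BigFile in a scratch directory, and uses cmp -s in place of diff -s so that the result is just an exit status; all of that is an adaptation for the demo.

```shell
# Hypothetical scratch directory holding a scaled-down BigFile.
demo_dir=$(mktemp -d)
big="$demo_dir/BigFile"
split="$demo_dir/BigFileSplit_05"
orig="$demo_dir/BigFileSplitOrig_05"

# Build a 10-block file; block N (counting from 0) is filled with the digit N.
: > "$big"
i=0
while [ "$i" -lt 10 ]; do
    head -c 1024 /dev/zero | tr '\000' "$i" >> "$big"
    i=$((i + 1))
done

# The split we want to verify (fifth block, hence skip=4).
dd if="$big" of="$split" bs=1024 skip=4 count=1 2>/dev/null

# Re-extract the same block into a FIFO instead of a temporary file.
mkfifo "$orig"
dd if="$big" of="$orig" bs=1024 skip=4 count=1 2>/dev/null &

# cmp reads one side from disk and the other side straight from the pipe.
if cmp -s "$orig" "$split"; then verdict="identical"; else verdict="different"; fi
wait

echo "BigFileSplit_05 is $verdict"
rm -r "$demo_dir"
```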

Avoiding I/O, especially writes, also reduces wear on your disks. This matters even more with SSDs, since their cells survive only a limited number of writes.

(1) Obviously, another option would be to create that temporary file in RAM, for example if /tmp is mounted as tmpfs. Nevertheless, you would be limited by the size of the RAM disk, whereas the "named pipe trick" has no such limit: the pipe only ever buffers a small amount of data at a time, so any amount can pass through it.

Solution 4

You can let a program sit idle and listen on a named pipe for some outside event. As soon as the event occurs (for example, the arrival of some new data), another program can detect it, open the pipe for writing, and write the relevant event data to the pipe. The listening program receives that data through its read calls; when the writer closes the pipe, the reader sees end-of-file and is ready to process what it has got. Don't forget to close the pipe after reading the content. The listening program can also return the results of its processing via the same or another named pipe. Such inter-program communication is very convenient at times.
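A listener of this kind could be sketched as follows. The FIFO name, the log file and the event strings are hypothetical, and the "processing" is reduced to appending a line to a log.

```shell
# Hypothetical names for the event pipe and the listener's log.
demo_dir=$(mktemp -d)
events="$demo_dir/events.fifo"
log="$demo_dir/handled.log"
mkfifo "$events"

# Listener: sleeps in the open until a writer appears, handles one event
# per line, and leaves the loop at end-of-file (when the writer closes).
(
    while read -r event; do
        printf 'handled: %s\n' "$event" >> "$log"
    done < "$events"
) &
listener_pid=$!

# Notifier: opens the pipe once, reports two events, then closes it.
{
    printf 'new-data batch-1\n'
    printf 'new-data batch-2\n'
} > "$events"

wait "$listener_pid"
handled=$(cat "$log")
echo "$handled"
rm -r "$demo_dir"
```

A long-running listener would typically wrap the loop in an outer loop so that it reopens the pipe after each writer closes it.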

Author by

bsky

Updated on September 18, 2022

Comments

  • bsky
    bsky almost 2 years

    I recently read about named pipes, and I couldn't understand why they exist.
    I've read somewhere that using a named pipe is less time-consuming than using a file.

    Why is this so?
    The named pipes also have to be stored in memory (and maybe swapped, just like files).
    As far as I can see, they must get an inode which must be referenced by the current directory, just like files. Also, they must be removed by the programmer, just like files.

    So where does the advantage lie?

    • don.joey
      don.joey about 10 years
      This is not part of a classroom assignment, is it?
    • bsky
      bsky about 10 years
      no ... actually I was looking over some lecture notes when I found this question and I couldn't answer it ... and if it were an assignment, I don't see how that would be relevant ... it's not like I wouldn't search for the answer until I would find it
  • wjandrea
    wjandrea about 7 years
    This answer is about c0rp's answer
  • onlycparra
    onlycparra almost 4 years
    If I understand your words ("You will see that it is using no resources. 0% CPU usage, 0% memory usage.") correctly, that doesn't make sense. of course there must be cpu and memory usage. The fact that you were too slow to check it, does not mean that gzip ran magically in no-cpu.
  • onlycparra
    onlycparra almost 4 years
    If the pipe has no limit, then it will necessarily write to disk as it runs out of ram
  • Zakhar
    Zakhar almost 4 years
    Not at all! That is precisely why it saves I/O. The pipe buffer is small: 64 KiB by default on Linux, and an unprivileged process can grow it only up to the limit in /proc/sys/fs/pipe-max-size (1 MiB by default). So the first dd command will write that amount to the pipe, which is purely in memory, and block until another process reads from the pipe and frees some space for more data. The "another process" is the diff command that will "consume" data from the pipe. So no, not at all, and that is precisely the purpose of pipes, writing to a fifo will NOT write to disk... unless you are swapping to a point the pipe itself needs to be swapped!
  • onlycparra
    onlycparra almost 4 years
    That's a great explanation, thanks. I was trying to make a different point, though: Let process A fill a pipe, and B consume it. If B consumes data from the pipe at a lower rate than A fills it, A will get stuck at some point. If we increase the pipe's size, it physically cannot stay in RAM and be bigger than RAM. Therefore, there is a limit to the data that can be STORED in the pipe: RAM size. That was my point. Now, I get that in any scenario there is no limit to the amount of data that can PASS THROUGH the pipe. Ideally, with B consuming at the same (or higher) speed than A produces. :)
  • Zakhar
    Zakhar almost 4 years
    Without privilege the pipe's RAM is by default limited to /proc/sys/fs/pipe-max-size which is commonly set to 1MB. And yes, the solution you are thinking about is what I suggest in the footnote: option 1 (above) + write the first dd to RAM. It works provided you have enough RAM, and have set up for example /tmp as RAM. So yes, in this case you'll have the same amount of disk I/O but still will be slower because the first dd and the diff are not run in parallel. Pipes are good for that too! :-)
  • linkhyrule5
    linkhyrule5 almost 4 years
    Well, no. Gzip didn't run magically in no-cpu, it didn't run at all -- it's blocked waiting on read.
  • onlycparra
    onlycparra about 3 years
    @DerekMahar, Thanks for the great point. At the end of your comment, did you mean "...larger than the size of A's output data"?
  • Admin
    Admin about 2 years
    I think it would be more precise to say "dd: read 1G; diff: read 1G". Anyway, this is a good and descriptive example.