Why is /dev/null a file? Why isn't its function implemented as a simple program?

Solution 1

In addition to the performance benefits of using a character-special device, the primary benefit is modularity. /dev/null may be used in almost any context where a file is expected, not just in shell pipelines. Consider programs that accept files as command-line parameters.

# We don't care about log output.
$ frobify --log-file=/dev/null

# We are not interested in the compiled binary, just seeing if there are errors.
$ gcc foo.c -o /dev/null || echo "foo.c does not compile!"

# Easy way to force an empty list of exceptions.
$ start_firewall --exception_list=/dev/null

These are all cases where using a program as a source or sink would be extremely cumbersome. Even in the shell pipeline case, stdout and stderr may be redirected to files independently, something that is difficult to do with executables as sinks:

# Suppress errors, but print output.
$ grep foo * 2>/dev/null
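
To make that concrete, here is a rough sketch (illustrative only, with error handling omitted and the file name out.log made up) of what a shell effectively does for cmd >out.log 2>/dev/null: each stream is opened and dup2()'d independently, which is only this simple because the sink is an ordinary open()-able path. With an executable as the sink, each stream would instead need its own pipe, fork and exec.

#include <fcntl.h>
#include <unistd.h>

/* Rough sketch of a shell setting up:  cmd >out.log 2>/dev/null */
void redirect_streams(void) {
    int out = open("out.log", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    int err = open("/dev/null", O_WRONLY);
    dup2(out, 1);   /* stdout -> out.log   */
    dup2(err, 2);   /* stderr -> /dev/null */
    close(out);
    close(err);
    /* ...then execvp() the command as usual... */
}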

Solution 2

In fairness, it's not a regular file per se; it's a character special device:

$ file /dev/null
/dev/null: character special (3/2)

Because it functions as a device rather than as a regular file or program, redirecting input to it or output from it is a simple operation: it can be attached to any file descriptor, including standard input/output/error.

Solution 3

I suspect the why has a lot to do with the vision/design that shaped Unix (and consequently Linux), and the advantages stemming from it.

No doubt there's a non-negligible performance benefit to not spinning up an extra process, but I think there's more to it: Early Unix had an "everything is a file" metaphor, which has a non-obvious but elegant advantage if you look at it from a system perspective, rather than a shell scripting perspective.

Say you have both a null command-line program and the /dev/null device node. From a shell-scripting perspective, foo | null is actually genuinely useful and convenient, while foo >/dev/null takes a tiny bit longer to type and can seem weird.

But here are two exercises:

  1. Let's implement the program null using existing Unix tools and /dev/null - easy: cat >/dev/null. Done.

  2. Can you implement /dev/null in terms of null?

You're absolutely right that the C code to just discard input is trivial, so it might not yet be obvious why it's useful to have a virtual file available for the task.

Consider: almost every programming language already needs to work with files, file descriptors, and file paths, because they were part of Unix's "everything is a file" paradigm from the beginning.

If all you have are programs that write to stdout, they don't care whether you redirect them into a virtual file that swallows all writes or pipe them into a program that swallows all writes.

Now if you have programs that take file paths for either reading or writing data (which most programs do) - and you want to add "blank input" or "discard this output" functionality to those programs - well, with /dev/null that comes for free.

Notice that the elegance of it is that it reduces the code complexity of all the programs involved: for each common-but-special use case that your system can provide as a "file" with an actual "filename", your code avoids adding custom command-line options and custom code paths to handle it.
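
As a small, hypothetical illustration (the tool and its output option are made up): a program that takes an output path needs no extra code at all to support "discard the output", because /dev/null is just another path it can fopen().

#include <stdio.h>

/* Hypothetical tool: writes its results to whatever path it is given.
 * Passing /dev/null as the path discards them with no special case. */
int write_report(const char *output_path) {
    FILE *out = fopen(output_path, "w");
    if (out == NULL)
        return -1;
    fprintf(out, "report contents...\n");
    fclose(out);
    return 0;
}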

Good software engineering often depends on finding good or "natural" metaphors for abstracting some element of a problem in a way that becomes easier to think about but remains flexible, so that you can solve basically the same range of higher-level problems without having to spend the time and mental energy on reimplementing solutions to the same lower-level problems constantly.

"Everything is a file" seems to be one such metaphor for accessing resources: You call open of a given path in a heirarchical namespace, getting a reference (file descriptor) to the object, and you can read and write, etc on the file descriptors. Your stdin/stdout/stderr are also file descriptors that just happened to be pre-opened for you. Your pipes are just files and file descriptors, and file redirection lets you glue all these pieces together.

Unix succeeded as much as it did in part because of how well these abstractions worked together, and /dev/null is best understood as part of that whole.


P.S. It's worth looking at the Unix version of "everything is a file" and things like /dev/null as the first steps towards a more flexible and powerful generalization of the metaphor that has been implemented in many systems that followed.

For example, in Unix, special file-like objects such as /dev/null had to be implemented in the kernel itself, but exposing functionality in file/folder form turned out to be useful enough that multiple later systems provide a way for ordinary programs to do the same.

One of the first was the Plan 9 operating system, made by some of the same people who made Unix. Later, GNU Hurd did something similar with its "translators". Meanwhile, Linux ended up getting FUSE (which has spread to the other mainstream systems by now as well).

Solution 4

I think /dev/null is a character device (that behaves like an ordinary file) instead of a program for performance reasons.

If it were a program, it would require loading, starting, scheduling, running, and afterwards stopping and unloading the program. The simple C program you are describing would of course not consume a lot of resources, but I think it makes a significant difference when you consider a large number (say millions) of redirect/piping actions, since process-management operations are costly at that scale: they involve context switches.

Another assumption: piping into a program requires memory to be allocated by the receiving program (even if it is discarded directly afterwards). So if you pipe into the tool, you pay the memory cost twice: once in the sending program and again in the receiving program.
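
To spell out the difference (a sketch with error handling omitted; the "null" program here is hypothetical): discarding output via the device costs one open() and one dup2(), whereas piping into a sink program costs a pipe, a fork and an exec up front, plus a copy into the pipe buffer and context switches for the data that follows.

#include <fcntl.h>
#include <unistd.h>

/* Variant 1: redirect to the device -- one open(), one dup2(). */
void discard_with_devnull(void) {
    int fd = open("/dev/null", O_WRONLY);
    dup2(fd, 1);
    close(fd);
}

/* Variant 2: pipe into a hypothetical "null" program -- pipe(), fork()
 * and exec up front, then per-write copies and context switches. */
void discard_with_null_program(void) {
    int p[2];
    pipe(p);
    if (fork() == 0) {
        dup2(p[0], 0);
        close(p[0]);
        close(p[1]);
        execlp("null", "null", (char *)0);  /* hypothetical sink program */
        _exit(127);
    }
    dup2(p[1], 1);
    close(p[0]);
    close(p[1]);
}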

Solution 5

Aside from "everything is a file", and hence the ease of use everywhere, which most other answers are based on, there is also a performance issue, as @user5626466 mentions.

To show this in practice, we'll create a simple program called nullread.c:

#include <unistd.h>

/* Read from stdin into a scratch buffer and discard it, until EOF. */
char buf[1024*1024];
int main() {
        while (read(0, buf, sizeof(buf)) > 0);
}

and compile it with gcc -O2 -Wall -W nullread.c -o nullread

(Note: we cannot use lseek(2) on pipes, so the only way to drain the pipe is to read from it until it is empty).

% time dd if=/dev/zero bs=1M count=5000 |  ./nullread
5242880000 bytes (5,2 GB, 4,9 GiB) copied, 9,33127 s, 562 MB/s
dd if=/dev/zero bs=1M count=5000  0,06s user 5,66s system 61% cpu 9,340 total
./nullread  0,02s user 3,90s system 41% cpu 9,337 total

whereas with standard /dev/null file redirection we get much better speeds (for the reasons mentioned: fewer context switches, and the kernel simply ignoring the data instead of copying it):

% time dd if=/dev/zero bs=1M count=5000 > /dev/null
5242880000 bytes (5,2 GB, 4,9 GiB) copied, 1,08947 s, 4,8 GB/s
dd if=/dev/zero bs=1M count=5000 > /dev/null  0,01s user 1,08s system 99% cpu 1,094 total
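
(As an aside, echoing Stéphane Chazelas's suggestion in the comments: on Linux a null-sink program could probably cut its overhead further by splicing the pipe straight into /dev/null, so the data never has to be copied into user space. An untested sketch, assuming stdin really is a pipe:)

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Untested sketch: drain the pipe on stdin by splicing it directly
 * into /dev/null instead of read()ing it into a user-space buffer. */
int main(void) {
    int devnull = open("/dev/null", O_WRONLY);
    if (devnull < 0)
        return 1;
    while (splice(0, NULL, devnull, NULL, 1024 * 1024, 0) > 0)
        ;
    return 0;
}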

(this should be a comment there, but is too big for that and would be completely unreadable)


Comments

  • Ankur S
    Ankur S over 1 year

    I am trying to understand the concept of special files on Linux. However, having a special file in /dev seems plain silly when, to my knowledge, its function could be implemented by a handful of lines in C.

    Moreover you could use it in pretty much the same manner, i.e. piping into null instead of redirecting into /dev/null. Is there a specific reason for having it as a file? Doesn't making it a file cause many other problems like too many programs accessing the same file?

    • Charles Duffy
      Charles Duffy about 6 years
      Incidentally, much of this overhead is also why cat foo | bar is much worse (at scale) than bar <foo. cat is a trivial program, but even a trivial program creates costs (some of them specific to FIFO semantics -- because programs can't seek() inside FIFOs, for example, a program that could be implemented efficiently with seeking can end up doing much more expensive operations when given a pipeline; with a character device like /dev/null it can fake those operations, or with a real file it can implement them, but a FIFO doesn't allow any kind of contextually-aware handling).
    • rexkogitans
      rexkogitans about 6 years
      grep blablubb file.txt 2>/dev/null && dosomething could not work with null being a program or a function.
    • mtraceur
      mtraceur about 6 years
      You might find it enlightening (or at least mind-expanding) to read about the Plan 9 operating system to see where the "everything is a file" vision was going - it becomes a little easier to see the power of having resources available as file paths once you see a system fully embracing the concept (rather than mostly/partially, like modern Linux/Unix do).
    • JdeBP
      JdeBP about 6 years
      As well as no-one pointing out that a device driver running in kernel space is a program with "a handful of lines of C", none of the answers so far have actually addressed the supposition of "too many programs accessing the same file" in the question.
    • glglgl
      glglgl about 6 years
      @JdeBP Probably that's because nobody knows what "too many programs accessing the same file" is supposed to mean.
    • Peter - Reinstate Monica
      Peter - Reinstate Monica about 6 years
      Re "its function could be implemented by a handful of lines in C": You wouldn't believe it, but it is implemented by a handful of lines in C! For example, the body of the read function for /dev/null consists of a "return 0" (meaning it doesn't do anything and, I suppose, results in an EOF): (From static github.com/torvalds/linux/blob/master/drivers/char/mem.c) ssize_t read_null(struct file *file, char __user *buf, size_t count, loff_t *ppos) { return 0; } (Oh, I just see that @JdeBP made that point already. Anyway, here is the illustration :-).
    • JdeBP
      JdeBP about 6 years
      I doubt that very much. It's fairly easy to understand what "too many programs accessing the same file" means. Addressing the question that the supposition forms part of seems fairly simple.
    • chepner
      chepner about 6 years
      One of the things Unix is famous for is reducing several seemingly unrelated interfaces to a single file interface. A file is just a source of or destination for bytes; the OS doesn't care what the interpretation of a particular read or write is. This simplified the implementation of Unix greatly compared to previous operating systems, which had to provide distinct interfaces to things like files, printers, terminals, and yes, bit buckets.
    • PlasmaHH
      PlasmaHH about 6 years
      I am not sure how you propose that simple program should work that you want to use instead of /dev/null in the places where programs expect a file to write to.
    • Matija Nalis
      Matija Nalis about 6 years
      @JdeBP it might be easy for you, but not for everyone. For example, the only real problem I can see with many programs accessing the same file is file consistency, and that does not apply at all to /dev/null (as it does not store any data, but simply disregards it). Also, the increased resource usage of one extra file descriptor is infinitesimally small compared to the resource usage of even a small program using it, so that can't be the problem either. So, please explain what you think is the problem with many programs accessing /dev/null?
    • Izkata
      Izkata about 6 years
      @glglgl My guess is they have something in mind akin to the Windows "file is already in use" error.
    • Admin
      Admin about 6 years
      Because, in UNIX, everything is a file :-)
    • j_kubik
      j_kubik about 6 years
      Also, don't forget that a special null constant interferes with the file namespace. Sometimes when a Windows user gives me eg. a USB dongle for some data I will add a file called nul on it as well - deleting that is not an easy task: stackoverflow.com/questions/17883481/…
  • Ankur S
    Ankur S about 6 years
    Thanks for the answer, it was certainly informative. Could you elaborate a bit more on why a program couldn't be used in its place in redirection?
  • Pankaj Goyal
    Pankaj Goyal about 6 years
    A program could be used in its place for piping, but redirection (e.g. cat file > /dev/null) would overwrite the executable with the contents of file rather than redirecting the output.
  • filbranden
    filbranden about 6 years
    cat file | null would have a lot of overhead, first in setting up a pipe, spawning a process, executing "null" in the new process, etc. Also, null itself would use quite a bit of CPU in a loop reading bytes into a buffer that is later just discarded... The implementation of /dev/null in the kernel is just more efficient that way. Also, what if you want to pass /dev/null as an argument, instead of a redirection? (You could use <(...) in bash, but that's even way heavier handed!)
  • Mark Plotnick
    Mark Plotnick about 6 years
    If you had to pipe to a program named null instead of using redirection to /dev/null, would there be a simple, clear way to tell the shell to run a program while sending just its stderr to null?
  • user1024
    user1024 about 6 years
    @AnkurS: If you want to do that, you can write cat file | true.
  • jamesqf
    jamesqf about 6 years
    @ Ankur S: Re "I had cat file | null more in mind", so you're a perfect typist, and never accidentally type '>' when you meant '|'?
  • Peter Cordes
    Peter Cordes about 6 years
    Linux's FUSE (filesystem in user-space) makes it possible for a program to let other programs access virtual files/directories, e.g. to make a .zip look like a filesystem, or whatever. But mount points and virtual filesystems are not the normal Unix mechanism, so this isn't widely used for things that don't really behave like filesystems for reading or storing file data.
  • Peter Cordes
    Peter Cordes about 6 years
    @MarkPlotnick: Presumably shell grammar would have evolved syntax like foo 2>(bar) or foo 2|(bar). (In case you were wondering, in current bash 4.4, echo 2>(cat) prints 2/dev/fd/63, because bash does process substitution, but echo doesn't treat its args as filenames. (And concatenating a leading 2 doesn't help).)
  • Peter Cordes
    Peter Cordes about 6 years
    It's not just the setup cost, it's that every write into a pipe requires memory copying, and a context switch to reading program. (Or at least a context switch when the pipe buffer is full. And the reader has to do another copy when it reads the data). This is not negligible on a single-core PDP-11 where Unix was designed! Memory bandwidth / copying is much cheaper today than it was then. A write system call to an FD open on /dev/null can return right away without even reading any data from the buffer.
  • Stéphane Chazelas
    Stéphane Chazelas about 6 years
    Those examples are wrong. dd of=- writes to a file called -; just omit the of= to write to stdout, as that's where dd writes by default. Piping to false wouldn't work, as false doesn't read its stdin, so dd would be killed with a SIGPIPE. For a command that discards its input you can use... cat > /dev/null. Also, the comparison would probably be irrelevant, as the bottleneck would probably be the random number generation here.
  • psmears
    psmears about 6 years
    @StéphaneChazelas: To be fair, using cat > /dev/null isn't totally accurate, because as well as reading its input, it will write it out, so it will make roughly twice as many syscalls as a null process that just read and discarded data. Your other points are valid (and important) though!
  • Stéphane Chazelas
    Stéphane Chazelas about 6 years
    @psmears, well it does discard its input, even though not in the most efficient way. You'll probably find sed d or awk 0 or perl -ne '', which discard without writing to /dev/null, are actually still less efficient as they have extra overhead of their own.
  • psmears
    psmears about 6 years
    @StéphaneChazelas: True - that's why I didn't suggest them as an alternative :)
  • Stéphane Chazelas
    Stéphane Chazelas about 6 years
    @psmears, in any case, at least on Linux, the most effective way to write such a null command would probably be to use splice() onto /dev/null, to at least avoid having to copy the content of the pipe back into user space as would happen if you used read() to flush the content of the pipe.
  • OrangeDog
    OrangeDog about 6 years
    @PeterCordes the point of the answer is starting from a position of not understanding the design. If everyone already understood the design, this question would not exist.
  • Mark Plotnick
    Mark Plotnick about 6 years
    The AST versions of dd etc. don't even bother doing a write syscall when they detect the destination is /dev/null.
  • psmears
    psmears about 6 years
    @StéphaneChazelas: Yes, indeed :) Unfortunately GNU cat doesn't use splice() (I know it's Linux-specific, but other GNU tools do take advantage of Linux-specific features e.g. tail uses inotify). Nor does it use the optimisation that MarkPlotnick mentions...
  • mtraceur
    mtraceur about 6 years
    @PeterCordes Last I checked, a process had to be root (or at least have CAP_SYS_ADMIN) to set up a FUSE mount. Is this still the case? In Plan 9 it was a regular operation subject to regular permissions, rather than a strictly root one. This is what I meant by "at least some kernel intervention", though I recognize that this is bad wording - I've edited it to be more explicit.
  • Peter Cordes
    Peter Cordes about 6 years
    @OrangeDog: That was my point: it's a good and sensible design, not something that needs to be apologized for. :P Unlike with the cat foo | bar vs. <foo bar debate, or other Unix features that may seem a bit crusty these days, this one's pretty clear-cut.
  • Peter Cordes
    Peter Cordes about 6 years
    @mtraceur: Mount an image file without root permission? shows some evidence that FUSE might not require root, but I'm not sure.
  • mtraceur
    mtraceur about 6 years
    @PeterCordes RE: "seems weird": It's not an apology for the design, just an acknowledgement of how it can seem if you're not thinking of the system implementation under it, and haven't yet had the eureka moment about the system-wide design advantages. I tried to make that clear by opening that sentence with "from a shell scripting perspective", and alluding to the contrast between that vs. a system perspective a couple sentences prior. On further thought "can seem weird" is better, so I'll tweak it to that. I welcome further wording suggestions to make it clearer without making it too verbose.
  • mtraceur
    mtraceur about 6 years
    @PeterCordes Thanks! And thanks for the link about FUSE mounting! As I understand it, at an underlying kernel/system level, mounting requires root, but in practice there are tools like fusermount that help initiating FUSE mounts from non-privileged contexts. You've actually helped me realize that I shouldn't be focusing on how FUSE and Plan 9 implementations differ. I've edited the ending again to reflect the greater point: that the design itself is good, with the fact that multiple systems have reimplemented generalizations of it being illustrative examples.
  • figtrap
    figtrap about 6 years
    Best answer, but to nitpick, "everything is a file" isn't a metaphor, everything in unix actually is a file. Now the word "file" of course is figurative.
  • StephenG - Help Ukraine
    StephenG - Help Ukraine about 6 years
    The very first thing I was told as a young engineer in relation to Unix was "Everything Is A File" and I swear you could hear the capitals. And getting hold of that idea early makes Unix/Linux seem a lot more easy to understand. Linux inherited most of that design philosophy. I'm glad someone mentioned it.
  • figtrap
    figtrap about 6 years
    Just want to add that implementing "null" as a char. device invites a lot of creativity; there's lots of possibilities there. As an executable it would be more limited.
  • aschepler
    aschepler about 6 years
    I'm not sure I understand the part about separate redirections to sink executables being difficult. In C, you just do a pipe, fork, and execve like any other process piping, just with changes to the dup2 calls that set up the connections, right? It's true most shells don't offer the prettiest ways to do that, but presumably if we didn't have so much device-as-file pattern and most of the things in /dev and /proc were treated as executables, shells would have been designed with ways to do it as easily as we redirect now.
  • Cristian Ciupitu
    Cristian Ciupitu about 6 years
    @PeterCordes, DOS "solved" the typing problem by making the magic filename NUL appear in every directory, i.e. all you have to type is > NUL.
  • Cubic
    Cubic about 6 years
    @aschepler It's that not redirecting to sink executables is difficult. It's that writing applications that can write to/read from both files and the null sink would be more complicated if the null sink wasn't a file. Unless you're talking about a world where instead of everything being a file, everything is an executable? That'd be a very different model than what you have in *nix OS.
  • mtraceur
    mtraceur about 6 years
    @figtrap Thanks! I've been thinking about the "metaphor" thing a bit - I agree it's maybe not the best word - can you think of a better one to use here? So far "abstraction" feels the most "right", but I'm happy to hear other suggestions.
  • aschepler
    aschepler about 6 years
    @Cubic Yes, that's clear. I was more asking about the last paragraph in this post. "stdout and stderr may be redirected to files independently" is contrasted with something hypothetical, but it's not entirely clear exactly what. I jumped at first to "... redirected to sinks independently", but maybe that's not the intent? If it's meant to be about the generality between regular files and special files, it doesn't help that the following example doesn't involve any regular file.
  • ioctl
    ioctl about 6 years
    @aschepler You forgot wait4! You are correct, it is certainly possible to pipe stdout and stderr to different programs using POSIX apis, and it may be possible to invent a clever shell syntax for redirecting stdout and stderr to different commands. However, I'm not aware of any such shell right now, and the larger point is that /dev/null fits neatly into existing tooling (which largely works with files), and /bin/null wouldn't. We could also imagine some IO API that makes it as easy for gcc to (securely!) output to a program as to a file, but that's not the situation we are in.
  • kkm
    kkm about 6 years
    @PeterCordes, my note is tangential, but it's possible that, paradoxically, memory writes today are more expensive than ever. An 8-core CPU potentially performs 16 integer operations in a clock time, while an end-to-end memory write would complete in e. g. 16 clocks (4GHz CPU, 250 MHz RAM). That's the factor of 256. RAM to the modern CPU is like an RL02 to the PDP-11 CPU, almost like a peripheral storage unit! :) Not as straightforward, naturally, but everything hitting the cache will get written out, and useless writes would deprive other computations of the ever important cache space.
  • Peter Cordes
    Peter Cordes about 6 years
    @kkm: Yes, wasting about 2x 128kiB of L3 cache footprint on a pipe buffer in the kernel and a read buffer in the null program would suck, but most multi-core CPUs don't run with all cores busy all the time, so the CPU time to run the null program is mostly free. On a system with all cores pegged, useless piping is a bigger deal. But no, a "hot" buffer can be rewritten many times without getting flushed to RAM, so we're mostly just competing for L3 bandwidth, not cache. Not great, especially on a SMT (hyperthreading) system where other logical core(s) on the same physical are competing...
  • Peter Cordes
    Peter Cordes about 6 years
    .... But your memory calculation is very flawed. Modern CPUs have lots of memory parallelism, so even though latency to DRAM is something like 200-400 core clock cycles and L3>40, bandwidth is ~8 bytes / clock. (Surprisingly, single-threaded bandwidth to L3 or DRAM is worse on a many-core Xeon with quad-channel memory vs. a quad-core desktop, because it's limited by the max concurrency of requests one core can keep in flight. bandwidth = max_concurrency / latency: Why is Skylake so much better than Broadwell-E for single-threaded memory throughput?)
  • Peter Cordes
    Peter Cordes about 6 years
    ... See also 7-cpu.com/cpu/Haswell.html for Haswell numbers comparing quad-core vs. 18-core. Anyway, yes modern CPUs can get a ridiculous amount of work done per clock, if they aren't stuck waiting for memory. Your numbers appear to only be 2 ALU operations per clock, like maybe a Pentium from 1993, or a modern low-end dual-issue ARM. A Ryzen or Haswell potentially performs 4 scalar integer ALU ops + 2 memory ops per core per clock, or far more with SIMD. e.g. Skylake-AVX512 has (per core) 2-per-clock throughput on vpaddd zmm: 16 32-bit elements per instruction.
  • Peter Cordes
    Peter Cordes about 6 years
    See also What Every Programmer Should Know About Memory?. (My 2017 update on Ulrich Drepper's very excellent original article from late Pentium 4 / early Core 2 days, when even a single-threaded workload could bottleneck on DRAM bandwidth, but now it takes multiple threads to max out the memory controllers.) Anyway, yes, cache space is very valuable, and depriving other cores of some of it does suck.
  • Peter Cordes
    Peter Cordes about 6 years
    @kkm: Not trying to jump down your throat with this flood of comments; more like CPU performance is an interesting topic for me so I can't help but comment with more details. And see my profile pic :P
  • Matija Nalis
    Matija Nalis about 6 years
    @ioctl regarding shells; both zsh and bash at least will allow you to do things like grep localhost /dev/ /etc/hosts 2> >(sed 's/^/STDERR:/' > errfile ) > >(sed 's/^/STDOUT:/' > outfile), resulting is separately processed errfile and outfile
  • user1686
    user1686 about 6 years
    ...and of course the >(...) syntax just expands to a special filename (you're actually running 2> /proc/x/fd/y), which neatly loops back to what ioctl wrote in the answer.
  • kkm
    kkm about 6 years
    @PeterCordes: Why, thank you, I appreciate the info and links! I may be dated on my CPU part (anyone pulling an RL02 from their memory would be by now :) ). The main flaw with my argument is, I believe, as you pointed out, that a buffer may be written over many times without being flushed to disk. And you are right, I should have better compared byte to byte in CPU/RAM throughput. If I may ask, how should I bring DRAM latencies into the picture? The figure of 250MHz (40ns) may be too conservative, but it is unlikely possible to achieve 2666 MHz either; that's only the burst speed, is it?
  • kkm
    kkm about 6 years
    @PeterCordes: Flushed to RAM immediately, not disk, what was I thinking! Probably that RL02 thing. Brain fart, sorry.
  • figtrap
    figtrap about 6 years
    @mtraceur actually I don't think anyone was concerned with such things, at least as part of the UNIX design, "file" is a term rather strictly defined in UNIX ... metaphors came later :)
  • Peter Cordes
    Peter Cordes about 6 years
    @kkm: Right, DDR4-2666 is 2666 mega-transfers per second during a burst. Command overhead reduces throughput a little. (It's not 2666MHz, because the actual clock is only half that, and data is transferred on the rising and falling edge. That's what DDR = double-data-rate means.) DRAM latency is mostly column-access latency, not the time to actual transfer data once the right column and row are selected and data is transferring. crucial.com/usa/en/memory-performance-speed-latency, and "What Every Programmer Should Know About Memory?" has a detailed DRAM section.
  • Peter Cordes
    Peter Cordes about 6 years
    @kkm: But don't forget the latency for a request to even make it from a core to a memory controller, which is like 36 cycles on a quad-core Haswell or 62 cycles on an 18-core Haswell 7-cpu.com/cpu/Haswell.html. My answer on Ram real time latency says the same thing. Anyway, DRAM latency is determined by its clock, and how tight the CAS latency timings are, and cache-miss latency is latency inside the CPU + DRAM latency. But a single core can keep ~10 requests in flight (Intel CPUs have 10 Line Fill Buffers, and 16 superqueue L2<->L3)
  • Peter Cordes
    Peter Cordes about 6 years
    @kkm: But really for this, DRAM latency isn't in the picture, normally just L3 bandwidth for the pipe buffer (not even latency if we don't hit max parallelism). DRAM latency comes in if the extra cache footprint causes extra misses in other tasks. But it's very hard to predict what impact exactly that will have, and how much of that latency out-of-order execution can hide. Related: users.elis.ugent.be/~leeckhou/papers/ispass06-eyerman.pdf examines the cost of branch-prediction misses compared with other stalls like cache misses that have to go all the way to DRAM, vs. I-cache miss.
  • kkm
    kkm about 6 years
    @PeterCordes: Wow, thank you so much, I am speechless! I still do not have a complete picture, and would certainly like to ask you a couple more q's, if you do not mind. Maybe you can suggest which SE would be the best for it? Would it be too focused for SO--or do you feel it would be the right place? There is not a SE on computer hardware, at the least at this level of detail, AFAIK. But your answers would be super useful to other people too, I am certain, if they'd get more exposure.
  • Peter Cordes
    Peter Cordes about 6 years
    @kkm: go ahead and ask on SO, tag it with [performance] and maybe [cpu-architecture] and [cpu-cache] and/or [memory], and/or [x86] if it's about x86 hardware. Include a link to this comment thread for background on what you're asking.
  • kkm
    kkm about 6 years
    @PeterCordes: Absolutely, thank you so much! I'll write a summary of your explanations and include your links, and then describe what I understand and what I do not.
  • Peter Cordes
    Peter Cordes about 6 years
    What hardware did you test on? 4.8GB/s is pretty low compared to the 23GB/s I get on a Skylake i7-6700k (DDR4-2666, but the buffer should stay hot in L3 cache). So a good portion of the cost is system calls being expensive with Spectre + Meltdown mitigation enabled. That hurts doubly for piping, because pipe buffers are smaller than 1M, so that's more write / read system calls. Nearly 10x perf difference is worse than I expected, though. On my Skylake system it's 23GB/s vs. 3.3GB/s, running x86-64 Linux 4.15.8-1-ARCH, so that's a factor of 6.8. Wow, system calls are expensive now.
  • Matija Nalis
    Matija Nalis about 6 years
    @PeterCordes It's old low end laptop (Acer Aspire E17), with 4x Intel(R) Celeron(R) CPU N2940 @ 1.83GHz running x86_64 Linux 4.9.82-1+deb9u3, so the lower score is quite expected
  • user2948306
    user2948306 about 6 years
    @PeterCordes 3GB/s with 64k pipe buffers suggests 2x 103124 syscalls per second... and that number of context switches, heh. On a server cpu, with 200000 syscalls per second, you might expect ~8% overhead from PTI, since there is very little working set. (The graph I'm referencing doesn't include the PCID optimization, but maybe that's not so significant for small working sets). So I'm not sure PTI has a big impact there? brendangregg.com/blog/2018-02-09/…
  • Peter Cordes
    Peter Cordes about 6 years
    Oh interesting, so it's a Silvermont with 2MB of L2 cache, so your dd buffer + receive buffer don't fit; you're probably dealing with memory bandwidth instead of last-level cache bandwidth. You might get better bandwidth with 512k buffers or even 64k buffers. (According to strace on my desktop, write and read are returning 1048576, so I think that means we're only paying the user<->kernel cost of TLB invalidation + branch-prediction flush once per MiB, not per 64k, @sourcejedi. It's Spectre mitigation that has the most cost, I think)
  • user2948306
    user2948306 about 6 years
    Interesting! All the search results/blogs seems to be for the meltdown/pti news, I guess they're popular & also included the word "spectre". Would be very convenient if you can give me a pointer for the impact of the subsequent Spectre mitigation on syscalls.
  • Lyle
    Lyle about 6 years
    @CristianCiupitu I believe that in DOS, NUL is not a magic filename, it is a device. More properly, it's written "NUL:", just like C:, D:, LPT1:, etc. IOW, it exists globally, not "in every directory", and is in fact very much like /dev/null.
  • Phil Frost
    Phil Frost about 6 years
    @Lyle Yeah? Then why does echo print /dev/fd/63?
  • Lyle
    Lyle about 6 years
    Hm. Good point. Well, this is implemented by shells, so perhaps your shell is different from the old Bourne shell I grew up with.
  • Lyle
    Lyle about 6 years
    One difference is that echo doesn't read from stdin, while grep does, but I can't think how the shell would know that before it execs them.
  • Lyle
    Lyle about 6 years
    And strace does make this clearer for me: you have it exactly right, with bash. The '<(...)' construct is doing something quite different from <filename. Hm. I learned something.
  • Peter Cordes
    Peter Cordes about 6 years
    @sourcejedi: With Spectre mitigation enabled, the cost of a syscall that returns right away with ENOSYS is ~1800 cycles on Skylake with Spectre mitigation enabled, most of it being the wrmsr that invalidates the BPU, according to @BeeOnRope's testing. With mitigation disabled, the user->kernel->user round trip time is ~160 cycles. But if you are touching lots of memory, Meltdown mitigation is significant, too. Hugepages should help (fewer TLB entries need to be reloaded).
  • Peter Cordes
    Peter Cordes about 6 years
    This sounds like nonsense. Writing a bug-free kernel driver is no easier than writing a bug-free program that reads+discards its stdin. It doesn't need to be setuid or anything, so for both /dev/null or a proposed input-discarding program, the attack vector would be the same: get a script or program that runs as root to do something weird (like try to lseek in /dev/null or open it multiple times from the same process , or IDK what. Or invoke /bin/null with a weird environment, or whatever).
  • user2948306
    user2948306 about 6 years
    @PeterCordes Thanks! Circling back. In the pipe case, we see only 1 extra syscall per 1MB in the reader. In every case we used a writer with 2 syscalls per 1MB. On my system, reducing the block size below 64k does start to crater throughput, but that's not what we're doing.
  • Matija Nalis
    Matija Nalis about 6 years
    FYI, if I modify bs=16k count=320000 and reduce the buffer in nullread.c to 16k, /dev/null worsens to 3.3GB/s, but nullread.c improves to 1.2GB/s. At 64k both, it is 4.7GB/s for /dev/null and 1.3GB/s for nullread.c. At 4k, /dev/null drops to 1.5GB/s, and nullread.c to 683 MB/s. In any case, /dev/null retains much superior performance.
  • IMSoP
    IMSoP about 6 years
    @Lyle NUL, CON, etc act as files in exactly the same way (and for the same reason) as /dev/null and friends on Unix-likes. They also act as though they exist in every directory, and are apparently reserved with every extension as well; try yourself with a command like echo hello > C:\Temp\NUL.txt
  • IMSoP
    IMSoP about 6 years
    @CristianCiupitu Actually, as the article you link explains, the existence in every directory is just because early versions of MS-DOS had a flat filesystem, so writing > NUL was ubiquitous. Once directories were added, that meant "NUL in the current directory", so if only the root \NUL was special, you'd end up with dozens of files called NUL from running old programs with some other current directory. Unix was, I believe, always tree-based, so didn't need to worry about this.
  • millimoose
    millimoose about 6 years
    Also, even once you have your shell able to pipe into programs for you, or if you think spawning a subprocess isn't particularly complicated, you're still communicating with the process using a file handle anyway; so all you've added is a bunch of different steps to getting a writeable stream. I don't think it really matters whether those different steps are done in the shell or in the driver for the null device, you're not going to use it any different than a file anyway
  • Ankur S
    Ankur S about 6 years
    Maybe I should have phrased it more precisely. I meant: why implement it as a char device instead of a program? It would be a few lines either way, but the program implementation would be decidedly simpler. As the other answers have pointed out, there are quite a few benefits to this; efficiency and portability chief among them.
  • Matthieu Moy
    Matthieu Moy about 6 years
    Sure. I just added this answer because seeing the actual implementation was fun (I discovered it recently myself), but the real reason is what others pointed out indeed.
  • Ankur S
    Ankur S about 6 years
    Me too! I recently started learning devices in linux and the answers were quite informative