How can I get the size of stdin?

Solution 1

tl;dr: tar -cv dir | wc -c - | cut -d' ' -f 1 | awk '{print $1/1000"K"}'

du doesn't count the bytes of a file itself. It asks the kernel (via stat) for the figures the filesystem already tracks for each file, which is why it's so fast. A stream has no such bookkeeping: you're counting data passing through a pipe, not a file on disk, so du can't work here. My guess is that 1.0K is a hardcoded size for /dev/std* in the kernel.

The solution is to use wc -c, which reads the piped data and counts the bytes itself instead of relying on what the kernel reports:

$ tar -cv dir | wc -c

If you want output similar to du -h:

$ tar -cv dir | wc -c | awk '{print $1/1000"K"}'

The awk step divides the byte count by 1000 and appends K, so the result is always shown in kilobytes (5990400 bytes becomes 5990.4K).
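The awk one-liner always reports kilobytes, which gets unwieldy for large archives. As a minimal sketch, assuming a POSIX awk, the helper below picks the unit by repeated division by 1000. The function name human is made up for illustration and is not part of any standard tool.

```shell
#!/bin/sh
# human: hypothetical helper (not a standard tool) that reads a byte count
# on stdin and prints it with a decimal (power-of-1000) unit suffix.
human() {
    awk '{
        split("B K M G T", unit, " ")    # unit[1]="B" ... unit[5]="T"
        n = $1
        i = 1
        while (n >= 1000 && i < 5) {     # step up one unit per factor of 1000
            n /= 1000
            i++
        }
        printf "%.1f%s\n", n, unit[i]
    }'
}

# Example with a fixed byte count:
echo 5990400 | human
```

In a real pipeline it would sit at the end: tar -c dir | wc -c | human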

Solution 2

With GNU tar you can just do:

tar --totals -c . >/dev/null

...which will render output like...

Total bytes written: 5990400 (5.8MiB, 5.5GiB/s)

...on stderr. Similarly, with any tar (or stream) you can use dd to deliver a report on byte counts. This may or may not be preferable to wc, but dd defaults to a block-size of 512 bytes - which is identical to tar's block-size. If your system's PIPE_BUF is large enough, you can even expand dd's block-size to match tar's record size - which is 20 blocks, or 10240 bytes. Like this:

tar -c . | dd bs=bx20 >/dev/null
585+0 records in
585+0 records out
5990400 bytes (6.0 MB) copied, 0.0085661 s, 699 MB/s

This may or may not offer a more performant solution than wc.

In both the dd and tar use-cases you needn't actually dispose of the stream, though. I redirect to /dev/null above - but I could have as easily redirected to some file and still received the report on its size at the time it was written.
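As a rough sketch of that last point, the script below writes the archive to a file through dd and keeps dd's report at the same time. The names demo_dir, archive.tar and dd_report.txt are invented for the example, and it assumes the last line of dd's report starts with the byte total, as in the output shown above.

```shell
#!/bin/sh
# Sketch: save the archive and capture dd's byte report in one pass.
# demo_dir, archive.tar and dd_report.txt are made-up names for illustration.
work=$(mktemp -d)
mkdir "$work/demo_dir"
printf 'hello\n' > "$work/demo_dir/file.txt"

# dd copies stdin to the file named by of= and prints its statistics on stderr.
tar -cf - -C "$work" demo_dir | dd bs=512 of="$work/archive.tar" 2> "$work/dd_report.txt"

# Assumption: the last line of the report begins with the total bytes copied.
reported=$(tail -n 1 "$work/dd_report.txt" | awk '{ print $1 }')
on_disk=$(wc -c < "$work/archive.tar" | tr -d ' ')
echo "dd reported $reported bytes; file on disk is $on_disk bytes"
rm -rf "$work"
```

The two numbers should agree, since dd counts exactly the bytes it passes on to the file.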

Solution 3

I'd suggest:

tar cf - dir | wc -c

A simple c (no leading - is required) is used to create a tar archive, f specifies an output file and - denotes that it be stdout. (Note that if you want just the size and there are many files beneath dir you may rather omit tar's v for performance reasons.)
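If you need this more than once, the pipeline can be wrapped in a small function. This is only a sketch: archive_size and demo_dir are hypothetical names, and the tr step is there because some wc implementations pad their output with spaces.

```shell
#!/bin/sh
# archive_size: hypothetical helper that prints the size in bytes of the
# tar archive that would be created from the directory given as $1.
archive_size() {
    # c = create, f - = write the archive to stdout; no v, so tar does not
    # spend time listing every file name.
    tar cf - "$1" | wc -c | tr -d ' '
}

# Demo on a throwaway directory (demo_dir is a made-up name).
work=$(mktemp -d)
mkdir "$work/demo_dir"
printf 'some data\n' > "$work/demo_dir/a.txt"
size=$(cd "$work" && archive_size demo_dir)
echo "archive would be $size bytes"
rm -rf "$work"
```

Note that tar pads its output to whole 512-byte blocks, so even a tiny directory yields a size that is a multiple of 512.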

Solution 4

I would go with @strugee's answer, but pipe the count through numfmt, a tool designed for converting numbers to and from human-readable units that supports many options:

tar -cv dir | wc -c | numfmt --to=si

--to=si displays the output in SI units (e.g. 4G).
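For comparison, here is a small sketch that formats one byte count in both unit systems numfmt offers. It assumes GNU coreutils' numfmt is installed; the byte count is just a fixed example value.

```shell
#!/bin/sh
# Same byte count, two unit systems (requires GNU coreutils' numfmt).
bytes=5990400
si=$(printf '%s\n' "$bytes" | numfmt --to=si)    # powers of 1000
iec=$(printf '%s\n' "$bytes" | numfmt --to=iec)  # powers of 1024
echo "SI: $si  IEC: $iec"
```

Which one you want depends on what you compare against: disk vendors tend to use powers of 1000, while many tools report powers of 1024.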

Solution 5

The wording of your question lends itself to the tar ... | wc -c answers above. I originally read your question with a silent assumption that you wanted the size to be reported while it was creating the tar file (perhaps tar's output was then being piped over a network link?).

In that case, I'd suggest pv (pipe viewer). I've seen references to it but have not yet had a chance to try it.
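A minimal, untested-in-anger sketch of how that might look, assuming pv is installed (it is not part of coreutils). The -b option asks pv for a running byte count on stderr; demo_dir and archive.tar are made-up names, and the script falls back to a plain redirect when pv is missing.

```shell
#!/bin/sh
# Sketch: show a running byte count while the archive is being written.
work=$(mktemp -d)
mkdir "$work/demo_dir"
printf 'some data\n' > "$work/demo_dir/a.txt"

if command -v pv >/dev/null 2>&1; then
    # pv passes the stream through unchanged and reports bytes seen on stderr.
    tar cf - -C "$work" demo_dir | pv -b > "$work/archive.tar"
else
    echo "pv not found; writing the archive without a progress display" >&2
    tar cf - -C "$work" demo_dir > "$work/archive.tar"
fi

final=$(wc -c < "$work/archive.tar" | tr -d ' ')
echo "final archive size: $final bytes"
rm -rf "$work"
```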

Author: user2914606

Updated on September 18, 2022

Comments

  • user2914606 almost 2 years

    I'm about to compress a large directory and I want to know how large, exactly, the resulting file will be.

    I've tried using du:

    $ tar -cv dir | du -h -
    du: cannot access '-': No such file or directory
    

    Then I tried using the file version of '-':

    $ tar -cv dir | du -h /dev/stdin
    1.0K
    

    I'm certain this number isn't accurate. How can I get the size of stdin?

  • Janis about 9 years
    Note that if you just omit wc's superfluous - then you don't need the subsequent cut command either.
  • Janis about 9 years
    I've looked it up in a book that I used as a manual at that time, and which I think was based on SysV R4. Certainly few folks recall what /etc/mt0 actually means: "magnetic tape" ;-) I'd be interested in how Solaris' tar behaves (because Solaris is one of the contemporary OSes known to still have really old stuff in /bin).
  • Janis about 9 years
    @mikeserv; PS: The book mentions AT&T's "UNIX Programmers Manual Volumes 1, 2A, 2B" as its source (with no manual date or UNIX release version, though it must have been from the early 1980s, 1983 or so).
  • mikeserv about 9 years
    @Janis - possibly true in the simplest case - but imagine rather that dd's output is passed on - to a compressor, say - and for whatever reason you find it desirable to know both the raw size of the archive and the compressed one. Also useful is to get an instant report on the record counts - tar is not just an archive, but a stream format. It can be used in ways other than just saving a group of files to some other file. It is often useful for blocking a stream before modifying it. At each of those record boundaries is a whole block of NULs.
  • mikeserv about 9 years
    Have you ever seen this? Unrelated - but I just found it today, and thought you might like it.
  • user2914606 about 9 years
    I don't quite understand how this answer is different from mine. Is it the presence of the -f flag to tar?
  • Janis about 9 years
    @strugee; It was a bit more different before you edited yours just a few minutes ago. The current differences are: 1) unnecessary use of v, 2) unnecessary use of - (in front of c, which is of course just a detail), 3) a possible problem with some versions of tar that expect -f - when stdout is concerned.
  • user2914606 about 9 years
    OK, that makes sense.
  • Cody Allan Taylor almost 7 years
    1.0K is the block size of stdin.