How can I get the size of stdin?
Solution 1
tl;dr: tar -cv dir | wc -c - | cut -d' ' -f 1 | awk '{print $1/1000"K"}'
du doesn't actually count the bytes in a file. It just asks the kernel to query the filesystem, which already keeps track of each file's size; this is why it's so fast. Because of that, and because you're measuring a stream rather than a file, du doesn't work here. My guess is that 1.0K is a hardcoded size for /dev/std* in the kernel.
The solution is to use wc -c, which counts the bytes itself instead of querying the kernel:
$ tar -cv dir | wc -c
If you want output similar to du -h:
$ tar -cv dir | wc -c | awk '{print $1/1000"K"}'
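The awk turns the byte count into a human-readable result. Note that it divides by 1000, while du -h uses 1024-byte units, so use $1/1024 if you want the numbers to match du.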
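A small variant, assuming a POSIX shell and awk: the hypothetical helper to_kib below divides by 1024, so the figure uses the same 1024-byte K unit as du -h. It is a sketch, not part of the original answer.

```shell
# Hypothetical helper (not from the original answer): read a byte count on
# stdin, e.g. from wc -c, and print it in KiB with one decimal place.
# awk's $1 ignores any leading padding that some wc implementations emit.
to_kib() {
    awk '{ printf "%.1fK\n", $1 / 1024 }'
}

# Usage: tar -cf - dir | wc -c | to_kib
```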
Solution 2
With GNU tar you can just do:
tar --totals -c . >/dev/null
...which will render output like...
Total bytes written: 5990400 (5.8MiB, 5.5GiB/s)
...on stderr. Similarly, with any tar (or any stream) you can use dd to report byte counts. This may or may not be preferable to wc, but dd defaults to a block size of 512 bytes, which is identical to tar's block size. If your system's PIPE_BUF is large enough, you can even raise dd's block size to match tar's record size, which is 20 blocks, or 10240 bytes (bx20 below multiplies the 512-byte b unit by 20). Like this:
tar -c . | dd bs=bx20 >/dev/null
585+0 records in
585+0 records out
5990400 bytes (6.0 MB) copied, 0.0085661 s, 699 MB/s
This may or may not offer a more performant solution than wc.
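To use dd purely as a byte counter, its stderr report has to be parsed. A sketch, assuming a dd whose summary line contains the word "byte" with the count as its first field (GNU, BSD and BusyBox dd all print something like "5990400 bytes ..."); the function name dd_count is hypothetical:

```shell
# Hypothetical helper: count the bytes on stdin with dd instead of wc.
# dd copies the stream to /dev/null in 10240-byte (tar record-sized) blocks;
# its summary goes to stderr, which is redirected into the pipe so awk can
# pick the first field of the "N bytes ..." (or "1 byte ...") line.
dd_count() {
    dd bs=10240 of=/dev/null 2>&1 | awk '/byte/ { print $1 }'
}

# Usage: tar -cf - dir | dd_count
```

Reads from a pipe may come back short, so the "records in" counts can show partial records, but the byte total is unaffected.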
In both the dd and tar use cases you needn't actually discard the stream, though. I redirect to /dev/null above, but I could just as easily have redirected to a file and still received the report on its size at the time it was written.
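A sketch of the pass-through case, assuming tar, dd, gzip and mktemp are on the PATH: dd sits in the middle of the pipeline, forwards the raw tar stream to a compressor, and its stderr report is saved so that both the raw and the compressed sizes can be printed. The function name archive_with_sizes and the choice of gzip are illustrative assumptions.

```shell
# Hypothetical helper: archive directory "$1" into gzip file "$2" and print
# "<raw tar bytes> <compressed bytes>". dd forwards the stream unchanged
# while counting it; its report is written to a temporary log file.
archive_with_sizes() {
    log=$(mktemp) || return 1
    tar -cf - "$1" | dd bs=10240 2>"$log" | gzip >"$2"
    raw=$(awk '/byte/ { print $1 }' "$log")
    rm -f "$log"
    compressed=$(wc -c <"$2" | tr -d ' ')
    echo "$raw $compressed"
}

# Usage: archive_with_sizes dir dir.tar.gz
```

The variables are not declared local, to stay within plain POSIX sh. Keep the output file outside the directory being archived, or tar will try to include the growing file.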
Solution 3
I'd suggest:
tar cf - dir | wc -c
A simple c (no leading - is required) creates a tar archive, f specifies an output file, and - denotes that the file be stdout. (Note that if you just want the size and there are many files beneath dir, you may want to omit tar's v for performance reasons.)
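The same command wrapped in a reusable function, as a sketch assuming a POSIX shell; archive_size is a hypothetical name. Since a tar archive is made of 512-byte blocks, the result should always be a multiple of 512.

```shell
# Hypothetical helper: print the size in bytes of the tar stream for "$1".
# "c" creates an archive, "f -" sends it to stdout, and "v" is left out so no
# per-file listing is produced. tr removes the padding BSD wc puts before the number.
archive_size() {
    tar cf - "$1" | wc -c | tr -d ' '
}

# Usage: archive_size dir
```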
Solution 4
I would go with @strugee's answer, but use the numfmt tool, which is designed to convert numbers to and from human-readable units and supports many options:
tar -cv dir | wc -c | numfmt --to=si
--to=si displays the output in SI units (e.g. 4G); --to=iec uses 1024-based units instead, like du -h.
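If you want the units to match du -h or the MiB figure that GNU tar --totals prints, use numfmt's --to=iec, which works in powers of 1024 instead of 1000. A sketch, assuming GNU coreutils; archive_size_iec is a hypothetical name:

```shell
# Hypothetical helper: size of the tar stream for "$1" in 1024-based units,
# e.g. 2048 bytes -> 2.0K (requires GNU coreutils numfmt).
archive_size_iec() {
    tar -cf - "$1" | wc -c | numfmt --to=iec
}

# Usage: archive_size_iec dir
```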
Solution 5
The wording of your question lends itself to the tar ... | wc -c answers above. I originally read your question with a silent assumption that you wanted the size to be reported while the tar file was being created (perhaps tar's output was then being piped over a network link?).
In that case, I'd suggest pv (pipe viewer). I've seen references to it but haven't yet had a chance to play with it.
user2914606
Updated on September 18, 2022
Comments
-
user2914606 almost 2 years
I'm about to compress a large directory and I want to know how large, exactly, the resulting file will be.
I've tried using du:
$ tar -cv dir | du -h -
du: cannot access '-': No such file or directory
Then I tried using the file version of '-':
$ tar -cv dir | du -h /dev/stdin
1.0K
I'm certain this number isn't accurate. How can I get the size of stdin?
-
Janis about 9 years
Note that if you just omit wc's superfluous -, then you don't need the subsequent cut command either.
-
Janis about 9 years
I've looked it up in a book that I used as a manual at the time, and I think it was based on SysV R4. Few folks probably still recall what /etc/mt0 actually means: "magnetic tape" ;-) I'd be interested in how Solaris' tar behaves (because Solaris is one of the contemporary OSes known to still have really old stuff in /bin).
-
Janis about 9 years
@mikeserv; PS: The book names AT&T's "UNIX Programmers Manual Volumes 1, 2A, 2B" as its source (with no manual date or UNIX release version, though; it must have been from the early 1980s, 1983 or so).
-
mikeserv about 9 years
@Janis, possibly true in the simplest case, but imagine instead that dd's output is passed on, to a compressor say, and for whatever reason you want to know both the raw size of the archive and the compressed one. It is also useful to get an instant report on the record counts: tar is not just an archive, but a stream format. It can be used in ways other than just saving a group of files to some other file. It is often useful for blocking a stream before modifying it. At each of those record boundaries is a whole block of NULs.
-
mikeserv about 9 years
Have you ever seen this? Unrelated, but I just found it today and thought you might like it.
-
user2914606 about 9 years
I don't quite understand how this answer is different from mine. Is it the presence of the -f flag to tar?
-
Janis about 9 years
@strugee; It differed more before you edited yours just a few minutes ago. The current differences are: 1) unnecessary use of v, 2) unnecessary use of - (in front of c, which is of course just a detail), 3) a possible problem with some versions of tar that need an explicit -f - to write the archive to stdout.
-
user2914606 about 9 years
OK, that makes sense.
-
Cody Allan Taylor almost 7 years
1.0K is the block size of stdin.