How to get the actual directory size (out of du)?
Solution 1
Here is a script that displays a human-readable directory size using only standard Unix (POSIX) tools.
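#!/bin/sh
# Sum the sizes (ls column 5) of all regular files below $1 (default: .)
find "${1:-.}" -type f -exec ls -lnq {} + | awk '
BEGIN { sum = 0 }  # initialization for clarity and safety
function pp() {
    u = "+Ki+Mi+Gi+Ti+Pi+Ei";
    split(u, unit, "+");   # unit[1] is empty, so plain bytes print as "B"
    v = sum;
    for (i = 1; i < 7; i++) {
        if (v < 1024) break;
        v /= 1024;
    }
    printf("%.3f %sB\n", v, unit[i]);
}
{ sum += $5 }
END { pp() }'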
Example (with the script saved as ds):
$ ds ~
72.891 GiB
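The unit-scaling step can be tried on its own. This is a minimal sketch, assuming a POSIX awk; the function name human_size is made up for illustration and is not part of the script above:

```shell
#!/bin/sh
# human_size BYTES: print a byte count using binary (1024-based) units.
# Hypothetical helper for illustration; it mirrors the pp() function.
human_size() {
    awk -v bytes="$1" 'BEGIN {
        n = split("B KiB MiB GiB TiB PiB EiB", unit, " ")
        v = bytes
        # Divide by 1024 until the value fits the unit or the units run out.
        for (i = 1; i < n; i++) {
            if (v < 1024) break
            v /= 1024
        }
        printf("%.3f %s\n", v, unit[i])
    }'
}
```

For instance, human_size 1536 prints 1.500 KiB, and human_size 3 prints 3.000 B.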
Solution 2
Some versions of du support the argument --apparent-size to show apparent size instead of disk usage. So your command would be:
du -hs --apparent-size
From the man pages for du included with Ubuntu 12.04 LTS:
--apparent-size
print apparent sizes, rather than disk usage; although the
apparent size is usually smaller, it may be larger due to holes
in (`sparse') files, internal fragmentation, indirect blocks,
and the like
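A sparse file makes the difference between the two numbers easy to see. The sketch below assumes GNU coreutils (truncate, du -b); the helper names apparent_bytes and disk_kib are invented for this example:

```shell
#!/bin/sh
# Assumes GNU coreutils. Both helper names are hypothetical.

# apparent_bytes FILE: apparent size in bytes (-b implies --apparent-size).
apparent_bytes() {
    du -sb -- "$1" | cut -f1
}

# disk_kib FILE: space actually allocated on disk, in KiB.
disk_kib() {
    du -sk -- "$1" | cut -f1
}

# Example session:
#   truncate -s 1M sparse.img     # creates a 1 MiB file made of one big hole
#   apparent_bytes sparse.img     # prints 1048576
#   disk_kib sparse.img           # usually much smaller; depends on the filesystem
```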
Solution 3
Assuming you have du from GNU coreutils, this command should calculate the total apparent size of an arbitrary number of regular files inside a directory, without any arbitrary limit on the number of files:
find . -type f -print0 | du -scb --files0-from=- | tail -n 1
Add the -l option to du if there are hardlinked files inside and you want to count each hard link separately (by default, du counts multiple hard links to the same file only once).
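The effect of -l can be checked with a small experiment. This is only a sketch, assuming GNU find and GNU du; total_bytes is a made-up wrapper around the command above that passes any extra arguments through to du:

```shell
#!/bin/sh
# Assumes GNU find and GNU du (-print0, -b, --files0-from).
# total_bytes DIR [DU_OPTION...]: total apparent size of regular files in DIR.
# Hypothetical wrapper for illustration.
total_bytes() {
    dir=$1
    shift
    find "$dir" -type f -print0 |
        du -scb "$@" --files0-from=- |
        tail -n 1 |
        cut -f1
}

# Example session:
#   printf '0123456789' > a    # 10-byte file
#   ln a b                     # second hard link to the same inode
#   total_bytes .              # the inode is counted once
#   total_bytes . -l           # each link is counted separately
```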
The most important difference from plain du -sb is that recursive du also counts the sizes of directories, which different filesystems report differently; to avoid this, the find command is used to pass only regular files to du. Another difference is that symlinks are ignored (if they should be counted, the find command should be adjusted).
This command will also consume more memory than plain du -sb, because the --files0-from=FILE option makes du store the device and inode numbers of all processed files, as opposed to the default behavior of remembering only files with more than one hard link. (This is not an issue if the -l option is used to count hard links multiple times, because the only reason to store device and inode numbers is to skip hardlinked files that have already been processed.)
If you want a human-readable representation of the total size, just add the -h option (this works because du is invoked only once and calculates the total size itself, unlike some other suggested answers):
find . -type f -print0 | du -scbh --files0-from=- | tail -n 1
or (if you are worried that some effects of -b are then overridden by -h):
find . -type f -print0 | du -sc --apparent-size -h --files0-from=- | tail -n 1
Solution 4
Just an alternative, using ls:
ls -nR | grep -v '^d' | awk '{total += $5} END {print total, "Total"}'
ls -nR: -n is like -l but lists numeric UIDs and GIDs, and -R lists subdirectories recursively.
grep -v: inverts the sense of matching, to select non-matching lines (-v is specified by POSIX). The pattern '^d' excludes the directories.
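Two things to watch with this pipeline: ls -R without -a or -A skips dotfiles, and grep -v '^d' drops directories but still lets symlinks and other entries through. A possible variant, assuming you only want regular files (as in the question's definition), is sketched below; ls_total is a made-up name:

```shell
#!/bin/sh
# ls_total DIR: sum the sizes of regular files listed by a recursive ls.
# Hypothetical helper. -A includes dotfiles; -q replaces unprintable
# characters in names so that each entry stays on one line.
ls_total() {
    # Long-format lines for regular files start with "-"; column 5 is the size.
    ls -nRAq -- "$1" | awk '/^-/ { total += $5 } END { printf("%.0f\n", total) }'
}
```

Calling ls_total . prints the byte total for the current directory.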
Ls command: http://linux.about.com/od/commands/l/blcmdl1_ls.htm
Man Grep: http://linux.die.net/man/1/grep
EDIT: Edited following @Sergey Vlasov's suggestion.
Solution 5
If all you want is the size of the files, excluding the space the directories take up, you could do something like
find . -type f -print0 | xargs -0 du -scb | tail -n 1
@SergeyVlasov pointed out that this will fail if you have more files than fit within ARG_MAX. To avoid that you could use something like:
find . -type f -exec du -sb '{}' \; | gawk '{k+=$1}END{print k}'
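Running du once per file is slow on large trees. A variation worth trying, assuming GNU du, is to let find batch the files with "{} +" and still add up the per-file lines in awk, so the result does not depend on how many times du runs. The name sum_file_bytes is hypothetical:

```shell
#!/bin/sh
# Assumes GNU du (-b). sum_file_bytes DIR: total apparent size of the
# regular files below DIR. Hypothetical helper for illustration.
sum_file_bytes() {
    # "{} +" passes many files to each du call; awk sums every output line,
    # so splitting the list across several du calls cannot change the total.
    find "$1" -type f -exec du -b {} + |
        awk '{ k += $1 } END { printf("%.0f\n", k) }'
}
```

One caveat: a hard link is only recognized as a duplicate when both links land in the same du call, and file names containing newlines can still confuse the awk step.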
Comments
-
basic6 over 1 year
How do I get the actual directory size, using UNIX/Linux standard tools?
Alternative question: How do I get du to show me the actual directory size (not disk usage)?
Since people seem to have different definitions of the term "size": My definition of "directory size" is the sum of all regular files within that directory.
I do NOT care about the size of the directory inode or whatever (blocks * block size) the files take up on the respective file system. A directory with 3 files, 1 byte each, has a directory size of 3 bytes (by my definition).
Calculating the directory size using du seems to be unreliable.
For example, mkdir foo && du -b foo reports "4096 foo", 4096 bytes instead of 0 bytes. With very large directories, the directory size reported by du -hs can be off by 100 GB (!) and more (compressed file system).
So what (tool/option) has to be used to get the actual directory size?
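The definition used in the question can be written down as a quick check. This sketch uses only POSIX find, ls and awk; dir_size_bytes is a name invented for the example:

```shell
#!/bin/sh
# dir_size_bytes DIR: sum of the sizes of all regular files below DIR,
# ignoring the directories themselves. Hypothetical helper.
dir_size_bytes() {
    find "$1" -type f -exec ls -lnq {} + |
        awk '{ sum += $5 } END { printf("%.0f\n", sum) }'
}

# Example session:
#   mkdir foo && dir_size_bytes foo        # prints 0 for an empty directory
#   printf a > foo/1; printf b > foo/2; printf c > foo/3
#   dir_size_bytes foo                     # prints 3
```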
-
Sergey Vlasov almost 11 years
What filesystem is used in the new location? Is it xfs by any chance?
-
Sergey Vlasov almost 11 years
The same question was asked here before: Is there a way to force du to report a directory size (recursively) including only sizes of files?
-
Sergey Vlasov almost 11 years
And if your new FS is really XFS, the greatly increased disk usage is probably due to aggressive preallocation, which decreases file fragmentation at the cost of disk usage.
-
Sergey Vlasov almost 11 years
This command will silently give a wrong result if the directory contains so many files that they don't fit within the limit on the size of execve() arguments. In that case xargs will invoke du multiple times, each invocation will print a grand total for just its part of the complete file list, and tail will then show only the total size of the last part.
-
terdon almost 11 years
@SergeyVlasov good point, I hadn't thought of that, thanks, answer updated.
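The failure mode discussed in this exchange can be reproduced on a small directory by forcing xargs to split the file list with -n. A rough sketch, assuming GNU find, xargs and du; count_totals is a hypothetical helper that counts how many "total" lines come out:

```shell
#!/bin/sh
# Assumes GNU find, xargs and du. Hypothetical helper.
# count_totals DIR BATCH: number of "total" lines du prints when xargs is
# forced to run it on at most BATCH files per invocation.
count_totals() {
    find "$1" -type f -print0 |
        xargs -0 -n "$2" du -scb |
        awk -F '\t' '$2 == "total" { n++ } END { print n + 0 }'
}
```

With five files and a batch size of 2, du runs three times and prints three separate totals, so tail -n 1 would only see the last one.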
-
Sergey Vlasov almost 11 years
Using the -n option for ls instead of -l (show UID/GID numbers instead of names) is safer, because user and group names can contain spaces (e.g., if winbind or sssd is used to join the system to a Windows domain, you can get group names like domain users). It should also be faster because it does not need to look up user and group names.
-
Sergey Vlasov almost 11 years
Not sure what to do for FreeBSD: although -b could probably be replaced by -A -B 1, there is no equivalent for --files0-from=-, and using xargs will need some workarounds in case the file list is bigger than ARG_MAX (and some external solution for human-readable output).
-
Sergey Vlasov almost 11 years
And now I found another option which is missing in all suggested ls invocations here: -q. Without this option the script will break if some file name contains newline characters. Writing really reliable shell scripts is too hard…
-
jlliagre almost 11 years
@SergeyVlasov The script I posted shouldn't break with such files; it would merely ignore the extra lines. The only problem case would occur if a carefully crafted file name had an extra line with a fifth column that contains a numerical value. Your suggestion would indeed avoid that situation. Thanks for the tip, script updated.
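What -q does for such names can be seen by counting the lines ls prints for a single file. This is just a sketch; ls_lines is a made-up helper, and the behavior without -q is described here for GNU ls writing to a pipe:

```shell
#!/bin/sh
# ls_lines FILE [LS_OPTION...]: number of output lines that ls -ln prints
# for one file. Hypothetical helper for illustration.
ls_lines() {
    file=$1
    shift
    ls -ln "$@" -- "$file" | wc -l | tr -d ' '
}

# Example session:
#   name=$(printf 'two\nlines')
#   : > "$name"
#   ls_lines "$name" -q    # prints 1: the newline is shown as "?"
#   ls_lines "$name"       # more than 1 when ls emits the raw newline
```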
-
ehime almost 10 years
Excellent answer. +1 to you sir
-
Karl Forner almost 10 years
Does not work: it reports some space for empty dirs.
-
connorbode over 9 years
This worked for me.
-
basic6 almost 9 years
This is one of the most reliable solutions. It works with file names that have spaces or quotes in them and it prints a human-readable size.
-
Pixus.ru over 7 years
It gives significantly different sizes when you compare directories on different file systems. For example, the same folder has an apparent size of 290 GB on a ZFS file system and 324 GB on exFAT. The solutions above give the same size.
-
jlliagre almost 7 years
@KIAaze Thanks for reviewing and fixing my code!
-
KIAaze almost 7 years
You're welcome. :) But now that you added PiB, the for loop should be i<7. Or do you have a different awk version than me where split returns a zero-indexed array?
-
gpothier almost 6 years
Thanks, this is MUCH faster than find -exec ls!
-
jlliagre over 5 years
@KIAaze Now the loop uses i<7 and supports exbibytes! :-)