How to get the actual directory size (out of du)?

Solution 1

Here is a script displaying a human readable directory size using Unix standard tools (POSIX).

#!/bin/sh
# Sum the apparent sizes (ls column 5) of every regular file under the
# given directory (default: the current directory), then pretty-print.
find "${1:-.}" -type f -exec ls -lnq {} + | awk '
BEGIN {sum=0} # initialization for clarity and safety
function pp() {
  u="+Ki+Mi+Gi+Ti+Pi+Ei";
  split(u,unit,"+");
  v=sum;
  for(i=1;i<7;i++) {
    if(v<1024) break;
    v/=1024;
  }
  printf("%.3f %sB\n", v, unit[i]);
}
{sum+=$5}
END{pp()}'

For example, with the script saved as an executable file named ds in your PATH:

$ ds ~        
72.891 GiB

Solution 2

Some versions of du support the --apparent-size option, which shows the apparent size instead of the disk usage. Your command would then be:

du -hs --apparent-size

From the du man page included with Ubuntu 12.04 LTS:

--apparent-size
      print apparent sizes,  rather  than  disk  usage;  although  the
      apparent  size is usually smaller, it may be larger due to holes
      in (`sparse') files, internal  fragmentation,  indirect  blocks,
      and the like
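
A sparse file makes the two numbers diverge sharply, which is a quick way to see the difference. Below is an illustrative sketch assuming GNU coreutils; the demo directory, file name, and exact output figures are made up:

$ mkdir demo && truncate -s 1G demo/sparse
$ du -hs demo
4.0K    demo
$ du -hs --apparent-size demo
1.1G    demo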

Solution 3

Assuming you have du from GNU coreutils, this command should calculate the total apparent size of an arbitrary number of regular files inside a directory, without any limit on the number of files:

find . -type f -print0 | du -scb --files0-from=- | tail -n 1

Add the -l option to du if there are some hardlinked files inside, and you want to count each hardlink separately (by default du counts multiple hardlinks only once).
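
To illustrate the effect of -l, consider two hard links to the same file. This is a sketch assuming GNU coreutils; the directory and file names are made up, and the printed totals assume an exactly 1 MiB file:

$ mkdir hl && head -c 1M /dev/urandom > hl/a && ln hl/a hl/b
$ find hl -type f -print0 | du -scb --files0-from=- | tail -n 1
1048576 total
$ find hl -type f -print0 | du -lscb --files0-from=- | tail -n 1
2097152 total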

The most important difference from plain du -sb is that recursive du also counts the sizes of directories themselves, which are reported differently by different filesystems; to avoid this, the find command is used to pass only regular files to du. Another difference is that symlinks are ignored (if they should be counted, the find command has to be adjusted).

This command will also consume more memory than plain du -sb, because the --files0-from=FILE option makes du store the device and inode numbers of all processed files, as opposed to the default behavior of remembering only files with more than one hard link. (This is not an issue if the -l option is used to count hardlinks multiple times, because the only reason to store device and inode numbers is to skip hardlinked files which have already been processed.)

If you want to get a human-readable representation of the total size, just add the -h option (this works because du is invoked only once and calculates the total size itself, unlike some other suggested answers):

find . -type f -print0 | du -scbh --files0-from=- | tail -n 1

or (if you are worried that some effects of -b are then overridden by -h)

find . -type f -print0 | du -sc --apparent-size -h --files0-from=- | tail -n 1

Solution 4

Just an alternative, using ls:

ls -nR | grep -v '^d' | awk '{total += $5} END {print total, "Total"}'

ls -nR: -n is like -l, but lists numeric UIDs and GIDs; -R lists subdirectories recursively.

grep -v: inverts the sense of matching, selecting non-matching lines (-v is specified by POSIX). '^d' excludes the directory entries.

ls command: http://linux.about.com/od/commands/l/blcmdl1_ls.htm

man grep: http://linux.die.net/man/1/grep
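
If you want the total in human-readable form, one option is to pipe the byte count through numfmt (a sketch assuming GNU coreutils numfmt is available; -q is added so unprintable characters in file names cannot break the column parsing):

ls -nRq | grep -v '^d' | awk '{total += $5} END {print total}' | numfmt --to=iec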

EDIT:

Edited as suggested by @Sergey Vlasov.

Solution 5

If all you want is the size of the files, excluding the space the directories take up, you could do something like

find . -type f -print0 | xargs -0 du -scb | tail -n 1

@SergeyVlasov pointed out that this will silently fail if the file list exceeds the ARG_MAX limit: xargs then invokes du multiple times, and tail keeps only the total of the last batch. To avoid that you could use something like:

find . -type f -exec du -sb '{}' \; | gawk '{k+=$1}END{print k}'
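
A faster middle ground (my suggestion, assuming GNU find and du) is to batch arguments with + and let gawk sum the per-file lines, which stays correct across multiple du invocations while avoiding one du process per file:

find . -type f -exec du -sb '{}' + | gawk '{k+=$1} END{print k}'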

Comments

  • basic6
    basic6 over 1 year

    How do I get the actual directory size, using UNIX/Linux standard tools?

    Alternative question: How do I get du to show me the actual directory size (not disk usage)?

    Since people seem to have different definitions of the term "size": My definition of "directory size" is the sum of all regular files within that directory.

    I do NOT care about the size of the directory inode or whatever (blocks * block size) the files take up on the respective file system. A directory with 3 files, 1 byte each, has a directory size of 3 bytes (by my definition).

    Calculating the directory size using du seems to be unreliable.
    For example, mkdir foo && du -b foo reports "4096 foo", 4096 bytes instead of 0 bytes. With very large directories, the directory size reported by du -hs can be off by 100 GB (!) and more (compressed file system).

    So what (tool/option) has to be used to get the actual directory size?

  • Sergey Vlasov
    Sergey Vlasov almost 11 years
    This command will silently give a wrong result if the directory contains so many files that they don't fit in the limit on execve() arguments size — in this case xargs will invoke du multiple times, and each invocation will print grand total just for its part of the complete file list, then tail will show just the total size of the last part.
  • terdon
    terdon almost 11 years
    @SergeyVlasov good point, I hadn't thought of that, thanks, answer updated.
  • Sergey Vlasov
    Sergey Vlasov almost 11 years
    Using the -n option for ls instead of -l (show UID/GID numbers instead of names) is safer, because user and group names can contain spaces (e.g., if winbind or sssd is used to join the system to a Windows domain, you can get group names like domain users). It should also be faster due to not needing to lookup user and group names.
  • Sergey Vlasov
    Sergey Vlasov almost 11 years
    Not sure what to do for FreeBSD — although -b could probably be replaced by -A -B 1, there is no equivalent for --files0-from=-, and using xargs will need some workarounds in case the file list is bigger than ARG_MAX (and some external solution for human-readable output).
  • Sergey Vlasov
    Sergey Vlasov almost 11 years
    And now I found another option which is missing in all suggested ls invocations here: -q. Without this option the script will break if some file name contains newline characters. Writing really reliable shell scripts is too hard…
  • jlliagre
    jlliagre almost 11 years
@SergeyVlasov The script I posted shouldn't break with such files; it would merely ignore the extra lines. The only problem case would be a carefully crafted file name containing an extra line whose fifth column holds a numeric value. Your suggestion would indeed avoid that situation. Thanks for the tip, script updated.
  • ehime
    ehime almost 10 years
Excellent answer. +1 to you, sir.
  • Karl Forner
    Karl Forner almost 10 years
Does not work: reports some space for empty dirs.
  • connorbode
    connorbode over 9 years
This worked for me.
  • basic6
    basic6 almost 9 years
    This is one of the most reliable solutions. It works with file names that have spaces or quotes in them and it prints a human-readable size.
  • Pixus.ru
    Pixus.ru over 7 years
It gives significantly different sizes when you are comparing directories on different file systems. For example, the same folder has an apparent size of 290 GB on a ZFS file system and 324 GB on exFAT. The solutions above give the same size.
  • jlliagre
    jlliagre almost 7 years
    @KIAaze Thanks for reviewing and fixing my code!
  • KIAaze
    KIAaze almost 7 years
    You're welcome. :) But now that you added PiB, the for loop should be i<7. Or do you have a different awk version than me where split returns a zero-indexed array?
  • gpothier
    gpothier almost 6 years
    Thanks, this is MUCH faster than find -exec ls!
  • jlliagre
    jlliagre over 5 years
@KIAaze Now the loop uses i<7 and supports exbibytes! :-)