How to get folder size ignoring hard links?
Solution 1
Total size in bytes of all files in hourly.2 which have only one link:
$ find ./hourly.2 -type f -links 1 -printf "%s\n" | awk '{s=s+$1} END {print s}'
From the find man page:
-links n
File has n links.
To get the sum in kilobytes instead of bytes, use -printf "%k\n".
To list files with different link counts, play around with find -links +1 (more than one link), find -links -5 (fewer than five links), and so on.
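To see the link count that -links compares against, you can inspect it directly. A quick self-contained sketch using a throwaway directory (stat -c is GNU coreutils / BusyBox syntax; BSD stat differs):

```shell
# Create a scratch file and a hard link to it
tmpdir=$(mktemp -d)
touch "$tmpdir/orig"
ln "$tmpdir/orig" "$tmpdir/extra"

# Both names now share one inode with a link count of 2
stat -c %h "$tmpdir/orig"                  # prints 2

# So -links 1 skips both names, while -links +1 finds them
find "$tmpdir" -type f -links 1 | wc -l    # prints 0
find "$tmpdir" -type f -links +1 | wc -l   # prints 2

rm -r "$tmpdir"
```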
Solution 2
As @Gilles says, du counts each inode only the first time it encounters it, no matter how many hard links point to it, so you can simply pass it the directories in order:
$ du -hc --max-depth=0 /hourly.1 /hourly.2
29G /hourly.1
1G /hourly.2
30G total
That is, any file in hourly.2 referencing an inode (i.e. a "real" file) already referenced in hourly.1 will not be counted.
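A quick way to convince yourself of this behaviour, as a self-contained sketch on a throwaway tree rather than the actual rsnapshot directories:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/a" "$tmp/b"

# One real 1 MiB file in a/, hard-linked into b/
dd if=/dev/zero of="$tmp/a/shared" bs=1024 count=1024 2>/dev/null
ln "$tmp/a/shared" "$tmp/b/shared"

# A separate 1 MiB file that exists only in b/
dd if=/dev/zero of="$tmp/b/own" bs=1024 count=1024 2>/dev/null

# In a single invocation, du charges the shared file to a/ (seen first),
# so b/ shows only its own file plus directory overhead
du -k "$tmp/a" "$tmp/b"

rm -r "$tmp"
```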
Solution 3
If you specifically want the size of the files that are present under hourly.2 but not under hourly.1, you can obtain it a little indirectly with du. If du processes the same file more than once (even under different names, i.e. hard links), it only counts the file the first time. So what du hourly.1 hourly.2 reports for hourly.2 is the size you're looking for. Thus:
du -ks hourly.1 hourly.2 | sed -n '2s/[^0-9].*//p'
(Works on any POSIX system and most other Unix variants. Assumes that the directory name hourly.1 doesn't contain any newline.)
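The same idea on a throwaway tree, mimicking the hourly.1/hourly.2 layout from the question (a sketch; here the second size field is extracted with an equivalent awk expression):

```shell
tmp=$(mktemp -d)
mkdir "$tmp/hourly.1" "$tmp/hourly.2"

# 512 KiB file hard-linked into both snapshots, 256 KiB file unique to hourly.2
dd if=/dev/zero of="$tmp/hourly.1/file1" bs=1024 count=512 2>/dev/null
ln "$tmp/hourly.1/file1" "$tmp/hourly.2/file1"
dd if=/dev/zero of="$tmp/hourly.2/file2" bs=1024 count=256 2>/dev/null

# Second line of du is hourly.2's unique data: ~256 KiB plus directory overhead
du -ks "$tmp/hourly.1" "$tmp/hourly.2" | awk 'NR==2 {print $1}'

rm -r "$tmp"
```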
Solution 4
Simpler:
du -hc --max-depth=1 path/
Example
9.4G daily/users/rockspa/home/daily.21
3.6G daily/users/rockspa/home/daily.30
4.2G daily/users/rockspa/home/daily.11
1.1G daily/users/rockspa/home/daily.4
4.2G daily/users/rockspa/home/daily.9
3.0G daily/users/rockspa/home/daily.25
3.5G daily/users/rockspa/home/daily.20
4.2G daily/users/rockspa/home/daily.13
913M daily/users/rockspa/home/daily.5
2.8G daily/users/rockspa/home/daily.26
1.4G daily/users/rockspa/home/daily.1
2.6G daily/users/rockspa/home/daily.28
4.2G daily/users/rockspa/home/daily.15
3.8G daily/users/rockspa/home/daily.19
327M daily/users/rockspa/home/daily.8
4.2G daily/users/rockspa/home/daily.17
3.1G daily/users/rockspa/home/daily.23
...
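The same pattern on a throwaway tree (a sketch; the paths above are from the poster's system, and --max-depth is a GNU du option):

```shell
tmp=$(mktemp -d)
mkdir "$tmp/daily.1" "$tmp/daily.2"
dd if=/dev/zero of="$tmp/daily.1/f" bs=1024 count=100 2>/dev/null

# One line per immediate subdirectory, one for the parent, plus a grand total
du -hc --max-depth=1 "$tmp"

rm -r "$tmp"
```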
Solution 5
Awesomely, BusyBox builds of find come without -printf support. Here is a modification of @grebneke's answer:
find . -type f -links 1 -exec ls -l {} \; | awk '{s=s+$5} END {print s}'
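Spawning one ls per file gets slow on large trees. Since -exec ... + is POSIX (and supported by BusyBox find), batching the ls calls should behave the same; a sketch:

```shell
# Batch files into as few ls invocations as possible;
# column 5 of ls -l is the size in bytes
find . -type f -links 1 -exec ls -l {} + | awk '{s += $5} END {print s+0}'
```

The `+0` in the END block makes the pipeline print 0 instead of an empty line when no files match.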
Benubird
Updated on September 18, 2022

Comments
-
Benubird over 1 year
I use rsnapshot for backups, which generates a series of folders containing files of the same name. Some of the files are hard linked, while others are separate. For instance, hourly.1/file1 and hourly.2/file1 might be hard linked to the same file, while hourly.1/file2 and hourly.2/file2 are entirely separate files. I want to find the amount of space used by the folder hourly.2, ignoring any files which are hard links to files in hourly.1. So in the above example, I would want to get the size of file2, but ignore file1. I'm using bash on Linux, and I want to do this from the command line as simply as possible, so no big graphical or other-OS-only solutions please.
-
cuonglm about 10 years
If a file somewhere is a hard link to a file in hourly.2, your command will produce the wrong answer. -
grebneke about 10 years
@Gnouc - Well yes - it depends on how the files end up in hourly.2. If they are copied there, they will not have extra links and my command will work. If they are hard-linked, obviously it will fail. I'm assuming new backup files are copied. -
akavel over 7 years
According to du --help, the option --max-depth=0 is equivalent to -s, so the above can be shortened to:
$ du -hcs dirA dirB
-
Andreas Krey over 5 years
For some strange reason du doesn't always notice hardlinked files on RHEL5 - if I do 'du -sh dir/sub dir' the output for dir is the same as if I just said 'du -sh dir' - not excluding the size of 'dir/sub'.
-
TiberiusKirk over 4 years
Thanks Abdel. This should be the accepted answer.
-
dimitarvp over 4 years
Awesome. This worked for me on the first try on my macOS 10.15. Thank you.