Command to delete directories whose contents are less than a given size
Solution 1
With GNU find and GNU coreutils, and assuming your directories don't have newlines in their names:
find ~/foo -mindepth 1 -maxdepth 1 -type d -exec du -ks {} + | awk '$1 <= 50' | cut -f 2-
This will list directories with total contents smaller than 50K. If you're happy with the results and you want to delete them, add | xargs -d \\n rm -rf
to the end of the command line.
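As a minimal sketch, the pipeline can be tried safely on a throwaway tree first (the /tmp/foo paths below are hypothetical, not from the question):

```shell
# Build a scratch tree with one small and one ~200K subdirectory (hypothetical paths)
mkdir -p /tmp/foo/small /tmp/foo/big
printf 'x' > /tmp/foo/small/f
dd if=/dev/zero of=/tmp/foo/big/f bs=1024 count=200 2>/dev/null

# List level-1 subdirectories whose disk usage is at most 50K
find /tmp/foo -mindepth 1 -maxdepth 1 -type d -exec du -ks {} + |
  awk '$1 <= 50' | cut -f 2-
```

Only /tmp/foo/small should be listed; /tmp/foo/big weighs in at roughly 200K and is filtered out by the awk test.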
Solution 2
With the GNU implementations of du, awk and xargs, to work with arbitrary file names, you can do:
(
cd ~/foo &&
  du --block-size=1 -l0d1 |
  awk -v RS='\0' -v ORS='\0' '
    $1 < 50*1024 && !/^[0-9]+\t\.$/ && sub("^[^\t]+\t", "")' |
  xargs -r0 echo rm -rf --
)
That is:
- Specify a block size, as otherwise which one GNU du uses depends on the environment. 1 guarantees you get the maximum precision (you get disk usage in number of bytes).
- Use -0 to work with NUL-delimited records (NUL being the only character that may not be found in a file path).
- Use -d1 to only get the cumulative disk usage of directories up to depth 1; depth 0 (.) is excluded with !/^[0-9]+\t\.$/ in awk.
- Use -l to make sure files' disk usage is accounted against every directory they're found in as an entry, not just the first.
Remove the echo (dry-run) to actually do it.
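For instance, assuming GNU du/gawk/xargs and a hypothetical scratch tree /tmp/foo2 (not from the question), the dry run prints what would be removed:

```shell
# Scratch tree: one tiny and one ~200K subdirectory (hypothetical paths)
mkdir -p /tmp/foo2/tiny /tmp/foo2/large
dd if=/dev/zero of=/tmp/foo2/large/f bs=1024 count=200 2>/dev/null

(
cd /tmp/foo2 &&
  du --block-size=1 -l0d1 |
  awk -v RS='\0' -v ORS='\0' '
    $1 < 50*1024 && !/^[0-9]+\t\.$/ && sub("^[^\t]+\t", "")' |
  xargs -r0 echo rm -rf --
)
```

The dry run should report only ./tiny; ./large and the depth-0 entry . are filtered out.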
Or with perl instead of gawk:
perl -0ne 'print $2 if m{(\d+)\t(.*)}s && $1 < 50<<10'
POSIXly, you'd need something like:
(
unset -v BLOCK_SIZE BLOCKSIZE DU_BLOCKSIZE
cd ~/foo &&
  LC_ALL=C POSIXLY_CORRECT=1 find . ! -name . -prune -type d -exec sh -c '
    for dir do
      du -s "$dir" | awk "{exit \$1 < 50*1024/512 ? 41 : 0}"
      [ "$?" -eq 41 ] && echo rm -rf "$dir"
    done' sh {} +
)
(The unset -v BLOCK_SIZE BLOCKSIZE DU_BLOCKSIZE and POSIXLY_CORRECT=1 are there for GNU du, to make sure it uses 512 as the block size as POSIX requires.)
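Since POSIX du reports disk usage in 512-byte units, the 50K threshold used in the awk comparison works out to a whole number of blocks:

```shell
# 50 KiB expressed in 512-byte blocks, as compared against $1 in the awk test
echo "$((50 * 1024 / 512)) blocks"
```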
Solution 3
I know this is kind of old, but here are my own $0.02 for doing this; I hope it might help someone else down the line. Using GNU parallel for much better parallel performance:
find . -type d | parallel du -s {} | sort -h
This will output all directory sizes under the current directory, sorted by size. To sort in reverse:
find . -type d | parallel du -s {} | sort -hr
Note that sort -h also works with du -h:
~ VirtualBox VMs $ find . -type d | parallel du -sh {} | sort -h
4.0K ./CentOS6/dir with spaces
4.0K ./TFE79/Snapshots
8.0K ./Desktop_default_1614944927311_69927/Logs
8.0K ./Desktop_default_1614945289369_20675/Logs
12K ./Desktop_default_1614944927311_69927
12K ./Desktop_default_1614945289369_20675
96K ./hello-world/Logs
108K ./hello-world
160K ./Knoppix/Logs
172K ./Desktop_default_1627485664080_37244/Logs
172K ./Knoppix
208K ./CentOS6/Logs
228K ./Flash/Logs
880K ./TFE8/Logs
980K ./TFE79/Logs
260M ./NomadOS
411M ./Desktop_default_1627485664080_37244/Snapshots
4.5G ./CentOS6
6.6G ./Flash
9.4G ./TFE8/Snapshots
13G ./TFE8
15G ./Desktop_default_1627485664080_37244
18G ./TFE79
56G .
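The human-numeric sort doing the ordering above can be checked on its own, with a few sample sizes taken from that listing (sort -h is a GNU extension):

```shell
# sort -h compares numbers with K/M/G suffixes by magnitude, not lexically
printf '96K\n4.0K\n13G\n260M\n12K\n' | sort -h
```

A plain sort would put 13G before 4.0K; sort -h orders 4.0K, 12K, 96K, 260M, 13G.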
Brian Fitzpatrick
I am a Lecturer in the mathematics department at Duke University.
Updated on September 18, 2022

Comments
- Brian Fitzpatrick, almost 2 years ago: I'm working in a directory ~/foo which has subdirectories ~/foo/alpha ~/foo/beta ~/foo/epsilon ~/foo/gamma. I would like to issue a command that checks the total size under each "level 1" subdirectory of ~/foo and deletes the directory along with its contents if the size is under a given amount. So, say I'd like to delete the directories whose contents have less than 50K. Issuing $ du -sh */ returns 8.0K alpha/ 114M beta/ 20K epsilon/ 1.2G gamma/ and I'd like my command to delete ~/foo/alpha and ~/foo/epsilon along with their contents. Is there such a command? I suspect this can be done with find somehow but I'm not quite sure how.
- lcd047, almost 9 years ago: @BrianFitzpatrick There is also ncdu, which can be useful occasionally.
- tripleee, about 7 years ago: This looks extremely complex and brittle. Usually the recommended approach is to handle everything related to file names inside the -exec; spaces are not the only problematic character, mind you (newlines are another common corner case, though less often encountered in reality).
- Stéphane Chazelas, over 2 years ago: Parallelizing tasks that are I/O bound is counterproductive. Also, running du for each dir means you're going to get the disk usage of the same files several times: du -s dir includes the disk usage reported by du -s dir/subdir. Run du without -s instead, without find. You'll need -h for du if you want human suffixes. So here, just du -lh | sort -rh (all those -l, -h being GNU extensions, and here assuming dir paths don't contain newline characters).
- Stéphane Chazelas, over 2 years ago: Your problem is that you used xargs without -d \\n as per the currently accepted answer (though to be fair, that was added after you posted your answer). -d is a GNU extension. If your xargs doesn't support it but supports -0 (another GNU extension, but a common one these days), you can use find... | awk... | tr '\n' '\0' | xargs -0 rm...
- Ole Tange, over 2 years ago: @StéphaneChazelas "Parallelizing tasks that are I/O bound is counterproductive." Not always. The answer is really: "it depends, so measure instead of assume". oletange.wordpress.com/2015/07/04/parallel-disk-io-is-it-faster
- BoeroBoy, over 2 years ago: In practice it works much better for me anyway. Once the metadata is cached, other threads that might re-use it speed up by a significant margin. 56 threads on this box and it's about 16x faster in most of my experience. In my case I needed to purge small or empty garbage dirs from a web crawler, so I left out the min/max depth options.