Command to delete directories whose contents are less than a given size


Solution 1

With GNU find and GNU coreutils, and assuming your directories don't have newlines in their names:

find ~/foo -mindepth 1 -maxdepth 1 -type d -exec du -ks {} + | awk '$1 <= 50' | cut -f 2-

This will list directories with total contents smaller than 50K. If you're happy with the results and you want to delete them, add | xargs -d '\n' rm -rf to the end of the command line.
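As a concrete sketch of the pipeline above, using a hypothetical throwaway tree under /tmp/foo-demo instead of ~/foo (the names and sizes are made up for illustration; the final xargs stage is shown only as a comment so nothing is deleted):

```shell
#!/bin/sh
# Hypothetical demo tree: one subdirectory well under 50K, one over
mkdir -p /tmp/foo-demo/small /tmp/foo-demo/big
printf 'x' > /tmp/foo-demo/small/tiny
dd if=/dev/zero of=/tmp/foo-demo/big/blob bs=1024 count=100 2>/dev/null

# List level-1 subdirectories whose disk usage is at most 50K
result=$(find /tmp/foo-demo -mindepth 1 -maxdepth 1 -type d -exec du -ks {} + |
  awk '$1 <= 50' | cut -f 2-)
printf '%s\n' "$result"

# To delete instead of list, pipe through: | xargs -d '\n' rm -rf
```

Only the `small` directory should be printed; appending the xargs stage would then remove exactly the listed directories.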

Solution 2

With the GNU implementations of du, awk and xargs, to work with arbitrary file names, you do:

(
  cd ~/foo &&
    du --block-size=1 -l0d1 |
      awk -v RS='\0' -v ORS='\0' '
        $1 < 50*1024 && !/^[0-9]+\t\.$/ && sub("^[^\t]+\t", "")' |
      xargs -r0 echo rm -rf --
)

That is:

  • specify a block size, as otherwise which one GNU du uses depends on the environment. 1 guarantees you get the maximum precision (you get disk usage as a number of bytes).
  • Use -0 to work with NUL-delimited records (NUL being the only character that may not occur in a file path).
  • -d1 to only get the cumulative disk usage of directories up to depth 1 (depth 0 (.) is excluded with !/^[0-9]+\t\.$/ in awk).
  • -l to make sure files' disk usage is counted against every directory they're found in as an entry, not just the first.

Remove the echo (dry-run) to actually do it.
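To see the NUL-safe pipeline in action on a hypothetical throwaway tree (the trailing xargs -r0 -n1 echo stage just prints one selected path per line instead of removing anything):

```shell
#!/bin/sh
# Hypothetical demo tree; names and sizes are made up for illustration
mkdir -p /tmp/foo-nul/small /tmp/foo-nul/big
printf 'x' > /tmp/foo-nul/small/tiny
dd if=/dev/zero of=/tmp/foo-nul/big/blob bs=1024 count=100 2>/dev/null

# Same du | awk stages as above (GNU du and gawk assumed);
# xargs -r0 -n1 echo prints each selected NUL-delimited path on its own line
result=$(
  cd /tmp/foo-nul &&
    du --block-size=1 -l0d1 |
      awk -v RS='\0' -v ORS='\0' '
        $1 < 50*1024 && !/^[0-9]+\t\.$/ && sub("^[^\t]+\t", "")' |
      xargs -r0 -n1 echo
)
printf '%s\n' "$result"
```

Only ./small should appear; ./big is over the 50*1024-byte threshold and . is excluded by the regex.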

Or with perl instead of gawk (using -0lne so the output records are also NUL-delimited for the xargs stage):

perl -0lne 'print $2 if m{^(\d+)\t(.*)}s && $1 < 50<<10'
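Hooked into the same pipeline, on a hypothetical /tmp tree (note that unlike the awk stage, this one-liner does not filter out the depth-0 "." entry; in this demo "." is over the threshold anyway):

```shell
#!/bin/sh
# Hypothetical demo tree for the perl variant
mkdir -p /tmp/foo-perl/small /tmp/foo-perl/big
printf 'x' > /tmp/foo-perl/small/tiny
dd if=/dev/zero of=/tmp/foo-perl/big/blob bs=1024 count=100 2>/dev/null

# -0 reads NUL-delimited records, -l chomps them and makes print
# append a NUL; 50<<10 == 50*1024 bytes
result=$(
  cd /tmp/foo-perl &&
    du --block-size=1 -l0d1 |
      perl -0lne 'print $2 if m{^(\d+)\t(.*)}s && $1 < 50<<10' |
      xargs -r0 -n1 echo
)
printf '%s\n' "$result"
```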

POSIXly, you'd need something like:

(
  unset -v BLOCK_SIZE BLOCKSIZE DU_BLOCKSIZE
  cd ~/foo &&
   LC_ALL=C POSIXLY_CORRECT=1 find . ! -name . -prune -type d -exec sh -c '
     for dir do
       du -s "$dir" | awk "{exit (\$1 < 50*1024/512) ? 41 : 0}"
       [ "$?" -eq 41 ] && echo rm -rf "$dir"
     done' sh {} +
)

(the unset -v BLOCK_SIZE BLOCKSIZE DU_BLOCKSIZE and the POSIXLY_CORRECT=1 are there to make sure GNU du uses 512-byte blocks, as POSIX requires).
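A self-contained dry run of the POSIX variant on a hypothetical /tmp tree (the inner awk script is double-quoted so it survives the single-quoted sh -c script; the echo keeps it from deleting anything):

```shell
#!/bin/sh
# Hypothetical demo tree
mkdir -p /tmp/foo-posix/small /tmp/foo-posix/big
printf 'x' > /tmp/foo-posix/small/tiny
dd if=/dev/zero of=/tmp/foo-posix/big/blob bs=1024 count=100 2>/dev/null

# With POSIXLY_CORRECT, du -s reports 512-byte blocks, so the 50K
# threshold becomes 50*1024/512 = 100 blocks
result=$(
  unset -v BLOCK_SIZE BLOCKSIZE DU_BLOCKSIZE
  cd /tmp/foo-posix &&
    LC_ALL=C POSIXLY_CORRECT=1 find . ! -name . -prune -type d -exec sh -c '
      for dir do
        du -s "$dir" | awk "{exit (\$1 < 50*1024/512) ? 41 : 0}"
        if [ "$?" -eq 41 ]; then echo rm -rf "$dir"; fi
      done' sh {} +
)
printf '%s\n' "$result"
```

Only "rm -rf ./small" should be printed; drop the echo to actually delete.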

Solution 3

I know this is kind of old, but here's my $0.02 in the hope it helps someone else down the line. Using GNU parallel for much better parallel performance:

find . -type d | parallel du -s {} | sort -h

This will output the sizes of all directories under the current working directory, sorted by size. To sort in reverse:

find . -type d | parallel du -s {} | sort -hr

Note that sort -h also works with du -h:

~  VirtualBox VMs  $  find . -type d | parallel du -sh {} | sort -h
4.0K    ./CentOS6/dir with spaces
4.0K    ./TFE79/Snapshots
8.0K    ./Desktop_default_1614944927311_69927/Logs
8.0K    ./Desktop_default_1614945289369_20675/Logs
12K     ./Desktop_default_1614944927311_69927
12K     ./Desktop_default_1614945289369_20675
96K     ./hello-world/Logs
108K    ./hello-world
160K    ./Knoppix/Logs
172K    ./Desktop_default_1627485664080_37244/Logs
172K    ./Knoppix
208K    ./CentOS6/Logs
228K    ./Flash/Logs
880K    ./TFE8/Logs
980K    ./TFE79/Logs
260M    ./NomadOS
411M    ./Desktop_default_1627485664080_37244/Snapshots
4.5G    ./CentOS6
6.6G    ./Flash
9.4G    ./TFE8/Snapshots
13G     ./TFE8
15G     ./Desktop_default_1627485664080_37244
18G     ./TFE79
56G     .
Author: Brian Fitzpatrick

I am a Lecturer in the mathematics department at Duke University.

Updated on September 18, 2022

Comments

  • Brian Fitzpatrick, almost 2 years ago

    I'm working in a directory ~/foo which has subdirectories

    ~/foo/alpha
    ~/foo/beta
    ~/foo/epsilon
    ~/foo/gamma
    

    I would like to issue a command that checks the total size under each "level 1" subdirectory of ~/foo and deletes the directory along with its contents if the size is under a given amount.

So, say I'd like to delete the directories whose contents total less than 50K. Issuing $ du -sh */ returns

    8.0K alpha/
    114M beta/
    20K  epsilon/
    1.2G gamma/
    

I'd like my command to delete ~/foo/alpha and ~/foo/epsilon along with their contents. Is there such a command? I suspect this can be done with find somehow but I'm not quite sure how.

  • lcd047, almost 9 years ago
    @BrianFitzpatrick There is also ncdu that can be useful occasionally.
  • tripleee, about 7 years ago
    This looks extremely complex and brittle. Usually the recommended approach is to handle everything related to file name handling inside the -exec; spaces are not the only problematic character, mind you (newlines are another common corner case, though it's less often encountered in reality).
  • Stéphane Chazelas, over 2 years ago
    Parallelizing I/O-bound tasks is counterproductive. Also, running du for each dir means you're going to count the disk usage of the same files several times: du -s dir includes the disk usage reported by du -s dir/subdir. Run du without -s instead, without find. You'll need -h for du if you want human-readable suffixes. So here just du -lh | sort -rh (all those -l, -h being GNU extensions, and here assuming dir paths don't contain newline characters).
  • Stéphane Chazelas, over 2 years ago
    Your problem is that you used xargs without -d '\n' as per the currently accepted answer (though to be fair, it was added after you posted your answer). -d is a GNU extension. If your xargs doesn't support it but supports -0 (another GNU extension, though a common one these days), you can use find... | awk... | tr '\n' '\0' | xargs -0 rm...
  • Ole Tange, over 2 years ago
    @StéphaneChazelas "Parallelizing I/O-bound tasks is counterproductive." Not always. The answer is really: "it depends, so measure instead of assume". oletange.wordpress.com/2015/07/04/parallel-disk-io-is-it-faster
  • BoeroBoy, over 2 years ago
    In practice it works much better for me anyway. Once the metadata is cached, other threads that might re-use it speed up by a significant margin. 56 threads on this box and it's about 16x faster in most of my experience. In my case I needed to purge small or empty garbage dirs from a web crawler, so I left the full min/max depth.