Keeping the number of files in the folder constant by deleting old files


Solution 1

I ran the command:

find /mnt/md0/capture/DCN/ -maxdepth 1 -type f -name "*.pcap" -print0 |
  xargs -0 ls -lt | tail -n "$del" | awk '{print $8}'

The problem that I observed was that awk '{print $8}' prints the time, not the file name. awk '{print $9}' would solve that.
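
For reference, the same pipeline with the corrected field is shown below. This is only a sketch: it still has the xargs batching problem described next, and still assumes file names without blanks:

find /mnt/md0/capture/DCN/ -maxdepth 1 -type f -name "*.pcap" -print0 |
  xargs -0 ls -lt | tail -n "$del" | awk '{print $9}'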

Another problem is that xargs may run ls -lt several times (whenever the argument list grows too long), which would give you several independently sorted lists of files one after the other; the list as a whole would not be sorted.

But there are other simplifications one could make. You can get the oldest files with:

ls -dt /mnt/md0/capture/DCN/*.pcap | tail -n "$del"

This assumes, as your post seemed to, that the file names have no spaces, tabs, or newline characters in them.

So, the full command for deleting the oldest $del files could be:

ls -dt /mnt/md0/capture/DCN/*.pcap | tail -n "$del" | xargs rm

MORE: If your file names may contain spaces, tabs, backslashes, or quotes (but not newlines), use this instead (assuming GNU ls 4.0 (1998) or newer):

ls -dt --quoting-style=shell-always /mnt/md0/capture/DCN/*.pcap |
  tail -n "$del" | xargs rm
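
As a quick illustration of why that combination is safe: with shell-always quoting, ls wraps each name in single quotes, which xargs's default input parsing strips again before passing the name to rm (the file name here is hypothetical):

touch 'a b.pcap'
ls -dt --quoting-style=shell-always -- 'a b.pcap'   # prints: 'a b.pcap'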

Solution 2

For those not wanting to make assumptions on the names of the files:

With zsh:

#! /bin/zsh -
keep=5000
rm -f /mnt/md0/capture/DCN/*.pcap(D.om[$((keep+1)),-1])

That's using zsh globbing qualifiers:

  • D: includes hidden files (Dot files).
  • .: only regular files (like find's -type f)
  • om: order by modification time, newest first (so the oldest files sort last)
  • [$((keep+1)),-1]: only include matches from the (keep+1)th, here the 5001st, to the last; that is, everything except the $keep newest files.

(It may fail if the list of files to remove is very big, in which case you may want to use zargs to split it, or enable zsh's builtin rm with zmodload zsh/files.)
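
For instance, a minimal sketch of the zmodload variant mentioned above: the builtin rm runs inside the shell, so there is no execve() and no argument-list size limit to hit.

#! /bin/zsh -
keep=5000
zmodload zsh/files   # make rm a shell builtin
rm -f /mnt/md0/capture/DCN/*.pcap(D.om[$((keep+1)),-1])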

With relatively recent versions of GNU tools:

cd /mnt/md0/capture/DCN/ &&
  find . -maxdepth 1 -name '*.pcap' -type f -printf '%T@@%p\0' |
    sort -zrn | sed -z "s/[^@]*@//;1,$keep d" | xargs -r0 rm -f

(assuming GNU sed 4.2.2 or above (2012) for -z, GNU sort 1.14 or above (1996) for -z)

find builds a NUL-delimited list of file names, each with a Unix timestamp prepended (like 1390682991.0859627500@./file), which sort orders numerically, newest first. sed then strips the timestamp from each record and deletes the first $keep records, leaving only the names of the files to remove. Those are passed as arguments to rm using xargs -r0.

or (with any version of GNU tools):

cd /mnt/md0/capture/DCN/ &&
  find . -maxdepth 1 -name '*.pcap' -type f -printf '%T@@%p\0' |
    tr '\0\n' '\n\0' | sort -rn | tail -n "+$(($keep+1))" |
    cut -d @ -f2- | tr  '\0\n' '\n\0' | xargs -r0 rm -f

Same, except that we're using cut to remove the timestamp and tail to select the records starting from the 5001st. Because, at the time this was written, GNU cut and tail didn't support a -z option to work on NUL-delimited records, we use tr to swap the newline and NUL characters before and after feeding the data to them.

With GNU ls (4.0 (1998) or above), and bash:

shopt -s dotglob
cd /mnt/md0/capture/DCN/ &&
  eval "files=($(ls -dt --quoting-style=shell-always -- *.pcap))" &&
  rm -f -- "${files[@]:$keep}"

(That also may fail if the list of files is big. Also note that it may include non-regular files ending in .pcap, as there is no equivalent of find's -type f here.)
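
If the missing -type f check matters, one possible sketch (reusing the files array built by the eval above, with $keep set as before) is to filter out non-regular files before discarding everything past the $keep newest:

regular=()
for f in "${files[@]}"; do
  [[ -f $f && ! -L $f ]] && regular+=("$f")   # keep regular files, skip symlinks
done
rm -f -- "${regular[@]:$keep}"

Since ls -t already sorted the names newest first, the filtered array stays sorted, and "${regular[@]:$keep}" is still the list of the oldest files beyond the limit.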

Standardly/POSIXly/portably, that's a lot trickier:

cd /mnt/md0/capture/DCN/ &&
  ls -dt ./.pcap ./.*.pcap ./*.pcap | awk -v keep="$keep" '
    function process() {
      if (++n > keep) {
        gsub(/[ \t\n"\\'\'']/,"\\\\&", file)
        print file
        file = ""
      }
    }
    /\// {
      if (NR > 1) process()
      file=$0
      next
    }
    {file = file "\n" $0}
    END {if (NR > 0) process()}' | xargs rm -f

(again, you may reach the limit of the number of arguments, and it does not check for regular files).

The tricky bit there is handling file names with newline characters. Above, we pass paths starting with ./ to ls, so the first line of every file name contains a /. awk uses that to identify on which line each file name starts, and therefore which newline characters are part of a name and need escaping for xargs (in addition to all the other characters special to xargs).
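
A small illustration with a hypothetical file name containing a newline (the | cat forces ls to write to a pipe, where it prints names raw rather than quoted):

touch './a
b.pcap'
ls -d ./*.pcap | cat
# ./a       <- contains a "/": a new file name starts here
# b.pcap    <- no "/": continuation of the previous name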

Solution 3

Assuming none of the file names contain space, tab, newline, single quote, double quote or backslash characters, this deletes the oldest files above the limit:

mkdir t && cd t

# 50500 files, 500 to delete
touch {000001..050500}

limit=50000

ls -t | tail -n "+$(($limit + 1))" | xargs rm

ls | wc -l
50000

tail -n +50001 prints from the 50001st line onwards, i.e. the files above the limit.
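
To run this every 5 minutes as the question asks, a crontab entry could look like the following sketch. It inherits the same assumptions about file names; xargs -r (a GNU extension) avoids running rm when there is nothing to delete.

*/5 * * * * cd /mnt/md0/capture/DCN && ls -t | tail -n +50001 | xargs -r rm -f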


Comments

  • eric moon (over 1 year)

    I'm trying to create a script and run it in crontab every 5 min, so that the number of files in a folder always remain 50000. If there are more, I want the script to delete the old files.

    #!/bin/bash
    LIMIT=500000
    NO=0
    #Get the number of files that have `*.pcap` in their name, with last modified time 5 days ago
    
    NUMBER=$(find /mnt/md0/capture/DCN/ -maxdepth 1 -name "*.pcap" |wc -l)
    if [[ $NUMBER -gt $LIMIT ]]  #if number greater than limit
     then
      del=$(($NUMBER-$LIMIT))
       if [ "$del" -lt "$NO" ]
        then
         del=$(($del*-1))
       fi
       echo $del
       FILES=$(
         find /mnt/md0/capture/DCN/ -maxdepth 1 -type f -name "*.pcap" -print0 |
           xargs -0 ls -lt |
           tail -$del |
           awk '{print $8}'
       )
      rm -f ${FILES[@]}
      #delete the originals
    
     fi
    

    It doesn't really work: it doesn't run, as the number of files is too large. Is there any other method to get this done?

    • Timo (over 10 years)
      You seem to have a typo in your script: tai should be tail. Also, if any of the file names has a newline in it, piping ls into tail is not going to work.
    • eric moon (over 10 years)
      @Timo The file names don't have any newline characters, and the script running on the Linux system doesn't have any typos (thanks for pointing it out though :))
  • eric moon (over 10 years)
    Your first solution doesn't work, as Linux says 'zsh no interpreter found'. I also tried the other two; they didn't run either.
  • John1024 (over 10 years)
    @JishnuUNair I ran ls -t on a directory with 50,000 files with no problem using the default stack size of 8192. After reducing the stack size with ulimit -s 100, ls -t on 50,000 files still worked fine.
  • Stéphane Chazelas (over 10 years)
    It's not only newlines that are a problem with xargs, it's space, tab, single quote, double quote and backslash as well. Also note that ls might fail to execute if the list of files is big.
  • Stéphane Chazelas (over 10 years)
    Command substitution doesn't split only on newlines, it splits on all blanks (it also performs filename generation).
  • Graeme (over 10 years)
    @Jishnu, you might want to install zsh before you try to use it...
  • John1024 (over 10 years)
    @StephaneChazelas OK, I added a solution for spaces, tabs, single quotes, double quotes and backslashes. In my tests, ls worked fine on the 50,000+ files that OP wanted. Under what circumstances will it fail?
  • John1024 (over 10 years)
    For those trying this, note that Debian stable (Wheezy) ships sed 4.2.1, which does not offer -z.
  • Stéphane Chazelas (over 10 years)
    Like all solutions that have a shell expand a wildcard in the arguments to a command, it may reach the limit on the size of the arguments plus environment passed to a command. That will depend on the number of arguments and their size, and on the system and its version.
  • John1024 (over 10 years)
    @StephaneChazelas As always, thanks for the info and also the edit.