Keeping the number of files in the folder constant by deleting old files
Solution 1
I ran the command:

```
find /mnt/md0/capture/DCN/ -maxdepth 1 -type f -name "*.pcap" -print0 |
    xargs -0 ls -lt | tail -n "$del" | awk '{print $8}'
```
The first problem I observed was that `awk '{print $8}'` prints the time, not the file name; `awk '{print $9}'` would fix that.
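As a quick sanity check, a throwaway directory (hypothetical path from `mktemp`) shows which `awk` field of `ls -l` output holds what on GNU `ls`:

```shell
# Sanity check in a throwaway directory (path comes from mktemp, so it
# is hypothetical): on GNU ls, field 8 of `ls -l` is the time and
# field 9 is the file name.
tmp=$(mktemp -d)
touch "$tmp/example.pcap"
ls -l "$tmp"/*.pcap | awk '{print $8}'   # the modification time, e.g. 12:00
ls -l "$tmp"/*.pcap | awk '{print $9}'   # the file name
rm -r "$tmp"
```

Note that the field positions shift as soon as a file name contains whitespace, which is exactly why the `awk` approach is fragile.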
Another problem is that `xargs` may run `ls -lt` several times, which would give you several sorted lists of files one after the other; the whole list would not be sorted.
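That batching behaviour is easy to see on toy data by forcing tiny batches with `-n` (`echo` standing in for `ls -lt`):

```shell
# Toy demonstration: xargs may split its input across several runs of
# the command, so each run is processed separately. Forcing batches of
# two with -n makes that visible (echo stands in for ls -lt).
printf '%s\n' c a d b | xargs -n 2 echo
# prints "c a" then "d b": each batch on its own line, never globally sorted
```

In the real pipeline the split happens implicitly whenever the argument list exceeds the system limit, so the bug only appears once the directory is large.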
But there are other simplifications one could make. You can get the oldest files with:

```
ls -dt /mnt/md0/capture/DCN/*.pcap | tail -n "$del"
```

This assumes, as your post seemed to, that the file names have no space, tab, or newline characters in them.
So the full command for deleting the oldest `$del` files could be:

```
ls -dt /mnt/md0/capture/DCN/*.pcap | tail -n "$del" | xargs rm
```
MORE: If your file names may contain spaces, tabs, backslashes, or quotes (but not newlines), use (assuming GNU `ls` 4.0 (1998) or newer):

```
ls -dt --quoting-style=shell-always /mnt/md0/capture/DCN/*.pcap |
    tail -n "$del" | xargs rm
```
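A minimal sketch of why the shell-quoted output survives `xargs` (assumes GNU `ls`; the temporary directory and file name are made up, and `printf` stands in for `rm`):

```shell
# Sketch (assumes GNU ls): --quoting-style=shell-always wraps each name
# in single quotes, which plain xargs understands, so a name containing
# a space survives as a single argument. printf stands in for rm.
tmp=$(mktemp -d)
touch "$tmp/a file.pcap"
ls -d --quoting-style=shell-always "$tmp"/*.pcap |
  xargs printf '[%s]\n'   # one bracketed argument despite the space
rm -r "$tmp"
```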
Solution 2
For those not wanting to make assumptions on the names of the files:
With `zsh`:

```
#! /bin/zsh -
keep=5000
rm -f /mnt/md0/capture/DCN/*.pcap(D.om[$((keep+1)),-1])
```
That's using `zsh` globbing qualifiers:

- `D`: includes hidden files (Dot files)
- `.`: only regular files (like `find`'s `-type f`)
- `om`: reverse order on the age (based on modification time)
- `[$((keep+1)),-1]`: only include the 5001st to the last
(It may fail if the list of files to remove is very big, in which case you may want to use `zargs` to split it, or enable `zsh`'s builtin `rm` with `zmodload zsh/files`.)
With relatively recent versions of GNU tools:
```
cd /mnt/md0/capture/DCN/ &&
  find . -maxdepth 1 -name '*.pcap' -type f -printf '%T@@%p\0' |
    sort -zrn | sed -z "s/[^@]*@//;1,$keep d" | xargs -r0 rm -f
```
(assuming GNU `sed` 4.2.2 or above (2012) and GNU `sort` 1.14 or above (1996) for `-z`)
`find` builds a NUL-delimited list of file names, each with its Unix timestamp prepended (like `1390682991.0859627500@./file`), which is sorted by `sort`. `sed` removes the timestamp and prints only from the 5001st record on. That's passed as arguments to `rm` using `xargs -r0`.
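The prefix-sort-strip idea can be sketched on made-up `timestamp@name` records (newline-delimited here for readability; the real pipeline uses NUL delimiters with `sort -z`/`sed -z`):

```shell
# Toy reconstruction of the prefix/sort/strip idea on made-up
# timestamp@name records (newline-delimited for readability):
keep=2
printf '%s\n' '3@old' '9@newest' '1@oldest' '5@newer' |
  sort -t @ -k1,1rn |           # newest first, like sort -zrn
  sed "s/[^@]*@//;1,${keep}d"   # strip the timestamp, drop the $keep newest
# leaves only the names of the oldest records: "old" then "oldest"
```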
or (with any version of GNU tools):
```
cd /mnt/md0/capture/DCN/ &&
  find . -maxdepth 1 -name '*.pcap' -type f -printf '%T@@%p\0' |
    tr '\0\n' '\n\0' | sort -rn | tail -n "+$(($keep+1))" |
    cut -d @ -f2- | tr '\0\n' '\n\0' | xargs -r0 rm -f
```
Same, except that we're using `cut` to remove the timestamp and `tail` to select the lines starting from the 5001st. Because GNU `cut` and `tail` don't support `-z` to work on NUL-delimited records, we use `tr` to swap the newline and NUL characters before and after feeding the data to them.
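A tiny round trip of that swap on toy NUL-delimited data (with `printf '[%s]'` standing in for `rm -f`):

```shell
# Toy round trip of the NUL/newline swap: the first tr puts each record
# on its own line so a line tool (sort here) can process it; the second
# tr restores NUL delimiters for xargs -0 (printf stands in for rm -f).
printf 'b\0a\0c\0' | tr '\0\n' '\n\0' | sort | tr '\0\n' '\n\0' |
  xargs -0 printf '[%s]'
# prints [a][b][c]
```

The swap is safe because a file name can contain newlines but never a NUL, so the two character sets never collide.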
With GNU `ls` (4.0 (1998) or above) and `bash`:

```
shopt -s dotglob
cd /mnt/md0/capture/DCN/ &&
  eval "files=($(ls -dt --quoting-style=shell-always -- *.pcap))" &&
  rm -f -- "${files[@]:$keep}"
```
(That may also fail if the list of files is big. Also note that it may include non-regular `.pcap` files, since there's no `-type f`.)
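A toy illustration of that array slice (bash; made-up element names standing in for the sorted file list):

```shell
# Toy illustration (bash) of the array slice: "${files[@]:$keep}"
# expands to the elements from index $keep onward, i.e. everything
# past the $keep newest entries.
keep=2
files=(newest newer old oldest)     # stand-in for the ls -t output
printf '%s\n' "${files[@]:$keep}"   # prints "old" then "oldest"
```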
Standardly/POSIXly/portably, that's a lot trickier:
```
cd /mnt/md0/capture/DCN/ &&
  ls -dt ./*.pcap | awk -v keep="$keep" '
    function process() {
      if (++n > keep) {
        gsub(/[ \t\n"\\'\'']/, "\\\\&", file)
        print file
        file = ""
      }
    }
    /\// {
      if (NR > 1) process()
      file = $0
      next
    }
    {file = file "\n" $0}
    END {if (NR > 0) process()}' | xargs rm -f
```
(again, you may reach the limit of the number of arguments, and it does not check for regular files).
The tricky bit there is handling file names that contain newline characters. Above, we pass `./*` to `ls`, which means a `/` is included exactly once in each file name; we use that in `awk` to identify on which line each file name starts, and then we know which newline characters (in addition to all the other characters special to `xargs`) to escape for `xargs`.
Solution 3
Assuming none of the file names contain space, tab, newline, single quote, double quote, or backslash characters, this deletes the oldest files above `limit`:
```
mkdir t && cd t
# 50500 files, 500 to delete
touch {000001..050500}
limit=50000
ls -t | tail -n "+$(($limit + 1))" | xargs rm
ls | wc -l   # 50000
```
`tail -n +50001` shows the files above the limit.
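A toy run of that `tail -n +N` selection, with numbers standing in for file names:

```shell
# Toy run of the tail -n +N selection: with limit=3, +$((limit + 1))
# starts printing at line 4, i.e. everything above the limit.
limit=3
printf '%s\n' 1 2 3 4 5 | tail -n "+$(($limit + 1))"
# prints 4 then 5
```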
Comments
-
eric moon over 1 year
I'm trying to create a script and run it from crontab every 5 minutes, so that the number of files in a folder always remains 50000. If there are more, I want the script to delete the old files.

```
#!/bin/bash
LIMIT=500000
NO=0
#Get the number of files, that has `*.pcap` in its name, with last modified time 5 days ago
NUMBER=$(find /mnt/md0/capture/DCN/ -maxdepth 1 -name "*.pcap" | wc -l)
if [[ $NUMBER -gt $LIMIT ]] #if number greater than limit
then
    del=$(($NUMBER - $LIMIT))
    if [ "$del" -lt "$NO" ]
    then
        del=$(($del * -1))
    fi
    echo $del
    FILES=$(find /mnt/md0/capture/DCN/ -maxdepth 1 -type f -name "*.pcap" -print0 |
        xargs -0 ls -lt | tail -$del | awk '{print $8}')
    rm -f ${FILES[@]} #delete the originals
fi
```

It doesn't really work; it doesn't run because the number of files is too large. Is there any other method to get this done?
-
Timo over 10 years
You seem to have a typo in your script: `tai` should be `tail`. Also, if any of the file names has a newline in it, `ls` piped into `tail` is not going to work.
-
eric moon over 10 years
@Timo the file names don't have any newline characters, and the script running on the Linux system doesn't have any typos (thanks for pointing it out, though :))
-
eric moon over 10 years
Your first solution doesn't work, as Linux says 'zsh no interpreter found'. I also tried the other two; they didn't run either.
-
John1024 over 10 years
@JishnuUNair I ran `ls -t` on a directory with 50,000 files with no problem using the default stack size of 8192. After reducing the stack size with `ulimit -s 100`, `ls -t` on 50,000 files still worked fine.
-
Stéphane Chazelas over 10 years
It's not only newlines that are a problem with `xargs`; space, tab, single quote, double quote, and backslash are as well. Also note that `ls` might fail to execute if the list of files is big.
-
Stéphane Chazelas over 10 years
Command substitution doesn't split on newline, it splits on blanks (it also performs filename generation).
-
Graeme over 10 years
@Jishnu, you might want to install `zsh` before you try to use it...
-
John1024 over 10 years
@StephaneChazelas OK, I added a solution for spaces, tabs, single quotes, double quotes, and backslashes. In my tests, `ls` worked fine on the 50,000+ files that the OP wanted. Under what circumstances will it fail?
-
John1024 over 10 years
For those trying this, note that Debian stable (Wheezy) has `sed` version 4.2.1, which does not offer `-z`.
-
Stéphane Chazelas over 10 years
Like all solutions that have a shell expand a wildcard in the arguments to a command, it may reach the limit on the size of the arguments plus environment passed to a command. That will depend on the number of arguments and their size, the system, and the version.
-
John1024 over 10 years
@StephaneChazelas As always, thanks for the info and also the edit.