Find last occurrence of string in multiple files
Solution 1
Assuming GNU facilities:
find . -mtime -1 -exec bash -c \
'for f; do tac "$f" | grep -m1 fileprefix; done' _ {} +
Solution 2
If everything is in a single directory, you could do:
for file in *fileprefix*; do
grep 'search string' "$file" | tail -1
done
If these are large files, it might be worth speeding things up by using tac
to print the file in reverse order (last line first) and then grep -m1
to match the first occurrence. That way, you avoid having to read the whole file:
for file in *fileprefix*; do
tac "$file" | grep -m1 'search string'
done
Both of those assume there are no directories matching fileprefix. If there are, you'll get an error you can just ignore. If that's an issue, check for files only:
for file in *fileprefix*; do
[ -f "$file" ] && tac "$file" | grep -m1 'search string'
done
If you also need the file name printed, add -H to each grep invocation. Or, if your grep doesn't support it, tell it to also search through /dev/null. That won't change the output, but since grep is given multiple files, it will always print the file name for each hit:
for file in *fileprefix*; do
grep 'search string' "$file" /dev/null | tail -1
done
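If you want both the reversed-read speedup and the file name, the two tricks can be combined. This sketch assumes GNU grep, whose --label option supplies a file name for input read from a pipe (the file names here are invented):

```shell
for file in *fileprefix*; do
    # tac feeds grep through a pipe, so grep no longer knows the file
    # name; --label (GNU grep) supplies one, and -H forces it to print.
    [ -f "$file" ] && tac "$file" | grep -m1 -H --label="$file" 'search string'
done
```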
Solution 3
find . ! -name . -prune -mtime 1 -name 'fileprefix*' \
-exec sed -se'/searchstring/h;$!d;x' {} +
...will work if you have a GNU sed that supports the -s (separate files) option and a POSIX find.
You should probably add the ! -type d or -type f qualifiers, though, because trying to read a directory won't be very useful, and further narrowing the range to regular files could avoid a read hanging on a pipe or serial device file.
The logic is incredibly simple: sed overwrites its hold space with a copy of any input line which matches searchstring, then deletes from output all input lines but the last for each input file. When it gets to the last line, it exchanges its hold and pattern spaces, and so if searchstring was found at all while it read the file, the last such occurrence will be autoprinted to output; otherwise it writes a blank line. (Add /./!d to the tail of the sed script if that is undesirable.)
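A quick worked example of that hold-space dance (the file name and contents here are made up):

```shell
# Sample input (invented): two matching lines, the later one on line 3.
printf '%s\n' 'alpha searchstring 1' 'beta' \
    'gamma searchstring 2' 'delta' > sample.txt

# /searchstring/h  copy each matching line into the hold space
# $!d              delete every line but the last from output
# x                on the last line, swap hold and pattern spaces,
#                  so the last stored match is autoprinted
sed -e '/searchstring/h;$!d;x' sample.txt
# prints: gamma searchstring 2
```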
This will do a single sed invocation per some 65k input files - or whatever your ARG_MAX limit is. It should be a very performant solution, and it is quite simply implemented.
If you also want the filenames, given a recent GNU sed you can write them out on separate lines with the F command, or else you can get them printed by find in a separate list per batch by appending the -print primary after +.
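For instance, the -print variant might look like this (a sketch assuming GNU sed; the -mtime test is dropped here so it also runs on freshly created files, and /./!d suppresses the blank line for files with no match):

```shell
# sed prints the last matching line per file; find then prints the
# names of the files in each batch after the -exec ... + primary.
find . ! -name . -prune -type f -name 'fileprefix*' \
    -exec sed -se '/searchstring/h;$!d;x;/./!d' {} + -print
```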
Solution 4
How about:
find . -mtime -1 -name "fileprefix*" -exec sh -c \
'echo "$(grep "search string" "$1" | tail -n 1),$1"' _ {} \;
The above gives you a nice output with the last occurrence of the search string in each file, followed by the respective file name after the comma (modify the ",$1" part under echo to change the formatting, or remove it if unnecessary). Sample output, searching for the string '10' in files whose names start with "file", looks like this:
[dmitry@localhost sourceDir]$ find . -mtime -1 -name "file*" -exec sh -c 'echo "$(grep "10" "$1" | tail -n 1),$1"' _ {} \;
Another data 02 10,./file02.log
Some data 01 10,./file01.log
Yet another data 03 10,./file03.log
Solution 5
find . -mtime 1 -name 'fileprefix*' -exec grep -Hn 'search string' {} + |
sort -t: -k1,2 -n |
awk -F: '{key=$1 ; $1="" ; $2="" ; gsub(/^ /,"",$0); a[key]=$0}
END {for (key in a) { print key ":" a[key] }}'
This uses GNU grep's -H and -n options to always print both the filename and the line number of every match, then sorts by filename and line number, and pipes the result into awk, which stores the last match for each filename in an array and prints it at the end.
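On a couple of throwaway files (names and contents invented, and with the sort key spelled out as filename-then-line-number for clarity), the pipeline behaves like this:

```shell
# Invented sample data.
printf '%s\n' 'foo' 'match one' 'match two' > f1.log
printf '%s\n' 'match three' 'bar' > f2.log

grep -Hn 'match' f1.log f2.log |
  sort -t: -k1,1 -k2,2n |
  awk -F: '{key=$1; $1=""; $2=""; gsub(/^ +/,"",$0); a[key]=$0}
           END {for (key in a) print key ":" a[key]}'
# one line per file: f1.log:match two and f2.log:match three
# (awk for-in order is unspecified, so the two lines may swap)
```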
A fairly brute-force method, but it works.
Comments
-
Lokesh over 1 year
I need to search multiple log files (all files generated in last 24 hours, all kept in same directory) to find the last occurrence of a string. This is the command I wrote:
find . -mtime 1 | grep fileprefix | xargs grep 'search string' | tail -1
But this returns only last line for one file. Any suggestions on how to tweak this to get all the lines?
-
Admin over 8 years: did you try to invert tail and the last grep? find . -mtime 1 | grep fileprefix | xargs tail -1 | grep 'search string'
-
Lokesh over 8 years: Can you please elaborate on the purpose of 'bash -c \', as I am already using the bash shell? Also the purpose of '_ {} +' at the end.
-
Mathias Begert over 8 years: @Lokesh, you can get find to execute commands on files using -exec. With bash -c, we're spawning a bash shell that loops through the files found by find and executes tac .. | grep -m1 fileprefix on each
-
Mathias Begert over 8 years: @Lokesh, use -d" " with cut. Double quotes instead of single
-
Overmind Jiang over 8 years: The find command can filter for the file prefix; the grep shouldn't be needed for that. It's also surprising that the search string doesn't figure in this answer.
-
Mathias Begert over 8 years: @jonathanleffler, the OP needs the file contents examined for the string, not the file names
-
Gilles 'SO- stop being evil' over 8 years: “That way, you avoid having to read the whole file” — uh? No, you avoid reading the whole file in grep, but you put the whole file through tac instead. It isn't clear to me that this would be faster; it would depend on whether the match was near the beginning or the end of the file.
-
terdon over 8 years: @Gilles no, you don't put the whole file through tac either. It will exit as soon as the first match is found. I just tested with an 832M text file and a pattern found on the last line. grep -m 1 pattern file took ~7 seconds and tac file | grep -m1 pattern took 0.009.