Find last occurrence of string in multiple files

Solution 1

Assuming GNU facilities:

find . -mtime -1 -exec bash -c \
'for f; do tac "$f" | grep -m1 fileprefix; done' _ {} +
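
To unpack the idiom (see also the comments below): {} + makes find pass the file names to bash in batches as positional parameters, _ is a throwaway value that fills $0, and a bare "for f" loops over "$@". The same command, annotated:

find . -mtime -1 -exec bash -c '
    # "$@" holds one batch of file names collected by {} +
    for f; do
        # reverse the file and stop at the first match,
        # i.e. the last match in the original order
        tac "$f" | grep -m1 fileprefix
    done' _ {} +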

Solution 2

If everything is in a single directory, you could do:

for file in *fileprefix*; do
    grep 'search string' "$file" | tail -1
done

If these are large files, it might be worth speeding things up by using tac to print the file in reverse order (last line first) and then grep -m1 to match the first occurrence. That way, you avoid reading the whole file whenever the match is near its end (see the comments below for a benchmark):

for file in *fileprefix*; do
    tac "$file" | grep -m1 'search string'
done

Both of those assume there are no directories matching fileprefix. If there are, you'll get an error you can just ignore. If that's an issue, check for files only:

for file in *fileprefix*; do
    [ -f "$file" ] && tac "$file" | grep -m1 'search string'
done

If you also need the file name printed, add -H to each grep invocation. Or, if your grep doesn't support it, tell it to also search through /dev/null. That won't change the output, but since grep is then given multiple files, it will always print the file name for each hit:

for file in *fileprefix*; do
    grep 'search string' "$file" /dev/null | tail -1
done
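
Note that -H won't help with the tac variant, since grep reads from a pipe there and would only ever label matches as (standard input). A sketch that prints the name itself instead:

for file in *fileprefix*; do
    [ -f "$file" ] || continue
    # print the file name only when grep actually finds a match
    match=$(tac "$file" | grep -m1 'search string') &&
        printf '%s:%s\n' "$file" "$match"
done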

Solution 3

find . ! -name . -prune -mtime 1 -name 'fileprefix*' \
     -exec sed -se'/searchstring/h;$!d;x' {} +

...will work if you have a GNU sed that supports the -s (--separate) option and a POSIX find.

You should probably add the ! -type d or the -type f qualifier, though, because trying to read a directory won't be very useful, and further narrowing the range to regular files could avoid a read hanging on a pipe or serial device file.

The logic is incredibly simple: sed overwrites its hold space with a copy of any input line that matches searchstring, then deletes all but the last input line of each file from the output. When it gets to the last line, it exchanges its hold and pattern spaces, so if searchstring was found at all while reading the file, the last such occurrence is autoprinted to the output; otherwise it writes a blank line (add /./!d to the tail of the sed script if that is undesirable).
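
The same script spelled out with comments, as a sketch (including the optional /./!d mentioned above):

sed -s '
    # overwrite the hold space with each line matching searchstring
    /searchstring/h
    # delete every line but the last one of each input file
    $!d
    # on the last line, exchange the hold and pattern spaces
    x
    # optional: suppress the blank line when nothing matched
    /./!d
' fileprefix*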

This will do a single sed invocation per batch of some 65k input files, or however many your ARG_MAX limit allows. It should be a very performant solution, and it is quite simply implemented.

If you also want the filenames, given a recent GNU sed you can write them out to separate lines with the F command, or else you can get them printed by find in a separate list per batch by appending the -print primary after +.
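
A sketch of the -print variant, adding the -type f suggested earlier and the optional /./!d; find emits the file names as a separate list from sed's matches:

find . ! -name . -prune -type f -mtime 1 -name 'fileprefix*' \
     -exec sed -se'/searchstring/h;$!d;x;/./!d' {} + -print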

Solution 4

How about:

find . -mtime -1 -name "fileprefix*" -exec sh -c \
'echo "$(grep "search string" "$1" | tail -n 1),$1"' _ {} \;

The above gives you a nice output with the last occurrence of the search string in each file, followed by the respective file name after the comma (modify the ",$1" part under echo to change the formatting, or remove it if unnecessary). Sample output searching for the string '10' in files with a "file" name prefix:

[dmitry@localhost sourceDir]$ find . -mtime -1 -name "file*" -exec  sh -c 'echo "$(grep '10' $1 | tail -n 1),$1"' _ {} \;
Another data 02 10,./file02.log
Some data 01 10,./file01.log
Yet another data 03 10,./file03.log 

Solution 5

find . -mtime 1 -name 'fileprefix*' -exec grep -Hn 'search string' {} + |
    sort -t: -k1,1 -k2,2n |
    awk -F: '{key=$1; sub(/^[^:]*:[^:]*:/, ""); a[key]=$0}
             END {for (key in a) print key ":" a[key]}'

This uses GNU grep's -H and -n options to always print both the file name and the line number of all matches, then sorts by file name and line number, and pipes the result into awk, which stores the last match for each file name in an array and eventually prints it.
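
To make the data flow concrete, here is roughly what the intermediate and final output would look like, using the sample files from Solution 4 with hypothetical line numbers:

# sorted grep -Hn output (file:line:match); awk overwrites a[file] per line
./file01.log:3:Some data 01 10
./file01.log:9:Some data 01 10
./file02.log:5:Another data 02 10
# the END loop then prints one line per file, the last match having won
./file01.log:Some data 01 10
./file02.log:Another data 02 10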

A fairly brute-force method, but it works.

Comments

  • Lokesh
    Lokesh over 1 year

    I need to search multiple log files (all files generated in last 24 hours, all kept in same directory) to find the last occurrence of a string. This is the command I wrote:

    find . -mtime 1 | grep fileprefix | xargs grep 'search string' | tail -1
    

    But this returns only the last matching line from one file. Any suggestions on how to tweak this to get all the lines?

  • Lokesh
    Lokesh over 8 years
    Can you please elaborate on the purpose of 'bash -c \' as I am already using the bash shell. Also, what is the purpose of '_ {} +' at the end?
  • Mathias Begert
    Mathias Begert over 8 years
    @Lokesh, you can get find to execute commands on files using -exec. With bash -c, we're spawning a bash shell that loops through the files found by find and executes tac .. | grep -m1 fileprefix on each
  • Mathias Begert
    Mathias Begert over 8 years
    @lokesh, use -d" " with cut. Double quotes instead of single
  • Overmind Jiang
    Overmind Jiang over 8 years
    The find command can filter for the file prefix; the grep shouldn't be needed for that. It's also surprising that the search string doesn't figure in this answer.
  • Mathias Begert
    Mathias Begert over 8 years
    @jonathanleffler, the OP needs the file contents examined for the string, not the file names
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' over 8 years
    “That way, you avoid having to read the whole file” — uh? No, you avoid reading the whole file in grep but you put the whole file through tac instead. It isn't clear to me that this would be faster, though it would depend on whether the match was near the beginning or the end of the file.
  • terdon
    terdon over 8 years
    @Gilles no, you don't put the whole file through tac either. It will exit as soon as the first match is found. I just tested with an 832M text file and a pattern found on the last line. grep -m 1 pattern file took ~7 seconds and tac file | grep -m1 pattern took 0.009.