Grep from the end of a file to the beginning

97,446

Solution 1

tac only helps if you also use grep -m 1 (assuming GNU grep) to have grep stop after the first match:

tac accounting.log | grep -m 1 foo

From man grep:

   -m NUM, --max-count=NUM
          Stop reading a file after NUM matching lines.  
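
As a minimal sketch of the above, assuming GNU tac and GNU grep are installed, the pipeline can be wrapped in a small function. The name last_match_gnu is made up for this example and is not part of any tool:

```shell
# Hypothetical helper: print the last line of a file that matches a pattern.
# Assumes GNU tac and GNU grep (for -m).
last_match_gnu() {
    # $1: pattern, $2: file
    # tac emits the lines last-to-first; grep -m 1 stops at the first match
    # it sees, which is the last match in the original file.
    tac -- "$2" | grep -m 1 -e "$1"
}
```

It would be called as: last_match_gnu foo accounting.log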

In the example in your question, both tac and grep need to process the entire file so using tac is kind of pointless.

So, unless you use grep -m, don't use tac at all, just parse the output of grep to get the last match:

grep foo accounting.log | tail -n 1 

Another approach would be to use Perl or any other scripting language. For example (where $pattern=foo):

perl -ne '$l=$_ if /foo/; END{print $l}' file

or

awk '/foo/{k=$0}END{print k}' file
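
Since the awk one-liner is terse, here is a spelled-out sketch of the same idea. The function name last_match_awk and the variable names are invented for illustration, and the pattern is passed through the environment (an assumption of this sketch, not part of the one-liner above). Unlike the one-liner, it prints nothing at all when there is no match:

```shell
# Hypothetical helper: same logic as the awk one-liner, with comments.
last_match_awk() {
    # $1: pattern (ERE), $2: file
    P=$1 awk '
        # For every line that matches, remember it (overwriting the previous one).
        $0 ~ ENVIRON["P"] { last = $0; found = 1 }
        # Once the whole file has been read, print the most recent match, if any.
        END { if (found) print last }
    ' "$2"
}
```

Note that the whole file is still read from start to end; only the last remembered line is printed.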

Solution 2

The reason why

tac file | grep foo | head -n 1

doesn't stop at the first match is because of buffering.

Normally, head -n 1 exits after reading a line. So grep should get a SIGPIPE and exit as well as soon as it writes its second line.

But what happens is that, because its output is not going to a terminal, grep buffers it. That is, it does not write anything until it has accumulated enough data (4096 bytes in my test with GNU grep).

What that means is that grep will not exit before it has produced 8192 bytes of output, so probably quite a few lines: the first 4096-byte block is what head reads before exiting, and grep only gets the SIGPIPE when it tries to write a second block.

With GNU grep, you can make it exit sooner by using --line-buffered, which tells it to write lines as soon as they are found, regardless of whether the output goes to a terminal or not. So grep would then exit upon the second line it finds.

But with GNU grep anyway, you can use -m 1 instead as @terdon has shown, which is better as it exits at the first match.
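
To make the comparison concrete, here is a sketch of the three variants side by side, assuming GNU tac and GNU grep. The function names are made up for this example. All three print the same line; they differ only in how much of the file grep gets through before it stops:

```shell
# $1: pattern, $2: file (hypothetical helper names)

# Default buffering: grep keeps going until its output buffer has filled twice.
first_from_end_buffered()      { tac -- "$2" | grep -e "$1" | head -n 1; }

# Line buffering: grep is killed when it tries to write its second match.
first_from_end_line_buffered() { tac -- "$2" | grep --line-buffered -e "$1" | head -n 1; }

# -m 1: grep stops by itself right after the first match.
first_from_end_max_count()     { tac -- "$2" | grep -m 1 -e "$1"; }
```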

If your grep is not the GNU grep, then you can use sed or awk instead. But tac being a GNU command, I doubt you'll find a system with tac where grep is not GNU grep.

tac file | sed "/$pattern/!d;q"                             # BRE
tac file | P=$pattern awk '$0 ~ ENVIRON["P"] {print; exit}' # ERE

Some systems have tail -r to do the same thing as GNU tac does.

Note that, for regular (seekable) files, tac and tail -r are efficient because they really do read the file backward; they do not read the whole file into memory before printing it in reverse (as @slm's sed approach, or tac on non-regular files, would).
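
A possible way to combine the two, as a sketch only: use tac when it is installed and fall back to tail -r otherwise, then let awk stop at the first match. The names reverse_lines and last_match_portable are invented for this example, and the tail -r branch assumes a system whose tail supports -r:

```shell
# Hypothetical helper: print the lines of a file in reverse order.
reverse_lines() {
    if command -v tac >/dev/null 2>&1; then
        tac -- "$1"
    else
        tail -r -- "$1"   # assumption: this system's tail supports -r
    fi
}

# Hypothetical helper: print the last line of file $2 matching ERE $1.
last_match_portable() {
    # awk exits at the first match in the reversed stream,
    # i.e. the last match in the original file.
    reverse_lines "$2" | P=$1 awk '$0 ~ ENVIRON["P"] {print; exit}'
}
```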

On systems where neither tac nor tail -r is available, the only options are to implement the backward reading by hand in a programming language like perl, or to use:

grep -e "$pattern" file | tail -n1

Or:

sed "/$pattern/h;\$!d;g" file

But those mean finding all the matches and printing only the last one.
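
For what it's worth, the backward reading can also be approximated in plain shell, without tac or tail -r. The following is only a sketch under a few assumptions: the file is a regular file, tail -c does not need to read the whole file to get at the last bytes (true of typical implementations on seekable files), and the function name last_match_chunked and the default chunk size are made up for this example. It searches the last chunk of the file and doubles the chunk until a match turns up:

```shell
# Hypothetical helper: print the last line of file $2 matching pattern $1,
# searching growing chunks from the end. $3: initial chunk size in bytes.
last_match_chunked() {
    pattern=$1
    file=$2
    chunk=${3:-65536}
    size=$(($(wc -c < "$file")))
    while :; do
        if [ "$chunk" -ge "$size" ]; then
            # The chunk covers the whole file: fall back to a plain scan.
            grep -e "$pattern" -- "$file" | tail -n 1
            return
        fi
        # The chunk may start in the middle of a line, so drop its first line;
        # that line is picked up again once the chunk has grown.
        hit=$(tail -c "$chunk" -- "$file" | sed 1d | grep -e "$pattern" | tail -n 1)
        if [ -n "$hit" ]; then
            printf '%s\n' "$hit"
            return
        fi
        chunk=$((chunk * 2))
    done
}
```

When the last match is near the end of the file, only a small part of the file is read. In the worst case (no match at all) the total work is a few times that of a single full scan, because of the doubling.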

Solution 3

Here is a possible solution that finds the first occurrence of the pattern counting from the end of the file:

tac -s "$pattern" -r accounting.log | head -n 1

This makes use of the -s and -r switches of tac which are as follows:

-s, --separator=STRING
use STRING as the separator instead of newline

-r, --regex
interpret the separator as a regular expression

Solution 4

Using sed

Here are some alternatives to @terdon's fine answer, using sed to reverse the file:

$ sed '1!G;h;$!d' file | grep -m 1 "$pattern"
$ sed -n '1!G;h;$p' file | grep -m 1 "$pattern"

Examples

$ seq 10 > file

$ sed '1!G;h;$!d' file | grep -m 1 5
5

$ sed -n '1!G;h;$p' file | grep -m 1 5
5

Using Perl

As a bonus, here is a Perl notation that is a little easier to remember:

$ perl -e 'print reverse <>' file | grep -m 1 "$pattern"

Example

$ perl -e 'print reverse <>' file | grep -m 1 5
5

Author by

eric dykstra

Updated on September 18, 2022

Comments

  • eric dykstra
    eric dykstra over 1 year

    I have a file with about 30.000.000 lines (Radius Accounting) and I need to find the last match of a given pattern.

    The command:

    tac accounting.log | grep $pattern
    

    gives what I need, but it's too slow because the OS has to first read the whole file and then send to the pipe.

    So, I need something fast that can read the file from the last line to the first.

  • eric dykstra
    eric dykstra over 10 years
    I'm using tac because I need to find the last match of a given pattern. Using your suggestion "grep -m1" the execution time goes from 0m0.597s to 0m0.007s \o/. Thanks everybody!
  • camh
    camh over 10 years
    Why do you say "tac [...] needs to process the entire file"? The first thing tac does is seek to the end of the file and read a block from the end. You can verify this yourself with strace(1). When combined with grep -m, it should be quite efficient.
  • Stéphane Chazelas
    Stéphane Chazelas over 10 years
    That's (especially the sed one) likely to be several orders of magnitude slower than grep 5 | tail -n1 or sed '/5/h;$!d;g'. It will also potentially use a lot of memory. It's not a lot more portable as you're still using GNU's grep -m.
  • terdon
    terdon over 10 years
    @camh when combined with grep -m it is. The OP was not using -m so both grep and tac were processing the whole thing.
  • Arlene Mariano
    Arlene Mariano almost 6 years
    Could you please expand on the meaning of the awk line?
  • Arlene Mariano
    Arlene Mariano almost 6 years
    OK, Terdon, thanks you. So I understand the full file must/will be read (browsed/processed) with your awk method, even when we are only searching for the last appearance.
  • ychaouche
    ychaouche over 5 years
    Except you will lose everything that is between the start of the line and the pattern.
  • Scott Prive
    Scott Prive about 5 years
    +1 million for the perl -ne example because it uses NO PIPES. That's hugely important if you're running the command via Ansible (as pipes will contaminate $? exit status).
  • Admin
    Admin about 2 years
    Giving grep files to read will make it ignore its standard input stream. In your first example command, the input from tac would be ignored. The user in the question only has a single log file, and their issue is the speed of reading it with tac to find the last match of their pattern. You do not seem to address this.