Reverse grepping


Solution 1

tac/grep Solution:

tac file | grep whatever

Or, possibly a bit more efficiently, using process substitution:

grep whatever < <(tac file)

Time with a 500MB file:

real    0m1.225s
user    0m1.164s
sys     0m0.516s
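As the comments point out, tac is a GNU coreutils command, and many other Unix systems offer `tail -r` instead. A minimal sketch of a wrapper that uses whichever is available, assuming bash (for process substitution); the function names `reverse_lines` and `reverse_grep` are hypothetical:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print a file's lines in reverse order,
# preferring GNU tac and falling back to BSD-style `tail -r`.
reverse_lines() {
    if command -v tac >/dev/null 2>&1; then
        tac "$1"
    else
        tail -r "$1"
    fi
}

# Hypothetical helper: grep a file from the end, so matches
# are printed last-to-first.
reverse_grep() {
    local pattern=$1 file=$2
    grep -e "$pattern" < <(reverse_lines "$file")
}
```

For example, `reverse_grep 'ERROR' big.log > matches.txt` writes the matching lines, last one first, to matches.txt.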

sed/grep Solution:

sed '1!G;h;$!d' file | grep whatever

Time with a 500MB file: Aborted after 10+ minutes.
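The sed one-liner reverses the file by accumulating every line seen so far in the hold space. An annotated version, wrapped in a hypothetical function `sed_reverse`:

```shell
#!/usr/bin/env bash

# sed_reverse (hypothetical name): the sed one-liner above, annotated.
#   1!G  on every line except the first, append the hold space (all lines
#        seen so far, already in reverse order) to the current line
#   h    copy the result back into the hold space
#   $!d  on every line except the last, delete the pattern space and start
#        the next cycle without printing; on the last line the pattern
#        space holds the whole file reversed and is printed
# G and h copy the ever-growing buffer on every line, so the work grows
# roughly quadratically with the line count, which likely explains why the
# run on the 500MB file had to be aborted.
sed_reverse() {
    sed '1!G;h;$!d' "$@"
}
```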

awk/grep Solution:

awk '{x[NR]=$0}END{while (NR) print x[NR--]}' file | grep whatever

Time with a 500MB file:

real    0m5.626s
user    0m4.964s
sys     0m1.420s

perl/grep Solution:

perl -e 'print reverse <>' file | grep whatever

Time with a 500MB file:

real    0m3.551s
user    0m3.104s
sys     0m1.036s

Solution 2

Reverse the file with tac and pipe it into grep:

tac file_name | grep -e expression

Solution 3

This one exits as soon as it finds the first match, which is the last occurrence in the original file:

 tac hugeproduction.log | grep -m1 WhatImLookingFor
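Since tac reverses the file, the first match grep sees is the last occurrence in the original file. A small sketch wrapping this up; the function name `last_match` is hypothetical:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print only the last line of FILE matching PATTERN.
# grep -m1 stops after the first match in the reversed stream; tac may then
# receive SIGPIPE, which is harmless here.
last_match() {
    local pattern=$1 file=$2
    tac "$file" | grep -m1 -e "$pattern"
}
```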

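The following gives the first two matches with 5 lines of context on each side (since the input is reversed, the -B lines are the ones that follow each match in the original file):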

 tac hugeproduction.log | grep -m2 -A 5 -B 5 WhatImLookingFor
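Because the stream is reversed, the matches and their context come out bottom-to-top. Piping the result through tac a second time restores the file's original order. A sketch assuming GNU grep (for -m, -A and -B) and a hypothetical helper name `last_match_context`:

```shell
#!/usr/bin/env bash

# Hypothetical helper: last match of PATTERN in FILE with N lines of
# context on each side, printed in the file's original top-to-bottom order.
last_match_context() {
    local pattern=$1 n=$2 file=$3
    # tac | grep finds the last match first; the trailing tac un-reverses it.
    tac "$file" | grep -m1 -A "$n" -B "$n" -e "$pattern" | tac
}
```

For example, on a file containing the numbers 1 to 20, `last_match_context '5$' 2 file` prints 13 through 17.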

Avoid -i (case-insensitive matching) unless you need it, as it can slow grep down.

If you know the exact string you are looking for, consider grep -F (fixed-string matching, historically available as fgrep):

 tac hugeproduction.log | grep -F -m2 -A 5 -B 5 'ABC1234XYZ'
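The difference shows up when the search string contains regular-expression metacharacters such as `.`. A small sketch with made-up sample lines:

```shell
#!/usr/bin/env bash

# Made-up sample lines; only the first contains the literal string "1.2.3".
sample=$'version 1.2.3\nversion 1x2y3'

# Basic regex: "." matches any character, so both lines match.
regex_hits=$(printf '%s\n' "$sample" | grep -c '1.2.3')

# Fixed string (-F): "." is literal, so only the first line matches.
fixed_hits=$(printf '%s\n' "$sample" | grep -F -c '1.2.3')

echo "regex: $regex_hits, fixed: $fixed_hits"
```

This prints `regex: 2, fixed: 1`.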

Solution 4

If the file is really big and cannot fit in memory, I would use Perl with the File::ReadBackwards module from CPAN:

$ cat reverse-grep.pl
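#!/usr/bin/perl

use strict;
use warnings;

use File::ReadBackwards;

# Usage: reverse-grep.pl PATTERN FILE
die "Usage: $0 PATTERN FILE\n" unless @ARGV == 2;

my ($pattern, $file) = @ARGV;

# File::ReadBackwards reads the file from the end in blocks,
# so the whole file never has to be held in memory.
my $rev = File::ReadBackwards->new($file)
    or die "Cannot open $file: $!\n";

# readline returns lines last-to-first, and undef at the start of the file.
while (defined(my $line = $rev->readline)) {
    print $line if $line =~ /$pattern/;
}

$rev->close;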

Then:

$ ./reverse-grep.pl pattern file
Author: chaos

Updated on September 18, 2022

Comments

  • chaos
    chaos over 1 year

    Let's say I have a really big text file (about 10,000,000 lines). I need to grep it from the end and save the result to a file. What's the most efficient way to accomplish this task?

    • Ulrich Schwarz
      Ulrich Schwarz almost 10 years
      In addition to the excellent solutions posted, GNU grep has a --max-count (number) switch that aborts after a certain number of matches, which might be interesting to you.
    • c0rp
      c0rp almost 10 years
      @val0x00ff could you take a look at this question
    • Walter A
      Walter A about 9 years
      Do you know how many hits you will have? If you expect grep to find only a few lines (say 3), grep first and reverse the result afterwards.
  • Marek Zakrzewski
    Marek Zakrzewski almost 10 years
    @chaos, I think grep "somepattern" < <(tac filename) will be faster.
  • vinc17
    vinc17 almost 10 years
    @val0x00ff The < <(tac filename) should be as fast as a pipe: in both cases, the commands run in parallel.
  • phemmer
    phemmer almost 10 years
    If you're going for efficiency, it would be better to put the tac after the grep. If you've got a 10,000,000 line file, with only 2 matches, tac will only have to reverse 2 lines, not 10m. grep is still going to have to go through the whole thing either way.
  • Stéphane Chazelas
    Stéphane Chazelas almost 10 years
    tac is the GNU command. On most other systems, the equivalent is tail -r.
  • Ayman
    Ayman almost 10 years
    @Stéphane: On at least some Unix systems, tail -r is limited to a small number of lines, this might be an issue.
  • Stéphane Chazelas
    Stéphane Chazelas almost 10 years
    @RedGrittyBrick, do you have any reference for that, or could you please tell which systems have that limitation?
  • jjanes
    jjanes almost 10 years
    If you put tac after the grep, it will be reading from a pipe and so can't seek. That will make it less efficient (or fail completely) if the number of found lines is large.
  • Cristian Ciupitu
    Cristian Ciupitu almost 10 years
    @StéphaneChazelas, tail -r /etc/passwd fails with tail: invalid option -- 'r'. I'm using coreutils-8.21-21.fc20.x86_64.
  • Stéphane Chazelas
    Stéphane Chazelas almost 10 years
    @CristianCiupitu, as I said, GNU has tac (and only GNU has tac); many other Unices have tail -r. GNU tail doesn't support -r.
  • Bernhard
    Bernhard almost 10 years
    @jjanes Can you expand a bit on that? I don't get your point, what is tac trying to seek?
  • c0rp
    c0rp almost 10 years
    @Bernhard Please look at this
  • jjanes
    jjanes almost 10 years
    @Bernhard If you tac a real file, it lseeks backwards through the file to read it backwards in chunks, and then reverses the lines in each chunk, remembering the line broken across chunks to put them back together. If reading from a pipe, it can't do that. It either needs to read the whole thing into memory, or write it to a temp file, or fail.
  • cuonglm
    cuonglm almost 10 years
    @zzapper: It's memory efficient, too, since it reads the file line by line instead of slurping it into memory like tac.
  • Barmar
    Barmar almost 10 years
    But if you put tac after grep, it only has to reverse the matched lines, not the whole file. So unless you're matching lots of lines in the file, it should be reasonably efficient.
  • Scott - Слава Україні
    Scott - Слава Україні almost 10 years
    @Patrick, et al. So the “obvious” compromise is to do grep (pattern) (input_file) > (temp_file); tac (temp_file) > (output_file); rm (temp_file), right? Note that, if the user wants to know line numbers of matches (by specifying the -n option), this will report correct line numbers in the original input file, whereas tac (input_file) | grep -n (pattern) will report, for example, the third-to-last line in the file as line 3. (Of course, that might be what the OP wants.)
  • Adam Katz
    Adam Katz over 9 years
    So a more portable solution would be (if command -v tac >/dev/null 2>&1; then tac file_name; else tail -r file_name; fi) | grep expression (this should be a fair assumption since GNU Coreutils supplies both tac and tail, so a system without tac should have non-GNU tail and therefore support for tail -r).
  • ychaouche
    ychaouche over 5 years
    Can anyone add -m support to this? I'd like to test it on real files. See: gist.githubusercontent.com/ychaouche/…
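phemmer's and Scott's suggestion in the comments above (grep forwards first, then reverse only the matches) can be sketched as follows. Matches go to a temporary file so tac reads a regular file rather than a pipe, which addresses jjanes's concern about tac being unable to seek; `grep -n` then reports line numbers from the original file. The function name `reverse_grep_n` is hypothetical:

```shell
#!/usr/bin/env bash

# Hypothetical helper: grep scans FILE forwards once, and tac only has to
# reverse the (usually few) matching lines.
reverse_grep_n() {
    local pattern=$1 file=$2 tmp
    tmp=$(mktemp) || return
    # -n reports line numbers counted from the top of the original file.
    grep -n -e "$pattern" "$file" > "$tmp"
    # tac reads a regular file here, so it can seek backwards through it.
    tac "$tmp"
    rm -f "$tmp"
}
```

For example, `reverse_grep_n 'ERROR' big.log > matches.txt` saves the matches, last one first, each prefixed with its original line number.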