grep to return Nth and Mth lines before and after the match

Solution 1

The tool you want is called sift. It is basically grep on steroids: grep in parallel. Sift has a huge number of options for exactly this kind of task, specifically returning a particular line relative to a match, which may or may not be followed or preceded by some text.

It amazes me that sift is not mainstream GNU; it is written in Go, but it installs on Linux just fine. It searches huge quantities of text in parallel using all CPUs, where grep can take far longer to do the same.

Sift website - see examples

Solution 2

If:

cat file
a
b
c
d
e
f match
g
h
i match
j
k
l
m
n
o

Then:

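awk '
    {line[NR] = $0}                  # remember every line
    /match/ {matched[NR]}            # remember the line numbers of matches
    END {
        for (nr = 1; nr <= NR; nr++)         # visit matches in file order
            if (nr in matched)
                for (n = nr-5; n <= nr+5; n += 5)
                    if (n >= 1 && n <= NR)   # skip lines outside the file
                        print line[n]
    }
' file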
a
f match
k
d
i match
n
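The /match/ {matched[NR]} rule relies on an awk behaviour discussed in the comments below: merely referencing an array element creates that key (with an empty value), while "key in array" tests for a key without creating it. The minimal sketch below isolates that idiom. match_line_numbers is a hypothetical name chosen for illustration, and the loop walks line numbers in order because the iteration order of "for (key in array)" is unspecified in awk.

```shell
#!/bin/sh
# match_line_numbers: read stdin and print the numbers of the lines that
# contain "match", in file order.  (Hypothetical helper, for illustration.)
match_line_numbers() {
    awk '
        /match/ { matched[NR] }        # referencing the element creates the key
        END {
            # Walk 1..NR in order: "for (k in arr)" order is unspecified.
            for (nr = 1; nr <= NR; nr++)
                if (nr in matched)     # "in" tests a key without creating it
                    print nr
        }
    '
}

printf '%s\n' one match two 'a match' three | match_line_numbers
```

Fed those five lines, it prints 2 and 4, the numbers of the two lines containing match.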

Solution 3

It can't be done with only grep. If ed's an option:

ed -s file << 'EOF' 
g/match/-5p\
+5p\
+5p
EOF

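The script says: for every line matching /match/, print the line 5 lines before it, then move 5 lines forward (back to the match) and print it, then move 5 more lines forward and print that line. ed reports an error (?) if any of these addresses falls before the first line or after the last line of the file.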

Solution 4

This is basically Glenn's solution, but implemented with Bash, grep, and sed.

grep -n match file |
    while IFS=: read nr _; do
        sed -ns "$((nr-5))p; $((nr))p; $((nr+5))p" file
    done

Note that line numbers less than 1 make sed fail with an error, and line numbers greater than the number of lines in the file make it print nothing.

This is just the bare minimum. Making it work recursively and handling the line-number cases above would take more work.
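One way to cover both gaps is sketched below, under some assumptions: it swaps the grep/sed pipeline for a single awk pass per file, in the spirit of Solution 2, and nth_context is a hypothetical helper name rather than an existing command. It takes the pattern (an awk extended regular expression) and the two offsets as separate arguments, so the Nth line before and the Mth line after need not be the same distance, and it skips offsets that fall outside a file instead of failing.

```shell
#!/bin/sh
# nth_context PATTERN BEFORE AFTER FILE...   (hypothetical helper name)
# For every line matching PATTERN, print the line BEFORE lines above it,
# the matching line, and the line AFTER lines below it, each prefixed
# with "file:lineno:" like grep -n.  Offsets outside a file are skipped.
nth_context() {
    pattern=$1 before=$2 after=$3
    shift 3
    awk -v pat="$pattern" -v b="$before" -v a="$after" '
        # Print one stored line if its number lies inside the current file.
        function show(n) {
            if (n >= 1 && n <= last) print fname ":" n ":" line[n]
        }
        # Emit the selected lines of the file just read, then reset state.
        function flush(   nr) {
            for (nr = 1; nr <= last; nr++)
                if (nr in matched) { show(nr - b); show(nr); show(nr + a) }
            split("", line); split("", matched)   # portable way to empty arrays
            last = 0
        }
        FNR == 1 && NR > 1 { flush() }        # new file: finish the previous one
        { line[FNR] = $0; last = FNR; fname = FILENAME }
        $0 ~ pat { matched[FNR] }             # remember matching line numbers
        END { flush() }
    ' "$@"
}
```

For the sample file from Solution 2, nth_context match 5 5 file prints file:1:a, file:6:f match and file:11:k, then file:4:d, file:9:i match and file:14:n, one per line. For a directory tree you could pass it the files grep finds, e.g. nth_context match 5 5 $(grep -rl match dir), though that simple form breaks on file names containing whitespace. Keep in mind that awk -v interprets backslash escapes in the pattern, and that each file is held in memory while it is processed.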

Solution 5

awk '/match/{system("sed -n \"" NR-5 "p;" NR "p;" NR+5 "p\" " FILENAME)}' infile

Here we use awk's system(command) function to call an external sed command that prints each line awk matched against the pattern match, along with the 5th line before it and the 5th line after it.

The syntax is simple: put the external command and its switches inside double quotes, escape anything that must reach the command literally (here the inner double quotes, written as \"), and leave everything awk itself should evaluate (NR, FILENAME and the arithmetic) outside the quotes. So the sed command string below:

"sed -n \"" NR-5 "p;" NR "p;" NR+5 "p\" " FILENAME

translates into:

sed -n "NR-5p; NRp; NR+5p" FILENAME

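Here NR is the number of the line that matched the pattern match, and FILENAME is the name of the file awk is currently processing. awk substitutes their values before calling sed, so for a match on line 6 of infile the command actually run is sed -n "1p;6p;11p" infile.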
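A handy way to check this quoting before anything is executed is to swap system() for print, so awk shows the command string it would hand to the shell. A minimal sketch, where show_sed_cmd is a hypothetical wrapper name:

```shell
#!/bin/sh
# show_sed_cmd FILE   (hypothetical name, for illustration)
# Print, instead of running, the sed command that the awk one-liner
# above builds for each line matching "match".
show_sed_cmd() {
    awk '/match/ { print "sed -n \"" NR-5 "p;" NR "p;" NR+5 "p\" " FILENAME }' "$1"
}
```

Run against the sample file from Solution 2 (saved as file), it prints sed -n "1p;6p;11p" file and sed -n "4p;9p;14p" file, one per line. Because system() passes the string to the shell, a FILENAME containing spaces or quotes would break the command as written.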


Author by chollida

Updated on September 18, 2022

Comments

  • chollida
    chollida almost 2 years

    I know that with grep I can use the options -A and -B to pull in lines before and after a match.

    However, they pull in every line between the match and the specified number of lines.

    grep -r -i -B 5 -A 5 "match" 
    

    I'd like to only receive the 5th line before a match and the 5th line after the match in addition to the matched line and not get the lines between.

    Is there a way to do this with grep?

    • Terrance
      Terrance about 6 years
      You could do it by piping it into sed. I just tested this and it worked, but it only worked when there was 1 exact match in the file: grep -r -i -B 5 -A 5 "match" | sed -e 1b -e '$!d'
    • chollida
      chollida about 6 years
      @Terrance thanks for the suggestion, as you mention, since I am collecting 1000's of lines this won't work.
    • Joshua Besneatte
      Joshua Besneatte about 6 years
      I don't think grep will work by itself... I'm working on a bash script for you
    • Terrance
      Terrance about 6 years
      No problem! Kind of interested in seeing what answers you get. =)
    • Joshua Besneatte
      Joshua Besneatte about 6 years
      is this in one file or in multiple files?
    • chollida
      chollida about 6 years
      @JoshuaBesneatte both, I run it recursively on a directory where any given file can have zero or more matches.
  • JoL
    JoL about 6 years
    @ubashu Do you think it'll be more helpful to the OP giving a simple flat "it can't be done with grep"? I'm providing what I believe to be a good alternative to solve OP's problem. From the Help Center: "What, specifically, is the question asking for? Make sure your answer provides that – or a viable alternative. The answer can be 'don’t do that', but it should also include 'try this instead'."
  • dessert
    dessert about 6 years
    ed is always an answer, because ed is the standard text editor.
  • Thomas Ward
    Thomas Ward about 6 years
    @ubashu Though it's not a grep answer, the answer of "You can't do it with X, but you can do it with Y, here's how" is still a valid answer since you not only answer OP's question but you also provide an alternative that would work. This is a valid type of answer here.
  • Fabby
    Fabby about 6 years
    João, you're showing up in the LQ review queue and @waltinator voted to delete, so next time be a tiny bit more verbose... ;-) Also +1 to get you out of the LQ queue... :P
  • Admin
    Admin about 6 years
    @Fabby, thank you ! (sorry what is LQ review queue?)
  • wjandrea
    wjandrea about 6 years
    @JJoao Low quality review queue. Your answer probably got picked up there because it was 90% code.
  • Admin
    Admin about 6 years
    @wjandrea, thank you: I was not aware of those mechanisms. Although the 90% quality measure is not very accurate, I admit that my coffee break answer is awful !!!☺
  • wjandrea
    wjandrea about 6 years
    @JJoao The 90% figure is just my way of explaining it. I don't know what heuristics are actually used.
  • Admin
    Admin about 6 years
    @wjandrea, that heuristic makes a lot of sense!
  • Fabby
    Fabby about 6 years
    Less coffee, more writing! @JJoao :D ;-) :D
  • Admin
    Admin about 6 years
    @Fabby: Without coffee nothing works :D -- probably it would show up in the LCQ (=low coffee queue)
  • Joe
    Joe about 6 years
    +1, but could you explain the semantics of /match/ {matched[NR]}? I've never seen an array or variable as an entire command. Is it putting the current record number of each matched line into the array?
  • fiatux
    fiatux about 6 years
    This is an awk oddity: if you reference an array element without assignment, that key is added to the array (without a value). Then that key shows up in the expression key in array. What I'm doing is remembering the line numbers where the pattern appears.
  • Bernard Wei
    Bernard Wei almost 5 years
    Welcome to AskUbuntu, thanks for answering. You need to provide a CLI example that can solve this specific problem rather than providing a link to the sift website. This is a Q&A after all, thanks.