How do I grep for multiple patterns on multiple lines?

Solution 1

Updated 18-Nov-2016: grep's behavior has changed. With -P (combined with -z), grep no longer accepts unescaped ^ and $ anchors (seen on Ubuntu 16.04 with kernel 4.4.0-21-generic). (wrong (non-)fix)

$ grep -Pzo "begin(.|\n)*\nend" file
begin
Some text goes here.  
end
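
The updated command can be tried end to end with a small sketch like the one below. It assumes GNU grep built with PCRE support (-P); the sample file content and the variable names are made up for illustration. Because -z makes grep terminate each printed match with a NUL byte, the output is piped through tr to turn that NUL into a newline.

```shell
# Hypothetical sample file modeled on the question.
sample=$(mktemp)
printf '%s\n' 'Some text' 'begin' 'Some text goes here.' 'end' 'Some more text' > "$sample"

# -z: treat the whole file as one record, so \n can be matched explicitly.
# -o: print only the matched part. Single quotes keep the shell away from the pattern.
# tr converts the trailing NUL that -z adds into a newline.
block=$(grep -Pzo 'begin(.|\n)*\nend' "$sample" | tr '\0' '\n')

printf '%s\n' "$block"
rm -f "$sample"
```

Note that \nend only checks that "end" starts a line, so a line such as "ending" would also close the match.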

Note: to adapt the other commands below, replace the '^' and '$' anchors with the newline character '\n'.

______________________________

With grep command:

grep -Pzo "^begin\$(.|\n)*^end$" file

If you don't want to include the patterns "begin" and "end" in the result, use grep's lookbehind and lookahead support.

grep -Pzo "(?<=^begin$\n)(.|\n)*(?=\n^end$)" file

You can also use the \K notation instead of a lookbehind assertion.

grep -Pzo "^begin$\n\K(.|\n)*(?=\n^end$)" file

\K discards everything matched before it, so the text up to and including "begin" is left out of the reported match.
The \n next to each marker keeps empty lines out of the output.
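
For grep versions that reject ^ and $ together with -Pz, the \K idea can be sketched without those anchors. This is only an illustration, assuming GNU grep with PCRE support: \A and \z (start and end of the whole input) and explicit \n characters stand in for the line anchors, and the sample file and variable names are hypothetical.

```shell
# Hypothetical sample file with a two-line body between the markers.
sample=$(mktemp)
printf '%s\n' 'Some text' 'begin' 'Some text goes here.' 'Second line.' 'end' 'Some more text' > "$sample"

# (?:\A|\n)begin\n   "begin" alone on a line (start of input or after a newline)
# \K                 drop everything matched so far from the reported match
# (.|\n)*?           the body, as short as possible
# (?=\nend(?:\n|\z)) stop before an "end" that is alone on a line
inner=$(grep -Pzo '(?:\A|\n)begin\n\K(.|\n)*?(?=\nend(?:\n|\z))' "$sample" | tr '\0' '\n')

printf '%s\n' "$inner"
rm -f "$sample"
```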

Or, as @AvinashRaj suggests, there are simpler grep commands:

grep -Pzo "(?s)^begin$.*?^end$" file

grep -Pzo "^begin\$[\s\S]*?^end$" file

(?s) tells grep to allow the dot to match newline characters.
[\s\S] matches any character that is either whitespace or non-whitespace.
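
The difference between the greedy (.|\n)* and the lazy .*? becomes visible once a file holds more than one block. The sketch below assumes GNU grep with PCRE support and uses \n instead of ^ and $ so that it does not depend on the grep version; the file content and variable names are invented for the example. Each match printed under -z ends with a NUL byte, so counting NUL bytes counts matches.

```shell
# Hypothetical file with two begin/end blocks.
sample=$(mktemp)
printf '%s\n' 'begin' 'first' 'end' 'between' 'begin' 'second' 'end' > "$sample"

# Greedy: a single match running from the first "begin" to the last "end".
greedy_count=$(grep -Pzo 'begin(.|\n)*\nend' "$sample" | tr -cd '\0' | wc -c)

# Lazy with (?s): each block is reported as its own match.
lazy_count=$(grep -Pzo '(?s)begin.*?\nend' "$sample" | tr -cd '\0' | wc -c)
lazy_blocks=$(grep -Pzo '(?s)begin.*?\nend' "$sample" | tr '\0' '\n')

printf 'greedy matches: %s, lazy matches: %s\n' "$greedy_count" "$lazy_count"
printf '%s\n' "$lazy_blocks"
rm -f "$sample"
```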

And the versions that leave "begin" and "end" out of the output:

grep -Pzo "^begin$\n\K[\s\S]*?(?=\n^end$)" file # or grep -Pzo "(?<=^begin$\n)[\s\S]*?(?=\n^end$)"

grep -Pzo "(?s)(?<=^begin$\n).*?(?=\n^end$)" file

See the full test of all commands here (outdated, since grep's behavior with the -P option has changed).

Note:

^ marks the beginning of a line and $ marks the end of a line. They are placed around "begin" and "end" so that these words match only when they stand alone on a line.
In two commands I escaped $ because it is also used for command substitution ($(command)), which replaces the command with its output.
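
The quoting point can be checked in the shell without running grep at all. In this small sketch (variable names are arbitrary), the escaped double-quoted pattern and the single-quoted pattern end up as the same string, while an unescaped $( inside double quotes runs a command.

```shell
# Double quotes: the shell still expands $(...), so the $ before ( is escaped.
dq="^begin\$(.|\n)*^end$"

# Single quotes: nothing is expanded, so no escaping is needed.
sq='^begin$(.|\n)*^end$'

# Unescaped $( ... ) inside double quotes is command substitution.
oops="^begin$(echo REPLACED)"

printf '%s\n' "$dq" "$sq" "$oops"
```

Writing the pattern in single quotes is usually the simpler choice when it contains no single quote itself.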

From man grep:

-o, --only-matching
      Print only the matched (non-empty) parts of a matching line,
      with each such part on a separate output line.

-P, --perl-regexp
      Interpret PATTERN as a Perl compatible regular expression (PCRE)

-z, --null-data
      Treat the input as a set of lines, each terminated by a zero byte (the ASCII 
      NUL character) instead of a newline. Like the -Z or --null option, this option 
      can be used with commands like sort -z to process arbitrary file names.
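
What -z changes can be seen by counting records, as in the following sketch (assuming GNU grep; the input strings are made up). An empty pattern matches every record, so -c reports how many records grep saw.

```shell
# Without -z, every newline-terminated line is a record.
lines=$(printf 'begin\ntext\nend\n' | grep -c '')

# With -z, the same input contains no NUL byte, so it is one single record.
records=$(printf 'begin\ntext\nend\n' | grep -zc '')

# Input that does contain NUL bytes is split at them.
nul_records=$(printf 'one\000two\000' | grep -zc '')

printf 'lines=%s records=%s nul_records=%s\n' "$lines" "$records" "$nul_records"
```

This is why a pattern can span several lines under -z: the newlines are ordinary characters inside one record.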

Solution 2

In case your grep doesn't support perl syntax (-P), you can try joining the lines, matching the pattern, then expanding the lines again as below:

$ tr '\n' , < foo.txt | grep -o "begin.*end" | tr , '\n'
begin
Some text goes here.
end
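
One caveat with a comma as the join character is that any comma already in the text is turned into a newline on the way back. A possible variation, sketched below, joins the lines with a control character that is unlikely to occur in text (\001 here, an arbitrary choice). The sample content is hypothetical, and -a is added only as a precaution so that grep treats the joined input as text.

```shell
# Hypothetical sample file whose lines contain commas.
sample=$(mktemp)
printf '%s\n' 'Some text, with a comma' 'begin' 'Some text goes here, too.' 'end' 'Some more text' > "$sample"

# Join the lines with \001, match on the single long line, then split again.
block=$(tr '\n' '\001' < "$sample" | grep -ao 'begin.*end' | tr '\001' '\n')

printf '%s\n' "$block"
rm -f "$sample"
```

As with the original command, .* is greedy, so with several blocks in the file the match runs from the first "begin" to the last "end".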

Author: Iker

Updated on September 18, 2022

Comments

  • Iker
    Iker over 1 year

    To be precise

    Some text
    begin
    Some text goes here.
    end
    Some more text
    

    and I want to extract the entire block that runs from "begin" to "end".

    With awk we can do it like this: awk '/begin/,/end/' text.

    How can I do it with grep?

  • Avinash Raj
    Avinash Raj over 9 years
    Change your grep to grep -Pzo "(?<=begin\n)(.|\n)*(?=\nend)" file so that it doesn't print the \n character at the end of the begin line.
  • Avinash Raj
    Avinash Raj over 9 years
    Use the DOTALL modifier to make the dot match newline characters as well: grep -Pzo "(?s)begin.*?end" file
  • αғsнιη
    αғsнιη over 9 years
    @AvinashRaj thank you, I added that to avoid the \n, but you can post your other solution as your own answer ;)
  • Avinash Raj
    Avinash Raj over 9 years
    Why? add it to yours. I have more reps :-)
  • terdon
    terdon over 9 years
    You might want to use grep -Pzo "begin(.|\n)*\nend" file instead to make sure that end only matches at the beginning of a line and not in things like bend.
  • αғsнιη
    αғsнιη over 9 years
    @terdon Can I use ^end instead? or even better ^end$?
  • terdon
    terdon over 9 years
    Huh, yes you can. I had thought that the ^ would only match the beginning of the file when using -z but apparently not.
  • terdon
    terdon over 9 years
    The man page says: "-z: Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline." so I would expect the ^ and $ to match just before and just after a \0 instead. Apparently, they're hard coded to match \n.
  • musbach
    musbach over 7 years
    The solution doesn't work. It produces an error: grep: ein nicht geschütztes ^ oder $ wird mit -Pz nicht unterstützt. In English, roughly: grep: an unprotected (unescaped) ^ or $ is not supported with -Pz
  • terdon
    terdon over 7 years
    I guess grep's behavior has changed. I just tested and musbach is right, the ^ and $ don't work with -Pz. It should work as expected if you replace ^ and $ with \n though.
  • αғsнιη
    αғsнιη over 7 years
  • terdon
    terdon over 7 years
    Yes, I know, that's in your answer. I'm sure it worked when you posted this, but try it again today. The behavior of grep seems to have changed.
  • musbach
    musbach over 7 years
    @terdon you are right. This works: grep -Pzo "begin\n(.|\n)*\nend\n" file. If I put a \n before begin (grep -Pzo "\nbegin\n(.|\n)*\nend\n" file) I get a blank line and then the correct output. I guess that \n produces a linefeed but it looks strange to me. @KasiyA I am on Ubuntu 16.04. What OS are you on?
  • terdon
    terdon over 7 years
    @musbach yes, \n is the newline character. You get an extra newline because with \nbegin you are including the newline character at the end of the previous line, so that's printed as a blank line.
  • αғsнιη
    αғsнιη over 7 years
    At that time I was on 14.04, but right now I'm away from my Ubuntu 16.04 machine and can't test it. Once I'm back on 16.04 I'll double check, but grep's behavior has certainly changed, as Mr. terdon confirmed, @musbach
  • musbach
    musbach over 7 years
    Yes, the answer should be corrected or flagged as wrong.
  • αғsнιη
    αғsнιη over 7 years
  • musbach
    musbach over 7 years
    I also added that it works on 14.04, that it doesn't work on 10.4 and that it doesn't work on 16.04 (see below). Why it works only on 14.04 is very strange.
  • terdon
    terdon over 7 years
    @musbach there's no way (and no reason) to flag an answer as wrong. You've left a comment explaining it, that's all that's needed. The answer was correct when posted, after all.