How do I grep for multiple patterns on multiple lines?
Solution 1
Updated 18-Nov-2016: grep's behavior has changed. With the -P and -z options combined, grep no longer supports the ^ and $ anchors (observed on Ubuntu 16.04 with kernel 4.4.0-21-generic). (wrong (non-)fix)
$ grep -Pzo "begin(.|\n)*\nend" file
begin
Some text goes here.
end
Note: in the other commands below, replace the '^' and '$' anchors with the newline character '\n'.
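As a minimal sketch of that replacement, the following assumes GNU grep built with PCRE support and a hypothetical sample file created with mktemp. The lazy quantifier *? is an addition of this sketch so the match stops at the first "end" line, and tr -d '\0' strips the NUL byte that grep -z writes after each match.

```shell
# Hypothetical sample file, for illustration only.
sample=$(mktemp)
printf '%s\n' 'Some text' 'begin' 'Some text goes here.' 'end' 'Some more text' > "$sample"

# \n stands in for the ^ and $ anchors; *? stops at the first "end" line.
block=$(grep -Pzo 'begin\n(.|\n)*?\nend\n' "$sample" | tr -d '\0')
printf '%s\n' "$block"

rm -f "$sample"
```

Single quotes are used here so the shell leaves the pattern untouched.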
______________________________
With grep command:
grep -Pzo "^begin\$(.|\n)*^end$" file
If you don't want to include the patterns "begin" and "end" in the result, use grep's lookbehind and lookahead support:
grep -Pzo "(?<=^begin$\n)(.|\n)*(?=\n^end$)" file
You can also use the \K notation instead of a lookbehind assertion:
grep -Pzo "^begin$\n\K(.|\n)*(?=\n^end$)" file
\K discards everything matched before it, so the text preceding \K is left out of the output.
\n is used to avoid printing empty lines in the output.
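A small sketch of the \K form, written with \n in place of ^ and $ because of the -Pz restriction mentioned in the update. It assumes GNU grep with PCRE support; the sample file and the lazy *? quantifier are assumptions of this sketch, not part of the original commands.

```shell
# Hypothetical sample file, for illustration only.
sample=$(mktemp)
printf '%s\n' 'Some text' 'begin' 'Some text goes here.' 'end' 'Some more text' > "$sample"

# \K drops "begin\n" from the reported match; the lookahead keeps "\nend\n" out of it.
inner=$(grep -Pzo 'begin\n\K(.|\n)*?(?=\nend\n)' "$sample" | tr -d '\0')
printf '%s\n' "$inner"

rm -f "$sample"
```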
Or, as @AvinashRaj suggests, there are simpler grep commands:
grep -Pzo "(?s)^begin$.*?^end$" file
grep -Pzo "^begin\$[\s\S]*?^end$" file
(?s) tells grep to allow the dot to match newline characters.
[\s\S] matches any character that is either whitespace or non-whitespace.
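To illustrate that the two spellings are interchangeable, this sketch runs both against the same hypothetical sample file and compares the results. It assumes GNU grep with PCRE support and uses \n rather than ^ and $ so that it also runs on grep versions that reject those anchors with -Pz.

```shell
# Hypothetical sample file, for illustration only.
sample=$(mktemp)
printf '%s\n' 'Some text' 'begin' 'Some text goes here.' 'end' 'Some more text' > "$sample"

# (?s) lets the dot cross line boundaries.
dotall=$(grep -Pzo '(?s)begin\n.*?\nend\n' "$sample" | tr -d '\0')
# [\s\S] matches any character, newline included, without a modifier.
charclass=$(grep -Pzo 'begin\n[\s\S]*?\nend\n' "$sample" | tr -d '\0')

[ "$dotall" = "$charclass" ] && echo "both patterns extract the same block"

rm -f "$sample"
```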
The versions that leave "begin" and "end" out of the output are as follows:
grep -Pzo "^begin$\n\K[\s\S]*?(?=\n^end$)" file # or grep -Pzo "(?<=^begin$\n)[\s\S]*?(?=\n^end$)"
grep -Pzo "(?s)(?<=^begin$\n).*?(?=\n^end$)" file
See the full test of all commands here (outdated, since grep's behavior with the -P option has changed).
Note:
^ marks the beginning of a line and $ marks the end of a line. They are added around "begin" and "end" so that these match only when they are alone on a line.
In two commands I escaped $ because it is also used for command substitution ($(command)), which replaces the command with its output.
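The quoting point can be checked without running grep at all. This sketch only compares two shell strings; the variable names are made up for illustration.

```shell
# Inside double quotes the shell would expand $( ... ), so that $ is escaped.
escaped="^begin\$(.|\n)*^end$"
# Inside single quotes nothing is expanded, so no escaping is needed.
literal='^begin$(.|\n)*^end$'

[ "$escaped" = "$literal" ] && echo "identical patterns"
```

Writing the pattern in single quotes avoids the escaping question altogether.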
From man grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
-P, --perl-regexp
Interpret PATTERN as a Perl compatible regular expression (PCRE)
-z, --null-data
Treat the input as a set of lines, each terminated by a zero byte (the ASCII
NUL character) instead of a newline. Like the -Z or --null option, this option
can be used with commands like sort -z to process arbitrary file names.
Solution 2
In case your grep doesn't support Perl syntax (-P), you can try joining the lines, matching the pattern, then expanding the lines again as below:
$ tr '\n' , < foo.txt | grep -o "begin.*end" | tr , '\n'
begin
Some text goes here.
end
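One caveat with the comma as the joining character is that any comma already in the text comes back as a newline. A possible variation, assuming the input never contains the control character \001, joins on that byte instead. The sample file is hypothetical, and the greedy .* still runs from the first "begin" to the last "end" in the input.

```shell
# Hypothetical sample file whose text contains commas.
sample=$(mktemp)
printf '%s\n' 'Some text, with a comma' 'begin' 'Some text, goes here.' 'end' 'Some more text' > "$sample"

# Join the lines on \001, match on the single joined line, then split again.
joined=$(tr '\n' '\001' < "$sample" | grep -o 'begin.*end' | tr '\001' '\n')
printf '%s\n' "$joined"

rm -f "$sample"
```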
Iker
Updated on September 18, 2022
Comments
-
Iker over 1 year
To be precise:
Some text
begin
Some text goes here.
end
Some more text
and I want to extract the entire block that starts at "begin" and runs to "end". With awk we can do it like this:
awk '/begin/,/end/' text
How can I do this with grep?
-
h3. over 9 years
-
-
Avinash Raj over 9 years
Change your grep to
grep -Pzo "(?<=begin\n)(.|\n)*(?=\nend)" file
so that it does not print the \n character at the end of the "begin" line.
-
Avinash Raj over 9 years
Use the DOTALL modifier to make the dot match newline characters as well:
grep -Pzo "(?s)begin.*?end" file
-
αғsнιη over 9 years
@AvinashRaj thank you, I added that to avoid the \n, but you can post your other solution as your own answer ;)
-
Avinash Raj over 9 years
Why? Add it to yours. I have more reps :-)
-
terdon over 9 years
You might want to use
grep -Pzo "begin(.|\n)*\nend" file
instead, to make sure that end only matches at the beginning of a line and not in things like bend.
-
αғsнιη over 9 years
@terdon Can I use ^end instead? Or, even better, ^end$?
-
terdon over 9 years
Huh, yes you can. I had thought that the ^ would only match the beginning of the file when using -z, but apparently not.
-
terdon over 9 years
The man page says: "-z: Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline." so I would expect ^ and $ to match just before and just after a \0 instead. Apparently, they're hard-coded to match \n.
-
musbach over 7 years
The solution doesn't work. It produces an error:
grep: ein nicht geschütztes ^ oder $ wird mit -Pz nicht unterstützt
In English, the error reads:
grep: unescaped ^ or $ not supported with -Pz
-
terdon over 7 years
I guess grep's behavior has changed. I just tested and musbach is right, ^ and $ don't work with -Pz. It should work as expected if you replace ^ and $ with \n, though.
-
αғsнιη over 7 years
@terdon paste.ubuntu.com/9096940
-
terdon over 7 years
Yes, I know, that's in your answer. I'm sure it worked when you posted this, but try it again today. The behavior of grep seems to have changed.
-
musbach over 7 years
@terdon you are right. This works:
grep -Pzo "begin\n(.|\n)*\nend\n" file
If I put a \n before begin (grep -Pzo "\nbegin\n(.|\n)*\nend\n" file), I get a blank line and then the correct output. I guess that \n produces a linefeed, but it looks strange to me. @KasiyA I am on Ubuntu 16.04. Which OS are you on?
-
terdon over 7 years
@musbach yes, \n is the newline character. You get an extra newline because with \nbegin you are including the newline character at the end of the previous line, so that's printed as a blank line.
-
αғsнιη over 7 years
At that time I was on 14.04. Right now I'm away from my Ubuntu 16.04 machine and can't test it; once I'm back I will double-check. But grep's behavior has certainly changed, as Mr. terdon confirmed, @musbach.
-
musbach over 7 years
Yes, the answer should be corrected or flagged as wrong.
-
αғsнιη over 7 years
I have asked about this wrong grep behavior: "grep command doesn't support start '^' and '$' end of line anchors when it's with -Pz". @terdon
-
musbach over 7 years
I also added that it works on 14.04 but works neither on 10.4 nor on 16.04 (see below). Why it only works on 14.04 is very strange.
-
terdon over 7 years
@musbach there's no way (and no reason) to flag an answer as wrong. You've left a comment explaining it, that's all that's needed. The answer was correct when posted, after all.