How can I search for a multiline pattern in a file?
Solution 1
I discovered pcregrep, which stands for Perl Compatible Regular Expressions GREP.
Its -M option makes it possible to search for patterns that span line boundaries.
For example, to find files where the '_name' variable is followed on the next line by the '_description' variable:
find . -iname '*.py' | xargs pcregrep -M '_name.*\n.*_description'
Tip: you need to include the line break character in your pattern. Depending on your platform, it could be '\n', '\r', '\r\n', ...
Solution 2
Why not use awk? It prints every line from one matching the start pattern through the next one matching the end pattern:
awk '/Start pattern/,/End pattern/' filename
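To see what the range pattern prints, here is a minimal sketch on a throwaway file. The file contents and the START/END markers are made up for illustration; only the range syntax comes from the command above.

```shell
# Create a small sample file (hypothetical contents, for illustration only).
sample=$(mktemp)
printf '%s\n' 'header' 'START one' 'middle' 'END one' 'footer' > "$sample"

# Print every line from one matching /START/ through the next one matching /END/.
# Both boundary lines are included in the output.
block=$(awk '/START/,/END/' "$sample")
printf '%s\n' "$block"

rm -f "$sample"
```

If the end pattern never matches after a start line, the range runs to the end of the file, so a loose end pattern can print far more than intended.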
Solution 3
Here is an example using GNU grep:
grep -Pzo '_name.*\n.*_description'
-z/--null-data
Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline.
This has the effect of treating the whole file as one large line, provided the file contains no NUL bytes.
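A minimal sketch of the command in use, assuming GNU grep built with PCRE support (-P is not available in every grep). The sample file and its contents are hypothetical. Because -z makes grep terminate each match with a NUL byte, the output is piped through tr to turn those into newlines.

```shell
# Hypothetical sample file where the two names sit on consecutive lines.
sample=$(mktemp)
printf '%s\n' 'x = 1' '_name = "a"' '_description = "b"' 'y = 2' > "$sample"

# -P: Perl-compatible regex, -z: NUL-separated records, -o: print only the match.
# With -z each match is followed by a NUL byte, so convert those to newlines.
match=$(grep -Pzo '_name.*\n.*_description' "$sample" | tr '\0' '\n')
printf '%s\n' "$match"

rm -f "$sample"
```

Without -o, grep would print the whole record, which under -z is the entire file.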
Solution 4
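grep -P also uses libpcre, but is much more widely installed. To find a complete title section of an HTML document, even if it spans multiple lines, you can use this (-z is needed so that grep reads the whole file as one record rather than line by line, and -o prints only the matched part):
grep -Pzo '(?s)<title>.*</title>' example.html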
Since the PCRE project follows Perl's regular expression syntax, the Perl regular expression documentation can serve as a reference.
Solution 5
Here is a more useful example:
pcregrep -Mi "<title>(.*\n){0,5}</title>" afile.html
It searches for the title tag in an HTML file even if it spans up to 5 lines.
Here is an example that matches across any number of lines:
pcregrep -Mi "(?s)<title>.*</title>" example.html
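The greedy .* in these patterns runs to the last </title> in the input. A lazy quantifier (.*?) stops at the nearest one instead. The sketch below compares the two using GNU grep -Pzo rather than pcregrep, which assumes a grep built with PCRE support; the sample HTML is made up for illustration.

```shell
# Made-up sample with two title blocks, for illustration only.
sample=$(mktemp)
printf '%s\n' '<title>' 'First' '</title>' '<p>text</p>' '<title>Second</title>' > "$sample"

# Greedy: one match running from the first <title> to the last </title>.
greedy=$(grep -Pzo '(?s)<title>.*</title>' "$sample" | tr '\0' '\n')

# Lazy: .*? stops at the nearest </title>, so each block is matched separately.
lazy=$(grep -Pzo '(?s)<title>.*?</title>' "$sample" | tr '\0' '\n')

printf 'greedy:\n%s\n\nlazy:\n%s\n' "$greedy" "$lazy"
rm -f "$sample"
```

The greedy result includes the <p> line that sits between the two blocks; the lazy result does not.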
Comments
-
Oli over 2 years
I needed to find all the files that contained a specific string pattern. The first solution that comes to mind is using find piped with xargs grep:
find . -iname '*.py' | xargs grep -e 'YOUR_PATTERN'
But if I need to find patterns that span more than one line, I'm stuck, because vanilla grep can't find multiline patterns.
-
glerYbo about 6 years
Possible duplicate of How to find patterns across multiple lines using grep?
-
rogerdpack over 5 years
This one's older, so I'd say it's not a duplicate :)
-
tripleee almost 5 years
@rogerdpack When marking questions as duplicates, the age of a question is a tertiary concern, after the amount and quality of answers and the quality of the question.
-
rogerdpack over 2 years
Makes sense, voting to close since it's a "duplicate now"
-
Ali Karbassi over 13 years
This is much easier to understand and uses awk, which comes with most *nix systems.
-
matt about 13 years
Thanks for this. I was stuck, not realizing that a wildcard wouldn't match the newline character.
-
lubomir.brindza almost 13 years
@matt: you can also persuade the dot wildcard to match newlines if you add (?s) to your regular expression, like so: "(?s)<html>.*</html>"
-
Cloud about 12 years
That only accounts for a single new-line character, I think.
-
Thaqif Yusoff almost 12 years
Nice! Is there a way to make this match non-greedy?
-
Bibek Shrestha almost 12 years
How would you only print the filename when there is a match?
-
bbaja42 over 11 years
I wasn't able to use grep for multiline search without the flags -z, so that it doesn't split the search on single lines, and -o, to print only the matched part.
-
Jim over 11 years
As mentioned by halka below, "you can also persuade the dot wildcard to match newlines if you add (?s) to your regular expression". Then use grep with Perl regex by adding -P:
find . -exec grep -nHP '(?s)SELECT.{1,60}FROM.{1,20}table_name' '{}' \;
-
Benubird about 11 years
I found that -o caused it to not print anything, but -l worked to get a list of files (my command was grep -rzl pattern *; -rzo didn't work).
-
Jared Beck almost 11 years
pcregrep is available on the Mac with brew install pcre
-
Ciro Santilli OurBigBook.com over 9 years
Even better: also use -H, which prints the filename before each match: pcregrep -HM.
-
Robert over 9 years
You can show the line numbers of the matches with awk '/Start pattern/,/End pattern/ {printf NR " "; print}' filename. You can make it prettier by giving the line numbers a fixed width: awk '/Start pattern/,/End pattern/ {printf "%-4s ", NR; print}' filename.
-
rloth over 9 years
I recommend grep -Pazo instead of -Pzo for non-ASCII files. It's better because the -z switch on non-ASCII files may trigger grep's "binary data" behaviour, which changes the return values. The -a (--text) switch prevents that.
-
Quanlong about 9 years
Does not work on Mac with git installed by brew reinstall --with-pcre git
-
fedorqui about 6 years
@Ɖiamond ǤeezeƦ note that editing a post in the LQP (stackoverflow.com/review/low-quality-posts/19341146) invalidates the review, so just edit if you are sure the post needs to be maintained.
-
Jinstrong almost 6 years
This seems to work nicely on a single file; however, what if I would like to search within multiple files?
-
Michael Goldshteyn almost 6 years
@marcin, I just tried this with GNU awk 4.2.1 and it appears to be greedy only with regard to the start pattern, by default, since it just searches for the end pattern after finding the start pattern.
-
hoefling almost 6 years
@Jinstrong use pipes. For example, find . -name "*.txt" | xargs -n1 awk '/foo/,/bar/' will recursively search all txt files in the current directory.
-
Paul Allsopp over 5 years
Use grep to find the list of files which contain the basic word/words you're looking for, and then use awk to drill into each file via a for...in loop
-
Herbert over 5 years
This prints the whole file though
-
rogerdpack over 5 years
Apparently making this non-greedy is "non trivial" (unix.stackexchange.com/questions/49601/…); however, the pcregrep command can do so.
-
rogerdpack over 5 years
Hmm tried this just now and didn't seem to work... gist.github.com/rdp/0286d91624930bd11d0169d6a6337c33
-
Pryftan over 4 years
I didn't know grep had this option. Probably because of this: "This is highly experimental and grep -P may warn of unimplemented features."; that's under CentOS 7. Under Fedora 29: "This is experimental and grep -P may warn of unimplemented features." Of course in BSD grep it's not there at all. Would be nice if it wasn't so experimental, but it's nice to be reminded of it, little though I'm likely to use it.
-
Pryftan over 4 years
@matt Of course you can check for $ (at the end of a pattern) to signify it's the end of the line, though that's not the same thing as helping you find multiple line patterns. See also glob(7). You might also find this website of interest: regular-expressions.info
-
Nuvious over 3 years
Thanks for this! Helped me filter some log files that needed a multi-line match.
-
JonTheNiceGuy over 2 years
This worked for me, just the block I needed, on OS X.
-
Myridium over 2 years
pcregrep: line 1 of file /dev/fd/63 is too long for the internal buffer when acting on a simple text file like <(cat file.txt | tr '\0' '\n').
-
rogerdpack over 2 years
Works with grep -Pzo (though it adds a trailing NUL char, see some of the other answers). grep -P is common on "linux" but not BSD...