How can I search for a multiline pattern in a file?

linux command-line grep find pcregrep

144,758

Solution 1

I discovered pcregrep, which stands for Perl Compatible Regular Expressions GREP.

The -M option makes it possible to search for patterns that span line boundaries.

For example, suppose you need to find files where the '_name' variable is followed on the next line by the '_description' variable:

find . -iname '*.py' | xargs pcregrep -M '_name.*\n.*_description'

Tip: you need to include the line break character in your pattern. Depending on your platform, it could be '\n', '\r', '\r\n', ...

Solution 2

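Why not use awk? A range pattern prints every line from one matching the start pattern through the next line matching the end pattern: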

awk '/Start pattern/,/End pattern/' filename
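
To see what the range pattern selects, here is a minimal sketch against a throwaway file. The file name sample.txt and the BEGIN/END markers are invented for illustration. The range is inclusive and restarts each time the start pattern matches again.

```shell
# Create a small sample file (hypothetical content, for illustration only).
tmpdir=$(mktemp -d)
printf '%s\n' 'header' 'BEGIN' 'line a' 'END' 'middle' \
              'BEGIN' 'line b' 'END' 'footer' > "$tmpdir/sample.txt"

# Print every block from a line matching /BEGIN/ through the next
# line matching /END/, both marker lines included.
blocks=$(awk '/BEGIN/,/END/' "$tmpdir/sample.txt")
printf '%s\n' "$blocks"
```

Lines outside the markers ('header', 'middle', 'footer') are not printed. Note that this selects whole lines, so it is a line-range filter rather than a true multiline regular expression match.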

Solution 3

Here is the same example using GNU grep:

grep -Pzo '_name.*\n.*_description'

-z/--null-data Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline.

This has the effect of treating the whole file as one large line, provided the file contains no NUL bytes.
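
As a rough sketch of how this behaves in practice, the following builds two throwaway files and runs the pattern against them. The file names model.py and other.py and their contents are invented for illustration, and the sketch assumes a GNU grep built with PCRE support.

```shell
# Two hypothetical Python files: only model.py has _name followed by _description.
tmpdir=$(mktemp -d)
printf '%s\n' 'class Foo:' "    _name = 'foo'" "    _description = 'A foo'" > "$tmpdir/model.py"
printf '%s\n' 'class Bar:' "    _name = 'bar'" > "$tmpdir/other.py"

# With -z, -o terminates each printed match with a NUL byte instead of a
# newline; tr converts it so the output is readable in a terminal.
match=$(grep -Pzo '_name.*\n.*_description' "$tmpdir/model.py" | tr '\0' '\n')
printf '%s\n' "$match"

# -l lists only the names of the files that contain a match.
files=$(grep -Pzl '_name.*\n.*_description' "$tmpdir"/*.py)
printf '%s\n' "$files"
```

The -l form is usually the more convenient one when the goal is to find which files contain the pattern rather than to see the matched text.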

Solution 4

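grep -P also uses libpcre, but is much more widely installed. To find a complete title section of an HTML document, even if it spans multiple lines, combine it with -z (so the whole file is read as one line), -o (so only the matched part is printed) and (?s) (so the dot also matches newlines):

grep -Pzo '(?s)<title>.*</title>' example.html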

Since the PCRE project implements Perl's regular expression syntax, use the Perl documentation for reference.

Solution 5

Here is a more useful example:

pcregrep -Mi "<title>(.*\n){0,5}</title>" afile.html

It searches for the title tag in an HTML file, even if the tag spans up to 5 lines.

Here is an example of unlimited lines:

pcregrep -Mi "(?s)<title>.*</title>" example.html
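
One caveat with (?s) and .* is greediness: the match runs from the first opening tag to the last closing tag in the file. PCRE's lazy quantifier .*? stops at the nearest closing tag instead. The sketch below illustrates the difference with GNU grep -Pzo rather than pcregrep, on the assumption that the same PCRE syntax applies to both; the file contents are made up for illustration.

```shell
# Hypothetical file with two title blocks, one of them spanning several lines.
tmpdir=$(mktemp -d)
printf '%s\n' '<title>' 'First' '</title>' '<p>text</p>' \
              '<title>Second</title>' > "$tmpdir/example.html"

# With -z, each match printed by -o ends in a NUL byte, so counting
# NUL bytes counts matches.

# Greedy: one match, from the first <title> to the last </title>.
greedy_count=$(grep -Pzo '(?s)<title>.*</title>' "$tmpdir/example.html" | tr -cd '\0' | wc -c)

# Lazy: each <title>...</title> pair is matched separately.
lazy_count=$(grep -Pzo '(?s)<title>.*?</title>' "$tmpdir/example.html" | tr -cd '\0' | wc -c)

echo "greedy matches: $greedy_count, lazy matches: $lazy_count"
```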

Oli

Python developer vim user

Updated on January 21, 2022

Comments

Oli over 2 years
I needed to find all the files that contained a specific string pattern. The first solution that comes to mind is using find piped with xargs grep:
```
find . -iname '*.py' | xargs grep -e 'YOUR_PATTERN'
```
But if I need to find patterns that span more than one line, I'm stuck, because vanilla grep can't find multiline patterns.
- glerYbo about 6 years
  
  Possible duplicate of How to find patterns across multiple lines using grep?
- rogerdpack over 5 years
  
  This one's older, so I'd say it's not a duplicate :)
- tripleee almost 5 years
  
  @rogerdpack When marking questions as duplicates, the age of a question is a tertiary concern, after the amount and quality of answers and the quality of the question.
- rogerdpack over 2 years
  
  Makes sense, voting to close since it's a "duplicate now"
Ali Karbassi over 13 years

This is much easier to understand and uses awk that comes with most *nix systems.
matt about 13 years

thanks for this. I was stuck not realizing that a wildcard wouldn't match the newline character.
lubomir.brindza almost 13 years

@matt: you can also persuade the dot wildcard to match newlines if you add (?s) to your regular expression, like so: "(?s)<html>.*</html>"
Cloud about 12 years

That only accounts for a single new-line character, I think.
Thaqif Yusoff almost 12 years

nice! is there a way to make this match non-greedy?
Bibek Shrestha almost 12 years

How would you only print the filename when there is a match?
bbaja42 over 11 years

I wasn't able to use grep for multiline search, without using flags -z so it doesn't split search on single line, and -o to print only matched part.
Jim over 11 years

As mentioned by halka below, "you can also persuade the dot wildcard to match newlines if you add (?s) to your regular expression". Then use grep with perl regex by adding -P. find . -exec grep -nHP '(?s)SELECT.{1,60}FROM.{1,20}table_name' '{}' \;
Benubird about 11 years

I found that -o caused it to not print anything, but -l worked to get a list of files (my command was grep -rzl pattern *, -rzo didn't work)
Jared Beck almost 11 years

pcregrep is available on the mac with brew install pcre
Ciro Santilli OurBigBook.com over 9 years

Even better: also use -H which prints the filename before each match: pcregrep -HM.
Robert over 9 years

You can show the line numbers of the matches with awk '/Start pattern/,/End pattern/ {printf NR " "; print}' filename. You can make it prettier by giving the line numbers a fixed width: awk '/Start pattern/,/End pattern/ {printf "%-4s ", NR; print}' filename.
rloth over 9 years

I recommend ''grep -Pazo'' instead of ''-Pzo'' for non-ASCII files. It's better because the -z switch on non-ASCII files may trigger grep's "binary data" behaviour which changes the return values. Switch ''-a | --text'' prevents that.
Quanlong about 9 years

Does not work on Mac with git installed by brew reinstall --with-pcre git
fedorqui about 6 years

@Ɖiamond ǤeezeƦ note that editing a post in the LQP (stackoverflow.com/review/low-quality-posts/19341146) invalidates the review, so just edit if you are sure the post needs to be maintained.
Jinstrong almost 6 years

This seems to work nicely on a single file. However, what if I would like to search within multiple files?
Michael Goldshteyn almost 6 years

@marcin, I just tried this with gnu awk 4.2.1 and it appears to be greedy only with regard to the Start pattern, by default, since it just searches for the end pattern after finding the start pattern.
hoefling almost 6 years

@Jinstrong use pipes. for example, find . -name "*.txt" | xargs -n1 awk '/foo/,/bar/' will recursively search all txt files in the current directory.
Paul Allsopp over 5 years

Use grep to find the list of files which contain the basic word/words you're looking for, and then use awk to drill into each file via a for...in loop
Herbert over 5 years

This prints the whole file though
rogerdpack over 5 years

Apparently making this non greedy is "non trivial" unix.stackexchange.com/questions/49601/… however the pcregrep command can do so.
rogerdpack over 5 years

Hmm tried this just now and didn't seem to work... gist.github.com/rdp/0286d91624930bd11d0169d6a6337c33
Pryftan over 4 years

I didn't know grep had this option. Probably because of this: This is highly experimental and grep -P may warn of unimplemented features.; that's under CentOS 7. Under Fedora 29: This is experimental and grep -P may warn of unimplemented features. Of course in BSD grep it's not there at all. Would be nice if it wasn't so experimental but it's nice to be reminded of it - little though I'm likely to use it.
Pryftan over 4 years

@matt Of course you can check for $ (at the end of a pattern) to signify it's the end of the line - though that's not the same thing as helping you find multiple line patterns. See also glob(7). You might also find this website of interest: regular-expressions.info
Nuvious over 3 years

Thanks for this! Helped me filter some log files that needed a multi-line match.
JonTheNiceGuy over 2 years

This worked for me, just the block I needed, on OS X.
Myridium over 2 years

pcregrep: line 1 of file /dev/fd/63 is too long for the internal buffer when acting on a simple text file like <(cat file.txt | tr '\0' '\n').
rogerdpack over 2 years

Works with grep -Pzo (though adds a trailing NUL char, see some of the other answers). grep -P is common in "linux" but not BSD...