How can I search for a multiline pattern in a file?


Solution 1

So I discovered pcregrep, which stands for Perl Compatible Regular Expressions grep.

The -M (--multiline) option makes it possible to search for patterns that span line boundaries.

For example, to find the files in which the '_name' variable is followed on the next line by the '_description' variable:

find . -iname '*.py' | xargs pcregrep -M '_name.*\n.*_description'

Tip: you need to include the line break character in your pattern. Depending on your platform, it could be '\n', '\r', '\r\n', ...
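A minimal sketch of that tip, assuming GNU grep built with PCRE support (grep -P) as a stand-in for pcregrep, since both use PCRE syntax. The sample files and the foo/bar pattern are hypothetical. Writing the break as \r?\n matches both LF and CRLF line endings, while a bare \n does not match across a CRLF break:

```shell
#!/bin/sh
# Hypothetical sample files: the same two lines with different line endings.
dir=$(mktemp -d)
printf 'foo\nbar\n' > "$dir/unix.txt"
printf 'foo\r\nbar\r\n' > "$dir/windows.txt"

# -P: Perl-compatible regex; -z: read NUL-terminated records, so a file with
# no NUL bytes is one record and the pattern may cross line breaks;
# -q: report the result only through the exit status.
matches() { grep -Pzq "$1" "$2" && echo yes || echo no; }

unix_result=$(matches 'foo\r?\nbar' "$dir/unix.txt")
windows_result=$(matches 'foo\r?\nbar' "$dir/windows.txt")
# The \r between "foo" and the newline blocks an LF-only pattern.
windows_lf_only=$(matches 'foo\nbar' "$dir/windows.txt")

echo "unix: $unix_result, windows: $windows_result, windows (LF-only pattern): $windows_lf_only"
rm -r "$dir"
```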

Solution 2

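Why not use awk? A range pattern prints every line from one matching the start pattern through the next line matching the end pattern: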

awk '/Start pattern/,/End pattern/' filename
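A small demonstration of the range pattern on a hypothetical file, with START and STOP standing in for the two patterns. Both boundary lines are printed, and the range fires again each time the start pattern reappears; if the end pattern never appears, awk keeps printing to the end of the file:

```shell
#!/bin/sh
# Hypothetical input containing two START...STOP blocks surrounded by noise.
file=$(mktemp)
printf '%s\n' noise START a STOP noise START b STOP noise > "$file"

# /START/,/STOP/ selects each line from a START line through the next STOP
# line, inclusive, then starts looking for START again.
out=$(awk '/START/,/STOP/' "$file")
printf '%s\n' "$out"
rm "$file"
```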

Solution 3

Here is an example using GNU grep:

grep -Pzo '_name.*\n.*_description' filename

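-z, --null-data: Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline.

Since an ordinary text file contains no NUL bytes, this has the effect of treating the whole file as one large line, so \n in the pattern can match a real line break. -P enables Perl-compatible regular expressions and -o prints only the matched text.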
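A minimal run of that command, assuming GNU grep with -P support and a hypothetical Python file. Each match printed by -o ends with a NUL byte (the record terminator under -z), so the sketch pipes through tr to turn it back into a newline:

```shell
#!/bin/sh
# Hypothetical file in which _name is followed on the next line by _description.
file=$(mktemp)
printf '%s\n' 'x_name = "demo"' 'x_description = "text"' 'other = 1' > "$file"

# -z: whole file as one record; -o: print only the match; tr: NUL -> newline.
out=$(grep -Pzo '_name.*\n.*_description' "$file" | tr '\0' '\n')
printf '%s\n' "$out"
rm "$file"
```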

Solution 4

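grep -P also uses libpcre, but is much more widely installed than pcregrep. To find the complete title section of an HTML document, even if it spans multiple lines, you can use this:

grep -Pzo '(?s)<title>.*</title>' example.html

(?s) lets the dot match newlines, but grep still reads its input one line at a time, so -z is needed for the match to cross line boundaries; -o prints only the matched section.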
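A sketch contrasting greedy and non-greedy matching, assuming GNU grep with -P support and a hypothetical file that contains a second title element. Without -z, grep reads one line at a time, so (?s) alone has no line breaks to match. .*? is the non-greedy PCRE form and stops at the first closing tag, whereas .* runs to the last one:

```shell
#!/bin/sh
# Hypothetical HTML: a title split over two lines, plus a second title element.
file=$(mktemp)
printf '%s\n' '<html><head><title>Multiline' 'Title</title></head>' '<body><title>x</title></body></html>' > "$file"

# (?s): dot matches newlines; -z: whole file as one record; -o: print each
# match, NUL-terminated, which tr converts to a newline.
lazy=$(grep -Pzo '(?s)<title>.*?</title>' "$file" | tr '\0' '\n')
greedy=$(grep -Pzo '(?s)<title>.*</title>' "$file" | tr '\0' '\n')
printf 'non-greedy:\n%s\n\ngreedy:\n%s\n' "$lazy" "$greedy"
rm "$file"
```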

Since PCRE implements Perl's regular expression syntax, use the Perl documentation for reference.

Solution 5

Here is a more useful example:

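pcregrep -Mi "<title>(.*\n){0,5}.*</title>" afile.html

It finds the title tag in an HTML file even if it spans several lines (up to five line breaks between the tags); -i makes the match case-insensitive. The final .* allows the closing tag to follow text on the same line.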
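To see the bound in action without pcregrep, the same idea can be sketched with GNU grep -Pzio, which also uses PCRE syntax. The sketch uses <title>(.*\n){0,5}.*</title>; the trailing .* lets the closing tag follow text on its line, which the bare (.*\n){0,5}</title> form does not allow. Both files are hypothetical:

```shell
#!/bin/sh
short=$(mktemp)
long=$(mktemp)
# Upper-case title with two line breaks between the tags: within the bound.
printf '%s\n' '<TITLE>' 'Short' '</TITLE>' > "$short"
# Title with seven line breaks between the tags: beyond the {0,5} limit.
printf '%s\n' '<title>' 1 2 3 4 5 6 '</title>' > "$long"

pattern='<title>(.*\n){0,5}.*</title>'
# -i: case-insensitive; -z: whole file as one record; -o: print only the match.
short_out=$(grep -Pzio "$pattern" "$short" | tr '\0' '\n')
long_out=$(grep -Pzio "$pattern" "$long" | tr '\0' '\n')
printf 'short: %s\nlong: %s\n' "$short_out" "${long_out:-<no match>}"
rm "$short" "$long"
```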

For an unlimited number of lines, add (?s) so that the dot also matches newlines:

pcregrep -Mi "(?s)<title>.*</title>" example.html 
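To list only the files that contain a multiline match rather than the matched text, a sketch with GNU grep: -r recurses into a directory and -l prints just the file names, while -z and (?s) let the pattern span lines. The directory and file names are hypothetical:

```shell
#!/bin/sh
dir=$(mktemp -d)
# Two hypothetical HTML files; only a.html has a title spanning two lines.
printf '%s\n' '<title>Split' 'title</title>' > "$dir/a.html"
printf '%s\n' '<p>no title here</p>' > "$dir/b.html"

# -r: search recursively; -l: print names of matching files only;
# -P -z with (?s): allow the match to cross line breaks.
files=$(grep -rlPz '(?s)<title>.*?</title>' "$dir")
printf '%s\n' "$files"
rm -r "$dir"
```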

Author: Oli

Python developer, vim user

Updated on January 21, 2022

Comments

  • Oli
    Oli over 2 years

    I needed to find all the files that contained a specific string pattern. The first solution that comes to mind is using find piped with xargs grep:

    find . -iname '*.py' | xargs grep -e 'YOUR_PATTERN'
    

    But if I need to find patterns that span more than one line, I'm stuck, because vanilla grep can't find multiline patterns.

    • glerYbo
      glerYbo about 6 years
    • rogerdpack
      rogerdpack over 5 years
      This one's older, so I'd say it's not a duplicate :)
    • tripleee
      tripleee almost 5 years
      @rogerdpack When marking questions as duplicates, the age of a question is a tertiary concern, after the amount and quality of answers and the quality of the question.
    • rogerdpack
      rogerdpack over 2 years
      Makes sense, voting to close since it's a "duplicate" now
  • Ali Karbassi
    Ali Karbassi over 13 years
    This is much easier to understand and uses awk that comes with most *nix systems.
  • matt
    matt about 13 years
    Thanks for this. I was stuck, not realizing that a wildcard wouldn't match the newline character.
  • lubomir.brindza
    lubomir.brindza almost 13 years
    @matt: you can also persuade the dot wildcard to match newlines if you add (?s) to your regular expression, like so: "(?s)<html>.*</html>"
  • Cloud
    Cloud about 12 years
    That only accounts for a single new-line character, I think.
  • Thaqif Yusoff
    Thaqif Yusoff almost 12 years
    Nice! Is there a way to make this match non-greedy?
  • Bibek Shrestha
    Bibek Shrestha almost 12 years
    How would you only print the filename when there is a match?
  • bbaja42
    bbaja42 over 11 years
    I wasn't able to use grep for multiline search without the flags -z, so that it doesn't split the search into single lines, and -o, to print only the matched part.
  • Jim
    Jim over 11 years
    As mentioned by halka below, "you can also persuade the dot wildcard to match newlines if you add (?s) to your regular expression". Then use grep with perl regex by adding -P. find . -exec grep -nHP '(?s)SELECT.{1,60}FROM.{1,20}table_name' '{}' \;
  • Benubird
    Benubird about 11 years
    I found that -o caused it to not print anything, but -l worked to get a list of files (my command was grep -rzl pattern *, -rzo didn't work)
  • Jared Beck
    Jared Beck almost 11 years
    pcregrep is available on the Mac with brew install pcre
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com over 9 years
    Even better: also use -H which prints the filename before each match: pcregrep -HM.
  • Robert
    Robert over 9 years
    You can show the line numbers of the matches with awk '/Start pattern/,/End pattern/ {printf NR " "; print}' filename. You can make it prettier by giving the line numbers a fixed width: awk '/Start pattern/,/End pattern/ {printf "%-4s ", NR; print}' filename.
  • rloth
    rloth over 9 years
    I recommend grep -Pazo instead of -Pzo for non-ASCII files. It's better because the -z switch on non-ASCII files may trigger grep's "binary data" behaviour, which changes the return values. The -a (--text) switch prevents that.
  • Quanlong
    Quanlong about 9 years
    Does not work on Mac with git installed by brew reinstall --with-pcre git
  • fedorqui
    fedorqui about 6 years
    @Ɖiamond ǤeezeƦ note that editing a post in the LQP (stackoverflow.com/review/low-quality-posts/19341146) invalidates the review, so just edit if you are sure the post needs to be maintained.
  • Jinstrong
    Jinstrong almost 6 years
    This seems to work nicely on a single file; however, what if I would like to search within multiple files?
  • Michael Goldshteyn
    Michael Goldshteyn almost 6 years
    @marcin, I just tried this with GNU awk 4.2.1 and it appears to be greedy only with regard to the Start pattern, by default, since it just searches for the end pattern after finding the start pattern.
  • hoefling
    hoefling almost 6 years
    @Jinstrong use pipes. for example, find . -name "*.txt" | xargs -n1 awk '/foo/,/bar/' will recursively search all txt files in the current directory.
  • Paul Allsopp
    Paul Allsopp over 5 years
    Use grep to find the list of files which contain the basic word/words you're looking for, and then use awk to drill into each file via a for...in loop
  • Herbert
    Herbert over 5 years
    This prints the whole file though
  • rogerdpack
    rogerdpack over 5 years
    Apparently making this non-greedy is "non trivial" unix.stackexchange.com/questions/49601/…; however, the pcregrep command can do so.
  • rogerdpack
    rogerdpack over 5 years
    Hmm tried this just now and didn't seem to work... gist.github.com/rdp/0286d91624930bd11d0169d6a6337c33
  • Pryftan
    Pryftan over 4 years
    I didn't know grep had this option. Probably because of this: This is highly experimental and grep -P may warn of unimplemented features.; that's under CentOS 7. Under Fedora 29: This is experimental and grep -P may warn of unimplemented features. Of course in BSD grep it's not there at all. Would be nice if it wasn't so experimental but it's nice to be reminded of it - little though I'm likely to use it.
  • Pryftan
    Pryftan over 4 years
    @matt Of course you can check for $ (at the end of a pattern) to signify it's the end of the line - though that's not the same thing as helping you find multiple line patterns. See also glob(7). You might also find this website of interest: regular-expressions.info
  • Nuvious
    Nuvious over 3 years
    Thanks for this! Helped me filter some log files that needed a multi-line match.
  • JonTheNiceGuy
    JonTheNiceGuy over 2 years
    This worked for me, just the block I needed, on OS X.
  • Myridium
    Myridium over 2 years
    pcregrep: line 1 of file /dev/fd/63 is too long for the internal buffer when acting on a simple text file like <(cat file.txt | tr '\0' '\n').
  • rogerdpack
    rogerdpack over 2 years
    Works with grep -Pzo (though adds a trailing NUL char, see some of the other answers). grep -P is common in "linux" but not BSD...