Get all regex matches between two patterns and print them to file

If GNU grep is an option, you could pass the -P (Perl-compatible regex) flag and use look-ahead assertions, look-behind assertions and non-greedy matches to pull out what you need:

echo 'xxSTART relevanttext xxEND something else xxSTART even more relevant'  |\
grep -oP '(?<=START).*?(?=xxEND|$)'
relevanttext
even more relevant
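
Since the question asks for the matches in a new file, here is a minimal sketch. It wraps the look-behind command in a shell function that reads an input file and redirects the matches to an output file. The function name `extract_matches` and its two arguments are hypothetical, and it assumes a GNU grep built with PCRE support. This pattern keeps the space after START and the space before xxEND as part of each match.

```shell
# Hypothetical helper: extract_matches INPUT_FILE OUTPUT_FILE
# Writes each text between START and xxEND (or the end of the line),
# one match per line, to OUTPUT_FILE.
extract_matches() {
    # -o : print only the matched parts, each on its own line
    # -P : Perl-compatible regex (GNU grep only)
    # (?<=START)     look-behind: match must be preceded by START
    # .*?            non-greedy, so each match stops at the first xxEND
    # (?=xxEND|$)    look-ahead: stop before xxEND or at end of line
    grep -oP '(?<=START).*?(?=xxEND|$)' "$1" > "$2"
}
```

For example, `extract_matches input.txt matches.txt` (hypothetical file names) would leave one match per line in `matches.txt`.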

Or, as Stéphane Chazelas suggests, use the nifty \K in place of the look-behind assertion:

echo 'xxSTART relevanttext xxEND something else xxSTART even more relevant'  |\
grep -oP 'START\K.*?(?=xxEND|$)' 
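
The \K variant can also drop the surrounding spaces by consuming them on both sides. This sketch assumes a GNU grep whose PCRE support includes \K. The function name `extract_trimmed` is hypothetical.

```shell
# Hypothetical helper: extract_trimmed INPUT_FILE OUTPUT_FILE
extract_trimmed() {
    # START\s*\K      match START plus any following blanks, then \K
    #                 discards them from the reported match
    # .*?             non-greedy, so each match stops at the first xxEND
    # (?=\s*xxEND|$)  stop before blanks + xxEND, or at end of line
    grep -oP 'START\s*\K.*?(?=\s*xxEND|$)' "$1" > "$2"
}
```

With the example line from the question, this writes `relevanttext` and `even more relevant` on separate lines, without the leading or trailing spaces.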

Author: user48020

Updated on September 18, 2022

Comments

  • user48020
    user48020 over 1 year

    I've got a file with a bunch of long lines. I'd like to grab every group between two patterns and print them to a new file, one match per line. I could manage this with Python, but I'd prefer to use just command-line tools for this task. If there is no end pattern, I'd like to grab everything until the end of the line.

    Something like:

    input: 
    xxSTART relevanttext xxEND something else xxSTART even more relevant
    
    output:
    relevanttext
    even more relevant
    
    • Admin
      Admin over 10 years
      So START and END are both within the same long line?
    • Admin
      Admin over 10 years
      Yes! I used to have just one match per line, so I'd use sed to grab everything after xxSTART, but now the input data changed and I'm a bit stumped.
  • Stéphane Chazelas
    Stéphane Chazelas over 10 years
    Or: grep -oP 'START\K.*?(?=xxEND|$)'
  • Mathias Begert
    Mathias Begert over 10 years
    @StephaneChazelas, that's a good point, added in. My version of GNU grep (2.5.1) doesn't support \K, though.