Regular expressions: How do I grab a block of text using regex? (in Ruby)

Solution 1

There is a better way to allow the dot to match newlines (/m modifier):

regexp = /\{start_grab_entries\}(.*?)\{end_grab_entries\}/m

Also, make the * lazy by appending a ?, or you might match too much if more than one such section occurs in your input.
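To illustrate the difference between the greedy and lazy quantifier, here is a minimal sketch. The two-section input and the variable names are made up for the example; only the two patterns come from the answer above.

```ruby
# Hypothetical input containing two marked sections.
text = <<~TEXT
  {start_grab_entries}
  first block
  {end_grab_entries}
  noise
  {start_grab_entries}
  second block
  {end_grab_entries}
TEXT

greedy = /\{start_grab_entries\}(.*)\{end_grab_entries\}/m
lazy   = /\{start_grab_entries\}(.*?)\{end_grab_entries\}/m

# String#[] with a regexp and a group index returns that capture group.
greedy_capture = text[greedy, 1]  # runs from the first start marker to the LAST end marker
lazy_capture   = text[lazy, 1]    # stops at the FIRST end marker

puts greedy_capture.include?("noise")  # true: the greedy match swallowed the text in between
puts lazy_capture.strip                # first block
```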

That said, the reason why you got a blank match is that you repeated the capturing group itself; therefore you only caught the last repetition (in this case, a \n).

It would have "worked" if you had put the capturing group outside of the repetition:

\{start_grab_entries\}((?:.|\n)*)\{end_grab_entries\}

but, as said above, there is a better way to do that.
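The effect of repeating the capturing group versus wrapping the repetition in it can be checked with a short sketch. The sample string and variable names here are assumptions for illustration.

```ruby
# Hypothetical sample input.
text = "{start_grab_entries}\nline one\nline two\n{end_grab_entries}"

# The group itself is repeated, so it keeps only its last repetition.
repeated_group = /\{start_grab_entries\}(.|\n)*\{end_grab_entries\}/
# The repetition sits inside the group, so the group keeps everything.
group_outside  = /\{start_grab_entries\}((?:.|\n)*)\{end_grab_entries\}/

last_repetition = text[repeated_group, 1]
whole_block     = text[group_outside, 1]

p last_repetition  # "\n"
p whole_block      # "\nline one\nline two\n"
```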

Solution 2

I'm adding this because often we're reading data from a file or data stream where the range of lines we want is not all in memory at once. "Slurping" a file is discouraged if the data could exceed the available memory, which happens easily in production environments. This is how we'd grab lines between boundary markers as the file is being scanned. It doesn't rely on a regex; instead it uses Ruby's "flip-flop" .. operator:

#!/usr/bin/ruby

lines = []
DATA.each_line do |line|
  lines << line if (line['{start_grab_entries}'] .. line['{end_grab_entries}'])
end

puts lines          # << lines with boundary markers
puts
puts lines[1 .. -2] # << lines without boundary markers

__END__
this is not captured

{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}

this is not captured either

Output of this code would look like:

{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}

i want to grab
the text that
you see here in
the middle
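
The same flip-flop idea can be wrapped in a method that accepts any IO-like object and drops the marker lines as it goes, instead of slicing the array afterwards. This is only a sketch: grab_between is a hypothetical helper name, and StringIO stands in for a real file or stream.

```ruby
require 'stringio'

# Hypothetical helper: collect the lines between two marker lines,
# reading one line at a time so the whole input never sits in memory.
def grab_between(io, start_marker, end_marker)
  grabbed = []
  io.each_line do |line|
    # Flip-flop: becomes true on the start marker line and stays true
    # up to and including the end marker line.
    if line.include?(start_marker)..line.include?(end_marker)
      # Skip the marker lines themselves.
      grabbed << line.chomp unless line.include?(start_marker) || line.include?(end_marker)
    end
  end
  grabbed
end

sample = StringIO.new("skip\n{start_grab_entries}\nkeep 1\nkeep 2\n{end_grab_entries}\nskip\n")
result = grab_between(sample, '{start_grab_entries}', '{end_grab_entries}')
p result  # ["keep 1", "keep 2"]
```

With a real file, File.open(path) { |f| grab_between(f, ...) } would work the same way, since File also responds to each_line.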

Solution 3

string=<<EOF
blah
{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}
blah
EOF

puts string.scan(/\{start_grab_entries\}(.*?)\{end_grab_entries\}/m)
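
Note that String#scan with one capturing group returns an array of one-element arrays, one per match. A small sketch of how several blocks might be collected as plain strings follows; the two-block input and the variable names are assumptions for the example.

```ruby
# Hypothetical input with two marked blocks.
text = "a\n{start_grab_entries}\none\n{end_grab_entries}\nb\n" \
       "{start_grab_entries}\ntwo\n{end_grab_entries}\n"

matches = text.scan(/\{start_grab_entries\}(.*?)\{end_grab_entries\}/m)
p matches  # [["\none\n"], ["\ntwo\n"]]

# Flatten the nested arrays and trim the surrounding newlines.
blocks = matches.flatten.map(&:strip)
p blocks   # ["one", "two"]
```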
Author by

sjsc

Updated on June 12, 2022

Comments

  • sjsc
    sjsc almost 2 years

    I'm using ruby and I'm trying to find a way to grab text in between the {start_grab_entries} and {end_grab_entries} like so:

    {start_grab_entries}
    i want to grab
    the text that
    you see here in
    the middle
    {end_grab_entries}
    

    Something like so:

    $1 => "i want to grab
           the text that
           you see here in
           the middle"
    

    So far, I tried this as my regular expression:

    \{start_grab_entries}(.|\n)*\{end_grab_entries}
    

    However, using $1, that gives me a blank. Do you know what I can do to grab that block of text in between the tags correctly?

  • sjsc
    sjsc over 13 years
    Extremely appreciate the great response, Tim. That was great. Thank you so much! :)
  • sjsc
    sjsc over 13 years
    Thank you so much :) Really appreciate it!