Regular expressions: How do I grab a block of text using regex? (in ruby)
Solution 1
There is a better way: use the /m modifier, which allows the dot to match newlines:
regexp = /\{start_grab_entries\}(.*?)\{end_grab_entries\}/m
Also, make the * lazy by appending a ?, or you might match too much if more than one such section occurs in your input.
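To illustrate the difference, here is a minimal sketch comparing the greedy and lazy forms. The two-section input and the variable names are made up for this example and are not from the question:

```ruby
# Hypothetical input containing two marked sections with noise in between.
input = <<~TEXT
  {start_grab_entries}
  first block
  {end_grab_entries}
  noise
  {start_grab_entries}
  second block
  {end_grab_entries}
TEXT

greedy = /\{start_grab_entries\}(.*)\{end_grab_entries\}/m
lazy   = /\{start_grab_entries\}(.*?)\{end_grab_entries\}/m

# The greedy version runs from the first start marker to the LAST end marker,
# so its capture swallows everything in between, including "noise".
greedy_capture = input[greedy, 1]

# The lazy version stops at the nearest end marker, so scan finds each section.
lazy_captures = input.scan(lazy).flatten.map(&:strip)

puts greedy_capture.include?("noise")  # true
p lazy_captures                        # ["first block", "second block"]
```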
That said, the reason you got a blank match is that you repeated the capturing group itself, so it only kept the last repetition (in this case, a \n).
It would have "worked" if you had put the capturing group outside of the repetition:
\{start_grab_entries\}((?:.|\n)*)\{end_grab_entries\}
but, as said above, there is a better way to do that.
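A small sketch of that behaviour, using a shortened, made-up input string (the variable names are only for illustration):

```ruby
# Hypothetical, shortened input: one block with newlines around the content.
text = "{start_grab_entries}\nabc\n{end_grab_entries}"

# Capturing group INSIDE the repetition: group 1 is overwritten on every
# iteration, so only the last matched character survives.
repeated_group = /\{start_grab_entries\}(.|\n)*\{end_grab_entries\}/

# Capturing group AROUND the repetition: group 1 holds the whole block.
outer_group = /\{start_grab_entries\}((?:.|\n)*)\{end_grab_entries\}/

last_iteration = text[repeated_group, 1]
whole_block    = text[outer_group, 1]

p last_iteration  # "\n"
p whole_block     # "\nabc\n"
```

Printed with puts, a lone "\n" looks like a blank, which matches the blank result described in the question.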
Solution 2
I'm adding this because we're often reading data from a file or data stream where the lines we want are not all in memory at once. "Slurping" a file is discouraged if the data could exceed the available memory, which happens easily in production corporate environments. This is how we'd grab lines between boundary markers as the file is being scanned. It doesn't rely on a regex; instead it uses Ruby's "flip-flop" (..) operator:
#!/usr/bin/ruby
lines = []
DATA.each_line do |line|
lines << line if (line['{start_grab_entries}'] .. line['{end_grab_entries}'])
end
puts lines # << lines with boundary markers
puts
puts lines[1 .. -2] # << lines without boundary markers
__END__
this is not captured
{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}
this is not captured either
Output of this code would look like:
{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}
i want to grab
the text that
you see here in
the middle
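If the flip-flop operator feels too magical, roughly the same line-by-line scan can be written with an explicit state flag. This is only a sketch: grab_between is a hypothetical helper name, StringIO stands in for a real file or stream, and unlike the flip-flop version it leaves the marker lines out from the start.

```ruby
require "stringio"

# Hypothetical helper: returns the lines strictly between the two markers,
# reading one line at a time so the whole input never has to be in memory.
def grab_between(io, start_marker, end_marker)
  grabbed = []
  inside = false
  io.each_line do |line|
    if inside
      if line.include?(end_marker)
        inside = false          # end marker seen: stop collecting
      else
        grabbed << line         # a line between the markers
      end
    elsif line.include?(start_marker)
      inside = true             # start marker seen: collect from the next line
    end
  end
  grabbed
end

# StringIO is used here only so the example is self-contained;
# a File object opened with File.open would work the same way.
source = StringIO.new(<<~TEXT)
  this is not captured
  {start_grab_entries}
  i want to grab
  the middle
  {end_grab_entries}
  this is not captured either
TEXT

result = grab_between(source, "{start_grab_entries}", "{end_grab_entries}")
p result  # ["i want to grab\n", "the middle\n"]
```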
Solution 3
string=<<EOF
blah
{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}
blah
EOF
# Escape the braces so they are unambiguously literal; /m lets . match newlines.
puts string.scan(/\{start_grab_entries\}(.*?)\{end_grab_entries\}/m)
sjsc
Updated on June 12, 2022
Comments
-
sjsc almost 2 years
I'm using ruby and I'm trying to find a way to grab text in between the {start_grab_entries} and {end_grab_entries} like so:
{start_grab_entries} i want to grab the text that you see here in the middle {end_grab_entries}
Something like so:
$1 => "i want to grab the text that you see here in the middle"
So far, I tried this as my regular expression:
\{start_grab_entries}(.|\n)*\{end_grab_entries}
However, using $1, that gives me a blank. Do you know what I can do to grab that block of text in between the tags correctly?
-
sjsc over 13 years: Extremely appreciate the great response, Tim. That was great. Thank you so much! :)
-
sjsc over 13 years: Thank you so much :) Really appreciate it!