How to backreference in Ruby regular expression (regex) with gsub when I use grouping?


Solution 1

You can backreference in the substitution string with \1 (to match capture group 1).

t = "First sentence. Second sentence.Third sentence!Fourth sentence?Fifth sentence."
t.gsub(/([.!?])([A-Z1-9])/, "\\1\n\\2") # => "First sentence. Second sentence.\nThird sentence!\nFourth sentence?\nFifth sentence."

Solution 2

  • If you are using gsub(regex, replacement), then use '\1', '\2', ... to refer to the captured groups. Put the replacement in single quotes, or, if you use double quotes, escape the backslash as in Joshua's answer. The conversion from '\1' to the matched text is done inside gsub, not by Ruby's string-literal escape processing.
  • If you are using gsub(regex){replacement}, then use $1, $2, ...
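The replacement forms listed above can be compared side by side. This is only a minimal sketch: the variable names (pattern, single_quoted, double_quoted, block_form) are illustrative, and the sample string is borrowed from Solution 1.

```ruby
t = "First sentence. Second sentence.Third sentence!Fourth sentence?Fifth sentence."
pattern = /([.!?])([A-Z1-9])/

# Single-quoted pieces: the backslash reaches gsub untouched, and gsub
# substitutes the captured groups. The newline needs a double-quoted string.
single_quoted = t.gsub(pattern, '\1' + "\n" + '\2')

# Double-quoted replacement: every backslash has to be doubled.
double_quoted = t.gsub(pattern, "\\1\n\\2")

# Block form: $1 and $2 are set for each match before the block is called.
block_form = t.gsub(pattern) { "#{$1}\n#{$2}" }

puts single_quoted == double_quoted && double_quoted == block_form # prints true
```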

But for your case, it is easier not to use matches:

t2 = t.gsub(/(?<=[.\!?])(?=[A-Z1-9])/, "\n")
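As a rough check, the lookaround version can be compared with the capture-group version. The lookbehind and lookahead match an empty position, so nothing is consumed and only the newline is inserted. The variable names here are assumptions made for illustration.

```ruby
t = "First sentence. Second sentence.Third sentence!Fourth sentence?Fifth sentence."

# Zero-width match: a position preceded by ., ! or ? and followed by A-Z or 1-9.
with_lookaround = t.gsub(/(?<=[.!?])(?=[A-Z1-9])/, "\n")

# Capture-group version from Solution 1, for comparison.
with_captures = t.gsub(/([.!?])([A-Z1-9])/, "\\1\n\\2")

puts with_lookaround == with_captures # prints true
```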

Solution 3

If you got here because of Rubocop complaining "Avoid the use of Perl-style backrefs." about $1, $2, etc., you can do this instead:

some_id = $1
# or
some_id = Regexp.last_match[1] if Regexp.last_match

some_id = $5
# or
some_id = Regexp.last_match[5] if Regexp.last_match
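Regexp.last_match also accepts the group index as an argument and returns nil when there was no match, which may make the trailing "if" unnecessary. The sketch below assumes a hypothetical helper (extract_id) and a made-up "/users/<digits>" path format, neither of which comes from the original answer.

```ruby
# Hypothetical helper: pulls the numeric id out of a path such as "/users/42".
def extract_id(path)
  %r{/users/(\d+)}.match(path)
  # Returns capture group 1, or nil if the match above failed.
  Regexp.last_match(1)
end

# Alternative that avoids the global match state by using the MatchData directly.
def extract_id_from_match_data(path)
  match = %r{/users/(\d+)}.match(path)
  match && match[1]
end

p extract_id("/users/42")                 # prints "42"
p extract_id_from_match_data("/other/42") # prints nil
```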

It'll also want you to do

%r{//}.match(some_string)

instead of

some_string[//]

Lame (Rubocop)

Author by

Konstantin

Updated on July 06, 2022

Comments

  • Konstantin
    Konstantin almost 2 years

    I would like to patch some text data extracted from web pages. sample:

    t="First sentence. Second sentence.Third sentence."
    

    There is no space after the period at the end of the second sentence. This tells me that the third sentence was on a separate line (after a br tag) in the original document.

    I want to use a regexp to insert a "\n" character in the proper places and patch my text. My regex:

    t2=t.gsub(/([.\!?])([A-Z1-9])/,$1+"\n"+$2)
    

    But unfortunately it doesn't work: "NoMethodError: undefined method `+' for nil:NilClass". How can I properly backreference the matched groups? It was so easy in Microsoft Word, where I just had to use the \1 and \2 symbols.
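The error in the question comes from evaluation order: Ruby evaluates the arguments before gsub is called, so $1 and $2 are read while no match has happened yet and are still nil. A minimal sketch of the failure and of one way around it follows. The method names (eager_backrefs, deferred_backrefs) are made up for illustration, and the code is wrapped in methods so that $1 is certain to start out as nil.

```ruby
t = "First sentence. Second sentence.Third sentence."

# $1 + "\n" + $2 is computed before gsub runs; $1 is nil here, so nil + "\n"
# raises NoMethodError, which is rescued to make the failure visible.
def eager_backrefs(text)
  text.gsub(/([.!?])([A-Z1-9])/, $1 + "\n" + $2)
rescue NoMethodError
  :no_method_error
end

# A block is only called after each match, so $1 and $2 are set by then.
def deferred_backrefs(text)
  text.gsub(/([.!?])([A-Z1-9])/) { "#{$1}\n#{$2}" }
end

p eager_backrefs(t)    # prints :no_method_error
p deferred_backrefs(t) # prints "First sentence. Second sentence.\nThird sentence."
```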