How to use ruby gsub Regexp with many matches?
Solution 1
Your regex needs to be a little more robust to handle quotes that occur at the start of the first value or at the end of the last value:
csv = <<ENDCSV
test,first,line,"you are a "kind" man",thanks
again,second,li,"my "boss" is you",good
more,""Someone" said that you're "cute"",yay
"watch out for this",and,also,"this test case"
ENDCSV
puts csv.gsub(/(?<!^|,)"(?!,|$)/,'""')
#=> test,first,line,"you are a ""kind"" man",thanks
#=> again,second,li,"my ""boss"" is you",good
#=> more,"""Someone"" said that you're ""cute""",yay
#=> "watch out for this",and,also,"this test case"
The regex above uses negative lookbehind and negative lookahead assertions (anchors), available in Ruby 1.9.
- (?<!^|,) : immediately preceding this spot there must not be either a start of line (^) or a comma
- " : find a double quote
- (?!,|$) : immediately following this spot there must not be either a comma or an end of line ($)
As a bonus, since you didn't actually capture the characters on either side, you don't need to worry about using \1 correctly in your replacement string.
For more information, see the section "Anchors" in the official Ruby regex documentation.
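To make the zero-width behaviour concrete, here is a minimal sketch that records where the pattern matches. The sample line and the variable names are made up for illustration; only the regex comes from the answer.

```ruby
# The pattern from the answer: a double quote that is not at a field boundary.
inner_quote = /(?<!^|,)"(?!,|$)/

# Hypothetical sample line, not taken from the question.
line = 'a,"say "hi" now",b'

offsets = []
escaped = line.gsub(inner_quote) do |match|
  # The lookarounds consume nothing, so match is only the quote itself.
  offsets << Regexp.last_match.begin(0)
  match * 2
end

puts offsets.inspect  # [7, 10]
puts escaped          # a,"say ""hi"" now",b
```

Only the two inner quotes are reported; the quotes next to a comma are left alone, and no neighbouring characters are part of any match.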
However, for cases where you do need to reference captured groups in your replacement, you can use any of the following:
"hello".gsub /([aeiou])/, '<\1>' #=> "h<e>ll<o>"
"hello".gsub /([aeiou])/, "<\\1>" #=> "h<e>ll<o>"
"hello".gsub(/([aeiou])/){ |m| "<#{$1}>" } #=> "h<e>ll<o>"
You can't use String interpolation in the replacement string, as you did:
"hello".gsub /([aeiou])/, "<#{$1}>"
#=> "h<previousmatch>ll<previousmatch>"
…because that string interpolation happens once, before gsub has been run. Using the block form of gsub re-invokes the block for each match, at which point the global $1 has been appropriately populated and is available for use.
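The timing difference can be seen in a small sketch. It first runs an unrelated match (a made-up "xyz" example) so that $1 holds a visibly stale value before gsub is called.

```ruby
# Prime $1 with an unrelated match so the stale value is visible.
"xyz" =~ /(y)/

# The double-quoted argument is built once, before gsub runs, so it embeds
# whatever $1 held at that moment ("y"), not the vowel being replaced.
stale = "hello".gsub(/([aeiou])/, "<#{$1}>")

# The block runs once per match, after $1 has been set for that match.
fresh = "hello".gsub(/([aeiou])/) { "<#{$1}>" }

puts stale  # h<y>ll<y>
puts fresh  # h<e>ll<o>
```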
Edit: For Ruby 1.8 (why on earth are you using that?) you can use:
puts csv.gsub(/([^,\n\r])"([^,\n\r])/,'\1""\2')
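One caveat that the answer does not state: the capturing pattern consumes the character on each side of the quote, and gsub matches do not overlap, so two inner quotes separated by a single character are not both replaced. The sketch below uses a hypothetical input line to compare the two patterns.

```ruby
# Hypothetical input: two inner quotes separated by a single character.
line = 'say,"a "b" c",ok'

# Capturing version (works on Ruby 1.8): each match consumes the characters
# on both sides of the quote, so an adjacent quote can be skipped.
consuming = line.gsub(/([^,\n\r])"([^,\n\r])/, '\1""\2')

# Lookaround version (Ruby 1.9+): zero-width assertions consume nothing.
lookaround = line.gsub(/(?<!^|,)"(?!,|$)/, '""')

puts consuming   # say,"a ""b" c",ok
puts lookaround  # say,"a ""b"" c",ok
```

If your data can contain single-character quoted words, the lookaround version is the safer of the two.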
Solution 2
Assuming s is a string, this will work:
puts s.gsub(/([^,])"([^,])/, "\\1\"\"\\2")
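The double-quoted replacement above needs every backslash and quote escaped. As a sketch, the same replacement string can be written with single quotes or %q, which need no escaping here; the variable names and the sample value of s are illustrative assumptions.

```ruby
# Three spellings of the same six-character replacement string: \1""\2
single  = '\1""\2'
percent = %q[\1""\2]
double  = "\\1\"\"\\2"

same = [single, percent, double].uniq.size == 1

# Hypothetical sample value for s.
s = 'x,"my "boss" is you",y'
result = s.gsub(/([^,])"([^,])/, single)

puts same    # true
puts result  # x,"my ""boss"" is you",y
```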
Mahmoud Khaled
Updated on October 09, 2020
Comments
-
Mahmoud Khaled over 3 years
I have CSV file contents with double quotes inside quoted text:
test,first,line,"you are a "kind" man",thanks
again,second,li,"my "boss" is you",good
I need to replace every double quote that is not preceded or followed by a comma with "":
test,first,line,"you are a ""kind"" man",thanks
again,second,li,"my ""boss"" is you",good
so " is replaced by "".
I tried
x.gsub(/([^,])"([^,])/, "#{$1}\"\"#{$2}")
but it didn't work.
-
tadman over 12 years
When you're using double quotes in the content, it's probably better to use single quotes to enquote them, like '\1""\2', or use the third form, %q[\1""\2].
-
David Grayson over 12 years
I'm a little bit worried that my answer won't actually be adequate for your situation because it doesn't handle lots of things, like if there actually is a comma next to a quote in your data. You might need to do something more complicated that is not based on regex.
-
David Grayson over 12 years
Cool, I tried to figure out how to do negative lookbehind assertions in Ruby and couldn't figure it out.
-
Mahmoud Khaled about 12 years
Thanks Phrogz, it works great, but only with Ruby 1.9. Can you advise an answer for Ruby 1.8?
-
Phrogz about 12 years
@MahmoudKhaled Updated to work with Ruby 1.8. (In the future, if you have a need for such an ancient version of Ruby, please include this in your question. Ruby 1.9.1, the first stable version of the 1.9 series, was released over three years ago.)
-
Phrogz over 9 years
@Rivenfall Correct. That is what I said: "You can't use String interpolation in the replacement string". Did you mean to comment on the original question? Or did you misread this answer?
-
Rivenfall over 9 years
Sorry, I misread the answer. I'll add that \0 in the replacement string is the whole match (but $0 is not).