Seeking regex in Notepad++ to search and replace CRLF between two quotation marks ["] only
Solution 1
This will work using one regex only (tested in Notepad++):
Enter this regex in the Find what
field:
((?:^|\r\n)[^"]*+"[^\r\n"]*+)\r\n([^"]*+")
Enter this string in the Replace with
field:
$1 $2
Make sure the Wrap around
check box (and Regular expression
radio button) are selected.
Do a Replace All
as many times as required (until the "0 occurrences were replaced" dialog pops up).
Explanation:
(
(?:^|\r\n) Begin at start of file or before the CRLF before the start of a record
[^"]*+ Consume all chars up to the opening "
" Consume the opening "
[^\r\n"]*+ Consume all chars up to either the first CRLF or the closing "
) Save as capturing group 1 (= everything in record before the target CRLF)
\r\n Consume the target CRLF without capturing it
(
[^"]*+ Consume all chars up to the closing "
" Consume the closing "
) Save as capturing group 2 (= the rest of the string after the target CRLF)
Note: The *+ is a possessive quantifier. Use them appropriately to speed up execution.
Update:
This more general version of the regex will work with any line break sequence (\r\n
, \r
or \n
):
((?:^|[\r\n]+)[^"]*+"[^\r\n"]*+)[\r\n]+([^"]*+")
Solution 2
In this case the source data is generated by the export function in GMail for your contacts. After the modification outlined below (without RegEx) the result can be used to tidy up your contacts database and re-import it to GMail or to MS Outlook. Yes, I am standing on the shoulders of @alan and @robinCTS. Thank you both.
Instructions in 5 steps:
use Notepad++ / find replace / extended search mode / wrap around = on
-1- replace all [CRLF] with a unique set characters or a string (I used [~~
])
find: \r\n
and replace with: ~~
The file contents are now on one line only.
-2- Now we need to separate the header line. For this move to where the first record starts exactly before the 88th. comma (including the word after the 87th. comma [,]) and enter the [CRLF] manually by hitting the return key. There are two lines now: header and records.
-3- now find all [,~~
] and replace with [,\r\n
] The result is one record per line.
-4- remove the remaining [~~] find: ~~
and replace with: [ ] a space.
The file is now clean of unwanted [CRLF]s.
-5- Save the file and use it as intended.
snahl
Updated on June 14, 2022Comments
-
snahl about 2 years
I've got a CSV file with some 600 records where I need to replace some [CRLF] with a [space] but only when the [CRLF] is positioned between two ["] (quotation marks). When the second ["] is encountered then it should skip the rest of the line and go to the next line in the text.
I don't really have a starting point. Hope someone comes up with a suggestion.
Example:
John und Carol,,Smith,,,J.S.,,,,,,,,,,,,,+11 22 333 4444,,,,,"streetx 21[CRLF] New York City[CRLF] USA",streetx 21,,,,New York City,,,USA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Normal,,My Contacts,[CRLF]
In this case the two [CRLF] after the first ["] need to be replaced with a space [ ]. When the second ["] is encountered, skip the end of the line and go to next line.
Then again, now on the next line, after the first ["] is encountered replace all [CRLF] until the second ["] is encountered. The [CRLF]s vary in numbers. In the CSV-file the amount of commas [,] before (23) and after (65) the 2 quotation marks ["] is constant.
So maybe a comma counter could be used. I don't know.
Thanks for feedback.
-
snahl over 11 yearsThanks. In step 2 NP++ responds saying: can't find the text: "((?:[^,]*?,){88})~" I assume there is a syntax problem. Regex is On, wrap around is Off.
-
snahl over 11 yearsThanks. Same here: NP++ responds saying: can't find the text: "((?:^|\r\n)[^"]*+"[^\r\n"]*+)\r\n([^"]*+")". Now I assume it isn't a syntax problem anymore. Rather a NP++ setup issue. I am lost; NP++ is updated.
-
robinCTS over 11 years@snahl - You need
Wrap around
ON :-) -
robinCTS over 11 years@snahl - You need
Wrap around
ON :-) -
snahl over 11 yearsWrap around is now turned ON (as per instructions). Result: "0 occurrences were found" (after first run). Nothing has changed in the CSV-file.
-
robinCTS over 11 years@snahl - My bad I think! Try ((?:^|[\r\n]+)[^"]*+"[^\r\n"]*+)[\r\n]+([^"]*+"). I hardwired the CRLF for Windows rather than a generic version to cater for *nix/OS X style as well. Make sure
Wrap around
is ticked,Regular expression
is selected and you pressReplace All
. If this still fails, try copying and pasting your posted example into a newly created file in NotePad++, and then running the replace on it. Let me know what happens. I'm running v6.2.3 What version do you have? -
snahl over 11 yearssorry still NoGo, same result: Result: "0 occurrences were found" Meanwhile I got it done similar as @dda suggested. I will post the my solution below. Nevertheless I am still curious how this could be done in one step.
-
robinCTS over 11 yearsSeems like this "header" might be the culprit. I forgot to mention that on the example you posted, I got both my and alan's version working in NP++. My regex also worked in the EditPad Pro program. If you could add the header and the first couple of records to your original question (with any sensitive material obsfucated, of course), I'm sure we could get a 1 step solution going.
-
robinCTS over 11 years@snahl - Are you using NotePad++ v6.2.3? Because this version gives the result "0 occurrences were replaced." when a replace fails.
-
robinCTS over 11 yearsJust tried your solution, skipping step 2, of course. Worked perfectly. No surprise there! Are you saying if you skip step 2 it doesn't work?
-
snahl over 11 years@robinCTS No I don't say that. I just separated the header from the records because the header has 87 commas while the records have 88 commas each. But as you say this is not really necessary. Anyway, how is your regex doing? I would really like to use that one-liner. Are you still working on it? Do you still need sample data or can you try with your own from GMail contacts export?
-
robinCTS over 11 yearsI'm still working on it. I've just upgrade my Google account to GMail. I'll have a go with that first.
-
robinCTS over 11 yearsOK. Created 2 dummy GMail contacts; exported the contacts; did Replace All 3 times. Worked perfectly! I have a plan, though. [Step 1] Download the CSV file from here. Try Replace All with my Regex on that. Let me know if it works. Should we continue in chat?
-
alan over 11 yearssnahl. Glad you have a solution working. If @robinCTS answer helped you, you may want to upvote it. Cheers.
-
snahl over 11 years@alan both of you helped me
-
snahl over 11 years@robinCTS: I downloaded the file and applied you regex as per instructions. Result: "0 occurrences were found". I am puzzled(?). You say: 3 times, but there are only 2 records. Did you try with a file downloaded from within your GMail? Chat would be an option but I am probably in a different timezone (GMT+1). Suggest we go on as is.
-
snahl over 11 years@robinCTS: Also with this regex: ((?:[^,]*?,){88})~ I get the result: "0 occurrences were found". Maybe the cause IS some issue with my notepad++ settings. Though other regex strings work fine. Any idea?
-
robinCTS over 11 yearsGMT+8 here. Not planning on going to bed anytime soon :) Moving to this room.
-
alan over 11 years@snahl I don't have NP++ so can't test it there, but I did test it on your test data and it works in EditPad Pro. The only thing I can guess is make sure that you have run Step 1 first so that there are some
~
characters. If all the EOL are replace by~
and 88 is the correct number, then I don't know why it wouldn't work in NP++ -
robinCTS over 11 yearsI'm pretty sure it's a problem with snahl's NP++. I tested your version and mine and they both work in my NP++ and my EditPad Pro. I even sent @snahl a copy of a test file, and that didn't work. I've moved the discussion to chat to help him sort the problem out.
-
alan over 11 years@robinCTS ok. cool. glad it worked for someone besides me ;-) Cheers.
-
snahl over 11 yearsI thought I was sure after updating NP++. I double checked it was v5.9.x. So I initiated another update, now on 6.2.2 et volia your regex works in 2 runs.
-
snahl over 11 yearsBefore I downloaded a trial of EditPad Pro and your regex worked flawlessly in 1 run. Thank you very much. With this we can easily edit the GMail contact export google.csv and contacts.csv files. very USEFUL