Using Regex to remove Carriage Returns in a CSV file in Notepad++

19,987

Solution 1

After much hunting around StackOverflow I found a regex pattern suggested for a similar problem that I only had to modify slightly from single quote to double quote. I ran it in PERL. It worked great! Unfortunately I can't find the original post to give credit where credit is due.

Anyway, this is what I finally used. Thanks everyone for your help!

$string123 =~ s/((?:^[^"]*"|(?!^))[^"]*?(?:"[^"]*"[^"]*?)*?)(\n{1,})/$1/g; 

Solution 2

In case anyone else finds this, and wants a real legit answer for this problem in notepad++, consider:

Use the paragraph symbol/show invisible characters command to display CR and LF's.

Now, note that the true line feeds at the end of each csv record are usually (depends on what created the file) ONLY LFs. The real record separator DOES NOT HAVE A CR IN IT. And now notice that the carriage returns that are imbedded within a field and wrapped in quotes are usually CR/LFs (two non-print characters side-by-side, a CR and a LF)!

So now, IT'S EASY. Highlight a CRLF combination, pull up f&r, and your 'find what:' entry should be two empty boxes for the non-printable CRLF combination. Leave the replace field empty, and run it.

TADA! No more embedded carriage returns within quotes, and all for-real line-feeds are left intact.

Solution 3

Barry has the solution and it appears to be the best solution I can find anywhere, and I looked at length because I found, at first, that this solution wasn't working for me in Perl.

What I found is that a slight adjustment to that regex worked. I'm not sure the cause of this nuance, but Perl isn't able to find carriage returns when it looks for \n but does find carriage returns denoted in hex form /x0D.


So, instead of:

$string123 =~ s/((?:^[^"]"|(?!^))[^"]?(?:"[^"]"[^"]?)?)(\n{1,})/$1/g;


This worked for me:

$string123 =~ s/((?:^[^"]"|(?!^))[^"]?(?:"[^"]"[^"]?)?)(\x0D{1,})/$1/g;

Thanks Barry, big help!

Solution 4

I have encountered that problem and have prevailed using Notepad++. Shreyas answer is either wrong or outdated, because now using \r\n in regexp search and replace works. That said I used the following:

[^"]"(([^"]*)\r\n([^"]*))+"

The way it works, is that it matches:

[somethin0]"[somethin1]NEWLINE[somethin2]"

Where somethin1 and somethin2 are \2 and \3 (and \1 is the entire inner part) and somethin0 is delimiting character (comma most likely). To get what we want we substitute with:

[somethin0]"\2 \3"

And get the expected result! Well, mostly. Single replace like that removes single newline character within quotes. However, that should only be a minor inconvenience of spamming replaceAll button a few times (where "a few times" is the maximum amount of newlines present between quotes)

Share:
19,987
Barry
Author by

Barry

Updated on June 08, 2022

Comments

  • Barry
    Barry almost 2 years

    I have a CSV file I need to clean up. This is a one-time thing so I'd like to do it in Notepad++ if possible.

    The CSV file has two fields, one of which is wrapped in quotes. I'd like to remove any Carriage Returns from within the quoted field. I was trying to use this pattern but can't get it quite right...

    (.*)\"(.*)\n(.*)\"(.*)
    

    Also correct me if I am wrong, but I presume the "replace with" value would be something along the lines of:

    \1\2\3\4
    

    Thanks in advance.

    I'm also open to alternate solutions such as a quick and dirty PERL script.