How to remove all CRLF in file (not replace with LF)

files text-processing sed newlines

7,702

Solution 1

sed ":a;/\r$/{N;s/\r\n//;b a}"

This matches every line that has '\r' at the end (that is, a '\r' followed by the '\n' line terminator). On such a line, sed first appends the next line of input to the pattern space (putting a '\n' separator in between), then replaces the resulting "\r\n" with an empty string, and then branches back to the beginning to check whether the new contents of the pattern space happen to match again.
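To see the loop in action, here is a small sketch that runs the command above on a throwaway sample. It assumes GNU sed (for the `\r` and `\n` escapes and the one-line script syntax); the function name `strip_crlf_pairs` and the sample text are made up for illustration.

```shell
# Hypothetical wrapper around the sed command from this answer (GNU sed assumed).
strip_crlf_pairs() {
  sed ":a;/\r$/{N;s/\r\n//;b a}"
}

# Two CRLF-terminated lines followed by two LF-terminated ones.
printf 'one\r\ntwo\r\nthree\nfour\n' | strip_crlf_pairs
# prints:
#   onetwothree
#   four
```

The first three lines collapse into one because each CRLF is deleted outright; the plain LF after "three" is left alone.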

Following the comment: if you want to strip any remaining bare '\r' from the file as well, add a second substitution after the loop that strips the CRLF combos:

sed ':a;/\r$/{$!N;s/\r\n//;t a};s/\r//g'
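A quick check of the extended command, again only as a sketch assuming GNU sed. The script is single-quoted here so the shell cannot expand `$!` before sed sees it; `strip_all_cr` and the sample input are hypothetical.

```shell
# Hypothetical wrapper around the extended command (GNU sed assumed).
strip_all_cr() {
  sed ':a;/\r$/{$!N;s/\r\n//;t a};s/\r//g'
}

# One CRLF, one bare CR in the middle of a line, and a CRLF on the last line.
printf 'a\r\nb\rc\nd\r\n' | strip_all_cr
# prints:
#   abc
#   d
```

On the last line there is no following line to join, so only the '\r' is removed and sed still prints the final newline.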

Solution 2

I tend to reach for perl one-liners when doing anything that involves manipulating line endings:

perl -pe 'BEGIN {undef $/} s/\r\n//g' *.txt

The key to making this work is the undef $/, which makes Perl read each file as a single string, so the search-and-replace can see the line endings. To strip bare \r as well, just tweak the regex:

perl -pe 'BEGIN {undef $/} s/\r\n?//g' *.txt

user779159

Updated on September 18, 2022

Comments

user779159 over 1 year

I'd like to remove all carriage returns followed by line feeds (CRLF), such as \r\n in a file. How can I do that? I can't use dos2unix because that replaces CRLF with LF. And I can't use tr because that will also replace any \n that aren't preceded by \r. How can I do this?
user779159 over 9 years

I tried sed -i 's/\r\n//g' file, which didn't work.
user779159 over 9 years

Cool! Btw is there a way to modify that command to strip all occurrences of \r in addition to \r\n from the file? (Rather than having to run a second command to get rid of \r using something like tr.)
user779159 over 9 years

Thanks peterph, it works great. (And mikeserv for an edit.) Is putting 2 sed commands in the same invocation, separated by a semicolon, more efficient than running them as 2 separate commands? Meaning it only has to scan the file once and just runs both commands on each line?
Avinash Raj over 9 years

perl -pe 'BEGIN {undef $/} s/\r(?=\n)//g' *.txt
peterph over 9 years

Yes. Plus you save a couple of milliseconds on creating a new process.
peterph over 9 years

@mikeserv thanks for the edit. However, the branch command needed modifying as well: the unconditional one I had there was causing an endless loop on the last line.
mikeserv over 9 years

Maybe like sed -e :n -e '/^M$/{$s///p;N' -e '};s/.\n//;tn' The problem, though, is that sed isn't designed for unlimited line length. If you're relying on GNU extensions, -z is probably easiest: sed -z 's/\r\n//g'. But then you're working with long pattern spaces. At least that way you can clear it without printing a newline, so once per edit. And sed probably doesn't save much here; I'm willing to bet an additional tr would actually save processing time.
zwol over 9 years

@AvinashRaj That is the same as dos2unix, which is specifically not what the OP wanted.