Replacing multiple blank lines with one blank line using RegEx search and replace

25,856

Solution 1

Replacing

^(\s*\r\n){2,}

With

\r\n

is what I ended up with.

This selects only runs of two or more blank lines and replaces each run with a single blank line.
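
For readers who want to try this outside UltraEdit, here is a minimal sketch in Python's re module. It assumes CRLF line endings and that ^ should match at the start of every line (re.MULTILINE); the function name squeeze_blank_lines_crlf is made up for illustration.

```python
import re

# Solution 1's pattern; re.MULTILINE lets ^ match at the start of every line.
BLANK_RUN_CRLF = re.compile(r"^(\s*\r\n){2,}", re.MULTILINE)


def squeeze_blank_lines_crlf(text):
    """Collapse each run of two or more blank CRLF lines into one blank line."""
    return BLANK_RUN_CRLF.sub("\r\n", text)


sample = "DANCE\r\n \t\r\n\r\n\r\nAdventures in Ballet & Tap\r\n"
print(repr(squeeze_blank_lines_crlf(sample)))
# 'DANCE\r\n\r\nAdventures in Ballet & Tap\r\n'
```

A single blank line between two text lines is left alone, because the group has to repeat at least twice.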

Solution 2

It depends on what the line endings are. Assuming \n, replace this:

([ \t]*\n){3,}

with \n\n.
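
A small sketch of the same replace in Python, assuming LF-only line endings as stated above (the helper name squeeze_blank_lines_lf is hypothetical). Three or more newline-terminated runs means the end of a text line plus at least two blank lines, which is why the replacement is two newlines.

```python
import re

# Solution 2's pattern: optional spaces/tabs followed by LF, three or more times.
BLANK_RUN_LF = re.compile(r"([ \t]*\n){3,}")


def squeeze_blank_lines_lf(text):
    """Reduce two or more blank lines after a text line to exactly one."""
    return BLANK_RUN_LF.sub("\n\n", text)


sample = "Six-week fees: $30 \n\t\n \nAfrican Storytelling\n"
print(repr(squeeze_blank_lines_lf(sample)))
# 'Six-week fees: $30\n\nAfrican Storytelling\n'
```

Note that the match starts at the trailing whitespace of the text line, so that whitespace is removed as well.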

Solution 3

Try this Perl one-liner: perl -00pe0. If you want in-place editing, just add the -i option.

Solution 4

Replacing

\n\s*\n\s* 

with

\n\n

should do the trick
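
To see what this pattern actually selects, here is a quick experiment in Python's re module. It assumes the pattern is meant without the trailing space shown above and that line endings are LF only; apply_solution_4 is a made-up name. subn returns the new text together with the number of replacements made.

```python
import re

# Assumption: the pattern is taken without the trailing space shown in the post.
PATTERN_4 = re.compile(r"\n\s*\n\s*")


def apply_solution_4(text):
    """Return (new_text, number_of_replacements)."""
    return PATTERN_4.subn("\n\n", text)


print(apply_solution_4("a\n\n\n\nb"))  # three blank lines become one
print(apply_solution_4("a\n\nb"))      # a single blank line is matched and put back
print(apply_solution_4("a\n\n  b"))    # leading whitespace on the next line is consumed
```

So under these assumptions the pattern works, but it also touches single blank lines and strips indentation on the line that follows.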

Solution 5

For completeness, I want to reference the long post Remove / delete blank and empty lines in the UltraEdit user forums. At the bottom, after all the explanations for newcomers, it contains the solution for reducing two or more lines that contain nothing (empty lines) or only whitespace (blank lines) to one empty line, independent of the line terminator type.

And some words on what Alan Moore wrote in his answer:

UltraEdit's Perl regular expression support is not crippled by its line-based architecture. Perl regular expression engines have a flag that determines whether a dot matches all characters except the newline characters carriage return (CR) and line feed (LF), or really all characters including CR and LF. This decides whether a Perl regular expression find or replace treats a text file as one large byte stream or as a sequence of lines. By default, UltraEdit sets the flag so that a dot in the search string does not match \r (CR) or \n (LF). This behavior can easily be changed by starting the regular expression string with (?s), which changes the value of the flag match_not_dot_newline, as posted in the UltraEdit user forums in the topic "." in Perl regular expressions doesn't include CRLFs?
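
The effect of (?s) can be illustrated with Python's re module, which accepts the same inline flag. This is only an analogy and not UltraEdit's engine: Python's dot by default excludes only \n (it does match \r), so the sample text below uses a plain line feed.

```python
import re

text = "first line\nsecond line"

# Default: the dot stops at the line feed, so the pattern cannot span lines.
default_match = re.search(r"first.*second", text)

# With (?s) at the start, the dot also matches the line feed.
dotall_match = re.search(r"(?s)first.*second", text)

print(default_match)                 # None
print(repr(dotall_match.group(0)))   # 'first line\nsecond'
```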

A Perl regular expression replace that works for files with

  • carriage return + line feed (DOS/Windows),
  • only line feed (Unix, Mac OS 10.0 and later versions), or
  • only carriage return (Mac OS 9 and previous versions)

as line ending uses the search string \h*(\r?\n|\r)(?:\h*\1){2,} and the replace string \1\1. It handles optional trailing spaces and tabs at the end of a paragraph (one or more lines) followed by two or more lines that are either empty or contain only whitespace (blank lines).

Explanation:

\h* matches any horizontal whitespace character, as defined by Unicode, zero or more times. This first part of the search expression matches horizontal whitespace at the end of a line, such as horizontal tabs, normal spaces, no-break spaces and some other rarely used spaces.

Using \s is not a good idea here because this character class matches any whitespace character, including the vertical whitespace characters carriage return and line feed.

(\r?\n|\r) ... is an OR expression with two alternatives in a marking (capturing) group. The first alternative matches a line feed, optionally preceded by a carriage return, while the second matches just a carriage return. So this expression correctly matches all three common types of line termination. It is important for the rest of the search and for the replace that it always matches either CR+LF (both together), just LF, or just CR.

(?:\h*\1) ... is a non-marking (non-capturing) group that matches zero or more horizontal whitespace characters followed by the same newline as found before, back-referenced with \1, i.e. CR+LF or just LF or just CR. So this part of the expression finds an empty or blank line.

{2,} ... is a multiplier for the preceding non-marking group and means at least two times. So the end of a paragraph must be followed by two or more empty or blank lines. A single empty or blank line below a paragraph is not enough for the search expression to match.

The replace string \1\1 references the first found line break twice.

The advantage of this regular expression compared to the others posted here is that the line ending type need not be known. The search expression finds that out, and the found line ending is referenced in the replace string. Any trailing whitespace at the end of a paragraph and whitespace on the following lines is also removed by this replace if there are two or more empty or blank lines below the paragraph.
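
As a rough illustration of the line-ending independence, here is a sketch in Python's re module. Python's re has no \h, so the sketch substitutes the class [ \t\u00A0], which covers only space, tab and no-break space; that substitution and the function name squeeze_any_eol are assumptions of this example, not part of the UltraEdit expression.

```python
import re

# Stand-in for \h (not supported by Python's re): space, tab, no-break space only.
H = r"[ \t\u00A0]"

# \h*(\r?\n|\r)(?:\h*\1){2,} with the stand-in class in place of \h.
ANY_EOL_RUN = re.compile(H + r"*(\r?\n|\r)(?:" + H + r"*\1){2,}")


def squeeze_any_eol(text):
    """Reduce two or more blank lines to one, reusing the line ending found."""
    return ANY_EOL_RUN.sub(r"\1\1", text)


for eol in ("\r\n", "\n", "\r"):
    sample = "para  " + eol + " " + eol + "\t" + eol + eol + "next" + eol
    print(repr(squeeze_any_eol(sample)))
```

Each of the three samples comes out as the paragraph line, one empty line and the next line, with its original line ending kept.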

{2,} can be replaced by + in the search string if whitespace at the end of a paragraph and on the next empty or blank line should also be trimmed when running this replace. But please note that in this case the replace also makes replacements that change nothing at all when there is no trailing whitespace at the end of a paragraph and the next line is an empty line.
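
The + variant can be tried the same way in Python. As before, [ \t\u00A0] is an assumed stand-in for \h (which Python's re lacks), and trim_and_squeeze is a hypothetical helper. subn also reports how many replacements were made, which makes the no-op replacements visible.

```python
import re

# Stand-in for \h (not supported by Python's re): space, tab, no-break space only.
H = r"[ \t\u00A0]"

# Same expression as before, but with + instead of {2,}.
ANY_EOL_PLUS = re.compile(H + r"*(\r?\n|\r)(?:" + H + r"*\1)+")


def trim_and_squeeze(text):
    """Return (new_text, number_of_replacements)."""
    return ANY_EOL_PLUS.subn(r"\1\1", text)


print(trim_and_squeeze("a  \n \nb"))  # trailing and blank-line whitespace trimmed
print(trim_and_squeeze("a\n\nb"))     # one replacement, text unchanged
print(trim_and_squeeze("a\nb"))       # no blank line, no match
```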

Author by

Art

Updated on April 01, 2020

Comments

  • Art
    Art about 4 years

    I have a file that I need to reformat and remove "extra" blank lines.

    I am using the Perl syntax regular expression search and replace functionality of UltraEdit and need the regular expression to put in the "Find What:" field.

    Here is a sample of the file I need to re-format.

    All current text
    
    REPLACE with all the following:
    
    
    Winter 2011 Class Schedule 
    
    Winter 2011 Class Registration Dates:  Dec. 6, 2010 – Jan. 1, 2011
    Winter 2011 Class Session Dates:  Jan. 5 – Feb. 12, 2011
    
    DANCE
    
    Adventures in Ballet & Tap      
    3 – 6 years Instructor:  Ann Newby
    Tots ages 3 – 6 years old develop a greater sense of rhythm, flexibility and coordination as they explore the basic elements of movement.
    Saturdays   9 - 10 a.m.     Jan. 8 – Feb. 12        Six-week fees:   $30 
    
    
    African Storytelling
    3 – 6 years Instructor:  Ann Newby
    Tots ages 3 – 6 years old explore storytelling and fables through spoken word, music, movement and visual arts experiences.
    Saturdays   10 – 11 a.m.    Jan. 8 – Feb. 12        Six-week fee:   $30
    
    
    African Dance / Children
    

    You'll notice that some of the double blank lines have spaces or tabs or both in them.

    After the search and replace has been run I should have a file that looks like this.

    All current text
    
    REPLACE with all the following:
    
    Winter 2011 Class Schedule 
    
    Winter 2011 Class Registration Dates:  Dec. 6, 2010 – Jan. 1, 2011
    Winter 2011 Class Session Dates:  Jan. 5 – Feb. 12, 2011
    
    DANCE
    
    Adventures in Ballet & Tap      
    3 – 6 years Instructor:  Ann Newby
    Tots ages 3 – 6 years old develop a greater sense of rhythm, flexibility and coordination as they explore the basic elements of movement.
    Saturdays   9 - 10 a.m.     Jan. 8 – Feb. 12        Six-week fees:   $30 
    
    African Storytelling
    3 – 6 years Instructor:  Ann Newby
    Tots ages 3 – 6 years old explore storytelling and fables through spoken word, music, movement and visual arts experiences.
    Saturdays   10 – 11 a.m.    Jan. 8 – Feb. 12        Six-week fee:   $30
    
    African Dance / Children
    
  • user470379
    user470379 over 13 years
    That would replace DANCE\n\nAdventures... with DANCE\nAdventures..., i.e. no blank line in between. I think you either want to look for the beginning of a line (^ usually, not sure for UltraEdit) or else match the expression 3 times or more.
  • Art
    Art over 13 years
    This works, but it started the selection at the end of the line before the blank lines, and the file I am parsing is huge, so every little bit cut from the process helps. Your answer is what led me to my solution, though. Thank you.
  • Art
    Art over 13 years
    This worked technically. But when I ran it line by line, it also selected every single blank line and then put it back.
  • Art
    Art over 13 years
    This almost worked. It would only select 2 blank lines and wouldn't do 3 blanks or more. It also started selecting at the end of the line before the blanks.
  • myhd
    myhd over 11 years
    Found this answer via Google, using the following modified String for TextWrangler/BBEdit: ^(\s*\r){2,} is replaced by \r. Thanks!
  • matt2000
    matt2000 about 10 years
    No need for vim. You can do it on the command line just as easily: cat -s infile.txt > outfile.txt
  • Mofi
    Mofi about 6 years
    This is an answer, but not for this question. This replace just searches for exactly two line feeds and replaces all found occurrences with one line feed. The result for the posted text example is completely different from the wanted output.
  • john c. j.
    john c. j. about 6 years
    It's a fantastic shame this answer has had no upvotes since 2014.
  • Friedrich 'Fred' Clausen
    Friedrich 'Fred' Clausen over 5 years
    I had to use ^(\s*\r?\n){2,}, making the \r optional, I think because my files only had \n.