Replacing multiple blank lines with one blank line using RegEx search and replace
Solution 1
Replacing
^(\s*\r\n){2,}
with
\r\n
is what I ended up with.
This matches only runs of two or more blank lines and replaces each run with a single blank line.
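A minimal Python sketch of the same replace, for trying the pattern outside UltraEdit. It assumes Python's re module behaves like UltraEdit's Perl engine for this expression; the function name and the sample string are made up for illustration. The second pattern is the \r? variant mentioned in the comments.

```python
import re

# Solution 1's pattern. re.MULTILINE makes ^ match at the start of every line,
# which is assumed to correspond to UltraEdit's line-start behavior.
BLANK_RUN_CRLF = re.compile(r"^(\s*\r\n){2,}", re.MULTILINE)

# Variant from the comments: \r is optional, so LF-only files match as well.
BLANK_RUN_ANY = re.compile(r"^(\s*\r?\n){2,}", re.MULTILINE)


def collapse_blank_lines(text: str, newline: str = "\r\n") -> str:
    """Replace each run of two or more blank lines with one empty line."""
    pattern = BLANK_RUN_CRLF if newline == "\r\n" else BLANK_RUN_ANY
    return pattern.sub(newline, text)


# Hypothetical sample: three blank lines (one with a space and a tab),
# then a single blank line that should be left alone.
sample = "DANCE\r\n\r\n \t\r\n\r\nAdventures\r\n\r\nInstructor\r\n"
print(repr(collapse_blank_lines(sample)))
```

Because the match starts at the beginning of the first blank line, the line end of the preceding text line is never touched.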
Solution 2
It depends what the line endings are. Assuming \n, replace this:
([ \t]*\n){3,}
with this:
\n\n
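The same replace as a short Python sketch, assuming LF line endings as stated above. The function name and the sample text are illustrative only.

```python
import re

# Solution 2's pattern: three or more line ends, each optionally preceded by
# spaces or tabs. The first line end in a match belongs to the text line above
# the blank lines, which is why the replacement is two line feeds, not one.
THREE_OR_MORE_LINE_ENDS = re.compile(r"([ \t]*\n){3,}")


def squeeze_blank_lines_lf(text: str) -> str:
    """Reduce two or more blank lines to one in LF-terminated text."""
    return THREE_OR_MORE_LINE_ENDS.sub("\n\n", text)


# Hypothetical sample: two blank lines after "a" (one holding a space and a
# tab), and a single blank line after "b" that must stay as it is.
sample = "a  \n \t\n\nb\n\nc\n"
print(repr(squeeze_blank_lines_lf(sample)))
```

Note that the match begins at the end of the line before the blank lines, so trailing spaces and tabs on that line are removed as a side effect.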
Solution 3
Try this Perl one-liner:
perl -00pe0
If you want in-place editing, just add the -i option.
Solution 4
Replacing
\n\s*\n\s*
with
\n\n
should do the trick
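A quick way to check what this expression matches is a small Python sketch, assuming \s in Python's re module behaves like \s in the editor. The function name and the test strings are invented for illustration.

```python
import re

# Solution 4's pattern: a line feed, any whitespace, another line feed,
# then any whitespace again.
PATTERN = re.compile(r"\n\s*\n\s*")


def squeeze(text: str) -> tuple[str, int]:
    """Return the replaced text and the number of replacements made."""
    return PATTERN.subn("\n\n", text)


print(squeeze("a\n\n\n\nb"))   # three blank lines between a and b
print(squeeze("a\n\nb"))       # a single blank line
print(squeeze("a\n\n    b"))   # a single blank line, next line indented
```

Two side effects are visible in this sketch: a single blank line is also matched and written back unchanged, and the trailing \s* swallows the indentation of the line that follows the blank lines.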
Solution 5
For completeness, I want to reference the long post "Remove / delete blank and empty lines" in the UltraEdit user forums. After all the explanations for beginners, it gives at the bottom the solution for reducing two or more empty lines (containing nothing) or blank lines (containing only whitespace) to one empty line, independent of the line terminator type.
And some words on what Alan Moore wrote in his answer:
UltraEdit's Perl regular expression support is not crippled by its line-based architecture. Perl regular expression engines have a flag that determines whether a dot matches every character except the newline characters carriage return (CR) and line feed (LF), or really every character including CR and LF. This decides whether a Perl regular expression find/replace interprets a text file as one large byte stream or as a sequence of lines. In UltraEdit the flag is set by default so that a dot in the search string does not match \r (CR) and \n (LF). This behavior can easily be changed by starting the regular expression string with (?s), which changes the value of the flag match_not_dot_newline, as posted in the UltraEdit user forums in the topic '"." in Perl regular expressions doesn't include CRLFs?'.
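The effect of (?s) can be seen in a small Python sketch. This uses Python's re module, not UltraEdit's engine, so it only illustrates the flag: without (?s), Python's dot excludes just \n, whereas UltraEdit's dot excludes both \r and \n as described above. The sample text is made up and uses LF only to sidestep that difference.

```python
import re

# Hypothetical two-line sample with an LF line ending.
text = "first line\nsecond line"

# Without the flag, the dot stops at the line feed.
line_mode = re.search(r"first.*", text).group(0)

# With (?s) at the start of the expression, the dot also matches the line
# feed, so the text is treated as one stream of characters.
stream_mode = re.search(r"(?s)first.*", text).group(0)

print(repr(line_mode))
print(repr(stream_mode))
```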
A Perl regular expression replace that works for files with
- carriage return + line feed (DOS/Windows),
- only line feed (Unix, Mac OS X 10.0 and later versions), or
- only carriage return (Mac OS 9 and previous versions)
as line ending can be done with the search string \h*(\r?\n|\r)(?:\h*\1){2,} and \1\1 as the replace string. It handles optional trailing spaces and tabs at the end of a paragraph (one or more lines) followed by two or more empty lines (containing nothing) or blank lines (containing only whitespace).
Explanation:
\h*
... matches zero or more horizontal whitespace characters as defined by Unicode. This first part of the search expression matches horizontal whitespace at the end of a line: horizontal tabs, normal spaces, no-break spaces and some other rarely used spaces.
\s is not suitable here because that character class matches any whitespace character, including the vertical whitespace characters carriage return and line feed.
(\r?\n|\r)
... is an OR expression with two arguments in a marking group. The first argument matches a line feed, optionally preceded by a carriage return, while the second argument matches just a carriage return. So this expression correctly matches all three common types of line termination. It is important for the rest of the search and for the replace that it always matches either CR+LF (both together), just LF, or just CR.
(?:\h*\1)
... is a non-marking group that matches zero or more horizontal whitespace characters followed by the same newline as found before, back-referenced with \1, i.e. CR+LF, just LF, or just CR. So this part of the expression finds an empty or blank line.
{2,}
... is a multiplier for the preceding non-marking group and means at least two times. So the end of a paragraph must be followed by two or more empty or blank lines. A single empty or blank line below a paragraph is not enough for the search expression to match.
The replace string \1\1 references the first found line break twice.
The advantage of this regular expression over the others posted here is that the line ending type need not be known. The search expression finds it out, and the found line ending is referenced in the replace string. Any trailing whitespace at the end of a paragraph and any whitespace on the following lines is also removed by this replace if there are two or more empty or blank lines below a paragraph.
{2,} can be replaced by + in the search string if the replace should also trim whitespace at the end of a paragraph and on a single following empty or blank line. But please note that in this case the replace also makes replacements that change nothing at all, namely wherever there is no trailing whitespace at the end of a paragraph and the next line is an empty line.
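The search and replace strings above can be tried in a Python sketch, with one substitution that is an assumption of this sketch and not part of the original answer: Python's re module does not support \h, so [ \t] (space and tab only) stands in for it. The names below are invented for illustration.

```python
import re

# Assumption: [ \t]* stands in for \h*, which Python's re module rejects.
# It covers spaces and tabs only, not the other Unicode horizontal spaces.
H = r"[ \t]*"

# \h*(\r?\n|\r)(?:\h*\1){2,} from the answer, with the substitution above.
SEARCH = re.compile(H + r"(\r?\n|\r)(?:" + H + r"\1){2,}")

# Variant with + instead of {2,}, which also trims around a single blank line.
SEARCH_TRIM = re.compile(H + r"(\r?\n|\r)(?:" + H + r"\1)+")


def squeeze_blank_lines(text: str, pattern: re.Pattern = SEARCH) -> str:
    """Reduce runs of blank lines to one, reusing the line ending that was found."""
    return pattern.sub(r"\1\1", text)


# Hypothetical samples, one per line ending type.
for sample in ("a \r\n\r\n \t\r\nb", "a\n\n\n\nb", "a\r\r\rb"):
    print(repr(squeeze_blank_lines(sample)))
```

The back-reference \1 in both the search and the replace is what keeps the file's own line ending, so the same function works for CR+LF, LF and CR files without any configuration.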
Art
Updated on April 01, 2020
Comments
-
Art about 4 years
I have a file that I need to reformat and remove "extra" blank lines.
I am using the Perl syntax regular expression search and replace functionality of UltraEdit and need the regular expression to put in the "Find What:" field.
Here is a sample of the file I need to re-format.
All current text REPLACE with all the following: Winter 2011 Class Schedule Winter 2011 Class Registration Dates: Dec. 6, 2010 – Jan. 1, 2011 Winter 2011 Class Session Dates: Jan. 5 – Feb. 12, 2011 DANCE Adventures in Ballet & Tap 3 – 6 years Instructor: Ann Newby Tots ages 3 – 6 years old develop a greater sense of rhythm, flexibility and coordination as they explore the basic elements of movement. Saturdays 9 - 10 a.m. Jan. 8 – Feb. 12 Six-week fees: $30 African Storytelling 3 – 6 years Instructor: Ann Newby Tots ages 3 – 6 years old explore storytelling and fables through spoken word, music, movement and visual arts experiences. Saturdays 10 – 11 a.m. Jan. 8 – Feb. 12 Six-week fee: $30 African Dance / Children
You'll notice that some of the double blank lines have spaces or tabs or both in them.
After the search and replace has been run I should have a file that looks like this.
All current text REPLACE with all the following: Winter 2011 Class Schedule Winter 2011 Class Registration Dates: Dec. 6, 2010 – Jan. 1, 2011 Winter 2011 Class Session Dates: Jan. 5 – Feb. 12, 2011 DANCE Adventures in Ballet & Tap 3 – 6 years Instructor: Ann Newby Tots ages 3 – 6 years old develop a greater sense of rhythm, flexibility and coordination as they explore the basic elements of movement. Saturdays 9 - 10 a.m. Jan. 8 – Feb. 12 Six-week fees: $30 African Storytelling 3 – 6 years Instructor: Ann Newby Tots ages 3 – 6 years old explore storytelling and fables through spoken word, music, movement and visual arts experiences. Saturdays 10 – 11 a.m. Jan. 8 – Feb. 12 Six-week fee: $30 African Dance / Children
-
user470379 over 13 years
That would replace DANCE\n\nAdventures... with DANCE\nAdventures..., i.e. no blank line in between. I think you either want to look for the beginning of the line (^ usually, not sure for UltraEdit) or else match the expression 3 times or more.
-
Art over 13 years
This works, but the selection started at the end of the line before the blank lines, and the file I am parsing is huge, so every little bit cut from the process helps. Your answer is what led me to my solution, though. Thank you.
-
Art over 13 years
This worked technically. But when I ran it line by line, it also selected every single blank line and then put it back.
-
Art over 13 years
This almost worked. It would only select 2 blank lines and wouldn't do 3 blanks or more. It also started selecting at the end of the line before the blanks.
-
myhd over 11 years
Found this answer via Google. I am using the following modified string for TextWrangler/BBEdit: ^(\s*\r){2,} is replaced by \r. Thanks!
-
matt2000 about 10 years
No need for vim. You can do it on the command line just as easily:
cat -s infile.txt > outfile.txt
-
Mofi about 6 years
This is an answer, but not for this question. This replace just searches for exactly two line feeds and replaces all found occurrences with one line feed. The result for the posted example text is completely different from the wanted output.
-
john c. j. about 6 years
It's a fantastic shame this answer has had no upvotes since 2014.
-
Friedrich 'Fred' Clausen over 5 years
I had to use ^(\s*\r?\n){2,} to make the \r optional, I think because my files only had \n.