There must be a better way to replace single newlines only?
Solution 1
You can use awk like this:
$ awk ' /^$/ { print; } /./ { printf("%s ", $0); } ' test
Or if you need an extra newline at the end:
$ awk ' /^$/ { print; } /./ { printf("%s ", $0); } END { print ""; } ' test
Or if you want to separate the paragraphs by a newline:
$ awk ' /^$/ { print "\n"; } /./ { printf("%s ", $0); } END { print ""; } ' test
These awk commands make use of actions that are guarded by patterns: /regex/ or END.
An action is only executed if the pattern before it matches the current line; the END action runs once, after the last line has been read.
The characters ^, $ and . have special meaning in regular expressions: ^ matches the beginning of a line, $ the end, and . an arbitrary character.
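To see the pattern-guarded actions at work, here is a minimal sketch that feeds a small sample through the paragraph-separating variant. The function name join_lines and the sample text are made up for illustration; only the awk program comes from the answer.

```shell
# join_lines is a hypothetical helper name used only for this sketch.
join_lines() {
    awk '
        /^$/ { print "\n" }         # empty line: end the output line and add a blank one
        /./  { printf("%s ", $0) }  # non-empty line: append it plus a space, no newline
        END  { print "" }           # after the last line: finish the final output line
    '
}

# Two paragraphs of two lines each, separated by one empty line.
printf 'one\ntwo\n\nthree\nfour\n' | join_lines
```

Each joined paragraph ends in a trailing space, because the printf action appends one after every input line.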
Solution 2
Use Awk or Perl's paragraph mode to process a file paragraph by paragraph, where paragraphs are separated by blank lines.
awk -vRS= '
  NR!=1 {print ""}          # print blank line before every record but the first
  {                         # do this for every record (i.e. paragraph):
    gsub(" *\n *"," ");     # replace newlines by spaces, compressing spaces
    sub(" *$","");          # remove spaces at the end of the paragraph
    print
  }
'
perl -000 -pe '             # for every paragraph:
  print "\n" unless $.==1;  # print a blank line, except before the first paragraph
  s/ *\n *(?!$)/ /g;        # replace newlines by spaces, compressing spaces, but not at the end of the paragraph
  s/ *\n+\z/\n/             # normalize the last line end of the paragraph
'
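As a rough illustration of what paragraph mode hands to the script, the sketch below numbers each record awk sees when RS is empty. The helper name show_paragraphs and the sample input are assumptions made for this example.

```shell
# show_paragraphs is a hypothetical helper: it labels each paragraph-mode record.
show_paragraphs() {
    awk -v RS= '
        {
            gsub(/ *\n */, " ")   # join the lines of this paragraph with single spaces
            print NR ": " $0      # NR counts paragraphs here, not lines
        }
    '
}

# Two paragraphs separated by two empty lines; extra blank lines do not create empty records.
printf 'a\nb\n\n\nc\nd\n' | show_paragraphs
```

With this input the two paragraphs come out as records 1 and 2, which is why the script above can use NR!=1 to decide where a separating blank line belongs.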
Of course, since this doesn't parse the (La)TeX, it will horribly mutilate comments, verbatim environments and other special-syntax. You may want to look into DeTeX or other (La)TeX-to-text converters.
Solution 3
(reviving an ancient question)
This seems to be exactly what fmt and par are for: paragraph reformatting. Like you (and also like many programs) they define paragraph boundaries as one (or more) blank lines. Try piping your text through one of these.
fmt is a standard unix utility and can be found in GNU Coreutils.
par is a greatly-enhanced fmt written by Adam M. Costello which can be found at http://www.nicemice.net/par/ (it has also been packaged for several distributions, including Debian; I packaged it for Debian in Jan 1996, although there's a new maintainer for the package now).
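A small sketch of the piping suggested above, using fmt only. The wrapper name reflow and the width of 72 are arbitrary choices for this example; lines longer than the chosen width will still be wrapped.

```shell
# reflow is a hypothetical wrapper; 72 is an arbitrary width picked for this sketch.
reflow() {
    fmt -w 72
}

# Short lines within a paragraph are joined; the blank line between paragraphs is kept.
printf 'one\ntwo\n\nthree\nfour\n' | reflow
```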
Solution 4
Sed Solution
$ sed -e ':a;N;$!ba;s/\(.\)\n/\1 /g' -e 's/\n/\n\n/' test.text
Note that in this solution :a creates a label; it is not the a command.
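The label-and-branch loop can be looked at in isolation. This is a minimal sketch, with each sed command passed as its own -e expression so the label name is unambiguous; the helper name slurp_join is made up, and the substitution is simplified to join every line rather than reproduce the full solution.

```shell
# slurp_join is a hypothetical helper that isolates the ":a ... ba" loop.
slurp_join() {
    sed -e ':a' \
        -e 'N' \
        -e '$!ba' \
        -e 's/\n/ /g'
    # :a    defines a label named a
    # N     appends the next input line to the pattern space
    # $!ba  branches back to a unless this is the last line
    # s///  runs once, on the whole input held in the pattern space
}

printf 'one\ntwo\nthree\n' | slurp_join   # prints: one two three
```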
Replacing Multiple Spaces
Use tr:
$ tr -s ' ' <test.text
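For a quick check of what the squeeze option does, here is a sketch that reads standard input instead of a file. The name squeeze_spaces is an assumption for this example.

```shell
# squeeze_spaces is a hypothetical name for the tr call, applied to standard input.
squeeze_spaces() {
    tr -s ' '    # -s collapses each run of repeated spaces into a single space
}

printf 'one   two  three\n' | squeeze_spaces   # prints: one two three
```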
Solution 5
If I've understood correctly, an empty line implies two consecutive newlines, \n\n.
If so, one possible solution would be to eliminate all singular occurrences of newlines.
In Perl, a lookahead assertion is one way to achieve this:
$ perl -0777 -i -pe 's/(?<!\n)\n(?=[^\n])//g' test
- The -0777 flag effectively slurps the whole file into a single string
- -p tells perl to print the string it's working on by default
- -i specifies in-place editing
- Global matching ensures that all single newline occurrences are dealt with
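The following sketch tries the same idea on standard input, so nothing is edited in place. It pairs the lookahead with a lookbehind so that only newlines with text on both sides are touched, and it substitutes a space rather than nothing; the replacement and the helper name unwrap are choices made for this sketch.

```shell
# unwrap is a hypothetical helper; it filters standard input instead of using -i.
unwrap() {
    # (?<!\n)   the newline must not follow another newline
    # (?=[^\n]) the newline must be followed by a non-newline character
    perl -0777 -pe 's/(?<!\n)\n(?=[^\n])/ /g'
}

# The blank line between the two paragraphs and the final newline are left alone.
printf 'one\ntwo\n\nthree\nfour\n' | unwrap
```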
Jon
Updated on September 17, 2022
Comments
-
Jon over 1 year
I may have designed myself into a corner with this problem but I feel like there's a workable known solution to something like this that I'm not seeing. It may well be that I'm completely overcomplicating the problem and skipped over the obvious solution. Any advice would be greatly appreciated.
I have a set of entities defined as interfaces. Each has a concrete implementation and a wrapper implementation. For example:
-- Foo.java
public interface Foo {
    public String getName();
}

-- FooImpl.java
public class FooImpl implements Foo {
    private String name;
    public String getName() { return name; }
}

-- AbstractWrapper.java
public class AbstractWrapper {
    protected String decorate(String text) {
        return "** " + text + " **";
    }
}

-- FooWrapper.java
public class FooWrapper extends AbstractWrapper implements Foo {
    private Foo wrappedFoo;
    public FooWrapper(Foo wrappedFoo) { this.wrappedFoo = wrappedFoo; }
    public String getName() { return decorate(wrappedFoo.getName()); }
}
Now, the part that's complicating the situation is that I'm trying to make a List that automatically wraps the appropriate type with its appropriate wrapper before adding it. Something like:
-- WrappingList.java
/** T is the base entity interface type. */
public class WrappingList<T> implements List<T> {
    private List<T> wrappedList;
    public WrappingList(List<T> wrappedList) { this.wrappedList = wrappedList; }
    ....
    public boolean add(T item) { return wrappedList.add(wrapItem(item)); }
    protected T wrapItem(T item) {
        T wrappedItem = .....;
        return wrappedItem;
    }
}
Is there anything I can do to make a somewhat clean factory method here? Or am I already out of luck at this point because of type erasure?
-
Jon about 15 years
I think this is the answer that was in the back of my head and bugging me that it should be possible. It definitely works here. Thanks!
-
Jon about 15 years
I agree in general -- this was actually a refactoring in the first place because I needed to make certain entities Observable in a subclass.
-
Ed Schwehm over 13 years
This is good, although I'd prefer to keep the empty line between paragraphs. I assume you could do something like this by adding an extra newline somewhere in the first print command? Also, what is /./ doing: it seems to be acting like an else for the /^$/ string match, is that right?
-
Steven D over 13 years
One problem this has is that there are no spaces between the sentences.
-
maxschlepzig over 13 years
@Seamus, sure - just replace the first print (updated the answer) - /./ matches all lines which are at least one character long, i.e. the complement of the /^$/ pattern which matches only empty lines.
-
dmckee --- ex-moderator kitten almost 13 years
The details of what auto-fill-mode does depend on what major mode you have active.
-
David Cary about 11 years
+1 for good comments. I've seen too many programs with no comments at all.
-
Pablo A almost 4 years
fmt works great for short sentences, but hard-wraps long ones and doesn't have a "--width=infinite" option.
-
done almost 4 years
@PabloA Yes, maybe easier to read for new users of awk, but certainly less idiomatic (and quite a bit longer to type). Use whichever you like better. :-)