There must be a better way to replace single newlines only?


Solution 1

You can use awk like this:

$ awk ' /^$/ { print; } /./ { printf("%s ", $0); } ' test

Or if you need an extra newline at the end:

$ awk ' /^$/ { print; } /./ { printf("%s ", $0); } END { print ""; } ' test

Or if you want to separate the paragraphs by a blank line:

$ awk ' /^$/ { print "\n"; } /./ { printf("%s ", $0); } END { print ""; } ' test

These awk commands make use of actions that are guarded by patterns:

/regex/

or

END

The action that follows a /regex/ pattern is executed only if the pattern matches the current line. The END action is executed once, after all input has been read.

The characters ^, $ and . have special meanings in regular expressions: ^ matches the beginning of a line, $ matches the end, and . matches any single character.
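To see the pattern guards in isolation, here is a minimal sketch that labels each line by the pattern that matched it. The sample text and the variable names (sample, labels) are made up for illustration and are not part of the answer above.

```shell
# Hypothetical sample input: two paragraphs separated by one empty line.
sample=$(printf 'first line\nsecond line\n\nthird line\nfourth line\n')

# Label each line by the pattern that matched it, then report a total at END.
labels=$(printf '%s\n' "$sample" | awk '
  /^$/ { print NR ": empty" }       # matches only empty lines
  /./  { print NR ": non-empty" }   # matches lines with at least one character
  END  { print NR " lines read" }   # runs once, after the last line
')
printf '%s\n' "$labels"
```

Every line triggers exactly one of the two regex actions, which is why /./ behaves like an "else" branch for /^$/.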

Solution 2

Use Awk or Perl's paragraph mode to process a file paragraph by paragraph, where paragraphs are separated by blank lines.

awk -v RS= '
  NR!=1 {print ""}      # print blank line before every record but the first
  {                     # do this for every record (i.e. paragraph):
    gsub(" *\n *"," "); # replace newlines by spaces, compressing spaces
    sub(" *$","");      # remove spaces at the end of the paragraph
    print
  }
'

perl -000 -pe '             # for every paragraph:
  print "\n" unless $.==1;  # print a blank line, except before the first paragraph
  s/ *\n *(?!$)/ /g;        # replace newlines by spaces, compressing spaces, but not at the end of the paragraph
  s/ *\n+\z/\n/             # normalize the last line end of the paragraph
'
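As a quick way to check what paragraph mode hands to the script, this sketch prints each record on one numbered line. The input and the variable name paragraphs are assumptions chosen for illustration.

```shell
# With RS set to the empty string, awk reads one paragraph per record;
# runs of blank lines count as a single separator.
paragraphs=$(printf 'a\nb\n\n\nc\nd\n' | awk -v RS= '
  {
    gsub(/\n/, " ")       # join the lines of this paragraph
    print NR ": " $0      # NR is now the paragraph number
  }
')
printf '%s\n' "$paragraphs"
```

The two blank lines in the input still produce only two records, so no empty paragraph shows up between them.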

Of course, since this doesn't parse the (La)TeX, it will horribly mutilate comments, verbatim environments and other special syntax. You may want to look into DeTeX or other (La)TeX-to-text converters.

Solution 3

(reviving an ancient question)

This seems to be exactly what fmt and par are for - paragraph reformatting. Like you (and also like many programs) they define paragraph boundaries as one (or more) blank lines. Try piping your text through one of these.

fmt is a standard Unix utility and can be found in GNU Coreutils.

par is a greatly enhanced fmt written by Adam M. Costello, which can be found at http://www.nicemice.net/par/ (it has also been packaged for several distributions, including Debian; I packaged it for Debian in Jan 1996, although the package has a new maintainer now).
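A minimal sketch of piping text through fmt, assuming short lines that fit within the chosen width. The width of 72 and the sample text are arbitrary choices for illustration; lines longer than the width would be re-wrapped rather than joined.

```shell
# Two paragraphs separated by a blank line; fmt refills each one separately.
wrapped=$(printf 'one two\nthree four\n\nfive six\nseven\n' | fmt -w 72)
printf '%s\n' "$wrapped"
```

Each paragraph comes out on a single line here only because it is shorter than the width passed with -w.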

Solution 4

Sed Solution

$ sed -e ':a;N;$!ba;s/\(.\)\n/\1 /g' -e 's/\n/\n\n/g' test.text

Note that in this solution, :a defines a label; it is not the a (append) command.
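The label-and-branch loop may be easier to follow on its own. The sketch below shows only that part: it reads the whole input into the pattern space and then replaces every newline with a space. The commands are split across several -e options, which should also work with sed implementations that do not accept a semicolon after a label; the sample input and the variable name joined are made up.

```shell
joined=$(printf 'a\nb\nc\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')
#   :a     define a label named "a"
#   N      append the next input line to the pattern space
#   $!ba   if this is not the last line, branch back to label "a"
#   s///g  runs once, when the whole input is in the pattern space
printf '%s\n' "$joined"
```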

Replacing Multiple Spaces

Use tr:

$ tr -s ' ' <test.text
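For a quick check without a file, the same command can be fed from a pipe. The sample string and the variable name squeezed are only for illustration.

```shell
# -s squeezes each run of the listed character (here a space) into one.
squeezed=$(printf 'a   b  c\n' | tr -s ' ')
printf '%s\n' "$squeezed"
```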

Solution 5

If I've understood correctly, an empty line implies two consecutive newlines, \n\n.

If so, one possible solution would be to eliminate every newline that occurs on its own.

In Perl, a lookahead assertion is one way to achieve this:

$ perl -0777 -i -pe 's/\n(?=[^\n])//g' test
  • The -0777 flag effectively slurps the whole file into a single string
  • -p tells perl to print the string it's working on by default
  • -i specifies in-place editing
  • Global matching ensures that all single newline occurrences are dealt with

Author: Jon

Updated on September 17, 2022

Comments

  • Jon
    Jon over 1 year

    I may have designed myself into a corner with this problem but I feel like there's a workable known solution to something like this that I'm not seeing. It may well be that I'm completely overcomplicating the problem and skipped over the obvious solution. Any advice would be greatly appreciated.

    I have a set of entities defined as interfaces. Each has a concrete implementation and a wrapper implementation. For example:

    -- Foo.java
    public interface Foo {
        public String getName();
    }
    
    -- FooImpl.java
    public class FooImpl implements Foo {
        private String name;
    
        public String getName() {
            return name;
        }
    }
    
    -- AbstractWrapper.java
    public class AbstractWrapper {
         protected String decorate(String text) {
             return "** " + text + " **";
         }
    }
    
    -- FooWrapper.java
    public class FooWrapper extends AbstractWrapper implements Foo {
        private Foo wrappedFoo;
    
        public FooWrapper(Foo wrappedFoo) {
             this.wrappedFoo = wrappedFoo;
        }
    
        public String getName() {
             return decorate(wrappedFoo.getName());
        }
    }
    

    Now, the part that's complicating the situation is that I'm trying to make a List that automatically wraps the appropriate type with its appropriate wrapper before adding it. Something like:

    -- WrappingList.java
    
    /** T is the base entity interface type. */
    public class WrappingList<T> implements List<T> {
        private List<T> wrappedList;
    
        public WrappingList(List<T> wrappedList) {
            this.wrappedList = wrappedList;
        }
    
        ....
    
        public boolean add(T item) {
            return wrappedList.add(wrapItem(item));
        }
    
        protected T wrapItem(T item) {
             T wrappedItem = .....;
             return wrappedItem;
        }
    }
    

    Is there anything I can do to make a somewhat clean factory method here? Or am I already out of luck at this point because of type erasure?

  • Jon
    Jon about 15 years
    I think this is the answer that was in the back of my head and bugging me that it should be possible. It definitely works here. Thanks!
  • Jon
    Jon about 15 years
    I agree in general -- this was actually a refactoring in the first place because I needed to make certain entities Observable in a subclass.
  • Ed Schwehm
    Ed Schwehm over 13 years
    This is good, although I'd prefer to keep the empty line between paragraphs. I assume you could do something like this by adding an extra newline somewhere in the first print command? Also, what is /./ doing: it seems to be acting like an else for the /^$/ string match, is that right?
  • Steven D
    Steven D over 13 years
    One problem this has is that there are no spaces between the sentences.
  • maxschlepzig
    maxschlepzig over 13 years
    @Seamus, sure - just replace the first print (updated the answer) - /./ matches all lines which are at least one character long, i.e. the complement of the /^$/ pattern which matches only empty lines.
  • dmckee --- ex-moderator kitten
    dmckee --- ex-moderator kitten almost 13 years
    The details of what auto-fill-mode does depend on what major mode you have active.
  • David Cary
    David Cary about 11 years
    +1 for good comments. I've seen too many programs with no comments at all.
  • Pablo A
    Pablo A almost 4 years
    fmt works great for short sentences, but it hard-wraps long ones and doesn't have a "--width=infinite" option.
  • done
    done almost 4 years
    @PabloA Yes, maybe easier to read to new users of awk, but certainly less idiomatic (and quite longer to type). Use any that you like better. :-)