sed pattern matching

Solution 1

First, these all also seem to work just fine:

sed 's/[ ]*  / /g'
sed 's/  [ ]*/ /g'
sed 's/ *  / /g'
sed 's/  * / /g'
sed 's/   */ /g'
sed 's/  \+/ /g'
sed 's/ \+ / /g'

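Basically, each of these matches two spaces plus any number of further consecutive spaces. This works because regex quantifiers are greedy by default, so "any number" means as many as can be found. ([ ] is a bracket expression, meaning "match any one of these characters", with only a space listed. Note that \+ is a GNU sed extension and may not work in other sed implementations.)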
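One way to check this is to run the variants over the same line and compare the results. The following is only a sketch: the sample input and the squeeze_all helper are made up for illustration, and only the variants that avoid the GNU-specific \+ are included.

```shell
# Hypothetical sample line: one single space, then runs of 2, 3 and 5 spaces.
input='a b  c   d     e'

# squeeze_all is an illustrative helper: it applies each expression
# to the same input and prints one result line per expression.
squeeze_all() {
    for expr in 's/[ ]*  / /g' 's/  [ ]*/ /g' 's/ *  / /g' \
                's/  * / /g' 's/   */ /g' 's/ [ ]* / /g'; do
        printf '%s\n' "$input" | sed "$expr"
    done
}

squeeze_all
```

Every line of output should be identical, and the single space between a and b is left alone because each pattern needs at least two spaces to match.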

The particular syntax used in the question is ideal simply because you're dealing with spaces:

sed 's/ [ ]* / /g'

No two space characters are adjacent, so it is easy to see at a glance that there are 3 spaces, and the pattern is less likely to be mistaken for a typo.

Solution 2

sed 's/ [ ]* / /g'
\_/  | \____/ | |
 |   |    |   | \- g=globally (not just one occurrence)
 |   |    |   |
 |   |    |   \- to
 |   |    |
 |   |    \- from
 |   |
 |   \- s=substitute
 |
 \- program sed

The from part:

/ [ ]* /
| \_/|
|  | \- repeated 0-infinite times
|  |
|  \- group of characters
|
\- delimiter

Including the *, there are 3 quantifiers:

  • * 0 to infinity
  • ? 0 or 1 times
  • + 1 to infinity

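They normally apply only to the preceding character, so x* matches x, xxxx and the empty string; x? matches zero or one x; and x+ matches x, xx, xxx and so on. A quantifier can also apply to a group of characters such as [aeiou]+, or to a sequence enclosed in parentheses: (foo)*. The first matches iiaiaei, the second foo and foofoo. Note that sed uses basic regular expressions by default, where ?, + and grouping parentheses have to be written as \?, \+ and \(...\) (the first two being GNU extensions), or you can switch to the extended syntax with sed -E.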
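The quantifiers can be tried out directly. This is a small sketch, assuming a sed that supports the -E option for extended regular expressions (GNU and BSD sed both do); wrap_match is a hypothetical helper, not a standard tool, and & in the replacement stands for the matched text.

```shell
# wrap_match (illustrative helper): wrap the first match of the
# extended regex $1 in <...> within the text $2.
wrap_match() {
    printf '%s\n' "$2" | sed -E "s/$1/<&>/"
}

wrap_match 'x*' 'xxxx'            # x* takes the whole run of x
wrap_match 'x*' 'abc'             # x* also matches nothing: an empty match at the start
wrap_match 'x?' 'xxxx'            # x? takes at most one x
wrap_match 'x+' 'xxxx'            # x+ needs at least one x
wrap_match '[aeiou]+' 'iiaiaei'   # quantifier applied to a bracket group
wrap_match '(foo)*' 'foofoo bar'  # quantifier applied to a parenthesised sequence
```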

A group can be an enumeration [aeiou], a from-to range [a-z], or a combination of both: [0-9a-fA-F:]. If you want to include a literal minus in the group, you have to put it at the beginning or the end: [-,:].
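A short sketch of the three kinds of group in action; the input strings and variable names are invented for the example.

```shell
# Enumeration: replace every vowel with an underscore.
vowels=$(printf 'sed pattern\n' | sed 's/[aeiou]/_/g')

# Ranges plus a single character: delete hex digits and colons.
hex=$(printf '12:ab-XY\n' | sed 's/[0-9a-fA-F:]//g')

# Literal minus listed first, so it is not read as a range.
seps=$(printf 'a-b,c:d\n' | sed 's/[-,:]/ /g')

printf '%s\n' "$vowels" "$hex" "$seps"
```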

The most used command is probably 's' for substitute. Others are 'd' for delete and 'p' for print.

Patterns are encapsulated between delimiters, normally slash.

 sed 's/foo/bar/' 

Sed works line by line. If you want to replace only one foo (the first on each line) with bar, the above command is fine. To replace all of them, you need 'g' for globally.

 sed 's/foo/bar/g' 
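The difference is easiest to see on input with several matches per line. A minimal sketch, with made-up sample text and variable names:

```shell
# Two lines, each containing foo twice.
text=$(printf 'foo foo\nfoo foo\n')

# Without g: only the first foo on each line is replaced.
first_only=$(printf '%s\n' "$text" | sed 's/foo/bar/')

# With g: every foo on each line is replaced.
global=$(printf '%s\n' "$text" | sed 's/foo/bar/g')

printf '%s\n' "$first_only"
printf '%s\n' "$global"
```

Because sed processes one line at a time, the version without g still changes both lines, but only one foo in each.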

Other ways to work with sed invoke line numbers:

 sed -n '1,5p' file 

-n suppresses the default printing of every line; 1,5p means: print lines 1 to 5.

 sed '6,$d' file 

This is equivalent. It deletes everything from line 6 to the end, so only lines 1 to 5 are printed.

 sed '5q' file

is again the same: sed quits after printing line 5.
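The equivalence of the three commands can be checked on a small input. This sketch feeds made-up sample lines through a pipe instead of reading a file; the variable names are illustrative.

```shell
# Eight sample lines, one word per line.
sample=$(printf '%s\n' one two three four five six seven eight)

by_print=$(printf '%s\n' "$sample" | sed -n '1,5p')   # print lines 1 to 5 only
by_delete=$(printf '%s\n' "$sample" | sed '6,$d')     # delete line 6 to the end
by_quit=$(printf '%s\n' "$sample" | sed '5q')         # quit after line 5

if [ "$by_print" = "$by_delete" ] && [ "$by_delete" = "$by_quit" ]; then
    echo "all three commands print the same five lines"
fi
```

On large inputs the 5q form has a practical advantage: sed stops reading as soon as line 5 has been printed, whereas the other two still read to the end of the input.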

Typical of sed is that commands are easier to write than to read.

Solution 3

The best sed instruction ever.

sed 's/ [ ]* / /g'

will replace every sequence of two or more spaces with a single space, so all words end up delimited by exactly one space.
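To illustrate why single-space delimiting is useful, here is a sketch on a made-up, column-aligned table (this is invented sample text, not real output of any command). Once the runs of spaces are squeezed, a field-based tool such as cut can pick out a column.

```shell
# Hypothetical column-aligned text, padded with spaces.
table='key      owner    bytes
0x01     alice    128
0x02     bob      4096'

# Squeeze each run of two or more spaces down to one.
squeezed=$(printf '%s\n' "$table" | sed 's/ [ ]* / /g')
printf '%s\n' "$squeezed"

# With exactly one space between fields, cut can select the second column.
owners=$(printf '%s\n' "$squeezed" | cut -d' ' -f2)
printf '%s\n' "$owners"
```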

Author: paulrehkugler

Updated on September 18, 2022

Comments

  • paulrehkugler
    paulrehkugler over 1 year

    I recently asked someone at work about how to take the output of ipcs -qa and make it space delimited, so I can parse it/store it in the database for monitoring. He gave me this:

    ipcs -qa | sed 's/ [ ]* / /g'
    

    It works, but why? How did he construct that pattern string? Where can I find documentation on how to construct them? I checked the man page, but it's pretty opaque.

    • Peter.O
      Peter.O about 12 years
      It may be pretty opaque because it isn't a good way to reduce multiple spaces to a single space. It has unnecessary stuff in there. It only needs 's/ \+/ /g'
  • paulrehkugler
    paulrehkugler about 12 years
    Thanks for breaking this down into bite-size pieces - really helpful!
  • manatwork
    manatwork about 12 years
    Just for completeness, sed 's/ \{2,\}/ /g' also does the same thing.