sed pattern matching
Solution 1
First, these all also seem to work just fine:
sed 's/[ ]*  / /g'
sed 's/  [ ]*/ /g'
sed 's/ *  / /g'
sed 's/  * / /g'
sed 's/   */ /g'
sed 's/ \+/ /g'
sed 's/ \+ / /g'
Basically, all it's doing is matching 2 spaces, plus any number of consecutive spaces. This works because regex quantifiers are greedy by default, so "any number" means as many as it can find. (And [ ] is a "match any of these" bracket expression with only a space character listed. Note that \+ is a GNU sed extension to basic regular expressions.)
The particular syntax used in the question is ideal simply because you're dealing with spaces:
sed 's/ [ ]* / /g'
No two space characters are adjacent, so it is easy to see at a glance that there are 3 spaces, and the pattern is less likely to be mistaken for a typo.
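As a quick check of the greedy matching described above, here is a minimal sketch. The input line is a made-up sample, not real ipcs output, and the variable names are only for illustration.

```shell
# Hypothetical sample line with runs of 3, 6 and 3 spaces, plus one single space.
line='0x0001   32768      root   644 ok'

# Each run of two or more spaces is matched in full (greedy) and replaced by one space.
# The lone space before "ok" does not match, because the pattern needs at least two.
collapsed=$(printf '%s\n' "$line" | sed 's/ [ ]* / /g')

printf '%s\n' "$collapsed"
```

Single spaces pass through untouched, which is why the result still has exactly one space between every field.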
Solution 2
sed 's/ [ ]* / /g'
\_/  | \____/ | |
 |   |    |   | \- g=globally (not just one occurrence)
 |   |    |   |
 |   |    |   \- to
 |   |    |
 |   |    \- from
 |   |
 |   \- s=substitute
 |
 \- program sed
The from part:
/ [ ]* /
| \_/|
|  | \- repeated 0-infinite times
|  |
|  \- group of characters
|
\- boundary
Including the *, there are 3 quantifiers:
* 0 to infinity
? 0 or 1 times
+ 1 to infinity
They normally refer only to the preceding character, so x* matches x, xxxx and nothing; x? matches 0 or 1 x; x+ matches x, xx, xxx and so on. But a quantifier can also apply to a group of characters like [aeiou]+ or to a sequence enclosed in parentheses: (foo)*. The first matches iiaiaei; the second matches foo and foofoo. (In GNU sed's default basic regular expressions, ?, + and the grouping parentheses must be written with a backslash; with sed -E they are used unescaped.)
A group can be an enumeration [aeiou], a from-to range [a-z], or a combination: [0-9a-fA-F:]. If you want to include the minus sign itself in the group, you have to put it at the beginning or the end: [-,:].
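The quantifier and group examples above can be tried out with grep, which accepts the same kind of patterns. This is only a sketch: the test words are arbitrary, and grep -E is used so that +, ? and ( ) need no backslashes.

```shell
# -E: extended regular expressions; -x: the whole line must match; -c: count matching lines.
# [aeiou]+ matches "iiaiaei" but not "xyz".
vowels=$(printf '%s\n' iiaiaei xyz | grep -Exc '[aeiou]+')

# (foo)* matches "foo" and "foofoo" but not "foobar".
foos=$(printf '%s\n' foo foofoo foobar | grep -Exc '(foo)*')

# A combined group of ranges plus a literal colon.
hexish=$(printf '%s\n' 'fe80::1' 'ghij' | grep -Exc '[0-9a-fA-F:]+')

echo "$vowels $foos $hexish"
```

The -x option matters here: without it, (foo)* would also "match" foobar, because zero repetitions of foo can be found in any line.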
The most used command is probably 's' for substitute. Others are 'd' for delete and 'p' for print.
Patterns are encapsulated between delimiters, normally slash.
sed 's/foo/bar/'
Sed is line-oriented. If you want to replace only one foo (the first on each line) with bar, the above command is fine. To replace all of them, you need 'g' for globally.
sed 's/foo/bar/g'
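The difference between the two commands is easiest to see side by side. A small sketch with a made-up input line follows; the last command also shows a different delimiter, which is handy when the pattern itself contains slashes.

```shell
# Without g, only the first match on each line is replaced.
first=$(printf 'foo foo foo\n' | sed 's/foo/bar/')

# With g, every match on the line is replaced.
all=$(printf 'foo foo foo\n' | sed 's/foo/bar/g')

# Any character can follow s as the delimiter; here | avoids escaping the slashes.
path=$(printf '/usr/bin\n' | sed 's|/usr|/opt|')

printf '%s\n' "$first" "$all" "$path"
```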
Other ways to work with sed invoke line numbers:
sed -n '1,5p' file
-n suppresses the default printing of every line; 1,5p means: print lines 1 to 5.
sed '6,$d' file
This is equivalent: it deletes everything from line 6 to the end.
sed '5q' file
This again gives the same result: quit after line 5.
It is typical for sed that commands are easier to write than to read.
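To confirm that the three line-number commands really are interchangeable, they can be run on the same input and compared. The seven-line input here is invented for the sketch, and it is piped in rather than read from a file.

```shell
# Hypothetical seven-line input.
input=$(printf '%s\n' one two three four five six seven)

a=$(printf '%s\n' "$input" | sed -n '1,5p')   # print only lines 1 to 5
b=$(printf '%s\n' "$input" | sed '6,$d')      # delete line 6 through the last line
c=$(printf '%s\n' "$input" | sed '5q')        # print as usual, then quit after line 5

if [ "$a" = "$b" ] && [ "$b" = "$c" ]; then
    echo "all three commands print the same five lines"
fi
```

On a large file the 5q form has a practical advantage: sed stops reading after line 5 instead of scanning the rest of the input.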
Solution 3
The best sed instruction ever.
sed 's/ [ ]* / /g'
will replace every sequence of two or more spaces with a single space, so all words end up space-delimited.
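Once the fields are separated by single spaces, standard tools can pick columns by position. The following is a sketch under assumptions: the two-line input is invented and does not reproduce real ipcs -qa output, and cut is just one possible way to consume the result.

```shell
# Hypothetical column-aligned input; real ipcs -qa output will differ.
sample=$(printf '%s\n' 'key      msqid   owner' '0x01     7       root')

# Collapse runs of two or more spaces.
spaced=$(printf '%s\n' "$sample" | sed 's/ [ ]* / /g')

# The interval form \{2,\} says "two or more" explicitly and gives the same result.
interval=$(printf '%s\n' "$sample" | sed 's/ \{2,\}/ /g')

# With single-space delimiters, cut can select the third column.
owners=$(printf '%s\n' "$spaced" | cut -d ' ' -f 3)

printf '%s\n' "$spaced" "$owners"
```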
paulrehkugler
Updated on September 18, 2022
Comments
-
paulrehkugler over 1 year
I recently asked someone at work about how to take the output of ipcs -qa and make it space delimited, so I can parse it/store it in the database for monitoring. He gave me this:
ipcs -qa | sed 's/ [ ]* / /g'
It works, but why? How did he construct that pattern string? Where can I find documentation on how to construct them? I checked the man page, but it's pretty opaque.
-
Peter.O about 12 years
It may be pretty opaque because it isn't a good way to reduce multiple spaces to a single space. It has unnecessary stuff in there. It only needs
's/ \+/ /g'
-
-
paulrehkugler about 12 years
Thanks for breaking this down into bite-size pieces - really helpful!
-
manatwork about 12 years
Just for completeness,
sed 's/ \{2,\}/ /g'
also does the same thing.