How to match whitespace in sed?


Solution 1

The character class \s matches whitespace characters such as <tab> and <space> (it also matches <newline> when one is present in the pattern space).

For example:

$ sed -e "s/\s\{3,\}/  /g" inputFile

will substitute every sequence of at least 3 whitespaces with two spaces.
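A quick way to check the command above on sample input instead of a file. This sketch assumes GNU sed, since \s is not recognized by every sed; the input line is made up for illustration.

```shell
# Requires GNU sed: \s is a GNU regex extension.
# Sample input: a run of 3 spaces, a run of 3 tabs, and a single space.
printf 'one   two\t\t\tthree four\n' | sed -e 's/\s\{3,\}/  /g'
# Expected: one  two  three four
# (the single space between "three" and "four" is left alone)
```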


REMARK: For POSIX compliance, use the character class [[:space:]] instead of \s, since the latter is a GNU sed extension. See the POSIX specifications for sed and BREs.
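A sketch of the portable form of the same substitution, using only POSIX BRE features ([[:space:]] and the \{3,\} interval). It also contrasts [[:space:]] with [[:blank:]], as mentioned in the comments: [[:space:]] matches a newline in the pattern space, while [[:blank:]] matches only <space> and <tab>. The sample inputs are illustrative.

```shell
# Portable equivalent of s/\s\{3,\}/  /g
printf 'one   two\t\t\tthree four\n' | sed 's/[[:space:]]\{3,\}/  /g'
# Expected: one  two  three four

# N joins the two input lines, so the pattern space is "a<newline>b".
printf 'a\nb\n' | sed 'N;s/[[:space:]]/_/'   # newline matches: a_b
printf 'a\nb\n' | sed 'N;s/[[:blank:]]/_/'   # no match: prints a and b unchanged
```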

Solution 2

This works on MacOS 10.8:

sed -E "s/[[:space:]]+/ /g"
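The -E flag is what makes a bare + mean "one or more". Where -E is unavailable (see the comments about older GNU sed), a basic regex can express the same thing by writing the class twice and starring the second copy. A minimal sketch with illustrative input:

```shell
# Basic regex, no -E needed: "one or more" written as X X*
printf 'a  b\t\tc\n' | sed 's/[[:space:]][[:space:]]*/ /g'
# Extended regex with -E (BSD sed, and GNU sed versions that accept -E)
printf 'a  b\t\tc\n' | sed -E 's/[[:space:]]+/ /g'
# Both are expected to print: a b c
```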

Solution 3

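sed 's/[[:blank:]][[:blank:]]*/"space or tab"/g'

([[:blank:]] matches only <space> and <tab> on any POSIX sed, whereas \t inside brackets is not portable; writing the class twice requires at least one character, and g replaces every run on the line.)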
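A bare * after a bracket expression can match the empty string, so a pattern like s/[...]*/.../ "matches" the empty string before the first character of the line. A small sketch contrasting that with a pattern that requires at least one blank (sample input is illustrative):

```shell
# Star alone: the first match is the empty string at the start of the line.
printf 'a b\n' | sed 's/[[:blank:]]*/_/'
# Expected: _a b

# At least one blank, replaced on every run thanks to g.
printf 'a  b\tc\n' | sed 's/[[:blank:]][[:blank:]]*/_/g'
# Expected: a_b_c
```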

Solution 4

Some older versions of sed may not recognize \s as a whitespace-matching token. In that case you can match a sequence of one or more spaces and tabs with '[XZ][XZ]*', where X is a literal space and Z is a literal tab.
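Typing a literal tab into a command line is awkward, so one option is to build the bracket expression in a shell variable with printf. This sketch applies the answer's approach to the original question (3 or more spaces/tabs replaced by two spaces); the names tab, ws, and collapse are illustrative, not part of any tool.

```shell
#!/bin/sh
# Bracket expression with a literal space and a literal tab,
# for seds that understand neither \s nor [[:space:]].
tab=$(printf '\t')
ws="[ $tab]"

# Hypothetical helper: collapse every run of 3+ spaces/tabs into two spaces.
collapse() {
  sed "s/$ws$ws$ws$ws*/  /g"
}

# Space, tab, space is a run of 3 and is collapsed; the lone tab is kept.
printf 'a \t b\tc\n' | collapse
```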


Author: Maksim Kondratyuk

Currently working as a doctoral student in the Speech Group of the Department of Signal Processing and Acoustics of the Aalto University School of Electrical Engineering (formerly TKK / Helsinki University of Technology) in Helsinki, Finland.

Updated on September 17, 2022

Comments

  • Maksim Kondratyuk
    Maksim Kondratyuk over 1 year

    How can I match whitespace in sed? In my data I want to match every run of 3 or more consecutive whitespace characters (tab or space) and replace it with 2 spaces. How can this be done?

  • Marnix A. van Ammers
    Marnix A. van Ammers about 14 years
    So for the particular need here, with an older sed, you could do: $ sed 's/[XZ][XZ][XZ][XZ]*/  /g' inputfile where X is a tab and Z is a space.
  • DeboraThaise
    DeboraThaise over 12 years
    aha! It was the missing -e switch that got me.
  • HUB
    HUB about 12 years
    I also had to add the '-r' switch, which enables extended regexes, to make sed recognize '\s' as a space.
  • Jared Beck
    Jared Beck almost 11 years
    With Apple's sed I had to use [[:space:]] because \s did not work for me. Perhaps \s is a GNU sed extension?
  • Karthik T
    Karthik T over 10 years
    @JaredBeck Thanks, I was running out of ideas why my simple regex wasn't working. This is lame; I thought \s was standard extended regex. Also, -r doesn't work and -E did squat.
  • bpa
    bpa over 10 years
    Thanks for the feedback. I updated the answer with links to the POSIX standard.
  • amphibient
    amphibient over 10 years
    Do you know if this works on all Linux distros?
  • Brad Koch
    Brad Koch about 10 years
    Not generally, GNU sed won't have -E. From the BSD sed man page: "The -E, -a and -i options are non-standard FreeBSD extensions and may not be available on other operating systems."
  • Mokubai
    Mokubai almost 10 years
    Is this guaranteed to work on any version of sed on any system? If not it might be worth mentioning where this does work in a similar fashion as the other answers, just so we know the limitations and where this might not have the intended result.
  • Darren Cook
    Darren Cook almost 10 years
    For me -e stopped it working, but -r made it work (Mint 16). I.e. changing from sed -e -r to sed -r was what I needed to do. However I was using [[:space:]] by this point, as I couldn't get \s to work.
  • Nate
    Nate over 9 years
    This RE is what I use to match whitespace. It is simpler than character classes just to match tab or space. It uses only the most basic conventions of regular expressions, so it should work anywhere with a functional implementation of regular expressions.
  • Samuel
    Samuel about 9 years
    Why do you need the -E flag, for the + operator? Most expressions would probably be fine with * instead, then this would work on other platforms.
  • Alien Life Form
    Alien Life Form almost 9 years
    On Mac 10.9.5 this matches for spaces and 't'. I used Michael Douma's above to match whitespace chars (it also works with -e).
  • Mancika
    Mancika over 8 years
    @Samuel If you use *, the regex will match zero or more spaces, and you will get a space between every character, and a space at each end of each line. If you don't have the -E flag, then you want sed "s/[[:space:]]\+/ /g" to match one or more spaces.
  • Mancika
    Mancika over 8 years
    Doesn't work sensibly on my SUSE system. It matches the first place on the line where there is zero or more spaces, which is before the first character. I doubt that is the intended function, and certainly wasn't the requested use case. I believe you want to change the '*' for '\+' (or '\{3,\}' per the question) and maybe put a g at the end of the sed command to match all occurrences of the pattern. Replacing [ \t] with [[:space:]] may also be desirable as well, in case there is something else for whitespace in the line.
  • Witiko
    Witiko over 7 years
    Much like the POSIX [:space:] character class, \s will not only match <tab> and <space>, but also the <newline> character (try sed 'N;s/\s/x/' <<<$'aaa\nbbb' in bash).
  • jarno
    jarno over 7 years
    GNU sed manual does not list \s as a GNU extension.
  • stefanct
    stefanct over 6 years
    Instead of [[:space:]] one could use [[:blank:]], which matches horizontal tabs and spaces only (but no newlines, vertical tabs, etc.).
  • mcandre
    mcandre over 6 years
    FWIW, NetBSD's sed supports the -E flag as well.
  • xuhdev
    xuhdev about 6 years
    @BradKoch The fact that -E is non-standard does not imply GNU sed lacks that option. The document you linked states exactly that the -E option is available for GNU sed as well.
  • Brad Koch
    Brad Koch about 6 years
    @xuhdev You're correct, GNU sed added support for -E in version 4.3, released in 2017. Older versions will still fail with -E.
  • xuhdev
    xuhdev about 6 years
    @BradKoch OK, I think I know what is confusing. Older versions already support -E but it is not documented. It was documented later since it seems that -E is coming to POSIX standard. See unix.stackexchange.com/a/310454/38242
  • bobpaul
    bobpaul about 5 years
    For curious readers: GNU sed has had -r for as long as I can remember (prior to the 2004 switch to git). -E was added as an undocumented alias to -r in Aug 2006 (rev 3a8e165). They documented -E in Oct 2013 (rev 8b65e079, prior to v4.1; they didn't git tag prior releases). All v4.3 added with regard to -E was examples in the HTML documentation. Regardless, any GNU sed running in 2010 shouldn't have had any problems with -E, but it was undocumented at the time... git://git.sv.gnu.org/sed
  • NeilG
    NeilG almost 5 years
    On my platforms -e is optional
  • Jerry Green
    Jerry Green over 3 years
    Doesn't work on my macOS Catalina.
  • NYCeyes
    NYCeyes almost 3 years
    But how do you specify \s in the destination (i.e. the replace-with) part of the regular expression? I want to avoid using keyboard spaces and/or tabs there as well.