Removing leading, trailing and multiple spaces within a string

Solution 1

You can use something like:

s/^\s+|\s+$|\s+(?=\s)//g

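The first two alternatives, ^\s+ and \s+$, remove the leading and trailing whitespace. \s+(?=\s) matches the part of a whitespace run that is still followed by another whitespace character, so every run in the middle of the string is removed except for its last character.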
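The same pattern can be tried outside Perl. Below is a minimal JavaScript sketch, assuming an engine with lookahead support (all current ones have it); the function name squeezeSpaces is made up for this illustration and is not part of the original answer.

```javascript
// Hypothetical helper: applies the single-regex approach from this answer.
function squeezeSpaces(s) {
  // ^\s+       leading whitespace
  // \s+$       trailing whitespace
  // \s+(?=\s)  every whitespace character of a run except the last one
  return s.replace(/^\s+|\s+$|\s+(?=\s)/g, '');
}

console.log(JSON.stringify(squeezeSpaces('   word1   word2 word3     word4  ')));
// prints "word1 word2 word3 word4"
```

One thing to keep in mind: the pattern keeps the last character of each run as it is, so a run that ends in a tab leaves a tab behind rather than a space.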

Solution 2

In JavaScript, the String prototype has two methods that can handle this:

str.trim().replace(/\s+/g, ' ')

str.trim() removes leading and trailing whitespace.

str.replace(regex, replacement) returns a new string (the original str is left unchanged) in which the text matched by regex is replaced by replacement. Without the g flag, only the first match is replaced.

Important thing to note: the first param of .replace should not be wrapped in quotes, because a quoted string is matched as literal text rather than as a pattern. A regex literal is delimited with slashes (/regex/), and g is appended to replace globally (every match) rather than only the first match. You can read more about the g flag and lastIndex in the second link listed below.
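To make the difference concrete, here is a small sketch comparing the variants described above. The variable names and the sample input are made up for illustration. The last line uses the RegExp constructor, which is the one case where the pattern is written as a quoted string (note the doubled backslash).

```javascript
var input = 'a  b   c';

// No g flag: only the first run of whitespace is replaced.
var firstOnly = input.replace(/\s+/, ' ');

// g flag: every run of whitespace is replaced.
var everyRun = input.replace(/\s+/g, ' ');

// Quoted string: searched for as the literal text \s+, which does not occur.
var quoted = input.replace('\\s+', ' ');

// RegExp constructor: builds the same regex as /\s+/g from a string.
var built = input.replace(new RegExp('\\s+', 'g'), ' ');

console.log([firstOnly, everyRun, quoted, built]);
// prints [ 'a b   c', 'a b c', 'a  b   c', 'a b c' ]
```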

example:

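var str = '  1 2  3   4  ';
function trimReplace(str) {
   // Trim both ends, then collapse each run of whitespace into one space.
   // Declared with var so newStr does not leak into the global scope.
   var newStr = str.trim().replace(/\s+/g, ' ');
   console.log(newStr);
}
trimReplace(str); // logs "1 2 3 4"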

Try this in your console: '  1 2  3   4  '.trim().replace(/\s+/g, ' ')

"1 2 3 4"

_

regex: kleene operators will help you understand the regex used to match multiple spaces

regex: helpful guide on regex and /g flag

Google: MDN string.prototype.trim()

Google: MDN string.prototype.replace()

Solution 3

Using awk

echo "   word1   word2 word3     word4  " | awk '{$1=$1}1'
word1 word2 word3 word4

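The $1=$1 assignment is a trick: modifying any field forces awk to rebuild the record from its fields, joined by the output field separator (a single space by default). Leading and trailing whitespace is dropped and each run of spaces between fields collapses to one. The trailing 1 is an always-true pattern that prints the rebuilt line.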

You can even use

awk '$1=$1' file

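But here the value of the assignment is used as the pattern, so a line whose first field is 0 or 0.0 (or an empty line) evaluates to false and is not printed.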
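For readers who do not use awk, the effect of the field rebuild can be imitated in JavaScript by splitting on whitespace and joining with a single space. This is only a rough analogue under stated assumptions, not how awk is implemented: normalizeFields is a made-up name, and \s covers a few more characters than awk's default field splitting does.

```javascript
// Hypothetical helper: split into fields, drop empty ones, rejoin with one space.
function normalizeFields(line) {
  var fields = line.split(/\s+/).filter(function (field) {
    return field !== ''; // leading/trailing whitespace produces empty fields
  });
  return fields.join(' ');
}

console.log(JSON.stringify(normalizeFields('   word1   word2 word3     word4  ')));
// prints "word1 word2 word3 word4"
```

Unlike awk '$1=$1', this version has no special case for a first field of 0, because nothing here is evaluated as a true/false pattern.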

Solution 4

This might work for you (GNU sed):

sed -r 's/((^)\s*(\S))|((\S)\s*($))|(\s)\s*/\2\3\5\6\7/g' file

or simply:

sed -r 's/(^\s*(\S))|((\S)\s*$)|(\s)\s*/\2\4\5/g' file
Author: jkshah

Updated on July 22, 2022

Comments

  • jkshah
    jkshah almost 2 years

    I would like to remove all leading and trailing spaces, and also replace multiple spaces within a string with a single space, so that all words in the string are separated by exactly one space.

    I can achieve this with the following two regex substitutions, but I am looking for a single-regex solution.

    s/^\s+|\s+$//g
    s/\s+/ /g
    

    Sample Input:

       word1   word2 word3     word4    
    

    Desired Output:

    word1 word2 word3 word4
    

    I would appreciate any help in solving this.

  • jkshah
    jkshah over 10 years
    Worked like a charm. Thanks :) Still eager to see other approaches.
  • jkshah
    jkshah over 10 years
    Thanks for your response. I am not very familiar with awk, but I would like to try this one.
  • jkshah
    jkshah over 10 years
    Though it took me some time to understand, I got it working. A different capturing approach that throws out the unnecessary groups!
  • iruvar
    iruvar over 10 years
    The sed guru strikes! ;-) +1
  • mpapec
    mpapec over 10 years
    Does sed have a problem with s/^\s+|\s+$|\s+(?=\s)//g?
  • potong
    potong over 10 years
    @mpapec the first two alternatives use regexp syntax that sed also supports, whereas the last one (the lookahead) does not.
  • Laurel
    Laurel almost 8 years
    Just a note, while this answer is otherwise good, there is a way to construct a regex without the delimiting slashes. developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/…