How to match words and ignore multiple spaces?


Solution 1

Use tr with its -s option to compress consecutive spaces into single spaces and then grep the result of that:

$ echo 'Some   spacious  string' | tr -s ' ' | grep 'Some spacious string'
Some spacious string

This does not remove leading or trailing spaces; it only compresses them to a single space at either end. Because grep matches anywhere in the line, the pattern above still matches.

To remove the flanking blanks as well as compress the internal blanks to single spaces, use sed:

echo ' Some   spacious  string' |
sed 's/^[[:blank:]]*//; s/[[:blank:]]*$//; s/[[:blank:]]\{1,\}/ /g'

This could then be passed through to grep.
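
As a minimal sketch of passing the sed output through to grep, the two steps might be combined as follows. The function name normalize_blanks is made up for illustration, and grep -x is used here only to show that the normalized line matches in full:

```shell
# normalize_blanks: hypothetical helper name, not part of any tool.
# Trims leading/trailing blanks and squeezes internal runs to one space.
normalize_blanks() {
    sed 's/^[[:blank:]]*//; s/[[:blank:]]*$//; s/[[:blank:]]\{1,\}/ /g'
}

# -x makes grep require that the whole line equals the pattern.
result=$(echo ' Some   spacious  string ' | normalize_blanks | grep -x 'Some spacious string')
echo "$result"
```

Without the normalization step, grep -x would reject the line because of the flanking and repeated blanks.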

Solution 2

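Use the regex operator + to match one or more of the preceding token, a space in this case. In grep's default basic regular expressions the operator is written \+ (a GNU extension), so the pattern for a run of spaces is a space followed by \+: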

echo "Ambari Server      running"  | grep -i "Ambari \+Server \+running"

If you are unsure what kind of whitespace separates the words, I suggest using the character class [:blank:], which matches any horizontal whitespace (spaces and tabs), not just a plain space:

echo "Ambari Server      running"  | grep -i "Ambari[[:blank:]]\+Server[[:blank:]]\+running"
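
On systems where grep does not support \+ in basic regular expressions, the POSIX interval expression \{1,\} should behave the same way. The sketch below assumes three made-up sample lines (one with a tab, one with no blank between the first two words) and counts how many match:

```shell
# Sample input is invented for illustration: the second line uses a tab,
# the third has no blank between "Ambari" and "Server".
matches=$(printf 'Ambari Server      running\nambari\tserver running\nAmbariServer running\n' |
    grep -i -c 'Ambari[[:blank:]]\{1,\}Server[[:blank:]]\{1,\}running')
echo "$matches"
```

Only the first two lines contain at least one blank between each pair of words, so the third is not counted.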

On the other hand, if you want to keep just one space between words, use awk:

echo "Ambari Server      running"  | \
    awk '$1=="Ambari" && $2=="Server" && $3=="running" {$1=$1; print}'
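  • $1=="Ambari" && $2=="Server" && $3=="running" matches records whose first three fields are exactly these words (the comparison is case-sensitive)

  • $1=$1 forces awk to rebuild the record, joining the fields with the output field separator (a single space by default)

  • print prints the rebuilt record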
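
Since the question uses grep -i, a case-insensitive variant of the awk command may be closer to what is wanted. This is a sketch only; it wraps each field in awk's tolower() and uses invented sample input:

```shell
# First line should match despite the case and spacing differences;
# the second line is a made-up non-matching record.
out=$(printf '     AMBARI   Server      running\nAmbari Agent running\n' |
    awk 'tolower($1)=="ambari" && tolower($2)=="server" && tolower($3)=="running" {$1=$1; print}')
echo "$out"
```

The original capitalization is kept in the output because tolower() is applied only in the comparison, not to the fields themselves.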

Solution 3

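If you just want to ignore all the whitespace between words, delete it with tr -d '[:space:]' and grep for the text without spaces; the output will not contain any spaces either. Quote the character class and write it with a single pair of brackets: an unquoted [[:space:]] is subject to shell globbing and also makes tr delete literal [ and ] characters. Note that [:space:] includes the newline, so multi-line input is joined into one line; use [:blank:] to keep line breaks. Example:

echo "Hi This   Is Test" | tr -d '[:space:]' | grep HiThisIsTest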

Output:

HiThisIsTest
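
One consequence of this approach is that word boundaries are lost, so inputs that split the same letters differently become indistinguishable. A small sketch with two made-up inputs, using [:blank:] so that line breaks are kept:

```shell
# Both inputs contain the same letters but different word boundaries.
first=$(echo "Hi This   Is Test" | tr -d '[:blank:]')
second=$(echo "HiT hisIs Test" | tr -d '[:blank:]')
echo "$first"
echo "$second"
```

Both commands print the same string, so a grep for it cannot tell the two inputs apart.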

Solution 4

To answer the main question, "How to match words and ignore multiple spaces?", something like the following will get you what you need:

echo "Ambari Server      running"  | tr '[:upper:]' '[:lower:]' | grep -E '\s*ambari\s+server\s+running\s*'

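It converts the input to lower case and then searches for a lower-case pattern. \s* matches zero or more whitespace characters (so it includes tabs etc.) and \s+ matches one or more. Note that \s is a GNU grep extension, and that the leading and trailing \s* are not strictly needed because grep matches anywhere in the line.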
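
If the original capitalization should survive in the output, the tr step can probably be dropped in favour of grep -i. The sketch below also swaps \s for the POSIX class [[:space:]]; both changes are suggestions, not part of the answer above:

```shell
# -i makes the match case-insensitive, so no lower-casing step is needed
# and the matching line is printed unchanged.
line=$(echo "Ambari Server      running" |
    grep -E -i 'ambari[[:space:]]+server[[:space:]]+running')
echo "$line"
```

The line is printed exactly as it was given, including its capital letters and the run of spaces.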

If your input was in a file like foo2.txt below:

Ambari Server      running 
Ambari     Server running
     Ambari Server running

Then you could do something like:

cat foo2.txt | tr '[:upper:]' '[:lower:]' | grep -E '\s*ambari\s+server\s+running\s*'
ambari server      running
ambari     server running
     ambari server running

If you are only interested in the number of matching lines, pipe the result to wc -l:

cat foo2.txt | tr '[:upper:]' '[:lower:]' | grep -E '\s*ambari\s+server\s+running\s*' | wc -l
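
grep can also count matching lines itself with -c, which avoids the extra cat and wc processes. A sketch, assuming a temporary file created with mktemp that stands in for foo2.txt (with one extra non-matching line added for illustration):

```shell
# Temporary stand-in for foo2.txt; the fourth line is invented and
# should not be counted.
tmpfile=$(mktemp)
printf 'Ambari Server      running \nAmbari     Server running\n     Ambari Server running\nAmbari Agent running\n' > "$tmpfile"

# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -E -i -c 'ambari[[:space:]]+server[[:space:]]+running' "$tmpfile")
echo "$count"

rm -f "$tmpfile"
```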

Author: yael

Updated on September 18, 2022

Comments

  • yael (over 1 year ago)

    The following commands should match "Ambari Server running", but how can I match when there are multiple spaces between the words? How can I ignore extra spaces between words?

    echo "Ambari Server      running"  | grep -i "Ambari Server running"
    echo "Ambari     Server running"   | grep -i "Ambari Server running"
    echo "     Ambari Server running"  | grep -i "Ambari Server running"
    

    The expected results should be:

    Ambari Server running
    Ambari Server running
    Ambari Server running