What's the difference between \b and \< in the grep command

Solution 1

\< matches the beginning of a word
\> matches the end of a word
\b matches at either boundary, the beginning or the end of a word

The important thing about those special characters is that they match the empty string at a word boundary and consume no characters. A word boundary is the transition between a word character and a non-word character, where the word characters are the set matched by \w, equivalent to [_[:alnum:]] in POSIX notation (letters, digits and _).
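
As a small illustration of that definition, here is a sketch that assumes GNU grep (-o, \< and \> are GNU extensions) and a made-up input string. The anchors add nothing to the matched text, and _ and digits count as word characters while - and the space do not:

```shell
# Assumes GNU grep; the input string is an arbitrary illustration.
# \< and \> are zero-width: only the characters between them are printed.
words=$(echo 'foo_bar-baz9 qux' | grep -o '\<[[:alnum:]_]*\>')
echo "$words"
# prints:
# foo_bar
# baz9
# qux
```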

Example

Finally, Graeme found a very interesting example:

$ echo 'acegi   z' | grep -o '[acegi ]*\>' | cat -A
acegi$
$ echo 'acegi   z' | grep -o '[acegi ]*\b' | cat -A
acegi   $

This example shows that it can sometimes be useful to match precisely the end of a word rather than any word boundary: with \> the match must end right after a word character, so it cannot include the trailing spaces.
Conversely, if you want a match to run through non-word characters and stop where they end, you can't use \>; \b works in that case because it also matches at the start of the next word.
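
The non-word case can be sketched as follows, assuming GNU grep and a made-up input. The variable names are only for illustration:

```shell
# Assumes GNU grep; 'abc,,,def' is an arbitrary sample.
# \b also matches where the next word starts, so the run of commas is found.
with_b=$(echo 'abc,,,def' | grep -o ',*\b')
# \> only matches right after a word character, so a match can never end in
# a comma; only empty matches remain, and grep -o does not print those.
# grep then exits with status 1, hence the "|| true".
with_gt=$(echo 'abc,,,def' | grep -o ',*\>' || true)
printf 'word boundary: [%s]\n' "$with_b"    # prints: word boundary: [,,,]
printf 'end of word:   [%s]\n' "$with_gt"   # prints: end of word:   []
```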

So far no other example comes to mind. In my opinion there are probably a few use cases where the distinction matters, but my guess is that it is mostly for readability: \b is vague, whereas specifying the start or the end of the word makes the regexp easier to understand for whoever reads it.

Solution 2

To answer the question in your title:

What's the difference between \b and \< ...

Almost none. Both match the boundary, the transition between a word and a non-word.

The only technical difference is:

  • \b matches the boundary at both the start and the end of a word.
  • \< only matches the start of a word.
  • \> only matches the end of a word.
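
One case where the two are not interchangeable, sketched here assuming GNU grep and a made-up input: when the boundary is followed by a non-word character, \b can still match (it is then the end of the preceding word), while \< cannot, because \< needs a word character on its right.

```shell
# Assumes GNU grep; 'a-b' is an arbitrary sample.
# \b matches between 'a' and '-', so the '-' after it is found.
any_boundary=$(echo 'a-b' | grep -o '\b-')
# \< requires a word character to follow, so '\<-' can never match.
# grep exits with status 1 here, hence the "|| true".
word_start=$(echo 'a-b' | grep -o '\<-' || true)
printf 'any boundary: [%s]\n' "$any_boundary"   # prints: any boundary: [-]
printf 'word start:   [%s]\n' "$word_start"     # prints: word start:   []
```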

The practical difference is:

$ echo ',,abc...' | grep -o '[abc.,]*'
,,abc...                                   # match the whole string

$ echo ',,abc...' | grep -o '[abc.,]*\b'
,,abc                                      # to the rightmost (due to *) word boundary.

$ echo ',,abc...' | grep -o '[abc.,]*\>'
,,abc                                      # match to the same point (in this case).

$ echo ',,abc...' | grep -o '[abc.,]*\<'   
,,                                         # match to the rightmost **start** of a word.

The same could be done with spaces (cat added to reveal the spaces):

Up to the rightmost "word boundary" (any) (note the spaces):

$ echo 'abcd     abcd    Z' | grep -o '[a-z ]*\b' | cat -A
abcd     abcd    $

Up to the rightmost "word start" (same point):

$ echo 'abcd     abcd    Z' | grep -o '[a-z ]*\<' | cat -A
abcd     abcd    $

Up to the rightmost "word end" (no trailing space):

$ echo 'abcd     abcd    Z' | grep -o '[a-z ]*\>' | cat -A
abcd     abcd$

Or, with sed:

Four word boundaries:

$ echo "abc %-= def." | sed 's/\b/ |>X<| /g'
 |>X<| abc |>X<|  %-=  |>X<| def |>X<| .

Two word starts:

$ echo "abc %-= def." | sed 's/\</ |>X<| /g'
 |>X<| abc %-=  |>X<| def.

And two word ends:

$ echo "abc %-= def." | sed 's/\>/ |>X<| /g'
abc |>X<|  %-= def |>X<| .

Reference

From GNU info sed:

'\b'
Matches a word boundary; that is it matches if the character to the left is a "word" character and the character to the right is a "non-word" character, or vice-versa.

     $ echo "abc %-= def." | sed 's/\b/X/g'
     XabcX %-= XdefX.

Beginning

'\<' Matches the beginning of a word.

     $ echo "abc %-= def." | sed 's/\</X/g'
     Xabc %-= Xdef.

End

'\>' Matches the end of a word.

     $ echo "abc %-= def." | sed 's/\>/X/g'
     abcX %-= defX.

d_wheel
Author by

d_wheel

Updated on September 18, 2022

Comments

  • d_wheel
    d_wheel almost 2 years

    In the man page of grep, I see

    The symbols \< and \> respectively match the empty string at the beginning and  
    end of a word.  The symbol \b matches the  empty  string at  the  edge  of  a  word.
    

    But I still can't figure out the difference. To me, \b is Perl's notation for word boundary, while \< is Vim's notation for the same purpose.
    PS: English is not my native language. Pardon me if the difference is obvious to you.

    • Admin
      Admin over 10 years
      I can't think of any useful usage of \< or \> where you can't substitute \b. Obviously you can use \< and \> to create patterns that don't match anything, but that's not much use! So in practice it looks like there is no difference.
    • Admin
      Admin over 10 years
      I have to disagree I will prove it :-) I'll edit my answer to add an example
    • Admin
      Admin about 8 years
      \< predates vim (it was used in grep during the 1980s).
  • Graeme
    Graeme over 10 years
    But can you think of a situation where it is useful to use \< or \> instead of \b?
  • Graeme
    Graeme over 10 years
    @StephaneChazelas Ok, I see, so anything that matches a space or a word character can leave it ambiguous as to whether you are at the beginning or end of a word.
  • Stéphane Chazelas
    Stéphane Chazelas over 10 years
    Note those are transitions from word to non-word characters, they have nothing to do with spaces (which is just one of many non-word characters).
  • Stéphane Chazelas
    Stéphane Chazelas over 10 years
    Note that for three letter words, you'd have to write it \<\w{3}\> anyway, and that would be the same as \b\w{3}\b
  • Graeme
    Graeme over 10 years
    Yeah, there are always alternatives, but here is a good use. Consider a long, repeated character set containing a space, the \< or \> can be used to force the first or last occurrence to not have a space. Eg, [acegikm ]*\> is equivalent to [acegikm ]*[acegikm]. In most cases they are probably more use with sed than grep.
  • S edwards
    S edwards over 10 years
    @Graeme no it's not. come in the chat so we can discuss
  • S edwards
    S edwards over 10 years
    @StephaneChazelas what part you disagree with I don't understand ?
  • Stéphane Chazelas
    Stéphane Chazelas over 10 years
    The part that starts with The important thing
  • anthony
    anthony over 5 years
    Using \< is good if you are matching words. But sometimes your definition of a word is not the same. For example, if you are matching a file path like /export. You really need to think carefully about what something does before using any regular expression construct.
  • done
    done over 5 years
    Yes, those special characters ... match the word boundary itself, not as you state: those special characters ... match an empty string and not the word boundary itself. Try echo '...abc,,,' | sed 's/\b/_X_/g' for example (that didn't remove any characters, it only added some). Understand that a word boundary is the transition, not any "real" character.