What's the difference between \b and \< in the grep command
Solution 1
- \< matches the beginning of a word.
- \> matches the end of a word.
- \b matches either boundary, at the beginning or at the end of a word.
The important thing about these special characters is that they are zero-width: they match an empty string (a position), not any character. A word boundary is the position between a word character and a non-word character (the start and end of the line count as non-word). Word characters are the set represented by \w, equivalent to [_[:alnum:]] in POSIX notation (letters, digits and _).
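Because \b is zero-width, substituting a visible marker for it shows where the boundaries fall without removing any characters. This is a minimal sketch assuming GNU sed (\b is a GNU extension); the function name mark_boundaries is made up for this illustration and is not part of sed or grep.

```shell
#!/bin/sh
# Hypothetical helper (not part of grep or sed): print a line with "|"
# inserted at every position where \b matches. Assumes GNU sed.
mark_boundaries() {
    printf '%s\n' "$1" | sed 's/\b/|/g'
}

# "_" is a word character (part of \w), so "a_b" has no boundary inside it.
# "-" is a non-word character, so "a-b" gets a boundary on each side of it.
mark_boundaries 'a_b a-b'    # prints: |a_b| |a|-|b|
```

Digits behave like letters and underscore: mark_boundaries 'x2 y' gives |x2| |y|.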
Example
Finally, Graeme found a very interesting example:
$ echo 'acegi z' | grep -o '[acegi ]*\>' | cat -A
acegi$
$ echo 'acegi z' | grep -o '[acegi ]*\b' | cat -A
acegi $
This example shows that it can sometimes be useful to match precisely the end of a word instead of any word boundary: matching the end of the word keeps the trailing space out of the match.
So, for a more useful case: if you want to match a run of non-word characters up to its end, you can't use \>, because \> only matches at the end of a word; \b can be used in that case, because the point where the run of non-word characters ends is also where the next word starts.
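The case just described can be sketched with GNU grep on a made-up input (the string abc,,,def is an assumption chosen only for illustration): the commas are a run of non-word characters, and the match should stop where the next word begins.

```shell
#!/bin/sh
# Match a run of non-word characters (commas) up to where the next word
# starts. Assumes GNU grep, where \b, \< and \> work in basic regexps.
s='abc,,,def'

# \b: the run ends at a word boundary (the start of "def"), so ",,," is printed.
printf '%s\n' "$s" | grep -o ',*\b'

# \<: matches at the same point, because that boundary is a word start.
printf '%s\n' "$s" | grep -o ',*\<'

# \>: needs a word character on its left, so it can never match right after
# a comma; only empty matches exist and grep -o prints nothing.
printf '%s\n' "$s" | grep -o ',*\>'
```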
So far, no real-world use case comes to mind.
In my opinion, there are probably a few use cases where it makes sense, but my guess is that it is mainly for readability: \b is vague, whereas specifying the start or the end of the word gives whoever reads the regexp a better understanding of it.
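One way to see the readability argument: when the pattern character next to the anchor is itself a word character, \b can only be satisfied as a word start, so \<re and \bre select exactly the same text and the choice is purely stylistic. A minimal sketch assuming GNU grep; the sample sentence is made up for illustration.

```shell
#!/bin/sh
# \b right before "r" (a word character) can only mean "start of a word",
# so both patterns select the same matches. Assumes GNU grep.
s='redo prefer re-run'

printf '%s\n' "$s" | grep -o '\<re'   # "re" from "redo" and from "re-run"
printf '%s\n' "$s" | grep -o '\bre'   # the same two matches
```

The "re" inside "prefer" is skipped by both, because "p" before it is a word character.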
Solution 2
To answer the question in your title, "What's the difference between \b and \<?":
Almost none. Both match a boundary: the transition between a word character and a non-word character.
The only technical difference is:
- \b matches the boundary at both the start and the end of a word.
- \< only matches the start of a word.
- \> only matches the end of a word.
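Since \b is simply "either a start or an end of a word", it should match at exactly the positions where \< or \> matches. The sketch below checks that with GNU sed, building the "start or end" pattern by hand with the \| alternation (a GNU extension to basic regular expressions); it reuses the sample string from the sed examples in this answer.

```shell
#!/bin/sh
# Sketch: \b matches exactly where \< or \> matches.
# Assumes GNU sed (\b, \<, \> and \| are GNU extensions to BRE).
s='abc %-= def.'

printf '%s\n' "$s" | sed 's/\b/X/g'          # any word boundary
printf '%s\n' "$s" | sed 's/\(\<\|\>\)/X/g'  # start of a word OR end of a word
```

Both commands print XabcX %-= XdefX.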
The practical difference is:
$ echo ',,abc...' | grep -o '[abc.,]*'
,,abc... # matches the whole string
$ echo ',,abc...' | grep -o '[abc.,]*\b'
,,abc # up to the rightmost (due to *) word boundary
$ echo ',,abc...' | grep -o '[abc.,]*\>'
,,abc # up to the same point (in this case)
$ echo ',,abc...' | grep -o '[abc.,]*\<'
,, # up to the rightmost start of a word
The same can be seen with spaces (cat -A is added to reveal them; it marks each line end with $):
Up to the rightmost word boundary of any kind (note the trailing space):
$ echo 'abcd abcd Z' | grep -o '[a-z ]*\b' | cat -A
abcd abcd $
Up to the rightmost "word start" (same point):
$ echo 'abcd abcd Z' | grep -o '[a-z ]*\<' | cat -A
abcd abcd $
Up to the rightmost "word end" (no trailing space):
$ echo 'abcd abcd Z' | grep -o '[a-z ]*\>' | cat -A
abcd abcd$
Or, with sed:
Four word boundaries:
$ echo "abc %-= def." | sed 's/\b/ |>X<| /g'
 |>X<| abc |>X<|  %-=  |>X<| def |>X<| .
Two word starts:
$ echo "abc %-= def." | sed 's/\</ |>X<| /g'
 |>X<| abc %-=  |>X<| def.
And two word ends:
$ echo "abc %-= def." | sed 's/\>/ |>X<| /g'
abc |>X<|  %-= def |>X<| .
Reference
From GNU info sed:
'\b'
Matches a word boundary; that is it matches if the character to the left is a "word" character and the character to the right is a "non-word" character, or vice-versa.
$ echo "abc %-= def." | sed 's/\b/X/g'
XabcX %-= XdefX.
Beginning
'\<' Matches the beginning of a word.
$ echo "abc %-= def." | sed 's/\</X/g'
Xabc %-= Xdef.
End
'\>' Matches the end of a word.
$ echo "abc %-= def." | sed 's/\>/X/g'
abcX %-= defX.
d_wheel
Updated on September 18, 2022
Comments
-
d_wheel, almost 2 years ago:
In the man page of grep, I see: "The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word."
But I still can't figure out the difference. To me, \b is Perl's notation for a word boundary, while \< is Vim's notation for the same purpose.
PS: English is not my native language. Pardon me if the difference is obvious to you.
-
Admin, over 10 years ago: I can't think of any useful use of \< or \> where you couldn't substitute \b. Obviously you can use \< and \> to create patterns that don't match anything, but that's not much use! So in practice it looks like there is no difference.
-
Admin, over 10 years ago: I have to disagree, and I will prove it :-) I'll edit my answer to add an example.
-
Admin, about 8 years ago: \< predates Vim (it was used in grep during the 1980s).
-
Graeme, over 10 years ago: But can you think of a situation where it is useful to use \< or \> instead of \b?
-
Graeme, over 10 years ago: @StephaneChazelas OK, I see: anything that matches either a space or a word character can leave it ambiguous whether you are at the beginning or the end of a word.
-
Stéphane Chazelas, over 10 years ago: Note that those are transitions between word and non-word characters; they have nothing to do with spaces (a space is just one of many non-word characters).
-
Stéphane Chazelas, over 10 years ago: Note that for three-letter words, you'd have to write it \<\w{3}\> anyway, and that would be the same as \b\w{3}\b.
-
Graeme, over 10 years ago: Yeah, there are always alternatives, but here is a good use. Consider a long, repeated character set containing a space: \< or \> can be used to force the first or last character of the match not to be a space. E.g., [acegikm ]*\> is equivalent to [acegikm ]*[acegikm]. In most cases they are probably of more use with sed than with grep.
-
S edwards, over 10 years ago: @Graeme No, it's not. Come to the chat so we can discuss.
-
S edwards, over 10 years ago: @StephaneChazelas Which part do you disagree with? I don't understand.
-
Stéphane Chazelas, over 10 years ago: The part that starts with "The important thing".
-
anthony, over 5 years ago: Using \< is good if you are matching words, but sometimes your definition of a word is not the same as grep's, for example if you are matching a file path like /export. You really need to think carefully about what a construct does before using it in a regular expression.
-
done, over 5 years ago: Yes, those special characters ... match the word boundary itself, not, as you state, "those special characters ... match an empty string and not the word boundary itself". Try echo '...abc,,,' | sed 's/\b/_X_/g' for example (it doesn't remove any characters, it only adds some). Understand that a word boundary is the transition, not any "real" character.