Bash test: what does "=~" do?
Solution 1
The ~ is actually part of the operator =~, which performs a regular expression match of the string on its left against the extended regular expression on its right.
[[ "string" =~ pattern ]]
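Note that the regular expression shouldn't be quoted: since bash 3.2, any quoted part of the pattern is matched as a literal string. Quoting the string on the left is good practice, although word splitting does not happen inside [[ ]].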
A similar operator is used in the Perl programming language.
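The effect of quoting the pattern can be sketched as follows. This is a minimal illustration, assuming bash 3.2 or later with no compat31 option set; the variable names are made up for the example.

```bash
#!/usr/bin/env bash
# Sketch: how quoting the right-hand side of =~ changes the match.

string='a.c'   # hypothetical sample value

# Unquoted: the dot is a regex metacharacter and matches any character.
if [[ abc =~ ^a.c$ ]]; then unquoted_result=match; else unquoted_result=no-match; fi

# Quoted: the dot is taken literally, so "abc" no longer matches.
if [[ abc =~ ^"a.c"$ ]]; then quoted_result=match; else quoted_result=no-match; fi

# Quoted, against a string that really contains a dot.
if [[ $string =~ ^"a.c"$ ]]; then literal_result=match; else literal_result=no-match; fi

printf '%s %s %s\n' "$unquoted_result" "$quoted_result" "$literal_result"
```

Only the quoted part of the pattern loses its special meaning; the unquoted ^ and $ still act as anchors.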
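The regular expressions understood by bash are extended regular expressions, i.e. the same set that GNU grep understands with the -E flag. Bash hands the pattern to the system's regex(3) library, so extensions beyond the POSIX set may differ between systems.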
Somewhat off-topic, but good to know:
When matching against a regular expression containing capturing groups, the part of the string captured by each group is available in the BASH_REMATCH array. The entry at index 0 corresponds to & in the replacement pattern of sed's substitution command (or $& in Perl), which is the part of the string that matches the whole pattern, while the entries at index 1 and onwards correspond to \1, \2, etc. in a sed replacement pattern (or $1, $2, etc. in Perl), i.e. the parts matched by each parenthesised group.
Example:
string=$( date +%T )
if [[ "$string" =~ ^([0-9][0-9]):([0-9][0-9]):([0-9][0-9])$ ]]; then
    printf 'Got %s, %s and %s\n' \
        "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
fi
This may output
Got 09, 19 and 14
if the current time happens to be 09:19:14.
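To see the difference between index 0 and the group indices, here is a small sketch that uses a fixed input instead of the current time. The sample string and variable names are assumptions made for illustration only.

```bash
#!/usr/bin/env bash
# Sketch: BASH_REMATCH[0] holds the whole match, indices 1.. hold the groups.

string='time=09:19:14 zone=UTC'   # hypothetical sample input

# The pattern is not anchored, so it may match in the middle of the string.
if [[ $string =~ ([0-9]{2}):([0-9]{2}):([0-9]{2}) ]]; then
    whole=${BASH_REMATCH[0]}     # the text matched by the entire pattern
    hours=${BASH_REMATCH[1]}     # first parenthesised group
    minutes=${BASH_REMATCH[2]}   # second group
    seconds=${BASH_REMATCH[3]}   # third group
    printf 'whole=%s hours=%s minutes=%s seconds=%s\n' \
        "$whole" "$hours" "$minutes" "$seconds"
fi
```

With this input, the whole match is 09:19:14 and the three groups are 09, 19 and 14.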
The REMATCH bit of the BASH_REMATCH array name comes from "Regular Expression Match", i.e. "RE-Match".
In non-bash Bourne-like shells, one may also use expr for limited regular expression matching (using only basic regular expressions).
A small example:
$ string="hello 123 world"
$ expr "$string" : ".*[^0-9]\([0-9][0-9]*\)"
123
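For comparison, a rough bash counterpart of the expr call above might look like the sketch below. It assumes the same sample string; expr anchors its pattern at the start of the string implicitly, so the ^ is written out here to mirror that.

```bash
#!/usr/bin/env bash
# Sketch: the expr extraction above, rewritten with =~ and BASH_REMATCH.

string="hello 123 world"

# Extended syntax: groups are plain ( ), and + replaces the [0-9][0-9]* idiom.
if [[ $string =~ ^.*[^0-9]([0-9]+) ]]; then
    digits=${BASH_REMATCH[1]}
    printf '%s\n' "$digits"
fi
```

For this particular input the captured group is 123, the same value that expr prints.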
Solution 2
You should read the bash man page, under the [[ expression ]] section.
An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)).
Long story short, =~ is an operator, just like == and !=. It has nothing to do with the actual regex in the string to its right.
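A short sketch may help to show =~ sitting alongside == and != as just another binary operator inside [[ ]]. The helper name is_integer is hypothetical; the regular expression is the one from the question.

```bash
#!/usr/bin/env bash
# Sketch: =~ used like any other [[ ]] operator.

# Hypothetical helper: succeeds if its argument is an optionally signed integer.
is_integer() {
    [[ $1 =~ ^-?[0-9]+$ ]]
}

value=-5

# == compares against a glob pattern, =~ against an extended regular expression.
if [[ $value == -* ]]; then glob_result=yes; else glob_result=no; fi
if [[ $value =~ ^-[0-9]+$ ]]; then regex_result=yes; else regex_result=no; fi

# Operators can be combined with && inside the same [[ ]].
if [[ $value != 0 && $value =~ ^-?[0-9]+$ ]]; then combined_result=yes; else combined_result=no; fi

printf 'glob=%s regex=%s combined=%s\n' "$glob_result" "$regex_result" "$combined_result"
```

Here the ^ and -? belong to the regular expression on the right-hand side, while =~ itself only selects the kind of comparison.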
jasonwryan
Updated on September 18, 2022
Comments
-
jasonwryan almost 2 years
#!/bin/bash
INT=-5
if [[ "$INT" =~ ^-?[0-9]+$ ]]; then
    echo "INT is an integer."
else
    echo "INT is not an integer." >&2
    exit 1
fi
What does the leading ~ do in the starting regular expression?
-
George Vasiliou over 7 years
Can you give some examples demonstrating the use of =~ in real life?
-
Stéphane Chazelas over 7 years
It's the same as what grep -E understands only on GNU systems, and only when using an unquoted variable as the pattern, as in [[ $var =~ $pattern ]] (see [[ 'a b' =~ a\sb ]] vs p='a\sb'; [[ 'a b' =~ $p ]]). Also beware that shell quoting affects the meaning of RE operators, and that some characters need to be quoted for the shell tokenising, which may affect the RE processing. [[ '\' =~ [\/] ]] returns false. ksh93 has even worse issues. See zsh (or bash 3.1) for a saner approach where shell and RE quoting are clearly separate. The [ builtins of zsh and yash also have a =~ operator.
-
done over 7 years
@StéphaneChazelas How is it "saner" that both of these match in zsh: [[ "This is a fine mess." =~ T.........fin*es* ]]; [[ "This is a fine mess." =~ T.........fin\*es\* ]]? Or that a quoted * also matches: [[ "This is a fine mess." =~ "T.........fin*es*" ]]?
-
Stéphane Chazelas over 7 years
It's saner (IMO) in that the rules are much simpler. Shell quoting and RE escaping are clearly separate. In [[ a =~ .* ]] or [[ a =~ '.*' ]] or [[ a =~ \.\* ]], the same .* RE is passed to the =~ operator. OTOH, in bash, [[ '\' =~ [)] ]] returns an error; would you know without trying it whether [[ '\' =~ [\)] ]] matches? How about [[ '\' =~ [\/] ]] (it does in ksh93)? How about c='a-z'; [[ a =~ ["$c"] ]] (compare with the = operator)? See also [[ '\' =~ [^]"."] ]], which returns false... Note that you can do shopt -s compat31 in bash to get the zsh behaviour.
-
Stéphane Chazelas over 7 years
The behaviour of zsh and bash -O compat31 for [[ a =~ '.*' ]] is also consistent with [ a '=~' '.*' ] (for [ implementations that support =~) or expr a : '.*'. OTOH, it's not consistent with [[ a = '*' ]] vs [[ a = * ]] (but then, globs are part of the shell language, while REs are not).
-
Richard Fortune almost 7 years
To deal with characters in the pattern that might be interpreted by the shell, it's often recommended to do something like this: pat="..."; if [[ "$string" =~ $pat ]]; then ... (@StéphaneChazelas's topmost comment suggested it, I'm just emphasizing it.)
-
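The variable-based approach recommended in the comment above can be sketched like this. The pattern and the sample string are invented for illustration; the point is only that the regex lives in a single-quoted variable and is expanded unquoted on the right of =~.

```bash
#!/usr/bin/env bash
# Sketch: keep the regex in a variable so the shell does not tokenise it.

# Hypothetical pattern with a space and literal parentheses, which would
# need extra shell escaping if written inline inside [[ ]].
pat='^\(([0-9]{3})\) ([0-9]{3}-[0-9]{4})$'
string='(555) 123-4567'

# $pat is left unquoted on purpose: quoting it would force a literal match.
if [[ $string =~ $pat ]]; then
    area=${BASH_REMATCH[1]}
    number=${BASH_REMATCH[2]}
    printf 'area=%s number=%s\n' "$area" "$number"
fi
```

Single quotes in the assignment keep the backslashes intact, so the regex library sees exactly the text stored in pat.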
WabuMike over 5 years
@GeorgeVasiliou I use it fairly often in scripts that put the output from a command into a variable. Then the variable is checked to see if it matches some string pattern. This is useful, for example, if you want to take some action based on some error output from that command.
-
Alex Quinn almost 5 years
@Sokel For some, “RTFM” is easier said than done. ⋯ man [[ expression ]] and man [[ return nothing. help [[ returns useful information (since [[ is an internal bash command) but does not say whether =~ uses basic or extended regex syntax. ⋯ The text you quoted is from the bash man page. I realize you said “read the bash man pages”, but at first I thought you meant read the man pages within bash. At any rate, man bash returns a huge file, which is 4139 lines (72 pages) long. It can be searched by pressing /▒▒▒, which takes a regex, the flavor of which (like =~) is not specified.