Bash test: what does "=~" do?

Solution 1

The ~ is actually part of the operator =~, which matches the string on its left against the extended regular expression on its right.

[[ "string" =~ pattern ]]

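Note that the string should be quoted and that the regular expression should not be: since bash 3.2, a quoted pattern (or any quoted part of one) is matched as a literal string rather than as a regular expression.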
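The effect of quoting the right-hand side can be seen in a minimal sketch like the one below. The sample string and variable names are made up for illustration, and the behaviour shown assumes bash 3.2 or later without the compat31 option.

```shell
# bash sketch: the same pattern text, unquoted and quoted.
string='abc'

# Unquoted: a.c is a regular expression, so "." matches any character.
if [[ "$string" =~ a.c ]]; then unquoted=match; else unquoted=no-match; fi

# Quoted: "a.c" is compared literally, so the "." would have to be a real dot.
if [[ "$string" =~ "a.c" ]]; then quoted=match; else quoted=no-match; fi

printf 'unquoted: %s, quoted: %s\n' "$unquoted" "$quoted"
```

With these inputs the unquoted test should succeed and the quoted one should fail, because "abc" does not contain the literal text a.c.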

A similar operator is used in the Perl programming language.

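The regular expressions understood by bash are POSIX extended regular expressions, i.e. broadly the same set that grep understands with the -E flag. Bash hands the pattern to the system's regex library, so extensions beyond POSIX (such as those of GNU grep) are only available where that library provides them.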


Somewhat off-topic, but good to know:

When matching against a regular expression containing capturing groups, the part of the string captured by each group is available in the BASH_REMATCH array. The entry at index 0 corresponds to & in the replacement pattern of sed's substitution command (or $& in Perl), which is the part of the string that matches the whole pattern, while the entries at index 1 onwards correspond to \1, \2, etc. in a sed replacement pattern (or $1, $2, etc. in Perl), i.e. the parts matched by each parenthesized group.

Example:

string=$( date +%T )

if [[ "$string" =~ ^([0-9][0-9]):([0-9][0-9]):([0-9][0-9])$ ]]; then
  printf 'Got %s, %s and %s\n' \
    "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
fi

This may output

Got 09, 19 and 14

if the current time happens to be 09:19:14.

The REMATCH bit of the BASH_REMATCH array name comes from "Regular Expression Match", i.e. "RE-Match".
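The time example only prints the groups, so here is a small sketch that also looks at index 0 of BASH_REMATCH. The file name and the variable names are hypothetical, chosen only to have something to match against.

```shell
# bash sketch: index 0 holds the whole match, indices 1.. hold the groups.
string='build-2022-09-18.log'   # hypothetical sample input

if [[ "$string" =~ ([0-9]+)-([0-9]+)-([0-9]+) ]]; then
  whole=${BASH_REMATCH[0]}   # the text matched by the entire pattern
  year=${BASH_REMATCH[1]}    # first parenthesized group
  month=${BASH_REMATCH[2]}   # second group
  day=${BASH_REMATCH[3]}     # third group
  printf 'whole=%s year=%s month=%s day=%s\n' "$whole" "$year" "$month" "$day"
fi
```

Because the pattern is not anchored with ^ and $, index 0 contains only the date part of the string, not the surrounding text.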


In non-bash Bourne-like shells, one may also use expr for limited regular expression matching (using only basic regular expressions).

A small example:

$ string="hello 123 world"
$ expr "$string" : ".*[^0-9]\([0-9][0-9]*\)"
123
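expr can also be used purely as a test, since its exit status reflects whether the pattern matched. The sketch below is one way to do that in plain POSIX sh; the function name is_digits is made up for this example, and the leading X is a common defensive idiom rather than anything the answer above prescribes.

```shell
# POSIX sh sketch: without a \( \) group, expr prints the number of matched
# characters; its exit status is 0 only if that number is non-zero.
is_digits() {
  # The leading X keeps expr from mistaking the value for an operator or option.
  # expr patterns are implicitly anchored at the start; $ anchors the end.
  expr "X$1" : 'X[0-9][0-9]*$' >/dev/null
}

if is_digits 123; then echo "123: digits only"; fi
if ! is_digits 12a; then echo "12a: not digits only"; fi
```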

Solution 2

You should read the bash man pages, under the [[ expression ]] section.

An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)).

Long story short, =~ is an operator, just like == and !=. It has nothing to do with the actual regex in the string to its right.
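To make the split between operator and pattern concrete, here is a sketch built around the regular expression from the question. The function name is_integer and the sample values are assumptions for illustration. In it, =~ is the operator, and ^-?[0-9]+$ is the regular expression: ^ anchors the start, -? allows an optional minus sign, [0-9]+ requires one or more digits, and $ anchors the end.

```shell
# bash sketch: =~ is the operator, everything to its right is the regex.
is_integer() {
  [[ "$1" =~ ^-?[0-9]+$ ]]
}

for value in -5 42 3.14 abc; do
  if is_integer "$value"; then
    printf '%s is an integer.\n' "$value"
  else
    printf '%s is not an integer.\n' "$value"
  fi
done
```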

Author: jasonwryan

Updated on September 18, 2022

Comments

  • jasonwryan
    jasonwryan almost 2 years
    #!/bin/bash
    INT=-5

    if [[ "$INT" =~ ^-?[0-9]+$ ]]; then
      echo "INT is an integer."
    else
      echo "INT is not an integer." >&2
      exit 1
    fi

    What does the leading ~ do in the starting regular expression?

  • George Vasiliou
    George Vasiliou over 7 years
    Can you figure out some examples demonstrating the use of =~ in real life...?
  • Stéphane Chazelas
    Stéphane Chazelas over 7 years
    It's the same as what grep -E understands only on GNU systems and only when using an unquoted variable as the pattern [[ $var =~ $pattern ]] (see [[ 'a b' =~ a\sb ]] vs p='a\sb'; [[ 'a b' =~ $p ]]). Also beware that shell quoting affects the meaning of RE operators and that some characters need to be quoted for the shell tokenising, which may affect the RE processing. [[ '\' =~ [\/] ]] returns false. ksh93 has even worse issues. See zsh (or bash 3.1) for a saner approach where shell and RE quoting are clearly separate. The [ builtins of zsh and yash also have a =~ operator.
  • done
    done over 7 years
    @StéphaneChazelas How is it "saner" that both of these match in zsh?: [[ "This is a fine mess." =~ T.........fin*es* ]]; [[ "This is a fine mess." =~ T.........fin\*es\* ]]. Or that a quoted * also matches? [[ "This is a fine mess." =~ "T.........fin*es*" ]].
  • Stéphane Chazelas
    Stéphane Chazelas over 7 years
    It's saner (IMO) in that its rules are much simpler. Shell quoting and RE escaping are clearly separate. In [[ a =~ .* ]] or [[ a =~ '.*' ]] or [[ a =~ \.\* ]], the same .* RE is passed to the =~ operator. OTOH, in bash, [[ '\' =~ [)] ]] returns an error; would you know without trying it whether [[ '\' =~ [\)] ]] matches? How about [[ '\' =~ [\/] ]] (it does in ksh93). How about c='a-z'; [[ a =~ ["$c"] ]] (compare with the = operator)? See also: [[ '\' =~ [^]"."] ]] which returns false... Note that you can do shopt -s compat31 in bash to get the zsh behaviour.
  • Stéphane Chazelas
    Stéphane Chazelas over 7 years
    zsh/bash -O compat31's behaviour for [[ a =~ '.*' ]] is also consistent with [ a '=~' '.*' ] (for [ implementations that support =~) or expr a : '.*'. OTOH, it's not consistent with [[ a = '*' ]] vs [[ a = * ]] (but then, globs are part of the shell language, while REs are not).
  • Richard Fortune
    Richard Fortune almost 7 years
    To deal with characters in the pattern that might be interpreted by the shell, it's often recommended to do something like this: pat="..."; if [[ "$string" =~ $pat ]]; then .... (@StéphaneChazelas's topmost comment suggested it, I'm just emphasizing it.)
  • WabuMike
    WabuMike over 5 years
    @GeorgeVasiliou I use it fairly often in scripts that put the output from a command into a variable. Then the variable is checked to see if it matches some string pattern. This is useful for example if you want to take some action based on some error output from that command.
  • Alex Quinn
    Alex Quinn almost 5 years
    @Sokel For some, “RTFM” is easier said than done. ⋯ man [[ expression ]] and man [[ return nothing. help [[ returns useful information, since [[ is an internal bash command, but does not say whether =~ uses basic or extended regex syntax. ⋯ The text you quoted is from the bash man page. I realize you said “read the bash man pages” but at first, I thought you meant read the man pages within bash. At any rate, man bash returns a huge file, which is 4139 lines (72 pages) long. It can be searched by pressing /▒▒▒, which takes a regex, the flavor of which (like that of =~) is not specified.
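The advice in the comments about keeping the pattern in a variable can be sketched as follows. The pattern and test string are invented for the example; the point is only that a pattern containing a space or other shell-special characters is easier to write inside single quotes and then expand unquoted on the right of =~.

```shell
# bash sketch: store the regex in a variable, then expand it unquoted.
pat='^[[:alpha:]]+ [0-9]+$'   # contains a space, awkward to write inline
string='hello 123'

# $pat is deliberately NOT quoted here; quoting it would force a literal match.
# No word splitting happens inside [[ ]], so the space in the pattern is safe.
if [[ "$string" =~ $pat ]]; then result=match; else result=no-match; fi

printf '%s\n' "$result"
```

Single-quoting the assignment keeps the shell from touching the pattern text, so what reaches the =~ operator is exactly what was typed.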