How can I match a string with a regex in Bash?

Solution 1

To match regexes you need to use the =~ operator.

Try this:

[[ sed-4.2.2.tar.bz2 =~ tar.bz2$ ]] && echo matched

Alternatively, you can use wildcards (instead of regexes) with the == operator:

[[ sed-4.2.2.tar.bz2 == *tar.bz2 ]] && echo matched

If portability is not a concern, I recommend using [[ instead of [ or test as it is safer and more powerful. See What is the difference between test, [ and [[ ? for details.
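One pitfall worth a quick sketch: quoting the right-hand side of =~ (or ==) makes Bash treat it as a literal string, so a common workaround is to keep the regex in a variable and expand it unquoted. The helper name is_tar_bz2 and the variable re below are made up for illustration; this is a minimal sketch, assuming Bash 3.2 or later.

```shell
# Hypothetical helper: succeeds when the name ends in ".tar.bz2".
is_tar_bz2() {
  local re='\.tar\.bz2$'   # regex kept in a variable (illustrative name)
  [[ $1 =~ $re ]]          # unquoted expansion, so it is treated as a regex
}

if is_tar_bz2 sed-4.2.2.tar.bz2; then
  echo "regex matched"
fi

# Quoting the pattern turns the test into a literal string comparison.
if ! [[ sed-4.2.2.tar.bz2 =~ "tar.bz2$" ]]; then
  echo "quoted regex did not match"
fi
if ! [[ sed-4.2.2.tar.bz2 == "*tar.bz2" ]]; then
  echo "quoted glob did not match"
fi
```

The string being tested can be quoted freely; only the pattern side needs to stay unquoted.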

Solution 2

A Function To Do This

extract () {
  # "$1" is quoted throughout so file names with spaces or glob characters work
  if [ -f "$1" ] ; then
      case "$1" in
          *.tar.bz2)   tar xvjf "$1"    ;;
          *.tar.gz)    tar xvzf "$1"    ;;
          *.bz2)       bunzip2 "$1"     ;;
          *.rar)       rar x "$1"       ;;
          *.gz)        gunzip "$1"      ;;
          *.tar)       tar xvf "$1"     ;;
          *.tbz2)      tar xvjf "$1"    ;;
          *.tgz)       tar xvzf "$1"    ;;
          *.zip)       unzip "$1"       ;;
          *.Z)         uncompress "$1"  ;;
          *.7z)        7z x "$1"        ;;
          *)           echo "don't know '$1'..." ;;
      esac
  else
      echo "'$1' is not a valid file!"
  fi
}

Other Note

In response to Aquarius Power's comment above ("we need to store the regex on a var"):

The variable BASH_REMATCH is set after a successful match, and ${BASH_REMATCH[n]} holds the text matched by the nth group wrapped in parentheses. In the following example, ${BASH_REMATCH[1]} = "compressed" and ${BASH_REMATCH[2]} = ".gz".

if [[ "compressed.gz" =~ ^(.*)(\.[a-z]{1,5})$ ]]; then
  echo "${BASH_REMATCH[2]}"
else
  echo "Not proper format"
fi

(The regex above isn't meant to be a valid one for file naming and extensions, but it works for the example)
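To make the array layout concrete, here is a small sketch that prints all three elements for the same regex. The function name split_ext and the "|" separator are made up for this example.

```shell
# Hypothetical helper: split "name.ext" into its parts via BASH_REMATCH.
split_ext() {
  local re='^(.*)(\.[a-z]{1,5})$'   # same regex as above, kept in a variable
  if [[ $1 =~ $re ]]; then
    # [0] is the whole match; [1] and [2] are the parenthesized groups.
    printf '%s|%s|%s\n' "${BASH_REMATCH[0]}" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

split_ext compressed.gz   # prints: compressed.gz|compressed|.gz
```

Since any later =~ match overwrites BASH_REMATCH, it is probably safest to copy the groups you need right after the test.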

Solution 3

I don't have enough rep to comment here, so I'm submitting a new answer to improve on dogbane's answer. The dot . in the regexp

[[ sed-4.2.2.tar.bz2 =~ tar.bz2$ ]] && echo matched

will actually match any character, not only the literal dot between 'tar' and 'bz2', for example

[[ sed-4.2.2.tar4bz2 =~ tar.bz2$ ]] && echo matched
[[ sed-4.2.2.tar§bz2 =~ tar.bz2$ ]] && echo matched

or anything that doesn't require escaping with '\'. The strict syntax should then be

[[ sed-4.2.2.tar.bz2 =~ tar\.bz2$ ]] && echo matched

or you can go even stricter and also include the previous dot in the regex:

[[ sed-4.2.2.tar.bz2 =~ \.tar\.bz2$ ]] && echo matched
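A side-by-side sketch of the two variants, in case it helps. The function names matches_loose and matches_strict are invented for the comparison.

```shell
# "." matches any single character; "\." matches only a literal dot.
matches_loose()  { local re='tar.bz2$';    [[ $1 =~ $re ]]; }
matches_strict() { local re='\.tar\.bz2$'; [[ $1 =~ $re ]]; }

for name in sed-4.2.2.tar.bz2 sed-4.2.2.tar4bz2; do
  if matches_loose "$name"; then echo "loose matches $name"; fi
  if matches_strict "$name"; then echo "strict matches $name"; fi
done
```

The loose pattern accepts both names, while the strict one accepts only sed-4.2.2.tar.bz2.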

Solution 4

Since you are using bash, you don't need to create a child process for doing this. Here is one solution which performs it entirely within bash:

[[ $TEST =~ ^(.*):\ +(.*)$ ]] && TEST=${BASH_REMATCH[1]}:${BASH_REMATCH[2]}

Explanation: The groups before and after the sequence "colon and one or more spaces" are stored by the pattern match operator in the BASH_REMATCH array.
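Spelled out with a sample value, the one-liner could look like the sketch below. The input string 'name:    value' and the variable re are assumptions for illustration; no subshell or external command is involved.

```shell
TEST='name:    value'     # sample input (assumed for this example)
re='^(.*): +(.*)$'        # colon followed by one or more spaces

if [[ $TEST =~ $re ]]; then
  # Rejoin the two captured groups with the spaces after the colon dropped.
  TEST="${BASH_REMATCH[1]}:${BASH_REMATCH[2]}"
fi

echo "$TEST"   # prints: name:value
```

If the input has no "colon plus spaces" sequence, the test fails and TEST is left unchanged.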

Solution 5

shopt -s nocasematch

if [[ sed-4.2.2.$LINE =~ (yes|y)$ ]]; then
  exit 0
fi
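The snippet above relies on the nocasematch shell option, which makes matching inside [[ ]] case-insensitive. A minimal sketch of the same idea wrapped in a function follows; the name is_yes is hypothetical, and the option is simply switched off again afterwards rather than restored to its previous state.

```shell
# Hypothetical helper: succeeds when the answer ends in "y" or "yes", ignoring case.
is_yes() {
  local re='(yes|y)$' rc=0
  shopt -s nocasematch       # make [[ ... =~ ... ]] case-insensitive
  [[ $1 =~ $re ]] || rc=1
  shopt -u nocasematch       # switch it off so later matches are unaffected
  return "$rc"
}

if is_yes "YES"; then echo "accepted"; fi
if ! is_yes "no"; then echo "rejected"; fi
```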

Author by

user1587462

Updated on July 08, 2022

Comments

  • user1587462
    user1587462 almost 2 years

    I am trying to write a bash script that contains a function so when given a .tar, .tar.bz2, .tar.gz etc. file it uses tar with the relevant switches to decompress the file.

    I am using if elif then statements which test the filename to see what it ends with and I cannot get it to match using regex metacharacters.

    To save constantly rewriting the script I am using 'test' at the command line. I thought the statement below should work; I have tried every combination of brackets, quotes and metacharacters possible and still it fails.

    test sed-4.2.2.tar.bz2 = tar\.bz2$; echo $?
    (this returns 1, false)
    

    I'm sure the problem is a simple one and I've looked everywhere, yet I cannot fathom how to do it. Does someone know how I can do this?

  • Alan Porter
    Alan Porter over 10 years
    Be careful with the glob wildcard matching in the second example. Inside [[ ]], the * is not expanded as it usually is, to match filenames in the current directory that match a pattern. Your example works, but it's really easy to over-generalize and mistakenly believe that * means to match anything in any context. It only works like that inside [[ ]]. Otherwise, it expands to the existing filenames.
  • Aquarius Power
    Aquarius Power about 10 years
    I tried to use quotes on the regex and failed; this answer helped in making this work: check="^a.*c$";if [[ "abc" =~ $check ]];then echo match;fi. We need to store the regex in a var.
  • pevik
    pevik over 9 years
    Also note that the pattern (like a regexp in Perl) must NOT be in quotes: [[ sed-4.2.2.tar.bz2 == "*tar.bz2" ]] wouldn't work.
  • Good Person
    Good Person about 8 years
    also note that with BSD tar you can use "tar xf" for all formats and don't need separate commands or this function whatsoever.
  • Skippy le Grand Gourou
    Skippy le Grand Gourou over 7 years
    FWIW, the syntax for negation (i.e. does not match) is [[ ! foo =~ bar ]].
  • Admin
    Admin over 7 years
    dash doesn't support the -n 1 parameter, neither does it put it automatically into a $REPLY variable. Watch Out!
  • Mark K Cowan
    Mark K Cowan about 7 years
    a on GNU tar or p on BSD tar to explicitly tell it to automatically infer compression type from extension. GNU tar will not do it automatically otherwise, and I'm guessing from @GoodPerson 's comment that BSD tar does do it by default.
  • miken32
    miken32 over 6 years
    If portability is a concern, then don't use the =~ operator!
  • James Brown
    James Brown over 6 years
    The page you linked to mentions RegularExpression matching =~ [is] (not available) [in] old test [ so I guess it's not an option in the instead of part.
  • mosh
    mosh over 6 years
    7z can unpack .. AR, ARJ, CAB, CHM, CPIO, CramFS, DMG, EXT, FAT, GPT, HFS, IHEX, ISO, LZH, LZMA, MBR, MSI, NSIS, NTFS, QCOW2, RAR, RPM, SquashFS, UDF, UEFI, VDI, VHD, VMDK, WIM, XAR and Z. see 7-zip.org
  • void.pointer
    void.pointer almost 6 years
    Why do quotes cause the regex to not match? I thought it was a best practice to quote any variable usage, like "$foo", so [[ "$foo" == "^release/" ]] seems like it should work...
  • i336_
    i336_ almost 6 years
    This is extremely dangerous; it only behaves without undefined behavior for you because you have no files in the current directory named the literal substring "pattern". Go ahead, create some files named like that, and substring expansion will match the files and break everything horribly with multicolored heisenbugs.
  • Rainer Schwarze
    Rainer Schwarze over 5 years
    Note that index 0 contains the full match and index 1 and 2 contain the group matches.
  • Admin
    Admin over 5 years
    But I have done an experiment: with files 1pattern, pattern pattern2 and pattern in the current directory. This script works as expected. Could you please provide me with your test result? @i336_
  • user1934428
    user1934428 over 5 years
    @i336: I don't think so. Within [[ ... ]], the rhs glob pattern does not expand according to the current directory, as it would usually do.
  • rosshjb
    rosshjb almost 4 years
    @i336_ No. Within [[...]], Bash doesn't perform filename expansion. In bash manual, Word splitting and filename expansion are not performed on the words between the [[ and ]];
  • user1934428
    user1934428 about 3 years
    @juancortez : It also does not really fulfil the requirements of the OP, who - for whatever reason - asked for matching a regexp.