RegExp - Optional Capture group in Bash?


Solution 1

Bash's =~ operator understands POSIX extended regular expressions ("ERE"), not PCRE ("Perl-compatible regular expressions").

Your PCRE:

cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD)(?:-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD))-[a-z]

The (?:...) in a PCRE is a non-capturing group (not an optional group). There is no equivalent in an ERE and all groups are capturing.
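Because every group captures, bash numbers all of them, nested ones included, in the BASH_REMATCH array. Below is a minimal sketch, assuming bash 3.2 or later. The shortened pattern and the input cell-80-SITa-b are made up for illustration:

```bash
#!/usr/bin/env bash
# In a bash ERE every (...) captures. BASH_REMATCH[0] holds the whole match;
# BASH_REMATCH[n] holds group n, numbered by the position of its opening "(".
re='^cell-(90|855|80|70)-(DEV|SANDP|(SIT[a-z]|SIT[1-9]))-[a-z]$'
input='cell-80-SITa-b'   # hypothetical input

if [[ $input =~ $re ]]; then
    for i in "${!BASH_REMATCH[@]}"; do
        printf '%d: %s\n' "$i" "${BASH_REMATCH[$i]}"
    done
fi
```

This should print 0: cell-80-SITa-b, 1: 80, 2: SITa and 3: SITa. The inner (SIT[a-z]|SIT[1-9]) gets its own slot, even though only the outer group was of interest.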

To make an expression optional, you may qualify it with ?, as I have done below. The ? means that the previous expression should match one or zero times.
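To see ? applied to a whole parenthesised group, here is a minimal sketch. It uses a cut-down, hypothetical pattern rather than the full expression from the question:

```bash
#!/usr/bin/env bash
# (-SANDP)? lets the group appear once or not at all, but never twice.
re='^cell-80(-SANDP)?-a$'

for s in 'cell-80-a' 'cell-80-SANDP-a' 'cell-80-SANDP-SANDP-a'; do
    if [[ $s =~ $re ]]; then
        echo "match:    $s"
    else
        echo "no match: $s"
    fi
done
```

The first two strings should match and the third should not.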

As an ERE:

cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD))?-[a-z]

or, contracting (SIT[a-z]|SIT[1-9]) into SIT[a-z1-9],

cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD))?-[a-z]
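One quick way to check that the contraction is safe is to run both forms over a few candidate tokens and compare the results. This is a minimal sketch. The token list is made up, and LC_ALL=C is set on the assumption that [a-z] should mean plain ASCII lowercase letters:

```bash
#!/usr/bin/env bash
export LC_ALL=C   # assumption: plain ASCII ranges for [a-z] and [1-9]

long='^(SIT[a-z]|SIT[1-9])$'
short='^SIT[a-z1-9]$'

# Print whether each hypothetical token matches the long and the short form.
for tok in SITa SITz SIT1 SIT9 SIT0 SIT SITAB; do
    [[ $tok =~ $long ]] && l=yes || l=no
    [[ $tok =~ $short ]] && s=yes || s=no
    printf '%-6s long=%-3s short=%s\n' "$tok" "$l" "$s"
done
```

Both columns should agree for every token. Note that a bare SIT is rejected by both forms. The SIT[a-z1-9]? suggestion in the comments addresses exactly that case.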

You may also want to add anchoring to the contracted pattern:

^cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD))?-[a-z]$

Otherwise it would also match a string such as somethingcell-...-ablahblah, because =~ succeeds when the pattern matches anywhere in the string.
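The difference anchoring makes can be shown with a minimal sketch. The input string is a hypothetical example with extra text on both sides:

```bash
#!/usr/bin/env bash
unanchored='cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD))?-[a-z]'
anchored="^${unanchored}\$"   # same pattern, pinned to start and end

s='somethingcell-80-SANDP-ablahblah'   # hypothetical input

# Unanchored: =~ finds the substring cell-80-SANDP-a inside s.
[[ $s =~ $unanchored ]] && echo "unanchored: matched '${BASH_REMATCH[0]}'"
# Anchored: the whole string must fit the pattern, so this fails.
[[ $s =~ $anchored ]] || echo "anchored: no match"
```

The unanchored test should report the substring cell-80-SANDP-a, while the anchored one rejects the string.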

Solution 2

(?:...) is not an optional capture group but a non-capturing group, which, as far as I know, is not supported by bash at all. This should work:

cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD))?-[a-z]
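Here is a minimal sketch of this pattern in an if statement like the one in the question. The regex is kept in a single-quoted variable so the shell leaves |, ( and ? alone. $RegExp is left unquoted on the right of =~, because since bash 3.2 a quoted pattern is matched as a literal string. The uppercase inputs are assumptions, since the pattern is case-sensitive as written:

```bash
#!/usr/bin/env bash
RegExp='cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD))?-[a-z]'

validate() {
    local Input=$1
    # Unquoted $RegExp: bash treats it as an ERE.
    if [[ $Input =~ $RegExp ]]; then
        echo "valid:   $Input"
    else
        echo "invalid: $Input"
    fi
}

validate 'cell-80-SANDP-SITa-a'   # with the optional second part
validate 'cell-80-SANDP-a'        # without it
validate 'cell-81-SANDP-a'        # 81 is not one of the listed numbers

# Quoting the variable switches regex matching off: this is a literal
# substring test, so it fails.
[[ 'cell-80-SANDP-a' =~ "$RegExp" ]] || echo 'quoted pattern: no match'
```

The question's lowercase inputs will not match this pattern unless case-insensitive matching is turned on, as discussed in the comments.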


Author: 64Hz

Updated on September 18, 2022

Comments

  • 64Hz (almost 2 years ago)

    I'm currently working on a RegExp to check an input file for correct content. I'm using the RegExp below to parse some input:

    cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD)(?:-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD))-[a-z]
    

    Input it should match:

    cell-80-sandp-sit-a
    

    Or match this:

    cell-80-sandp-a
    

    The -sit part of the input should be an optional capture group. To my understanding, that means the RegExp should succeed whether or not it finds this group.

    In this case, I'm using it in an if statement:

    if [[ "$Input" =~ $RegExp ]]; then
        : # stuff
    fi
    

    Can anyone point out what is wrong with the above? I have been using regex101.com to test it.

  • 64Hz (about 6 years ago)
    Hi there, thank you for your very in-depth explanation. I tried your simplified one and got 0 matches on both input strings. I noticed that a capture group was missing (which you may have picked up before my edit). To make it work, I edited it to this: cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|(SIT[a-z1-9]|SIT)|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|(SIT[a-z1-9]|SIT)|TAT|PROD))?-[a-z]
  • Kusalananda (about 6 years ago)
    @64Hz You may use SIT[a-z1-9]? to match all of SIT and SITa and SIT9.
  • 64Hz (about 6 years ago)
    I must have done something wrong the first time. You are 100% correct, and it is working. Thank you very much. Regarding case, I will add shopt -s nocasematch before any case-sensitive steps.
  • griffin_cosgrove (over 3 years ago)
    Thanks for this answer; I did not know the difference between PCRE and ERE.
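The two suggestions from the comments can be combined. SIT[a-z1-9]? accepts a bare sit, and shopt -s nocasematch lets the question's lowercase inputs match the uppercase alternatives. A minimal sketch of that combination follows. The bash manual states that nocasematch also applies to [[ ... =~ ... ]], and the anchors follow the suggestion in the first answer:

```bash
#!/usr/bin/env bash
RegExp='^cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]?|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]?|TAT|PROD))?-[a-z]$'

shopt -s nocasematch    # case-insensitive [[ == ]] and [[ =~ ]]
for Input in 'cell-80-sandp-sit-a' 'cell-80-sandp-a'; do
    if [[ $Input =~ $RegExp ]]; then
        echo "match:    $Input"
    else
        echo "no match: $Input"
    fi
done
shopt -u nocasematch    # restore case-sensitive matching afterwards
```

Both inputs from the question should match. Turning nocasematch off again keeps later tests in the script case-sensitive.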