RegExp - Optional Capture group in Bash?
Solution 1
bash understands standard extended regular expressions ("ERE"), not PCRE ("Perl-compatible regular expressions").
Your PCRE:
cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD)(?:-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD))-[a-z]
The (?:...) in a PCRE is a non-capturing group (not an optional group). There is no equivalent in an ERE, where all groups are capturing.
To make an expression optional, you may qualify it with ?, as I have done below. The ? means that the previous expression should match one or zero times.
As an ERE:
cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD))?-[a-z]
or, contracting (SIT[a-z]|SIT[1-9]) into SIT[a-z1-9],
cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD))?-[a-z]
You may also want to add anchoring to this:
^cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD))?-[a-z]$
... otherwise it would match somethingcell-...-ablahblah
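To see the effect of the optional group and the anchors together, here is a minimal bash sketch. The variable names (env, unanchored, anchored) and the helper matches are made up for illustration; only the pattern itself comes from the answer above.

```bash
#!/usr/bin/env bash
# Hypothetical names: env, unanchored, anchored, matches().
env='DEV|DEVL|SANDP|CAT|SIT[a-z1-9]|TAT|PROD'
unanchored="cell-(90|855|80|70)-(${env})(-(${env}))?-[a-z]"
anchored="^${unanchored}\$"

# Prints "yes" when the string in $1 matches the ERE in $2, "no" otherwise.
matches() {
  # The pattern is left unquoted so that bash treats it as a regex.
  if [[ $1 =~ $2 ]]; then echo yes; else echo no; fi
}

matches 'cell-80-SANDP-a' "$anchored"                      # optional group absent
matches 'cell-80-SANDP-SITa-a' "$anchored"                 # optional group present
matches 'somethingcell-80-SANDP-ablahblah' "$unanchored"   # matches a substring
matches 'somethingcell-80-SANDP-ablahblah' "$anchored"     # rejected by ^ and $
```

The first three calls should report a match and the last one should not, since the anchors force the whole string to fit the pattern.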
Solution 2
(?:...) is not an optional capture group, but a non-capturing group, which - as far as I know - is not even supported by bash. This should work:
cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD))?-[a-z]
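Because every group in an ERE captures, the extra parentheses shift the group numbers in BASH_REMATCH. The sketch below shows which index holds which part for this pattern; the variable names and the sample input are made up for illustration.

```bash
#!/usr/bin/env bash
# Pattern copied from the answer; variable names and input are hypothetical.
re='cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD))?-[a-z]'
input='cell-80-SANDP-SITa-x'

if [[ $input =~ $re ]]; then
  # Groups are numbered by the position of their opening parenthesis.
  number=${BASH_REMATCH[1]}      # (90|855|80|70)
  first_env=${BASH_REMATCH[2]}   # first environment name
  optional=${BASH_REMATCH[4]}    # optional group, including its leading hyphen
  second_env=${BASH_REMATCH[5]}  # environment name inside the optional group
  printf '%s\n' "$number" "$first_env" "$optional" "$second_env"
fi
```

Indices 3 and 6 belong to the nested (SIT[a-z]|SIT[1-9]) groups. When the optional part is absent from the input, indices 4 to 6 are simply empty.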
64Hz
Updated on September 18, 2022
Comments
-
64Hz almost 2 years
Currently working on some RegExp to parse an input file for correct content. I'm using the below RegExp to parse some input:
cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD)(?:-(DEV|DEVL|SANDP|CAT|(SIT[a-z]|SIT[1-9])|TAT|PROD))-[a-z]
Input it should match:
cell-80-sandp-sit-a
Or match this:
cell-80-sandp-a
The -sit part of the input should be an optional capture group, which to my understanding means the RegExp will continue successfully if it does not find this capture group, or also finish successfully if it does find it. For this instance, I would be using it in an if statement:
if [[ "$Input" =~ $RegExp ]]; then
  #stuff
fi
Can anyone point out what is wrong with the above? I have been using regex101.com to test it.
-
64Hz about 6 years
Hi there, thank you for your very in-depth explanation. I tried your simplified one and got 0 matches on both input strings. I noticed there was a missing capture group (that you may have gotten before my edit). To make it work I edited it to this:
cell-(90|855|80|70)-(DEV|DEVL|SANDP|CAT|(SIT[a-z1-9]|SIT)|TAT|PROD)(-(DEV|DEVL|SANDP|CAT|(SIT[a-z1-9]|SIT)|TAT|PROD))?-[a-z]
-
Kusalananda about 6 years
@64Hz You may use SIT[a-z1-9]? to match all of SIT and SITa and SIT9.
-
64Hz about 6 years
I must have done something wrong the first time; you are 100% correct and it is working. Thank you very much. In regards to case, I will add a shopt -s nocasematch before any case-sensitive steps.
-
griffin_cosgrove over 3 years
Thanks for this answer, did not know the difference between PCRE and ERE
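Putting the suggestions from the comments together, a rough sketch might look like the following. It assumes the SIT[a-z1-9]? contraction and shopt -s nocasematch, plus the anchors from Solution 1; the names env, re and check are hypothetical, and the two sample inputs are the ones from the question.

```bash
#!/usr/bin/env bash
# Hypothetical names: env, re, check().
env='DEV|DEVL|SANDP|CAT|SIT[a-z1-9]?|TAT|PROD'
re="^cell-(90|855|80|70)-(${env})(-(${env}))?-[a-z]\$"

# Prints "match" or "no match" for one input, ignoring case for this test only.
check() {
  local result='no match'
  shopt -s nocasematch   # =~ now ignores the case of alphabetic characters
  if [[ $1 =~ $re ]]; then result='match'; fi
  shopt -u nocasematch   # assumption: the option was off before the call
  echo "$result"
}

check 'cell-80-sandp-sit-a'   # optional -sit part present
check 'cell-80-sandp-a'       # optional part absent
check 'cell-80-sandp-sit'     # trailing -[a-z] missing
```

The first two inputs should match and the third should not. Switching nocasematch off again right after the test keeps later, case-sensitive comparisons in the same script unaffected.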