Confusion about `(?=)` regexp in Dart? I know it's a lookahead. More detail in Body
regex101.com has a regex debugger which you can use to see exactly how the regex engine behaves.
A good point to note here is that the matches from your regex are always going to be 0-length, because lookaheads such as `(?=...)` don't consume any characters. They only look ahead to check for a pattern.
As you may know, the regex engine will move from the start of the string to the end of the string as it matches the characters.
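The zero-length nature of the match can be checked directly. The sketch below uses Python's `re` module rather than Dart's `RegExp`, on the assumption that both treat these simple lookaheads the same way; `describe_match` is a hypothetical helper name used only for illustration.

```python
import re

# Pattern from the question: a start anchor followed by two lookaheads.
PATTERN = re.compile(r"^(?=.*[0-9])(?=.*[a-z])")


def describe_match(text):
    """Return (start, end, matched_text), or None when there is no match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return (m.start(), m.end(), m.group(0))


print(describe_match("0a"))  # (0, 0, '')
print(describe_match("a0"))  # (0, 0, '')
print(describe_match("00"))  # None
```

Both successful matches start and end at index 0 and contain no characters, which is what the regex101.com debugger also reports.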
Why does 0a match?
Initially, the engine is at the start of the string. It matches the "start of string" anchor `^`. Then it checks whether it can see the pattern described in the lookahead `(?=.*[0-9])`. Can it? Yes: `.*` can match nothing, and `[0-9]` can match the `0`. Then it checks the second lookahead. Note that the engine is still at its starting position. It checks `(?=.*[a-z])`: `.*` matches the `0` and `[a-z]` matches the `a`. Both lookaheads match, so the `^` remains matched.
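One way to see what each `.*` ends up covering is to wrap it in a capture group inside the lookahead. This is a sketch in Python's `re` (an assumption, since the question is about Dart), and the capture groups and the `trace` helper are additions for illustration, not part of the original pattern.

```python
import re

# Same pattern as in the question, with each `.*` wrapped in a capture group
# so that the text it covered inside the lookahead can be inspected.
TRACE = re.compile(r"^(?=(.*)[0-9])(?=(.*)[a-z])")


def trace(text):
    """Return (end_position, text_before_digit, text_before_letter) or None."""
    m = TRACE.match(text)
    if m is None:
        return None
    return (m.end(), m.group(1), m.group(2))


print(trace("0a"))  # (0, '', '0')
```

The end position stays at 0 even though the second lookahead scanned past the `0`, which shows that both lookaheads start from the same position. Because `.*` is greedy, on longer inputs each group holds the longest prefix that still lets the lookahead succeed.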
Why does a0 match?
This is pretty much the same as before. The first lookahead: `.*` matches `a` and `[0-9]` matches the `0`. The second lookahead: `.*` matches nothing and `[a-z]` matches `a`.
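Since both lookaheads are evaluated from the same position, swapping them should not change which strings match. The sketch below checks that on a few sample inputs, again using Python's `re` as a stand-in for Dart's `RegExp`; the sample strings and the names `DIGIT_FIRST`, `LETTER_FIRST` and `compare` are made up for illustration.

```python
import re

# The pattern from the question and the same pattern with the lookaheads swapped.
DIGIT_FIRST = re.compile(r"^(?=.*[0-9])(?=.*[a-z])")
LETTER_FIRST = re.compile(r"^(?=.*[a-z])(?=.*[0-9])")


def compare(text):
    """Return whether each ordering matches the given text."""
    return (bool(DIGIT_FIRST.match(text)), bool(LETTER_FIRST.match(text)))


for sample in ["0a", "a0", "x9y", "abc", "123", ""]:
    print(repr(sample), compare(sample))
```

For every sample the two orderings agree: strings containing at least one digit and at least one lowercase letter match, and the others do not.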
Why does `^(?=.*[0-9])(?=.*[a-z])$` behave differently?
That regex can never match, in fact. Without the lookaheads, the regex becomes `^$`. Only an empty string matches `^$`. And an empty string can't contain a letter and a digit, so the lookaheads will always fail.
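The sketch below contrasts the assertions-only pattern with a variant that adds `.*` before the `$`, so that something actually consumes the string between the two anchors. The variant is one possible fix and is not taken from the original post; the code uses Python's `re` on the assumption that Dart's `RegExp` behaves the same way for these patterns.

```python
import re

# Pattern from the question: nothing between ^ and $ consumes any input.
ONLY_ASSERTIONS = re.compile(r"^(?=.*[0-9])(?=.*[a-z])$")

# Hypothetical fix: `.*` consumes the rest of the string before `$`.
WITH_CONSUMER = re.compile(r"^(?=.*[0-9])(?=.*[a-z]).*$")


def matched_text(pattern, text):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.match(text)
    return None if m is None else m.group(0)


print(matched_text(ONLY_ASSERTIONS, "0a"))  # None
print(matched_text(ONLY_ASSERTIONS, ""))    # None
print(matched_text(WITH_CONSUMER, "0a"))    # 0a
print(matched_text(WITH_CONSUMER, "abc"))   # None
```

With the consuming `.*` in place, the match is no longer zero-length: it covers the whole string whenever both lookaheads succeed.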
sgon00
Updated on December 07, 2022
Comments
-
sgon00 over 1 year
First, I know that `x(?=y)` matches `x` only if `x` is followed by `y`. But when I try `r'^(?=.*[0-9])(?=.*[a-z])'`:
- Why do both `0a` and `a0` match? Why is the order not important at all?
- For `0a`, what does it match?
  - If it matches the empty string before `0`, it should fail the second condition `(?=.*[a-z])`, because the empty string before `0` is followed by `0`, not by `a-z`.
  - If it matches `0` because it is followed by `a`, it should fail the first condition, because `0` is not followed by `[0-9]`.
  - I don't know what's wrong with the way I think, and I am not sure whether I have expressed myself clearly enough for you to understand what I mean.
And for `r'^(?=.*[0-9])(?=.*[a-z])$'`: if the above pattern without `$` works, why doesn't this one? I fail to figure out what this matches. It seems it does not match anything.
Thanks a lot for your help.
-
The fourth bird over 5 years
Your regex consists of assertions only. You test if the string contains a digit. Then you test if the string contains a lowercase character.
-
Pushpesh Kumar Rajwanshi over 5 years
There is no ordering in lookahead groups. The regex engine makes every effort to match the input; it does not work the other way round and try to mismatch the input. So the two lookahead groups you have specified work as match group 1 and match group 2, and only if both match is the overall match successful.
-
Pushpesh Kumar Rajwanshi over 5 years
For your last point, lookaheads only try to match the pattern and don't actually consume any input, so if there is nothing to consume after the lookaheads, the regex will fail to match.
-
sgon00 over 5 years
@PushpeshKumarRajwanshi I think there is ordering in lookahead groups. By reading the accepted answer, I actually understood. Both `0a` and `a0` actually match the empty string at the beginning. You can see this on regex101.com. The magic part happens because both groups have `.*`. For example, if the rule changes to `'^(?=.*[0-9])(?=[a-z])'`, it won't match `"0a"`, because it cannot match the empty string at the beginning when there is no `.*` before `[a-z]`.
-
Pushpesh Kumar Rajwanshi over 5 years
@sgon00: Ordering is of course there, because the regex engine will evaluate one thing first and only then the next, but I wrote that more in the context that the order of your lookarounds will not impact the overall success or failure of the match. Depending on the input text, one order may be more favorable (performance-wise) than the other, but since the input text can be any random string, the order does not matter.
-
sgon00 over 5 years
@PushpeshKumarRajwanshi Sorry, I think you are right. The ordering does not matter. Even without `.*`, `'^(?=.*[0-9])(?=[a-z])'` and `'^(?=[a-z])(?=.*[0-9])'` are the same. Thanks a lot for this clarification.
-
Pushpesh Kumar Rajwanshi over 5 years
@sgon00: Yes, the ordering doesn't change the regex overall :) Glad to give my little help :)
-
sgon00 over 5 years
Thank you very much for the detailed reply and explanation, and for introducing the website; it is very useful. I finally understood that both cases match the empty string at the beginning, and that the magic part is the `.*`. The `.*` makes the ordering unnecessary. Thanks.
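The variant discussed in the comments, where the second lookahead has no `.*`, can be checked in the same way. This is a sketch in Python's `re`, assumed here to behave like Dart's `RegExp` for these patterns; the names `NO_DOT_STAR`, `NO_DOT_STAR_SWAPPED` and `check` are hypothetical.

```python
import re

# Variant from the comments: the second lookahead has no `.*`, so `[a-z]`
# must match the very first character of the string.
NO_DOT_STAR = re.compile(r"^(?=.*[0-9])(?=[a-z])")

# The same two lookaheads in the opposite order.
NO_DOT_STAR_SWAPPED = re.compile(r"^(?=[a-z])(?=.*[0-9])")


def check(text):
    """Return whether each ordering of the variant matches the given text."""
    return (bool(NO_DOT_STAR.match(text)), bool(NO_DOT_STAR_SWAPPED.match(text)))


print(check("0a"))  # (False, False)
print(check("a0"))  # (True, True)
```

Dropping the `.*` changes what the pattern accepts (the string must now start with a lowercase letter and contain a digit), while swapping the two lookaheads still gives the same results on these inputs.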