Confusion about `(?=)` regexp in Dart? I know it's a lookahead. More detail in Body


regex101.com has a regex debugger which you can use to see exactly how the regex engine behaves.

A good point to note here is that the matches from your regex are always going to be zero-length, because lookaheads (?=...) don't consume any characters. They only look ahead to check for a pattern.
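A minimal Dart sketch of this point, assuming only `dart:core` (Dart's `RegExp` follows JavaScript regex semantics). `lookaheadsOnly` and `describeMatch` are hypothetical names chosen for illustration. The helper reports where the first match starts and ends and what text it covers.

```dart
// The pattern from the question: an anchor plus two lookaheads. Nothing in it
// consumes characters, so any match it finds is zero-length.
final lookaheadsOnly = RegExp(r'^(?=.*[0-9])(?=.*[a-z])');

/// Hypothetical helper: describes the first match in [input] as
/// "start..end 'text'", or returns 'no match'.
String describeMatch(String input) {
  final m = lookaheadsOnly.firstMatch(input);
  if (m == null) return 'no match';
  return "${m.start}..${m.end} '${m[0]}'";
}
```

With this sketch, `describeMatch('0a')` and `describeMatch('a0')` should both return `0..0 ''`, an empty match at position 0, while `describeMatch('abc')` returns `no match` because the string contains no digit.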

As you may know, the regex engine will move from the start of the string to the end of the string as it matches the characters.
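To see the engine walk through the string, a sketch like the following drops the ^ anchor so that every position gets tried. `beforeLetter` and `matchPositions` are hypothetical names, and the input `'0a1b'` is just an example.

```dart
// Without the ^ anchor, the engine tries the lookahead at each position from
// left to right; every success is an empty match at that position.
final beforeLetter = RegExp(r'(?=[a-z])');

/// Hypothetical helper: start offsets of all matches of [re] in [input].
List<int> matchPositions(RegExp re, String input) =>
    re.allMatches(input).map((m) => m.start).toList();
```

Here `matchPositions(beforeLetter, '0a1b')` gives `[1, 3]`, the positions just before `a` and `b`; after an empty match, `allMatches` moves one character forward before searching again. With the question's anchored pattern, `matchPositions(RegExp(r'^(?=.*[0-9])(?=.*[a-z])'), '0a')` gives `[0]`, because ^ can only succeed at the start.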

Why does 0a match?

Initially, the engine is at the start of the string, where it matches the "start of string" anchor ^. Then it checks whether it can see the pattern described in the lookahead (?=.*[0-9]). Can it? Yes: .* can match nothing, and [0-9] can match the 0. Next it checks the second lookahead. Note that the engine is still at its starting position, because the first lookahead consumed nothing. It checks (?=.*[a-z]): .* matches the 0 and [a-z] matches the a. Both lookaheads succeed, so the regex matches the empty string at position 0.
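Part of the confusion in the question comes from treating the second lookahead as if it ran only after the first one had moved on. A small sketch can test each lookahead on its own at a chosen position using `RegExp.matchAsPrefix`, which only succeeds if the pattern matches starting exactly at that index. `lookaheadHoldsAt` is a hypothetical helper.

```dart
/// Hypothetical helper: tests a single lookahead at position [start] of
/// [input]. matchAsPrefix only succeeds if the pattern matches starting
/// exactly at [start], mirroring the engine checking a lookahead without
/// moving.
bool lookaheadHoldsAt(String lookahead, String input, [int start = 0]) =>
    RegExp(lookahead).matchAsPrefix(input, start) != null;
```

For `'0a'`, `lookaheadHoldsAt(r'(?=.*[0-9])', '0a')` and `lookaheadHoldsAt(r'(?=.*[a-z])', '0a')` are both true at position 0, so the combined pattern succeeds there. At position 1 only `a` remains, so `lookaheadHoldsAt(r'(?=.*[0-9])', '0a', 1)` is false. That is the failure the question describes, but with the ^ anchor the engine only ever tests position 0.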

Why does a0 match?

This is pretty much the same as before. In the first lookahead, .* matches a and [0-9] matches the 0. In the second lookahead, .* matches nothing and [a-z] matches a.
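One way to check both walkthroughs is to wrap each .* in a capture group. The lookaheads still consume nothing, but Dart (like JavaScript) keeps captures made inside a positive lookahead. `revealing` and `whatDotStarMatched` are hypothetical names.

```dart
// Same as the question's pattern, but each .* is wrapped in a capture group
// so we can see what it ended up matching inside each lookahead.
final revealing = RegExp(r'^(?=(.*)[0-9])(?=(.*)[a-z])');

/// Hypothetical helper: returns [text before the digit, text before the
/// letter] as matched by the two .* groups, or null if there is no match.
List<String>? whatDotStarMatched(String input) {
  final m = revealing.firstMatch(input);
  if (m == null) return null;
  return [m[1]!, m[2]!];
}
```

This should give `['', '0']` for `'0a'` and `['a', '']` for `'a0'`, matching the two walkthroughs. Because .* is greedy, it first takes the whole string and then backtracks one character at a time until the character class after it can match.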

Why does ^(?=.*[0-9])(?=.*[a-z])$ behave differently?

In fact, that regex can never match. Remove the lookaheads and what is left is ^$, which only an empty string matches. An empty string can't contain a digit or a letter, so the lookaheads always fail.
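A quick Dart sketch that confirms this. The sample inputs are arbitrary, and `withDotStar` is a contrast pattern added here for illustration, not something from the original answer.

```dart
// The question's second pattern. $ is tested at position 0, which is the end
// of the input only for the empty string, and the empty string has already
// failed the lookaheads.
final withDollar = RegExp(r'^(?=.*[0-9])(?=.*[a-z])$');

// Contrast pattern (added for illustration): .* consumes the input so that
// $ can be reached at the real end of the string.
final withDotStar = RegExp(r'^(?=.*[0-9])(?=.*[a-z]).*$');
```

`withDollar.hasMatch` should be false for `''`, `'0a'` and `'a0'` alike. `withDotStar` lets .* consume the rest of the input before $, so it matches `'0a'` and `'a0'` but still rejects `'abc'`.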

Author: sgon00

Updated on December 07, 2022

Comments

  • sgon00
    sgon00 over 1 year

    First, I know x(?=y) matches 'x' only if 'x' is followed by 'y'.

    • But when I try r'^(?=.*[0-9])(?=.*[a-z])':

      • Why do both 0a and a0 match?
      • Why is the order not important at all?
      • For 0a, what does it match?
        • If it matches the empty string before 0, it should fail the second condition (?=.*[a-z]), because the empty string before 0 is followed by 0, not by [a-z].
        • If it matches 0 because 0 is followed by a, it should fail the first condition, because 0 is not followed by [0-9].
        • I don't know what's wrong with the way I think, and I'm not sure I've expressed myself clearly enough for you to understand what I mean.
    • And for r'^(?=.*[0-9])(?=.*[a-z])$': if the version above without $ works, why doesn't this one? I can't figure out what this matches; it seems not to match anything.

    Thanks a lot for your help.

    • The fourth bird
      The fourth bird over 5 years
      Your regex consists of assertions only. You test if the string contains a digit. Then you test if the string contains a lowercase character.
    • Pushpesh Kumar Rajwanshi
      Pushpesh Kumar Rajwanshi over 5 years
      There is no ordering in lookahead groups. The regex engine makes every effort to match the input; it doesn't work the other way round, trying to make the input fail. So the two lookaheads you specified act as condition 1 and condition 2, and only if both succeed does the regex make a successful match.
    • Pushpesh Kumar Rajwanshi
      Pushpesh Kumar Rajwanshi over 5 years
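      For your last point: lookaheads only try to match the pattern and don't actually consume any input, so after the lookaheads, $ is still checked at the start of the string. It only succeeds there for an empty string, and an empty string can't satisfy the lookaheads, so the regex fails to match.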
    • sgon00
      sgon00 over 5 years
      @PushpeshKumarRajwanshi I think there is ordering in lookahead groups. After reading the accepted answer, I actually understood. Both 0a and a0 actually match the empty string at the beginning, which you can confirm on regex101.com. The magic happens because both groups contain .*. For example, if the rule changes to '^(?=.*[0-9])(?=[a-z])', it won't match "0a", because without .* before [a-z] the second lookahead can't succeed at the start of the string.
    • Pushpesh Kumar Rajwanshi
      Pushpesh Kumar Rajwanshi over 5 years
      @sgon00: Ordering is of course there, because the regex engine evaluates one thing first and only then the next, but I meant that the order of your lookarounds won't affect the overall success or failure of the match. Depending on the input text, one order may perform better than the other, but since the input can be any string, the order does not matter.
    • sgon00
      sgon00 over 5 years
      @PushpeshKumarRajwanshi Sorry, I think you are right. The ordering does not matter. Even without .*, '^(?=.*[0-9])(?=[a-z])' and '^(?=[a-z])(?=.*[0-9])' are equivalent. Thanks a lot for the clarification.
    • Pushpesh Kumar Rajwanshi
      Pushpesh Kumar Rajwanshi over 5 years
      @sgon00: Yes the ordering doesn't change the regex overall :) Glad to give my little help :)
  • sgon00
    sgon00 over 5 years
    Thank you very much for the detailed reply and explanation, and for introducing the website; it's very useful. I finally understand that both cases match the empty string at the beginning, and the magic part is the .*, which makes the order of the lookaheads irrelevant. Thanks.
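The points raised in the comments can be checked together. Swapping the two lookaheads doesn't change which strings match. The pattern behaves like two separate "contains a digit" and "contains a lowercase letter" tests (for single-line input, since . does not match a newline). And dropping .* before [a-z] requires the first character to be a letter, regardless of order. A sketch, with hypothetical pattern names and illustrative sample strings:

```dart
// The question's pattern and its reversed order.
final digitThenLetter = RegExp(r'^(?=.*[0-9])(?=.*[a-z])');
final letterThenDigit = RegExp(r'^(?=.*[a-z])(?=.*[0-9])');

// The variants from the comments with no .* before [a-z]: the first
// character must be a lowercase letter.
final strictDigitFirst = RegExp(r'^(?=.*[0-9])(?=[a-z])');
final strictLetterFirst = RegExp(r'^(?=[a-z])(?=.*[0-9])');

/// Hypothetical helper: the "two separate tests" reading of the pattern.
bool hasDigitAndLetter(String s) =>
    RegExp(r'[0-9]').hasMatch(s) && RegExp(r'[a-z]').hasMatch(s);

/// Illustrative inputs (no newlines).
const samples = ['0a', 'a0', 'abc', '123', ''];
```

For every string in `samples`, `digitThenLetter`, `letterThenDigit` and `hasDigitAndLetter` should agree, and so should the two strict variants; the strict ones reject `'0a'` but accept `'a0'`.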