Regular Expression Lookbehind doesn't work with quantifiers ('+' or '*')
Solution 1
Many regular expression libraries only allow strict expressions in lookbehind assertions, for example:
- only strings of the same fixed length (three characters each):
(?<=foo|bar|\s,\s)
- only strings of different but fixed lengths (each branch has a fixed length):
(?<=foobar|\r\n)
- only strings with an upper-bound length (up to four repetitions):
(?<=\s{,4})
These limitations exist mainly because those libraries either can't process regular expressions backwards at all, or can only process a limited subset of them backwards.
Another reason could be to prevent authors from building overly complex regular expressions that are expensive to process because they exhibit so-called pathological behavior (see also ReDoS).
See also the section about the limitations of lookbehind assertions on Regular-Expressions.info.
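As one concrete data point, Python's built-in re module appears to fall into the strictest category: it accepts a lookbehind only when every alternative has the same fixed width. The sketch below (the helper name lookbehind_compiles and the trailing "x" in each pattern are just for illustration) tries to compile each pattern from the list above plus the lookbehind from the question; only the first one is expected to compile.

```python
import re


def lookbehind_compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's built-in re accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error as exc:
        # CPython rejects variable-width lookbehinds at compile time,
        # typically with "look-behind requires fixed-width pattern".
        print(f"rejected {pattern!r}: {exc}")
        return False
    return True


patterns = [
    r"(?<=foo|bar|\s,\s)x",           # every branch is exactly 3 characters
    r"(?<=foobar|\r\n)x",             # fixed, but different, branch lengths
    r"(?<=\s{,4})x",                  # bounded, variable length
    r"(?<=this\sis\san\s*?)example",  # the lookbehind from the question
]

results = {p: lookbehind_compiles(p) for p in patterns}
for p, ok in results.items():
    print(f"{p!r}: {'accepted' if ok else 'rejected'}")
```

Other engines (the question's comments mention .NET) are more permissive, so the same patterns can behave differently elsewhere.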
Solution 2
If you are not using Python, you can get around the lack of variable-length lookbehind by using \K, which tells the engine to discard what it has matched so far and start the reported match over.
This site explains it well: http://www.phpfreaks.com/blog/pcre-regex-spotlight-k
In short, when you match an expression and want only what comes after it, putting \K after that expression makes the engine start the match over, so the preceding part is left out of the result.
Example:
string = '<a this is a tag> with some information <div this is another tag > LOOK FOR ME </div>'
matching
/(\<a).+?(\<div).+?(\>)\K.+?(?=\<\/div)/
makes the regex restart right after it matches the > that closes the opening div tag, so nothing before that point is included in the result. The lookahead (?=\<\/div) then makes the engine take everything in front of the closing div tag, so the match is " LOOK FOR ME " (including the surrounding spaces).
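Python's built-in re module does not understand \K (it rejects it as a bad escape), so to try the same idea there, one option is to capture the part after the prefix in a group and read only that group. This is a substitute for \K, not the same feature: the overall match still contains the prefix. A minimal sketch with the string from the example:

```python
import re

html = ('<a this is a tag> with some information '
        '<div this is another tag > LOOK FOR ME </div>')

# Python's re has no \K: an unknown ASCII-letter escape is a compile error.
try:
    re.compile(r"foo\Kbar")
    k_supported = True
except re.error:
    k_supported = False

# Substitute for \K: match the same prefix, but capture the wanted text in
# group 1 instead of resetting the start of the match.
pattern = re.compile(r"<a.+?<div.+?>(.+?)(?=</div)")

m = pattern.search(html)
if m:
    print(repr(m.group(1)))             # ' LOOK FOR ME '
    print(m.group(0).startswith("<a"))  # True: the full match keeps the prefix
print("\\K supported:", k_supported)
```

If \K-style behaviour is needed in Python itself, a third-party engine would be required; the capture group above only works when you can choose which group to read.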
Solution 3
What Amber said is true, but you can work around it with another approach: a non-capturing group.
(?<=this\sis\san)(?:\s*)example
That makes the lookbehind fixed-length, so it should work.
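A quick way to see what this pattern actually matches is Python's re module, which accepts it because the lookbehind is now fixed-width. The sketch below suggests that the non-capturing group still consumes the whitespace, so the match is " example" rather than "example". If only the word is wanted, one option (an alternative, not the answer's method) is to match the prefix normally and read a capturing group; \s+ is used there so that "this is anexample" is not matched.

```python
import re

text = "this is an example"

# The answer's pattern: fixed-width lookbehind, then (?:\s*) consumes spaces.
m = re.search(r"(?<=this\sis\san)(?:\s*)example", text)
print(repr(m.group(0)))  # ' example' (the leading space is part of the match)

# Alternative: match the prefix normally and capture only the word.
m2 = re.search(r"this\sis\san\s+(example)", text)
print(repr(m2.group(1)))  # 'example'

# \s+ also refuses the run-together form.
print(re.search(r"this\sis\san\s+(example)", "this is anexample"))  # None
```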
Comments
-
Noel De Martin over 3 years
I am trying to use lookbehinds in a regular expression and it doesn't seem to work as I expected. This is not my real use case, but to simplify I will give an example. Imagine I want to match "example" in a string that says "this is an example". According to my understanding of lookbehinds, this should work:
(?<=this\sis\san\s*?)example
What this should do is find "this is an", then any whitespace characters, and finally match the word "example". However, it doesn't work and I don't understand why. Is it impossible to use '+' or '*' inside lookbehinds?
I also tried these two, and they work correctly but don't fulfill my needs:
(?<=this\sis\san\s)example
this\sis\san\s*?example
I am using this site to test my regular expressions: http://gskinner.com/RegExr/
-
Rich about 12 years
This needs a tag that identifies the language or environment where you use them. .NET's regular expressions handle this without a problem.
-
noob about 12 years
Note: if your regex worked the way you want, it would also match "example" in "this is anexample". If you don't want that, you should remove the ? from \s*?.
-
Rich about 12 years
micha: They should probably just change the * to a +. Removing the ? has no effect in that regard. But indeed, *? as a quantifier is useless and unnecessary in this case, as there isn't any more whitespace to match after that, so \s*? is equivalent to \s*.
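The two points above can be checked without any lookbehind, using Python's re as an example engine: \s* also accepts zero spaces, so "this is anexample" matches; \s+ rejects it; and \s*? ends up matching the same text as \s* because "example" has to follow. The helper matched_text is hypothetical, just for this sketch.

```python
import re

samples = ["this is an example", "this is anexample", "this is an   example"]


def matched_text(pattern, text):
    """Return the text matched by pattern in text, or None if there is no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# No lookbehind here, so Python's re accepts every quantifier.
for pattern in (r"this\sis\san\s*example",
                r"this\sis\san\s*?example",
                r"this\sis\san\s+example"):
    print(pattern, [matched_text(pattern, s) for s in samples])
```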
-
Rich about 12 years
It's only the lookbehind that's problematic. Lookahead can be anything in all regex engines that support it.
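For contrast, Python's re, which rejects the variable-width lookbehind from the question, does accept a quantifier inside a lookahead. A small sketch (only checked against Python's engine here): match "an" only when it is followed by any amount of whitespace and then "example"; the lookahead part is not included in the match.

```python
import re

# \s* inside a lookahead is accepted; the lookahead text is not part of the match.
pattern = re.compile(r"an(?=\s*example)")

for text in ("this is an example", "this is an    example", "this is an apple"):
    m = pattern.search(text)
    print(repr(text), "->", m.group(0) if m else None)
```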
-
noob about 12 years
It's the same as (?<=this\sis\san)\s*?example, which means it also matches the spaces. And for your information, (?:) makes the process slower.
-
Rich about 12 years
micha, I'd worry more about the matching part in that case than about performance. I get on average 0.02451781 ms with the non-capturing group and 0.02370844 ms without it. I don't think that's a significant difference.
-
Bohemian about 12 years
@micha No. It is not the same. It's a non-capturing group. My regex only matches "example" (without the leading spaces), but your example includes leading spaces.
-
akostadinov over 9 years
This works with Ruby 2.x but fails with 1.9 and JRuby 1.7.x. Original comment: good one, I'm surprised I never knew about this feature. Learn to format code in the editor and you'll be priceless.
-
Abraham Murciano Benzadon almost 7 years
This regex will also match any preceding spaces, e.g. this is an[ example] (square brackets represent the match). Just because it is in a non-capturing group doesn't mean it isn't matched; it just means it isn't captured in a group, as it would be with normal parentheses. The right way to do this would be to use \K, like @Leon said.
-
Josh Withee about 6 years
In my answer to this question, I have listed some strategies/workarounds after I ran into this limitation on negative lookbehinds. Hope it can help some others too!
-
alstr almost 4 years
This doesn't work. Leading spaces are included in the match. Just copy and paste it into regex101.com.