Regex exactly n OR m times

java php regex

116,667

Solution 1

There is no single quantifier that means "exactly m or n times". The way you are doing it is fine.

An alternative is:

X{m}(X{k})?

where m < n and k is the value of n-m.
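
One way to convince yourself that the rewrite accepts the same counts as the plain alternation is to test both against runs of increasing length. The following is a minimal sketch in Python, assuming X is the literal character `a`, m = 3 and n = 5; the helper names `alternation`, `optional_tail` and `accepted_lengths` are made up for illustration.

```python
import re

def alternation(x, m, n):
    # Literal form: exactly m or exactly n repetitions of x.
    return re.compile(f"(?:{x}){{{m}}}|(?:{x}){{{n}}}")

def optional_tail(x, m, n):
    # Rewritten form: m repetitions, then optionally k = n - m more.
    if not m < n:
        raise ValueError("expected m < n")
    k = n - m
    return re.compile(f"(?:{x}){{{m}}}(?:(?:{x}){{{k}}})?")

def accepted_lengths(pattern, limit=10):
    # Lengths i for which "a" * i is matched in full by the pattern.
    return [i for i in range(limit) if pattern.fullmatch("a" * i)]

print(accepted_lengths(alternation("a", 3, 5)))    # [3, 5]
print(accepted_lengths(optional_tail("a", 3, 5)))  # [3, 5]
```

Both forms accept only runs of length 3 or 5 here; the second one writes the shared prefix of m repetitions only once.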

Solution 2

Here is the complete list of quantifiers (ref. http://www.regular-expressions.info/reference.html):

?, ?? - 0 or 1 occurrences (?? is lazy, ? is greedy)
*, *? - any number of occurrences
+, +? - at least one occurrence
{n} - exactly n occurrences
{n,m} - n to m occurrences, inclusive
{n,m}? - n to m occurrences, lazy
{n,}, {n,}? - at least n occurrences

To get "exactly N or M", you need to write the quantified regex twice, unless m,n are special:

X{n,m} if m = n+1
(?:X{n}){1,2} if m = 2n
...
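
The special cases can be checked the same way. This is a rough sketch in Python, assuming X is the literal character `a` and n = 3; the helper `accepted_lengths` is a hypothetical name used only for this example.

```python
import re

def accepted_lengths(regex, limit=12):
    # Lengths i for which "a" * i is matched in full by the regex.
    pattern = re.compile(regex)
    return [i for i in range(limit) if pattern.fullmatch("a" * i)]

# m = n + 1: a plain range covers both counts (n = 3, m = 4).
print(accepted_lengths(r"a{3,4}"))         # [3, 4]

# m = 2n: repeat the block of n once or twice (n = 3, m = 6).
print(accepted_lengths(r"(?:a{3}){1,2}"))  # [3, 6]

# General case: the quantified expression is written twice (n = 3, m = 7).
print(accepted_lengths(r"a{3}|a{7}"))      # [3, 7]
```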

Solution 3

No, there is no such quantifier. But I'd restructure it to /X{n}(X{m-n})?/ (taking n < m) to prevent problems in backtracking.

Solution 4

TL;DR: (?<=[^x]|^)(x{n}|x{m})(?:[^x]|$)

It looks like you want "x n times" or "x m times". A literal translation to regex would be (x{n}|x{m}), like this: https://regex101.com/r/vH7yL5/1

Or, in a case where you can have a sequence of more than m "x"s (assuming m > n), you can add 'following no "x"' and 'followed by no "x"', which translates to [^x](x{n}|x{m})[^x]. But that assumes there is always a character before and after your "x"s, as you can see here: https://regex101.com/r/bB2vH2/1

You can change it to (?:[^x]|^)(x{n}|x{m})(?:[^x]|$), which translates to "following no 'x' or following line start" and "followed by no 'x' or followed by line end". But it still won't match two sequences with only one character between them (because the first match consumes the character after it, which the second match needs as the character before it), as you can see here: https://regex101.com/r/oC5oJ4/1

Finally, to match sequences that are only one character apart, you can use a positive lookahead (?=) for the "no 'x' after" part or a positive lookbehind (?<=) for the "no 'x' before" part, like this: https://regex101.com/r/mC4uX3/1

(?<=[^x]|^)(x{n}|x{m})(?:[^x]|$)

This way you will match only the exact number of 'x's you want.
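
The effect of the boundary checks can be sketched in Python, assuming x is the literal character `x`, n = 2 and m = 4. Note that Python's `re` module rejects the variable-width lookbehind `(?<=[^x]|^)`, so this sketch swaps in the negative lookarounds `(?<!x)` and `(?!x)`, which also express "not preceded by x" and "not followed by x" without consuming the neighbouring characters. This is a substitution made for illustration, not the pattern behind the regex101 links.

```python
import re

N, M = 2, 4  # assumed example counts

# Plain alternation: happily matches inside longer runs of "x".
naive = re.compile(rf"x{{{N}}}|x{{{M}}}")

# Alternation guarded by negative lookarounds (assumed stand-in for
# the lookbehind/lookahead version, which Python's re cannot compile).
bounded = re.compile(rf"(?<!x)(?:x{{{N}}}|x{{{M}}})(?!x)")

text = "xx a xxx b xxxx c xxxxx d xx-xx"

print(naive.findall(text))    # eight 'xx' pieces, cut out of every run
print(bounded.findall(text))  # ['xx', 'xxxx', 'xx', 'xx']
```

Because the lookarounds consume nothing, the two runs in `xx-xx` are both found even though only one character separates them.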

Solution 5

Very old post, but I'd like to contribute something that might be of help. I tried it exactly the way stated in the question and it does work, but there's a catch: the order of the alternatives matters. Consider this:

#[a-f0-9]{6}|#[a-f0-9]{3}

This will find all occurrences of hex colour codes (they're either 3 or 6 digits long). But when I flip it around like this

#[a-f0-9]{3}|#[a-f0-9]{6}

it will only find the 3-digit ones or the first 3 digits of the 6-digit ones. This does make sense, because a backtracking engine tries the alternatives from left to right and takes the first one that succeeds, and a regex pro might spot it right away, but for many this might be peculiar behaviour. There are some advanced regex features that might avoid this trap regardless of the order, but not everyone is knee-deep into regex patterns.
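
The ordering effect is easy to reproduce. Below is a small Python sketch using the two patterns from this answer on a made-up sample string. The third pattern, `guarded`, is an assumption of mine and not part of the answer: it shows one possible order-independent variant, using a negative lookahead so that a match cannot stop in the middle of a run of hex digits.

```python
import re

text = "#abc #a1b2c3"  # made-up sample input

long_first = re.compile(r"#[a-f0-9]{6}|#[a-f0-9]{3}")
short_first = re.compile(r"#[a-f0-9]{3}|#[a-f0-9]{6}")

# Hypothetical order-independent variant: the 3-digit alternative is
# rejected when another hex digit follows, so the engine falls back
# to the 6-digit alternative.
guarded = re.compile(r"#(?:[a-f0-9]{3}|[a-f0-9]{6})(?![a-f0-9])")

print(long_first.findall(text))   # ['#abc', '#a1b2c3']
print(short_first.findall(text))  # ['#abc', '#a1b']
print(guarded.findall(text))      # ['#abc', '#a1b2c3']
```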


Author by

FThompson

I am a technical writer with a passion for taking a holistic approach to software design and documentation in order to create effective, usable applications and libraries designed with all stakeholders' requirements in mind. I sometimes go by Vulcan online.

Updated on July 08, 2022

Comments

FThompson almost 2 years
Consider the following regular expression, where X is any regex.
```
X{n}|X{m}
```
This regex would test for X occurring exactly n or m times.

Is there a regex quantifier that can test for an occurrence X exactly n or m times?
- John Dvorak over 11 years
  
  No. Two occurrences of X is the best you can get for general m, n.
- nalply almost 4 years
  
  If this were my problem I would try out regex backreferences and would start with (X)\1{n-1}(?:\1{m-n-1}). I know this matches X at least once but just to get started try this simple thing then refine by using lookaheads or lookbehinds instead of (X).
erb about 9 years

Why is the ?: needed in the if m = 2n example? Seems to work fine without it for me.
John Dvorak about 9 years

@erb if you leave out ?:, the group becomes a capturing group. Aside from the regex engine remembering stuff it doesn't have to, if you have capturing groups after this one, their IDs will change. If you use your regex for substitution, you will have to adjust the replacement.
Enhardened about 5 years

Cool, I was not familiar with how regex handled boundaries. The only issue with this method is when you are using a non-standard boundary. Take a look: regex101.com/r/j0nkeo/1 and regex101.com/r/4Ix7Dr/1
rozza2058 about 5 years

@Enhardened - that's a good point, seems to be an issue with multiple matching groups which overlap. That is a situation where you'd need to use look behind.