regex for n characters or at least m characters

Solution 1

Optimize the beginning, and anchor it.

^[A-Za-z0-9_]{2}(?:|[A-Za-z0-9_]{2,})$

(Also, you did say to ignore the regex itself, but I guessed you probably wanted 0-9, not 0_9)

EDIT: Hm, I was sure I read that you want to match whole lines. Remove the anchors (^ and $) if you want to match inside a line as well. If you do match full lines only, anchors will speed you up (well, the front anchor ^ will, at least).
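As a quick sanity check of the anchored pattern, here is a minimal Python sketch. The helper name `is_valid` and the sample strings are made up for illustration, and the question does not say which regex engine is in use, so Python's `re` module is an assumption.

```python
import re

# Anchored pattern from this answer: exactly 2 allowed characters,
# optionally followed by 2 or more (so 2, or 4+, but never 3).
PATTERN = re.compile(r"^[A-Za-z0-9_]{2}(?:|[A-Za-z0-9_]{2,})$")


def is_valid(line: str) -> bool:
    """Return True if the whole line is 2, or at least 4, allowed characters."""
    return PATTERN.match(line) is not None


# Hypothetical sample inputs covering lengths 1 to 5 and one bad character.
samples = ["a", "ab", "abc", "abcd", "abcde", "ab-d"]
results = {s: is_valid(s) for s in samples}
print(results)
```

Note that in Python `$` also matches just before a trailing newline, so strip line endings first if that matters.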

Solution 2

Your solution looks pretty good. As an alternative, you can try something like this:

[A-Za-z0-9_]{2}(?:[A-Za-z0-9_]{2,})?

Btw, I think you want hyphen instead of underscore between 0 and 9, don't you?
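Because this pattern has no anchors, how it behaves depends on how it is applied. The sketch below, which assumes Python's `re` module (the question does not name an engine) and uses made-up names, shows the difference between requiring the whole string to match and searching inside it.

```python
import re

# Pattern from this answer: 2 allowed characters, then optionally 2 or more.
OPTIONAL = re.compile(r"[A-Za-z0-9_]{2}(?:[A-Za-z0-9_]{2,})?")


def matches_whole(s: str) -> bool:
    """Return True only if the entire string is 2, or at least 4, characters."""
    return OPTIONAL.fullmatch(s) is not None


# Whole-string matching rejects a 3-character input.
whole = [matches_whole(s) for s in ("ab", "abc", "abcd")]

# Searching inside a 3-character input still finds its first 2 characters.
found_in_three = OPTIONAL.search("abc").group(0)

print(whole, found_in_three)
```

If 3-character runs must be rejected when searching inside longer text, some form of boundary (anchors or `\b`, for example) would be needed around the pattern.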

Solution 3

The solution you present is correct.

If you're trying to optimize the routine, and the number of strings of 2 or more characters is much smaller than the number that are not, consider accepting all strings of length 2 or greater, then discarding those of length 3. This may boost performance because the regex is checked only once, and the second check need not even be a regular expression; checking a string length is usually an extremely fast operation.

As always, you really need to run tests on real-world data to verify if this would give you a speed increase.
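One way to try this idea is sketched below in Python, which is an assumption since the question does not name a language. The function names, the sample data, and the use of `0-9` instead of `0_9` are all illustrative. The timings it prints depend entirely on the machine and the data, so treat them as a starting point for your own measurements rather than as evidence.

```python
import re
import timeit

# Alternation in the style of the question (with 0-9 instead of 0_9).
ALTERNATION = re.compile(r"[A-Za-z0-9_]{2}|[A-Za-z0-9_]{4,}")
# Single looser pattern: any run of 2 or more allowed characters.
TWO_OR_MORE = re.compile(r"[A-Za-z0-9_]{2,}")


def by_regex(s: str) -> bool:
    """Check the 2-or-4+ rule with the alternation alone."""
    return ALTERNATION.fullmatch(s) is not None


def by_length(s: str) -> bool:
    """Accept 2 or more characters, then discard strings of length 3."""
    return TWO_OR_MORE.fullmatch(s) is not None and len(s) != 3


# Hypothetical test data; replace with real-world lines before drawing conclusions.
samples = ["a", "ab", "abc", "abcd", "hello_world", "no spaces", ""] * 50

for check in (by_regex, by_length):
    seconds = timeit.timeit(lambda: [check(s) for s in samples], number=200)
    print(f"{check.__name__}: {seconds:.4f}s")
```

Both functions should agree on every input; only their speed may differ.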

Author by

dnclem

Updated on February 25, 2020

Comments

  • dnclem
    dnclem about 4 years

    This should be a pretty simple regex question, but I couldn't find any answers anywhere. How would one make a regex that matches either ONLY 2 characters or at least 4 characters? Here is my current method of doing it (ignore the regex itself, that's beside the point):

    [A-Za-z0_9_]{2}|[A-Za-z0_9_]{4,}
    

    However, this method takes twice the time (and is approximately 0.3s slower for me on a 400-line file), so I was wondering if there is a better way to do it?