Regex to match two or more consecutive characters


You can use a lookahead and a backreference to solve this. But note that right now you are requiring at least 2 characters: the starting letter and another one (due to the +). You probably want to change that + to a * so that the second character class can be repeated 0 or more times:

^(?!.*(.)\1)[a-zA-Z][a-zA-Z\d._-]*$
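To see the effect of changing + to * in isolation, here is a minimal Python sketch. It leaves the lookahead out and compares only the two quantifiers; the names `plus_pattern`, `star_pattern`, and `accepts` are made up for this illustration.

```python
import re

# The asker's original pattern: a letter followed by ONE or more allowed characters.
plus_pattern = re.compile(r"^[a-zA-Z][a-zA-Z\d._-]+$")
# The relaxed pattern: a letter followed by ZERO or more allowed characters.
star_pattern = re.compile(r"^[a-zA-Z][a-zA-Z\d._-]*$")

def accepts(pattern, word):
    """Return True if the pattern matches the word."""
    return pattern.match(word) is not None

print(accepts(plus_pattern, "a"))   # False: + demands a second character
print(accepts(star_pattern, "a"))   # True: a single letter is enough
print(accepts(plus_pattern, "ab"))  # True
print(accepts(star_pattern, "ab"))  # True
```

The two patterns only disagree on one-letter words, so which one is right depends on whether a single letter should count as valid.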

How does the lookahead work? Firstly, it's a negative lookahead: if the pattern inside finds a match, the lookahead causes the entire pattern to fail, and vice versa. So we put a pattern inside that matches if we do have two consecutive identical characters. First, we skip to an arbitrary position in the string (.*), then we match a single (arbitrary) character (.) and capture it with the parentheses. Hence, that one character goes into capturing group 1. Then we require this character to be followed by itself (referencing it with \1). So the inner pattern will check at every single position in the string (due to backtracking) whether there is a character that is followed by itself. If two such consecutive characters are found, the pattern fails. If they cannot be found, the engine jumps back to where the lookahead started (the beginning of the string) and continues with matching the actual pattern.

Note that (.) captures any character, so a doubled letter or digit (as in hello) is rejected too. If only repeated periods, hyphens, or underscores should be forbidden, use ([._-]) in place of (.).
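The lookahead behaviour can be checked against the examples from the question with a short Python sketch. `any_repeat` is the pattern from this answer. `punct_repeat` is a variation that is not part of the original answer: it assumes only repeated periods, hyphens, and underscores should be rejected. The variable names and the extra sample `hello` are made up for this illustration.

```python
import re

# Pattern from the answer: rejects ANY character that is immediately repeated.
any_repeat = re.compile(r"^(?!.*(.)\1)[a-zA-Z][a-zA-Z\d._-]*$")
# Assumed variation: rejects only a repeated '.', '_' or '-'.
punct_repeat = re.compile(r"^(?!.*([._-])\1)[a-zA-Z][a-zA-Z\d._-]*$")

samples = [
    "flin..stones", "flin__stones", "flin--stones",              # should be rejected
    "fl_i_stones", "fli_st.ones", "flin.stones", "flinstones",   # should be accepted
    "hello",                                                     # doubled letter: the patterns disagree
]

for word in samples:
    print(f"{word:14} any_repeat={bool(any_repeat.match(word))!s:5} "
          f"punct_repeat={bool(punct_repeat.match(word))}")
```

Both patterns agree on every example from the question. They differ only on words such as `hello`, where a letter is doubled.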

Alternatively you can split this up into two separate checks. One for valid characters and the starting letter:

^[a-zA-Z][a-zA-Z\d._-]*$

And one for the consecutive characters (where you can invert the match result):

(.)\1

This would greatly increase the readability of your code (because it's less obscure than that lookahead), and it would also allow you to detect the actual problem in the input and return an appropriate and helpful error message.
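A minimal Python sketch of the two-check approach could look like the following. The function name `validate`, the constant names, and the wording of the error messages are assumptions made for illustration. Only the two regular expressions come from the answer.

```python
import re

# Check 1: valid characters and a starting letter.
VALID_SHAPE = re.compile(r"^[a-zA-Z][a-zA-Z\d._-]*$")
# Check 2: any character immediately followed by itself.
REPEATED_CHAR = re.compile(r"(.)\1")

def validate(word):
    """Return None if the word is acceptable, otherwise an error message."""
    # fullmatch also rejects a trailing newline, which `$` alone would tolerate.
    if not VALID_SHAPE.fullmatch(word):
        return "must start with a letter and contain only letters, digits, '.', '_' or '-'"
    repeat = REPEATED_CHAR.search(word)
    if repeat:
        # The match object tells us which character was repeated and where.
        return f"character {repeat.group(1)!r} is repeated at position {repeat.start()}"
    return None

print(validate("flin.stones"))   # None
print(validate("flin..stones"))  # character '.' is repeated at position 4
print(validate("1flinstones"))   # must start with a letter and contain only ...
```

Because the second check is a plain search, its match object says exactly which character was repeated and where. The single-pattern version with the lookahead cannot report that.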

Author: Anvesh Raavi

Updated on June 13, 2022

Comments

  • Anvesh Raavi
    Anvesh Raavi almost 2 years

    Using regular expressions, I want to match a word which

    • starts with a letter
    • contains English letters
    • may contain numbers, periods (.), hyphens (-), and underscores (_)
    • should not have two or more consecutive periods, hyphens, or underscores
    • can have multiple periods, hyphens, or underscores

    For example,

    flin..stones or flin__stones or flin--stones

    are not allowed.

    fl_i_stones or fli_st.ones or flin.stones or flinstones

    are allowed.

    So far my regular expression is ^[a-zA-Z][a-zA-Z\d._-]+$

    So my question is how to do this using a regular expression.