Java PatternSyntaxException: Illegal repetition on string substitution?

68,625

Solution 1

In ${fizz}

{ is an indicator to the regex engine that you are about to start a repetition indicator, like {2,4} which means '2 to 4 times of the previous token'. But {f is illegal, because it has to be followed by a number, so it throws an exception.

You need to escape all regex metacharacters (in this case $, { and }) (try using http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#quote(java.lang.String) ) or use a different method that substitutes a string for a string, not a regex for a string.

Solution 2

As pointed out by Patashu, the problem is in replaceFirst(token, replacementValue), that expects a regex in the first argument, not a literal. Change it to replaceFirst(Pattern.quote(token), replacementValue) and you will do alright.

I also changed a bit the first regex, as it goes faster with + instead of * but that's not necessary.

static String substituteAllTokens(Map<String,String> tokensMap, String toInspect) {
    String regex = "\\$\\{([^}]+)\\}";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(toInspect);
    String result = toInspect;
    while(matcher.find()) {
        String token = matcher.group();     // Ex: ${fizz}
        String tokenKey = matcher.group(1); // Ex: fizz
        String replacementValue = null;

        if(tokensMap.containsKey(tokenKey))
            replacementValue = tokensMap.get(tokenKey);
        else
            throw new RuntimeException("String contained an unsupported token.");

        result = result.replaceFirst(Pattern.quote(token), replacementValue);
    }

    return result;
}

Solution 3

Adapted from Matcher.replaceAll

boolean result = matcher.find();
if (result) {
    StringBuffer sb = new StringBuffer();
    do {
        String tokenKey = matcher.group(1); // Ex: fizz
        String replacement = Matcher.quoteReplacement(tokensMap.get(tokenKey));
        matcher.appendReplacement(sb, replacement);
        result = matcher.find();
    } while (result);
    matcher.appendTail(sb);
    return sb.toString();
}

Solution 4

You can make your RegEx a bit ugly, but this will work

String regex = "\\$[\\{]([^}]*)[\\}]";
Share:
68,625
Admin
Author by

Admin

Updated on May 25, 2020

Comments

  • Admin
    Admin almost 4 years

    I am trying to write a method that will accept a String, inspect it for instances of certain tokens (e.g. ${fizz}, ${buzz}, ${foo}, etc.) and replace each token with a new string that is fetched from a Map<String,String>.

    For example, if I pass this method the following string:

    "How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}."

    And if the method consulted the following Map<String,String>:

    Key             Value
    ==========================
    "fizz"          "brown"
    "buzz"          "arsonist"
    "foo"           "feet"
    

    Then the resultant string would be:

    "How now brown cow. The arsonist had oddly-shaped feet."

    Here is my method:

    String substituteAllTokens(Map<String,String> tokensMap, String toInspect) {
        String regex = "\\$\\{([^}]*)\\}";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(toInspect);
        while(matcher.find()) {
            String token = matcher.group();     // Ex: ${fizz}
            String tokenKey = matcher.group(1); // Ex: fizz
            String replacementValue = null;
    
            if(tokensMap.containsKey(tokenKey))
                replacementValue = tokensMap.get(tokenKey);
            else
                throw new RuntimeException("String contained an unsupported token.");
    
            toInspect = toInspect.replaceFirst(token, replacementValue);
        }
    
        return toInspect;
    }
    

    When I run this, I get the following exception:

    Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal repetition near index 0
    ${fizz}
    ^
        at java.util.regex.Pattern.error(Pattern.java:1730)
        at java.util.regex.Pattern.closure(Pattern.java:2792)
        at java.util.regex.Pattern.sequence(Pattern.java:1906)
        at java.util.regex.Pattern.expr(Pattern.java:1769)
        at java.util.regex.Pattern.compile(Pattern.java:1477)
        at java.util.regex.Pattern.<init>(Pattern.java:1150)
        at java.util.regex.Pattern.compile(Pattern.java:840)
        at java.lang.String.replaceFirst(String.java:2158)
        ...rest of stack trace omitted for brevity (but available upon request!)
    

    Why am I getting this? And what is the correct fix? Thanks in advance!

  • Brian
    Brian almost 11 years
    The { is escaped though: "\\$\\{([^}]*)\\}"
  • Patashu
    Patashu almost 11 years
    @Brian No it's not. Stack trace: at java.lang.String.replaceFirst(String.java:2158) refers to this line: toInspect = toInspect.replaceFirst(token, replacementValue); And token's value is ${fizz}, with no escaping.
  • Brian
    Brian almost 11 years
    Ah, yes! Of course. Read too fast, I guess. +1 Maybe you could elaborate more on what exactly is being passed in and why it's failing.
  • Admin
    Admin almost 11 years
    Exactly - I'm still struggling to put your answer into motion, but thanks @Patashu and +1!
  • Brian
    Brian almost 11 years
    @TicketMonster replaceFirst expects a regular expression in the first argument, just like the Pattern.compile method. However, you're passing it ${fizz} without any escaping. Use Pattern.quote(token) before passing token in.
  • Anirudha
    Anirudha almost 11 years
    that still doesn't answer the question..after replacing string matcher would point to invalid position!
  • Anirudha
    Anirudha almost 11 years
    @TicketMonster your code would not replace properly even after escaping it..matcher would point to an invalid position..for example try replacing this ${fizz}${buzz} where fizz is mapped to an empty string
  • Brian
    Brian almost 11 years
    @Anirudh Again, strings are immutable. Changing the toInspect reference does nothing to the string that the Matcher is using (the original toInspect reference).
  • Anirudha
    Anirudha almost 11 years
    ahh..ok..in this case it would work since op is performing linear search for ${} but still..changing string to which matcher refers to is dangerous
  • Brian
    Brian almost 11 years
    @Anirudh Looks like this solution works to me: ideone.com/AocIh6 Your solution enters an infinite loop by the way.
  • Patashu
    Patashu almost 11 years
    @Anirudh You can't change the string to which matcher refers to because strings are immutable.
  • Brian
    Brian almost 11 years
    @Anirudh Read this please.
  • Anirudha
    Anirudha almost 11 years
    @Brian it should be matcher.reset(toInspect); and i never said its mutable i said matcher would point to invalid position.op's code works becuase he's replacing through string not matcher,..(missed it).see this and remove reset line.it won't run
  • Brian
    Brian almost 11 years
    If you want to get really picky about speed, use [^}]++ to make it possessive instead of just greedy so it will never backtrack.
  • Anirudha
    Anirudha almost 11 years
    @Patashu replace it through matcher not toInspect i.e matcher.replaceFirst
  • Patashu
    Patashu almost 11 years
    @Anirudh "Invoking this method changes this matcher's state. If the matcher is to be used in further matching operations then it should first be reset." This rule only applies if you use replaceFirst on the matcher itself. The OP did NOT, so why even bring it up?
  • Anirudha
    Anirudha almost 11 years
    @Patashu i had misinterpreted the question to which i came know minutes ago..:P
  • Miguel
    Miguel almost 11 years
    The speed results are surprising: using my method, the original regex ([^}]*) took 37.04 ms to run 10000 times, my regex ([^}]+) took 37.35 ms and suggested regex ([^}]++) 36.98 ms. Almost no difference.
  • Miguel
    Miguel almost 11 years
    Adding here my test results using this method: the original regex ([^}]*) took 15.14 ms to run 10000 times, my regex ([^}]+) took 14.88 ms and regex suggested by @Brian ([^}]++) 15.89 ms. As I said before this method is a absolute winner in terms of speed and here my regex fits better :)
  • johnchen902
    johnchen902 almost 11 years
    @Miguel Try to use a very long toInspect as input, such as "${fizz}${buzz}${foo}${fizz}${buzz}${foo}${fizz}${buzz}${foo‌​}...${fizz}${buzz}${‌​foo}${fizz}${buzz}${‌​foo}${fizz}${buzz}${‌​foo}${fizz}${buzz}${‌​foo}" (I bet my StringBuffer will be faster than your String, when the length grow, it won't be only 3x faster.)
  • Miguel
    Miguel almost 11 years
    Using "How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}.How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}.${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}‌​${buzz}${fizz}${foo}‌​${buzz}${buzz}${foo}‌​${fizz}${foo}${fizz}‌​${buzz}${fizz}${foo}‌​${buzz}${buzz}${foo}‌​${fizz}${foo}${fizz}‌​${buzz}" as the input this method performed even better then mine, it was 5x faster. The results were different for regex. The original regex ([^}]*) took 92.94 ms to run 10000 times, my regex ([^}]+) took 93.6 ms and regex suggested by @Brian ([^}]++) 91.83 ms. Brian's did better. Why is that so?
  • Miguel
    Miguel almost 11 years
    For a longer input "How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}.How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}.${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}‌​${buzz}${fizz}${foo}‌​${buzz}${buzz}${foo}‌​${fizz}${foo}${fizz}‌​${buzz}${fizz}${foo}‌​${buzz}${buzz}${foo}‌​${fizz}${foo}${fizz}‌​${buzz}" the results were different for regex. The original regex ([^}]*) took 515.47 ms to run 10000 times, my regex ([^}]+) took 514.41 ms and regex suggested by @Brian ([^}]++) 507.41 ms. Still close, but Brian's did better.
  • johnchen902
    johnchen902 almost 11 years
    @Miguel I only know about 3x->5x: 1. replaceFirst use a regex again, which is unnecessary. 2. In a loop, StringBuffer concats String faster than directly in String, since String generate a new String everytime. Which regex to use is not important.
  • Miguel
    Miguel almost 11 years
    Strings are mutable if you allow reflection. algs4.cs.princeton.edu/12oop/MutableString.java.html But this has nothing to do with the original question anyway.
  • Brian
    Brian almost 11 years
    Did you use an actual benchmarking tool to check like Caliper? Also, you should make sure that you test with partially matched strings as well, such as The ${fizz cow with no terminating }. Also, to be fair, I said really picky :3
  • Miguel
    Miguel almost 11 years
    No, I wrote the benchmarking myself. Did a warm-up loop in the beginning to activate hotspot and called System.gc() several times outside measurement. But, yes, I know it is not bulletproof. I didn't know about Caliper, though. Good to know it!
  • Vasu
    Vasu about 3 years
    i know this is very old thread, but now I am facing same issue, can some one please advise me on concreate solution for this, Thank you!