Java PatternSyntaxException: Illegal repetition on string substitution?
Solution 1
In ${fizz}
{
is an indicator to the regex engine that you are about to start a repetition indicator, like {2,4}
which means '2 to 4 times of the previous token'. But {f
is illegal, because it has to be followed by a number, so it throws an exception.
You need to escape all regex metacharacters (in this case $
, {
and }
) (try using http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#quote(java.lang.String) ) or use a different method that substitutes a string for a string, not a regex for a string.
Solution 2
As pointed out by Patashu, the problem is in replaceFirst(token, replacementValue)
, that expects a regex in the first argument, not a literal. Change it to replaceFirst(Pattern.quote(token), replacementValue)
and you will do alright.
I also changed a bit the first regex, as it goes faster with +
instead of *
but that's not necessary.
static String substituteAllTokens(Map<String,String> tokensMap, String toInspect) {
String regex = "\\$\\{([^}]+)\\}";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(toInspect);
String result = toInspect;
while(matcher.find()) {
String token = matcher.group(); // Ex: ${fizz}
String tokenKey = matcher.group(1); // Ex: fizz
String replacementValue = null;
if(tokensMap.containsKey(tokenKey))
replacementValue = tokensMap.get(tokenKey);
else
throw new RuntimeException("String contained an unsupported token.");
result = result.replaceFirst(Pattern.quote(token), replacementValue);
}
return result;
}
Solution 3
Adapted from Matcher.replaceAll
boolean result = matcher.find();
if (result) {
StringBuffer sb = new StringBuffer();
do {
String tokenKey = matcher.group(1); // Ex: fizz
String replacement = Matcher.quoteReplacement(tokensMap.get(tokenKey));
matcher.appendReplacement(sb, replacement);
result = matcher.find();
} while (result);
matcher.appendTail(sb);
return sb.toString();
}
Solution 4
You can make your RegEx a bit ugly, but this will work
String regex = "\\$[\\{]([^}]*)[\\}]";
Admin
Updated on May 25, 2020Comments
-
Admin almost 4 years
I am trying to write a method that will accept a
String
, inspect it for instances of certain tokens (e.g.${fizz}
,${buzz}
,${foo}
, etc.) and replace each token with a new string that is fetched from aMap<String,String>
.For example, if I pass this method the following string:
"How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}."
And if the method consulted the following
Map<String,String>
:Key Value ========================== "fizz" "brown" "buzz" "arsonist" "foo" "feet"
Then the resultant string would be:
"How now brown cow. The arsonist had oddly-shaped feet."
Here is my method:
String substituteAllTokens(Map<String,String> tokensMap, String toInspect) { String regex = "\\$\\{([^}]*)\\}"; Pattern pattern = Pattern.compile(regex); Matcher matcher = pattern.matcher(toInspect); while(matcher.find()) { String token = matcher.group(); // Ex: ${fizz} String tokenKey = matcher.group(1); // Ex: fizz String replacementValue = null; if(tokensMap.containsKey(tokenKey)) replacementValue = tokensMap.get(tokenKey); else throw new RuntimeException("String contained an unsupported token."); toInspect = toInspect.replaceFirst(token, replacementValue); } return toInspect; }
When I run this, I get the following exception:
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal repetition near index 0 ${fizz} ^ at java.util.regex.Pattern.error(Pattern.java:1730) at java.util.regex.Pattern.closure(Pattern.java:2792) at java.util.regex.Pattern.sequence(Pattern.java:1906) at java.util.regex.Pattern.expr(Pattern.java:1769) at java.util.regex.Pattern.compile(Pattern.java:1477) at java.util.regex.Pattern.<init>(Pattern.java:1150) at java.util.regex.Pattern.compile(Pattern.java:840) at java.lang.String.replaceFirst(String.java:2158) ...rest of stack trace omitted for brevity (but available upon request!)
Why am I getting this? And what is the correct fix? Thanks in advance!
-
Brian almost 11 yearsThe
{
is escaped though:"\\$\\{([^}]*)\\}"
-
Patashu almost 11 years@Brian No it's not. Stack trace:
at java.lang.String.replaceFirst(String.java:2158)
refers to this line:toInspect = toInspect.replaceFirst(token, replacementValue);
Andtoken
's value is${fizz}
, with no escaping. -
Brian almost 11 yearsAh, yes! Of course. Read too fast, I guess. +1 Maybe you could elaborate more on what exactly is being passed in and why it's failing.
-
Admin almost 11 yearsExactly - I'm still struggling to put your answer into motion, but thanks @Patashu and +1!
-
Brian almost 11 years@TicketMonster
replaceFirst
expects a regular expression in the first argument, just like thePattern.compile
method. However, you're passing it${fizz}
without any escaping. UsePattern.quote(token)
before passingtoken
in. -
Anirudha almost 11 yearsthat still doesn't answer the question..after replacing string matcher would point to invalid position!
-
Anirudha almost 11 years@TicketMonster your code would not replace properly even after escaping it..matcher would point to an invalid position..for example try replacing this
${fizz}${buzz}
where fizz is mapped to an empty string -
Brian almost 11 years@Anirudh Again, strings are immutable. Changing the
toInspect
reference does nothing to the string that theMatcher
is using (the originaltoInspect
reference). -
Anirudha almost 11 yearsahh..ok..in this case it would work since op is performing linear search for
${}
but still..changing string to which matcher refers to is dangerous -
Brian almost 11 years@Anirudh Looks like this solution works to me: ideone.com/AocIh6 Your solution enters an infinite loop by the way.
-
Patashu almost 11 years@Anirudh You can't change the string to which matcher refers to because strings are immutable.
-
Brian almost 11 years@Anirudh Read this please.
-
Anirudha almost 11 years@Brian it should be
matcher.reset(toInspect);
and i never said itsmutable
i said matcher would point to invalid position.op's codeworks
becuase he's replacing through string not matcher,..(missed it).see this and remove reset line.it won't run -
Brian almost 11 yearsIf you want to get really picky about speed, use
[^}]++
to make it possessive instead of just greedy so it will never backtrack. -
Anirudha almost 11 years@Patashu replace it through
matcher
nottoInspect
i.ematcher.replaceFirst
-
Patashu almost 11 years@Anirudh "Invoking this method changes this matcher's state. If the matcher is to be used in further matching operations then it should first be reset." This rule only applies if you use replaceFirst on the matcher itself. The OP did NOT, so why even bring it up?
-
Anirudha almost 11 years@Patashu i had misinterpreted the question to which i came know minutes ago..:P
-
Miguel almost 11 yearsThe speed results are surprising: using my method, the original regex (
[^}]*
) took 37.04 ms to run 10000 times, my regex ([^}]+
) took 37.35 ms and suggested regex ([^}]++
) 36.98 ms. Almost no difference. -
Miguel almost 11 yearsAdding here my test results using this method: the original regex (
[^}]*
) took 15.14 ms to run 10000 times, my regex ([^}]+
) took 14.88 ms and regex suggested by @Brian ([^}]++
) 15.89 ms. As I said before this method is a absolute winner in terms of speed and here my regex fits better :) -
johnchen902 almost 11 years@Miguel Try to use a very long
toInspect
as input, such as"${fizz}${buzz}${foo}${fizz}${buzz}${foo}${fizz}${buzz}${foo}...${fizz}${buzz}${foo}${fizz}${buzz}${foo}${fizz}${buzz}${foo}${fizz}${buzz}${foo}"
(I bet myStringBuffer
will be faster than yourString
, when the length grow, it won't be only 3x faster.) -
Miguel almost 11 yearsUsing
"How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}.How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}.${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}"
as the input this method performed even better then mine, it was 5x faster. The results were different for regex. The original regex ([^}]*
) took 92.94 ms to run 10000 times, my regex ([^}]+
) took 93.6 ms and regex suggested by @Brian ([^}]++
) 91.83 ms. Brian's did better. Why is that so? -
Miguel almost 11 yearsFor a longer input
"How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}.How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}.${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}"
the results were different for regex. The original regex ([^}]*
) took 515.47 ms to run 10000 times, my regex ([^}]+
) took 514.41 ms and regex suggested by @Brian ([^}]++
) 507.41 ms. Still close, but Brian's did better. -
johnchen902 almost 11 years@Miguel I only know about 3x->5x: 1.
replaceFirst
use a regex again, which is unnecessary. 2. In a loop,StringBuffer
concatsString
faster than directly inString
, sinceString
generate a newString
everytime. Which regex to use is not important. -
Miguel almost 11 yearsStrings are mutable if you allow reflection. algs4.cs.princeton.edu/12oop/MutableString.java.html But this has nothing to do with the original question anyway.
-
Brian almost 11 yearsDid you use an actual benchmarking tool to check like Caliper? Also, you should make sure that you test with partially matched strings as well, such as
The ${fizz cow
with no terminating}
. Also, to be fair, I said really picky :3 -
Miguel almost 11 yearsNo, I wrote the benchmarking myself. Did a warm-up loop in the beginning to activate hotspot and called
System.gc()
several times outside measurement. But, yes, I know it is not bulletproof. I didn't know about Caliper, though. Good to know it! -
Vasu about 3 yearsi know this is very old thread, but now I am facing same issue, can some one please advise me on concreate solution for this, Thank you!