List of all special characters that need to be escaped in a regex

375,890

Solution 1

You can look at the javadoc of the Pattern class: http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

You need to escape any character listed there if you want it treated as a literal character rather than with its special meaning.

As a possibly simpler solution, you can put the template between \Q and \E; everything between them is treated as escaped.
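
As a rough illustration of the \Q...\E approach, here is a minimal sketch. The class name, the helper quoteBlock, and the sample template are made up for this example, and it assumes the template itself contains no "\E".

```java
import java.util.regex.Pattern;

public class QuoteBlockDemo {
    // Hypothetical helper: wraps a literal in \Q...\E.
    // Assumption: the literal does not itself contain "\E".
    static String quoteBlock(String literal) {
        return "\\Q" + literal + "\\E";
    }

    public static void main(String[] args) {
        String template = "Total: $5.00 (approx.)";
        Pattern p = Pattern.compile(quoteBlock(template));

        // The $, ., ( and ) are all taken literally inside the quoted block.
        System.out.println(p.matcher("Total: $5.00 (approx.)").matches()); // true
        // The dot is literal, so it no longer matches an arbitrary character.
        System.out.println(p.matcher("Total: $5x00 (approx.)").matches()); // false
    }
}
```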

Solution 2

  • Java characters that have to be escaped in regular expressions are:
    \.[]{}()<>*+-=!?^$|
  • Two of the closing brackets (] and }) only need to be escaped after an opening bracket of the same type.
  • Inside []-brackets, some characters (like + and -) sometimes work without escaping.
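
The context-dependent cases in the last two bullets can be checked with a small sketch like the one below. The class and helper names are invented for illustration, and the behavior shown is that of java.util.regex in a standard JDK; other regex engines may differ.

```java
import java.util.regex.Pattern;

public class BracketContextDemo {
    // Thin wrapper so the checks below read compactly.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // A closing ] or } with no opening bracket before it is read as a literal.
        System.out.println(matches("a]", "a]"));      // true
        System.out.println(matches("a}", "a}"));      // true

        // + needs no escape inside a character class.
        System.out.println(matches("[+]", "+"));      // true

        // - is literal at the start of a class...
        System.out.println(matches("[-)]", "-"));     // true
        // ...but between two characters it defines a range, here ( through ).
        System.out.println(matches("[(-)]", "-"));    // false
        // Escaping it restores the literal meaning.
        System.out.println(matches("[(\\-)]", "-"));  // true
    }
}
```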

Solution 3

To escape a string you can simply use this, available since Java 1.5:

Pattern.quote("$test");

You will match exactly the word $test.
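
A minimal sketch of how this might be used in practice; the class name, the helper containsLiteral, and the sample input are assumptions made for the example.

```java
import java.util.regex.Pattern;

public class PatternQuoteDemo {
    // Hypothetical helper: true if `word` occurs literally anywhere in `input`.
    static boolean containsLiteral(String input, String word) {
        return Pattern.compile(Pattern.quote(word)).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "price is $test today";

        // Quoted: $ is an ordinary character.
        System.out.println(containsLiteral(input, "$test")); // true

        // Unquoted: $ anchors at the end of input, so "test" can never follow it.
        System.out.println(Pattern.compile("$test").matcher(input).find()); // false
    }
}
```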

Solution 4

According to the String Literals / Metacharacters documentation page, they are:

<([{\^-=$!|]})?*+.>

Also, it would be nice to have that list referenced somewhere in code, but I don't know where that could be...

Solution 5

Combining what everyone said, I propose the following, to keep the list of characters special to RegExp clearly listed in their own String, and to avoid having to try to visually parse thousands of "\\"'s. This seems to work pretty well for me:

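import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Characters treated as special by java.util.regex, listed once in plain form.
final String regExSpecialChars = "<([{\\^-=$!|]})?*+.>";
// Prefix every character with a backslash so the list is safe inside a character class.
final String regExSpecialCharsRE = regExSpecialChars.replaceAll(".", "\\\\$0");
final Pattern reCharsREP = Pattern.compile("[" + regExSpecialCharsRE + "]");

// Returns s with every special character backslash-escaped.
String quoteRegExSpecialChars(String s)
{
    Matcher m = reCharsREP.matcher(s);
    return m.replaceAll("\\\\$0");
}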
Author by

Avinash Nair

Updated on August 24, 2020

Comments

  • Avinash Nair about 2 years

    I am trying to create an application that matches a message template with a message that a user is trying to send. I am using Java regex for matching the message. The template/message may contain special characters.

    How would I get the complete list of special characters that need to be escaped in order for my regex to work and match in the maximum possible cases?

    Is there a universal solution for escaping all special characters in Java regex?

  • mkdev
    mkdev about 9 years
    If you find \Q and \E hard to remember, you can use Pattern.quote("...") instead.
  • Aleksandr Dubinsky
    Aleksandr Dubinsky over 8 years
    I wish you'd actually stated them
  • Sorin over 8 years
    Why, @AleksandrDubinsky ?
  • Aleksandr Dubinsky
    Aleksandr Dubinsky over 8 years
    @Sorin Because it is the spirit (nay, policy?) of Stack Exchange to state the answer in your answer rather than just linking to an off-site resource. Besides, that page doesn't have a clear list either. A list can be found here: docs.oracle.com/javase/tutorial/essential/regex/literals.html, yet it states "In certain situations the special characters listed above will not be treated as metacharacters," without explaining what will happen if one tries to escape them. In short, this question deserves a good answer.
  • fracz
    fracz about 8 years
    String escaped = regexString.replaceAll("([\\\\\\.\\[\\{\\(\\*\\+\\?\\^\\$\\|])", "\\\\$1");
  • nhahtdh
    nhahtdh over 7 years
    ) also has to be escaped, and depending on whether you are inside or outside of a character class, there can be more characters to escape, in which case Pattern.quote does quite a good job at escaping a string for use both inside and outside of character class.
  • Dominika over 6 years
    Is there any way to not escape but allow those characters?
  • marbel82
    marbel82 over 6 years
    String escaped = tnk.replaceAll("[\\<\\(\\[\\{\\\\\\^\\-\\=\\$\\!\\|\\]\\}\\)\\?\\*\\+\\.\\>]", "\\\\$0");
  • Tobi G.
    Tobi G. over 6 years
    Escaping a character means to allow the character instead of interpreting it as an operator.
  • Kenston Choi
    Kenston Choi about 6 years
    An unescaped - within [] may not always work, since it is used to define ranges. It's safer to escape it. For example, the patterns [-] and [-)] match the string -, but [(-)] does not.
  • Sasha
    Sasha about 6 years
    "everything between them [\Q and \E] is considered as escaped" — except other \Q's and \E's (which potentially may occur within original regex). So, it's better to use Pattern.quote as suggested here and not to reinvent the wheel.
  • Joe Bowbeer
    Joe Bowbeer over 5 years
    The Pattern javadoc says it is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct, but a backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct. Therefore a much simpler regex will suffice: s.replaceAll("[\\W]", "\\\\$0") where \W designates non-word characters.
  • Old Nick
    Old Nick almost 4 years
    Even though the accepted answer does answer the question, this answer was more helpful to me when I was just looking for a quick list.
  • Volksman
    Volksman about 3 years
    Why is this not the most highly rated answer? It solves the problem without going into the complex details of listing all characters that need escaping, and it's part of the JDK - no need to write any extra code! Simple!
  • Hawk
    Hawk over 2 years
    -=! do not necessarily need to be escaped; it depends on the context. For example, each of them on its own works as a literal regex.
  • aeskreis
    aeskreis 11 months
    Saved me some time, thank you!
  • Asher A 10 months
    What if a regex contains \E? How can it be escaped? E.g.: "\\Q\\Eeee\\E" throws a java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 4
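
For the \E case raised in the comments, a small sketch comparing hand-written \Q...\E wrapping with Pattern.quote may help. The class name and the compiles helper are invented for the example; the literal is the text \Eeee from the comment above.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class QuoteWithSlashEDemo {
    // Hypothetical helper: reports whether a regex compiles at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String literal = "\\Eeee"; // the text \Eeee

        // Wrapping by hand: the embedded \E ends the quoted block early,
        // leaving a stray \E that the compiler rejects.
        System.out.println(compiles("\\Q" + literal + "\\E")); // false

        // Pattern.quote accounts for an embedded \E, so the result
        // compiles and matches the original text.
        System.out.println(Pattern.matches(Pattern.quote(literal), literal)); // true
    }
}
```

This is one reason the comments recommend Pattern.quote over assembling \Q and \E manually when the input is not under your control.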