Regex for special characters in Java

Solution 1

This worked for me:

String result = str.replaceAll("[^\\dA-Za-z ]", "").replaceAll("\\s+", "+");

For this input string:

/-+!@#$%^&())";:[]{}\ |wetyk 678dfgh

It yielded this result:

+wetyk+678dfgh
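
The one-liner above can be wrapped in a small runnable program to reproduce that result. This is only a sketch: the class name KeepAlphanumerics and the method name clean are hypothetical, and the two replaceAll calls are taken unchanged from the answer.

```java
class KeepAlphanumerics {
    // Hypothetical helper; the regexes are the ones from the answer above.
    static String clean(String str) {
        return str.replaceAll("[^\\dA-Za-z ]", "") // drop everything except ASCII digits, ASCII letters and spaces
                  .replaceAll("\\s+", "+");         // turn each run of whitespace into a single '+'
    }

    public static void main(String[] args) {
        String input = "/-+!@#$%^&())\";:[]{}\\ |wetyk 678dfgh";
        System.out.println(clean(input)); // prints +wetyk+678dfgh
    }
}
```

The leading + in the result comes from the space that sits between the backslash and the | in the input.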

Solution 2

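replaceAll expects a regex, so the characters have to go inside a character class, with [, ] and \ escaped (in java.util.regex an unescaped [ inside a class opens a nested class):

public static final String specialChars2 = "[`~!@#$%^&*()_+\\[\\]\\\\;\',./{}|:\"<>?]";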
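
A minimal sketch of how such a character class could be used with replaceAll. It assumes java.util.regex, where an unescaped [ inside a character class starts a nested class, so both square brackets are escaped here. The class name SpecialCharClass and the method name clean are hypothetical.

```java
class SpecialCharClass {
    // Within this list only [, ] and \ need escaping inside the character class.
    static final String SPECIAL_CHARS = "[`~!@#$%^&*()_+\\[\\]\\\\;',./{}|:\"<>?]";

    static String clean(String str) {
        // Remove the listed characters, then replace each space with '+'.
        return str.replaceAll(SPECIAL_CHARS, "").replace(" ", "+");
    }

    public static void main(String[] args) {
        System.out.println(clean("a;b'c \"d\" [e]")); // prints abc+d+e
    }
}
```

Note that an explicit list only removes what it names: - and = are not in it, so they would survive.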

Solution 3

The problem with your first regex is that "\W\S" means: find a sequence of two characters, the first of which is not a letter, digit or underscore, followed by a character that is not whitespace.

What you mean is "[^\w\s]", which means: find a single character that is neither a word character (letter, digit or underscore) nor whitespace. (We can't use "[\W\S]", as this means find a character that is not a word character OR is not whitespace, which matches every character.)
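
To make the difference between these three patterns concrete, here is a small sketch that applies each of them to the same string. The class name WordSpaceClasses, the helper strip and the test string are made up for illustration.

```java
class WordSpaceClasses {
    // Hypothetical helper: delete every match of the given regex.
    static String strip(String s, String regex) {
        return s.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String s = "a;b c!d;";

        // \W\S : a non-word character FOLLOWED BY a non-whitespace character.
        // It eats the letter after each symbol and misses the final ';'.
        System.out.println(strip(s, "\\W\\S"));    // prints a;

        // [\W\S] : non-word OR non-whitespace, which is every character.
        System.out.println(strip(s, "[\\W\\S]"));  // prints an empty line

        // [^\w\s] : a single character that is neither word character nor whitespace.
        System.out.println(strip(s, "[^\\w\\s]")); // prints ab cd
    }
}
```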

The second regex fails because it uses reserved characters without escaping them. You can enclose them in [], where most (but not all) characters lose their special meaning, but the result looks messy and you have to check that you haven't missed any punctuation.

Example:

String sequence = "qwe 123 :@~ ";

String withoutSpecialChars = sequence.replaceAll("[^\\w\\s]", "");

String spacesAsPluses = withoutSpecialChars.replaceAll("\\s", "+");

System.out.println("without special chars: '"+withoutSpecialChars+ '\'');
System.out.println("spaces as pluses: '"+spacesAsPluses+'\'');

This outputs:

without special chars: 'qwe 123  '
spaces as pluses: 'qwe+123++'

If you want to collapse multiple spaces into a single + then use "\s+" as your regex instead (remember to escape the backslash in the Java string literal: "\\s+").
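
A short sketch of the "\\s+" variant on the same input. The class and method names are hypothetical, and the trim() call in the second method is an optional extra that is not part of the answer; it is only there to show one way to avoid a trailing +.

```java
class CollapseSpaces {
    // Strip special characters, then turn each run of whitespace into one '+'.
    static String toPluses(String s) {
        return s.replaceAll("[^\\w\\s]", "").replaceAll("\\s+", "+");
    }

    // Same, but trims leading/trailing whitespace first (assumption: edge pluses are unwanted).
    static String toPlusesTrimmed(String s) {
        return s.replaceAll("[^\\w\\s]", "").trim().replaceAll("\\s+", "+");
    }

    public static void main(String[] args) {
        String sequence = "qwe 123 :@~ ";
        System.out.println(toPluses(sequence));        // prints qwe+123+
        System.out.println(toPlusesTrimmed(sequence)); // prints qwe+123
    }
}
```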

Solution 4

I had a similar problem to solve and I used the following method:

text.replaceAll("\\p{Punct}+", "").replaceAll("\\s+", "+");

Code with a simple timing benchmark

public static String cleanPunctuations(String text) {
    return text.replaceAll("\\p{Punct}+", "").replaceAll("\\s+", "+");
}

public static void test(String in){
    long t1 = System.currentTimeMillis();
    String out = cleanPunctuations(in);
    long t2 = System.currentTimeMillis();
    System.out.println("In=" + in + "\nOut="+ out + "\nTime=" + (t2 - t1)+ "ms");

}

public static void main(String[] args) {
    String s1 = "My text with 212354 digits spaces and \n newline \t tab " +
            "[`~!@#$%^&*()_+[\\\\]\\\\\\\\;\\',./{}|:\\\"<>?] special chars";
    test(s1);
    String s2 = "\"Sample Text=\"  with - minimal \t punctuation's";
    test(s2);
}

Sample Output

In=My text with 212354 digits spaces and 
 newline     tab [`~!@#$%^&*()_+[\\]\\\\;\',./{}|:\"<>?] special chars
Out=My+text+with+212354+digits+spaces+and+newline+tab+special+chars
Time=4ms
In="Sample Text="  with - minimal    punctuation's
Out=Sample+Text+with+minimal+punctuations
Time=0ms
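
If the method is called often, one possible variation is to compile the two patterns once, since String.replaceAll compiles its regex on every call. The sketch below is an assumption about how that could look, not part of the original answer; the class name PunctCleaner is hypothetical, and System.nanoTime() is used because such short runs are usually below millisecond resolution. Keep in mind that \p{Punct} covers only ASCII punctuation by default.

```java
import java.util.regex.Pattern;

class PunctCleaner {
    // Compiled once and reused across calls.
    private static final Pattern PUNCT = Pattern.compile("\\p{Punct}+");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    static String cleanPunctuations(String text) {
        String noPunct = PUNCT.matcher(text).replaceAll(""); // drop ASCII punctuation
        return SPACES.matcher(noPunct).replaceAll("+");      // whitespace runs become '+'
    }

    public static void main(String[] args) {
        String in = "\"Sample Text=\"  with - minimal \t punctuation's";
        long t1 = System.nanoTime();
        String out = cleanPunctuations(in);
        long t2 = System.nanoTime();
        // Out is Sample+Text+with+minimal+punctuations; the time varies per run.
        System.out.println("Out=" + out + "\nTime=" + (t2 - t1) + "ns");
    }
}
```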
Author: Housefly

Updated on September 02, 2020

Comments

  • Housefly over 3 years ago
    public static final String specialChars1= "\\W\\S";
    String str2 = str1.replaceAll(specialChars1, "").replace(" ", "+");
    
    public static final String specialChars2 = "`~!@#$%^&*()_+[]\\;\',./{}|:\"<>?";
    String str2 = str1.replaceAll(specialChars2, "").replace(" ", "+");
    

    Whatever str1 is I want all the characters other than letters and numbers to be removed, and spaces to be replaced by a plus sign (+).

    My problem is that if I use specialChars1, it does not remove some characters like ;, ', ", and if I use specialChars2 it gives me an error:

    java.util.regex.PatternSyntaxException: Syntax error U_REGEX_MISSING_CLOSE_BRACKET near index 32:
    

    How can this be achieved? I have searched but could not find a perfect solution.