How to remove special characters from a string in a file using Java

Solution 1

This should work if you want to retain only ASCII (0-127) characters in your string:

String str = "This is sample CCNA program. it contains CCNP™";
str = str.replaceAll("[^\\x00-\\x7f]+", "");
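Since the question is about a file rather than a single string, the same replacement can be applied to the file's contents. The following is only a minimal sketch, assuming the file is UTF-8 encoded and small enough to read into memory. The class name AsciiFileCleaner, the helper names, and the command-line arguments are illustrative and not part of the original answer:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class AsciiFileCleaner {

    // Drops every character outside the ASCII range 0-127.
    static String stripNonAscii(String text) {
        return text.replaceAll("[^\\x00-\\x7f]+", "");
    }

    // Reads the source file as UTF-8 (an assumption), cleans it,
    // and writes the result to the target file.
    static void cleanFile(Path source, Path target) throws IOException {
        String content = new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
        Files.write(target, stripNonAscii(content).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: AsciiFileCleaner <input file> <output file>");
            return;
        }
        cleanFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

For a very large file it would probably be better to process it line by line instead of loading it whole.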

Solution 2

Do you want to remove all special characters from your strings? If so:

String alphaOnly = input.replaceAll("[^a-zA-Z]+","");
String alphaAndDigits = input.replaceAll("[^a-zA-Z0-9]+","");

Please see Sean Patrick Floyd's answer to a possible duplicate question.
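One thing to watch with the two patterns above: spaces and punctuation are not in the allowed set, so they are removed as well. A small sketch of the difference, where the class name, the method names, and the variant that also keeps spaces are my own additions for illustration:

```java
class AlphaFilterDemo {

    // Same pattern as alphaOnly above: keeps A-Z and a-z only.
    static String lettersOnly(String input) {
        return input.replaceAll("[^a-zA-Z]+", "");
    }

    // Hypothetical variant: also keeps digits and the space character.
    static String lettersDigitsAndSpaces(String input) {
        return input.replaceAll("[^a-zA-Z0-9 ]+", "");
    }

    public static void main(String[] args) {
        // U+2122 is the trademark sign
        String input = "it contains CCNP\u2122 123.";
        System.out.println(lettersOnly(input));            // itcontainsCCNP
        System.out.println(lettersDigitsAndSpaces(input)); // it contains CCNP 123
    }
}
```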

Solution 3

You can do it from a Unicode point of view:

String s = "This is sample CCNA program. it contains CCNP™. And it contains digits 123456789.";
String res = s.replaceAll("[^\\p{L}\\p{M}\\p{P}\\p{Nd}\\s]+", "");
System.out.println(res);

will print out:

This is sample CCNA program. it contains CCNP. And it contains digits 123456789.

\\p{...} is a Unicode property.

\\p{L} matches any letter from any language.

\\p{M} matches a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.).

\\p{P} matches any kind of punctuation character.

\\p{Nd} matches a digit zero through nine in any script except ideographic scripts.

So this regex removes every character that is not a letter (including combined letters), a punctuation character, a digit, or a whitespace character (\\s).
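To make the difference from the ASCII-only approach concrete, here is a short sketch. The class and method names are illustrative. Note that symbol characters such as + and = belong to the Unicode symbol categories, not to \\p{P}, so this pattern removes them too:

```java
class UnicodeFilterDemo {

    // Keeps letters, combining marks, punctuation, decimal digits and whitespace.
    static String keepText(String s) {
        return s.replaceAll("[^\\p{L}\\p{M}\\p{P}\\p{Nd}\\s]+", "");
    }

    // ASCII-only filter, shown for comparison.
    static String keepAscii(String s) {
        return s.replaceAll("[^\\x00-\\x7f]+", "");
    }

    public static void main(String[] args) {
        // U+00E9 is e with acute accent, U+2122 is the trademark sign
        String sample = "Caf\u00e9 CCNP\u2122";
        System.out.println(keepText(sample));   // accented letter is kept
        System.out.println(keepAscii(sample));  // accented letter is dropped
        System.out.println(keepText("1+1=2"));  // math symbols are dropped: 112
    }
}
```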

Solution 4

 ^[\\u0000-\\u007F]*$

This pattern allows only ASCII characters, but you need to tell us what counts as a special character for you.
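Because the pattern is anchored with ^ and $, it tests whether a whole string is ASCII and does not remove anything by itself. A hedged sketch of both uses follows; the class name, the method names, and the negated character class used for removal are my own illustration, not part of the answer:

```java
class AsciiCheck {

    // True only if every character is in the ASCII range (empty string included).
    static boolean isAscii(String s) {
        return s.matches("^[\\u0000-\\u007F]*$");
    }

    // Removal needs the negated class instead of the anchored pattern.
    static String stripNonAscii(String s) {
        return s.replaceAll("[^\\u0000-\\u007F]+", "");
    }

    public static void main(String[] args) {
        // U+2122 is the trademark sign
        System.out.println(isAscii("CCNP"));              // true
        System.out.println(isAscii("CCNP\u2122"));        // false
        System.out.println(stripNonAscii("CCNP\u2122.")); // CCNP.
    }
}
```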

Author by

user2609542

Updated on June 04, 2022

Comments

  • user2609542
    user2609542 almost 2 years

    I have a text file that contains the following information. My task is to remove special symbols from that text file. My input file contains:

    This is sample CCNA program. it contains CCNP™.

    My required output string:

    This is sample CCNA program. it contains CCNP.
    

    Please suggest how to do this.

    Thanks.

  • tchrist
    tchrist almost 11 years
    That’s wrong. ASCII is code points 0–127, because 128–255 are not ASCII.