Java: remove all non-alphanumeric characters from the beginning and end of a string

15,683

Solution 1

Use ^ (matches at the beginning of the string) and $ (matches at the end) anchors:

s = s.replaceAll("^[^a-zA-Z0-9\\s]+|[^a-zA-Z0-9\\s]+$", "");
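
As a quick check of the pattern above, here is a minimal sketch. The class and method names (TrimEndsAscii, trimEnds) are made up for illustration and are not from the original answer.

```java
class TrimEndsAscii {
    // Strip runs of characters that are neither ASCII alphanumerics nor
    // whitespace, but only where the run touches the start or the end.
    static String trimEnds(String s) {
        return s.replaceAll("^[^a-zA-Z0-9\\s]+|[^a-zA-Z0-9\\s]+$", "");
    }

    public static void main(String[] args) {
        System.out.println(trimEnds("theString,")); // theString
        System.out.println(trimEnds("--a,b--"));    // a,b  (the inner comma stays)
        // Whitespace is in the kept set, so a leading or trailing space
        // shields the punctuation next to it and nothing is removed here.
        System.out.println("[" + trimEnds(" ,x, ") + "]"); // [ ,x, ]
    }
}
```

The last call shows the effect of keeping \\s in the class: if the string may carry surrounding whitespace, trim it first or drop \\s from the pattern.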

Solution 2

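Use Unicode property classes, so that letters and digits from any script are kept (the % in the class means percent signs are kept too):

s = s.replaceAll("^[^\\p{L}\\p{N}\\s%]+|[^\\p{L}\\p{N}\\s%]+$", "");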

Instead of:

s.replaceAll("^[^a-zA-Z0-9\\s]+|[^a-zA-Z0-9\\s]+$", "")

Here \p{L} matches any kind of letter from any language, and \p{N} matches any kind of numeric character in any script.
This matters for Latin-script languages other than English. In Spanish, for instance, the ASCII-only pattern turns éstas and apuntó into stas and apunt, while the Unicode pattern leaves them intact. The Unicode pattern also works on non-Latin scripts.
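
The difference between the two patterns can be sketched as follows. The names AsciiVsUnicodeTrim, trimAscii and trimUnicode are hypothetical, and the Unicode pattern is written with a single negating ^ per class, which is my reading of the answer's intent. The accented letters are assumed to be precomposed (NFC).

```java
class AsciiVsUnicodeTrim {
    // ASCII-only: any non-ASCII letter at either end counts as "non-alphanumeric".
    static String trimAscii(String s) {
        return s.replaceAll("^[^a-zA-Z0-9\\s]+|[^a-zA-Z0-9\\s]+$", "");
    }

    // Unicode-aware: keeps letters (\p{L}) and numbers (\p{N}) of any script,
    // plus whitespace and '%'.
    static String trimUnicode(String s) {
        return s.replaceAll("^[^\\p{L}\\p{N}\\s%]+|[^\\p{L}\\p{N}\\s%]+$", "");
    }

    public static void main(String[] args) {
        String a = "\u00e9stas,";  // "éstas," with a precomposed é
        String b = "apunt\u00f3;"; // "apuntó;" with a precomposed ó

        System.out.println(trimAscii(a));   // stas
        System.out.println(trimAscii(b));   // apunt
        System.out.println(trimUnicode(a)); // the full word éstas, comma removed
        System.out.println(trimUnicode(b)); // the full word apuntó, semicolon removed
    }
}
```
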
To also keep nonspacing combining marks, such as Arabic and Hebrew vowel points, add \p{Mn}:

s = s.replaceAll("^[^\\p{L}\\p{N}\\p{Mn}\\s%]+|[^\\p{L}\\p{N}\\p{Mn}\\s%]+$", "");

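In Dravidian scripts, a vowel sign may sit beside or around the consonant, like ಾ, as opposed to Semitic scripts, where the vowel points sit "within" the space of the character. Signs like ಾ are spacing combining marks, so use \p{Mc} for them instead. To cover all languages, use \p{M}, which includes every category of mark: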

s = s.replaceAll("^[^\\p{L}\\p{N}\\p{M}\\s%]+|[^\\p{L}\\p{N}\\p{M}\\s%]+$", "");
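
To see what the mark categories change in practice, here is a small sketch. MarkAwareTrim and its trim helper are hypothetical names, and the patterns are built with a single negating ^ per class. The inputs are a decomposed Spanish word ending in U+0301 (a nonspacing mark, Mn) and a Kannada syllable ending in U+0CBE (a spacing mark, Mc), each followed by "!".

```java
class MarkAwareTrim {
    // Builds the two-ended trimming pattern from the body of the "kept" class.
    static String trim(String s, String kept) {
        String cls = "[^" + kept + "]+";
        return s.replaceAll("^" + cls + "|" + cls + "$", "");
    }

    public static void main(String[] args) {
        String lettersOnly = "\\p{L}\\p{N}\\s%";
        String withMn = "\\p{L}\\p{N}\\p{Mn}\\s%";
        String withM = "\\p{L}\\p{N}\\p{M}\\s%";

        String spanish = "apunto\u0301!"; // 'o' + combining acute accent + '!'
        String kannada = "\u0c95\u0cbe!"; // letter KA + vowel sign AA + '!'

        // Without any mark category, the trailing accent is stripped with the '!'.
        System.out.println(trim(spanish, lettersOnly).length()); // 6
        // \p{Mn} keeps the combining accent.
        System.out.println(trim(spanish, withMn).length());      // 7
        // \p{Mn} does not cover the spacing vowel sign, so it is stripped.
        System.out.println(trim(kannada, withMn).length());      // 1
        // \p{M} covers every mark category, so the vowel sign survives.
        System.out.println(trim(kannada, withM).length());       // 2
    }
}
```

If the input may arrive in either composed or decomposed form, normalizing it first with java.text.Normalizer is one way to make the results predictable.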

See regex tutorial for a list of Unicode categories

Author: Mike6679

Updated on June 08, 2022

Comments

  • Mike6679
    Mike6679 almost 2 years

    I know how to replace ALL non-alphanumeric chars in a string, but how do I do it only at the beginning and end of the string?

    I need this string:

    "theString,"

    to be:

    theString

    This is how I replace ALL non-alphanumeric chars in a string:

    s = s.replaceAll("[^a-zA-Z0-9\\s]", "");
    
  • David Conrad
    David Conrad almost 10 years
    What's that \\s doing in there? I know OP had it, but it was wrong then and it's wrong now.
  • falsetru
    falsetru almost 10 years
    @DavidConrad, \\s will match any whitespace character. I thought it was OP's intention to exclude alpha-numeric characters and space characters, so I didn't touch it.
  • David Conrad
    David Conrad almost 10 years
    Exactly, that's why it's wrong. OP said "replace ALL non alphanumeric chars in a string". It's a negated set, so it will replace anything EXCEPT a-z, A-Z, 0-9, and any whitespace character. So it will leave in whitespace.
  • David Conrad
    David Conrad almost 10 years
    I think OP was trying to match UP TO a space, and didn't get how sets work. I guess I could be wrong.
  • Mike6679
    Mike6679 almost 10 years
    @falsetru does this strip ALL non alphanumeric from beginning and end of the string or just one in beginning and end?
  • falsetru
    falsetru almost 10 years
    @Mike, It removes all non alphanumeric + non whitespace from the beginning and the end of the string. (I used +). If you want to remove only one, remove +.
  • Mike6679
    Mike6679 almost 10 years
    Just a note: although this did work, I had to replace it with my own parser because the regular expression was just too expensive over thousands of iterations.
  • O. Jones
    O. Jones over 9 years
    This removes all the non-alphanumeric characters
  • borchvm
    borchvm over 4 years
    Code-only answers are considered low quality: make sure to provide an explanation what your code does and how it solves the problem. It will help the asker and future readers both if you can add more information in your post. See also Explaining entirely code-based answers: meta.stackexchange.com/questions/114762/…
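
On the cost Mike6679 mentions in the comments: String.replaceAll compiles its pattern on every call, so in a hot loop it may help to compile the pattern once, or to skip regex altogether. The sketch below shows both. It is not Mike6679's parser, which was never posted; ManualTrim, isKept, trimEnds and trimEndsRegex are hypothetical names, and the loop version only aims to match the ASCII pattern on typical input. Measure before assuming either is faster for your data.

```java
import java.util.regex.Pattern;

class ManualTrim {
    // Option 1: compile once, reuse for every call.
    static final Pattern ENDS =
            Pattern.compile("^[^a-zA-Z0-9\\s]+|[^a-zA-Z0-9\\s]+$");

    static String trimEndsRegex(String s) {
        return ENDS.matcher(s).replaceAll("");
    }

    // Mirrors [a-zA-Z0-9\s]: ASCII letters, digits, and the six characters
    // that \s matches by default (space, \t, \n, vertical tab, \f, \r).
    static boolean isKept(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || c == ' ' || c == '\t' || c == '\n'
                || c == 0x0B || c == '\f' || c == '\r';
    }

    // Option 2: no regex. Walk inward from both ends until a kept
    // character is found, then take the substring in between.
    static String trimEnds(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && !isKept(s.charAt(start))) {
            start++;
        }
        while (end > start && !isKept(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(trimEnds("theString,"));      // theString
        System.out.println(trimEndsRegex("theString,")); // theString
        System.out.println(trimEnds("--a,b--"));         // a,b
    }
}
```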