Regex using Java String.replaceAll

Solution 1

If this function is called repeatedly, there is a performance problem: String.replaceAll compiles its regular expression again on every call. It is better to compile each pattern once and keep it in a constant. You could have something like this:

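import java.util.regex.Pattern;

// Compiled once when the class is loaded; patterns[i] pairs with replacements[i].
private static final Pattern[] patterns = {
    Pattern.compile("</?i>"),
    Pattern.compile("//"),
    // Others
};

private static final String[] replacements = {
    "",
    "/",
    // Others
};

// Applies every pattern in order, feeding each result into the next replacement.
public static String cleanString(String str) {
    for (int i = 0; i < patterns.length; i++) {
        str = patterns[i].matcher(str).replaceAll(replacements[i]);
    }
    return str;
}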
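
To make the cost difference concrete: the Javadoc for String.replaceAll describes the call as equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl), so the compile step is repeated on every call unless you hoist it yourself. The sketch below puts the two forms side by side; the class and method names (PrecompiledDemo, stripPerCall, stripPrecompiled) are made up for illustration.

```java
import java.util.regex.Pattern;

class PrecompiledDemo {
    // Compiled once, when the class is loaded.
    private static final Pattern ITALIC_TAG = Pattern.compile("</?i>");

    // Compiles the regex again on every call.
    static String stripPerCall(String s) {
        return s.replaceAll("</?i>", "");
    }

    // Reuses the compiled Pattern; only a new Matcher is created per call.
    static String stripPrecompiled(String s) {
        return ITALIC_TAG.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "<i>Physics</i> Dept";
        System.out.println(stripPerCall(input));      // Physics Dept
        System.out.println(stripPrecompiled(input));  // Physics Dept
    }
}
```

Both methods return the same result; the second one only skips the repeated compilation.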

Solution 2

cleanInst.replaceAll("[<i>]", "");

should be:

cleanInst = cleanInst.replaceAll("[<i>]", "");

because String is immutable: replaceAll() never modifies cleanInst. It returns a new String, and that result is lost unless you assign it back.
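
A small sketch of the difference, with hypothetical names (ImmutableDemo, discarded, assigned, charClass). It also shows what the comments on the question point out: the square brackets in "[<i>]" form a character class, so that pattern removes every single '<', 'i' and '>' character rather than the tag as a whole.

```java
class ImmutableDemo {
    // The return value is thrown away, so the caller sees the original text.
    static String discarded(String s) {
        s.replaceAll("</?i>", "");
        return s;
    }

    // The result is assigned back, so the tags are really gone.
    static String assigned(String s) {
        s = s.replaceAll("</?i>", "");
        return s;
    }

    // Character class: deletes each '<', 'i' and '>' wherever it occurs.
    static String charClass(String s) {
        return s.replaceAll("[<i>]", "");
    }

    public static void main(String[] args) {
        String input = "<i>optics</i>";
        System.out.println(discarded(input));  // <i>optics</i>
        System.out.println(assigned(input));   // optics
        System.out.println(charClass(input));  // optcs/
    }
}
```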

Solution 3

You should read a basic regular expressions tutorial.

Until then, what you tried to do can be done like this:

cleanInst = cleanInst.replace("//", "/");
cleanInst = cleanInst.replaceAll("</?i>", "");
cleanInst = cleanInst.replaceAll("/n\\b", ";");
cleanInst = cleanInst.replaceAll("\\bPhysics Dept\\.", "Physics Department");
cleanInst = cleanInst.replaceAll("(?i)\\b(?:the )?dept\\b\\.?", "The Department");

You could probably chain all those replace operations (but I don't know the proper Java syntax for this).
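
For what it's worth, chaining does work in Java, because replace() and replaceAll() each return a String that the next call can be invoked on. A minimal sketch using the same patterns as above; the class and method names (ChainDemo, clean) are only for illustration.

```java
class ChainDemo {
    // Each call returns a new String, so the calls can be chained in order.
    static String clean(String s) {
        return s.replace("//", "/")
                .replaceAll("</?i>", "")
                .replaceAll("/n\\b", ";")
                .replaceAll("\\bPhysics Dept\\.", "Physics Department")
                .replaceAll("(?i)\\b(?:the )?dept\\b\\.?", "The Department");
    }

    public static void main(String[] args) {
        System.out.println(clean("<i>Physics Dept.</i>"));  // Physics Department
        System.out.println(clean("the dept"));              // The Department
    }
}
```

The order still matters: the "Physics Dept." rule has to run before the general "dept" rule, otherwise the general rule would rewrite it first.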

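About the word boundaries: \b matches only at a position between a word character (a letter, digit, or underscore) and a non-word character or the start/end of the string, so it usually only makes sense directly before or after an alphanumeric character.

For example, \b/n\b will only match /n if the slash is directly preceded by an alphanumeric character and the n is followed by a non-alphanumeric character (or the end of the string), so it matches "a/n!" but not "foo /n bar".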
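
You can check this behaviour with a few lines of Java. This is just a sketch; BoundaryDemo and found are made-up names.

```java
import java.util.regex.Pattern;

class BoundaryDemo {
    // \b/n\b written as a Java string literal (backslashes doubled).
    private static final Pattern SLASH_N = Pattern.compile("\\b/n\\b");

    // True if the pattern matches anywhere in the input.
    static boolean found(String s) {
        return SLASH_N.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(found("a/n!"));        // true
        System.out.println(found("foo /n bar"));  // false: space before '/' gives no boundary
        System.out.println(found("a/nb"));        // false: 'n' is followed by a letter
    }
}
```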

Author: user2072797

Updated on July 09, 2022

Comments

  • user2072797
    user2072797 almost 2 years

    I am looking to replace parts of a Java string value as follows. The code below is not working.

            cleanInst.replaceAll("[<i>]", "");
            cleanInst.replaceAll("[</i>]", "");
            cleanInst.replaceAll("[//]", "/");
            cleanInst.replaceAll("[\bPhysics Dept.\b]", "Physics Department");
            cleanInst.replaceAll("[\b/n\b]", ";");
            cleanInst.replaceAll("[\bDEPT\b]", "The Department");
            cleanInst.replaceAll("[\bDEPT.\b]", "The Department");
            cleanInst.replaceAll("[\bThe Dept.\b]", "The Department");
            cleanInst.replaceAll("[\bthe dept.\b]", "The Department");
            cleanInst.replaceAll("[\bThe Dept\b]", "The Department");
            cleanInst.replaceAll("[\bthe dept\b]", "The Department");
            cleanInst.replaceAll("[\bDept.\b]", "The Department");
            cleanInst.replaceAll("[\bdept.\b]", "The Department");
            cleanInst.replaceAll("[\bdept\b]", "The Department");
    

    What is the easiest way to achieve the above replace?

    • stinepike
      stinepike about 11 years
      What do you mean by "not working"?
    • Reinstate Monica -- notmaynard
      Reinstate Monica -- notmaynard about 11 years
      Remove the square brackets ([ and ]). These are for character classes. If something else is not working, you'll need to be more specific.
    • fge
      fge about 11 years
      Are you aware of what a character class is in a regex? regex.info
    • SLaks
      SLaks about 11 years
      Strings are immutable.
    • Isaac
      Isaac about 11 years
      And an ignore-case modifier would cover a lot of the dept replacements.
    • jahroy
      jahroy about 11 years
      As @SLaks has pointed out: Strings are immutable. Your code will do nothing if you don't store the return value of String.replaceAll() somewhere. Right now your code does nothing with the return value.
  • Bohemian
    Bohemian about 11 years
    +1 your answer is pretty good, but why the non-capturing group for "the "? Is it just "performance"? Cos IMHO readability drops more than performance increases. Btw I suspect /n is meant to be \n
  • Tim Pietzcker
    Tim Pietzcker about 11 years
    I'm just used to doing it like this. I never use capturing parentheses unless I want to capture a group. I agree that there's tension between stating one's intentions clearly and readability.
  • AxA
    AxA over 7 years
    Instead of Pattern, we now have Matcher objects created every time. How is this better?
  • Ade Miller
    Ade Miller over 6 years
    Because compiling a regex Pattern is more costly than creating a Matcher for a (pre-compiled) Pattern?