What is an efficient way to replace many characters in a string?

20,515

Solution 1

You could create a table of String[] which is Character.MAX_VALUE in length. (Including the mapping to lower case)

As the replacements got more complex, the time to perform them would remain the same.

private static final String[] REPLACEMENT = new String[Character.MAX_VALUE+1];
static {
    for(int i=Character.MIN_VALUE;i<=Character.MAX_VALUE;i++)
        REPLACEMENT[i] = Character.toString(Character.toLowerCase((char) i));
    // substitute
    REPLACEMENT['á'] =  "a";
    // remove
    REPLACEMENT['-'] =  "";
    // expand
    REPLACEMENT['æ'] = "ae";
}

public String convertWord(String word) {
    StringBuilder sb = new StringBuilder(word.length());
    for(int i=0;i<word.length();i++)
        sb.append(REPLACEMENT[word.charAt(i)]);
    return sb.toString();
} 

Solution 2

My suggestion would be:

  • Convert the String to a char[] array
  • Run through the array, testing each character one by one (e.g. with a switch statement) and replacing it if needed
  • Convert the char[] array back to a String

I think this is probably the fastest performance you will get in pure Java.

EDIT: I notice you are doing some changes that change the length of the string. In this case, the same principle applies, however you need to keep two arrays and increment both a source index and a destination index separately. You might also need to resize the destination array if you run out of target space (i.e. reallocate a larger array and arraycopy the existing destination array into it)

Solution 3

My implementation is based on look up table.

public static String convertWord(String str) {
    char[] words = str.toCharArray();
    char[] find = {'á','é','ú','ý','ð','ó','ö','æ','þ','-','.',
            '/'};
    String[] replace = {"a","e","u","y","d","o","o","ae","th"};
    StringBuilder out = new StringBuilder(str.length());
    for (int i = 0; i < words.length; i++) {
        boolean matchFailed = true;
        for(int w = 0; w < find.length; w++) {
            if(words[i] == find[w]) {
                if(w < replace.length) {
                    out.append(replace[w]);
                }
                matchFailed = false;
                break;
            }
        }
        if(matchFailed) out.append(words[i]);
    }
    return out.toString();
}

Solution 4

My first choice would be to use a StringBuilder because you need to remove some chars from the string.

Second choice would be to iterate throw the array of chars and add the treated char to another array of the inicial size of the string. Then you would need to copy the array to trim the possible unused positions.

After that, I would make some performance tests to see witch one is better.

Share:
20,515
Ólafur Waage
Author by

Ólafur Waage

"If you really want to understand something, the best way is to try and explain it to someone else. That forces you to sort it out in your mind. And the more slow and dim-witted your pupil, the more you have to break things down into more and more simple ideas. And that's really the essence of programming. By the time you've sorted out a complicated idea into little steps that even a stupid machine can deal with, you've learned something about it yourself." - Douglas Adams

Updated on July 26, 2020

Comments

  • Ólafur Waage
    Ólafur Waage almost 4 years

    String handling in Java is something I'm trying to learn to do well. Currently I want to take in a string and replace any characters I find.

    Here is my current inefficient (and kinda silly IMO) function. It was written to just work.

    public String convertWord(String word)
    {
        return word.toLowerCase().replace('á', 'a')
                                 .replace('é', 'e')
                                 .replace('í', 'i')
                                 .replace('ú', 'u')
                                 .replace('ý', 'y')
                                 .replace('ð', 'd')
                                 .replace('ó', 'o')
                                 .replace('ö', 'o')
                                 .replaceAll("[-]", "")
                                 .replaceAll("[.]", "")
                                 .replaceAll("[/]", "")
                                 .replaceAll("[æ]", "ae")
                                 .replaceAll("[þ]", "th");
    }
    

    I ran 1.000.000 runs of it and it took 8182ms. So how should I proceed in changing this function to make it more efficient?

    Solution found:

    Converting the function to this

    public String convertWord(String word)
    {
        StringBuilder sb = new StringBuilder();
    
        char[] charArr = word.toLowerCase().toCharArray();
    
        for(int i = 0; i < charArr.length; i++)
        {
            // Single character case
            if(charArr[i] == 'á')
            {
                sb.append('a');
            }
            // Char to two characters
            else if(charArr[i] == 'þ')
            {
                sb.append("th");
            }
            // Remove
            else if(charArr[i] == '-')
            {
            }
            // Base case
            else
            {   
                sb.append(word.charAt(i));
            }
        }
    
        return sb.toString();
    }
    

    Running this function 1.000.000 times takes 518ms. So I think that is efficient enough. Thanks for the help guys :)