Java charAt() and deleteCharAt() performance

Solution 1

For String, StringBuffer, and StringBuilder, charAt() is a constant-time operation.

For StringBuffer and StringBuilder, deleteCharAt() is a linear-time operation.

StringBuffer and StringBuilder have very similar performance characteristics. The primary difference is that the former is synchronized (so is thread-safe) while the latter is not.
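To make the linear cost of deleteCharAt() concrete, here is a minimal sketch contrasting two ways of stripping every occurrence of a character. The class and method names are made up for illustration. The first version calls deleteCharAt() repeatedly, so each call may shift the tail of the buffer; the second copies each kept character exactly once.

```java
// Illustrative sketch; RemoveCharDemo and its methods are hypothetical names.
final class RemoveCharDemo {

    // Repeated deleteCharAt: every call may shift the tail, so the worst case is quadratic.
    static String removeWithDelete(String input, char unwanted) {
        StringBuilder sb = new StringBuilder(input);
        // Walk backwards so a deletion never changes the indices still to be visited.
        for (int i = sb.length() - 1; i >= 0; i--) {
            if (sb.charAt(i) == unwanted) {
                sb.deleteCharAt(i);
            }
        }
        return sb.toString();
    }

    // Single pass: each kept character is appended once, so the work is linear.
    static String removeWithCopy(String input, char unwanted) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c != unwanted) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeWithDelete("banana", 'a')); // bnn
        System.out.println(removeWithCopy("banana", 'a'));   // bnn
    }
}
```

For short strings the difference is unlikely to matter; it only becomes visible when many characters are removed from a large buffer.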

Solution 2

Let us look at the actual Java implementation (only the relevant code) of each of these methods in turn. The code itself answers the question of their efficiency.

String.charAt :

public char charAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return value[index];
}

As we can see, it is just a bounds check and a single array access, which is a constant-time operation.

StringBuffer.charAt :

public synchronized char charAt(int index) {
    if ((index < 0) || (index >= count))
        throw new StringIndexOutOfBoundsException(index);
    return value[index];
}

Again, single array access, so a constant time operation.

StringBuilder.charAt :

public char charAt(int index) {
    if ((index < 0) || (index >= count))
        throw new StringIndexOutOfBoundsException(index);
    return value[index];
}

Again, a single array access, so a constant-time operation. Although the three methods look alike, there are some minor differences. For example, only StringBuffer.charAt is synchronized. Likewise, the bounds check in String.charAt compares the index against value.length, whereas the other two compare it against count (guess why?). A closer look at the implementations reveals other small differences.
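The bounds checks in the StringBuffer and StringBuilder excerpts above compare against count (the number of characters in use), not the length of the backing array. A small sketch of the observable effect, using a hypothetical helper class:

```java
// Illustrative sketch; CharAtBoundsDemo is a hypothetical name.
final class CharAtBoundsDemo {

    // Returns true if sb.charAt(index) is rejected with an IndexOutOfBoundsException.
    static boolean isRejected(StringBuilder sb, int index) {
        try {
            sb.charAt(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder(16); // backing array has room for 16 chars
        sb.append("abc");                         // but only 3 are in use
        System.out.println(sb.charAt(2));         // c
        System.out.println(isRejected(sb, 5));    // true: within capacity, beyond length
    }
}
```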

Now, let us look at deleteCharAt implementations.

String has no deleteCharAt method, most likely because String is immutable: exposing an API whose name says it modifies the object would not be a good idea.
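Since String offers no such method, removing a character means building a new String. One possible way to do that, with a hypothetical helper name, is to go through a temporary StringBuilder:

```java
// Illustrative sketch; StringRemoveDemo and withoutCharAt are hypothetical names.
final class StringRemoveDemo {

    // Returns a new String without the character at index; the original is left untouched.
    static String withoutCharAt(String s, int index) {
        return new StringBuilder(s).deleteCharAt(index).toString();
    }

    public static void main(String[] args) {
        String original = "hello";
        String shorter = withoutCharAt(original, 1);
        System.out.println(shorter);  // hllo
        System.out.println(original); // hello
    }
}
```

Either way the whole string is copied, so this is a linear-time operation too.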

Both StringBuffer and StringBuilder are subclasses of AbstractStringBuilder, and each class's deleteCharAt method delegates to the parent class implementation.

StringBuffer.deleteCharAt :

public synchronized StringBuffer deleteCharAt(int index) {
    super.deleteCharAt(index);
    return this;
}

StringBuilder.deleteCharAt :

public StringBuilder deleteCharAt(int index) {
    super.deleteCharAt(index);
    return this;
}

AbstractStringBuilder.deleteCharAt :

public AbstractStringBuilder deleteCharAt(int index) {
    if ((index < 0) || (index >= count))
        throw new StringIndexOutOfBoundsException(index);
    System.arraycopy(value, index+1, value, index, count-index-1);
    count--;
    return this;
}

A closer look at AbstractStringBuilder.deleteCharAt reveals that it calls System.arraycopy to shift the count-index-1 characters that follow the deleted one, which is O(N) in the worst case. So deleteCharAt has O(N) time complexity.
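To see how the amount of work depends on the index, the arraycopy step can be replayed on a plain char array. This is only a sketch with hypothetical names; it returns the number of characters moved instead of updating a count field.

```java
// Illustrative sketch; ShiftDemo and deleteAt are hypothetical names.
final class ShiftDemo {

    // Performs the same shift as the excerpt above and reports how many chars were moved.
    static int deleteAt(char[] value, int count, int index) {
        if ((index < 0) || (index >= count))
            throw new StringIndexOutOfBoundsException(index);
        int moved = count - index - 1;
        System.arraycopy(value, index + 1, value, index, moved);
        return moved;
    }

    public static void main(String[] args) {
        char[] buf = "abcdef".toCharArray();
        System.out.println(deleteAt(buf, 6, 0));   // 5: deleting the first char moves all the others
        System.out.println(new String(buf, 0, 5)); // bcdef
        System.out.println(deleteAt(buf, 5, 4));   // 0: deleting the last char moves nothing
        System.out.println(new String(buf, 0, 4)); // bcde
    }
}
```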

Solution 3

The charAt method is O(1).

The deleteCharAt method on StringBuilder and StringBuffer is O(N) on average, assuming you are deleting a random character from an N character StringBuffer / StringBuilder. (It has to move, on average, half of the remaining characters to fill up the "hole" left by the deleted character. There is no amortization over multiple operations; see below.) However, if you delete the last character, the cost will be O(1).

There is no deleteCharAt method for String.


In theory, StringBuilder and StringBuffer could be optimized for the case where you are inserting or deleting multiple characters in a "pass" through the buffer. They could do this by maintaining an optional "gap" in the buffer, and moving characters across it. (IIRC, emacs implements its text buffers this way.) The problems with this approach are:

  • It requires more space, for the attributes that say where the gap is, and for the gap itself.
  • It makes the code a lot more complicated, and slows down other operations. For instance, charAt would have to compare the offset with the start and end points of the gap, and make the corresponding adjustments to the actual index value before fetching the character array element.
  • It is only going to help if the application does multiple inserts / deletes on the same buffer.

Not surprisingly, this "optimization" has not been implemented in the standard StringBuilder / StringBuffer classes. However, a custom CharSequence class could use this approach.
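As a rough illustration of the idea, here is a minimal, fixed-capacity gap buffer that implements CharSequence and supports only deletion. GapBuffer and all of its members are hypothetical; this is a sketch under those simplifying assumptions, not how any JDK class works.

```java
// Illustrative gap buffer; GapBuffer is a hypothetical class, not part of the JDK.
final class GapBuffer implements CharSequence {
    private final char[] value;
    private int gapStart; // physical index of the first slot in the gap
    private int gapEnd;   // physical index of the first slot after the gap

    GapBuffer(String s) {
        value = s.toCharArray();
        gapStart = value.length; // start with an empty gap at the end
        gapEnd = value.length;
    }

    @Override
    public int length() {
        return value.length - (gapEnd - gapStart);
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length())
            throw new IndexOutOfBoundsException("index " + index);
        // Indices at or past the gap must be shifted by the gap's width.
        return value[index < gapStart ? index : index + (gapEnd - gapStart)];
    }

    // Deletes the char at the logical index by moving the gap next to it and widening the gap.
    void deleteCharAt(int index) {
        if (index < 0 || index >= length())
            throw new IndexOutOfBoundsException("index " + index);
        moveGapTo(index);
        gapEnd++; // the char just after the gap is absorbed into it
    }

    // Moves the gap so that it starts at logical position pos; cost is the distance moved.
    private void moveGapTo(int pos) {
        if (pos < gapStart) {
            int n = gapStart - pos;
            System.arraycopy(value, pos, value, gapEnd - n, n);
            gapStart -= n;
            gapEnd -= n;
        } else if (pos > gapStart) {
            int n = pos - gapStart;
            System.arraycopy(value, gapEnd, value, gapStart, n);
            gapStart += n;
            gapEnd += n;
        }
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return toString().substring(start, end);
    }

    @Override
    public String toString() {
        return new String(value, 0, gapStart)
                + new String(value, gapEnd, value.length - gapEnd);
    }

    public static void main(String[] args) {
        GapBuffer g = new GapBuffer("abcdef");
        g.deleteCharAt(2); // moves the gap to index 2, then widens it
        g.deleteCharAt(2); // gap is already in place: nothing is copied
        System.out.println(g); // abef
    }
}
```

Note how charAt now needs an extra comparison and index adjustment, which is the slowdown mentioned in the list above.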

Solution 4

charAt is super fast (and can use intrinsics for String); it is a simple index into an array. deleteCharAt requires an arraycopy, so deleting a char will not be as fast.

Author by

Jimmar

Updated on November 16, 2021

Comments

  • Jimmar
    Jimmar over 2 years

    I've been wondering about the implementation of the charAt function for String/StringBuilder/StringBuffer in Java. What is its complexity? Also, what about deleteCharAt() in StringBuffer/StringBuilder?

  • bestsss
    bestsss almost 13 years
    "For String, deleteCharAt() is an O(n) operation, where n is the size of the string": String is immutable and cannot delete anything (there is no such function). For StringBuffer/StringBuilder, deleteCharAt is an array copy: System.arraycopy(value, index+1, value, index, count-index-1); while memmove can be implemented with the help of the hardware, it is still O(n) technically.
  • Matt Ball
    Matt Ball almost 13 years
    I'm an idiot for not realizing the non-existence of String#deleteCharAt(). Thanks.
  • Matt Ball
    Matt Ball over 7 years
    @sleeparrow Java isn't JS; internally Java uses UTF-16, not UTF-8. If you read the source code for the various methods, it's pretty clear that, for example, charAt() is a constant-time operation because it's simply indexing into a char array. See String#charAt(int) and StringBuilder#charAt(int).
  • sleeparrow
    sleeparrow over 7 years
    ha, I didn't realize this was a Java question. excuse me.
  • Tofig Hasanov
    Tofig Hasanov about 4 years
    @MattBall But UTF-16 isn't constant size either, so although charAt will still work in constant time, for characters that need 32 bits the operation will return only half of the real character.
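The point in the last comment can be checked with a short sketch. The class name and sample string are made up for illustration; U+1F600 is used as an example of a character outside the Basic Multilingual Plane, which UTF-16 stores as a surrogate pair.

```java
// Illustrative sketch; SurrogateDemo and SAMPLE are hypothetical names.
final class SurrogateDemo {

    // 'a', then U+1F600 encoded as the surrogate pair D83D DE00, then 'b'.
    static final String SAMPLE = "a\uD83D\uDE00b";

    public static void main(String[] args) {
        System.out.println(SAMPLE.length());                             // 4 chars
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length()));   // 3 code points
        System.out.println(Character.isHighSurrogate(SAMPLE.charAt(1))); // true: only half of the pair
        System.out.println(Integer.toHexString(SAMPLE.codePointAt(1)));  // 1f600
    }
}
```

charAt stays constant-time here; it simply returns one UTF-16 code unit rather than a whole code point.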