How does Java store Strings and how does substring work internally?

Solution 1

See the comments:

    String str = "abcd";  // string literal: interned in the pool
    String str1 = new String("abcd"); // new String, not interned: str1 != str
    String str2 = str.substring(0,2); // new String; before Java 7u6 it is a view on str's char[]
    String str3 = str.substring(0,2); // same, but yet another new object: str3 != str2
    String str7 = str1.substring(0,str1.length()); // special case: str1 itself is returned

Notes:

  • Since Java 7u6, substring copies the characters into a new string instead of returning a view on the original string's char[] (this makes no difference to the results in this example)
  • Special case: str1.substring(0, str1.length()) returns str1 itself. See the code:

    public String substring(int beginIndex, int endIndex) {
        // some exception checking, then
        int subLen = endIndex - beginIndex;
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }
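
The identity results described in the comments and notes above can be checked with a small program. This is a minimal sketch: the class and method names are made up for illustration, and the behaviour it reports comes from the OpenJDK implementation quoted here, not from a guarantee in the String Javadoc.

```java
// Hypothetical demo class; the names are illustrative only.
class SubstringIdentityDemo {

    /** True when substring(0, length()) hands back the receiver itself. */
    static boolean fullRangeReturnsSame(String s) {
        return s.substring(0, s.length()) == s;
    }

    /** True when two equal partial substrings are nevertheless distinct objects. */
    static boolean partialSubstringsAreDistinct(String s, int begin, int end) {
        String a = s.substring(begin, end);
        String b = s.substring(begin, end);
        return a != b && a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "abcd";           // interned literal
        String copy = new String("abcd");  // separate, non-interned object

        System.out.println(fullRangeReturnsSame(literal));
        System.out.println(fullRangeReturnsSame(copy));
        System.out.println(partialSubstringsAreDistinct(literal, 0, 2));
    }
}
```

On implementations that contain the special-case check shown above, all three lines should print true, whether or not the receiver is interned.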
    

EDIT

What is a view?

Before Java 7u6, a String was basically a char[] holding the characters, together with an offset and a count (i.e. the string was composed of count characters starting at position offset in the char[]).

When substring was called, a new String object was created with the same char[] but a different offset / count, which effectively made it a view on the original string (except when count = length and offset = 0, as explained above).

Since Java 7u6, a new char[] is created every time, because the String class no longer has count or offset fields.
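
To make the difference concrete, here is a simplified model of the two layouts. ViewString and CopyString are hypothetical classes written for this illustration; they mimic the fields described above and are not the JDK's actual String source.

```java
import java.util.Arrays;

// Hypothetical model of the two layouts; not the real java.lang.String.
class SubstringLayouts {

    /** Pre-7u6 style: a backing array plus offset and count. */
    static final class ViewString {
        final char[] value;
        final int offset;
        final int count;

        ViewString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        ViewString substring(int begin, int end) {
            if (begin < 0 || end > count || begin > end) {
                throw new StringIndexOutOfBoundsException();
            }
            if (begin == 0 && end == count) {
                return this; // special case: whole string requested
            }
            // New object, same array: only offset and count differ.
            return new ViewString(value, offset + begin, end - begin);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    /** 7u6+ style: only the array, so substring has to copy. */
    static final class CopyString {
        final char[] value;

        CopyString(char[] value) {
            this.value = value;
        }

        CopyString substring(int begin, int end) {
            if (begin < 0 || end > value.length || begin > end) {
                throw new StringIndexOutOfBoundsException();
            }
            if (begin == 0 && end == value.length) {
                return this; // special case: whole string requested
            }
            // New object with its own, smaller array.
            return new CopyString(Arrays.copyOfRange(value, begin, end));
        }

        @Override
        public String toString() {
            return new String(value);
        }
    }

    public static void main(String[] args) {
        ViewString v = new ViewString("abcd".toCharArray(), 0, 4);
        ViewString vs = v.substring(1, 3);
        System.out.println(vs + " shares array: " + (vs.value == v.value)); // bc shares array: true

        CopyString c = new CopyString("abcd".toCharArray());
        CopyString cs = c.substring(1, 3);
        System.out.println(cs + " shares array: " + (cs.value == c.value)); // bc shares array: false
    }
}
```

In both models substring returns a new object, so == between two substrings is false either way; only the sharing of the underlying array changes.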

Where is the common pool stored exactly?

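This is implementation specific, and the location has changed over time. In HotSpot up to Java 6, interned strings were allocated in the permanent generation; since JDK 7 they are allocated in the main part of the heap (the young and old generations), along with the other objects created by the application.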

How is the pool managed?

Main characteristics:

  • String literals are stored in the pool
  • Interned strings are stored in the pool (new String("abc").intern();)
  • When a string S is interned (because it is a literal or because intern() is called), the JVM returns a reference to a string in the pool if there is one that is equal to S (hence "abc" == "abc" always returns true).
  • Strings in the pool can be garbage collected (meaning that an interned string can be removed from the pool once nothing else references it; a string literal stays reachable at least as long as the class that defines it is loaded)
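
The pool rules above can be observed with reference comparisons. The following is a small sketch; InternDemo and sameObject are names invented for this example.

```java
// Hypothetical demo class; the names are illustrative only.
class InternDemo {

    /** Reference comparison, spelled out so the intent is explicit. */
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "abc";               // literal: taken from the pool
        String heapCopy = new String("abc");  // new always creates a distinct object
        String interned = heapCopy.intern();  // returns the pooled instance

        System.out.println(sameObject(literal, "abc"));     // true
        System.out.println(sameObject(literal, heapCopy));  // false
        System.out.println(sameObject(literal, interned));  // true
        System.out.println(literal.equals(heapCopy));       // true
    }
}
```

Note that intern() does not change heapCopy itself; it only returns the pooled string with the same contents.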

Solution 2

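String is an immutable object.

String#substring creates a new String. Source

In code (labelled OpenJDK 6 in this answer, although the use of value.length instead of offset / count fields matches the Java 7u6+ source):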

    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }
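
The three bounds checks at the top of this method can be exercised from calling code. The sketch below uses a hypothetical helper, rejects, that reports whether a given index pair is refused; it relies only on the documented StringIndexOutOfBoundsException.

```java
// Hypothetical demo class; the names are illustrative only.
class SubstringBoundsDemo {

    /** True when s.substring(begin, end) throws StringIndexOutOfBoundsException. */
    static boolean rejects(String s, int begin, int end) {
        try {
            s.substring(begin, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "abcd";
        System.out.println(rejects(s, -1, 2)); // true: beginIndex < 0
        System.out.println(rejects(s, 0, 5));  // true: endIndex > length
        System.out.println(rejects(s, 3, 1));  // true: endIndex < beginIndex
        System.out.println(rejects(s, 1, 3));  // false: valid range, yields "bc"
    }
}
```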

Author: Bruce

Updated on June 20, 2022

Comments

  • Bruce (almost 2 years ago)
    class StringTesting {
        public static void main(String[] args)
        {
            String str = "abcd";
            String str1 = new String("abcd");
            String str2 = str.substring(0,2);
            String str3 = str.substring(0,2);
            String str4 = str.substring(0,str.length());
            String str5 = str1.substring(0,2);
            String str6 = str1.substring(0,2);
            String str7 = str1.substring(0,str1.length());
    
            System.out.println(str2 == str3);
            System.out.println(str == str4);
            System.out.println(str5 == str6);
            System.out.println(str1 == str7);
        }
    }
    

    Here is the output I get on Java 1.6.0_27:

    false
    true
    false
    true
    

    Can someone please explain the output? I know Java differentiates between Strings stored in the heap and Strings stored in the String "common pool" (which can be interned). Internally, how do their representations differ, and how does that change the substring algorithm? Kindly cite books, articles, blogs etc. wherever appropriate.

    • Denys Séguret (over 11 years ago)
      Those are many questions. And the answer won't even be the same for recent and old JDKs.
    • Vishy (over 11 years ago)
      If you really want to know how this works, read the source. You will get no better reference. Note: this was changed in Java 7 update 6. ;)
    • Bruce (over 11 years ago)
      @dystroy: That is why I gave my Java version in the question. I don't mind an answer for either the old or recent JDK.
    • Bruce (over 11 years ago)
      @PeterLawrey: Can you please provide a link for it?
    • Bruce (over 11 years ago)
      @dystroy: I know there are many parts to the question, but I believe they are all related and equally important for understanding the output. Thanks a lot for the links.
    • Vishy (over 11 years ago)
      It's in your JDK under src.zip (or you can google for it), and if you have a decent IDE you can just click through to the source of String to see it.
    • Bruce (over 11 years ago)
      @dystroy: Can you please re-post your comment?
    • Denys Séguret (over 11 years ago)
      It was a link to a few related answers. Probably this one, this one and this one. I removed my comment because I felt it was easy to find those questions without it and I didn't want to flood your question.
    • Bruce (over 11 years ago)
      @dystroy: Thanks for re-posting.
  • Vishy (over 11 years ago)
    substring never creates a new object in the string literal pool, but it might return the same object, which may already be in the string literal pool.
  • Subhrajyoti Majumder (over 11 years ago)
    Source - I checked the source link
  • assylias (over 11 years ago)
    @Quoi The source shows that a new string is created, which is not in the string pool...
  • Subhrajyoti Majumder (over 11 years ago)
    @assylias - that is in Java 7, I checked in OpenJDK 6. :)
  • assylias (over 11 years ago)
    @Quoi In Java 6 a new String is created too; it's just that the underlying char[] is shared (which is not the case in more recent JDKs).
  • Subhrajyoti Majumder (over 11 years ago)
    Yes, that's right. The OP tested his code on Java 1.6.0_27.
  • Bruce (over 11 years ago)
    What is a view, where exactly is the common pool stored, and how is it managed?
  • Bruce (over 11 years ago)
    Can you please provide some references for your answer?
  • assylias (over 11 years ago)
    @Bruce What is a view? => It is implementation specific, so you have to look at the code for Java 6 and compare it with Java 7 (I can't find a link to update 6 or more recent). You can also see this discussion.
  • assylias (over 11 years ago)
    @Bruce String pool => oracle.com/technetwork/java/javase/… (search for intern): In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application.
  • assylias (over 11 years ago)
    @Bruce "How the string pool is accessed at runtime" => docs.oracle.com/javase/specs/jvms/se7/html/jvms-3.html#jvms-3.4
  • Bruce (over 11 years ago)
    Thanks a lot for the links!
  • user207421 (almost 7 years ago)
    Please provide a reference for 'Strings in the pool can be garbage collected (meaning that a string literal might be removed from the pool at some stage if it becomes full)'.
  • assylias (almost 7 years ago)
    @EJP I meant "interned string", rather than "string literal". I suspect that a string literal is eligible for garbage collection if the class to which it belongs becomes eligible for GC (e.g. in case of dynamic class loading) but I don't have evidence of it.