Is a Java string really immutable?


56,654

Solution 1

String is immutable* but this only means you cannot change it using its public API.

What you are doing here is circumventing the normal API, using reflection. The same way, you can change the values of enums, change the lookup table used in Integer autoboxing etc.

Now, the reason s1 and s2 change value, is that they both refer to the same interned string. The compiler does this (as mentioned by other answers).

The reason s3 does not change was actually a bit surprising to me, as I thought it would share the value array (it did in earlier versions of Java, before Java 7u6). However, looking at the source code of String, we can see that the value character array for a substring is actually copied (using Arrays.copyOfRange(..)). This is why it goes unchanged.
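To illustrate why a copied array is unaffected by later changes to its source, here is a minimal sketch using Arrays.copyOfRange directly. The class name CopyOfRangeDemo and the helper tail are made up for this example; it does not reproduce the actual String source code.

```java
import java.util.Arrays;

final class CopyOfRangeDemo {
    /** Returns the characters from index 'from' to the end as a new array. */
    static char[] tail(char[] source, int from) {
        return Arrays.copyOfRange(source, from, source.length);
    }

    public static void main(String[] args) {
        char[] original = "Hello World".toCharArray();
        char[] copy = tail(original, 6);

        original[6] = 'J';   // change the source after the copy was taken

        System.out.println(new String(original));  // Hello Jorld
        System.out.println(new String(copy));      // World
    }
}
```

The copy keeps its own characters, which matches what happens to s3 in the question.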

You can install a SecurityManager to prevent malicious code from doing such things. But keep in mind that some libraries depend on these kinds of reflection tricks (typically ORM tools, AOP libraries etc.).

*) I initially wrote that Strings aren't really immutable, just "effective immutable". This might be misleading in the current implementation of String, where the value array is indeed marked private final. It's still worth noting, though, that there is no way to declare an array in Java as immutable, so care must be taken not to expose it outside its class, even with the proper access modifiers.
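The point about arrays can be sketched with a small class. Token and its two accessors are hypothetical names chosen for illustration. The sketch only shows that final protects the reference, not the array contents, and that returning a copy avoids exposing the internal array.

```java
/** Hypothetical value class; the name and methods are illustrative only. */
final class Token {
    // 'final' fixes the reference, not the contents of the array.
    private final char[] chars;

    Token(char[] chars) {
        this.chars = chars.clone();   // copy on the way in
    }

    /** Leaky accessor: callers receive the internal array itself. */
    char[] leakyChars() {
        return chars;
    }

    /** Safe accessor: callers receive their own copy. */
    char[] copyOfChars() {
        return chars.clone();
    }

    @Override
    public String toString() {
        return new String(chars);
    }

    public static void main(String[] args) {
        Token t = new Token("abc".toCharArray());

        t.copyOfChars()[0] = 'X';     // modifies only the copy
        System.out.println(t);        // abc

        t.leakyChars()[0] = 'X';      // modifies the internal state
        System.out.println(t);        // Xbc
    }
}
```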

As this topic seems overwhelmingly popular, here's some suggested further reading: Heinz Kabutz's Reflection Madness talk from JavaZone 2009, which covers a lot of the issues in the OP, along with other reflection... well... madness.

It covers why this is sometimes useful. And why, most of the time, you should avoid it. :-)

Solution 2

In Java, if two String variables are initialized with the same literal, both variables are assigned the same reference:

String test1 = "Hello World";
String test2 = "Hello World";
System.out.println(test1 == test2); // true

That is the reason the comparison returns true. The third string is created using substring(), which makes a new string instead of pointing to the same one.
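A short sketch of the substring case, using the strings from the question (the class name SubstringIdentity is made up): the substring has the same characters as the literal "World" but is a separate object created at run time.

```java
final class SubstringIdentity {
    public static void main(String[] args) {
        String s1 = "Hello World";
        String s3 = s1.substring(6);

        // Same characters as the literal "World" ...
        System.out.println(s3.equals("World"));   // true
        // ... but a different object, so the reference comparison fails.
        System.out.println(s3 == "World");        // false
    }
}
```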

When you access a string using reflection, you get the actual pointer:

Field field = String.class.getDeclaredField("value");
field.setAccessible(true);

So a change made through this array changes every string that holds a pointer to it; but as s3 is a new string created by substring(), it does not change.
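The same mechanism can be sketched with a user-defined class, so that the example does not depend on the layout or accessibility of String's internals, which vary between JDK versions. Label, tail and rawValue are hypothetical names for this illustration. Label is only a stand-in for String with a private final char[] and no mutating methods.

```java
import java.lang.reflect.Field;

final class ReflectionPeek {
    /** Hypothetical stand-in for String: a private final char[] and no mutators. */
    static final class Label {
        private final char[] value;

        Label(String text) {
            this.value = text.toCharArray();
        }

        /** Copies the characters into a new Label with its own array. */
        Label tail(int from) {
            return new Label(new String(value, from, value.length - from));
        }

        @Override
        public String toString() {
            return new String(value);
        }
    }

    /** Uses reflection to return the real backing array, not a copy. */
    static char[] rawValue(Label label) {
        try {
            Field field = Label.class.getDeclaredField("value");
            field.setAccessible(true);            // bypass 'private'
            return (char[]) field.get(label);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Label s1 = new Label("Hello World");
        Label s2 = s1;            // second reference to the same object
        Label s3 = s1.tail(6);    // separate object with its own array

        // Overwrite "World" in place through the reflected array.
        "Java!".getChars(0, 5, rawValue(s1), 6);

        System.out.println(s1);   // Hello Java!
        System.out.println(s2);   // Hello Java!
        System.out.println(s3);   // World
    }
}
```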

Solution 3

You are using reflection to circumvent the immutability of String - it's a form of "attack".

There are lots of examples you can create like this (e.g. you can even instantiate a Void object), but it doesn't mean that String is not "immutable".

There are use cases where this type of code may be used to your advantage and be "good coding", such as clearing passwords from memory at the earliest possible moment (before GC).
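As a hedged aside, the sketch below does not overwrite a String through reflection. It swaps in the more common alternative of holding the password in a char[] from the start (the type returned by java.io.Console.readPassword()) and clearing it explicitly. PasswordWipe and authenticate are hypothetical names, and the check itself is a placeholder.

```java
import java.util.Arrays;

final class PasswordWipe {
    /** Placeholder check; a real application would verify against a stored hash. */
    static boolean authenticate(char[] password) {
        return password.length > 0;
    }

    public static void main(String[] args) {
        char[] password = {'s', 'e', 'c', 'r', 'e', 't'};
        try {
            System.out.println(authenticate(password));   // true
        } finally {
            // Overwrite the characters as soon as they are no longer needed,
            // instead of waiting for the garbage collector.
            Arrays.fill(password, '\0');
        }
        System.out.println(password[0] == '\0');          // true
    }
}
```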

Depending on the security manager, you may not be able to execute your code.

Solution 4

You are using reflection to access the "implementation details" of the String object. Immutability is a feature of the public interface of an object.

Solution 5

Visibility modifiers and final (i.e. immutability) are not a measure against malicious code in Java; they are merely tools to protect against mistakes and to make the code more maintainable (one of the big selling points of the system). That is why you can access internal implementation details like the backing char array for Strings via reflection.

The second effect you see is that all Strings change while it looks like you only change s1. It is a certain property of Java String literals that they are automatically interned, i.e. cached. Two String literals with the same value will actually be the same object. When you create a String with new it will not be interned automatically and you will not see this effect.
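The difference between literals and strings created with new can be seen in a minimal sketch (the class name InternDemo is made up for this example):

```java
final class InternDemo {
    public static void main(String[] args) {
        String literal = "Hello World";
        String sameLiteral = "Hello World";
        String fresh = new String("Hello World");

        System.out.println(literal == sameLiteral);      // true: both literals are the same interned object
        System.out.println(literal == fresh);            // false: 'new' always creates a separate object
        System.out.println(literal.equals(fresh));       // true: same characters
        System.out.println(literal == fresh.intern());   // true: intern() returns the pooled instance
    }
}
```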

#substring until recently (Java 7u6) worked in a similar way, which would have explained the behaviour in the original version of your question. It didn't create a new backing char array but reused the one from the original String; it just created a new String object that used an offset and a length to present only a part of that array. This generally worked as Strings are immutable - unless you circumvent that. This property of #substring also meant that the whole original String couldn't be garbage collected when a shorter substring created from it still existed.
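The old sharing scheme can be approximated with a small class. SharedString is a hypothetical simulation written for this illustration, not the historical JDK source. It keeps a shared array plus an offset and a count, and omits bounds checks for brevity.

```java
/** Hypothetical simulation of a string that shares its backing array with substrings. */
final class SharedString {
    private final char[] value;   // possibly shared with other instances
    private final int offset;     // first character used by this instance
    private final int count;      // number of characters used by this instance

    SharedString(String text) {
        this(text.toCharArray(), 0, text.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Creates a view onto the same array; no characters are copied. */
    SharedString substring(int beginIndex) {
        return new SharedString(value, offset + beginIndex, count - beginIndex);
    }

    /** Size of the array this instance keeps reachable. */
    int backingLength() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedString word = new SharedString("Hello World").substring(6);

        System.out.println(word);                  // World
        System.out.println(word.backingLength());  // 11: the whole original array stays alive
    }
}
```

With this layout a short substring keeps the entire original array reachable, which is the garbage collection drawback described above.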

As of current Java and your current version of the question there is no strange behaviour of #substring.


Author by

Darshan Patel


Updated on March 10, 2021

Comments

Darshan Patel about 3 years

We all know that String is immutable in Java, but check the following code:

String s1 = "Hello World";  
String s2 = "Hello World";  
String s3 = s1.substring(6);  
System.out.println(s1); // Hello World  
System.out.println(s2); // Hello World  
System.out.println(s3); // World  

Field field = String.class.getDeclaredField("value");  
field.setAccessible(true);  
char[] value = (char[])field.get(s1);  
value[6] = 'J';  
value[7] = 'a';  
value[8] = 'v';  
value[9] = 'a';  
value[10] = '!';  

System.out.println(s1); // Hello Java!  
System.out.println(s2); // Hello Java!  
System.out.println(s3); // World

Why does this program behave like this? And why are the values of s1 and s2 changed, but not s3?

Clockwork-Muse over 10 years

Note that some aspects of this (the interning, sharing/not sharing of backing arrays) are essentially dependent on the VM and should not be relied upon for anything other than what it's used for: to cut down on memory used/as an optimization. Hoping for other, undocumented, functionality (ie comparing two strings with ==) is going to get you in trouble.
Harald K over 10 years

Actually, String interning is part of the JLS ("a string literal always refers to the same instance of class String"). But I agree, it's not good practice to count on the implementation details of the String class.
sleske over 10 years

Actually, visibility modifiers are (or at least were) intended as protection against malicious code - however, you need to set a SecurityManager (System.setSecurityManager() ) to activate the protection. How secure this actually is is another question...
Tom W over 10 years

Deserves an upvote because you emphasise that access modifiers are not intended to 'protect' code. This seems to be widely misunderstood in both Java and .NET. Although the previous comment does contradict that; I don't know much about Java, but in .NET this is certainly true. In neither language should users assume this makes their code hack-proof.
ntoskrnl over 10 years

It's not possible to violate the contract of final even through reflection. Also, as mentioned in another answer, since Java 7u6, #substring doesn't share arrays.
Sam Harwell over 10 years

-1 for saying "String isn't immutable." I feel this is a misrepresentation of the situation, especially in comparison to the descriptions provided in some of the other answers to this question.
Harald K over 10 years

If the OP was using an older Sun/Oracle JRE, the last statement would print "Java!" (as he accidentally posted). This only affects the sharing of the value array between strings and substrings. You still can't change the value without tricks, like reflection.
Jeppe Stig Nielsen over 10 years

Maybe the reason why substring copies rather than using a "section" of the existing array is that otherwise, if I had a huge string s and took out a tiny substring called t from it, and I later abandoned s but kept t, then the huge array would be kept alive (not garbage collected). So maybe it is more natural for each string value to have its own associated array?
Aleksandr Dubinsky over 10 years

@JeppeStigNielsen Yes, that's the reason. It happens often in web programming, although people used the workaround new String(a.substring(..)).
Harald K over 10 years

@Jeppe, Alexandr The downside is that substring used to be an almost "free" operation due to the sharing, and now it's potentially costly. :-/
Yuriy Kulikov over 10 years

Hey haraldk, could you please post a link if you have any regarding the mentioned "change the lookup table used in Integer autoboxing"? Thanks a lot in advance.
Harald K over 10 years

@YuriyKulikov: I first heard about it in a talk at JavaZone, I think it was by Heinz Kabutz. But it's the same technique described by Richard Tingle in the comments section of the question.
Harald K over 10 years

Actually, the behavior of final has changed over time... :-O According to the "Reflection Madness" talk by Heinz I posted in the other thread, final meant final in JDK 1.1, 1.3 and 1.4, but could be modified using reflection using 1.2 always, and in 1.5 and 6 in most cases...
Holger over 10 years

Sharing arrays between a string and its substrings also implied that every String instance had to carry variables for remembering the offset into the referred array and length. That’s an overhead not to ignore given the total number of strings and the typical ratio between normal strings and substrings in an application. Since they had to get evaluated for every string operation it meant slowing down every string operation just for the benefit of just one operation, a cheap substring.
Holger over 10 years

final fields can be changed through native code as done by the Serialization framework when reading the fields of a serialized instance as well as System.setOut(…) which modifies the final System.out variable. The latter is the most interesting feature as reflection with access override cannot change static final fields.
SpacePrez over 10 years

This only works for literals and is a compile-time optimization.
Bohemian over 10 years

If you change the visibility of fields/methods it isn't useful because at compile time they are private
Hot Licks over 10 years

@Holger - Yep, my understanding is that the offset field was dropped in recent JVMs. And even when it was present it was not used that often.
supercat over 10 years

@Holger: If the definition of String is attached to a particular runtime and compatibility with reflection-based string manipulation code is not required, then String could be within the runtime implemented as an abstract class with derived types SimpleString, SubString, etc. and a jinxed getType() which always returned String. Such a runtime could also do things like omit the cached hashCode when making strings less than 16 characters long, use a Byte[] rather than Char[] as the backing store for strings containing only ASCII, etc.
supercat over 10 years

Real ROM is just as immutable as a photographic print encased in plastic. The pattern is permanently set when the wafer (or print) is chemically developed. Electrically-alterable memories, including RAM chips, can behave as "true" ROM if the control signals necessary to write it cannot be energized without adding additional electrical connections to the circuit wherein it is installed. It's actually not uncommon for embedded devices to include RAM which is set at the factory and maintained by a back-up battery, and whose contents would need to be reloaded by the factory if the battery failed.
Holger over 10 years

@supercat: the fact that java.lang.String is final is part of the specification. There can’t be subclasses. The implementation could do lots of things via delegation internally but this would create an overhead in a very often used class. Most implementations rely on the inlining of most String methods.
supercat over 10 years

@Holger: The runtime cannot use subclasses which can be recognized as such by running code, so if s1 and s2 are strings, then s1.getClass()==s2.getClass() must be true, but I don't think that means the runtime couldn't behind the scenes use multiple classes to hold "string" values but jinx getType() and related methods so that all such objects would appear to be of type String. Does the standard mandate fields other than length?
Gray over 10 years

You can change the accessibility on methods but you can't change their public/private status and you can't make them be static.
Gray over 10 years

How did this answer add anything to the answers before you?
Paŭlo Ebermann over 10 years

Also note that this is a quite new behaviour, and not guaranteed by any spec.
Ted Pennings over 10 years

It might have been a thread-safety issue that was masked by slower execution time and less concurrency without JIT.
Eric Jablow over 10 years

The implementation of String.substring(int, int) changed with Java 7u6. Before 7u6, the JVM would just keep a pointer to the original String's char[] together with an index and length. After 7u6, it copies the substring into a new String. There are pros and cons.
Andrey Chaschev over 10 years

@TedPennings From my description it could, I just didn't want to go too much into details. I actually spent like a couple of days trying to localize it. It was a single-threaded algorithm which calculated a distance between two texts written in two different languages. I found two possible fixes for the issue - one was to turn off the JIT and the second one was to add literally no-op String.format("") inside one of the inner loops. There is a chance for it being some-other-than-JIT-failure issue, but I believe it was JIT, because this issue was never reproduced again after adding this no-op.
Andrey Chaschev over 10 years

I was doing this with an early version of JDK ~7u9, so it could be it.
Chris Hayes over 10 years

@Zaphod42 Not true. You can also call intern manually on a non-literal String and reap the benefits.
Holger over 10 years

@supercat: the standard does not mandate any fields in the String class. And the most recent versions of Oracle’s implementation have no length field. They simply use the array’s length. Using a kind-of internal subclassing still opens the performance issues I described. Having different implementation code hinders inlining and turns unconditional code into conditional. And it would make it impossible to implement the class(es) using plain standard Java code anymore.
Holger over 10 years

@Andrey Chaschev: “I found two possible fixes for the issue”… the third possible fix, not to hack into the String internals, did not come into your mind?
Holger over 10 years

@Ted Pennings: thread-safety issues and JIT issues are often the very same. The JIT is allowed to generate code which relies on the final field thread safety guarantees which break when modifying the data after object construction. So you can view it as a JIT issue or a MT issue just as you like. The real issue is to hack into the String and modify data which are expected to be immutable.
supercat over 10 years

@Holger: My understanding is a packaged Java installation includes both the runtime and framework classes, so a version of String that ships with a particular runtime would not be expected to run with any other. If that is the case, I would think that speed loss incurred as a result of method dispatch could be offset by having many String natives written in optimized native code included within the runtime engine itself, which would be able to do things not possible within Java [e.g. substring could read and write strings 32 bits at a time]. Personally...
supercat over 10 years

...I would have liked to have seen string be a "primitive" which might hold a char[] (which would be inaccessible to user code) but could hold anything the implementers saw fit. That would have allowed Java to enforce string immutability, allowed == to compare string values, etc. It would also have allowed for the possibility of strings being kept in a separate heap with a GC that could accommodate shrinking objects [so that a 500-character substring of a 5,000 character string could use the storage from the original, but if all references to the original vanished...
supercat over 10 years

...only the portions to which references still existed would need to be retained]. It's all academic now, but if string logic is included in the same distributable as the JVM, I wouldn't see that there would be a problem with the JVM using some "magic" to improve string handling.
Holger over 10 years

@supercat: it doesn’t matter whether you have native code or not, having different implementations for strings and substring within the same JVM or having byte[] strings for ASCII strings and char[] for others implies that every operation has to check which kind of string it is before operating. This hinders inlining of the code into the methods using strings which is the first step of further optimizations using the context information of the caller. This is a big impact.
Holger over 10 years

@supercat: Other things you mention are possible with an up-to-date JVM taking the String implementation as it is. E.g. char array copying will be transformed into a 32 bit or even 64 bit at a time copying. And the gc can detect when a char[] array is exclusively used by Strings and do some magic (though Oracle’s developers seem to have decided against such magic). Similar things apply to the String equality. Equal Strings might have a different identity due to the specification, but their internal array might be the same, making equals a trivial method.
cHao over 10 years

Note, though: you want to use intern judiciously. Interning everything doesn't gain you much, and can be the source of some head-scratching moments when you add reflection to the mix.
cHao over 10 years

@supercat: Your computer is not one of those embedded systems, though. :) True hard-wired ROMs haven't been common in PCs for a decade or two; everything's EEPROM and flash these days. Basically every user-visible address that refers to memory, refers to potentially writable memory.
supercat over 10 years

@cHao: Many flash chips allow portions to be write-protected in a fashion which, if it can be undone at all, would require applying different voltages than would be required for normal operation (which motherboards would not be equipped to do). I would expect motherboards to use that feature. Further, I'm not certain about today's computers, but historically some computers have had a region of RAM which was write-protected during the boot stage and could only be unprotected by a reset (which would force execution to start from ROM).
Scott Wisniewski over 10 years

@supercat I think you are missing the point of the topic, which is that the strings, stored in RAM, aren't going to ever be truly immutable.
c0der over 4 years

Test1 and Test2 are inconsistent with test1==test2 and do not follow Java naming conventions.
Bill K over 4 years

@Holger the overhead of the start/length variables was not the problem; strangely, it was an unexpected usage pattern. People would download an entire web page into a string, then get a short substring out of that string and get rid of the original string, assuming that it would be garbage collected, but unexpectedly their "substring" still had the entire text of the original string taking up memory. Doing this with a few dozen pages led to instant and very confusing memory problems, so Java was updated to adapt to this unexpected usage rather than force people to adapt to unexpected behavior.
Holger over 4 years

@BillK since that was the behavior for two decades, it was well understood, documented and not very surprising. Compare with this question, which is about the surprise that this array sharing did not happen anymore. The behavior surely stayed the same if it wasn’t the case that this behavior favored a small corner case (fast substrings) to the disadvantage of the majority (yes that includes surprises about the memory consumption, but not only). Nowadays, we have String Deduplication, which wouldn’t work that smoothly, if it had to atomically update three fields together.
Bill K over 4 years

@Holger It's funny how we developers tend to think that because we have been programming for a long time and know about bizarre behaviors, they aren't surprises. Surprises are, by definition, behaviors you had to learn about and were therefore documented, those are the most important to address.
Holger over 4 years

@BillK you don't need to convince me that this behavior was a surprising behavior. I'm just saying that this was not the main motivation to change that. The JRE developers did not suddenly realize that this behavior is surprising. They already knew for twenty years. Actual performance studies with real life applications had a much higher weight.
deepakl.2000 about 2 years

@c0der Can you write the entire Java program with a public static void main method to depict this?
c0der about 2 years

@deepakl.2000 run it online: online-ide.com/03crhm6aOu . You'll get a compilation error. The variable names are inconsistent.
deepakl.2000 about 2 years

@c0der Where is the entire program which depicts Pointer[x] and Pointer[y]