Is a Java string really immutable?
Solution 1
String is immutable* but this only means you cannot change it using its public API.
What you are doing here is circumventing the normal API, using reflection. In the same way, you can change the values of enums, change the lookup table used in Integer autoboxing, etc.
Now, the reason s1 and s2 change value is that they both refer to the same interned string. The compiler does this (as mentioned by other answers).
The reason s3 does not was actually a bit surprising to me, as I thought it would share the value array (it did in earlier versions of Java, before Java 7u6). However, looking at the source code of String, we can see that the value character array for a substring is actually copied (using Arrays.copyOfRange(..)). This is why it goes unchanged.
You can install a SecurityManager to prevent malicious code from doing such things. But keep in mind that some libraries depend on these kinds of reflection tricks (typically ORM tools, AOP libraries, etc.).
*) I initially wrote that Strings aren't really immutable, just "effectively immutable". This might be misleading in the current implementation of String, where the value array is indeed marked private final. It's still worth noting, though, that there is no way to declare an array in Java as immutable, so care must be taken not to expose it outside its class, even with the proper access modifiers.
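To make the footnote concrete, here is a minimal sketch of why private final does not make an array immutable. The classes Leaky and Guarded are hypothetical and exist only for this illustration; they are not part of the JDK. Leaky hands out its internal array, while Guarded returns a defensive copy.

```java
import java.util.Arrays;

public class FinalArrayDemo {
    // Hypothetical class: the field is private final, but the getter leaks the array itself.
    static final class Leaky {
        private final char[] value = {'a', 'b', 'c'};
        char[] chars() { return value; }
        @Override public String toString() { return new String(value); }
    }

    // Hypothetical class: same field, but callers only ever receive a copy.
    static final class Guarded {
        private final char[] value = {'a', 'b', 'c'};
        char[] chars() { return Arrays.copyOf(value, value.length); }
        @Override public String toString() { return new String(value); }
    }

    // Writes through the leaked reference; the object's visible state changes.
    static String mutateLeaky() {
        Leaky leaky = new Leaky();
        leaky.chars()[0] = 'X';
        return leaky.toString();
    }

    // Writes into the copy; the object's visible state is untouched.
    static String mutateGuarded() {
        Guarded guarded = new Guarded();
        guarded.chars()[0] = 'X';
        return guarded.toString();
    }

    public static void main(String[] args) {
        System.out.println(mutateLeaky());   // Xbc
        System.out.println(mutateGuarded()); // abc
    }
}
```

final only fixes which array the field points to; the elements stay writable for anyone holding a reference.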
As this topic seems overwhelmingly popular, here's some suggested further reading: Heinz Kabutz's Reflection Madness talk from JavaZone 2009, which covers a lot of the issues in the OP, along with other reflection... well... madness.
It covers why this is sometimes useful. And why, most of the time, you should avoid it. :-)
Solution 2
In Java, if two String variables are initialized with the same literal, both are assigned the same reference (String is a reference type, not a primitive):
String test1 = "Hello World";
String test2 = "Hello World";
System.out.println(test1 == test2); // true
That is the reason the comparison returns true. The third string is created using substring(), which makes a new string instead of pointing to the same one.
When you access a string using reflection, you get the actual pointer:
Field field = String.class.getDeclaredField("value");
field.setAccessible(true);
So a change to this array will change every string holding a pointer to it, but as s3 is a new string created by substring(), it would not change.
Solution 3
You are using reflection to circumvent the immutability of String - it's a form of "attack".
There are lots of examples you can create like this (e.g. you can even instantiate a Void object too), but it doesn't mean that String is not "immutable".
There are use cases where this type of code may be used to your advantage and be "good coding", such as clearing passwords from memory at the earliest possible moment (before GC).
Depending on the security manager, you may not be able to execute your code.
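For the password use case mentioned above, a commonly used alternative that avoids reflection is to keep the secret in a char[] and overwrite it once it is no longer needed. Note that this is a swapped-in technique, not the reflection approach this answer describes. The sketch below assumes a hypothetical authenticate method standing in for real logic.

```java
import java.util.Arrays;

public class PasswordClearDemo {
    // Hypothetical check, standing in for real authentication logic.
    static boolean authenticate(char[] password) {
        return password.length > 0;
    }

    // Uses the password, then overwrites every element so the secret
    // does not stay readable in this array until garbage collection.
    static boolean loginAndClear(char[] password) {
        try {
            return authenticate(password);
        } finally {
            Arrays.fill(password, '\0');
        }
    }

    // True when every element has been overwritten with the NUL character.
    static boolean isCleared(char[] password) {
        for (char c : password) {
            if (c != '\0') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] password = {'s', '3', 'c', 'r', 'e', 't'};
        System.out.println(loginAndClear(password)); // true
        System.out.println(isCleared(password));     // true
    }
}
```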
Solution 4
You are using reflection to access the "implementation details" of the String object. Immutability is a feature of the public interface of an object.
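The same distinction can be shown without touching the JDK's own classes. The sketch below uses a hypothetical TinyString class (an assumption made for illustration, not how java.lang.String is implemented today) whose public API offers no way to change it, and then mutates it through reflection exactly as the question does.

```java
import java.lang.reflect.Field;

public class ReflectionDemo {
    // Hypothetical stand-in for String: no public method can modify it.
    static final class TinyString {
        private final char[] value;
        TinyString(String s) { this.value = s.toCharArray(); }
        @Override public String toString() { return new String(value); }
    }

    // Reads the private array via reflection and overwrites part of it in place.
    static String mutateViaReflection() {
        try {
            TinyString s = new TinyString("Hello World");
            Field field = TinyString.class.getDeclaredField("value");
            field.setAccessible(true);
            char[] value = (char[]) field.get(s);
            // Copy "Java!" over "World", starting at index 6.
            "Java!".getChars(0, 5, value, 6);
            return s.toString();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(mutateViaReflection()); // Hello Java!
    }
}
```

The final field is never reassigned here; only the contents of the array it refers to are changed.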
Solution 5
Visibility modifiers and final (i.e. immutability) are not a safeguard against malicious code in Java; they are merely tools to protect against mistakes and to make the code more maintainable (one of the big selling points of the system). That is why you can access internal implementation details like the backing char array for Strings via reflection.
The second effect you see is that all Strings change while it looks like you only change s1. It is a certain property of Java String literals that they are automatically interned, i.e. cached. Two String literals with the same value will actually be the same object. When you create a String with new, it will not be interned automatically and you will not see this effect.
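The interning behaviour described above can be checked directly with reference comparisons. This is a small sketch; the class and method names are made up for the example.

```java
public class InternDemo {
    // Two identical literals resolve to one interned instance.
    static boolean literalsShareInstance() {
        String a = "Hello World";
        String b = "Hello World";
        return a == b;
    }

    // new String(...) creates a distinct object with equal contents.
    static boolean newStringIsDistinct() {
        String literal = "Hello World";
        String copy = new String("Hello World");
        return literal != copy && literal.equals(copy);
    }

    // intern() maps the copy back to the pooled literal instance.
    static boolean internReturnsPooledInstance() {
        String literal = "Hello World";
        String copy = new String("Hello World");
        return copy.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());       // true
        System.out.println(newStringIsDistinct());         // true
        System.out.println(internReturnsPooledInstance()); // true
    }
}
```

Comparing with == is used here only to expose object identity; equals() remains the right way to compare string contents.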
#substring until recently (Java 7u6) worked in a similar way, which would have explained the behaviour in the original version of your question. It didn't create a new backing char array but reused the one from the original String; it just created a new String object that used an offset and a length to present only a part of that array. This generally worked as Strings are immutable - unless you circumvent that. This property of #substring also meant that the whole original String couldn't be garbage collected when a shorter substring created from it still existed.
As of current Java and your current version of the question there is no strange behaviour of #substring.
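To illustrate the old sharing scheme, here is a rough sketch using a hypothetical OldStyleString class with offset and count fields. It only models the idea described above and is not the actual pre-7u6 JDK source; backingArray() exists purely so the demo can simulate what reflection did in the question.

```java
public class SharedSubstringDemo {
    // Hypothetical model of a string that shares its backing array with substrings.
    static final class OldStyleString {
        private final char[] value;
        private final int offset;
        private final int count;

        OldStyleString(String s) {
            this(s.toCharArray(), 0, s.length());
        }

        private OldStyleString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // No copy is made: the substring is just a new window onto the same array.
        OldStyleString substring(int begin) {
            return new OldStyleString(value, offset + begin, count - begin);
        }

        // Exposed only for the demo, in place of reflective access.
        char[] backingArray() {
            return value;
        }

        @Override public String toString() {
            return new String(value, offset, count);
        }
    }

    static String demo() {
        OldStyleString s1 = new OldStyleString("Hello World");
        OldStyleString s3 = s1.substring(6);
        // Overwrite "World" with "Java!" in the shared array.
        "Java!".getChars(0, 5, s1.backingArray(), 6);
        return s1 + " / " + s3;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // Hello Java! / Java!
    }
}
```

Because s3 only stores a window onto the shared array, it changes together with s1, and the full array stays reachable for as long as s3 does.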
Comments
-
Darshan Patel about 3 years
We all know that String is immutable in Java, but check the following code:
String s1 = "Hello World";
String s2 = "Hello World";
String s3 = s1.substring(6);
System.out.println(s1); // Hello World
System.out.println(s2); // Hello World
System.out.println(s3); // World
Field field = String.class.getDeclaredField("value");
field.setAccessible(true);
char[] value = (char[])field.get(s1);
value[6] = 'J';
value[7] = 'a';
value[8] = 'v';
value[9] = 'a';
value[10] = '!';
System.out.println(s1); // Hello Java!
System.out.println(s2); // Hello Java!
System.out.println(s3); // World
Why does this program operate like this? And why is the value of s1 and s2 changed, but not s3?
-
Clockwork-Muse over 10 years
Note that some aspects of this (the interning, sharing/not sharing of backing arrays) are essentially dependent on the VM and should not be relied upon for anything other than what they are used for: cutting down on memory use, as an optimization. Hoping for other, undocumented functionality (i.e. comparing two strings with ==) is going to get you in trouble.
-
Harald K over 10 years
Actually, String interning is part of the JLS ("a string literal always refers to the same instance of class String"). But I agree, it's not good practice to count on the implementation details of the String class.
-
sleske over 10 years
Actually, visibility modifiers are (or at least were) intended as protection against malicious code - however, you need to set a SecurityManager (System.setSecurityManager()) to activate the protection. How secure this actually is is another question...
-
Tom W over 10 years
Deserves an upvote because you emphasise that access modifiers are not intended to 'protect' code. This seems to be widely misunderstood in both Java and .NET. Although the previous comment does contradict that; I don't know much about Java, but in .NET this is certainly true. In neither language should users assume this makes their code hack-proof.
-
ntoskrnl over 10 years
It's not possible to violate the contract of final even through reflection. Also, as mentioned in another answer, since Java 7u6, #substring doesn't share arrays.
-
Sam Harwell over 10 years
-1 for saying "String isn't immutable." I feel this is a misrepresentation of the situation, especially in comparison to the descriptions provided in some of the other answers to this question.
-
Harald K over 10 years
If the OP was using an older Sun/Oracle JRE, the last statement would print "Java!" (as he accidentally posted). This only affects the sharing of the value array between strings and substrings. You still can't change the value without tricks, like reflection.
-
Jeppe Stig Nielsen over 10 years
Maybe the reason why substring copies rather than using a "section" of the existing array is that otherwise, if I had a huge string s and took out a tiny substring called t from it, and I later abandoned s but kept t, then the huge array would be kept alive (not garbage collected). So maybe it is more natural for each string value to have its own associated array?
-
Aleksandr Dubinsky over 10 years
@JeppeStigNielsen Yes, that's the reason. It happens often in web programming, although people used the workaround new String(a.substring(..)).
-
Harald K over 10 years
@Jeppe, Alexandr The downside is that substring used to be an almost "free" operation due to the sharing, and now it's potentially costly. :-/
-
Yuriy Kulikov over 10 years
Hey haraldk, could you please post a link if you have any regarding the mentioned "change the lookup table used in Integer autoboxing"? Thanks a lot in advance.
-
Harald K over 10 years
@YuriyKulikov: I first heard about it in a talk at JavaZone, I think it was by Heinz Kabutz. But it's the same technique described by Richard Tingle in the comments section of the question.
-
Harald K over 10 years
Actually, the behavior of final has changed over time... :-O According to the "Reflection Madness" talk by Heinz I posted in the other thread, final meant final in JDK 1.1, 1.3 and 1.4, but could be modified using reflection in 1.2 always, and in 1.5 and 6 in most cases...
-
Holger over 10 years
Sharing arrays between a string and its substrings also implied that every String instance had to carry variables for remembering the offset into the referred array and length. That’s an overhead not to ignore given the total number of strings and the typical ratio between normal strings and substrings in an application. Since they had to get evaluated for every string operation it meant slowing down every string operation just for the benefit of just one operation, a cheap substring.
-
Holger over 10 years
final fields can be changed through native code, as done by the Serialization framework when reading the fields of a serialized instance, as well as System.setOut(…), which modifies the final System.out variable. The latter is the most interesting feature, as reflection with access override cannot change static final fields.
-
SpacePrez over 10 years
This only works for literals and is a compile-time optimization.
-
Bohemian over 10 years
If you change the visibility of fields/methods it isn't useful because at compile time they are private.
-
Hot Licks over 10 years
@Holger - Yep, my understanding is that the offset field was dropped in recent JVMs. And even when it was present it was not used that often.
-
supercat over 10 years
@Holger: If the definition of String is attached to a particular runtime and compatibility with reflection-based string manipulation code is not required, then String could be implemented within the runtime as an abstract class with derived types SimpleString, SubString, etc. and a jinxed getType() which always returned String. Such a runtime could also do things like omit the cached hashCode when making strings less than 16 characters long, use a Byte[] rather than Char[] as the backing store for strings containing only ASCII, etc.
-
supercat over 10 years
Real ROM is just as immutable as a photographic print encased in plastic. The pattern is permanently set when the wafer (or print) is chemically developed. Electrically-alterable memories, including RAM chips, can behave as "true" ROM if the control signals necessary to write it cannot be energized without adding additional electrical connections to the circuit wherein it is installed. It's actually not uncommon for embedded devices to include RAM which is set at the factory and maintained by a back-up battery, and whose contents would need to be reloaded by the factory if the battery failed.
-
Holger over 10 years
@supercat: the fact that java.lang.String is final is part of the specification. There can’t be subclasses. The implementation could do lots of things via delegation internally but this would create an overhead in a very often used class. Most implementations rely on the inlining of most String methods.
-
supercat over 10 years
@Holger: The runtime cannot use subclasses which can be recognized as such by running code, so if s1 and s2 are strings, then s1.getClass()==s2.getClass() must be true, but I don't think that means the runtime couldn't behind the scenes use multiple classes to hold "string" values but jinx getType() and related methods so that all such objects would appear to be of type String. Does the standard mandate fields other than length?
-
Gray over 10 years
You can change the accessibility on methods but you can't change their public/private status and you can't make them be static.
-
Gray over 10 years
How did this answer add anything to the answers before you?
-
Paŭlo Ebermann over 10 years
Also note that this is a quite new behaviour, and not guaranteed by any spec.
-
Ted Pennings over 10 years
It might have been a thread-safety issue that was masked by slower execution time and less concurrency without JIT.
-
Eric Jablow over 10 years
The implementation of String.substring(int, int) changed with Java 7u6. Before 7u6, the JVM would just keep a pointer to the original String's char[] together with an index and length. After 7u6, it copies the substring into a new String. There are pros and cons.
-
Andrey Chaschev over 10 years
@TedPennings From my description it could, I just didn't want to go too much into details. I actually spent like a couple of days trying to localize it. It was a single-threaded algorithm which calculated a distance between two texts written in two different languages. I found two possible fixes for the issue - one was to turn off the JIT and the second one was to add a literally no-op String.format("") inside one of the inner loops. There is a chance of it being some-other-than-JIT-failure issue, but I believe it was JIT, because this issue was never reproduced again after adding this no-op.
-
Andrey Chaschev over 10 years
I was doing this with an early version of JDK ~7u9, so it could be it.
-
Chris Hayes over 10 years
@Zaphod42 Not true. You can also call intern manually on a non-literal String and reap the benefits.
-
Holger over 10 years
@supercat: the standard does not mandate any fields in the String class. And the most recent versions of Oracle’s implementation have no length field. They simply use the array’s length. Using a kind-of internal subclassing still opens the performance issues I described. Having different implementation code hinders inlining and turns unconditional code into conditional. And it would make it impossible to implement the class(es) using plain standard Java code anymore.
-
Holger over 10 years
@Andrey Chaschev: “I found two possible fixes for the issue”… the third possible fix, not to hack into the String internals, did not come into your mind?
-
Holger over 10 years
@Ted Pennings: thread-safety issues and JIT issues are often the very same. The JIT is allowed to generate code which relies on the final field thread safety guarantees which break when modifying the data after object construction. So you can view it as a JIT issue or a MT issue just as you like. The real issue is to hack into the String and modify data which are expected to be immutable.
-
supercat over 10 years
@Holger: My understanding is a packaged Java installation includes both the runtime and framework classes, so a version of String that ships with a particular runtime would not be expected to run with any other. If that is the case, I would think that speed loss incurred as a result of method dispatch could be offset by having many String natives written in optimized native code included within the runtime engine itself, which would be able to do things not possible within Java [e.g. substring could read and write strings 32 bits at a time]. Personally...
-
supercat over 10 years
...I would have liked to have seen string be a "primitive" which might hold a char[] (which would be inaccessible to user code) but could hold anything the implementers saw fit. That would have allowed Java to enforce string immutability, allowed == to compare string values, etc. It would also have allowed for the possibility of strings being kept in a separate heap with a GC that could accommodate shrinking objects [so that a 500-character substring of a 5,000 character string could use the storage from the original, but if all references to the original vanished...
-
supercat over 10 years
...only the portions to which references still existed would need to be retained]. It's all academic now, but if string logic is included in the same distributable as the JVM, I wouldn't see that there would be a problem with the JVM using some "magic" to improve string handling.
-
Holger over 10 years
@supercat: it doesn’t matter whether you have native code or not, having different implementations for strings and substring within the same JVM or having byte[] strings for ASCII strings and char[] for others implies that every operation has to check which kind of string it is before operating. This hinders inlining of the code into the methods using strings which is the first step of further optimizations using the context information of the caller. This is a big impact.
-
Holger over 10 years
@supercat: Other things you mention are possible with an up-to-date JVM taking the String implementation as it is. E.g. char array copying will be transformed into a 32 bit or even 64 bit at a time copying. And the gc can detect when a char[] array is exclusively used by Strings and do some magic (though Oracle’s developers seem to have decided against such magic). Similar things apply to the String equality. Equal Strings might have a different identity due to the specification, but their internal array might be the same, making equals a trivial method.
-
cHao over 10 years
Note, though: you want to use intern judiciously. Interning everything doesn't gain you much, and can be the source of some head-scratching moments when you add reflection to the mix.
-
cHao over 10 years
@supercat: Your computer is not one of those embedded systems, though. :) True hard-wired ROMs haven't been common in PCs for a decade or two; everything's EEPROM and flash these days. Basically every user-visible address that refers to memory, refers to potentially writable memory.
-
supercat over 10 years
@cHao: Many flash chips allow portions to be write-protected in a fashion which, if it can be undone at all, would require applying different voltages than would be required for normal operation (which motherboards would not be equipped to do). I would expect motherboards to use that feature. Further, I'm not certain about today's computers, but historically some computers have had a region of RAM which was write-protected during the boot stage and could only be unprotected by a reset (which would force execution to start from ROM).
-
Scott Wisniewski over 10 years
@supercat I think you are missing the point of the topic, which is that the strings, stored in RAM, aren't going to ever be truly immutable.
-
c0der over 4 years
Test1 and Test1 are inconsistent with test1==test2 and do not follow Java naming conventions.
-
Bill K over 4 years
@Holger the overhead of the start/length variables was not the problem; strangely, it was an unexpected usage pattern. People would download an entire web page into a string, then get a short substring out of that string and get rid of the original string, assuming that it would be garbage collected, but unexpectedly their "substring" still had the entire text of the original string taking up memory. Doing this with a few dozen pages led to instant and very confusing memory problems, so Java was updated to adapt to this unexpected usage rather than force people to adapt to unexpected behavior.
-
Holger over 4 years
@BillK since that was the behavior for two decades, it was well understood, documented and not very surprising. Compare with this question, which is about the surprise that this array sharing did not happen anymore. The behavior surely stayed the same if it wasn’t the case that this behavior favored a small corner case (fast substrings) to the disadvantage of the majority (yes that includes surprises about the memory consumption, but not only). Nowadays, we have String Deduplication, which wouldn’t work that smoothly, if it had to atomically update three fields together.
-
Bill K over 4 years
@Holger It's funny how we developers tend to think that because we have been programming for a long time and know about bizarre behaviors, they aren't surprises. Surprises are, by definition, behaviors you had to learn about and were therefore documented, those are the most important to address.
-
Holger over 4 years
@BillK you don't need to convince me that this behavior was a surprising behavior. I'm just saying that this was not the main motivation to change that. The JRE developers did not suddenly realize that this behavior is surprising. They already knew for twenty years. Actual performance studies with real life applications had a much higher weight.
-
deepakl.2000 about 2 years
@c0der Can you write the entire Java program with a public static void main method to depict this?
-
c0der about 2 years
@deepakl.2000 run it online: online-ide.com/03crhm6aOu . You'll get a compilation error. The variable names are inconsistent.
-
deepakl.2000 about 2 years
@c0der Where is the entire program which depicts Pointer[x] and Pointer[y]