Is it safe to get values from a java.util.HashMap from multiple threads (no modification)?

Solution 1

Your idiom is safe if and only if the reference to the HashMap is safely published. Safe publication has nothing to do with the internals of HashMap itself; it concerns how the constructing thread makes the reference to the map visible to other threads.

Basically, the only possible race here is between the construction of the HashMap and any reading threads that may access it before it is fully constructed. Most of the discussion is about what happens to the state of the map object, but this is irrelevant since you never modify it - so the only interesting part is how the HashMap reference is published.

For example, imagine you publish the map like this:

import java.util.HashMap;

class SomeClass {
   public static HashMap<Object, Object> MAP;

   // Only the writer takes the lock; readers access MAP directly.
   public static synchronized void setMap(HashMap<Object, Object> m) {
     MAP = m;
   }
}

... and at some point setMap() is called with a map, and other threads are using SomeClass.MAP to access the map, and check for null like this:

HashMap<Object,Object> map = SomeClass.MAP;
if (map != null) {
  // ... use the map
} else {
  // ... some default behavior
}

This is not safe, even though it probably looks as though it is. The setter is synchronized, but the readers never acquire that lock, so there is no happens-before relationship between the write to SomeClass.MAP and the subsequent read on another thread, and the reading thread is free to see a partially constructed map. In that state pretty much anything can happen, and in practice it does things like put the reading thread into an infinite loop.

To safely publish the map, you need to establish a happens-before relationship between the writing of the reference to the HashMap (i.e., the publication) and the subsequent readers of that reference (i.e., the consumption). Conveniently, there are only a few easy-to-remember ways to accomplish that[1]:

  1. Exchange the reference through a properly locked field (JLS 17.4.5)
  2. Use static initializer to do the initializing stores (JLS 12.4)
  3. Exchange the reference via a volatile field (JLS 17.4.5), or as the consequence of this rule, via the AtomicX classes
  4. Initialize the value into a final field (JLS 17.5).

The ones most interesting for your scenario are (2), (3) and (4). In particular, (3) applies directly to the code I have above: if you transform the declaration of MAP to:

public static volatile HashMap<Object, Object> MAP;

then everything is kosher: readers who see a non-null value necessarily have a happens-before relationship with the store to MAP and hence see all the stores associated with the map initialization.
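As a minimal sketch of option (3), assuming a single writer thread that fully populates the map before storing it, the pattern could look like this. The class name VolatilePublication and its methods are hypothetical and only serve the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; names are illustrative only.
class VolatilePublication {
    // volatile: a read that sees the new reference also sees every write
    // the publishing thread made before storing it.
    static volatile Map<String, Integer> MAP;

    // Populate the map completely, then publish it. Never modify it afterwards.
    static void publish() {
        Map<String, Integer> m = new HashMap<>();
        m.put("answer", 42);
        MAP = m; // publication point (volatile write)
    }

    // Wait until some map is visible, then read from it without any locking.
    static int awaitAndRead(String key) {
        Map<String, Integer> m;
        while ((m = MAP) == null) {
            Thread.yield();
        }
        return m.get(key);
    }

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(VolatilePublication::publish);
        writer.start();
        System.out.println(awaitAndRead("answer")); // prints 42
        writer.join();
    }
}
```

Note that the reader copies MAP into a local variable once and then works only with that local, so it performs a single volatile read per lookup.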

The other approaches change the semantics of your code, since both (2) (using the static initializer) and (4) (using final) imply that you cannot set MAP dynamically at runtime. If you don't need to do that, then just declare MAP as a static final HashMap<> and you are guaranteed safe publication.
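A sketch of the static final variant, assuming the map contents are known at class-initialization time. StaticFinalPublication and buildMap are hypothetical names. The demo cannot prove safety by running (starting a thread already creates a happens-before edge); it only shows the shape of the idiom.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; names are illustrative only.
class StaticFinalPublication {
    // Assigned once during class initialization (JLS 12.4) and never modified
    // after buildMap() returns.
    static final Map<String, Integer> MAP = buildMap();

    private static Map<String, Integer> buildMap() {
        Map<String, Integer> m = new HashMap<>();
        m.put("one", 1);
        m.put("two", 2);
        return m;
    }

    // Reads the map from a freshly started thread and returns what it saw.
    static int readFromOtherThread(String key) {
        int[] result = new int[1];
        Thread reader = new Thread(() -> result[0] = MAP.get(key));
        reader.start();
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(readFromOtherThread("two")); // prints 2
    }
}
```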

In practice, the rules are simple for safe access to "never-modified objects":

If you are publishing an object which is not inherently immutable (as in all fields declared final) and:

  • You can already create the object that will be assigned at the moment of declaration[a]: just use a final field (including static final for static members).
  • You want to assign the object later, after the reference is already visible: use a volatile field[b].

That's it!
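For the first rule applied to a member field, a minimal sketch might look like the following. Registry is a hypothetical class; the sketch assumes the map is copied in the constructor, that `this` does not escape the constructor, and that the map is never modified afterwards.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder class; names are illustrative only.
final class Registry {
    // final: threads that obtain a Registry after construction see the map
    // as it was when the constructor finished (JLS 17.5).
    private final Map<String, String> entries;

    Registry(Map<String, String> source) {
        // Defensive copy, so later changes by the caller cannot reach this map.
        this.entries = new HashMap<>(source);
    }

    // Read-only access; nothing in this class mutates the map after construction.
    String get(String key) {
        return entries.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>();
        source.put("k", "v");
        Registry registry = new Registry(source);
        source.put("k", "changed"); // does not affect the registry's copy
        System.out.println(registry.get("k")); // prints v
    }
}
```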

In practice, it is very efficient. The use of a static final field, for example, allows the JVM to assume the value is unchanged for the life of the program and optimize it heavily. The use of a final member field allows most architectures to read the field in a way equivalent to a normal field read and doesn't inhibit further optimizations[c].

Finally, the use of volatile does have some impact: no hardware barrier is needed on many architectures (such as x86, specifically those that don't allow reads to pass reads), but some optimization and reordering may not occur at compile time - but this effect is generally small. In exchange, you actually get more than what you asked for - not only can you safely publish one HashMap, you can store as many more not-modified HashMaps as you want to the same reference and be assured that all readers will see a safely published map.
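The point about storing further never-modified maps to the same reference can be sketched as a simple swap-the-whole-map pattern. ReloadableConfig and its methods are hypothetical names, and the sketch assumes each map is built completely before being stored and is never touched again.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; names are illustrative only.
class ReloadableConfig {
    private static volatile Map<String, String> current = new HashMap<>();

    // Build a brand-new map and swap the reference. The previously published
    // map is never modified, so readers still holding it remain safe.
    static void reload(Map<String, String> newValues) {
        Map<String, String> next = new HashMap<>(newValues);
        current = next; // volatile write publishes the fully built map
    }

    // One volatile read of the reference, then ordinary HashMap reads.
    static String get(String key) {
        return current.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> v1 = new HashMap<>();
        v1.put("mode", "fast");
        reload(v1);
        System.out.println(get("mode")); // prints fast

        Map<String, String> v2 = new HashMap<>();
        v2.put("mode", "safe");
        reload(v2);
        System.out.println(get("mode")); // prints safe
    }
}
```

With several concurrent writers the last store simply wins; if an update must be derived from the current map, some additional coordination (for example an AtomicReference with compareAndSet) would be needed.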

For more gory details, refer to Shipilev or this FAQ by Manson and Goetz.


[1] Directly quoting from Shipilev.


[a] That sounds complicated, but what I mean is that you can assign the reference at construction time - either at the declaration point or in the constructor (member fields) or static initializer (static fields).

[b] Optionally, you can use a synchronized method to get/set, or an AtomicReference or something, but we're talking about the minimum work you can do.

[c] Some architectures with very weak memory models (I'm looking at you, Alpha) may require some type of read barrier before a final read - but these are very rare today.

Solution 2

Jeremy Manson, the god when it comes to the Java Memory Model, has a three-part blog series on this topic, because in essence you are asking the question "Is it safe to access an immutable HashMap?" The answer to that is yes. But first you must answer the question it presupposes: "Is my HashMap immutable?" The answer might surprise you: Java has a relatively complicated set of rules to determine immutability.

For more info on the topic, read Jeremy's blog posts:

Part 1 on Immutability in Java: http://jeremymanson.blogspot.com/2008/04/immutability-in-java.html

Part 2 on Immutability in Java: http://jeremymanson.blogspot.com/2008/07/immutability-in-java-part-2.html

Part 3 on Immutability in Java: http://jeremymanson.blogspot.com/2008/07/immutability-in-java-part-3.html

Solution 3

The reads are safe from a synchronization standpoint but not a memory standpoint. This is something that is widely misunderstood among Java developers, including here on Stack Overflow. (Observe the rating of this answer for proof.)

If you have other threads running, they may not see an updated copy of the HashMap if there is no memory write out of the current thread. Memory writes occur through the use of the synchronized or volatile keywords, or through the use of some Java concurrency constructs.

See Brian Goetz's article on the new Java Memory Model for details.
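As a rough sketch of the synchronized route mentioned above, both the writer and the readers would have to go through the same lock for the visibility guarantee to hold. LockedPublication is a hypothetical class, and the sketch assumes the map is fully populated before setMap is called and never modified afterwards.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; names are illustrative only.
class LockedPublication {
    // Guarded by the LockedPublication.class monitor.
    private static Map<String, Integer> map;

    // Writer side: releasing the monitor makes the populated map visible
    // to any thread that later acquires the same monitor.
    static synchronized void setMap(Map<String, Integer> m) {
        map = m;
    }

    // Reader side: must lock on the same monitor as setMap.
    static synchronized Map<String, Integer> getMap() {
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("x", 7);
        setMap(m);
        System.out.println(getMap().get("x")); // prints 7
    }
}
```

Once a reader has obtained the reference through getMap(), it can call get() on the map without holding the lock, as long as nobody modifies the map.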

Solution 4

After a bit more looking, I found this in the Javadoc (emphasis mine):

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)

This seems to imply that it will be safe, assuming the converse of the statement there is true.

Solution 5

One note is that under some circumstances, a get() from an unsynchronized HashMap can cause an infinite loop. This can occur if a concurrent put() causes a rehash of the Map.

http://lightbody.net/blog/2005/07/hashmapget_can_cause_an_infini.html

Author: Dave L.

Updated on November 20, 2020

Comments

  • Dave L.
    Dave L. over 3 years

    There is a case where a map will be constructed, and once it is initialized, it will never be modified again. It will however, be accessed (via get(key) only) from multiple threads. Is it safe to use a java.util.HashMap in this way?

    (Currently, I'm happily using a java.util.concurrent.ConcurrentHashMap, and have no measured need to improve performance, but am simply curious if a simple HashMap would suffice. Hence, this question is not "Which one should I use?" nor is it a performance question. Rather, the question is "Would it be safe?")

    • user963601
      user963601 over 15 years
      Many answers here are correct regarding mutual exclusion from running threads, but incorrect regarding memory updates. I've voted up/down accordingly, but there are still many incorrect answers with positive votes.
    • kaqqao
      kaqqao about 8 years
      @Heath Borders, if the instance was a statically initialized unmodifiable HashMap, it should be safe for concurrent read (as other threads couldn't have missed updates as there were no updates), right?
    • user963601
      user963601 about 8 years
      If it's statically initialized and never modified outside of the static block, then it might be ok because all static initialization is synchronized by the ClassLoader. That's worth a separate question on its own. I'd still explicitly synchronize it and profile to verify that it was causing real performance issues.
    • BeeOnRope
      BeeOnRope over 7 years
      @HeathBorders - what do you mean by "memory updates"? The JVM is a formal model which defines things like visibility, atomicity, happens-before relationships, but doesn't use terms like "memory updates". You should clarify, preferably using terminology from the JLS.
    • BeeOnRope
      BeeOnRope over 7 years
      @Dave - I assume you aren't still looking for answer after 8 years, but for the record, the key confusion in nearly all the answers is that they focus on the actions you take on the map object. You've already explained that you never modify the object, so that is all irrelevant. The only potential "gotcha" then is how you publish the reference to the Map, which you didn't explain. If you don't do it safely, it is not safe. If you do it safely, it is. Details in my answer.
  • Dave L.
    Dave L. over 15 years
    Thanks for the warning, but there's no attempts to use null keys or values.
  • Dave L.
    Dave L. over 15 years
    I think there are classes where simply reading concurrently can get you into trouble, because of internal use of temporary instance variables, for example. So one probably needs to carefully examine source, more than a quick scan for locking / mutex code.
  • Dave L.
    Dave L. over 15 years
    It's a good point, but I'm relying on static initialization, during which no references escape, so it should be safe.
  • Alexander
    Alexander over 15 years
    Sorry for the dual submission Heath, I only noticed yours after I submitted mine. :)
  • user963601
    user963601 over 15 years
    I'm just glad there are other people here that actually understand the memory-effects.
  • Dave L.
    Dave L. over 15 years
    Indeed, though no thread will see the object before it is initialized properly, so I don't think that is a concern in this case.
  • Dave L.
    Dave L. over 15 years
    I'm not sure that the threads accessing it will acquire any lock, but I am sure they won't get a reference to the object until after it has been initialized, so I don't think they can have a stale copy.
  • Alex Miller
    Alex Miller over 15 years
    While this is excellent advice, as other answers state, there is a more nuanced answer in the case of an immutable, safely published map instance. But you should do that only if You Know What You're Doing.
  • Dave L.
    Dave L. over 15 years
    Hopefully with questions like these more of us can Know What We're Doing.
  • Chris Vest
    Chris Vest over 15 years
    @Alex: The reference to the HashMap can be volatile to create the same memory visibility guarantees. @Dave: It is possible to see references to new objs before the work of its ctor becomes visible to your thread.
  • Dave L.
    Dave L. over 15 years
    @Christian In the general case, certainly. I was saying that in this code, it isn't.
  • Taylor Gautier
    Taylor Gautier about 15 years
    I don't agree with this answer. "Concurrent reads from a HashMap are safe" by itself is incorrect. It doesn't state whether the reads are occurring against a map that is mutable or immutable. To be correct it should read "Concurrent reads from an immutable HashMap are safe"
  • Steve Jessop
    Steve Jessop about 15 years
    Not according to the articles you yourself linked to: the requirement is that the map must not be changed (and previous changes must be visible to all the reader threads), not that it be immutable (which is a technical term in Java and is a sufficient but not necessary condition for safety).
  • Vishy
    Vishy about 15 years
    Actually I have seen this hang the JVM without consuming CPU (which is perhaps worse)
  • Bill Michell
    Bill Michell about 15 years
    That depends entirely on how the object is initialised.
  • Dave L.
    Dave L. over 14 years
    This is "safe" in that it enforces the immutability, but it doesn't address the thread safety issue. If the map is safe to access with the UnmodifiableMap wrapper then it is safe without it, and vice versa.
  • Alex Miller
    Alex Miller over 12 years
    I think this code has been rewritten such that it's no longer possible to get the infinite loop. But you still shouldn't be getting and putting on an unsynchronized HashMap, for other reasons.
  • Binita Bharati
    Binita Bharati over 8 years
    The question says that once the HashMap has been initialised, he doesn't intend to update it any further. From then on, he just wants to use it as a read-only data structure. I think it would be safe to do so, provided the data stored in his Map is immutable.
  • user963601
    user963601 over 8 years
    What are you basing this on? Did you read the links I provided in my answer?
  • shmosel
    shmosel over 8 years
    @AlexMiller even aside from the other reasons (I assume you're referring to safe publishing), I don't think an implementation change should be a reason to loosen access restrictions, unless it's explicitly allowed for by the documentation. As it happens, the HashMap Javadoc for Java 8 still contains this warning: Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
  • Pierre
    Pierre about 8 years
    Acquiring a RANDOM lock doesn't guarantee that the thread's whole CPU cache is cleared. It depends on the JVM implementation, and it's very likely not done this way.
  • Konstantin Milyutin
    Konstantin Milyutin about 8 years
    I agree with Pierre, I don't think that acquiring any lock will be enough. You have to synchronize on the same lock for the changes to become visible.
  • Snicolas
    Snicolas about 8 years
    This answer is low quality, it is the same as the answer from @taylor gauthier but with less details.
  • Ajax
    Ajax almost 8 years
    Ummmm... not to be an ass, but you have it backwards. Taylor said "no, go look at this blog post, the answer might surprise you", whereas this answer actually adds something new that I didn't know... About a happens-before relationship of the write of a final field in a constructor. This answer is excellent, and I'm glad I read it.
  • Ajax
    Ajax almost 8 years
    Also a note... initializing a class implicitly synchronizes on the same lock (yes, you can deadlock in static field initializers), so if your initialization happens statically, it would be impossible for anyone else to see it before initialization is complete, as they would have to be blocked in the ClassLoader.loadClass method on the same lock acquired... And if you are wondering about different classloaders having different copies of the same field, you would be correct... but that would be orthogonal to the notion of race conditions; static fields of a classloader share a memory fence.
  • BeeOnRope
    BeeOnRope over 7 years
    Huh? This is the only correct answer I found after scrolling through the higher rated answers. The key is safely published and this is the only answer that even mentions it.
  • BeeOnRope
    BeeOnRope over 7 years
    I fail to see how this is a highly rated answer (or even an answer). It doesn't, for one, even answer the question and it doesn't mention the one key principle that will decide whether it is safe or not: safe publication. The "answer" boils down to "it's tricky" and here are three (complex) links you can read.
  • BeeOnRope
    BeeOnRope over 7 years
    To be clear, there are no such terms as "memory write outs" or similar things in the standard. The JLS describes an abstract model which is mostly based on the happens-before relationship, so a claim like "The reads are safe from a synchronization standpoint but not a memory standpoint." can't really be validated (and it isn't clear to me what you mean by it). The key is safe publication, which ensures that all other threads that see the reference will see the hash map fully constructed. Without that you are unsafe from any standpoint.
  • user963601
    user963601 over 6 years
    Yes, I'm saying that you need a happens-before relationship via a volatile member or final member within a constructor.
  • Jiang YD
    Jiang YD over 5 years
    Never modifying the HashMap doesn't mean the state of the map object is thread safe, I think. God knows what the library implementation does if the official documentation doesn't say it is thread safe.
  • BeeOnRope
    BeeOnRope over 5 years
    @JiangYD - you are right there is a grey area there in some cases: when we say "modify" what we really mean is any action that internally performs some writes that might race with reads or writes on other threads. These writes might be internal implementation details, so even an operation that seems "read only" like get() might in fact perform some writes, say updating some statistics (or in the case of access-ordered LinkedHashMap updating the access order). So a well written class should provide some documentation which makes it clear if ...
  • BeeOnRope
    BeeOnRope over 5 years
    ... apparently "read only" operations are really internally read-only in the thread-safety sense. In C++ standard library for example, there is a blanket rule that member function marked const are truly read-only in such a sense (internally, they may still perform writes, but these will have to be made thread-safe). There is no const keyword in Java and I'm not aware of any documented blanket guarantee, but in general the standard library classes behave as expected, and exceptions are documented (see the LinkedHashMap example where RO ops like get are explicitly mentioned as unsafe).
  • BeeOnRope
    BeeOnRope over 5 years
    @JiangYD - finally, getting back to your original question, for HashMap we actually have right in the documentation the thread-safety behavior for this class: If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)
  • BeeOnRope
    BeeOnRope over 5 years
    So for HashMap methods that we expect to be read-only, are read-only, as they do not structurally modify the HashMap. Of course, this guarantee might not hold for arbitrary other Map implementations, but the question is about HashMap specifically.
  • Jiang YD
    Jiang YD over 5 years
    @BeeOnRope - Makes sense
  • Jose Quijada
    Jose Quijada about 5 years
    @BeeOnRope Does construction of the HashMap before invoking SomeClass.setMap onto the now volatile SomeClass.MAP field need to be synchronized in some way to ensure there's no reordering of instructions with respect to the act of storing the newly constructed HashMap onto volatile SomeClass.MAP? This article suggests that the HashMap puts need to be synchronized to ensure that other threads see the internals of the map in a consistent state, in addition to using volatile.
  • BeeOnRope
    BeeOnRope about 5 years
    @Jose - no, because a write to a volatile field synchronizes with the subsequent read. All actions prior to the write on the same thread happen-before the write, and the read happens-before all use on the consuming thread, so all the construction actions will be ordered before consumption by the accesses to the volatile field.
  • markspace
    markspace about 5 years
    This isn't really correct. As the other answers state, there must be a happens-before between the last modification and all subsequent "thread safe" reads. Normally this means you must safely publish the object after it has been created and its modifications are made. See the first, marked correct answer.
  • Jesse
    Jesse about 5 years
    He does answer the question at the very end of the first sentence. In terms of being an answer, he's raising the point that immutability (alluded to in the first paragraph of the question) is not straightforward, along with valuable resources that explain that topic further. The points don't measure whether it's an answer, it measures whether the answer was "useful" to others. The answer being accepted means it was the answer the OP was looking for, which your answer received.
  • BeeOnRope
    BeeOnRope almost 5 years
    @Jesse he is not answering the question at the end of the first sentence, he's answering the question "is it safe to access an immutable object", which may or may not apply to the OP's question as he points out in the next sentence. Essentially this is almost a link-only "go figure it yourself" type answer, which is not a good answer for SO. As for the upvotes, I think it's more a function of being 10.5 years old and a frequently searched topic. It has received only a very few net upvotes in the last several years so maybe people are coming around :).