Should you check if the map containsKey before using ConcurrentMap's putIfAbsent
Solution 1
Concurrency is hard. If you are going to bother with concurrent maps instead of straightforward locking, you might as well go for it. Indeed, don't do lookups more than necessary.
Set<X> set = map.get(name);
if (set == null) {
final Set<X> value = new HashSet<X>();
set = map.putIfAbsent(name, value);
if (set == null) {
set = value;
}
}
(Usual stackoverflow disclaimer: Off the top of my head. Not tested. Not compiled. Etc.)
Update: 1.8 has added computeIfAbsent
default method to ConcurrentMap
(and Map
which is kind of interesting because that implementation would be wrong for ConcurrentMap
). (And 1.7 added the "diamond operator" <>
.)
Set<X> set = map.computeIfAbsent(name, n -> new HashSet<>());
(Note, you are responsible for the thread-safety of any operations of the HashSet
s contained in the ConcurrentMap
.)
Solution 2
Tom's answer is correct as far as API usage goes for ConcurrentMap. An alternative that avoids using putIfAbsent is to use the computing map from the GoogleCollections/Guava MapMaker which auto-populates the values with a supplied function and handles all the thread-safety for you. It actually only creates one value per key and if the create function is expensive, other threads asking getting the same key will block until the value becomes available.
Edit from Guava 11, MapMaker is deprecated and being replaced with the Cache/LocalCache/CacheBuilder stuff. This is a little more complicated in its usage but basically isomorphic.
Solution 3
You can use MutableMap.getIfAbsentPut(K, Function0<? extends V>)
from Eclipse Collections (formerly GS Collections).
The advantage over calling get()
, doing a null check, and then calling putIfAbsent()
is that we'll only compute the key's hashCode once, and find the right spot in the hashtable once. In ConcurrentMaps like org.eclipse.collections.impl.map.mutable.ConcurrentHashMap
, the implementation of getIfAbsentPut()
is also thread-safe and atomic.
import org.eclipse.collections.impl.map.mutable.ConcurrentHashMap;
...
ConcurrentHashMap<String, MyObject> map = new ConcurrentHashMap<>();
map.getIfAbsentPut("key", () -> someExpensiveComputation());
The implementation of org.eclipse.collections.impl.map.mutable.ConcurrentHashMap
is truly non-blocking. While every effort is made not to call the factory function unnecessarily, there's still a chance it will be called more than once during contention.
This fact sets it apart from Java 8's ConcurrentHashMap.computeIfAbsent(K, Function<? super K,? extends V>)
. The Javadoc for this method states:
The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple...
Note: I am a committer for Eclipse Collections.
Solution 4
By keeping a pre-initialized value for each thread you can improve on the accepted answer:
Set<X> initial = new HashSet<X>();
...
Set<X> set = map.putIfAbsent(name, initial);
if (set == null) {
set = initial;
initial = new HashSet<X>();
}
set.add(Y);
I recently used this with AtomicInteger map values rather than Set.
Solution 5
In 5+ years, I can't believe no one has mentioned or posted a solution that uses ThreadLocal to solve this problem; and several of the solutions on this page are not threadsafe and are just sloppy.
Using ThreadLocals for this specific problem isn't only considered best practices for concurrency, but for minimizing garbage/object creation during thread contention. Also, it's incredibly clean code.
For example:
private final ThreadLocal<HashSet<X>>
threadCache = new ThreadLocal<HashSet<X>>() {
@Override
protected
HashSet<X> initialValue() {
return new HashSet<X>();
}
};
private final ConcurrentMap<String, Set<X>>
map = new ConcurrentHashMap<String, Set<X>>();
And the actual logic...
// minimize object creation during thread contention
final Set<X> cached = threadCache.get();
Set<X> data = map.putIfAbsent("foo", cached);
if (data == null) {
// reset the cached value in the ThreadLocal
listCache.set(new HashSet<X>());
data = cached;
}
// make sure that the access to the set is thread safe
synchronized(data) {
data.add(object);
}
Comments
-
Chris Dail almost 2 years
I have been using Java's ConcurrentMap for a map that can be used from multiple threads. The putIfAbsent is a great method and is much easier to read/write than using standard map operations. I have some code that looks like this:
ConcurrentMap<String, Set<X>> map = new ConcurrentHashMap<String, Set<X>>(); // ... map.putIfAbsent(name, new HashSet<X>()); map.get(name).add(Y);
Readability wise this is great but it does require creating a new HashSet every time even if it is already in the map. I could write this:
if (!map.containsKey(name)) { map.putIfAbsent(name, new HashSet<X>()); } map.get(name).add(Y);
With this change it loses a bit of readability but does not need to create the HashSet every time. Which is better in this case? I tend to side with the first one since it is more readable. The second would perform better and may be more correct. Maybe there is a better way to do this than either of these.
What is the best practice for using a putIfAbsent in this manner?
-
Markus Kull over 13 years+1 for "concurrency is hard" and using the returnvalue of putIfAbsent
-
Drupad Panchal almost 13 years@Markus - +1 to you as well for pointing out the obvious, yet easy to ignore fact that reusing the returnvalue is a good practice.
-
Matt Friedman almost 12 yearsI just tried this and it is a great solution. You get all the benefits of ConcurrentMap without having to worry about the putIfAbsent idioms, which are easy to mess up.
-
Mansoor Siddiqui over 11 yearsGood answer. Reminds me of double-checked locking: en.wikipedia.org/wiki/Double-checked_locking
-
Stu Thompson over 11 yearsI've been back to this answer more than once. :P
-
zerkms about 11 yearsWouldn't it still instantiate several instances of
HashSet
in case if several threads simultaneously failed theif (set == null)
check? -
Brett about 11 years@zerkms That can happen, and the second
if (set == null)
handles that with the first thread through winning. The situation is unlikely, and the extra allocation time is unlikely to be significant. You could insert a temporary value in order to give a single thread exclusive access, but that's likely to make general performance and reliability worse. -
petersaints almost 11 years@TomHawtin-tackline But wouldn't the suggestion of using contains and get in the first question equivalent?
-
Chris Dail almost 10 yearsI like this a lot. I asked this question years ago but with Java 8 this is a really nice solution.
-
Nathan about 8 yearsThis is 'thread-shared'. The ThreadLocal is to prevent unnecessary object creation (Map, and consequently Set creation is not cheap) during the "putIfAbsent()" call. The set is correctly and safely published to all threads.
-
Nathan about 8 yearsAnd when combined with
putIfAbsent()
, only one will win. Thusdata
will always be the same across all threads. -
karmakaze over 7 yearsAs noted in the update to the accepted answer, Java 1.8 adds computeIfAbsent which achieves the same result and is far simpler.
-
Holger about 7 years@petersaints: there is no sense in calling
containsKey
, if you are callingget
afterwards anyway (ifcontainsKey
returnedtrue
). Even worse, ifcontainsKey
returnstrue
you still have no guaranty that the subsequentget
will return a non-null
value, as a concurrent update can change the situation in-between these two calls (that’s called the check-then-act anti-pattern). In principle, you can do the entire operation using a singleputIfAbsent
operation, but sinceget
is known to be substantially faster, tryingget
first has a point. The unusedHashSet
is irrelevant. -
Brett about 7 yearsThe context this code needs to be used in is going to be tricky. It will cause an "interesting" to find bug down the road, if not straight off. I'm not convinced that it's a performance win either. (Also you need to lock access to the
HashSet
.) -
Brett about 7 yearsI don't think the performance of this will be great. You have to go through a
ThreadLocal
in addition every time you just needed aConcurrentMap.get
. (And you access of theHashSet
is not threadsafe.) -
Nathan about 7 yearsIt hits the lock on the concurrent map, so according to JSL Chapter 17.4 and the general behavior of locks, (edit) it is visible, but not threadsafe. Thank you for pointing that out, I've updated the answer. You are right about the performance ... however if one is wanting performance, one would not use a concurrent map nor the (default) thread local, but instead very different data structures; which is way, way beyond the scope of this question.
-
Nathan about 7 yearsI should also point out that the performance hit on ThreadLocal is way less than the performance hit on creating and instantiating new objects, specifically a Set...
-
Đỗ Công Bằng over 4 yearsWhat is the purpose of the second check
if (set == null) { set = value;}
please? We assign the value to set but does it present in the map? -
Brett over 4 years@CongBangDO
putIfAbsent
will returnnull
(assigned toset
) if the map now usesvalue
. If thename
key was already present, thenputIfAbsent
will return the existing retained value.