Should you check if the map containsKey before using ConcurrentMap's putIfAbsent

java performance concurrency concurrenthashmap

45,280

Solution 1

Concurrency is hard. If you are going to bother with concurrent maps instead of straightforward locking, you might as well go for it. Indeed, don't do lookups more than necessary.

Set<X> set = map.get(name);
if (set == null) {
    final Set<X> value = new HashSet<X>();
    set = map.putIfAbsent(name, value);
    if (set == null) {
        set = value;
    }
}

(Usual stackoverflow disclaimer: Off the top of my head. Not tested. Not compiled. Etc.)

Update: 1.8 has added computeIfAbsent default method to ConcurrentMap (and Map which is kind of interesting because that implementation would be wrong for ConcurrentMap). (And 1.7 added the "diamond operator" <>.)

Set<X> set = map.computeIfAbsent(name, n -> new HashSet<>());

(Note, you are responsible for the thread-safety of any operations of the HashSets contained in the ConcurrentMap.)

Solution 2

Tom's answer is correct as far as API usage goes for ConcurrentMap. An alternative that avoids using putIfAbsent is to use the computing map from the GoogleCollections/Guava MapMaker which auto-populates the values with a supplied function and handles all the thread-safety for you. It actually only creates one value per key and if the create function is expensive, other threads asking getting the same key will block until the value becomes available.

Edit from Guava 11, MapMaker is deprecated and being replaced with the Cache/LocalCache/CacheBuilder stuff. This is a little more complicated in its usage but basically isomorphic.

Solution 3

You can use MutableMap.getIfAbsentPut(K, Function0<? extends V>) from Eclipse Collections (formerly GS Collections).

The advantage over calling get(), doing a null check, and then calling putIfAbsent() is that we'll only compute the key's hashCode once, and find the right spot in the hashtable once. In ConcurrentMaps like org.eclipse.collections.impl.map.mutable.ConcurrentHashMap, the implementation of getIfAbsentPut() is also thread-safe and atomic.

import org.eclipse.collections.impl.map.mutable.ConcurrentHashMap;
...
ConcurrentHashMap<String, MyObject> map = new ConcurrentHashMap<>();
map.getIfAbsentPut("key", () -> someExpensiveComputation());

The implementation of org.eclipse.collections.impl.map.mutable.ConcurrentHashMap is truly non-blocking. While every effort is made not to call the factory function unnecessarily, there's still a chance it will be called more than once during contention.

This fact sets it apart from Java 8's ConcurrentHashMap.computeIfAbsent(K, Function<? super K,? extends V>). The Javadoc for this method states:

The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple...

Note: I am a committer for Eclipse Collections.

Solution 4

By keeping a pre-initialized value for each thread you can improve on the accepted answer:

Set<X> initial = new HashSet<X>();
...
Set<X> set = map.putIfAbsent(name, initial);
if (set == null) {
    set = initial;
    initial = new HashSet<X>();
}
set.add(Y);

I recently used this with AtomicInteger map values rather than Set.

Solution 5

In 5+ years, I can't believe no one has mentioned or posted a solution that uses ThreadLocal to solve this problem; and several of the solutions on this page are not threadsafe and are just sloppy.

Using ThreadLocals for this specific problem isn't only considered best practices for concurrency, but for minimizing garbage/object creation during thread contention. Also, it's incredibly clean code.

For example:

private final ThreadLocal<HashSet<X>> 
  threadCache = new ThreadLocal<HashSet<X>>() {
      @Override
      protected
      HashSet<X> initialValue() {
          return new HashSet<X>();
      }
  };


private final ConcurrentMap<String, Set<X>> 
  map = new ConcurrentHashMap<String, Set<X>>();

And the actual logic...

// minimize object creation during thread contention
final Set<X> cached = threadCache.get();

Set<X> data = map.putIfAbsent("foo", cached);
if (data == null) {
    // reset the cached value in the ThreadLocal
    listCache.set(new HashSet<X>());
    data = cached;
}

// make sure that the access to the set is thread safe
synchronized(data) {
    data.add(object);
}

View more solutions

45,280

Author by

Chris Dail

Director, Architect and Pragmatic Programmer

Updated on July 05, 2022

Comments

Chris Dail almost 2 years
I have been using Java's ConcurrentMap for a map that can be used from multiple threads. The putIfAbsent is a great method and is much easier to read/write than using standard map operations. I have some code that looks like this:
```
ConcurrentMap<String, Set<X>> map = new ConcurrentHashMap<String, Set<X>>();

// ...

map.putIfAbsent(name, new HashSet<X>());
map.get(name).add(Y);
```
Readability wise this is great but it does require creating a new HashSet every time even if it is already in the map. I could write this:
```
if (!map.containsKey(name)) {
    map.putIfAbsent(name, new HashSet<X>());
}
map.get(name).add(Y);
```
With this change it loses a bit of readability but does not need to create the HashSet every time. Which is better in this case? I tend to side with the first one since it is more readable. The second would perform better and may be more correct. Maybe there is a better way to do this than either of these.

What is the best practice for using a putIfAbsent in this manner?
Markus Kull over 13 years

+1 for "concurrency is hard" and using the returnvalue of putIfAbsent
Drupad Panchal almost 13 years

@Markus - +1 to you as well for pointing out the obvious, yet easy to ignore fact that reusing the returnvalue is a good practice.
Matt Friedman almost 12 years

I just tried this and it is a great solution. You get all the benefits of ConcurrentMap without having to worry about the putIfAbsent idioms, which are easy to mess up.
Mansoor Siddiqui over 11 years

Good answer. Reminds me of double-checked locking: en.wikipedia.org/wiki/Double-checked_locking
Stu Thompson over 11 years

I've been back to this answer more than once. :P
zerkms about 11 years

Wouldn't it still instantiate several instances of HashSet in case if several threads simultaneously failed the if (set == null) check?
Brett about 11 years

@zerkms That can happen, and the second if (set == null) handles that with the first thread through winning. The situation is unlikely, and the extra allocation time is unlikely to be significant. You could insert a temporary value in order to give a single thread exclusive access, but that's likely to make general performance and reliability worse.
petersaints almost 11 years

@TomHawtin-tackline But wouldn't the suggestion of using contains and get in the first question equivalent?
Chris Dail almost 10 years

I like this a lot. I asked this question years ago but with Java 8 this is a really nice solution.
Nathan about 8 years

This is 'thread-shared'. The ThreadLocal is to prevent unnecessary object creation (Map, and consequently Set creation is not cheap) during the "putIfAbsent()" call. The set is correctly and safely published to all threads.
Nathan about 8 years

And when combined with putIfAbsent(), only one will win. Thus data will always be the same across all threads.
karmakaze over 7 years

As noted in the update to the accepted answer, Java 1.8 adds computeIfAbsent which achieves the same result and is far simpler.
Holger about 7 years

@petersaints: there is no sense in calling containsKey, if you are calling get afterwards anyway (if containsKey returned true). Even worse, if containsKey returns true you still have no guaranty that the subsequent get will return a non-null value, as a concurrent update can change the situation in-between these two calls (that’s called the check-then-act anti-pattern). In principle, you can do the entire operation using a single putIfAbsent operation, but since get is known to be substantially faster, trying get first has a point. The unused HashSet is irrelevant.
Brett about 7 years

The context this code needs to be used in is going to be tricky. It will cause an "interesting" to find bug down the road, if not straight off. I'm not convinced that it's a performance win either. (Also you need to lock access to the HashSet.)
Brett about 7 years

I don't think the performance of this will be great. You have to go through a ThreadLocal in addition every time you just needed a ConcurrentMap.get. (And you access of the HashSet is not threadsafe.)
Nathan about 7 years

It hits the lock on the concurrent map, so according to JSL Chapter 17.4 and the general behavior of locks, (edit) it is visible, but not threadsafe. Thank you for pointing that out, I've updated the answer. You are right about the performance ... however if one is wanting performance, one would not use a concurrent map nor the (default) thread local, but instead very different data structures; which is way, way beyond the scope of this question.
Nathan about 7 years

I should also point out that the performance hit on ThreadLocal is way less than the performance hit on creating and instantiating new objects, specifically a Set...
Đỗ Công Bằng over 4 years

What is the purpose of the second check if (set == null) { set = value;} please? We assign the value to set but does it present in the map?
Brett over 4 years

@CongBangDO putIfAbsent will return null (assigned to set) if the map now uses value. If the name key was already present, then putIfAbsent will return the existing retained value.