ThreadLocal HashMap vs ConcurrentHashMap for thread-safe unbounded caches

15,447

Solution 1

Using a ThreadLocal as a cache is not good practice.

In most containers, threads are reused via thread pools and are never garbage-collected, so anything stored in a ThreadLocal lives as long as its thread. This can lead to surprising behavior and, effectively, a memory leak.
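If you cannot avoid it, the usual mitigation is to clear the ThreadLocal explicitly when the unit of work ends, since a pooled thread never dies on its own. A minimal sketch (the class and method names here are made up for illustration, not from the question):

```java
import java.util.HashMap;
import java.util.Map;

public class ThreadLocalCleanupExample {
    // One map per thread; on a pooled thread this lives until remove() is called.
    private static final ThreadLocal<Map<String, String>> LOCAL_CACHE =
            ThreadLocal.withInitial(HashMap::new);

    static String lookup(String key) {
        // Stand-in for the expensive computation in the question.
        return LOCAL_CACHE.get().computeIfAbsent(key, k -> k.toUpperCase());
    }

    static void handleRequest() {
        try {
            lookup("example");
        } finally {
            // Without this, the pooled thread retains the map indefinitely.
            LOCAL_CACHE.remove();
        }
    }
}
```

Calling remove() in a finally block is the same pattern servlet filters commonly use to keep pooled threads from retaining per-request state.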

With a ConcurrentHashMap you have to manage the entries yourself in order to prevent a memory leak.

If you insist on rolling your own, I suggest using weak or soft references and evicting entries once the cache reaches a maximum size.
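One JDK-only way to get the size-based eviction suggested above is a synchronized LinkedHashMap in access order. This is a sketch under the assumption that an LRU policy is acceptable; MAX_ENTRIES and the class name are illustrative, not from the question:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a size-bounded LRU map using only the JDK.
class BoundedCache {
    static final int MAX_ENTRIES = 500; // illustrative cap

    static <K, V> Map<K, V> create() {
        // accessOrder=true means iteration order is least-recently-used first;
        // removeEldestEntry evicts the LRU entry once the cap is exceeded.
        return Collections.synchronizedMap(
                new LinkedHashMap<K, V>(16, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                        return size() > MAX_ENTRIES;
                    }
                });
    }
}
```

Note that the synchronized wrapper serializes all access, so this trades the scalability of ConcurrentHashMap for bounded memory.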

If you are looking for an in-memory cache solution (do not reinvent the wheel), try the Guava cache: http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilder.html

Solution 2

"this computation is very expensive"

I assume this is the reason you created the cache and this should be your primary concern.

While the speed of the solutions might differ only slightly (well under 100 ns), I suspect it is more important that you be able to share results between threads; i.e., ConcurrentHashMap is likely to be the best for your application, as it is likely to save you more CPU time in the long run.

In short, the speed of your solution is likely to be a tiny factor compared to the cost of computing the same thing multiple times (once per thread).

Solution 3

Note that your ConcurrentHashMap implementation is not thread safe and could lead to one item being computed twice. It is actually quite complicated to get it right if you store the results directly without using explicit locking, which you certainly want to avoid if performance is a concern.

It is worth noting that ConcurrentHashMap is highly scalable and works well under high contention. I don't know if ThreadLocal would perform better.

Apart from using a library, you could take some inspiration from Java Concurrency in Practice Listing 5.19. The idea is to save a Future<V> in your map instead of a V. That helps a lot in making the whole method thread safe while staying efficient (lock-free). I paste the implementation below for reference but the chapter is worth reading to understand that every detail counts.

import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public interface Computable<K, V> {

    V compute(K arg) throws InterruptedException;
}

public class Memoizer<K, V> implements Computable<K, V> {

    private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<K, Future<V>>();
    private final Computable<K, V> c;

    public Memoizer(Computable<K, V> c) {
        this.c = c;
    }

    public V compute(final K arg) throws InterruptedException {
        while (true) {
            Future<V> f = cache.get(arg);
            if (f == null) {
                Callable<V> eval = new Callable<V>() {
                    public V call() throws InterruptedException {
                        return c.compute(arg);
                    }
                };
                FutureTask<V> ft = new FutureTask<V>(eval);
                // Atomically register the task; only the thread that wins
                // the race actually runs the computation.
                f = cache.putIfAbsent(arg, ft);
                if (f == null) {
                    f = ft;
                    ft.run();
                }
            }
            try {
                return f.get();
            } catch (CancellationException e) {
                // The task was cancelled: remove it and retry the loop.
                cache.remove(arg, f);
            } catch (ExecutionException e) {
                throw new RuntimeException(e.getCause());
            }
        }
    }
}
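On Java 8+, ConcurrentHashMap.computeIfAbsent offers per-key atomicity directly: the mapping function runs at most once per key, and other threads asking for the same key block until it finishes. A minimal sketch (the key/value types and the length-based "expensive" computation are placeholders, not from the question):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class ComputeOnceCache {
    private final ConcurrentMap<String, Integer> cache = new ConcurrentHashMap<>();
    private final AtomicInteger computations = new AtomicInteger();

    int get(String key) {
        // computeIfAbsent invokes the mapping function at most once per key,
        // even when several threads miss on the same key concurrently.
        return cache.computeIfAbsent(key, k -> {
            computations.incrementAndGet();
            return k.length(); // stand-in for the expensive computation
        });
    }

    int computationCount() {
        return computations.get();
    }
}
```

One caveat: the mapping function must not itself modify the map, and a long-running computation holds a bin lock while it executes, which the Future-based Memoizer above avoids.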

Solution 4

Given that it's relatively easy to implement both of these, I would suggest you try them both and test at steady state load to see which one performs the best for your application.

My guess is that the ConcurrentHashMap will be a little faster, since it does not have to make native calls to Thread.currentThread() the way a ThreadLocal does. However, this may depend on the objects you are storing and how efficient their hash coding is.

It may also be worthwhile to tune the concurrent map's concurrencyLevel to the number of threads you need. It defaults to 16.
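For illustration, the tuning lives in the three-argument constructor. The capacity and level numbers below are illustrative assumptions, not recommendations, and on Java 8+ concurrencyLevel is treated merely as a sizing hint:

```java
import java.util.concurrent.ConcurrentHashMap;

public class TunedCacheExample {
    public static void main(String[] args) {
        // initialCapacity 512, loadFactor 0.75, concurrencyLevel 32 --
        // illustrative values only; on Java 8+ the third argument is
        // just a sizing hint rather than a hard partition count.
        ConcurrentHashMap<String, Long> cache =
                new ConcurrentHashMap<>(512, 0.75f, 32);
        cache.put("answer", 42L);
        System.out.println(cache.get("answer"));
    }
}
```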

Solution 5

The performance question is irrelevant, as the solutions are not equivalent.

The ThreadLocal hash map isn't shared between threads, so the question of thread safety doesn't even arise; but it also doesn't meet your specification, which says nothing about each thread having its own cache.

The requirement for thread safety implies that a single cache is shared among all threads, which rules out ThreadLocal completely.

Author by Maian

Updated on July 24, 2022

Comments

  • Maian
    Maian over 1 year

    I'm creating a memoization cache with the following characteristics:

    • a cache miss will result in computing and storing an entry
      • this computation is very expensive
      • this computation is idempotent
    • unbounded (entries never removed) since:
      • the inputs would result in at most 500 entries
      • each stored entry is very small
      • cache is relatively short-lived (typically less than an hour)
      • overall, memory usage isn't an issue
    • there will be thousands of reads - over the cache's lifetime, I expect 99.9%+ cache hits
    • must be thread-safe

    What would have superior performance, or under what conditions would one solution be favored over the other?

    ThreadLocal HashMap:

    class MyCache<K,V> {
        private class LocalMyCache {
            final Map<K,V> map = new HashMap<K,V>();
    
            V get(K key) {
                V val = map.get(key);
                if (val == null) {
                    val = computeVal(key);
                    map.put(key, val);
                }
                return val;
            }
        }
    
        private final ThreadLocal<LocalMyCache> localCaches = new ThreadLocal<LocalMyCache>() {
            protected LocalMyCache initialValue() {
                return new LocalMyCache();
            }
        };
    
        public V get(K key) {
            return localCaches.get().get(key);
        }
    }
    

    ConcurrentHashMap:

    class MyCache<K,V> {
        private final ConcurrentHashMap<K,V> map = new ConcurrentHashMap<K,V>();
    
        public V get(K key) {
            V val = map.get(key);
            if (val == null) {
                val = computeVal(key);
                map.put(key, val);
            }
            return val;
        }
    }
    

    I figure the ThreadLocal solution would initially be slower if there are a lot of threads, because of all the cache misses per thread; but over thousands of reads, the amortized cost would be lower than that of the ConcurrentHashMap solution. Is my intuition correct?

    Or is there an even better solution?

  • irreputable
    irreputable over 11 years
    both Thread.currentThread() and ThreadLocal are very fast.
  • user207421
    user207421 over 11 years
    I cannot understand this answer. Please restate it in standard English. You seem to be recommending two different and mutually incompatible answers at the same time.
  • Maian
    Maian over 11 years
    It is thread-safe in the sense that the computations are idempotent. My solution only has a suboptimal cache miss performance cost. I could use your solution, but there seems to be relatively large overhead on cache hits, when the vast majority of my reads (like 99.9%+) will be cache hits.
  • Maian
    Maian over 11 years
    The ThreadLocal is responsible for the thread-safety of MyCache.get(). Thread-safety doesn't imply the use of shared objects. See en.wikipedia.org/wiki/Thread_safety
  • Maian
    Maian over 11 years
    As mentioned in the description, the caches are short-lived and the input guarantees a limited number of keys, so memory during the lifetime of the cache is not an issue. Furthermore, I'm pretty sure that ThreadLocals can get cleaned up once the MyCache object is eligible for GC, so there shouldn't be a memory leak.
  • Maian
    Maian over 11 years
    With that said, I can give the Google CacheBuilder a try. It seems to have a lot more functionality than I need though - for an unbounded cache, I don't need the overhead of expiration policies.
  • assylias
    assylias over 11 years
    @Maian Agreed: it depends on your usage pattern but I don't think the alternative has so much overhead for cache hits (possibly none once JIT'ed).
  • Maian
    Maian over 11 years
    This solution is somewhat similar to assylias's FutureTask solution, in that it uses volatiles (instead of locks) and both are designed to eliminate redundant computations. Your solution is also a bit odd in that it uses both ThreadLocal AND a shared ConcurrentHashMap - in that case, I wonder if just using a ConcurrentHashMap (in a way that avoids redundant computations) would be better. I also don't like the usage of a static ConcurrentHashMap - I want the cache to be short-lived and don't want memory to creep up over days (and I don't want to have to introduce expiration policies).
  • Maian
    Maian over 11 years
    It ultimately depends on the ratio of cumulative computation time to the cumulative read times. I think I'm just going to test each solution out and profile them.
  • Maian
    Maian over 11 years
    Indeed, I think I'll just have to test and profile each solution (including the suggested completely memoized ConcurrentHashMap solution).
  • farmer1992
    farmer1992 over 11 years
    @Maian The Guava cache provides a 'concurrencyLevel' parameter you can adjust to improve performance - something like changing the number of ConcurrentHashMap partitions to avoid lock contention.
  • farmer1992
    farmer1992 over 11 years
    @EJP Edited. I just recommend using a good cache framework instead of creating one; I recommend neither a bare ThreadLocal nor a bare ConcurrentHashMap here.
  • Vishy
    Vishy over 11 years
    If you have a high read to write ratio you get almost the same performance as local Maps. It also has 16 partitions so that if you have multiple threads try to write at the same time they can do so without contention if they fall into different partitions.
  • Maian
    Maian about 11 years
    So it turns out that using ThreadLocals for caches is a bad idea if you can't fully control the threads. It's especially bad if you're instantiating a ThreadLocal per cache. A ThreadLocal has a weird lifecycle - contents for a Thread only get garbage-collected if explicitly removed or if the Thread dies. For this memoization use case, a ConcurrentHashMap is sufficient. Guava's CacheBuilder would be better, but it's not because of eviction policies (unneeded in this case); it's because it supports the compute-on-the-fly functionality with LoadingCache.