Java ConcurrentHashMap is better than HashMap performance wise?

16,101

Solution 1

Doug Lea is extremely good at these things, so I won't be surprised if at one time his ConcurrentHashMap performs better than Joshua Bloch's HashMap. However as of Java 7, the first @author of HashMap has become Doug Lea too. Obviously now there's no reason HashMap would be any slower than its concurrent cousin.

Out of curiosity, I did some benchmark anyway. I run it under Java 7. The more entries there are, the closer the performance is. Eventually ConcurrentHashMap is within 3% of HashMap, which is quite remarkable. The bottleneck is really memory access, as the saying goes, "memory is the new disk (and disk is the new tape)". If the entries are in the cache, both will be fast; if the entries don't fit in cache, both will be slow. In real applications, a map doesn't have to be big to compete with others for residing in cache. If a map is used often, it's cached; if not, it's not cached, and that is the real determining factor, not the implementations (given both are implemented by the same expert)

public static void main(String[] args)
{
    for(int i = 0; i<100; i++)
    {
        System.out.println();

        int entries = i*100*1000;
        long t0=test( entries, new FakeMap() );
        long t1=test( entries, new HashMap() );
        long t2=test( entries, new ConcurrentHashMap() );

        long diff = (t2-t1)*100/(t1-t0);
        System.out.printf("entries=%,d time diff= %d%% %n", entries, diff);
    }
}


static long test(int ENTRIES, Map map)
{
    long SEED = 0;
    Random random = new Random(SEED);

    int RW_RATIO = 10;

    long t0 = System.nanoTime();

    for(int i=0; i<ENTRIES; i++)
        map.put( random.nextInt(), random.nextInt() );

    for(int i=0; i<RW_RATIO; i++)
    {
        random.setSeed(SEED);
        for(int j=0; j<ENTRIES; j++)
        {
            map.get( random.nextInt() );
            random.nextInt();
        }
    }
    long t = System.nanoTime()-t0;
    System.out.printf("%,d ns %s %n", t, map.getClass());
    return t;
}


static class FakeMap implements Map
{
    public Object get(Object key)
    {
        return null;  
    }
    public Object put(Object key, Object value)
    {
        return null;  
    }
    // etc. etc.
}

Solution 2

If you are accessing the HashMap with only a single thread HashMap is fastest (it does not do any synchronization), if you are accessing it from multiple threads ConcurrentHashMap is faster than doing the synchronization coarse-grained by hand. See here for a little comparison:

http://www.codercorp.com/blog/java/why-concurrenthashmap-is-better-than-hashtable-and-just-as-good-hashmap.html

Solution 3

The reason a HashMap could be slower is because it has to detect ConcurrentModification to know when to throw an exception. ConcurrentHashMap does not have to check modCount to know when to throw (but it does use it for size() and isEmpty()). Acquiring a lock is very fast, especially so in single-threaded situations when you already hold the lock, but checking modCount is two reads and a jump-if-not-equal that HashMap must pay to throw CoModException.

I recommend reading the source of your collections classes, so you know how much work they are doing when you make a method call. In situations when you have a completely private map for dictionary get/put only, you can often used a stripped down HashMap without any modCount or even size tracking for added performance boost.

Solution 4

This is the kind of rubbery statement that is hard to prove one way or the other. How do you measure something in "nearly all situations"?

A ConcurrentHashMap is likely to be better than a synchronized HashMap. The more contention there is, the more significant the difference will be. On the other hand an unsynchronized HashMap is likely to be faster than a ConcurrentHashMap, because of the overhead of unnecessary locking in the latter case.


I'd also like to see the context for that statement, and what evidence the book's author puts forward to support it. And the evidence for the unstated assumption that "nearly all" use-cases for hash maps involve synchronization.

Share:
16,101

Related videos on Youtube

Alvin
Author by

Alvin

Programming is fun :D

Updated on June 01, 2022

Comments

  • Alvin
    Alvin about 2 years

    I was just reading the book Clean Code and came across this statement:

    When Java was young Doug Lea wrote the seminal book[8] Concurrent Programming in Java. Along with the book he developed several thread-safe collection, which later became part of the JDK in the java.util.concurrent package. The collections in that package are safe for multithreaded situations and they perform well. In fact, the ConcurrentHashMap implementation performs better than HashMap in nearly all situations. It also allows for simultaneous concurrent reads and writes, and it has methods supporting common composite operations that are otherwise not thread safe. If Java 5 is the deployment environment, start with ConcurrentHashMap

    Note that in the above quote I used "[n]", where n is some number, to indicate the places where the author provided references, and as you can see he did not provide any reference for the bold part.

    Not that I don't believe this statement, but I would love to know the supporting evidences of this statement. So, does anyone know any resources that shows the performance statistics for both ConcurrentHashMap and HashMap? Or can anyone explain to me why ConcurrentHashMap is faster than HashMap?

    I probably will look into ConcurrentHashMap's implementation at work when I'm taking a break, but for now I would like to hear the answers from fellow SOers.

    • Jeff Foster
      Jeff Foster almost 13 years
      Without seeing the context, it's difficult to tell what the author meant, but perhaps the author means something like sticking "synchronized" around all HashMap methods is a bad idea, and using something explicitly designed to support multiple threads, like ConcurrentHashMap, is a much better idea.
    • Matthew Gilliard
      Matthew Gilliard almost 13 years
      I can't find this quote in Clean Code, but it's a big book...
    • Alvin
      Alvin almost 13 years
      @Matthew It's on page 183, top of the page. It's chapter 13-Concurrency.
    • Alvin
      Alvin almost 13 years
      @Jeff Foster Not really, it simply says start using ConcurrentHashMap instead of HashMap. I updated my question with the exact paragraph.
  • Matthew Gilliard
    Matthew Gilliard almost 13 years
    If you use Collections.synchronizedMap( HashMap ) you will get a map which is locked as a whole entity. ConcurrentHashMap does allow some concurrent access (reads and writes), the approach is to stripe the locks, so as to partition the hashtable into independently-locked domains.
  • Alvin
    Alvin almost 13 years
    But the author seems to suggest to use ConcurrentHashMap even for single thread situation though.
  • Stephen C
    Stephen C almost 13 years
    @Alvin - my reading of the context is that the quoted statement is actually referring to multi-threaded applications. The preceding sentence is a giveaway. And in that context, I wouldn't disagree with it.
  • Bernd Elkemann
    Bernd Elkemann almost 13 years
    In your first paragraph you are trying to prove that things must be fast because Doug Lea manages the project?? In your code: Where's the concurrency??
  • irreputable
    irreputable almost 13 years
    the story is about ConcurrentHashMap was faster than HashMap in single thread situation, so that's what I'm testing.
  • anuragw
    anuragw over 11 years
    @StephenC The preceding sentence says safe for multithreaded situations and they perform well while the contentious one says nearly all situations - so i doubt the scope of the context is just multi-threaded applications. I guess I still agree with your initial comments - where's the evidence?
  • Ajax
    Ajax about 11 years
    The reason a HashMap would be slower is because it has to detect ConcurrentModification to know when to throw an exception. ConcurrentHashMap does not have to track modifications and check modCount. Read the source of your collections classes.