Why is HashMap containsKey slower than get in Sun JDK? (sun-jdk-1.6.0.17)

10,914

Solution 1

I haven't tried to reproduce your results yet, but my first guess is that get simply returns the value (if any) that was found, while containsKey (which is what you tested, not contains) needs to test whether the key exists (with either a null or non-null value) and then return a boolean. Just a little more overhead involved.

Solution 2

Let's see the source code:

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}


public boolean containsKey(Object key) {
    return getEntry(key) != null;
}

final Entry<K,V> getEntry(Object key) {
    int hash = (key == null) ? 0 : hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

Maybe it's because of the extra method call or because of the repeated check for key != null?

Solution 3

Testing different sizes of hashmaps, if there is a bias in performance, its very small.

Running on Java 7 update 1 with a 4.6 GHz i7 2600K.

public class HashMapPerfMain {
    public static void main(String... args) {
        Integer[] keys = generateKeys(2 * 1000 * 1000);

        Map<Integer, Boolean> map = new HashMap<Integer, Boolean>();
        for (int j = 0; j < keys.length; j += 2)
            map.put(keys[j], true);

        for (int t = 0; t < 5; t++) {
            long start = System.nanoTime();
            int count = countContainsKey(map, keys);
            long time = System.nanoTime() - start;
            assert count == keys.length / 2;

            long start2 = System.nanoTime();
            int count2 = countGetIsNull(map, keys);
            long time2 = System.nanoTime() - start2;
            assert count2 == keys.length / 2;
            System.out.printf("Map.containsKey avg %.1f ns, ", (double) time / keys.length);
            System.out.printf("Map.get() == null avg %.1f ns, ", (double) time2 / keys.length);
            System.out.printf("Ratio was %.2f%n", (double) time2/ time);
        }
    }

    private static int countContainsKey(Map<Integer, Boolean> map, Integer[] keys) {
        int count = 0;
        for (Integer i : keys) {
            if (map.containsKey(i)) count++;
        }
        return count;
    }

    private static int countGetIsNull(Map<Integer, Boolean> map, Integer[] keys) {
        int count = 0;
        for (Integer i : keys) {
            if (map.get(i) == null) count++;
        }
        return count;
    }

    private static Integer[] generateKeys(int size) {
        Integer[] keys = new Integer[size];
        Random random = new Random();
        for (int i = 0; i < keys.length; i++)
            keys[i] = random.nextInt();
        return keys;
    }
}

prints for half million keys

Map.containsKey avg 27.1 ns, Map.get() == null avg 26.4 ns, Ratio was 0.97
Map.containsKey avg 19.6 ns, Map.get() == null avg 19.6 ns, Ratio was 1.00
Map.containsKey avg 18.3 ns, Map.get() == null avg 19.0 ns, Ratio was 1.04
Map.containsKey avg 18.2 ns, Map.get() == null avg 19.1 ns, Ratio was 1.05
Map.containsKey avg 18.3 ns, Map.get() == null avg 19.0 ns, Ratio was 1.04

prints for one million keys

Map.containsKey avg 30.9 ns, Map.get() == null avg 30.9 ns, Ratio was 1.00
Map.containsKey avg 26.0 ns, Map.get() == null avg 25.5 ns, Ratio was 0.98
Map.containsKey avg 25.0 ns, Map.get() == null avg 24.9 ns, Ratio was 1.00
Map.containsKey avg 25.0 ns, Map.get() == null avg 24.9 ns, Ratio was 1.00
Map.containsKey avg 24.8 ns, Map.get() == null avg 25.0 ns, Ratio was 1.01

however for two million keys

Map.containsKey avg 36.5 ns, Map.get() == null avg 36.7 ns, Ratio was 1.00
Map.containsKey avg 34.3 ns, Map.get() == null avg 35.1 ns, Ratio was 1.02
Map.containsKey avg 36.7 ns, Map.get() == null avg 35.1 ns, Ratio was 0.96
Map.containsKey avg 36.3 ns, Map.get() == null avg 35.1 ns, Ratio was 0.97
Map.containsKey avg 36.7 ns, Map.get() == null avg 35.2 ns, Ratio was 0.96

for five million keys

Map.containsKey avg 40.1 ns, Map.get() == null avg 40.9 ns, Ratio was 1.02
Map.containsKey avg 38.6 ns, Map.get() == null avg 40.4 ns, Ratio was 1.04
Map.containsKey avg 39.3 ns, Map.get() == null avg 38.3 ns, Ratio was 0.97
Map.containsKey avg 39.3 ns, Map.get() == null avg 38.3 ns, Ratio was 0.98
Map.containsKey avg 39.3 ns, Map.get() == null avg 38.8 ns, Ratio was 0.99

BTW: The time complexity for get() and containsKey is O(1) (on an idealized machine), but you can see that for a real machine, the cost increases with the size of the Map.

Share:
10,914
Stefan
Author by

Stefan

Single tier programmer for multi tier apps.

Updated on June 15, 2022

Comments

  • Stefan
    Stefan about 2 years

    Why is calling containsKey on a HashMap slower then get?

    Test: http://ideone.com/QsWXF (>15% difference, run on sun-jdk-1.6.0.17)

  • Jon Skeet
    Jon Skeet over 12 years
    No, it doesn't need to check whether the value is null - because in that case it should still return true. It should just be checking whether the entry exists.
  • Ted Hopp
    Ted Hopp over 12 years
    @Jon - Right you are. (That's what I meant, of course. ;-)) Amending my language.
  • Jon Skeet
    Jon Skeet over 12 years
    But I don't think I'd say it's more work - contains only needs to test for the existence of the entry, rather than fetching the value from the entry. get still has to test for the existence of the entry too...
  • Stefan
    Stefan over 12 years
    @Jon That was actually my thought that contains would be 'less' work since i only need a Yes/No answer.
  • Ted Hopp
    Ted Hopp over 12 years
    @Jon - Perhaps it's just that the way containsKey is implemented involves an extra method call (versus an extra field access in get). get does not do an explicit test that the key exists; it just falls out of the loop the same way that containsKey does. See the code posted by Mehrdad.
  • Stefan
    Stefan over 12 years
    Your countGetIsNull performs an extra operation. A similar operation is the difference between get and contains based on the source. So this little bit of extra work is skewing your results. I am not trying to emulate contains with get, all i was wondering is why contains would be slower at all, even considering JIT optimizations. The execution path for both should be almost identical and therefore benefit from the JIT in a similar way and still there is a 15% difference.
  • Vishy
    Vishy over 12 years
    The average ratio is 0.93 making a 7% difference. The extra check appears to match the difference. This is fairly small compared with the 8x difference in timings. (200 ns vs 25 vs) I suspect part of that is due to having an old version of the JVM. Java 6 update 17 was released 2009-11-04.
  • Stefan
    Stefan over 12 years
    I actually use an IBM JDK here at work and suprisingly Contains is faster the Get. (not by much tho)
  • Vishy
    Vishy over 12 years
    Can you try Oracle JDK 7 update 1 or Java 6 update 29 for comparison?