Why is HashMap containsKey slower than get in Sun JDK? (sun-jdk-1.6.0.17)
Solution 1
I haven't tried to reproduce your results yet, but my first guess is that get
simply returns the value (if any) that was found, while containsKey
(which is what you tested, not contains
) needs to test whether the key exists (with either a null or non-null value) and then return a boolean. Just a little more overhead involved.
Solution 2
Let's see the source code:
public V get(Object key) {
if (key == null)
return getForNullKey();
int hash = hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
return e.value;
}
return null;
}
public boolean containsKey(Object key) {
return getEntry(key) != null;
}
final Entry<K,V> getEntry(Object key) {
int hash = (key == null) ? 0 : hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
}
return null;
}
Maybe it's because of the extra method call or because of the repeated check for key != null
?
Solution 3
Testing different sizes of hashmaps, if there is a bias in performance, its very small.
Running on Java 7 update 1 with a 4.6 GHz i7 2600K.
public class HashMapPerfMain {
public static void main(String... args) {
Integer[] keys = generateKeys(2 * 1000 * 1000);
Map<Integer, Boolean> map = new HashMap<Integer, Boolean>();
for (int j = 0; j < keys.length; j += 2)
map.put(keys[j], true);
for (int t = 0; t < 5; t++) {
long start = System.nanoTime();
int count = countContainsKey(map, keys);
long time = System.nanoTime() - start;
assert count == keys.length / 2;
long start2 = System.nanoTime();
int count2 = countGetIsNull(map, keys);
long time2 = System.nanoTime() - start2;
assert count2 == keys.length / 2;
System.out.printf("Map.containsKey avg %.1f ns, ", (double) time / keys.length);
System.out.printf("Map.get() == null avg %.1f ns, ", (double) time2 / keys.length);
System.out.printf("Ratio was %.2f%n", (double) time2/ time);
}
}
private static int countContainsKey(Map<Integer, Boolean> map, Integer[] keys) {
int count = 0;
for (Integer i : keys) {
if (map.containsKey(i)) count++;
}
return count;
}
private static int countGetIsNull(Map<Integer, Boolean> map, Integer[] keys) {
int count = 0;
for (Integer i : keys) {
if (map.get(i) == null) count++;
}
return count;
}
private static Integer[] generateKeys(int size) {
Integer[] keys = new Integer[size];
Random random = new Random();
for (int i = 0; i < keys.length; i++)
keys[i] = random.nextInt();
return keys;
}
}
prints for half million keys
Map.containsKey avg 27.1 ns, Map.get() == null avg 26.4 ns, Ratio was 0.97
Map.containsKey avg 19.6 ns, Map.get() == null avg 19.6 ns, Ratio was 1.00
Map.containsKey avg 18.3 ns, Map.get() == null avg 19.0 ns, Ratio was 1.04
Map.containsKey avg 18.2 ns, Map.get() == null avg 19.1 ns, Ratio was 1.05
Map.containsKey avg 18.3 ns, Map.get() == null avg 19.0 ns, Ratio was 1.04
prints for one million keys
Map.containsKey avg 30.9 ns, Map.get() == null avg 30.9 ns, Ratio was 1.00
Map.containsKey avg 26.0 ns, Map.get() == null avg 25.5 ns, Ratio was 0.98
Map.containsKey avg 25.0 ns, Map.get() == null avg 24.9 ns, Ratio was 1.00
Map.containsKey avg 25.0 ns, Map.get() == null avg 24.9 ns, Ratio was 1.00
Map.containsKey avg 24.8 ns, Map.get() == null avg 25.0 ns, Ratio was 1.01
however for two million keys
Map.containsKey avg 36.5 ns, Map.get() == null avg 36.7 ns, Ratio was 1.00
Map.containsKey avg 34.3 ns, Map.get() == null avg 35.1 ns, Ratio was 1.02
Map.containsKey avg 36.7 ns, Map.get() == null avg 35.1 ns, Ratio was 0.96
Map.containsKey avg 36.3 ns, Map.get() == null avg 35.1 ns, Ratio was 0.97
Map.containsKey avg 36.7 ns, Map.get() == null avg 35.2 ns, Ratio was 0.96
for five million keys
Map.containsKey avg 40.1 ns, Map.get() == null avg 40.9 ns, Ratio was 1.02
Map.containsKey avg 38.6 ns, Map.get() == null avg 40.4 ns, Ratio was 1.04
Map.containsKey avg 39.3 ns, Map.get() == null avg 38.3 ns, Ratio was 0.97
Map.containsKey avg 39.3 ns, Map.get() == null avg 38.3 ns, Ratio was 0.98
Map.containsKey avg 39.3 ns, Map.get() == null avg 38.8 ns, Ratio was 0.99
BTW: The time complexity for get() and containsKey is O(1) (on an idealized machine), but you can see that for a real machine, the cost increases with the size of the Map.
Comments
-
Stefan about 2 years
Why is calling
containsKey
on a HashMap slower thenget
?Test: http://ideone.com/QsWXF (>15% difference, run on sun-jdk-1.6.0.17)
-
Jon Skeet over 12 yearsNo, it doesn't need to check whether the value is null - because in that case it should still return true. It should just be checking whether the entry exists.
-
Ted Hopp over 12 years@Jon - Right you are. (That's what I meant, of course. ;-)) Amending my language.
-
Jon Skeet over 12 yearsBut I don't think I'd say it's more work -
contains
only needs to test for the existence of the entry, rather than fetching the value from the entry.get
still has to test for the existence of the entry too... -
Stefan over 12 years@Jon That was actually my thought that contains would be 'less' work since i only need a Yes/No answer.
-
Ted Hopp over 12 years@Jon - Perhaps it's just that the way
containsKey
is implemented involves an extra method call (versus an extra field access inget
).get
does not do an explicit test that the key exists; it just falls out of the loop the same way thatcontainsKey
does. See the code posted by Mehrdad. -
Stefan over 12 yearsYour countGetIsNull performs an extra operation. A similar operation is the difference between get and contains based on the source. So this little bit of extra work is skewing your results. I am not trying to emulate contains with get, all i was wondering is why contains would be slower at all, even considering JIT optimizations. The execution path for both should be almost identical and therefore benefit from the JIT in a similar way and still there is a 15% difference.
-
Vishy over 12 yearsThe average ratio is 0.93 making a 7% difference. The extra check appears to match the difference. This is fairly small compared with the 8x difference in timings. (200 ns vs 25 vs) I suspect part of that is due to having an old version of the JVM. Java 6 update 17 was released 2009-11-04.
-
Stefan over 12 yearsI actually use an IBM JDK here at work and suprisingly Contains is faster the Get. (not by much tho)
-
Vishy over 12 yearsCan you try Oracle JDK 7 update 1 or Java 6 update 29 for comparison?