Why is HashMap lookup O(1), i.e., constant time?

Solution 1

Under the appropriate assumptions on the hash function being used, we can say that hash table lookups take expected O(1) time (assuming you're using a standard hashing scheme like linear probing or chained hashing). This means that on average, the amount of work that a hash table does to perform a lookup is at most some constant.

Intuitively, if you have a "good" hash function, you would expect the elements to be distributed more or less evenly throughout the hash table, meaning that the number of elements in each bucket would be close to the total number of elements divided by the number of buckets. If the hash table implementation keeps this ratio low (say, by adding more buckets every time the ratio of elements to buckets exceeds some constant), then the work done per lookup is some baseline amount to choose which bucket should be scanned, plus "not too much" work looking at the elements there, because in expectation there will only be a constant number of elements in that bucket.
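To make the bucket-and-resize intuition concrete, here is a minimal sketch of a chained hash table in Java. It is only an illustration under the assumptions above: the class, field, and method names are invented, and this is not java.util.HashMap's actual implementation. Lookup scans only the one bucket the key hashes to, and the resize step is what keeps the expected chain length constant:

```java
import java.util.LinkedList;

class ChainedHashTable<K, V> {

    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private static final double MAX_LOAD_FACTOR = 0.75;
    private LinkedList<Entry<K, V>>[] buckets;
    private int size = 0;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int initialBuckets) {
        buckets = new LinkedList[initialBuckets];
    }

    private int indexFor(K key, int length) {
        return (key.hashCode() & 0x7fffffff) % length;
    }

    V get(K key) {
        LinkedList<Entry<K, V>> bucket = buckets[indexFor(key, buckets.length)];
        if (bucket == null) return null;
        // With a good hash function the expected chain length is
        // size / buckets.length, which resizing keeps below a constant,
        // so this loop does expected O(1) work.
        for (Entry<K, V> e : bucket) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    void put(K key, V value) {
        if ((double) (size + 1) / buckets.length > MAX_LOAD_FACTOR) {
            resize(buckets.length * 2);   // keeps elements-per-bucket bounded
        }
        int i = indexFor(key, buckets.length);
        if (buckets[i] == null) buckets[i] = new LinkedList<>();
        boolean replaced = buckets[i].removeIf(e -> e.key.equals(key));
        buckets[i].add(new Entry<>(key, value));
        if (!replaced) size++;
    }

    @SuppressWarnings("unchecked")
    private void resize(int newLength) {
        LinkedList<Entry<K, V>>[] old = buckets;
        buckets = new LinkedList[newLength];
        for (LinkedList<Entry<K, V>> bucket : old) {
            if (bucket == null) continue;
            for (Entry<K, V> e : bucket) {
                int j = indexFor(e.key, newLength);
                if (buckets[j] == null) buckets[j] = new LinkedList<>();
                buckets[j].add(e);
            }
        }
    }
}
```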

This doesn't mean that hash tables have guaranteed O(1) behavior. In the worst case, the hashing scheme degenerates and all elements end up in one bucket, making lookups take Θ(n) time. This is why it's important to design good hash functions.
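For instance, a hypothetical key class with a deliberately broken hashCode shows how that degenerate case arises (BadKey is made up for the example):

```java
// BadKey is a hypothetical key class with a deliberately terrible hash
// function: every instance hashes to the same value, so any hash table
// will pile all entries into a single bucket.
final class BadKey {
    private final int id;
    BadKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; }             // every key collides
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
}

// A java.util.HashMap<BadKey, String> now degenerates: every lookup scans the
// single overfull bucket, so it costs Θ(n) with a plain linked list per bucket
// (or O(log n) since Java 8, which converts large buckets into balanced trees).
```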

For more information, you might want to read an algorithms textbook to see the formal derivation of why hash tables support lookups so efficiently. This is usually included as part of a typical university course on algorithms and data structures, and there are many good resources online.

Fun fact: there are certain types of hash tables (cuckoo hash tables, dynamic perfect hash tables) where the worst case lookup time for an element is O(1). These hash tables work by guaranteeing that each element can only be in one of a few fixed positions, with insertions sometimes scrambling around elements to try to make everything fit.
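As a rough illustration of the cuckoo idea, here is a lookup-only sketch; insertion, which may relocate existing keys or rebuild the table, is omitted, and the two hash functions and all names are invented for the example:

```java
// Illustrative names and hash functions; not a complete cuckoo hash table.
final class CuckooTable<K, V> {
    private final Object[] keys1, vals1, keys2, vals2;

    CuckooTable(int capacity) {
        keys1 = new Object[capacity]; vals1 = new Object[capacity];
        keys2 = new Object[capacity]; vals2 = new Object[capacity];
    }

    private int h1(K key) { return (key.hashCode() & 0x7fffffff) % keys1.length; }
    private int h2(K key) { return ((key.hashCode() * 0x9e3779b1) & 0x7fffffff) % keys2.length; }

    @SuppressWarnings("unchecked")
    V get(K key) {
        // A key can only ever live in slot h1(key) of the first table or slot
        // h2(key) of the second, so a lookup inspects at most two positions:
        // worst-case O(1).
        int i = h1(key);
        if (key.equals(keys1[i])) return (V) vals1[i];
        int j = h2(key);
        if (key.equals(keys2[j])) return (V) vals2[j];
        return null;
    }
}
```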

Hope this helps!

Solution 2

The key is in this statement in the docs:

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table.

and

The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html

The internal bucket structure will actually be rebuilt if the load factor is exceeded, allowing for the amortized cost of get and put to be O(1).

Note that if the internal structure is rebuilt, that introduces a performance penalty that is likely to be O(N), so quite a few get and put operations may be required before the amortized cost approaches O(1) again. For that reason, plan the initial capacity and load factor appropriately, so that you neither waste space nor trigger avoidable rebuilding of the internal structure.
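For example, assuming roughly one million expected entries and the default 0.75 load factor, a pre-sized map can be populated without any intermediate rehashing (the numbers here are illustrative, not a recommendation):

```java
import java.util.HashMap;
import java.util.Map;

public class PreSizedMap {
    public static void main(String[] args) {
        // Illustrative numbers only. If we expect ~1,000,000 mappings, an
        // initial capacity of expectedEntries / loadFactor (rounded up)
        // keeps the entry count below the resize threshold, so the bucket
        // array is never rebuilt while the map is being populated.
        int expectedEntries = 1_000_000;
        float loadFactor = 0.75f;
        int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor);

        Map<Integer, String> map = new HashMap<>(initialCapacity, loadFactor);
        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, "value-" + i);   // no intermediate rehashing expected
        }
        System.out.println(map.size());
    }
}
```

Recent JDKs (19 and later) also provide HashMap.newHashMap(int numMappings), which performs this capacity calculation for you.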

Solution 3

Hashtables AREN'T O(1).

Via the pigeonhole principle, you cannot be better than O(log(n)) for lookup, because you need log(n) bits per item to uniquely identify n items.

Hashtables seem to be O(1) because they have a small constant factor, combined with the 'n' in their O(log(n)) being bumped up (in practice, to the machine's word size) to the point that, for many practical applications, it is independent of the number of items you are actually using. However, big O notation doesn't care about that fact, and it is a (granted, absurdly common) misuse of the notation to call hashtables O(1).

While you could store a million, or a billion, items in a hashtable and still get the same lookup time as in a single-item hashtable, you lose that ability if you're talking about a nonillion or a googolplex items. The fact that you will never actually be using a nonillion or a googolplex items doesn't matter for big O notation.
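To make the bit-counting argument concrete, here is a small illustration of the reasoning above (not a benchmark; the key string and class name are made up):

```java
public class KeyWidth {
    public static void main(String[] args) {
        // To tell n distinct keys apart, each key must carry at least
        // ceil(log2(n)) bits, and both hashing and the final equals()
        // comparison have to read all of those bits.
        long n = 1_000_000_000L;                           // a billion distinct items
        int bitsNeeded = 64 - Long.numberOfLeadingZeros(n - 1);
        System.out.println(bitsNeeded + " bits per key");  // prints "30 bits per key"

        // String.hashCode() and String.equals() walk the whole key, so the
        // per-lookup cost grows with the key length, i.e. with log(n); we
        // usually treat it as constant only because practical keys fit in
        // a few machine words.
        String key = "one-of-a-billion-item-identifiers";
        System.out.println(key.hashCode());
    }
}
```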

Practically speaking, hashtable performance can be a constant factor worse than array lookup performance, which, yes, is also O(log(n)), because you CAN'T do better.

Basically, real-world computers make every array lookup for arrays smaller than their chip bit size just as costly as a lookup into their biggest theoretically usable array, and since hashtables are clever tricks performed on arrays, that's why you seem to get O(1).

Comments

  • genonymous
    genonymous almost 2 years

    If we look at it from a Java perspective, we can say that HashMap lookup takes constant time. But what about the internal implementation? It would still have to search through the particular bucket (the one the key's hashcode maps to) for the matching key. Then why do we say that HashMap lookup takes constant time? Please explain.

  • genonymous
    genonymous about 11 years
    Thanks for simple and clear answer. It answers my question neatly.
  • Sam
    Sam over 8 years
    You answered correctly for getting the value, but what about the bucket location? We know that for each key (suppose the hash function is so good that it generates a different hash value for each key), a table is created to locate the bucket, i.e. a linked list (until Java 7). My question is: if there are 1000K different key-value pairs with 1000K buckets, how does the hashmap make sure it gives O(1) even for the lookup of the bucket location? Hope I am clear about the question. Thank you.
  • templatetypedef
    templatetypedef about 7 years
    @Sam The buckets are typically stored as an array (either a raw one or some wrapper like ArrayList that supports efficient random access). That way, after you've computed the hash function, you can jump in time O(1) (independent of the number of buckets) to the right bucket; a short sketch of that index computation follows below.
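A sketch of the O(1) "jump to the right bucket" described in that comment; the bit-spreading and power-of-two masking mirror what HashMap-style implementations commonly do, but the names here are illustrative, not java.util.HashMap's source:

```java
final class BucketIndex {
    static int indexFor(Object key, int tableLength) {
        int h = key.hashCode();
        h ^= (h >>> 16);                 // fold the high bits into the low bits
        return (tableLength - 1) & h;    // tableLength is assumed a power of two
    }

    public static void main(String[] args) {
        // Computing the bucket index is a hash, an XOR, a shift and a mask:
        // constant work no matter how many buckets or entries the table holds.
        System.out.println(indexFor("some key", 16));
        System.out.println(indexFor("some key", 1 << 20));
    }
}
```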