How does a Java HashMap handle different objects with the same hash code?
Solution 1
A hashmap works like this (this is a little bit simplified, but it illustrates the basic mechanism):
It has a number of "buckets" which it uses to store key-value pairs in. Each bucket has a unique number - that's what identifies the bucket. When you put a key-value pair into the map, the hashmap will look at the hash code of the key, and store the pair in the bucket of which the identifier is the hash code of the key. For example: The hash code of the key is 235 -> the pair is stored in bucket number 235. (Note that one bucket can store more then one key-value pair).
When you lookup a value in the hashmap, by giving it a key, it will first look at the hash code of the key that you gave. The hashmap will then look into the corresponding bucket, and then it will compare the key that you gave with the keys of all pairs in the bucket, by comparing them with equals()
.
Now you can see how this is very efficient for looking up key-value pairs in a map: by the hash code of the key the hashmap immediately knows in which bucket to look, so that it only has to test against what's in that bucket.
Looking at the above mechanism, you can also see what requirements are necessary on the hashCode()
and equals()
methods of keys:
If two keys are the same (
equals()
returnstrue
when you compare them), theirhashCode()
method must return the same number. If keys violate this, then keys that are equal might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it's going to look in the same bucket).If two keys are different, then it doesn't matter if their hash codes are the same or not. They will be stored in the same bucket if their hash codes are the same, and in this case, the hashmap will use
equals()
to tell them apart.
Solution 2
Your third assertion is incorrect.
It's perfectly legal for two unequal objects to have the same hash code. It's used by HashMap
as a "first pass filter" so that the map can quickly find possible entries with the specified key. The keys with the same hash code are then tested for equality with the specified key.
You wouldn't want a requirement that two unequal objects couldn't have the same hash code, as otherwise that would limit you to 232 possible objects. (It would also mean that different types couldn't even use an object's fields to generate hash codes, as other classes could generate the same hash.)
Solution 3
HashMap
is an array of Entry
objects.
Consider HashMap
as just an array of objects.
Have a look at what this Object
is:
static class Entry<K,V> implements Map.Entry<K,V> {
final K key;
V value;
Entry<K,V> next;
final int hash;
…
}
Each Entry
object represents a key-value pair. The field next
refers to another Entry
object if a bucket has more than one Entry
.
Sometimes it might happen that hash codes for 2 different objects are the same. In this case, two objects will be saved in one bucket and will be presented as a linked list.
The entry point is the more recently added object. This object refers to another object with the next
field and so on. The last entry refers to null
.
When you create a HashMap
with the default constructor
HashMap hashMap = new HashMap();
The array is created with size 16 and default 0.75 load balance.
Adding a new key-value pair
- Calculate hashcode for the key
- Calculate position
hash % (arrayLength-1)
where element should be placed (bucket number) - If you try to add a value with a key which has already been saved in
HashMap
, then value gets overwritten. - Otherwise element is added to the bucket.
If the bucket already has at least one element, a new one gets added and placed in the first position of the bucket. Its next
field refers to the old element.
Deletion
- Calculate hashcode for the given key
- Calculate bucket number
hash % (arrayLength-1)
- Get a reference to the first Entry object in the bucket and by means of equals method iterate over all entries in the given bucket. Eventually we will find the correct
Entry
. If a desired element is not found, returnnull
Solution 4
You can find excellent information at http://javarevisited.blogspot.com/2011/02/how-hashmap-works-in-java.html
To Summarize:
HashMap works on the principle of hashing
put(key, value): HashMap stores both key and value object as Map.Entry. Hashmap applies hashcode(key) to get the bucket. if there is collision ,HashMap uses LinkedList to store object.
get(key): HashMap uses Key Object's hashcode to find out bucket location and then call keys.equals() method to identify correct node in LinkedList and return associated value object for that key in Java HashMap.
Solution 5
Here is a rough description of HashMap
's mechanism, for Java 8
version, (it might be slightly different from Java 6).
Data structures
-
Hash table
Hash value is calculated viahash()
on key, and it decide which bucket of the hashtable to use for a given key. -
Linked list (singly)
When count of elements in a bucket is small, a singly linked list is used. -
Red-Black tree
When count of elements in a bucket is large, a red-black tree is used.
Classes (internal)
-
Map.Entry
Represent a single entity in map, the key/value entity. -
HashMap.Node
Linked list version of node.It could represent:
- A hash bucket.
Because it has a hash property. - A node in singly linked list, (thus also head of linkedlist).
- A hash bucket.
-
HashMap.TreeNode
Tree version of node.
Fields (internal)
-
Node[] table
The bucket table, (head of the linked lists).
If a bucket don't contains elements, then it's null, thus only take space of a reference. -
Set<Map.Entry> entrySet
Set of entities. -
int size
Number of entities. -
float loadFactor
Indicate how full the hash table is allowed, before resizing. -
int threshold
The next size at which to resize.
Formula:threshold = capacity * loadFactor
Methods (internal)
-
int hash(key)
Calculate hash by key. -
How to map hash to bucket?
Use following logic:static int hashToBucket(int tableSize, int hash) { return (tableSize - 1) & hash; }
About capacity
In hash table, capacity means the bucket count, it could be get from table.length
.
Also could be calculated via threshold
and loadFactor
, thus no need to be defined as a class field.
Could get the effective capacity via: capacity()
Operations
- Find entity by key.
First find the bucket by hash value, then loop linked list or search sorted tree. - Add entity with key.
First find the bucket according to hash value of key.
Then try find the value:- If found, replace the value.
- Otherwise, add a new node at beginning of linked list, or insert into sorted tree.
- Resize
Whenthreshold
reached, will double hashtable's capacity(table.length
), then perform a re-hash on all elements to rebuild the table.
This could be an expensive operation.
Performance
- get & put
Time complexity isO(1)
, because:- Bucket is accessed via array index, thus
O(1)
. - Linked list in each bucket is of small length, thus could view as
O(1)
. - Tree size is also limited, because will extend capacity & re-hash when element count increase, so could view it as
O(1)
, notO(log N)
.
- Bucket is accessed via array index, thus
Related videos on Youtube
akshay
Updated on November 02, 2021Comments
-
akshay over 2 years
As per my understanding I think:
- It is perfectly legal for two objects to have the same hashcode.
- If two objects are equal (using the equals() method) then they have the same hashcode.
- If two objects are not equal then they cannot have the same hashcode
Am I correct?
Now if am correct, I have the following question: The
HashMap
internally uses the hashcode of the object. So if two objects can have the same hashcode, then how can theHashMap
track which key it uses?Can someone explain how the
HashMap
internally uses the hashcode of the object?-
Joachim Sauer over 10 yearsFor the record: #1 and #2 are correct, #3 is wrong: Two objects that are not equal may have the same hash code.
-
Delfic almost 8 years#1 and #3 are contradictory even
-
Joachim Sauer over 7 yearsIndeed, if #2 is not followed, then the equals() implementation (or arguably the hashCode()) is incorrect.
-
Geek almost 12 yearshow did u arrive at 2^32 possible objects ?
-
Geek almost 12 yearsyou wrote "and the hashmap would not be able to find key-value pairs (because it's going to look in the same bucket)." Can you explain it is going to look in the same bucket say those two equals objects are t1 and t2 and they are equal and t1 and t2 have hashcodes h1 and h2 respectively .So t1.equals(t2)=true and h1!=h2 So when the hashmap would look for t1 , it will look in the bucket h1 and for t2 in the bucket t2 ?
-
Jesper over 11 yearsIf two keys are equal but their
hashCode()
method returns different hash codes, then theequals()
andhashCode()
methods of the key class violate the contract and you'll get strange results when using those keys in aHashMap
. -
Ankit Sharma almost 10 yearsEach bucket can have multiple Key Values pairs, which are uses linked list internally. But my confusion is - what is bucket here? What data structure it uses internally? Is there any connection between buckets?
-
Jesper almost 10 years@AnkitSharma If you want to really know all the details, lookup the source code of
HashMap
, which you can find in the filesrc.zip
in your JDK installation directory. -
Ankit Sharma almost 10 yearsThanks @Jesper I got the solution - Data structure to store Entry objects is an array named table of type Entry. A particular index location in array is referred as bucket, because it can hold the first element of a LinkedList of Entry objects.
-
Ankit Sharma almost 10 yearsRefer this link for more details howtodoinjava.com/2012/10/09/how-hashmap-works-in-java
-
Narendra N over 9 yearsI found the answer provided by Jasper better, I felt the blog is more towards handling interview, than understanding the concept
-
Abhijit Gaikwad over 9 years@NarendraN I agree with you.
-
LittleLittleQ over 8 yearsBut for efficiency, you'd better keep your different object having different hashcode
-
Jesper over 8 years@AnnabellChan Yes, indeed. A
HashMap
would still work if all keys had the same hashcode, but it would be very inefficient, because everything would go into one bucket. Finding something in the map would then be just as inefficient as finding it in a list (O(n)). -
Admin about 8 years@Jesper Really great explanation, helped me understand it very well. I know this might be a little late but when I was reading your answer I had a quick question. In the buckets of the HashMap which hold the key/value pairs, is there any relationship amongst the keys in the same bucket? Like are the keys within a specific range or something like that? I read online there is some relation but not too sure how it actually determines this relationship between the keys in a particular bucket...
-
Jesper about 8 years@1290 The only relation between the keys in the same bucket is that they have the same hash code.
-
Admin about 8 years@Jesper Oh ok, Thank you for clarifying.
-
Navrattan Yadav almost 8 yearswhy it use bucket size power of 2 ?
-
ACV almost 8 yearsI think this is wrong: "It stores keys in an array and values in a LinkedList ."
-
ACV almost 8 yearseach element in the list for each bucket contains the key and the value as well as the reference to the next node.
-
weston over 7 yearsThis is wrong
hash % (arrayLength-1)
it would behash % arrayLength
. But it actually ishash & (arrayLength-1)
. That is, because it uses powers of two (2^n
) for array length, takingn
least significant bits. -
weston over 7 years@navy It's a good growing strategy to double up. But other bits of the code rely on it, check out
indexFor
where it works out the bucket to use. -
Mudit bhaintwal over 7 yearsI think it's not an array of Entity objects rather an array of LinkedList/Tree. And each tree internally have Entity objects.
-
Jitendra over 7 yearsCan you please give a example How has time complexity O(1)
-
Eric over 7 years@jsroyal This might explain the complexity more clearly: en.wikipedia.org/wiki/Hash_table. But in short: finding the target bucket is O(1), because you find it by index in an array; then within a bucket, the amount is elements is small & on average a constant number despite the total number of elements in the whole hashtable, so searching for the target element within a bucket is also O(1); thus, O(1) + O(1) = O(1).
-
kiedysktos about 7 years@Jesper thanks a lot, this is one of the best explanations on this topic I've seen
-
roottraveller almost 7 years@shevchyk why do we store key and hash? what are their use ? Are not we wasting memory here?
-
Sudhanshu Gaur almost 7 yearsawesome answer (y)
-
overexchange over 6 yearshashset internally uses hashmap. does addition and deletion rules of hashmap, holds good for hashset?
-
xeruf over 6 yearsI'm late, but for those still wondering: A hashcode in Java is an int, and an int has 2^32 possible values
-
Eugene almost 6 years@weston not only that, hashCode is an
int
which of course can be negative, doing modulo on a negative number will give you a negative number -
Sharan Rai over 3 yearsThis answer need to be edited, once after the hashcode is calculated, bucket number in which "key, value,hashcode info" will be stored need to be calculated, because its not a good idea to have as many numbers of buckets in as per hashcode number. for Example 235%16(initial capacity in java) 11 is the bucket number in above case.