Rehashing process in a HashMap or Hashtable


Solution 1

The maximum threshold in the question is called the load factor.

It is advisable to have a load factor of around 0.75. The load factor is defined as m/n, where n is the total size of the hash table and m is the preferred number of entries that can be inserted before the underlying data structure needs to grow.

Rehashing can be done in two cases:

  1. When the current m'/n ratio increases beyond the load factor

  2. When the m'/n ratio falls to a very low value, say 0.1

In both cases m' is the current number of entries, and both cases require moving the existing entries into a bigger or a smaller hash table.

In the question's context, rehashing is the process of applying a hash function to the entries to move them to another hash table. It is possible to reuse the hash function that was used earlier or to use a new one altogether.
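As an illustration, here is a minimal sketch of case 1 using separate chaining. All names (`SimpleHashTable`, `rehash`, and so on) are illustrative and not taken from any particular library, and duplicate keys are ignored to keep the sketch short:

```java
import java.util.LinkedList;

// A minimal sketch of growth-triggered rehashing with separate chaining.
class SimpleHashTable<K, V> {
    static class Entry<K, V> {
        final K key; V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private LinkedList<Entry<K, V>>[] buckets = newBucketArray(16);
    private int size = 0;
    private final double loadFactor = 0.75;

    @SuppressWarnings("unchecked")
    private static <K, V> LinkedList<Entry<K, V>>[] newBucketArray(int n) {
        return (LinkedList<Entry<K, V>>[]) new LinkedList[n];
    }

    public void put(K key, V value) {
        // Case 1: the m'/n ratio would exceed the load factor, so grow first.
        if (size + 1 > buckets.length * loadFactor) {
            rehash(buckets.length * 2);
        }
        insert(buckets, key, value);
        size++;
    }

    // Re-apply the hash function to every existing entry and move it
    // into a new, larger (or smaller) bucket array.
    private void rehash(int newCapacity) {
        LinkedList<Entry<K, V>>[] newBuckets = newBucketArray(newCapacity);
        for (LinkedList<Entry<K, V>> bucket : buckets) {
            if (bucket == null) continue;
            for (Entry<K, V> e : bucket) {
                insert(newBuckets, e.key, e.value);
            }
        }
        buckets = newBuckets;
    }

    private void insert(LinkedList<Entry<K, V>>[] table, K key, V value) {
        int index = Math.floorMod(key.hashCode(), table.length);
        if (table[index] == null) table[index] = new LinkedList<>();
        table[index].add(new Entry<>(key, value));
    }
}
```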

Please note: rehashing is also done when a collision occurs, as in open addressing with double hashing. (It's a way of handling collisions too.)

To add some more context and a detailed discussion, please visit my blog: Hashing Basics.

Solution 2

Rehashing of a hash map is done when the number of elements in the map reaches the maximum threshold value.

Usually the load factor value is 0.75 and the default initial capacity is 16. Once the number of elements crosses 0.75 times the capacity, rehashing of the map takes place. With the defaults, that happens once the map holds more than 12 elements (0.75 * 16 = 12).

When rehashing occurs, a new hash function or even the same hash function could be used, but the buckets in which the values sit can change. When rehashing occurs, the number of buckets is doubled, and hence the index at which a value has to be placed changes.
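A small illustration of why the index changes (this mirrors the power-of-two masking that Java's HashMap uses; the real implementation also mixes the hash bits before masking):

```java
public class IndexDemo {
    public static void main(String[] args) {
        // Java's HashMap picks a bucket with hash & (capacity - 1), where the
        // capacity is a power of two. Doubling the capacity makes one more bit
        // of the hash significant, so each entry either keeps its old index or
        // moves to oldIndex + oldCapacity.
        int hash = "someKey".hashCode();
        int oldIndex = hash & (16 - 1);  // index before rehashing
        int newIndex = hash & (32 - 1);  // index after the table doubles
        System.out.println(oldIndex + " -> " + newIndex);
    }
}
```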

While rehashing, the linked list for each bucket gets reversed in order (in HashMap implementations before Java 8). This happens because HashMap doesn't append a new element at the tail; instead it inserts it at the head. So when rehashing occurs, each element is read from the old bucket and inserted at the head of its new bucket, and inserting each subsequent element from the old map at the head of the new one results in a reversal of the linked list.
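Here is a simplified sketch in the spirit of that pre-Java-8 transfer loop (`Node` and `TransferSketch` are hypothetical names, not the JDK's internal classes):

```java
public class TransferSketch {
    static class Node {
        final int hash; final Object key; final Object value; Node next;
        Node(int hash, Object key, Object value, Node next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    // Moves every chain from oldTable into newTable. Each node is pushed
    // onto the HEAD of its new chain, which is what reverses the order of
    // nodes that land in the same bucket.
    static void transfer(Node[] oldTable, Node[] newTable) {
        for (int j = 0; j < oldTable.length; j++) {
            Node e = oldTable[j];
            oldTable[j] = null;
            while (e != null) {
                Node next = e.next;                      // remember the rest of the old chain
                int i = e.hash & (newTable.length - 1);  // index in the bigger table
                e.next = newTable[i];                    // link in front of the new chain...
                newTable[i] = e;                         // ...so the bucket order reverses
                e = next;
            }
        }
    }
}
```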

If multiple threads handle the same hash map concurrently, this can result in an infinite loop.

A detailed explanation of how the infinite loop occurs in the above case can be found here: http://mailinator.blogspot.hu/2009/06/beautiful-race-condition.html

If the elements inserted in the map have to be sorted with respect to the keys, then a TreeMap can be used. But a HashMap is more efficient if the order of the keys doesn't matter.
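For example, a small self-contained comparison of the two:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {
    public static void main(String[] args) {
        Map<String, Integer> sorted = new TreeMap<>();   // iterates in key order
        Map<String, Integer> unsorted = new HashMap<>(); // iteration order unspecified
        for (String k : new String[] {"cherry", "apple", "banana"}) {
            sorted.put(k, k.length());
            unsorted.put(k, k.length());
        }
        System.out.println(sorted);   // {apple=5, banana=6, cherry=6}
        System.out.println(unsorted); // some unspecified order
    }
}
```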

Solution 3

Hashing – Rehashing and Race condition

Basically, when a hash map is created, the collection assigns it a default capacity of 2^4, i.e. 16. Later, as elements are added to the map and you come close to the initially defined capacity, rehashing is required to retain performance.

There is a load factor defined for the collection (.75 is considered good), and it specifies the trade-off between time and space:

  • LARGER load factor => lower space consumption but more expensive lookups
  • SMALLER load factor => larger space consumption relative to the required number of elements

The Java specification suggests that a good load factor value is .75.
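For reference, both knobs can be set through java.util.HashMap's real two-argument constructor (the values below are just examples):

```java
import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // HashMap(initialCapacity, loadFactor)
        Map<String, String> dense = new HashMap<>(16, 0.9f);  // rehashes later, longer chains
        Map<String, String> sparse = new HashMap<>(16, 0.5f); // rehashes sooner, shorter chains
        dense.put("key", "value");
        sparse.put("key", "value");
    }
}
```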

Hence, suppose you have a maximum requirement to store 10 elements in the hash. Considering the good load factor of .75, rehashing would occur after adding 7 elements to the collection. If your requirement would never exceed 7 elements, rehashing would never occur.

If a really large number of elements is going to be stored in the HashMap, it is always good to create the HashMap with sufficient initial capacity; this is more efficient than letting it perform automatic rehashing.
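For example, a map presized for an assumed one million entries (the sizing formula is a common rule of thumb, not something the HashMap API requires):

```java
import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expected = 1_000_000;
        // Pick a capacity large enough that expected / capacity stays below
        // the 0.75 load factor, so the map never rehashes while filling up.
        Map<Integer, Integer> map = new HashMap<>((int) Math.ceil(expected / 0.75));
        for (int i = 0; i < expected; i++) {
            map.put(i, i);
        }
    }
}
```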

RACE condition: While rehashing, the elements stored in the linked list for a given bucket get reversed in order. If two threads hit this race condition at the same time, there is a chance that the second thread ends up in an infinite loop while traversing, since the order has been changed.

Author: a Learner

Updated on July 05, 2022

Comments

  • a Learner
    a Learner almost 2 years

    How is the rehashing process done in a hashmap or hashtable when the size exceeds the maxthreshold value?

    Are all pairs just copied to a new array of buckets?

    EDIT:

    What happens to the elements in the same bucket (in a linked list) after rehashing? I mean, will they remain in the same bucket after rehashing?

  • Aravind
    Aravind about 8 years
    "possible to use the same hash function": How then would the new buckets be used when the number of buckets increases?
  • Tim Biegeleisen
    Tim Biegeleisen about 8 years
    I believe the new buckets (and rehashed table) would be used no differently than the original hash table, meaning that after rehashing it will be business as usual with a larger number of buckets.
  • dharam
    dharam almost 7 years
    You got the essence. In advanced implementations of maps, rehashing is a complicated process. It is done over a span of time, with multiple threads involved, to keep the amortized cost per thread low. So it will take a reasonable amount of time to release the space for the original array; until then, both arrays exist and are fully functional :)
  • sid_09
    sid_09 almost 7 years
    HashMap's default size is 16
  • Gusev Slava
    Gusev Slava almost 6 years
    @dharam "Please note: Rehashing is also done when a collision occurs." What is the threshold in this case? How to count the maximum allowed number of elements in bucket?
  • chaitra.kear
    chaitra.kear over 3 years
    My only question is: if rehashing occurred for every 13th element, it would be suboptimal, right? Let's say num_buckets = 16, total_no_entries =
  • Melwin
    Melwin over 3 years
    Initially, the number of buckets is 16 and it would rehash when the 13th element is encountered and the number of buckets would be doubled. This would make the total number of buckets 32 and the next rehashing occurs while exceeding 0.75 * 32, i.e. 25th element. Then the number of buckets is again doubled to 64 and the next rehashing shall happen while exceeding 0.75 * 64, i.e. 49th element. Similarly, it goes on and you can see it doesn't rehash after every 13th element but keeps on doubling. If n is the point at which rehashing occurred, then the next rehashing would happen at 2*n - 1.
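A quick sketch confirming that doubling sequence, assuming the default capacity of 16 and load factor of 0.75 (the class name is illustrative):

```java
public class RehashPoints {
    public static void main(String[] args) {
        int capacity = 16;                           // default initial capacity
        for (int step = 0; step < 5; step++) {
            int threshold = (int) (capacity * 0.75); // default load factor
            System.out.println("rehash on inserting element " + (threshold + 1)
                    + " (capacity " + capacity + " -> " + capacity * 2 + ")");
            capacity *= 2;
        }
    }
}
// Prints rehash points at elements 13, 25, 49, 97, 193 -- each one at
// 2*n - 1 where n is the previous rehash point.
```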