Chosing a suitable table size for a Hash
Solution 1
It depends on the load factor (the "percent full" point where the table will increase its size and re-distribute its elements). If you know you have exactly 1000 entries, and that number will never change, you can just set the load factor to 1.0 and the initial size to 1000 for maximum efficiency. If you weren't sure of the exact size, you could leave the load factor at its default of 0.75 and set your initial size to 1334 (expected size/LF) for really good performance, at a cost of extra memory.
You can use the following constructor to set the load factor:
Hashtable(int initialCapacity, float loadFactor)
Solution 2
You need to factor in the hash function as well.
one rule of thumb suggests make the table size about double, so that there is room to expand, and hopefully keep the number of collisions small.
Another rule of thumb is to assume that you are doing some sort of modulo related hashing, then round your table size up to the next largest prime number, and use that prime number as the modulo value.
What kind of things are you hashing? More detail should generate better advice.
Solution 3
There's some discussion of these factors in the documentation for Hashtable
Solution 4
Let it grow. With this size, the automatic handling is fine. Other than that, 2 x size + 1 is a simple formula. Prime numbers are also kind of good, but as soon as your data set reaches a certain size, the hash implementation might decide to rehash and grow the table.
Your keys are driving the effectiveness and are hopefully distinct enough.
Bottom line: Ask the size question when you have problems such as size or slow performance, other than that: Do not worry!
Related videos on Youtube
![kylex](https://i.stack.imgur.com/Hg4LF.jpg?s=256&g=1)
kylex
Updated on April 16, 2022Comments
-
kylex about 2 years
If I have a key set of 1000, what is a suitable size for my Hash table, and how is that determined?
-
user1066101 over 15 yearsAssuming that the hash function is well-behaved over the set of expected keys. A home-brewed hash function may not behave well in a minimally-sized table. For a home-brewed function, you'd have to run experiments.
-
Bill the Lizard over 15 yearsIf the hash function isn't well-behaved, colliding elements will be stored in the same bucket (in a LinkedList). The table being minimally-sized will have no effect whatsoever on performance.
-
Michael Rutherfurd over 15 yearsWorry about it if performance in this area becomes an issue. If you try to handle it up front you are more likely to insert a bug or simply have unnecessarily complex code which can cause a maintenance issue.
-
ReneS over 15 years2000 does not make a good size, because it is not prime. 2001 would be good, it is not prime, but at least not even. Will distribute the keys in the table much better.A good hashtable will take care of a good hash function but most of the time, the size is used.
-
fulmicoton over 15 yearsThis is an interesting question. Your statement is right if you use a hash key of type: H(s) = s[0] + b*s[1] + b^2s[2] + ... [N] I think today's industry standard is to use 2^k as size and better hash functions such as Jenkins's. Last time I checked the std was working with prime however.
-
ReneS over 15 yearsI agree. Have the problem first and look for a solution afterwards.
-
tomasyany almost 8 yearsThis is more a comment than a answer.