Choosing a suitable table size for a Hash


Solution 1

It depends on the load factor (the "percent full" point at which the table grows and redistributes its elements). If you know you have exactly 1000 entries, and that number will never change, you can set the load factor to 1.0 and the initial size to 1000 for maximum efficiency. If you are not sure of the exact size, you can leave the load factor at its default of 0.75 and set the initial size to 1334 (expected size / load factor, i.e. 1000 / 0.75, rounded up) for very good performance, at the cost of some extra memory.

You can use the following constructor to set the load factor:

Hashtable(int initialCapacity, float loadFactor) 
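For instance, a minimal sketch of both sizing strategies (the key and value types here are arbitrary):

    import java.util.Hashtable;

    public class SizedTable {
        public static void main(String[] args) {
            // Exactly 1000 entries, never changing: capacity 1000 with
            // load factor 1.0f means no rehash occurs while the table
            // holds at most 1000 entries.
            Hashtable<String, Integer> fixed = new Hashtable<>(1000, 1.0f);

            // Approximate count: keep the default 0.75 load factor and
            // oversize the table instead, ceil(1000 / 0.75) = 1334.
            Hashtable<String, Integer> roomy = new Hashtable<>(1334, 0.75f);

            fixed.put("example", 1);
            roomy.put("example", 1);
            System.out.println(fixed.size() + " " + roomy.size());
        }
    }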

Solution 2

You need to factor in the hash function as well.

One rule of thumb suggests making the table size about double the number of entries, so that there is room to expand and, hopefully, the number of collisions stays small.

Another rule of thumb: if you are doing some sort of modulo-related hashing, round your table size up to the next largest prime number and use that prime as the modulo value.
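A small sketch of that rule (nextPrime and isPrime are illustrative helpers written for this example, not library calls):

    public class PrimeCapacity {
        // Round n up to the next prime, for use as a modulo-style table size.
        static int nextPrime(int n) {
            if (n <= 2) return 2;
            int candidate = (n % 2 == 0) ? n + 1 : n; // only test odd numbers
            while (!isPrime(candidate)) {
                candidate += 2;
            }
            return candidate;
        }

        static boolean isPrime(int n) {
            if (n % 2 == 0) return n == 2;
            for (int i = 3; (long) i * i <= n; i += 2) {
                if (n % i == 0) return false;
            }
            return true;
        }

        public static void main(String[] args) {
            // 1000 keys doubled per the first rule of thumb, then
            // rounded up to the next prime: prints 2003.
            System.out.println(nextPrime(2 * 1000));
        }
    }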

What kind of things are you hashing? More detail should generate better advice.

Solution 3

There is some discussion of these factors in the documentation for Hashtable.

Solution 4

Let it grow. At this size, the automatic handling is fine. Beyond that, 2 × size + 1 is a simple growth formula. Prime numbers are also somewhat good, but as soon as your data set reaches a certain size, the hash implementation may decide to rehash and grow the table anyway.
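As an illustration of that formula (a sketch; 11 is java.util.Hashtable's documented default initial capacity, and OpenJDK's rehash() grows by this same 2n + 1 rule):

    public class GrowthSequence {
        public static void main(String[] args) {
            // Capacity sequence under the 2 * size + 1 growth rule,
            // starting from Hashtable's default capacity of 11.
            int capacity = 11;
            for (int i = 0; i < 6; i++) {
                System.out.print(capacity + " ");
                capacity = 2 * capacity + 1;
            }
            // prints: 11 23 47 95 191 383
        }
    }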

Your keys drive the effectiveness, and they are hopefully distinct enough.

Bottom line: ask the size question when you actually have a problem, such as memory size or slow performance; other than that, do not worry!


Comments

  • kylex, about 2 years ago:

    If I have a key set of 1000, what is a suitable size for my Hash table, and how is that determined?

  • user1066101, over 15 years ago:

    Assuming that the hash function is well-behaved over the set of expected keys. A home-brewed hash function may not behave well in a minimally sized table. For a home-brewed function, you would have to run experiments.
  • Bill the Lizard, over 15 years ago:

    If the hash function isn't well-behaved, colliding elements will be stored in the same bucket (in a LinkedList). The table being minimally sized will have no effect whatsoever on performance.
  • Michael Rutherfurd, over 15 years ago:

    Worry about it if performance in this area becomes an issue. If you try to handle it up front, you are more likely to introduce a bug or simply end up with unnecessarily complex code, which can cause a maintenance issue.
  • ReneS, over 15 years ago:

    2000 does not make a good size because it is not prime. 2001 would be better; it is not prime either, but at least it is not even, which distributes the keys in the table much better. A good hashtable will take care of a good hash function, but most of the time the size is what gets used.
  • fulmicoton, over 15 years ago:

    This is an interesting question. Your statement is right if you use a hash function of the form H(s) = s[0] + b*s[1] + b^2*s[2] + ... + b^(N-1)*s[N-1] (see the sketch after these comments). I think today's industry standard is to use 2^k as the size and better hash functions such as Jenkins's. Last time I checked, the std was still working with primes, however.
  • ReneS, over 15 years ago:

    I agree. Have the problem first and look for a solution afterwards.
  • tomasyany, almost 8 years ago:

    This is more a comment than an answer.
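Below is a small illustrative sketch of the polynomial hash fulmicoton describes, reduced into a prime-sized table as suggested in Solution 2 (the base 31 and the table size 2003 are assumptions for this example, not anything the commenters specified):

    public class PolynomialHash {
        // Polynomial hash H(s) = s[0] + b*s[1] + ... + b^(N-1)*s[N-1].
        // java.lang.String.hashCode uses the same idea with b = 31,
        // though it applies the powers in the opposite order.
        static int hash(String s, int b) {
            int h = 0;
            int power = 1;
            for (int i = 0; i < s.length(); i++) {
                h += s.charAt(i) * power; // int overflow wraps, fine for hashing
                power *= b;
            }
            return h;
        }

        public static void main(String[] args) {
            // Reduce into a prime-sized table (2003 buckets here);
            // Math.floorMod keeps the index non-negative after overflow.
            int bucket = Math.floorMod(hash("example-key", 31), 2003);
            System.out.println(bucket);
        }
    }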