When to use hash tables?


Solution 1

What are the cases when using a hash table can improve performance, and when it does not?

If you have reason to care, implement using hash tables and whatever else you're considering, put your actual data through, and measure which performs better.

That said, if the hash table has the operations you need (i.e. you're not expecting to iterate it in sorted order, or compare it quickly to another hash table) and holds millions or more (billions, trillions...) of elements, then it'll probably be your best choice. But a lot depends on the hash table implementation (especially the choice of closed vs. open hashing), object size, hash function quality and calculation cost, comparison cost, and the oddities of your computer's memory performance at different cache levels... in short: too many things to make even an educated guess a better choice than measuring, when it matters.

And what are the cases when hash tables are not applicable?

Mainly when:

  • The input can't be hashed (e.g. you're given binary blobs and don't know which bits in there are significant, but you do have an int cmp(const T&, const T&) function you could use for a std::map), or

  • the available/possible hash functions are very collision prone, or

  • you want to avoid worst-case performance hits for:

    • handling lots of hash-colliding elements (perhaps "engineered" by someone trying to crash or slow down your software)

    • resizing the hash table: unless pre-sized to be large enough (which can be wasteful and slow when excessive memory is used), most implementations will outgrow the array they're using for the hash table every now and then, then allocate a bigger array and copy content across. This makes the specific insertions that trigger the rehashing much slower than the normal O(1) behaviour, even though the average is still O(1); if you need more consistent behaviour in all cases, something like a balanced binary tree may serve better

  • your access patterns are quite specialised (e.g. frequently operating on elements with keys that are "nearby" in some specific sort order), such that cache efficiency is better for other storage models that keep them nearby in memory (e.g. bucket sorted elements), even if you're not exactly relying on the sort order for e.g. iteration

Solution 2

We use hash tables to get O(1) access time. Imagine a dictionary. When you are looking for a word, e.g. "happy", you jump straight to 'H'. Here the hash function is determined by the starting letter. Then you look for "happy" within the H bucket (actually the H bucket, then the HA bucket, then the HAP bucket, and so on).

It doesn't make sense to use hash tables when your data is ordered or needs ordering, like sorted numbers. (The alphabet is ordered A, B, C ... X, Y, Z, but it wouldn't matter if you switched A and Z, provided you knew it was switched in your dictionary.)


Author: Sullivan Risk (Software Engineer)

Updated on June 04, 2022

Comments

  • Sullivan Risk (almost 2 years): What are the cases when using a hash table can improve performance, and when it does not? And what are the cases when hash tables are not applicable?

    • Jim Mischel (about 8 years): That topic is way too broad to cover here. Start with en.wikipedia.org/wiki/Hash_table#Uses, and if you have specific questions after that, then post a new question.
    • Sullivan Risk (about 8 years): Thanks a lot, Jim.
  • Sullivan Risk (about 8 years): Thanks a lot for your answer, Tony :)
  • Sullivan Risk (about 8 years): Thanks a lot for your answer, feltspar