Have a good hash function for a C++ hash table?

45,797

Solution 1

Now assumming you want a hash, and want something blazing fast that would work in your case, because your strings are just 6 chars long you could use this magic:

size_t precision = 2; //change the precision with this
size_t hash(const char* str)
{
   return (*(size_t*)str)>> precision;
}

CRC is for slowpokes ;)

Explanation: This works by casting the contents of the string pointer to "look like" a size_t (int32 or int64 based on the optimal match for your hardware). So the contents of the string are interpreted as a raw number, no worries about characters anymore, and you then bit-shift this the precision needed (you tweak this number to the best performance, I've found 2 works well for hashing strings in set of a few thousands).

Also the really neat part is any decent compiler on modern hardware will hash a string like this in 1 assembly instruction, hard to beat that ;)

Solution 2

This simple polynomial works surprisingly well. I got it from Paul Larson of Microsoft Research who studied a wide variety of hash functions and hash multipliers.

unsigned hash(const char* s, unsigned salt)
{
    unsigned h = salt;
    while (*s)
        h = h * 101 + (unsigned) *s++;
    return h;
}

salt should be initialized to some randomly chosen value before the hashtable is created to defend against hash table attacks. If this isn't an issue for you, just use 0.

The size of the table is important too, to minimize collisions. Sounds like yours is fine.

Solution 3

Boost.Functional/Hash might be of use to you. I've not tried it, so I can't vouch for its performance.

Boost also has a CRC library.

I would look a Boost.Unordered first (i.e. boost::unordered_map<>). It uses hash maps instead of binary trees for containers.

I believe some STL implementations have a hash_map<> container in the stdext namespace.

Solution 4

Since you store english words, most of your characters will be letters and there won't be much variation in the most significant two bits of your data. Besides of that I would keep it very simple, just using XOR. After all you're not looking for cryptographic strength but just for a reasonably even distribution. Something along these lines:

size_t hash(const std::string &data) {
  size_t h(0);
  for (int i=0; i<data.length(); i++)
    h = (h << 6) ^ (h >> 26) ^ data[i];
  }
  return h;
}

Besides of that, have you looked at std::tr1::hash as a hashing function and/or std::tr1::unordered_map as an implementation of a hash table? Using these would probably be save much work opposed to implementing your own classes.

Solution 5

The size of your table will dictate what size hash you should use. You would like to minimize collisions of course. I'm not sure what you are specifying by max items and capacity (they seem like the same thing to me) In any case either of those numbers suggest that a 32 bit hash would be sufficient. You might get away with CRC16 (~65,000 possibilities) but you would probably have a lot of collisions to deal with. On the other hand, a collision may be quicker to deal with than than a CRC32 hash.

I would say, go with CRC32. You'll find no shortage of documentation and sample code. Since you have your maximums figured out and speed is a priority, go with an array of pointers. Use the hash to generate an index. On collision, increment index until you hit an empty bucket.. quick and simple.

Share:
45,797
JxAxMxIxN
Author by

JxAxMxIxN

Updated on February 13, 2022

Comments

  • JxAxMxIxN
    JxAxMxIxN over 2 years

    I am in need of a performance-oriented hash function implementation in C++ for a hash table that I will be coding. I looked around already and only found questions asking what's a good hash function "in general". I've considered CRC32 (but where to find good implementation?) and a few cryptography algorithms. My table, though, has very specific requirements.

    Here's what the table will be like:

    100,000 items max
    200,000 capacity (so the load is 0.5)
    hashing a 6-character string which is a part of English sentence
         examples: "become"    "and he"    ", not "
    

    The number one priority of my hash table is quick search (retrieval). Quick insertion is not important, but it will come along with quick search. Deletion is not important, and re-hashing is not something I'll be looking into. To handle collisions, I'll be probably using separate chaining as described here. I have already looked at this article, but would like an opinion of those who have handled such task before.