What's the best hashing algorithm to use on a stl string when using hash_map?

51,805

Solution 1

I worked with Paul Larson of Microsoft Research on some hashtable implementations. He investigated a number of string hashing functions on a variety of datasets and found that a simple multiply by 101 and add loop worked surprisingly well.

unsigned int
hash(
    const char* s,
    unsigned int seed = 0)
{
    unsigned int hash = seed;
    while (*s)
    {
        hash = hash * 101  +  *s++;
    }
    return hash;
}

Solution 2

From some old code of mine:

/* magic numbers from http://www.isthe.com/chongo/tech/comp/fnv/ */
static const size_t InitialFNV = 2166136261U;
static const size_t FNVMultiple = 16777619;

/* Fowler / Noll / Vo (FNV) Hash */
size_t myhash(const string &s)
{
    size_t hash = InitialFNV;
    for(size_t i = 0; i < s.length(); i++)
    {
        hash = hash ^ (s[i]);       /* xor  the low 8 bits */
        hash = hash * FNVMultiple;  /* multiply by the magic number */
    }
    return hash;
}

Its fast. Really freaking fast.

Solution 3

Boost has an boost::hash library which can provides some basic hash functions for most common types.

Solution 4

That always depends on your data-set.

I for one had surprisingly good results by using the CRC32 of the string. Works very good with a wide range of different input sets.

Lots of good CRC32 implementations are easy to find on the net.

Edit: Almost forgot: This page has a nice hash-function shootout with performance numbers and test-data:

http://smallcode.weblogs.us/ <-- further down the page.

Solution 5

If you are hashing a fixed set of words, the best hash function is often a perfect hash function. However, they generally require that the set of words you are trying to hash is known at compile time. Detection of keywords in a lexer (and translation of keywords to tokens) is a common usage of perfect hash functions generated with tools such as gperf. A perfect hash also lets you replace hash_map with a simple array or vector.

If you're not hashing a fixed set of words, then obviously this doesn't apply.

Share:
51,805
PiNoYBoY82
Author by

PiNoYBoY82

Updated on July 09, 2022

Comments

  • PiNoYBoY82
    PiNoYBoY82 almost 2 years

    I've found the standard hashing function on VS2005 is painfully slow when trying to achieve high performance look ups. What are some good examples of fast and efficient hashing algorithms that should void most collisions?

  • Nils Pipenbrinck
    Nils Pipenbrinck over 15 years
    Hey George. I tried the code in the hash-benchmark I've postet in my answer. Nice find. It does not excel in performance or collisions, but it always gives consistent results. Seems like it's a good and cheap candidate for general purpose string-hashing.
  • Admin
    Admin about 14 years
    it may be fast, but its probably one of the WORST hash functions ever invented.
  • Albert
    Albert over 11 years
    @Matthieu: Why? Many duplicates? Do you have any references where I can read more about it?
  • Soumajyoti
    Soumajyoti over 11 years
    But this works only for small length strings. For large cases, it overflows majority of the time.
  • George V. Reilly
    George V. Reilly over 11 years
    Soumajyoti, the overflow doesn't matter. Most hash functions overflow. The point is that you get a decent mix of bits in the low-order 32-bits.
  • Jorge Galvão
    Jorge Galvão about 10 years
    That resembles the Java implementation, but it uses 31 instead of 101.
  • Helge Klein
    Helge Klein over 9 years
    Yoda condition alert! Apart from that this is similar to the Larson algorithm (I noticed this was posted earlier!).
  • David Soroko
    David Soroko about 9 years
    I run some benchmarks and SipHash looks very good
  • Mooing Duck
    Mooing Duck almost 9 years
    @Albert: ^ is transitive, which is bad. FNVMultiple is not prime, which is bad. InitialFNV isn't prime either, which may or may not be bad, I'm uncertain.
  • Soren
    Soren over 8 years
    Also look at the spooky hash which is an improvement on Jenkins
  • philix
    philix over 7 years
    @FelixSFD I just improved the answer.
  • bysreg
    bysreg almost 7 years
    @MooingDuck FNVMultiple seems to be a prime number.
  • Nick
    Nick about 6 years
    16777619 is a (proven) prime. 2166136261 is (proven) composite (failed sprp test base 2). primes.utm.edu/curios/includes/primetest.php
  • Chitturi Sai Suman
    Chitturi Sai Suman over 2 years
    How can we find the hash of substring, given we have already calculated the hashes of all prefixes of given String? Example, consider s = "hello", the hashes of all prefixes of s are: [104, 10605, 1071213, 108192621, 2337520240] How to quickly find hash of substring hel?