What's the best hashing algorithm to use on a stl string when using hash_map?
Solution 1
I worked with Paul Larson of Microsoft Research on some hashtable implementations. He investigated a number of string hashing functions on a variety of datasets and found that a simple multiply by 101 and add loop worked surprisingly well.
unsigned int
hash(
const char* s,
unsigned int seed = 0)
{
unsigned int hash = seed;
while (*s)
{
hash = hash * 101 + *s++;
}
return hash;
}
Solution 2
From some old code of mine:
/* magic numbers from http://www.isthe.com/chongo/tech/comp/fnv/ */
static const size_t InitialFNV = 2166136261U;
static const size_t FNVMultiple = 16777619;
/* Fowler / Noll / Vo (FNV) Hash */
size_t myhash(const string &s)
{
size_t hash = InitialFNV;
for(size_t i = 0; i < s.length(); i++)
{
hash = hash ^ (s[i]); /* xor the low 8 bits */
hash = hash * FNVMultiple; /* multiply by the magic number */
}
return hash;
}
Its fast. Really freaking fast.
Solution 3
Boost has an boost::hash library which can provides some basic hash functions for most common types.
Solution 4
That always depends on your data-set.
I for one had surprisingly good results by using the CRC32 of the string. Works very good with a wide range of different input sets.
Lots of good CRC32 implementations are easy to find on the net.
Edit: Almost forgot: This page has a nice hash-function shootout with performance numbers and test-data:
http://smallcode.weblogs.us/ <-- further down the page.
Solution 5
If you are hashing a fixed set of words, the best hash function is often a perfect hash function. However, they generally require that the set of words you are trying to hash is known at compile time. Detection of keywords in a lexer (and translation of keywords to tokens) is a common usage of perfect hash functions generated with tools such as gperf. A perfect hash also lets you replace hash_map
with a simple array or vector
.
If you're not hashing a fixed set of words, then obviously this doesn't apply.
PiNoYBoY82
Updated on July 09, 2022Comments
-
PiNoYBoY82 almost 2 years
I've found the standard hashing function on VS2005 is painfully slow when trying to achieve high performance look ups. What are some good examples of fast and efficient hashing algorithms that should void most collisions?
-
Nils Pipenbrinck over 15 yearsHey George. I tried the code in the hash-benchmark I've postet in my answer. Nice find. It does not excel in performance or collisions, but it always gives consistent results. Seems like it's a good and cheap candidate for general purpose string-hashing.
-
Admin about 14 yearsit may be fast, but its probably one of the WORST hash functions ever invented.
-
Albert over 11 years@Matthieu: Why? Many duplicates? Do you have any references where I can read more about it?
-
Soumajyoti over 11 yearsBut this works only for small length strings. For large cases, it overflows majority of the time.
-
George V. Reilly over 11 yearsSoumajyoti, the overflow doesn't matter. Most hash functions overflow. The point is that you get a decent mix of bits in the low-order 32-bits.
-
Jorge Galvão about 10 yearsThat resembles the Java implementation, but it uses 31 instead of 101.
-
Helge Klein over 9 yearsYoda condition alert! Apart from that this is similar to the Larson algorithm (I noticed this was posted earlier!).
-
David Soroko about 9 yearsI run some benchmarks and SipHash looks very good
-
Mooing Duck almost 9 years@Albert:
^
is transitive, which is bad.FNVMultiple
is not prime, which is bad.InitialFNV
isn't prime either, which may or may not be bad, I'm uncertain. -
Soren over 8 yearsAlso look at the spooky hash which is an improvement on Jenkins
-
philix over 7 years@FelixSFD I just improved the answer.
-
bysreg almost 7 years@MooingDuck FNVMultiple seems to be a prime number.
-
Nick about 6 years16777619 is a (proven) prime. 2166136261 is (proven) composite (failed sprp test base 2). primes.utm.edu/curios/includes/primetest.php
-
Chitturi Sai Suman over 2 yearsHow can we find the hash of substring, given we have already calculated the hashes of all prefixes of given String? Example, consider
s = "hello"
, the hashes of all prefixes of s are:[104, 10605, 1071213, 108192621, 2337520240]
How to quickly find hash of substringhel
?