A fast hash function for string in C#
Solution 1
static UInt64 CalculateHash(string read)
{
UInt64 hashedValue = 3074457345618258791ul;
for(int i=0; i<read.Length; i++)
{
hashedValue += read[i];
hashedValue *= 3074457345618258799ul;
}
return hashedValue;
}
This is a Knuth hash. You can also use Jenkins.
Solution 2
First of all, consider using GetHashCode()
.
A simple improvement on your existing implementation:
static UInt64 CalculateHash(string read, bool lowTolerance)
{
UInt64 hashedValue = 0;
int i = 0;
ulong multiplier = 1;
while (i < read.Length)
{
hashedValue += read[i] * multiplier;
multiplier *= 37;
if (lowTolerance) i += 2;
else i++;
}
return hashedValue;
}
It avoids the expensive floating point calculation, and the overhead of ElementAt
.
Btw (UInt64)Math.Pow(31, i)
doesn't work well for longer strings. Floating point rounding will lead to a multiplier of 0 for characters beyond 15 or so.
Solution 3
To speed up your implementation, the (UInt64)Math.Pow(31, i)
call should be replaced by a lookup: pre-calculate a table of the first 30 powers of 31
, and use it at runtime. Since the limit on length is 30, you need only 31 element:
private static unsigned long[] Pow31 = new unsigned long[31];
static HashCalc() {
Pow31[0] = 1;
for (int i = 1 ; i != Pow31.Length ; i++) {
Pow31[i] = 31*Pow31[i-1];
}
}
// In your hash function...
hashedValue += read.ElementAt(i) * Pow31[i];
P basak
Updated on July 27, 2022Comments
-
P basak almost 2 years
I want to hash a string of length up-to 30. What will be the best idea to do that if time is my concern. The function will be called over 100 million times. currently I am using the following code,
static UInt64 CalculateHash(string read, bool lowTolerance) { UInt64 hashedValue = 0; int i = 0; while (i < read.Length) { hashedValue += read.ElementAt(i) * (UInt64)Math.Pow(31, i); if (lowTolerance) i += 2; else i++; } return hashedValue; }
-
David Schwartz about 12 yearsThe multiplier must start at a prime value greater than 256 or this breaks horribly if the first byte is small.
-
CodesInChaos about 12 years@DavidSchwartz A larger prime is certainly better, but breaking horribly is a bit of an overstatement.
-
David Schwartz about 12 yearsIf a 64-bit hash function has numerous 2-byte inputs that collide, IMO it breaks horribly. (But given how awful the function the OP started with, maybe my standards are too high.)
-
CodesInChaos about 12 yearsEven with a prime >256 but <65536 there will be two char collisions. C# works on UTF-16 codepoints, not single byte chars.
-
CodesInChaos about 12 yearsI wouldn't be so sure that a table lookup is faster than an integer multiplication.
-
Sergey Kalinichenko about 12 years@CodeInChaos It's certainly faster than
Math.Pow(31, i)
. Also I'd need a additional multiplication wheni
goes up by 2 inside a condition, so I'd try the lookup first. -
Fantius almost 12 yearsAccording to my own test, this function does not achieve avalanche. YMMV.
-
David Schwartz almost 12 years@Fantius: Can you try using
11400714819306691477ul
instead, please. (For both values.) -
Fantius almost 12 yearsIt's worse. But I should quantify my original statement. Toggling a single bit on the input results in about 49.40% of the output bits toggling (using your original constant), which is MUCH better than Bernstein-based functions. That's probably good enough for most uses. But, for instance, SuperFastHash (landman-code.blogspot.com/2009/02/…) is giving me 50.02%. And Murmur2 on the same page is giving me 50.04%.
-
David Schwartz almost 12 yearsIt's not intended for applications where you care about that. It's just intended to be used to distribute strings in a hash table.
-
Shital Shah almost 9 yearsCould you please provide citation for this algorithm? I searched TAOCP Vol 3 but can't find these constants you have.
-
David Schwartz almost 9 years@ShitalShah I don't think Knuth gave the constants for a 64-bit version.
-
Shital Shah almost 9 yearsIs it from TAOCP Vol 3? Any reference to anything else where this algorithm came from?
-
David Schwartz almost 9 years@ShitalShah I'm pretty sure it's from TAOCP, but I'm not sure which volume.
-
Alex from Jitbit over 2 yearsJust a warning for everyone who's on .NET Core: in .NET Core
GetHashCode
is randmized between app restarts! Which means you get a different hash for the same string every time the app is restarted/recycled