A fast hash function for string in C#

50,019

Solution 1

static UInt64 CalculateHash(string read)
{
    UInt64 hashedValue = 3074457345618258791ul;
    for(int i=0; i<read.Length; i++)
    {
        hashedValue += read[i];
        hashedValue *= 3074457345618258799ul;
    }
    return hashedValue;
}

This is a Knuth hash. You can also use Jenkins.

Solution 2

First of all, consider using GetHashCode().

A simple improvement on your existing implementation:

static UInt64 CalculateHash(string read, bool lowTolerance)
{
    UInt64 hashedValue = 0;
    int i = 0;
    ulong multiplier = 1;
    while (i < read.Length)
    {
        hashedValue += read[i] * multiplier;
        multiplier *= 37;
        if (lowTolerance) i += 2;
        else i++;
    }
    return hashedValue;
}

It avoids the expensive floating point calculation, and the overhead of ElementAt.

Btw (UInt64)Math.Pow(31, i) doesn't work well for longer strings. Floating point rounding will lead to a multiplier of 0 for characters beyond 15 or so.

Solution 3

To speed up your implementation, the (UInt64)Math.Pow(31, i) call should be replaced by a lookup: pre-calculate a table of the first 30 powers of 31, and use it at runtime. Since the limit on length is 30, you need only 31 element:

private static unsigned long[] Pow31 = new unsigned long[31];

static HashCalc() {
    Pow31[0] = 1;
    for (int i = 1 ; i != Pow31.Length ; i++) {
        Pow31[i] = 31*Pow31[i-1];
    }
}

// In your hash function...
hashedValue += read.ElementAt(i) * Pow31[i];
Share:
50,019
P basak
Author by

P basak

Updated on July 27, 2022

Comments

  • P basak
    P basak almost 2 years

    I want to hash a string of length up-to 30. What will be the best idea to do that if time is my concern. The function will be called over 100 million times. currently I am using the following code,

    static UInt64 CalculateHash(string read, bool lowTolerance)
    {
        UInt64 hashedValue = 0;
        int i = 0;
        while (i < read.Length)
        {
            hashedValue += read.ElementAt(i) * (UInt64)Math.Pow(31, i);
            if (lowTolerance) i += 2;
            else i++;
        }
        return hashedValue;
    }
    
  • David Schwartz
    David Schwartz about 12 years
    The multiplier must start at a prime value greater than 256 or this breaks horribly if the first byte is small.
  • CodesInChaos
    CodesInChaos about 12 years
    @DavidSchwartz A larger prime is certainly better, but breaking horribly is a bit of an overstatement.
  • David Schwartz
    David Schwartz about 12 years
    If a 64-bit hash function has numerous 2-byte inputs that collide, IMO it breaks horribly. (But given how awful the function the OP started with, maybe my standards are too high.)
  • CodesInChaos
    CodesInChaos about 12 years
    Even with a prime >256 but <65536 there will be two char collisions. C# works on UTF-16 codepoints, not single byte chars.
  • CodesInChaos
    CodesInChaos about 12 years
    I wouldn't be so sure that a table lookup is faster than an integer multiplication.
  • Sergey Kalinichenko
    Sergey Kalinichenko about 12 years
    @CodeInChaos It's certainly faster than Math.Pow(31, i). Also I'd need a additional multiplication when i goes up by 2 inside a condition, so I'd try the lookup first.
  • Fantius
    Fantius almost 12 years
    According to my own test, this function does not achieve avalanche. YMMV.
  • David Schwartz
    David Schwartz almost 12 years
    @Fantius: Can you try using 11400714819306691477ul instead, please. (For both values.)
  • Fantius
    Fantius almost 12 years
    It's worse. But I should quantify my original statement. Toggling a single bit on the input results in about 49.40% of the output bits toggling (using your original constant), which is MUCH better than Bernstein-based functions. That's probably good enough for most uses. But, for instance, SuperFastHash (landman-code.blogspot.com/2009/02/…) is giving me 50.02%. And Murmur2 on the same page is giving me 50.04%.
  • David Schwartz
    David Schwartz almost 12 years
    It's not intended for applications where you care about that. It's just intended to be used to distribute strings in a hash table.
  • Shital Shah
    Shital Shah almost 9 years
    Could you please provide citation for this algorithm? I searched TAOCP Vol 3 but can't find these constants you have.
  • David Schwartz
    David Schwartz almost 9 years
    @ShitalShah I don't think Knuth gave the constants for a 64-bit version.
  • Shital Shah
    Shital Shah almost 9 years
    Is it from TAOCP Vol 3? Any reference to anything else where this algorithm came from?
  • David Schwartz
    David Schwartz almost 9 years
    @ShitalShah I'm pretty sure it's from TAOCP, but I'm not sure which volume.
  • Alex from Jitbit
    Alex from Jitbit over 2 years
    Just a warning for everyone who's on .NET Core: in .NET Core GetHashCode is randmized between app restarts! Which means you get a different hash for the same string every time the app is restarted/recycled