Fixed-length numeric hash code from a variable-length string in C#

Solution 1

I assume you are doing this because you need to store the value elsewhere and compare against it. Zach's answer (while entirely correct) may therefore cause you problems, since the contract for String.GetHashCode() explicitly allows its results to change (for example, between .NET versions).

So here is a version that is fixed and easy to reproduce in other languages.

I assume you know at compile time how many decimal digits are available. This is based on the Jenkins One-at-a-Time hash (as implemented and exhaustively tested by Bret Mulvey). As such it has excellent avalanching behaviour (a change of one bit in the input can affect every bit of the output), which means the somewhat lazy modulo reduction at the end is not a serious flaw for most uses (though you could do better with a more complex reduction).

const int MUST_BE_LESS_THAN = 100000000; // 8 decimal digits

public int GetStableHash(string s)
{
    uint hash = 0;
    // if you care this can be done much faster with unsafe 
    // using fixed char* reinterpreted as a byte*
    foreach (byte b in System.Text.Encoding.Unicode.GetBytes(s))
    {   
        hash += b;
        hash += (hash << 10);
        hash ^= (hash >> 6);    
    }
    // final avalanche
    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);
    // helpfully we only want a non-negative integer < MUST_BE_LESS_THAN,
    // so a simple modulo reduction is acceptable, if not perfectly uniform
    return (int)(hash % MUST_BE_LESS_THAN);
}
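
Since the question asks for a fixed-length number, here is a minimal sketch of how the result could be zero-padded to exactly 8 digits. The class name StableHashDemo and the helper ToFixedLength are hypothetical names introduced for illustration; the mixing steps are copied from the function above, wrapped in unchecked on the assumption that the project might be compiled with overflow checking enabled.

```csharp
using System;
using System.Text;

public static class StableHashDemo
{
    private const uint MustBeLessThan = 100000000; // 8 decimal digits

    // Same one-at-a-time mixing steps as GetStableHash in the answer.
    public static int GetStableHash(string s)
    {
        unchecked // the algorithm relies on uint arithmetic wrapping around
        {
            uint hash = 0;
            foreach (byte b in Encoding.Unicode.GetBytes(s))
            {
                hash += b;
                hash += hash << 10;
                hash ^= hash >> 6;
            }
            // final avalanche
            hash += hash << 3;
            hash ^= hash >> 11;
            hash += hash << 15;
            return (int)(hash % MustBeLessThan);
        }
    }

    // Hypothetical helper: pads with leading zeros so the text is always 8 digits.
    public static string ToFixedLength(string s)
    {
        return GetStableHash(s).ToString("D8");
    }
}
```

An empty string never enters the loop, so its hash is 0 and ToFixedLength("") yields "00000000". If you store the value as an integer rather than as text, the padding step is unnecessary.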

Solution 2

Simple approach (note that this is platform-dependent):

int shorthash = "test".GetHashCode() % 100000000; // 8 zeros
if (shorthash < 0) shorthash *= -1;
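
As a possible variation on the same idea, the sign fix-up can be avoided by reinterpreting the hash code as an unsigned value before reducing it. This is only a sketch: ShortHash and FromGetHashCode are hypothetical names, and the numbers it produces differ from those of the two-line version above. It is just as dependent on String.GetHashCode(), so the result should not be stored and compared later.

```csharp
using System;

public static class ShortHash
{
    // Hypothetical helper, not part of .NET.
    public static int FromGetHashCode(string s, int modulus = 100000000)
    {
        if (modulus <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(modulus));
        }
        // Reinterpret the 32 bits as unsigned so the remainder is never negative.
        uint bits = unchecked((uint)s.GetHashCode());
        // The remainder is below modulus, so it always fits in an int.
        return (int)(bits % (uint)modulus);
    }
}
```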
Author: Kishore A

Updated on June 15, 2022

Comments

  • Kishore A about 2 years

    I need to store fixed-length (up to 8 digits) numbers produced from variable-length strings. The hash need not be unique. It just needs to change when the input string changes. Is there a hash function in .NET that does this?

    Thanks
    Kishore.

  • joshperry over 15 years
    This will not render the same value for two different strings with the same contents
  • Zach Scrivena over 15 years
    @joshperry: Thanks, I've added a disclaimer in the answer.
  • Marc Gravell over 15 years
    @joshperry - er, yes it will.. it just isn't guaranteed to remain the same between .NET versions. However, no hash can guarantee to change when the input text changes - collisions, although unlikely, will happen (very, very, very rarely).
  • Tim Friesen over 6 years
    Code is missing a semi-colon on last line of code. Tried to edit but SO requires 6 changes.
  • Mahdi over 5 years
    not addressing the OP requirement but this is the most reliable solution without any collision..
  • Sparr over 5 years
    @Luckylukee why doesn't this address the OP requirement?