Concise way to combine field hashcodes?

Solution 1

Some people use:

Tuple.Create(lastName, firstName, gender).GetHashCode()

It's mentioned on MSDN at Object.GetHashCode(), with the warning:

Note, though, that the performance overhead of instantiating a Tuple object may significantly impact the overall performance of an application that stores large numbers of objects in hash tables.

The logic of aggregating the constituent hashes is provided by System.Tuple, which hopefully has had some thought go into it...

Update: it is worth noting @Ryan's observation in the comments that this only appears to use the last 8 elements of any Tuple of Size>8.
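As a minimal sketch of the Tuple approach in context: the Person class below is hypothetical (its name and members are assumptions made for illustration, reusing the lastName/firstName/gender fields from the snippet above). It delegates all of the combining logic to System.Tuple and allocates one Tuple per call, which is the overhead the MSDN note warns about.

```csharp
using System;

// Hypothetical type, used only to illustrate the Tuple-based approach.
public sealed class Person : IEquatable<Person>
{
    public string LastName { get; }
    public string FirstName { get; }
    public char Gender { get; }

    public Person(string lastName, string firstName, char gender)
    {
        LastName = lastName;
        FirstName = firstName;
        Gender = gender;
    }

    // Equality uses exactly the same members as GetHashCode.
    public bool Equals(Person other) =>
        other != null
        && LastName == other.LastName
        && FirstName == other.FirstName
        && Gender == other.Gender;

    public override bool Equals(object obj) => Equals(obj as Person);

    // Delegates the combining to System.Tuple; allocates one Tuple per call.
    public override int GetHashCode() =>
        Tuple.Create(LastName, FirstName, Gender).GetHashCode();
}
```

Two Person instances built from the same values compare equal and return the same hash code within a process, which is the contract that hash tables rely on.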

Solution 2

EDIT: System.HashCode has now been released. The recommended way of creating hashcodes is now this:

public override int GetHashCode()
{
    return HashCode.Combine(fieldA, fieldB, fieldC);
}

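System.HashCode.Combine() internally calls .GetHashCode() on each field (a null field contributes 0) and mixes the results for you. Note that System.HashCode is seeded randomly per process, so the resulting values are only meaningful within a single run and should not be persisted.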
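To show how this fits together with Equals(), here is a small sketch. The Order class and its members are hypothetical names chosen for illustration. One of its fields may be null, which HashCode.Combine() tolerates.

```csharp
using System;

// Hypothetical type, used only to illustrate HashCode.Combine.
public sealed class Order : IEquatable<Order>
{
    public int Id { get; }
    public string Customer { get; }   // may be null

    public Order(int id, string customer)
    {
        Id = id;
        Customer = customer;
    }

    // Equality and hashing must look at the same set of members.
    public bool Equals(Order other) =>
        other != null && Id == other.Id && Customer == other.Customer;

    public override bool Equals(object obj) => Equals(obj as Order);

    // Null members are fine: Combine treats them as hash code 0.
    public override int GetHashCode() => HashCode.Combine(Id, Customer);
}
```

With this in place, a HashSet<Order> containing new Order(1, "acme") will report that it contains a second, separately constructed new Order(1, "acme").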

For very many fields (more than 8), you can create an instance of HashCode and then use the .Add() method:

public override int GetHashCode()
{
    HashCode hash = new HashCode();
    hash.Add(fieldA);
    hash.Add(fieldB);
    hash.Add(fieldC);
    hash.Add(fieldD);
    hash.Add(fieldE);
    hash.Add(fieldF);
    hash.Add(fieldG);
    hash.Add(fieldH);
    hash.Add(fieldI);
    return hash.ToHashCode();
}
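The instance API is also useful when a field needs a non-default comparer, because HashCode.Add() has an overload that accepts an IEqualityComparer<T>. A sketch, assuming a hypothetical Tag class whose Name is compared case-insensitively:

```csharp
using System;

// Hypothetical type, used only to illustrate HashCode.Add with a comparer.
public sealed class Tag : IEquatable<Tag>
{
    public string Name { get; }
    public int Weight { get; }

    public Tag(string name, int weight)
    {
        Name = name;
        Weight = weight;
    }

    // Name is compared case-insensitively, so it must be hashed the same way.
    public bool Equals(Tag other) =>
        other != null
        && Weight == other.Weight
        && StringComparer.OrdinalIgnoreCase.Equals(Name, other.Name);

    public override bool Equals(object obj) => Equals(obj as Tag);

    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.Add(Name, StringComparer.OrdinalIgnoreCase); // comparer supplies the hash
        hash.Add(Weight);
        return hash.ToHashCode();
    }
}
```

The design point is that the comparer used for hashing matches the one used in Equals(), so "ABC" and "abc" land in the same bucket.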

Visual Studio 2019 now has a Quick Actions helper to generate Equals() and GetHashCode() for you. Simply right click the class name in the declaration > Quick Actions and Refactorings > Generate Equals and GetHashCode. Select the members you want it to use for equality, and also "Implement IEquatable", and then click OK.

One last thing: If you need to get the structural hash code of an object, for example if you want to include the hashcode of an array that changes based on its contents (aka structure) and not its reference, then you will need to cast the field to IStructuralEquatable and get its hash code manually, like so:

public override int GetHashCode()
{
    return HashCode.Combine(
        fieldA,
        ((IStructuralEquatable)stringArrayFieldB).GetHashCode(EqualityComparer<string>.Default));
}

This is because the IStructuralEquatable interface is almost always implemented explicitly, so casting to IStructuralEquatable is required to call IStructuralEquatable.GetHashCode() instead of the default object.GetHashCode() method.
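A small sketch of the difference, using a hypothetical helper class (StructuralHashDemo is not a library type):

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper, used only to illustrate the explicit interface call.
public static class StructuralHashDemo
{
    // Hash based on the array's contents rather than its reference.
    public static int StructuralHash(string[] values) =>
        ((IStructuralEquatable)values).GetHashCode(EqualityComparer<string>.Default);
}
```

Calling StructuralHashDemo.StructuralHash() on two separately allocated arrays with the same elements returns the same value, whereas calling GetHashCode() directly on those two arrays will usually return different values, because the default implementation is reference-based.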

Finally, in the current implementation the .GetHashCode of an int is simply the integer value itself, so passing in the hashcode value to HashCode.Combine() instead of the field itself makes no difference to the result.

Old Answer:

For the sake of completeness, here is the hashing algorithm taken from the .NET Tuple Reference source, line 52. Interestingly, this hash algorithm was copied over from System.Web.Util.HashCodeCombiner.

Here is the code:

public override int GetHashCode() {
    // hashing method taken from .NET Tuple reference
    // expand this out to however many items you need to hash
    return CombineHashCodes(this.item1.GetHashCode(), this.item2.GetHashCode(), this.item3.GetHashCode());
}

internal static int CombineHashCodes(int h1, int h2) {
    // this is where the magic happens
    return (((h1 << 5) + h1) ^ h2);
}

internal static int CombineHashCodes(int h1, int h2, int h3) {
    return CombineHashCodes(CombineHashCodes(h1, h2), h3);
}

internal static int CombineHashCodes(int h1, int h2, int h3, int h4) {
    return CombineHashCodes(CombineHashCodes(h1, h2), CombineHashCodes(h3, h4));
}

internal static int CombineHashCodes(int h1, int h2, int h3, int h4, int h5) {
    return CombineHashCodes(CombineHashCodes(h1, h2, h3, h4), h5);
}

internal static int CombineHashCodes(int h1, int h2, int h3, int h4, int h5, int h6) {
    return CombineHashCodes(CombineHashCodes(h1, h2, h3, h4), CombineHashCodes(h5, h6));
}

internal static int CombineHashCodes(int h1, int h2, int h3, int h4, int h5, int h6, int h7) {
    return CombineHashCodes(CombineHashCodes(h1, h2, h3, h4), CombineHashCodes(h5, h6, h7));
}

internal static int CombineHashCodes(int h1, int h2, int h3, int h4, int h5, int h6, int h7, int h8) {
    return CombineHashCodes(CombineHashCodes(h1, h2, h3, h4), CombineHashCodes(h5, h6, h7, h8));
}

Of course, the actual Tuple GetHashCode() (which is actually an Int32 IStructuralEquatable.GetHashCode(IEqualityComparer comparer)) has a large switch block to decide which one of these to call based upon how many items it is holding - your own code probably won't require that.
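To make the arithmetic concrete, here is the two-argument combiner on its own with a few small inputs worked through. The class name TupleStyleHash is an assumption for illustration; the formula is the one shown above.

```csharp
// Hypothetical wrapper around the combining formula shown above.
public static class TupleStyleHash
{
    // (h1 << 5) + h1 is h1 * 33; the result is then XORed with h2.
    public static int Combine(int h1, int h2) => ((h1 << 5) + h1) ^ h2;

    public static int Combine(int h1, int h2, int h3) =>
        Combine(Combine(h1, h2), h3);
}

// Combine(1, 2)    == (33 ^ 2)   == 35
// Combine(2, 1)    == (66 ^ 1)   == 67    (order matters)
// Combine(1, 2, 3) == (1155 ^ 3) == 1152
```

Because the first argument is multiplied by 33 before the XOR, swapping two fields generally changes the result, unlike a plain XOR or OR of the individual hash codes.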

Solution 3

It's not exactly the same, but we have a HashCodeHelper class in Noda Time (which has lots of types which override equality and hash code operations).

It's used like this (taken from ZonedDateTime):

public override int GetHashCode()
{
    int hash = HashCodeHelper.Initialize();
    hash = HashCodeHelper.Hash(hash, LocalInstant);
    hash = HashCodeHelper.Hash(hash, Offset);
    hash = HashCodeHelper.Hash(hash, Zone);
    return hash;
}

Note that it's a generic method, which avoids boxing for value types. It copes with null values automatically (using 0 for the value). Note that the MakeHash method has an unchecked block as Noda Time uses checked arithmetic as a project setting, whereas hash code calculations should be allowed to overflow.
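For readers who want the same shape without taking a dependency, here is a rough sketch of a helper in that style. This is not Noda Time's code: the class name SimpleHashHelper is hypothetical, and the seed 17 and multiplier 23 are simply borrowed from the hand-rolled snippet in the question.

```csharp
// Hypothetical helper in the style described above; NOT Noda Time's implementation.
public static class SimpleHashHelper
{
    private const int Seed = 17;        // assumption: taken from the question's snippet
    private const int Multiplier = 23;  // assumption: taken from the question's snippet

    public static int Initialize() => Seed;

    // Generic, so value types are not boxed into object; null hashes as 0.
    public static int Hash<T>(int code, T value)
    {
        int valueHash = value == null ? 0 : value.GetHashCode();
        unchecked // overflow is expected and should wrap
        {
            return code * Multiplier + valueHash;
        }
    }
}
```

For example, hashing the ints 1 and 2 in sequence gives (17 * 23 + 1) * 23 + 2 = 9018, and hashing a single null string gives 17 * 23 + 0 = 391.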

Solution 4

Here are a couple of concise (though not as efficient) refactors of the System.Web.Util.HashCodeCombiner mentioned in Ryan's answer:

    // Requires: using System.Linq;
    public static int CombineHashCodes(params object[] objects)
    {
        // From System.Web.Util.HashCodeCombiner
        int combine(int h1, int h2) => (((h1 << 5) + h1) ^ h2);

        // Treat null entries as 0 rather than throwing a NullReferenceException
        return objects.Select(it => it?.GetHashCode() ?? 0).Aggregate(5381, combine);
    }

    public static int CombineHashCodes(IEqualityComparer comparer, params object[] objects)
    {
        // From System.Web.Util.HashCodeCombiner
        int combine(int h1, int h2) => (((h1 << 5) + h1) ^ h2);

        return objects.Select(comparer.GetHashCode).Aggregate(5381, combine);
    }
Author by

bacar

Updated on June 04, 2022

Comments

  • bacar
    bacar about 2 years

    One of the ways to implement GetHashCode - where it's required to do so - is outlined by Jon Skeet here. Repeating his code:

    public override int GetHashCode()
    {
        unchecked // Overflow is fine, just wrap
        {
            int hash = 17;
            // Suitable nullity checks etc, of course :)
            hash = hash * 23 + field1.GetHashCode();
            hash = hash * 23 + field2.GetHashCode();
            hash = hash * 23 + field3.GetHashCode();
            return hash;
        }
    }
    

    Rolling this code by hand can be error-prone, and the bugs can be subtle and hard to spot (did you swap + and * by mistake?). It can be hard to remember the combination rules for different types, and I don't like expending mental effort on writing/reviewing the same thing over and over again for different fields and classes. It can also obscure one of the most important details (did I remember to include all the fields?) in repetitive noise.

    Is there a concise way to combine field hashcodes using the .NET library? Obviously I could write my own, but if there's something idiomatic/built-in I'd prefer that.

    As an example, in Java (using JDK7) I can achieve the above using:

       @Override
       public int hashCode()  
       {  
          return Objects.hash(field1, field2, field3);  
       }  
    

    This really helps to eliminate bugs and focus on the important details.

    Motivation: I came across a C# class which requires an overridden GetHashCode(), but the way it combined the hashcodes of its various constituents had some severe bugs. A library function for combining the hashcodes would be useful for avoiding such bugs.

  • Admin
    Admin almost 11 years
    Hmm, I'd wager this probably does pretty well. How big can n be in tuple I wonder? I doubt the 4 lines of code it takes to implement the Java style solution is a big deal, but I guess I can understand the desire for a standard well understood solution.
  • evanmcdonnal
    evanmcdonnal almost 11 years
    @ebyrob the largest in C# is an 8 tuple.
  • bacar
    bacar almost 11 years
    does Tuple.Create(first, second, third, fourth, fifth, sixth, seventh, Tuple.Create(eight, ninth, tenth, ...)).GetHashCode() deal with that?
  • Servy
    Servy almost 11 years
    @bacar Yes, but it's not terribly efficient, and hash code generation ought to be an efficient operation. The method the OP is describing is also easy enough to implement properly.
  • bacar
    bacar almost 11 years
    @Servy You'd think so, huh? I did come across a buggy implementation, hence the motivation. There are loads of bugs you can accidentally put in here - swapping the addition/multiplication, poor choice of multiplier, forgetting the addition part entirely... I've seen them happen and the worst part is that they kinda look similar to the 'standard' solution and end up getting past a code review too. I believe you should roll your own where it addresses a bottleneck, but minimize the cognitive burden for maintainers where it isn't.
  • Servy
    Servy almost 11 years
    @bacar So then take the extra 10-20 minutes to really look closely (and test) at those four lines of code instead of just the 30 seconds to glance and make sure it's mostly right. The basic algorithms are so widely used you have enough trustworthy sources to compare with.
  • bacar
    bacar almost 11 years
    @Servy - I'm very surprised by that. You appear to be arguing against libraries in general. Why would I want to take 10-20 minutes every time rolling my own instead of using a known correct library to do something? This feels like NIH Syndrome?
  • Servy
    Servy almost 11 years
    @bacar You don't need to do it every time. You need to do it once in your life. It would take you less time to do that than to post this question on SO. It's such a simple operation that it would appear people haven't bothered to write a library to do it. If you feel that there is a strong need out there for a library that does this, feel free to publish one yourself.
  • Federico Berasategui
    Federico Berasategui almost 11 years
    +1. Though I still get the impression that you only need this in those special cases, not for regular daily public class Person, public class BillingRepository, etc.
  • Jon Skeet
    Jon Skeet almost 11 years
    @HighCore: What may seem like a special case can be day-to-day work for other people. Not everyone writes the same kind of code.
  • bacar
    bacar over 10 years
    This generates a very poor hash with many collisions; {Field1="foo",Field2="bar"} generates the same hash as {Field1="bar",Field2="foo"}. Plus, using an OR rather than an XOR could be seen as especially bad - the more fields you have, the more likely it is that your hash is just equal to 0xFFFFFFFF. In fact, if Field1.GetHashCode() = -1 = 0xFFFFFFFF, it doesn't matter what the other fields are: the combined hashcode will always be 0xFFFFFFFF.
  • bacar
    bacar over 10 years
    And note that the int32's GetHashCode just returns the value itself. So if Field1 is -1, all your other fields would be redundant by this implementation.
  • Mitja
    Mitja almost 10 years
    Even return (this.Field1.GetHashCode().ToString() + this.Field2.GetHashCode().ToString()).GetHashCode(); would make a better implementation. ;-)
  • Jon Skeet
    Jon Skeet almost 10 years
    @automatonic: No more, I'm afraid... fixed now.
  • angularsen
    angularsen over 9 years
    Nice one. I found this addition useful: internal static int HashAll(params object[] values) { int initialHash = Initialize(); return values.Aggregate(initialHash, Hash); }
  • Jon Skeet
    Jon Skeet over 9 years
    @anjdreas: Right, but that means a) creating an array each time; b) boxing value types.
  • angularsen
    angularsen over 9 years
    Good point. Are both a) and b) about overhead and space, or does boxing also ruin the hash code generation?
  • Jon Skeet
    Jon Skeet over 9 years
    @anjdreas: Both about overhead, basically. Boxing shouldn't affect hash codes.
  • Ryan
    Ryan about 8 years
    Just to add to this, the reference source implementation of the Tuple class uses this as the hashing method, line 53. It basically goes like this: If the Tuple has more than 8 elements, only hash the last 8 elements. From the bottom up, (((h1 << 5) + h1) ^ h2) is called on each two elements to be hashed. Then, each two results of those hashes are hashed, until one hash is left at the end. This can be quite expensive for large Tuples.
  • bacar
    bacar about 8 years
    I don't see why that's particularly expensive. For hashing n elements you will always need n hash operations and (n-1) hash combining operations. Good spot that it only hashes the last 8 elements, though.
  • Gyum Fox
    Gyum Fox over 6 years
    Please note that HashCodeCombiner starts with a seed of 5381
  • Jonathan Dickinson
    Jonathan Dickinson about 6 years
    "It will also be used under the hood by System.Tuple and other immutable composite types." It's now in netcore 2.1. Note that the BCL (Tuple etc.) doesn't use it just yet, because I had massive problems getting it to work under netfx - that will probably only come with/after the next version of netfx.