What is the best algorithm for overriding GetHashCode?

Solution 1

I usually go with something like the implementation given in Josh Bloch's fabulous Effective Java. It's fast and creates a pretty good hash which is unlikely to cause collisions. Pick two different prime numbers, e.g. 17 and 23, and do:

public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}

As noted in comments, you may find it's better to pick a large prime to multiply by instead. Apparently 486187739 is good... and although most examples I've seen with small numbers tend to use primes, there are at least similar algorithms where non-prime numbers are often used. In the not-quite-FNV example later, for example, I've used numbers which apparently work well - but the initial value isn't a prime. (The multiplication constant is prime though. I don't know quite how important that is.)
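
For illustration, here is the same shape of function with a large prime as the multiplier (a sketch only; the field names are placeholders and the constant is simply the one suggested in the comments):

public override int GetHashCode()
{
    unchecked
    {
        int hash = 17;
        // Same pattern as above, but using the large prime 486187739 as the multiplier
        hash = hash * 486187739 + field1.GetHashCode();
        hash = hash * 486187739 + field2.GetHashCode();
        hash = hash * 486187739 + field3.GetHashCode();
        return hash;
    }
}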

This is better than the common practice of XORing hash codes, for two main reasons. Suppose we have a type with two int fields, and XorHash simply XORs them; then:

XorHash(x, x) == XorHash(y, y) == 0 for all x, y (equal fields always cancel to zero)
XorHash(x, y) == XorHash(y, x) for all x, y (swapping the field values doesn't change the hash)
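
To see both failure modes concretely, here is a minimal sketch of the XOR approach (the Pair type is hypothetical):

struct Pair
{
    public int A, B;

    // The XOR anti-pattern: simple, but structurally weak.
    public override int GetHashCode() => A ^ B;
}

// new Pair { A = 5, B = 5 }.GetHashCode() == 0   (true for any A == B)
// new Pair { A = 1, B = 2 } and new Pair { A = 2, B = 1 } hash identically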

By the way, the earlier algorithm is the one currently used by the C# compiler for anonymous types.

This page gives quite a few options. I think for most cases the above is "good enough", and it's incredibly easy to remember and get right. The FNV alternative is similarly simple, but uses different constants and XOR instead of ADD as the combining operation. It looks something like the code below, but the normal FNV algorithm operates on individual bytes, so this would require modification to perform one iteration per byte instead of per 32-bit hash value. FNV is also designed for variable lengths of data, whereas the way we're using it here is always for the same number of field values. Comments on this answer suggest that the code here doesn't actually work as well (in the sample case tested) as the addition approach above.

// Note: Not quite FNV!
public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = (int) 2166136261;
        // Suitable nullity checks etc, of course :)
        hash = (hash * 16777619) ^ field1.GetHashCode();
        hash = (hash * 16777619) ^ field2.GetHashCode();
        hash = (hash * 16777619) ^ field3.GetHashCode();
        return hash;
    }
}
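
For contrast, a genuinely byte-oriented FNV-1a looks roughly like this (a sketch only, not the code used above; note that FNV-1a XORs first and then multiplies):

public static int Fnv1a(byte[] data)
{
    unchecked
    {
        uint hash = 2166136261;
        foreach (byte b in data)
        {
            hash = (hash ^ b) * 16777619; // XOR the byte in, then multiply
        }
        return (int) hash;
    }
}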

One thing to be aware of: ideally you should prevent your equality-sensitive (and thus hashcode-sensitive) state from changing after adding the object to a collection that depends on the hash code.

As per the documentation:

You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:

  • You can compute the hash code from fields that are not mutable; or
  • You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.
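
A minimal sketch of what goes wrong otherwise (the MutableKey type is hypothetical, with Name feeding both Equals and GetHashCode):

var key = new MutableKey { Name = "a" };
var map = new Dictionary<MutableKey, int> { [key] = 1 };

key.Name = "b";            // the hash code changes...
map.ContainsKey(key);      // ...so this now returns false: the entry sits in the wrong bucket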

The link to the FNV article is broken but here is a copy in the Internet Archive: Eternally Confuzzled - The Art of Hashing

Solution 2

ValueTuple - Update for C# 7

As @cactuaroid mentions in the comments, a value tuple can be used. This saves a few keystrokes and, more importantly, executes purely on the stack (no garbage):

(PropA, PropB, PropC, PropD).GetHashCode();

(Note: The original technique using anonymous types seems to create an object on the heap, i.e. garbage, since anonymous types are implemented as classes, though this might be optimized out by the compiler. It would be interesting to benchmark these options, but the tuple option should be superior.)
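
In context, the tuple technique looks like this (a sketch; the type and members are hypothetical):

public class Order
{
    public int Id { get; set; }
    public string Customer { get; set; }

    // The ValueTuple's own GetHashCode combines the component hashes.
    public override int GetHashCode() => (Id, Customer).GetHashCode();
}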

Anonymous Type (Original Answer)

Microsoft already provides a good generic HashCode generator: Just copy your property/field values to an anonymous type and hash it:

new { PropA, PropB, PropC, PropD }.GetHashCode();

This will work for any number of properties. It does not use boxing. It just uses the algorithm already implemented in the framework for anonymous types.
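
In context (a sketch; the type and properties are hypothetical):

public class Point3D
{
    public int X { get; set; }
    public int Y { get; set; }
    public int Z { get; set; }

    // The compiler-generated anonymous type supplies the combining algorithm.
    public override int GetHashCode() => new { X, Y, Z }.GetHashCode();
}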

Solution 3

Using System.HashCode

If you are using .NET Standard 2.1 or above, you can use the System.HashCode struct. On earlier frameworks it is available from the Microsoft.Bcl.HashCode package. There are two methods of using it:

HashCode.Combine

The Combine method can be used to create a hash code, given up to eight objects.

public override int GetHashCode() => HashCode.Combine(this.object1, this.object2);
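
In practice you would pair it with Equals so that both look at the same fields (a sketch; the Money type is hypothetical):

public sealed class Money : IEquatable<Money>
{
    public decimal Amount { get; }
    public string Currency { get; }

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    public bool Equals(Money other) =>
        other != null && Amount == other.Amount && Currency == other.Currency;

    public override bool Equals(object obj) => Equals(obj as Money);

    // Hash exactly the fields that Equals compares.
    public override int GetHashCode() => HashCode.Combine(Amount, Currency);
}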

HashCode.Add

The Add method helps you to deal with collections:

public override int GetHashCode()
{
    var hashCode = new HashCode();
    hashCode.Add(this.object1);
    foreach (var item in this.collection)
    {
        hashCode.Add(item);
    }
    return hashCode.ToHashCode();
}

GetHashCode Made Easy

This is an alternative to System.HashCode that is super easy to use while still being fast. You can read the full blog post 'GetHashCode Made Easy' for more details and comments.

Usage Example

public class SuperHero
{
    public int Age { get; set; }
    public string Name { get; set; }
    public List<string> Powers { get; set; }

    public override int GetHashCode() =>
        HashCode.Of(this.Name).And(this.Age).AndEach(this.Powers);
}

Implementation

public struct HashCode : IEquatable<HashCode>
{
    private const int EmptyCollectionPrimeNumber = 19;
    private readonly int value;

    private HashCode(int value) => this.value = value;

    public static implicit operator int(HashCode hashCode) => hashCode.value;

    public static bool operator ==(HashCode left, HashCode right) => left.Equals(right);

    public static bool operator !=(HashCode left, HashCode right) => !(left == right);

    public static HashCode Of<T>(T item) => new HashCode(GetHashCode(item));

    public static HashCode OfEach<T>(IEnumerable<T> items) =>
        items == null ? new HashCode(0) : new HashCode(GetHashCode(items, 0));

    public HashCode And<T>(T item) => 
        new HashCode(CombineHashCodes(this.value, GetHashCode(item)));

    public HashCode AndEach<T>(IEnumerable<T> items)
    {
        if (items == null)
        {
            return new HashCode(this.value);
        }

        return new HashCode(GetHashCode(items, this.value));
    }

    public bool Equals(HashCode other) => this.value.Equals(other.value);

    public override bool Equals(object obj)
    {
        if (obj is HashCode)
        {
            return this.Equals((HashCode)obj);
        }

        return false;
    }

    public override int GetHashCode() => this.value.GetHashCode();

    private static int CombineHashCodes(int h1, int h2)
    {
        unchecked
        {
            // Code copied from System.Tuple; a good way to combine hashes.
            return ((h1 << 5) + h1) ^ h2;
        }
    }

    private static int GetHashCode<T>(T item) => item?.GetHashCode() ?? 0;

    private static int GetHashCode<T>(IEnumerable<T> items, int startHashCode)
    {
        var temp = startHashCode;
        var any = false;

        // foreach also disposes the enumerator, unlike a raw GetEnumerator() call.
        foreach (var item in items)
        {
            temp = CombineHashCodes(temp, GetHashCode(item));
            any = true;
        }

        if (!any)
        {
            // Distinguish an empty collection from a null one.
            temp = CombineHashCodes(temp, EmptyCollectionPrimeNumber);
        }

        return temp;
    }
}

What Makes a Good Algorithm?

Performance

The algorithm that calculates a hash code needs to be fast. A simple algorithm is usually a faster one. One that does not allocate extra memory will also reduce the need for garbage collection, which in turn improves performance.

In C# hash functions specifically, you often use the unchecked keyword, which disables overflow checking: the arithmetic simply wraps around instead of throwing an OverflowException, and no overflow checks are emitted.
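
A minimal illustration:

int h = int.MaxValue;

// In a checked context this arithmetic would throw OverflowException;
// inside unchecked it silently wraps around, which is exactly what a hash wants.
int wrapped = unchecked(h * 31 + 7);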

Deterministic

The hashing algorithm needs to be deterministic i.e. given the same input it must always produce the same output.

Reduce Collisions

The algorithm that calculates a hash code needs to keep hash collisions to a minimum. A hash collision occurs when two calls to GetHashCode on two different objects produce identical hash codes. Note that collisions are allowed (a common misconception is that they are not) but they should be kept to a minimum.

A lot of hash functions contain magic numbers like 17 or 23. These are prime numbers whose mathematical properties help reduce hash collisions compared with using non-prime numbers.

Hash Uniformity

A good hash function should map the expected inputs as evenly as possible over its output range i.e. it should output a wide range of hashes based on its inputs that are evenly spread. It should have hash uniformity.

Prevents DoS

In .NET Core, each time you restart an application you will get different hash codes. This is a security feature to prevent denial-of-service (DoS) attacks. For .NET Framework you should enable this feature by adding the following to your App.config file:

<?xml version="1.0"?>
<configuration>
   <runtime>
      <UseRandomizedStringHashAlgorithm enabled="1" />
   </runtime>
</configuration>

Because of this feature, hash codes should never be used outside of the application domain in which they were created, never be used as key fields in a collection, and never be persisted.
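
A quick way to see this (a minimal sketch; on .NET Core, each process start prints a different number):

// Run this twice: the two runs print different values, so the result
// must never be persisted or shared across processes.
Console.WriteLine("hello".GetHashCode());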

Read more about this here.

Cryptographically Secure?

The algorithm does not have to be a cryptographic hash function, meaning it does not have to satisfy the following conditions:

  • It is infeasible to generate a message that yields a given hash value.
  • It is infeasible to find two different messages with the same hash value.
  • A small change to a message should change the hash value so extensively that the new hash value appears uncorrelated with the old hash value (avalanche effect).

Solution 4

Here is my hash code helper.
Its advantage is that it uses generic type arguments and therefore will not cause boxing:

public static class HashHelper
{
    public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
    {
        unchecked
        {
            return 31 * arg1.GetHashCode() + arg2.GetHashCode();
        }
    }

    public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            return 31 * hash + arg3.GetHashCode();
        }
    }

    public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3, 
        T4 arg4)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            hash = 31 * hash + arg3.GetHashCode();
            return 31 * hash + arg4.GetHashCode();
        }
    }

    public static int GetHashCode<T>(T[] list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    public static int GetHashCode<T>(IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    /// <summary>
    /// Gets a hash code for a collection for which the order of items
    /// does not matter.
    /// So {1, 2, 3} and {3, 2, 1} will get the same hash code.
    /// </summary>
    public static int GetHashCodeForOrderNoMatterCollection<T>(
        IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            int count = 0;
            foreach (var item in list)
            {
                hash += item.GetHashCode();
                count++;
            }
            return 31 * hash + count.GetHashCode();
        }
    }

    /// <summary>
    /// Alternative way to get a hashcode is to use a fluent 
    /// interface like this:<br />
    /// return 0.CombineHashCode(field1).CombineHashCode(field2).
    ///     CombineHashCode(field3);
    /// </summary>
    public static int CombineHashCode<T>(this int hashCode, T arg)
    {
        unchecked
        {
            return 31 * hashCode + arg.GetHashCode();
        }
    }
}

It also has an extension method that provides a fluent interface, so you can use it like this:

public override int GetHashCode()
{
    return HashHelper.GetHashCode(Manufacturer, PartN, Quantity);
}

or like this:

public override int GetHashCode()
{
    return 0.CombineHashCode(Manufacturer)
        .CombineHashCode(PartN)
        .CombineHashCode(Quantity);
}

Solution 5

I have a Hashing class in my helper library that I use for this purpose.

/// <summary> 
/// This is a simple hashing function from Robert Sedgewick's Hashing in C book.
/// Some simple optimizations have also been added to speed up
/// the hashing process. From: www.partow.net
/// </summary>
/// <param name="input">array of objects, parameters combination that you need
/// to get a unique hash code for them</param>
/// <returns>Hash code</returns>
public static int RSHash(params object[] input)
{
    const int b = 378551;
    int a = 63689;
    int hash = 0;

    // If it overflows then just wrap around
    unchecked
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] != null)
            {
                hash = hash * a + input[i].GetHashCode();
                a = a * b;
            }
        }
    }

    return hash;
}

Then you can simply use it as:

public override int GetHashCode()
{
    return Hashing.RSHash(_field1, _field2, _field3);
}

I didn't assess its performance, so any feedback is welcome.

Comments

  • Erusso87
    Erusso87 almost 2 years

    In .NET, the GetHashCode method is used in a lot of places throughout the .NET base class libraries. Implementing it properly is especially important to find items quickly in a collection or when determining equality.

    Is there a standard algorithm or best practice on how to implement GetHashCode for my custom classes so I don't degrade performance?

    • rene
      rene about 12 years
After reading this question and the article below, I was able to implement an override of GetHashCode. I hope it will be helpful for others: Guidelines and rules for GetHashCode, written by Eric Lippert
    • Thomas Levesque
      Thomas Levesque over 8 years
      "or to determine equality": no! Two objects with the same hashcode are not necessarily equal.
    • Erusso87
      Erusso87 over 8 years
@ThomasLevesque You are right, two objects with the same hash code are not necessarily equal. But GetHashCode() is still used in very many implementations of Equals(). That's what I meant with that statement. GetHashCode() inside Equals() is often used as a shortcut to determine inequality, because if two objects have different hash codes they must be unequal and the rest of the equality check doesn't have to be executed.
    • NotEnoughData
      NotEnoughData about 7 years
@bitbonk Usually, both GetHashCode() and Equals() need to look at all fields of both objects (Equals has to do this if the hash codes are equal or not checked). Because of this, a call to GetHashCode() inside Equals() is often redundant and could reduce performance. Equals() may also be able to short circuit, making it much faster - however in some cases the hash codes may be cached, making the GetHashCode() check faster and so worthwhile. See this question for more.
    • Rick Davin
      Rick Davin over 4 years
      UPDATE JAN 2020: Eric Lippert's blog located at: docs.microsoft.com/en-us/archive/blogs/ericlippert/…
    • zumalifeguard
      zumalifeguard about 4 years
UPDATE MAR 2020: The link from @RickDavin is correct, but the article at docs.microsoft.com has bad formatting. Here's the same article on Eric's blog: ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode
    • Ε Г И І И О
      Ε Г И І И О about 4 years
      Now you can just use HashCode.Combine(field1, field2,...)
  • Erusso87
    Erusso87 over 15 years
The algorithm described in the book you mention is in fact a little more detailed; it especially describes what to do for different data types of the fields. E.g.: for fields of type long, use (int)(field ^ f >>> 32) instead of simply calling GetHashCode. Is long.GetHashCode implemented that way?
  • Jon Skeet
    Jon Skeet over 15 years
    Yup, Int64.GetHashCode does exactly that. In Java that would require boxing, of course. That reminds me - time to add a link to the book...
  • pero
    pero about 14 years
That means that if you have objects Person and Account and they both have an ID = 1, they will have the same hash code. And that is not OK.
  • piers7
    piers7 about 14 years
    Actually the comment above is incorrect. There will always be the possibility of hash-code collisions (a hash code only locates the bucket, not the individual object). So such an implementation - for a hashcode containing mixed objects - would lead to a lot of collisions, which is undesirable, but it would be absolutely fine if you only ever had objects of a single type in your hashtables. Also it doesn't distribute evenly, however neither does the base implementation on system.object, so I wouldn't worry about it too much...
  • nightcoder
    nightcoder about 14 years
    Well, it will cause boxing, if fields are value types.
  • sleske
    sleske about 14 years
    "In most cases where Equals() compares multiple fields it doesn't really matter if your GetHash() hashes on one field or on many." This is dangerous advice, because for objects which only differ in the un-hashed fields, you will get hash collisions. If this happens frequently, the performance of hash-based collections (HashMap, HashSet etc.) will degrade (up to O(n) in the worst case).
  • sleske
    sleske about 14 years
    This actually happened in Java: In early versions of the JDK String.hashCode() only considered the beginning of the string; this lead to performance problems if you used Strings as keys in HashMaps which only differed at the end (which is common e.g. for URLs). The algorithm was therefore changed ( in JDK 1.2 or 1.3 I believe).
  • Bert Huijben
    Bert Huijben about 14 years
    If that one field 'provides a good distribution' (last part of my answer), then one field is enough.. If it doesn't provide a good distribution, then (and just then) you need another calculation. (E.g. just use another field that does provide a good distribution, or use multiple fields)
  • Michael Stum
    Michael Stum over 13 years
    How are the Keys determined? GetHashCode() doesn't take any parameters, so it needs to call this one with two Keys that need to be determined somehow. Sorry, without further explanation this only looks clever, but not that good.
  • gehho
    gehho over 13 years
    And why do you need the generic overloads? The type is not important (and not used in your code) since all objects have a GetHashCode() method, so you can always use the method with the params array parameter. Or am I missing something here?
  • Magnus
    Magnus over 13 years
    It's about performance, avoid the loop for smaller <= 4 fields. But I guess the generics could be skipped and just use object instead.
  • CodesInChaos
    CodesInChaos over 13 years
    When you'd use object instead of generics you'd get boxing and memory allocations, which you don't want in GetHashCode. So generics are the way to go.
  • CodesInChaos
    CodesInChaos over 13 years
23 is no good choice, since (as of .NET 3.5 SP1) Dictionary<TKey,TValue> assumes good distribution modulo certain primes. And 23 is one of them. So if you have a dictionary with Capacity 23, only the last contribution to GetHashCode influences the compound hash code. So I'd rather use 29 instead of 23.
  • CodesInChaos
    CodesInChaos over 13 years
@Ani: Your implementation allocates several new objects on the heap, so the performance might be lower than with a manual implementation. Whether that's acceptable depends on your type and usage. Check some of the other answers for helpers using generics which avoid this problem.
  • Jon Skeet
    Jon Skeet over 13 years
    @CodeInChaos: Only the last contribution influences the bucket - so it might, at worst, have to look through all 23 entries in the dictionary. It's still going to check the actual hash code of each entry, which will be cheap. If you've got a dictionary that small, it's unlikely to matter much.
  • digEmAll
    digEmAll over 13 years
    Yes, anonymous GetHashCode implementation is very effective (BTW it's the same as the one in the Jon Skeet's answer), but the only problem with this solution is that you generate a new instance at any GetHashCode call. It can be a bit overhead-ish in particular in case of intensive access to big hashed collections...
  • Kumba
    Kumba over 13 years
This works in VB w/ .NET 4.0, but looking over the IL, it's using box calls since the type uses generics. No unboxing, but from my reading here, the mere presence of boxing suggests this can be a little inefficient. Seems like the only choice for VB, though, since there is no equivalent of checked/unchecked.
  • Kumba
    Kumba over 13 years
    @Jon: I have to ask, despite already having opened my own question on the topic, but what's a good VB version of this since VB lacks the checked and unchecked keywords? I tried making tmpHash an Int64 and AND'ing the lower 8 bits (per the accepted answer to my question), but on a sufficiently-large enough set of fields, it somehow caused the calculation to wrap to 0 for the remainder of the loop.
  • Jon Skeet
    Jon Skeet over 13 years
    @Kumba: I don't know how I'd do this in VB, I'm afraid. Is arithmetic always checked in VB? Could you have a separate class library which you could delegate the arithmetic to, either written in C# or with checked arithmetic turned off project-wide?
  • Kumba
    Kumba over 13 years
    @Jon: VB apparently checks a lot of things. It's got a fetish for demanding that unsigned numerics get converted into signed numerics before letting you left or right shift them. Which is driving me up the wall and across the ceiling. I'm trying to implement the Jenkins hash to work around the lack of checked/unchecked (a rotating hash also does the trick, but I'm worried about hash collisions with input). Using a separate, C# library is something I'd like to avoid, because it essentially admits defeat. If I get to that point, I should just re-write the entire project in C#.
  • pomeroy
    pomeroy over 13 years
    isn't the 'unchecked' unnecessary b/c CLR will happily overflow by default?
  • Jon Skeet
    Jon Skeet over 13 years
    @pomeroy: It depends on what the project settings are. Basically you give the assembly a default context of checked or unchecked.
  • Kumba
    Kumba over 13 years
@pomeroy: VB isn't as granular as C# is. Because it lacks the above two keywords, your only options are to remove integer overflow checks for the entire project or not. I guess if your project is complete and generally well tested, removing the overflow checks is a safe thing to do. However, while building it and debugging, those checks are good because they'll highlight bugs to fix. I opened Connect Ticket # 636564 with Microsoft to recommend including checked/unchecked keyword support in the next .NET release. Uncertain if they'll do so, though.
  • Kumba
    Kumba over 13 years
    I'll add that I'll have to go with the rotating hash algorithm linked to in Jon's answer above. It doesn't overflow, even on an Int32, doesn't (so far) wrap to 0 on a large number of fields in the calculation, and is simple and rather speedy. The Jenkins hash didn't work out...Even that overflows at random, depending on the input. Also, the forcing of bitshifts happening in signed math get in the way of a lot of things. I might open another bug on that, unless that's intended behavior somehow.
  • irag10
    irag10 about 13 years
    Don't you need override in your method declaration? Would also be good to put in null checks since this is such a well-used example.
  • Jon Skeet
    Jon Skeet about 13 years
    @Rory: I've added the override, thanks - I'm not going to put in the null checks, as I feel that would obscure the important points. The comment suffices IMO.
  • fredoverflow
    fredoverflow about 13 years
    Why start with a prime instead of just zero? does int hash = 17; have any theoretically supported benefits?
  • Jon Skeet
    Jon Skeet about 13 years
    @FredOverflow: I don't know the exact details of all the reasons behind it, but starting with 0 would mean that the hash would stay as zero if the individual field hashes were zero... and that's probably not uncommon (e.g. an integer of value zero will probably hash to zero). Just a guess, but I suspect that having a constant which gets propagated with each field is useful. This is really just copied from Effective Java though :)
  • Erusso87
    Erusso87 about 13 years
    @JonSkeet How collision safe would this algortihm be for an complex object graph consisting of let's say 500 objects with each having 10 properties. Related question: stackoverflow.com/questions/5308057/…
  • Jon Skeet
    Jon Skeet about 13 years
    @bitbonk: The chances of a collision on any single change would be pretty low... but in the question you're talking about I'd probably use a cryptographic hash instead.
  • Erusso87
    Erusso87 about 13 years
    Then the questions stands, how do I create a cryptographic hash on an object model?
  • Jon Skeet
    Jon Skeet about 13 years
    @bitbonk: I would strongly consider using a "normal" cryptographic hash on the result of binary serialization of form.
  • Rick Love
    Rick Love about 13 years
    @digEmAll Good point, I didn't think about the overhead of creating an new object. Jon Skeet's answer is the most efficient and won't use boxing. (@Kumba To solve the unchecked in VB, just use a Int64 (long) and truncate it after the calculations.)
  • sehe
    sehe about 13 years
The trailing shift/xor steps (h += (h << 10); h ^= (h >> 6); h += (h << 3); h ^= (h >> 11); h += (h << 15);) have a code smell: they do not depend on any of the input and look awfully redundant to me.
  • yoyo
    yoyo over 12 years
    This algorithm is basically the DJB2 string hashing algorithm, for which the constants 5381 and 33 are recommended (cse.yorku.ca/~oz/hash.html). To be honest I'm not sure the constant makes much difference, but the multiplier is important.
  • KChaloux
    KChaloux over 11 years
    @JonSkeet I realize I'm raising the dead here, but implementing hashes is a new thing for me. In your implementation, what fields am I including in the hash? Only immutable ones, or are any fields good?
  • Jon Skeet
    Jon Skeet over 11 years
    @KChaloux: That entirely depends on what you want equality to mean. Normally it's a bad idea to include mutable data though.
  • Darrel Lee
    Darrel Lee over 11 years
    The hash code can just be the id, since the id is an integer. There's no need to call GetHashCode on an integer (it's an identity function)
  • Magnus
    Magnus over 11 years
    @nawfal what speed considerations do you have?
  • nawfal
    nawfal over 11 years
    @Magnus nothing in particular other than the general rule that hashing must be fast. This can't be as fast as I would love it to be. But as I said this gives better distribution of values which may be suitable for some cases.
  • Magnus
    Magnus over 11 years
    @nawfal Running this 100 million times takes about 390 ms. Running the solution suggested by Jon Skeet 100 million times takes about 320 ms so it is not a huge difference.
  • nawfal
    nawfal over 11 years
    @Magnus yes right, I'll delete my original comment. Just a little note that this may not be as fast as some other solutions here, but as you say shouldn't matter. The distribution is great, better than most solutions here, so +1 from me! :)
  • Vajda
    Vajda over 11 years
How would you handle nullity? If you simply ignore that field, then A = null, B = "ss" and A = "ss", B = null would collide. Isn't it better to multiply each field by a different prime number?
  • Jon Skeet
    Jon Skeet over 11 years
    @Vajda: I usually use 0 as the effective hash code for null - which isn't the same as ignoring the field.
  • nawfal
    nawfal about 11 years
    No need for T[] separately as it is already IEnumerable<T>
  • nawfal
    nawfal about 11 years
you can avoid the object creation inside the GetHashCode function as in Magnus's answer. Just call the damn static hash functions (who cares about the starter hash). Also, you could use an AddItems<T>(params T[] items) method more often in the helper class (than calling AddItem(T) each time).
  • nawfal
    nawfal about 11 years
And what benefit do you find in doing this.result * Prime2 * item.GetHashCode() when the commonly used form is this.result * Prime2 + item.GetHashCode()?
  • nawfal
    nawfal about 11 years
@DarrelLee but tomorrow his _id could be a Guid. It's a good coding practice to do _id.GetHashCode as the intent is clear.
  • nawfal
    nawfal about 11 years
    You could refactor those methods and restrict the core logic to one function
  • Erusso87
    Erusso87 about 11 years
    I can't use AddItems<T>(params T[] items) more often because typeof(T1) != typeof(T2) etc.
  • mwolfe02
    mwolfe02 about 11 years
    In VB.Net: New With {PropA, PropB, PropC, PropD}.GetHashCode()
  • Trident D'Gao
    Trident D'Gao almost 11 years
    @DarrelLee, it is a not a good option because sequential IDs from the database don't provide a good distribution
  • Chui Tey
    Chui Tey over 10 years
Incidentally, multiplying by 31 is a shift and a subtract on the CPU, which is exceedingly fast.
  • supercat
    supercat over 10 years
    I don't think there's a problem with having GetHashCode perform memory allocations, provided that it only does so the first time it's used (with subsequent invocations simply returning a cached result). The important thing is not that one should go to great lengths to avoid collisions, but rather that one should avoid "systemic" collisions. If a type has two int fields oldX and newX which frequently differ by one, a hash value of oldX^newX would assign 90% of such records hash values of 1, 2, 4, or 8. Using oldX+newX [unchecked arithmetic] might generate more collisions...
  • supercat
    supercat over 10 years
...than would a more sophisticated function, but a collection of 1,000,000 things that have 500,000 different hash values will do very well if each hash value has two associated things, and very badly if one hash value has 500,001 things and the others have one each.
  • Jon Skeet
    Jon Skeet over 10 years
    @jnm2: I don't follow your argument, to be honest. In particular, I've just tried this effectively hashing 10 fields - and changing the value of just the first field still changed the hash, contradicting your claim that "every bit of the first hash codes will be lost".
  • Jon Hanna
    Jon Hanna over 10 years
    Xoring integers to make a hashcode is a well-known antipattern that tends to result in a particularly high number of collisions with real-world values.
  • Jon Hanna
    Jon Hanna over 10 years
    You can demonstrate this gives a poor distribution quite simply. Take this variant of FNV and apply it to strings (use unsafe pointer manipulation to get integers at a time to give it a fair chance). Use it to add strings to a power-of-two based hash table. With the one I'm working on now, if I generate "1", "2", ... "999999" and add them in, it takes about 34 seconds. Now take this same hash method and re-hash the result with a well-distributed hash. With a good hash this can only make things worse (more time spent, and we could introduce new collisions but never remove them). With the...
  • Jon Hanna
    Jon Hanna over 10 years
... same hash table I'm working on, the same code to generate "1"..."999999" and add them takes 1 second. The effect is less pronounced with prime-number based hashes, so in that case the extra time spent re-hashing (and perhaps the reduction in possible outcomes, though that's unlikely) doesn't gain anything, but the poor performance on power-of-two tables demonstrates poor distribution generally.
  • Jon Skeet
    Jon Skeet over 10 years
    @JonHanna: Thanks for that. Not sure what you mean by "to get integers at a time" but I'll try to have a closer look. I still like this as a first approximation for a hash, but if you have another hash which is as simple to remember and get right but with a better distribution, I'd be very happy to change my practice :)
  • Jon Hanna
    Jon Hanna over 10 years
    I meant I used fixed(char* ptr = str){int* iPtr = (int*)ptr;... but I also tried just doing foreach(char c in str) and casting each char to int, and the same applies. The relative weakness became apparent to me when I had reason to use power-of-two tables and was getting poor results (I used to use much the same as above, myself). The solution I finally hit upon is to forget about being easy to remember, and to build a hard-to-remember method once, and then make it easy to use and put it the code at nuget.org/packages/SpookilySharp I'll add a full answer here at lunchtime.
  • Jon Hanna
    Jon Hanna over 10 years
    @JonSkeet and now answered.
  • Jon Skeet
    Jon Skeet over 10 years
    @JonHanna: Thanks for that. Will have to look in more detail when I have a bunch of time :)
  • Jon Hanna
    Jon Hanna over 10 years
    @1224 depending on use patterns it can be horrible for the reason you give, but it can also be great; if you've a sequence of such numbers with no holes, then you've a perfect hash, better than any algorithm can produce. If you know that's the case you can even count on it and skip the equality check.
  • Dzyann
    Dzyann about 10 years
I think it's important to point out that we should be careful with the hash code changing at run-time. We had a bug in my project because the previous developer had implemented a GetHashCode algorithm based on this answer. But in his implementation he had a list of objects, and he used the hash of each item in the collection to generate the object's hash code. Therefore, when the collection changed, the hash code changed also. That was causing binding problems in WPF. And if you had the object in a dictionary, for example, you would get errors also.
  • Jon Skeet
    Jon Skeet about 10 years
@Dzyann: Yes, mutating the key in a way which affects equality - and thus the hash code - is always a bad idea. I'll add a note.
  • Dzyann
    Dzyann about 10 years
@JonSkeet you are right, and it can lead to very hard to track bugs, like in this case with the bindings of WPF. It took ages until one of my coworkers found the cause and solved it. Since it wasn't our code, it was very challenging.
  • Tim Schmelter
    Tim Schmelter about 10 years
    "can be enhanced later by catching the OverflowException" The whole point of the unchecked is to avoid exceptions on overflow which is desired on GetHashCode. So it's not incorrect if the value overflows int and it does not hurt at all.
  • jnm2
    jnm2 about 10 years
    I would suggest that you change 17 and 23 to the constants here. (Thanks for the link.) It gave a simple dictionary lookup much higher performance, in my case, ~60% better.
  • Jon Skeet
    Jon Skeet about 10 years
    @jnm2: That's not the same algorithm to start with - it's using XOR rather than ADD. I'll stick with these constants for this answer, but perhaps you should add your own answer?
  • jnm2
    jnm2 about 10 years
    Actually, I was going to suggest that xoring rather than adding wouldn't diminish the simplicity as a go-to hash algorithm. What do you think?
  • jnm2
    jnm2 about 10 years
    XOR ends up making GetHashCode() faster by 12% in my case.
  • Jon Skeet
    Jon Skeet about 10 years
    @jnm2: Well it wouldn't diminish that simplicity - but it's not what I've done for the past several years. I'll add FNV as an alternative though.
  • user1798101
    user1798101 about 10 years
    int hash = 2166136261; Is there a cast missing? Compiler says that 2166136261 is a uint... I changed it to int hash = (int)2166136261;
  • supercat
    supercat about 10 years
    When combining multiple hash values into one, I tend to use long values for the intermediate results, and then munge the final result down to an int. Does that seem like a good idea? My concern is that one uses e.g. hash=(hash*31)+nextField, then pairs of matching values will only affect the upper 27 bits of the hash. Letting the calculation extend to a long and wrapping stuff in would minimize that danger.
  • Jon Hanna
    Jon Hanna about 10 years
    @supercat it depends on the distribution of your final munging. The SpookilySharp library would ensure that the distribution was good, ideally (because it won't need object creation) by passing a pointer to a blittable type, or passing one of the enumerables it handles directly, but if you don't already have blittable data or a suitable enumeration, then calling .Update() with the multiple values as per the answer above will do the trick.
  • Eamon Nerbonne
    Eamon Nerbonne almost 10 years
I tried to implement this approach for ValueUtils, but in my testing this variant of FNV caused considerable collisions (24%) in certain symmetrical datasets. And perhaps that's because this is NOT actually the FNV hash? Traditional FNV hashes per octet (byte), not per 32-bit word. That gives this variant less opportunity to mix the bits around...
  • Eamon Nerbonne
    Eamon Nerbonne almost 10 years
    @JonHanna would you be willing to be more precise with the problematic behavior you encountered? I'm trying to implement a library that makes implementing value objects trivial (ValueUtils) and I'd love a testset demonstrating poor hash miscibility in power-of-two hashtables.
  • Jon Skeet
    Jon Skeet almost 10 years
    @EamonNerbonne: What do you mean by "this approach"? The answer now contains two different versions...
  • Eamon Nerbonne
    Eamon Nerbonne almost 10 years
    I mean this variant of FNV - it's not quite FNV, and I'm pretty sure that makes matters worse. Incidentally, I also tried the h=prime; repeat h=h*prime + ? recipe; that seems to vary; it does quite well for larger primes, especially if your intermediate is 64-bit wide.
  • Eamon Nerbonne
    Eamon Nerbonne almost 10 years
An extension method on int is nasty namespace pollution - @safak-gur's answer below nicely side-steps that problem.
  • Jon Skeet
    Jon Skeet almost 10 years
    @Eamon: I don't know enough about the theory to comment further, I'm afraid :(
  • Eamon Nerbonne
    Eamon Nerbonne almost 10 years
    Yeah, the theory behind it is not at all obvious to me. However - this answer suggests that this implementation is FNV, a well-known good hash. But that's not really true, since this is not FNV. Also, FNV is a string hash algorithm, which needs to satisfy much trickier requirements in that it needs to work for variable length potentially long strings. But again, the algorithm currently in the answer is not FNV - it mixes the bits much less well.
  • Jon Skeet
    Jon Skeet almost 10 years
    @EamonNerbonne: Okay. I'll edit to indicate that it's a modification, and that it doesn't work as well in at least some cases.
  • Jon Hanna
    Jon Hanna almost 10 years
    @EamonNerbonne I don't really have anything more precise than "overall time was slower that way". As I added in an edit, the fact that I was using open-addressing may have been more important than the power-of-two factor. I do plan to do some test cases on a particular project where I'll be comparing a few different approaches, so I may have a better answer for you after that, though that's not a high-priority (a personal project with no pressing need, so I'll get to it when I get to it...)
  • Eamon Nerbonne
    Eamon Nerbonne almost 10 years
    @JonHanna: yeah I know how the personal project schedule goes - good luck! In any case, I see I didn't phrase that last comment well: I meant to ask for the problematic input, and not necessarily the details of the problems that resulted. I'd love to use that as a test set (or inspiration for a test set). In any case - good luck with your pet project :-).
  • jnm2
    jnm2 almost 10 years
    @EamonNerbonne: What are the best coefficients that you know of?
  • Eamon Nerbonne
    Eamon Nerbonne almost 10 years
    @jnm2 In my experiments, the offset matters little, and the trend is that larger primes perform better, with the caveat that this is all tricky to test because it's slow (really slow) to be thorough, and it depends on the way in which your dataset is "messed up". If your fields have perfectly randomly distributed hashcodes - none of this matters, but of course, in the real world, those hash codes aren't random, and fields are correlated. There's a fairly good reason why large primes would be better too - they mix bits better, especially if your data consists largely of small numbers.
  • Eamon Nerbonne
    Eamon Nerbonne almost 10 years
    @jnm2, so I'd pick a largish prime (on the order of 2^16, say) and to tune for .NET's dictionary implementation, one that's NOT used by Dictionary<,>: referencesource.microsoft.com/#mscorlib/system/collections/…
  • Eamon Nerbonne
    Eamon Nerbonne almost 10 years
    @jnm2 I came across these two questions further exploring this issue: stackoverflow.com/questions/1835976/… and stackoverflow.com/questions/1145217/… and both conclude: use any old large prime. The accepted answer in the first question mentions two chosen in a principled way - but it's not likely a principle that really relates to the real world, so it still recommends the basic idea: pick a large prime, NOT 23 or 31.
  • Eamon Nerbonne
    Eamon Nerbonne almost 10 years
BTW: note that the offset is (as far as I can tell) completely pointless. Distributive laws also hold under modulo, and that means it's simply an identical offset all objects will share - that certainly has no impact on any hashtable I know.
  • Jon Skeet
    Jon Skeet almost 10 years
    @EamonNerbonne: I guess that's true if all the objects are of the same type. If you've got a dictionary where some keys are subclasses of other keys, that could make a difference... although only when the extra field values are 0 anyway. Again, this is mostly just habit for me :(
  • Eamon Nerbonne
    Eamon Nerbonne almost 10 years
    @JonSkeet Yeah, if you have objects of differing type and you use differing offsets, you'll have some advantage. No reason to be prime, though, I think... In any case, an addition is so cheap, there's not much reason to avoid it either.
  • David Osborne
    David Osborne over 9 years
    VB.NET must use Key in anonymous type creation: New With {Key PropA}.GetHashCode() Otherwise GetHashCode will not return the same hashcode for different objects with the same 'identifying' properties.
  • Bill Barry
    Bill Barry over 9 years
I would change the line with the ternary operator to be: var h = Equals(obj, default(T)) ? 0 : obj.GetHashCode();
  • Martin Liversage
    Martin Liversage over 9 years
    I believe that the ternary operator with obj != null will compile to a box instruction which will allocate memory if T is a value type. Instead you can use obj.Equals(null) which will compile to a virtual call of the Equals method.
  • Mark Hurd
    Mark Hurd over 9 years
In case you intended otherwise: unchecked does NOT affect Convert.ToInt32; uint, long, float, double and decimal can all overflow here.
  • Max Yankov
    Max Yankov over 9 years
    I used this algorithm for a pseudo-random generator, and it behaves a little strangely: stackoverflow.com/questions/26847262/…
  • ANeves
    ANeves about 9 years
    @nightcoder you could use params.
  • Hans-Peter Störr
    Hans-Peter Störr about 9 years
    If you got the number 486187739 from stackoverflow.com/a/2816747/21499 - I actually intended to recommend 92821.
  • Nathan Adams
    Nathan Adams about 9 years
One issue with this algorithm is that any array full of nulls will always return 0, regardless of its length
  • Pharap
    Pharap almost 9 years
    @ChuiTey This is something all Mersenne Primes have in common.
  • Şafak Gür
    Şafak Gür almost 9 years
    Because this.hashCode != h. It wouldn't return the same value.
  • Erik Karlsson
    Erik Karlsson almost 9 years
Sorry, managed to remove my comment instead of editing it. Is it more beneficial to create a new struct than to change hashCode to non-readonly and do: "unchecked { this.hashCode ^= h * 397; } return this;" for example?
  • Şafak Gür
    Şafak Gür almost 9 years
    Immutability has its benefits (Why are mutable structs evil?). About performance, what I do is pretty cheap since it does not allocate any space in the heap.
  • Ahmad Siavosh
    Ahmad Siavosh over 8 years
As each instance of class 'object' has a unique hash code, the idea came to my mind that it would be good to use base.GetHashCode() as a seed or something to produce our hash code for the object.
  • Jon Skeet
    Jon Skeet over 8 years
    @AhmadSiavosh: No, that's a bad idea, because you want distinct but equal objects to have the same hash code. (I don't think object.GetHashCode is guaranteed to be unique, either. It may well be "very unlikely to collide" but that's not the same thing.)
  • Hassan Faghihi
    Hassan Faghihi over 8 years
Everyone here uses integers, and there has never been any kind of guarantee for hashes to be unique; they just try to vary enough that few collisions happen.
  • Jon Hanna
    Jon Hanna over 8 years
    Yes, but your second and fifth don't try to avoid collisions.
  • Hassan Faghihi
    Hassan Faghihi over 8 years
    Not sure which thread it was... but it did the same, msdn.microsoft.com/en-us/library/…
  • Jon Hanna
    Jon Hanna over 8 years
    Yes, that antipattern is quite common.
  • Hassan Faghihi
    Hassan Faghihi over 8 years
that's why I rely on that, but thanks for clearing things up... the other thing is: does the other pattern have less calculation time? You know, the collision vs calculation time trade-off matters too
  • Jon Hanna
    Jon Hanna over 8 years
    There's a balance to reach. Use a really good hash code like Spookyhash and you'll get much, much better collision avoidance but it will have much more calculation time than any of these (but when it comes to hashing very large amounts of data, Spookyhash is extremely quick). A simple shift on one of the values before xoring is only marginal extra cost for a good reduction in collision. Prime-number multiplication increasing both time and quality again. Which is better between shift or mult is hence debatable. Plain xor though very often has a lot of collisions on real data and is best avoided
  • Keith
    Keith over 8 years
    Don't forget to enumerate your IEnumerables or bad things will happen. new { PropA, PropB, C = PropC.ToList() }.GetHashCode()
  • Rick Love
    Rick Love over 8 years
    @Keith in that case, I would consider saving the IEnumerable as a list value somewhere instead of enumerating it each time the hashcode is calculated. Caclulating ToList each time inside GetHashCode could hurt performance in many situations.
  • Jaider
    Jaider about 8 years
If a field fieldL is a List<obj>, will it work by just doing hash = ... ^ fieldL.GetHashCode(), or do I have to go through the items like foreach() { hash = ... ^ item.GetHashCode() }?
  • Jon Skeet
    Jon Skeet about 8 years
@Jaider: It won't do either. List<T> doesn't override Equals or GetHashCode.
  • dmarra
    dmarra about 8 years
    I tried this code for 3 doubles, and I get an enormous amount of collisions. I need to get hashcodes for 4194304 tuples. Is there a better way? Using some larger primes helped a little bit, but I am still getting collisions.
  • Jon Skeet
    Jon Skeet about 8 years
    @user984444: Well you should expect quite a few collisions with that many entries. Just how many are you getting?
  • dmarra
    dmarra about 8 years
@JonSkeet It's hard to say. I am using this to cache the output of some Perlin noise, and the indicator of collision is some "interesting" output in my image; it has the appearance of... when you win solitaire. This gets mitigated (and the pattern changes) with larger primes. Very unhelpful, I know. I have changed my struct (tuple of doubles as the key) into a class so that .NET takes care of the hash code for me, and it has no collisions anymore.
  • Jon Skeet
    Jon Skeet about 8 years
    @user984444: Um, that way equal keys won't be equal though, unless you've overridden GetHashCode in your class, in which case you've got the same problem. Maybe it would be worth you asking a new question with all the details...
  • dmarra
    dmarra about 8 years
    @JonSkeet: Not true; the default implementation of GetHashCode is working perfectly well (If it wasn't, it would be incredibly obvious in my end result). It works for a struct as well, but is WOEFULLY SLOW. I wanted to use structs, but using a class seems to work just fine for my use case.
  • Jon Skeet
    Jon Skeet about 8 years
    @user984444: Unless you override GetHashCode and Equals yourself, or inherit from another class which does so, you will get reference equality. That's not what the struct will give you. It really, really sounds like we need a new post here with details.
  • dmarra
    dmarra about 8 years
    @JonSkeet: I consider my particular problem solved because I am getting the desired result, but if I get a chance, I'll post a question with details so you can see what is going on.
  • MikeW
    MikeW about 8 years
Being really picky: StyleCop default settings generate a warning for this code (SA1407), as you have not used parentheses to determine the precedence of the arithmetic operators - even though it is clear to any dev reading the code, and to the compiler, as we all know the BODMAS rule.
  • Jon Skeet
    Jon Skeet about 8 years
    @MikeW: I don't think BODMAS includes XOR :) I think the final piece of code would be clearer with parentheses - will add them now. I agree that the multiply-and-add version doesn't need them.
  • user764754
    user764754 about 8 years
    There is no boxing if you call it like Hash(1) and not like Hash<MyInterface>(myStruct). stackoverflow.com/questions/8823239
  • James Newton-King
    James Newton-King almost 8 years
    This helper method also allocates a new object[]
  • David Schwartz
    David Schwartz over 7 years
    As @NathanAdams mentions, the fact that null is skipped entirely could give you unexpected results. Instead of skipping them, you should just use some constant value instead of input[i].GetHashCode() when input[i] is null.
  • JJS
    JJS over 7 years
    Doesn't handle nulls.
  • shA.t
    shA.t over 6 years
    And don't forget that private property/fields are not needed in this case ;).
  • maaartinus
    maaartinus over 6 years
I'd bet that your ReHash is a big overkill. I guess it works well, but it may be even slower than cryptographic hashes, which are (sort of) proven to work perfectly. Java also uses power-of-two sized tables and there used to be a rather complicated re-hashing. It's got simplified since tree nodes for collisions were introduced.
  • Jon Hanna
    Jon Hanna over 6 years
    @maaartinus in terms of speed and distribution, it's well demonstrated. I'm now of the opinion that it's more trouble than it's worth for small values. I'd still use the fuller implementation of SpookyHash when it comes to hashing very large values like file contents.
  • RobIII
    RobIII over 6 years
    For future readers: Consider using HashCode.Combine()
  • Dan J
    Dan J over 6 years
    That looks like a sweet addition... any way to know what version of .NET Core that'll ship in?
  • James Ko
    James Ko over 6 years
    @DanJ What a happy coincidence, the HashCode changes for corefx were merged just a couple of hours before your comment :) The type is slated to ship in .NET Core 2.1.
  • Dan J
    Dan J over 6 years
    That is awesome - and quite the turnaround time. Upvoted. :)
  • James Ko
    James Ko over 6 years
    @DanJ Even better news-- it should be available right now on the nightly builds of CoreFX hosted on the dotnet-core MyGet feed.
  • Dan J
    Dan J over 6 years
    Sweet - that doesn't help me at work, since we're not quite that bleeding-edge, but good to know. Cheers!
  • ToolmakerSteve
    ToolmakerSteve about 6 years
    How does this compare in quality (distribution) and performance to simply using a long intermediate, with multiplication of each input by a large prime? E.g. for two values, something like this one liner: return ((long)v1 * 805306457 + (long)v2 * 189783887).GetHashCode(); [The primes are chosen to avoid numeric overflow of long, in a checked environment, and to tend to set different bits.]
  • ToolmakerSteve
    ToolmakerSteve about 6 years
    @Keith: A hash code does not have to be affected by all properties of an object. A hash code merely needs to give good enough distribution of your objects. And should be fast to compute. Leave out the enumerable. And if you have a list, don't include the whole list. Use Count, and perhaps first element (use zero if no elements). unless your class doesn't have much variation other than the list; in that case, caching a hash of the list, as Rick suggests, is best. Recall that, by definition, an object's hash must always be the same. If collection changes, DO NOT include it in hash calc.
  • BaltoStar
    BaltoStar about 6 years
@JonSkeet any idea how to go about this in T-SQL? I need the C# hash of a GUID series to match the T-SQL hash of a uniqueidentifier series, but AFAIK in T-SQL it's not possible to wrap the results of integer arithmetic.
  • Jon Skeet
    Jon Skeet about 6 years
    @BaltoStar: I don't know anything about hashing in T-SQL. If it already provides well-defined hashing for GUID values, I'd probably try to emulate that in C# rather than the other way round.
  • BaltoStar
    BaltoStar about 6 years
    @JonSkeet in C# why not just MD5 hash the ordered concatenation of the GUIDs ?
  • Jon Skeet
    Jon Skeet about 6 years
    @JamesKo: I'll add a link to HashCode.Combine when .NET Core 2.1 has actually been released, and I can link to docs. I don't think it'll be useful to many people until that time.
  • James Ko
    James Ko about 6 years
    @JonSkeet Sure.
  • crush
    crush almost 6 years
    I'm unsure how to handle the nulls here. None of the answers really seem to touch that topic, just assuming we are all experts on the topic. @JonSkeet Mentions in these comments "I usually use 0 as the effective hash code for null - which isn't the same as ignoring the field." How that is actually implemented though, I have questions about. It sounds like you are saying that a null property should zero out the current hash value, but that seems like odd behavior. It might be obvious to some what to do, but I'd appreciate an example showing how to handle the nulls, or a better explanation.
  • crush
    crush almost 6 years
    After reading a few other Q&A's on the topic, I realized that I didn't comprehend what @JonSkeet was saying very well. I misunderstood him to be saying that I should substitute 0 as the hashing constant when the property is null. After seeing an example here, I realize he was merely stating that I should substitute 0 as the property's hash code, which seems so obvious now...considering that is exactly what he said.
  • dragonfly02
    dragonfly02 almost 6 years
    Does it really need to use primes like 17 or 23 if my object's hash only depends on one int32 property? Can I just return MyProperty.GetHashCode()?
  • Jon Skeet
    Jon Skeet almost 6 years
    @stt106: For just a single property, I'd just return that property's hash code, yes.
  • geekley
    geekley almost 6 years
    shouldn't the hash variable start with a non-zero? stackoverflow.com/a/113600/9638388
  • cactuaroid
    cactuaroid over 5 years
For those who like this, (PropA, PropB, PropC, PropD).GetHashCode() is now available in C# 7 without the GC pressure @digEmAll is concerned about. Quick and Simple Hash Code Combinations
  • Rick Love
    Rick Love over 5 years
    @cactuaroid Nice! So using a tuple (which is a struct) instead of the anonymous type (a class). Does it still use the same calculation behind the scenes for the Tuple GetHashcode()?
  • cactuaroid
    cactuaroid over 5 years
    @RickLove I'm not sure the mathematics. Tuple.GetHashCode() and ValueTuple.GetHashCode() looks similar. ValueTuple.GetHashCode() calls HashHelper. Tuple.GetHashCode() calls Tuple.CombineHashCodes. For anonymous type, How are Equals and GetHashCode implemented on anonymous types?
  • digEmAll
    digEmAll over 5 years
    @cactuaroid: indeed this is a great solution !
  • cactuaroid
    cactuaroid over 5 years
    I apologize that @Timo has already posted about ValueTuple.GetHashCode() below.
  • cactuaroid
    cactuaroid over 5 years
Assuming h1 >> 27 is 0 so we can ignore it, h1 << 5 equals h1 * 32, so this is the same as h1 * 33 ^ h2. According to this page, it is called "Modified Bernstein".
  • cactuaroid
    cactuaroid over 5 years
    FYI, Visual Studio 2017 can generate GetHashCode() without ReSharper. docs.microsoft.com/en-us/visualstudio/ide/reference/…
  • KnorxThieus
    KnorxThieus about 5 years
    Just because it's cool, you can also do this with an one-liner: source?.Aggregate(0, (current, item) => unchecked(current * 31 + (item?.GetHashCode() ?? 0))) ?? 0;
  • ogggre
    ogggre about 5 years
    Here is a drop-in polyfill package that you can use for everything .NET 4.0+ (System.HashCode included): nuget.org/packages/Gapotchenko.FX
  • Steven Coco
    Steven Coco almost 5 years
    Yipes: I found a bug! The HashKeysAndValues method has been fixed: it invokes HashKeyAndValue.
  • emery.noel
    emery.noel over 4 years
    Why multiply hash on every line? Why: int hash = 17; hash = hash * 23 + ... ? Why not just use the product explicitly, as in, hash = 391 + field1.GetHashCode(); ? Since order of operations will do the multiplication first anyway?
  • Jon Skeet
    Jon Skeet over 4 years
    @emery.noel: That won't make any difference after the first line (you'd still need to multiply, to include the previous hash) and there's a big benefit IMO in making every line consistent.
  • Tb.
    Tb. about 4 years
Not much notice has been taken of an important point: it is important that the returned hash code NOT CHANGE if the object is mutable and the object changes. This is because the hash code is used (for example) to place objects in dictionaries. If a mutable object changes after insertion into a dictionary, then the object is not found when you go to look it up. The above should cache the hash the first time it is computed and always return the original value. Otherwise you will have strange bugs.
  • Jon Skeet
    Jon Skeet about 4 years
    @Tb.: Or you document it, as per the docs: "If you do choose to override GetHashCode() for a mutable reference type, your documentation should make it clear that users of your type should not modify object values while the object is stored in a hash table." Often that's a useful thing to do, as you might choose to construct an object but not mutate it afterwards. It's not "squeaky clean" but it can be perfectly practical.
  • nawfal
    nawfal almost 4 years
    @ANeves I suggest you dont use params if it is meant for wider use (like a public library). params involve an array allocation (plus the O(n) cost of populating the array) which is bad for perf sensitive situations. params object[] is doubly bad now that you introduce boxing cost as well for value types.
  • Timo
    Timo almost 4 years
    This is very good answer. As an addition, you could consider changing "speed" to "performance" and adding the property of being allocation-free. The built-in HashCode type satisfies that too.
  • user510101
    user510101 about 3 years
    How does this compare to the ValueTuple.GetHashCode() answer recently updated by @ricklove above?
  • Muhammad Rehan Saeed
    Muhammad Rehan Saeed about 3 years
    The HashCode.Combine is a static method which will not allocate anything, while ValueTuple will start with allocating on the stack.
  • Jalal
    Jalal about 3 years
    The link to the FNV article is broken but I found it in the archive: archive.vn/KJeJy
  • Amos Egel
    Amos Egel about 3 years
    HashCode.Of(this.Name).And(this.Age).AndEach(this.Powers) - that is nice syntax :)
  • ScubaSteve
    ScubaSteve almost 3 years
    If nullable, do we need to check PropA, B, etc for null and if null pass in 0?
  • Pingpong
    Pingpong over 2 years
    @JonSkeet Which version of GetHashCode() is better, the one using 17/23 or the one using long numbers? Anyone can help out?
  • Jon Skeet
    Jon Skeet over 2 years
    @Pingpong: "Better" has any number of axes, and will depend on the source data. I don't even know which one you mean by "the one using long numbers" (there are likely to be several, and I'm not going to trawl over ever answer here finding candidates). I would personally just start with the simple one here - or use HashCode.Combine if you can.
  • KarmaEDV
    KarmaEDV over 2 years
    How would you do a nullcheck on an optional property here?
  • maraaaaaaaa
    maraaaaaaaa over 2 years
they should never be used as key fields in a collection - isn't that the whole point of hash codes, though? And the existence of hash tables, hash sets, dictionaries?