How does c# figure out the hash code for an object?

18,468

Solution 1

It seems that I have a clue now.

I thought KeyValuePair is a reference type, but it is not, it is a struct. And so it uses ValueType.GetHashCode() method. MSDN for it says: "One or more fields of the derived type is used to calculate the return value".

If you will take a real reference type as a "tuple-provider" you'll cheat the dictionary (or yourself...).

using System.Collections.Generic;

namespace csharp_tricks
{
    class Program
    {
        class MyClass
        {
            int keyValue;
            int someInfo;

            public MyClass(int key, int info)
            {
                keyValue = key;
                someInfo = info;
            }

            public override bool Equals(object obj)
            {
                MyClass other = obj as MyClass;
                if (other == null) return false;

                return keyValue.Equals(other.keyValue);
            }

            public override int GetHashCode()
            {
                return keyValue.GetHashCode();
            }
        }

        class Pair<T, R>
        {
            public T First { get; set; }
            public R Second { get; set; }
        }

        static void Main(string[] args)
        {
            var dict = new Dictionary<Pair<int, MyClass>, object>();

            dict.Add(new Pair<int, MyClass>() { First = 1, Second = new MyClass(1, 2) }, 1);

            //this is a pair of the same values as previous! but... no exception this time...
            dict.Add(new Pair<int, MyClass>() { First = 1, Second = new MyClass(1, 3) }, 1);

            return;
        }
    }
}

Solution 2

Don't override GetHashcode() and Equals() on mutable classes, only override it on immutable classes or structures, else if you modify a object used as key the hash table won't function properly anymore (you won't be able to retrieve the value associated to the key after the key object was modified)

Also hash tables don't use hashcodes to identify objects they use the key objects themselfes as identifiers, it's not required that all keys that are used to add entries in a hash table return different hashcodes, but it is recommended that they do, else performance suffers greatly.

Solution 3

Here are the proper Hash and equality implementations for the Quad tuple (contains 4 tuple components inside). This code ensures proper usage of this specific tuple in HashSets and the dictionaries.

More on the subject (including the source code) here.

Note usage of the unchecked keyword (to avoid overflows) and throwing NullReferenceException if obj is null (as required by the base method)

public override bool Equals(object obj)
{
    if (ReferenceEquals(null, obj))
        throw new NullReferenceException("obj is null");
    if (ReferenceEquals(this, obj)) return true;
    if (obj.GetType() != typeof (Quad<T1, T2, T3, T4>)) return false;
    return Equals((Quad<T1, T2, T3, T4>) obj);
}

public bool Equals(Quad<T1, T2, T3, T4> obj)
{
    if (ReferenceEquals(null, obj)) return false;
    if (ReferenceEquals(this, obj)) return true;
    return Equals(obj.Item1, Item1)
        && Equals(obj.Item2, Item2)
            && Equals(obj.Item3, Item3)
                && Equals(obj.Item4, Item4);
}

public override int GetHashCode()
{
    unchecked
    {
        int result = Item1.GetHashCode();
        result = (result*397) ^ Item2.GetHashCode();
        result = (result*397) ^ Item3.GetHashCode();
        result = (result*397) ^ Item4.GetHashCode();
        return result;
    }
}
public static bool operator ==(Quad<T1, T2, T3, T4> left, Quad<T1, T2, T3, T4> right)
{
    return Equals(left, right);
}


public static bool operator !=(Quad<T1, T2, T3, T4> left, Quad<T1, T2, T3, T4> right)
{
    return !Equals(left, right);
}

Solution 4

Check out this post by Brad Abrams and also the comment by Brian Grunkemeyer for some more information on how object.GetHashCode works. Also, take a look at the first comment on Ayande's blog post. I don't know if the current releases of the Framework still follow these rules or if they have actually changed it like Brad implied.

Share:
18,468
Max Galkin
Author by

Max Galkin

Blog: http://yacoder.guru/blog/ Twitter: @yacoder My CV: http://careers.stackoverflow.com/yacoder

Updated on June 16, 2022

Comments

  • Max Galkin
    Max Galkin about 2 years

    This question comes out of the discussion on tuples.

    I started thinking about the hash code that a tuple should have. What if we will accept KeyValuePair class as a tuple? It doesn't override the GetHashCode() method, so probably it won't be aware of the hash codes of it's "children"... So, run-time will call Object.GetHashCode(), which is not aware of the real object structure.

    Then we can make two instances of some reference type, which are actually Equal, because of the overloaded GetHashCode() and Equals(). And use them as "children" in tuples to "cheat" the dictionary.

    But it doesn't work! Run-time somehow figures out the structure of our tuple and calls the overloaded GetHashCode of our class!

    How does it work? What's the analysis made by Object.GetHashCode()?

    Can it affect the performance in some bad scenario, when we use some complicated keys? (probably, impossible scenario... but still)

    Consider this code as an example:

    namespace csharp_tricks
    {
        class Program
        {
            class MyClass
            {
                int keyValue;
                int someInfo;
    
                public MyClass(int key, int info)
                {
                    keyValue = key;
                    someInfo = info;
                }
    
                public override bool Equals(object obj)
                {
                    MyClass other = obj as MyClass;
                    if (other == null) return false;
    
                    return keyValue.Equals(other.keyValue);
                }
    
                public override int GetHashCode()
                {
                    return keyValue.GetHashCode();
                }
            }
    
            static void Main(string[] args)
            {
                Dictionary<object, object> dict = new Dictionary<object, object>();
    
                dict.Add(new KeyValuePair<MyClass,object>(new MyClass(1, 1), 1), 1);
    
                //here we get the exception -- an item with the same key was already added
                //but how did it figure out the hash code?
                dict.Add(new KeyValuePair<MyClass,object>(new MyClass(1, 2), 1), 1); 
    
                return;
            }
        }
    }
    

    Update I think I've found an explanation for this as stated below in my answer. The main outcomes of it are:

    • Be careful with your keys and their hash codes :-)
    • For complicated dictionary keys you must override Equals() and GetHashCode() correctly.
  • Max Galkin
    Max Galkin almost 16 years
    Nice, but it's not an answer.
  • Cory R. King
    Cory R. King almost 16 years
    But how could they idenfity the object without generating a hash? Isn't that the point of the GetHashCode?
  • Max Galkin
    Max Galkin almost 16 years
    KeyValuePair doesn't implement GetHashCode. Your example is TOTALLY wrong. Open MSDN: If two objects compare as equal, the GetHashCode method for each object must return the same value.
  • Max Galkin
    Max Galkin almost 16 years
    These explanations contradict with the given code as well. Somehow run-time gets inside MyClass.GetHashCode() and uses it to obtain KeyValuePair hash code. But what exactly the run-time doing?
  • Pop Catalin
    Pop Catalin almost 16 years
    Cory, the hashcode of a key is a number used to quickly compute the location of a bucket whithin the hashtable, (a bucket is a key value pair, or more kv pairs), after the location is computed if the bucket contains the key (equality check)...
  • Pop Catalin
    Pop Catalin almost 16 years
    ... (if the key equals the key from that bucket) then the value is returned, else the position is rehashed and another location is probed, The algorithm ends when the bucket probed is empty or all buckets have been probed.
  • Pop Catalin
    Pop Catalin almost 16 years
    So, a hash code doesn't indicate the the value, but the fist position in hash table to be searched for the key.
  • Milan Gardian
    Milan Gardian about 15 years
    +1 for the correct implementation of GetHashCode, however I believe you shouldn't be throwing exception from Equals(object) - this implementation is also inconsistent: quad.Equals(null) throws NullReferenceException while quad.Equals((Quad)null) returns false :-).
  • Daniel A.A. Pelsmaeker
    Daniel A.A. Pelsmaeker over 11 years
    @PopCatalin I don't agree with your post. You can and should override GetHashCode and Equals, but should use only immutable fields for GetHashCode. And hash codes may not be used to uniquely identify objects, but they are used to greatly reduce the set of possible objects for which the hash table has to check equality.