What's the role of GetHashCode in the IEqualityComparer<T> in .NET?

36,946

Solution 1

A bit of background first...

Every object in .NET has an Equals method and a GetHashCode method.

The Equals method is used to compare one object with another object - to see if the two objects are equivalent.

The GetHashCode method generates a 32-bit integer representation of the object. Since there is no limit to how much information an object can contain, certain hash codes are shared by multiple objects - so the hash code is not necessarily unique.

A dictionary is a really cool data structure that trades a higher memory footprint in return for (more or less) constant costs for Add/Remove/Get operations. It is a poor choice for iterating over though. Internally, a dictionary contains an array of buckets, where values can be stored. When you add a Key and Value to a dictionary, the GetHashCode method is called on the Key. The hashcode returned is used to determine the index of the bucket in which the Key/Value pair should be stored.

When you want to access the Value, you pass in the Key again. The GetHashCode method is called on the Key, and the bucket containing the Value is located.

When an IEqualityComparer is passed into the constructor of a dictionary, the IEqualityComparer.Equals and IEqualityComparer.GetHashCode methods are used instead of the methods on the Key objects.

Now to explain why both methods are necessary, consider this example:

BoxEqualityComparer boxEqC = new BoxEqualityComparer(); 

Dictionary<Box, String> boxes = new Dictionary<Box, string>(boxEqC); 

Box redBox = new Box(100, 100, 25);
Box blueBox = new Box(1000, 1000, 25);

boxes.Add(redBox, "red"); 
boxes.Add(blueBox, "blue"); 

Using the BoxEqualityComparer.GetHashCode method in your example, both of these boxes have the same hashcode - 100^100^25 = 1000^1000^25 = 25 - even though they are clearly not the same object. The reason that they are the same hashcode in this case is because you are using the ^ (bitwise exclusive-OR) operator so 100^100 cancels out leaving zero, as does 1000^1000. When two different objects have the same key, we call that a collision.

When we add two Key/Value pairs with the same hashcode to a dictionary, they are both stored in the same bucket. So when we want to retrieve a Value, the GetHashCode method is called on our Key to locate the bucket. Since there is more than one value in the bucket, the dictionary iterates over all of the Key/Value pairs in the bucket calling the Equals method on the Keys to find the correct one.

In the example that you posted, the two boxes are equivalent, so the Equals method returns true. In this case the dictionary has two identical Keys, so it throws an exception.

TLDR

So in summary, the GetHashCode method is used to generate an address where the object is stored. So a dictionary doesn't have to search for it. It just computes the hashcode and jumps to that location. The Equals method is a better test of equality, but cannot be used to map an object into an address space.

Solution 2

GetHashCode is used in Dictionary colections and it creates hash for storing objects in it. Here is a nice article why and how to use IEqualtyComparer and GetHashCode http://dotnetperls.com/iequalitycomparer

Solution 3

While it would be possible for a Dictionary<TKey,TValue> to have its GetValue and similar methods call Equals on every single stored key to see whether it matches the one being sought, that would be very slow. Instead, like many hash-based collections, it relies upon GetHashCode to quickly exclude most non-matching values from consideration. If calling GetHashCode on an item being sought yields 42, and a collection has 53,917 items, but calling GetHashCode on 53,914 of the items yielded a value other than 42, then only 3 items will have to be compared to the ones being sought. The other 53,914 may safely be ignored.

The reason a GetHashCode is included in an IEqualityComparer<T> is to allow for the possibility that a dictionary's consumer might want to regard as equal objects that would normally not regard each other as equal. The most common example would be a caller that wants to use strings as keys but use case-insensitive comparisons. In order to make that work efficiently, the dictionary will need to have some form of hash function that will yield the same value for "Fox" and "FOX", but hopefully yield something else for "box" or "zebra". Since the GetHashCode method built into String doesn't work that way, the dictionary will need to get such a method from somewhere else, and IEqualityComparer<T> is the most logical place since the need for such a hash code would be very strongly associated with an Equals method that considers "Fox" and "FOX" identical to each other, but not to "box" or "zebra".

Share:
36,946

Related videos on Youtube

Lucian
Author by

Lucian

Updated on January 24, 2020

Comments

  • Lucian
    Lucian over 4 years

    I'm trying to understand the role of the GetHashCode method of the interface IEqualityComparer.

    The following example is taken from MSDN:

    using System;
    using System.Collections.Generic;
    class Example {
        static void Main() {
            try {
    
                BoxEqualityComparer boxEqC = new BoxEqualityComparer();
    
                Dictionary<Box, String> boxes = new Dictionary<Box,
                                                    string>(boxEqC);
    
                Box redBox = new Box(4, 3, 4);
                Box blueBox = new Box(4, 3, 4);
    
                boxes.Add(redBox, "red");
                boxes.Add(blueBox, "blue");
    
                Console.WriteLine(redBox.GetHashCode());
                Console.WriteLine(blueBox.GetHashCode());
            }
            catch (ArgumentException argEx) {
    
                Console.WriteLine(argEx.Message);
            }
        }
    }
    
    public class Box {
        public Box(int h, int l, int w) {
            this.Height = h;
            this.Length = l;
            this.Width = w;
        }
        public int Height { get; set; }
        public int Length { get; set; }
        public int Width { get; set; }
    }
    
    class BoxEqualityComparer : IEqualityComparer<Box> {
    
        public bool Equals(Box b1, Box b2) {
            if (b1.Height == b2.Height & b1.Length == b2.Length
                                & b1.Width == b2.Width) {
                return true;
            }
            else {
                return false;
            }
        }
    
        public int GetHashCode(Box bx) {
            int hCode = bx.Height ^ bx.Length ^ bx.Width;
            return hCode.GetHashCode();
        }
    }
    

    Shouldn't the Equals method implementation be enough to compare two Box objects? That is where we tell the framework the rule used to compare the objects. Why is the GetHashCode needed?

    Thanks.

    Lucian

  • Ash
    Ash over 13 years
    More: If you need to compare Equals would be enouf, but when you need to get element from Dictionary it's easier to do this by hash, not by using Equals.
  • R. Schreurs
    R. Schreurs about 11 years
    For those, wondering what the ^-operator is, this is the bitwise exclusive-OR operator, see msdn.microsoft.com/en-us/library/zkacc7k1.aspx.
  • sheikhjabootie
    sheikhjabootie about 11 years
    @R.Schreurs That's a really helpful addition. Many people could be unfamiliar with that. I've modified the response to include it. Sorry if it makes your comment look a bit out of place though.
  • Diego Frehner
    Diego Frehner almost 11 years
    Just to point this explicitly out: (msdn.microsoft.com/en-us/library/ms132155.aspx) Notes to Implementers Implementations are required to ensure that if the Equals method returns true for two objects x and y, then the value returned by the GetHashCode method for x must equal the value returned for y.
  • sheikhjabootie
    sheikhjabootie almost 11 years
    @DiegoFrehner - You're quite right. Another thing that can trip people up is that the value of the GetHashCode method should not vary if the object is modified. So the fields within the object that GetHashCode depends upon should be readonly (immutable). There's an explanation here: stackoverflow.com/a/4868940/469701
  • supercat
    supercat over 10 years
    @Acentric: An object's hash code should not change unless it's mutated in a fashion which affects equality. If a class can be mutated in such fashion as to affect equality, code should avoid storing in a dictionary any instance which might be exposed to code that would mutate it while it's in the dictionary. If the code which stores the object abides by that rule, having a hash code which reflects mutable state can be useful. It's too bad .NET doesn't better distinguish state equality and equivalence, since both are useful concepts.
  • supercat
    supercat over 10 years
    @Acentric: Even beyond using hash code for hash-table addressing, the fundamental idea behind a hash code is that knowledge that two objects have different hash codes implies that they're unequal and need not compare them. As a corollary, knowledge that many objects' hash codes don't match a given object's hash code implies that none of them are equal to the object. Using a hash code for addressing is basically a way of ignoring objects which have different hash codes.
  • sheikhjabootie
    sheikhjabootie over 10 years
    @supercat - I suppose the rationale is that by far the most common use cases of GetHashCode are within the context of a Dictionary, certainly in my experience. So the simple rule - that fields which effect the result of calls to GetHashCode should be immutable - would avoid vanishing objects, which is a very strange runtime behaviour and would trip up most developers. You're dead right about other uses of hashing where immutability is not a requirement - checksums for example. I guess you could always add a GetMutableHashCode method if you needed both scenarios in your projects.
  • supercat
    supercat over 10 years
    @Acentric: The normal (and by far most efficient) pattern for creating an variable-sized immutable container in .NET is to create one or more mutable-type objects which hold the data, but never expose those objects to anything which could mutate them after the immutable container is exposed to the outside world. If two such containers are fully constructed and encapsulate instances which have identical state, the encapsulated instances will always have identical state, and it would be useful for their Equals and GetHashCode members to be based upon that state.
  • supercat
    supercat over 10 years
    @Acentric: Generally, the only time mutable objects should be used as dictionary keys is when either the dictionary is used to associate supplemental information with particular object instances (so the dictionary doesn't care about the objects' state), or when the holder of the dictionary creates the dictionary and the objects therein, and doesn't expose any of them to any code that might mutate an object while it's stored in the dictionary. Each pattern can be useful at times, but the former is best implemented by using a reference-based comparator while the latter is best implemented...
  • supercat
    supercat over 10 years
    ...with the help of the encapsulated object, and there isn't any "standard" means via which an encapsulated object can supply such help other than via Equals/GetHashCode (one could define an interface IValueEquatable<T>, and define a GetEqualityComparer method which would look for that and otherwise fall back on EqualityComparer<T>.Default, but there's no standard convention for doing so).
  • Social Developer
    Social Developer over 4 years
    The correct and to the point answer to the question! GetHashCode() has to complement Equals() for the objects in question.
  • supercat
    supercat over 4 years
    @Sumith: Many discussions of hashing talk about buckets, but I think it's more useful to think of exclusion. If comparisons are expensive, hashing could offer benefits even when using collections that aren't organized into buckets.