GetHashCode Guidelines in C#

58,342

Solution 1

The answer is mostly, it is a valid guideline, but perhaps not a valid rule. It also doesn't tell the whole story.

The point being made is that for mutable types, you cannot base the hash code on the mutable data because two equal objects must return the same hash code and the hash code has to be valid for the lifetime of the object. If the hash code changes, you end up with an object that gets lost in a hashed collection because it no longer lives in the correct hash bin.

For example, object A returns hash of 1. So, it goes in bin 1 of the hash table. Then you change object A such that it returns a hash of 2. When a hash table goes looking for it, it looks in bin 2 and can't find it - the object is orphaned in bin 1. This is why the hash code must not change for the lifetime of the object, and just one reason why writing GetHashCode implementations is a pain in the butt.

Update
Eric Lippert has posted a blog that gives excellent information on GetHashCode.

Additional Update
I've made a couple of changes above:

  1. I made a distinction between guideline and rule.
  2. I struck through "for the lifetime of the object".

A guideline is just a guide, not a rule. In reality, GetHashCode only has to follow these guidelines when things expect the object to follow the guidelines, such as when it is being stored in a hash table. If you never intend to use your objects in hash tables (or anything else that relies on the rules of GetHashCode), your implementation doesn't need to follow the guidelines.

When you see "for the lifetime of the object", you should read "for the time the object needs to co-operate with hash tables" or similar. Like most things, GetHashCode is about knowing when to break the rules.

Solution 2

It's been a long time, but nevertheless I think it is still necessary to give a correct answer to this question, including explanations about the whys and hows. The best answer so far is the one citing the MSDN exhaustivly - don't try to make your own rules, the MS guys knew what they were doing.

But first things first: The Guideline as cited in the question is wrong.

Now the whys - there are two of them

First why: If the hashcode is computed in a way, that it does not change during the lifetime of an object, even if the object itself changes, than it would break the equals-contract.

Remember: "If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values."

The second sentence often is misinterpreted as "The only rule is, that at object creation time, the hashcode of equal objects must be equal". Don't really know why, but that's about the essence of most answers here as well.

Think of two objects containing a name, where the name is used in the equals method: Same name -> same thing. Create Instance A: Name = Joe Create Instance B: Name = Peter

Hashcode A and Hashcode B will most likely not be the same. What would now happen, when the Name of instance B is changed to Joe?

According to the guideline from the question, the hashcode of B would not change. The result of this would be: A.Equals(B) ==> true But at the same time: A.GetHashCode() == B.GetHashCode() ==> false.

But exactly this behaviour is forbidden explicitly by the equals&hashcode-contract.

Second why: While it is - of course - true, that changes in the hashcode could break hashed lists and other objects using the hashcode, the reverse is true as well. Not changing the hashcode will in the worst case get hashed lists, where all of a lot of different objects will have the same hashcode and therefor be in the same hash bin - happens when objects are initialized with a standard value, for example.


Now coming to the hows Well, on first glance, there seems to be a contradiction - either way, code will break. But neither problem does come from changed or unchanged hashcode.

The source of the problems is well described in the MSDN:

From MSDN's hashtable entry:

Key objects must be immutable as long as they are used as keys in the Hashtable.

This does mean:

Any object that creates a hashvalue should change the hashvalue, when the object changes, but it must not - absolutely must not - allow any changes to itself, when it is used inside a Hashtable (or any other Hash-using object, of course).

First how Easiest way would of course be to design immutable objects only for the use in hashtables, that will be created as copys of the normal, the mutable objects when needed. Inside the immutable objects, it's obviusly ok to cache the hashcode, since it's immutable.

Second how Or give the object a "you are hashed now"-flag, make sure all object data is private, check the flag in all functions that can change objects data and throw an exception data if change is not allowed (i.e. flag is set). Now, when you put the object in any hashed area, make sure to set the flag, and - as well - unset the flag, when it is no longer needed. For ease of use, I'd advise to set the flag automatically inside the "GetHashCode" method - this way it can't be forgotten. And the explicit call of a "ResetHashFlag" method will make sure, that the programmer will have to think, wether it is or is not allowed to change the objects data by now.

Ok, what should be said as well: There are cases, where it is possible to have objects with mutable data, where the hashcode is nevertheless unchanged, when the objects data is changed, without violating the equals&hashcode-contract.

This does however require, that the equals-method is not based on the mutable data as well. So, if I write an object, and create a GetHashCode method that does calculate a value only once and stores it inside the object to return it on later calls, then I must, again: absolutely must, create a Equals method, that will use stored values for the comparison, so that A.Equals(B) will never change from false to true as well. Otherwise, the contract would be broken. The result of this will usually be that the Equals method doesn't make any sense - it's not the original reference equals, but it is neither a value equals as well. Sometimes, this may be intended behaviour (i.e. customer records), but usually it is not.

So, just make GetHashCode result change, when the object data changes, and if the use of the object inside of hash using lists or objects is intended (or just possible) then make the object either immutable or create a readonly flag to use for the lifetime of a hashed list containing the object.

(By the way: All of this is not C# oder .NET specific - it is in the nature of all hashtable implementations, or more generally of any indexed list, that identifying data of objects should never change, while the object is in the list. Unexpected and unpredictable behaviour will occur, if this rule is broken. Somewhere, there may be list implementations, that do monitor all elements inside the list and do automatic reindexing the list - but the performance of those will surely be gruesome at best.)

Solution 3

From MSDN

If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.

The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

For the best performance, a hash function must generate a random distribution for all input.

This means that if the value(s) of the object change, the hash code should change. For example, a "Person" class with the "Name" property set to "Tom" should have one hash code, and a different code if you change the name to "Jerry". Otherwise, Tom == Jerry, which is probably not what you would have intended.


Edit:

Also from MSDN:

Derived classes that override GetHashCode must also override Equals to guarantee that two objects considered equal have the same hash code; otherwise, the Hashtable type might not work correctly.

From MSDN's hashtable entry:

Key objects must be immutable as long as they are used as keys in the Hashtable.

The way I read this is that mutable objects should return different hashcodes as their values change, unless they are designed for use in a hashtable.

In the example of System.Drawing.Point, the object is mutable, and does return a different hashcode when the X or Y value changes. This would make it a poor candidate to be used as-is in a hashtable.

Solution 4

I think that the documentation regarding GetHashcode is a bit confusing.

On one hand, MSDN states that the hashcode of an object should never change , and be constant On the other hand, MSDN also states that the return value of GetHashcode should be equal for 2 objects, if those 2 objects are considered to be equal.

MSDN:

A hash function must have the following properties:

  • If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.
  • The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.
  • For the best performance, a hash function must generate a random distribution for all input.

Then, this means that all your objects should be immutable, or the GetHashcode method should be based on properties of your object that are immutable. Suppose for instance that you have this class (naive implementation):

public class SomeThing
{
      public string Name {get; set;}

      public override GetHashCode()
      {
          return Name.GetHashcode();
      }

      public override Equals(object other)
      {
           SomeThing = other as Something;
           if( other == null ) return false;
           return this.Name == other.Name;
      }
}

This implementation already violates the rules that can be found in MSDN. Suppose you have 2 instances of this class; the Name property of instance1 is set to 'Pol', and the Name property of instance2 is set to 'Piet'. Both instances return a different hashcode, and they're also not equal. Now, suppose that I change the Name of instance2 to 'Pol', then, according to my Equals method, both instances should be equal, and according to one of the rules of MSDN, they should return the same hashcode.
However, this cannot be done, since the hashcode of instance2 will change, and MSDN states that this is not allowed.

Then, if you have an entity, you could maybe implement the hashcode so that it uses the 'primary identifier' of that entity, which is maybe ideally a surrogate key, or an immutable property. If you have a value object, you can implement the Hashcode so that it uses the 'properties' of that value object. Those properties make up the 'definition' of the value object. This is of course the nature of a value object; you're not interested in it's identity, but rather in it's value.
And, therefore, value objects should be immutable. (Just like they are in the .NET framework, string, Date, etc... are all immutable objects).

Another thing that comes in mind:
During which 'session' (I don't know really how I should call this) should 'GetHashCode' return a constant value. Suppose you open up your application, load an instance of an object out of the DB (an entity), and get its hashcode. It will return a certain number. Close the application, and load the same entity. Is it required that the hashcode this time has the same value as when you loaded the entity the first time ? IMHO, not.

Solution 5

This is good advice. Here's what Brian Pepin has to say on the matter:

This has tripped me up more than once: Make sure GetHashCode always returns the same value across the lifetime of an instance. Remember that hash codes are used to identify "buckets" in most hashtable implementations. If an object's "bucket" changes, a hashtable may not be able to find your object. These can be very hard bugs to find, so get it right the first time.

Share:
58,342
Joan Venge
Author by

Joan Venge

Professional hitman.

Updated on May 07, 2020

Comments

  • Joan Venge
    Joan Venge about 4 years

    I read in the Essential C# 3.0 and .NET 3.5 book that:

    GetHashCode()’s returns over the life of a particular object should be constant (the same value), even if the object’s data changes. In many cases, you should cache the method return to enforce this.

    Is this a valid guideline?

    I have tried a couple built-in types in .NET and they didn't behave like this.

  • Joan Venge
    Joan Venge over 15 years
    Thanks, actually I never used Resharper but I keep seeing it mentioned quite often, so I should give it a try.
  • Jon B
    Jon B over 15 years
    How do you determine equality between mutable types?
  • JSBձոգչ
    JSBձոգչ over 15 years
    You shouldn't be using GetHashCode to determine equality.
  • Jon B
    Jon B over 15 years
    @JS Bangs - From MSDN: Derived classes that override GetHashCode must also override Equals to guarantee that two objects considered equal have the same hash code; otherwise, the Hashtable type might not work correctly.
  • Joan Venge
    Joan Venge over 15 years
    Thanks. Interesting point, but when I try it on say the Point type, changing X, or Y returns a different hashcode than before it's changed. Why did the .NET team ignored this, because Point has mutable data.
  • Jeff Yates
    Jeff Yates over 15 years
    @Joan Venge: Two things. First, not even Microsoft has got GetHashCode right at every implementation. Second, value types are generally immutable with every value being a new instance rather than a modification of an existing instance.
  • Jeff Yates
    Jeff Yates over 15 years
    This is not really correct. The hash code must remain constant for a particular instance. In the case of value types, it is often the case that each value is a unique instance and therefore, the hash appears to change, but actually its a new instance.
  • Jon B
    Jon B over 15 years
    System.Drawing.Point is not immutable. You can change the X and Y properties, and doing so results in a different hash code.
  • Joan Venge
    Joan Venge over 15 years
    Thanks Jeff it makes sense. So say for Point type, when I change X, it creates a new Point value, and assign to itself?
  • Jeff Yates
    Jeff Yates over 15 years
    No, Point is unfortunately, mutable and breaks the rules of GetHashCode, as Jon B just pointed out. Like I said, not even Microsoft got this one quite right.
  • Joan Venge
    Joan Venge over 15 years
    Thanks guys. Then it's safe to assume there are probably more types that have bad GetHashCode implementations like this one, right? Do you know any type like Point type that's mutable but has the correct GetHashCode implementation?
  • Jeff Yates
    Jeff Yates over 15 years
    It's only a major issue if you intend to use the type in situations that require a valid hash (such as in a HashTable). if that's the case, the easiest way might be to wrap the Point type in an immutable variant of your own that somehow achieves the three goals of GetHashCode.
  • Joan Venge
    Joan Venge over 15 years
    Thanks. What about the types like Point type that's mutable but has the correct GetHashCode implementation? I just wanna inspect those?
  • Jeff Yates
    Jeff Yates over 15 years
    Point has an incorrect GetHashCode if it is mutable and GetHashCode returns a different value when an instance is mutated.
  • Joan Venge
    Joan Venge over 15 years
    Sorry I meant to say, types unlike Point where they have correct GetHashCode implementation. Do you know any of those? But it should still have mutable members so I can see how they dealt with it.
  • Joan Venge
    Joan Venge over 15 years
    How about generating a random number but for Equals comparison use the actual values, like X and Y? Would it work?
  • Jeff Yates
    Jeff Yates over 15 years
    Ah, well, that's where the difficult bit comes in.
  • Joan Venge
    Joan Venge over 15 years
    Good one. Is there a good sample at least I can kinda use as a base? Maybe there is a type that's implemented like this so you can at least imitate the same, which would be very good. Either way, the idea I suggested is doable?
  • Jeff Yates
    Jeff Yates over 15 years
    If you want a mutable type that can be used in a HashTable, you need to freeze the mutation while the object is in the hashtable unless the hash is not tied to the mutable properties. However, if the mutable properties are involved in equality, then they must affect the hash too. However...
  • Jeff Yates
    Jeff Yates over 15 years
    ...remember that while two instances that equal must have the same hash, not all instances with the same hash are equal.
  • DavidN
    DavidN over 15 years
    You're right, value types are immutable so they preclude changing. Good catch.
  • Ogre Psalm33
    Ogre Psalm33 about 15 years
    Your example is why Jeff Yates says you cannot base the hash code on the mutable data. You can't stick a mutable object in a Dictionary and expect it to work well if the hash code is based on the mutable values of that object.
  • chikak
    chikak over 14 years
    I am not able to see where is the MSDN rule violated? The rule clearly says: The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. This means that hashcode of instance2 is allowed to be changed when you change the Name of instance2 to Pol
  • Shimmy Weitzhandler
    Shimmy Weitzhandler about 14 years
    How do I implement equality on a mutable reference-type object? please view stackoverflow.com/questions/2626547/…
  • Jeff Yates
    Jeff Yates about 14 years
    @Shimmy: You should undelete your question and answer; they could be useful to another programmer in future who faces the same problem and wants a solution like the one you found.
  • Shimmy Weitzhandler
    Shimmy Weitzhandler about 14 years
    I am sorry, it was a mistake, I moved the question: stackoverflow.com/questions/2626839/…
  • skolima
    skolima almost 14 years
    GetHashCode() is designed for use in a hashtable, that's the sole point of this function.
  • Jon B
    Jon B almost 14 years
    @skolima - the MSDN documentation is inconsistent with that. Mutable objects may implement GetHashCode(), and should return different values as the object's value changes. Hashtables must use immutable keys. Hence, you can use GetHashCode() for something other than a hashtable.
  • Oliver
    Oliver almost 14 years
    +1 for this detailed explanation (would give more if i could)
  • Eric Tuttleman
    Eric Tuttleman over 13 years
    I didn't down vote it, but I'd guess that others did because it's a quote that doesn't cover the whole problem. Pretend strings were mutable, but didn't change hash codes. You create "bob", use it as a key in a hashtable, and then change it's value to "phil". Next create a new string "phil". if you then look for a hash table entry with the key "phil" the item you originally put in won't be found. If someone searched on "bob" it would be found, but you'd get a value out that may no longer be correct. Either be diligent to not use keys that are mutable, or be aware of dangers.
  • Niklas
    Niklas over 13 years
    Since a.Equals(b) must mean that a.GetHashCode() == b.GetHashCode(), the hash code most often has to change if data used for equality comparison is changed. I'd say that the problem is not GetHashCode being based on mutable data. The problem is using mutable objects as hash table keys (and actually mutating them). Am I wrong?
  • Jeff Yates
    Jeff Yates over 13 years
    @Niklas: Yes, that is a problem. I'm not sure I'd go so far as to say there was only one problem though. In the end, it all depends what you're trying to achieve, but based on the rules, the hash code shouldn't change for the lifetime of the object - but as you and others point out, that really is only a problem when the hash code is actually being used such as in a hash table.
  • Joe
    Joe over 13 years
    +1 this is definitely the better answer because of the verbose explaination! :)
  • Orhan Cinar
    Orhan Cinar almost 13 years
    @Jeff Yates Eric Lipperts post is great.Thanks!
  • Jeff Yates
    Jeff Yates almost 13 years
    @Orhan: No problem. It's my go-to-guide for GetHashCode.
  • Neil
    Neil almost 13 years
    Eric Lippert's guideline is significantly different from the guideline in the question. The guideline in the question is not correct. @Jeff your follow-up comments show an understanding of this. Would you consider updating your answer?
  • Jeff Yates
    Jeff Yates almost 13 years
    @Neil: I've made a couple of edits to clarify things, hopefully.
  • ΩmegaMan
    ΩmegaMan over 11 years
    +1 Resharper if one has it generates a nice GetHashCode implementation.
  • supercat
    supercat over 11 years
    @EricTuttleman: Were I writing the rules for a framework, I would have specified that for any pair of objects X and Y, once X.Equals(Y) or Y.Equals(X) has been called, all future calls should yield the same result. If one wants to use some other definition of equality, use an EqualityComparer<T>.
  • nawfal
    nawfal over 10 years
    You should highlight the last paragraph in bold! That's the essence of it. Not many realise it.