How to implement a good __hash__ function in python
Solution 1
__hash__
should return the same value for objects that are equal. It also shouldn't change over the lifetime of the object; generally you only implement it for immutable objects.
A trivial implementation would be to just return 0
. This is always correct, but performs badly.
Your solution, returning the hash of a tuple of properties, is good. But note that you don't need to list all properties that you compare in __eq__
in the tuple. If some property usually has the same value for inequal objects, just leave it out. Don't make the hash computation any more expensive than it needs to be.
Edit: I would recommend against using xor to mix hashes in general. When two different properties have the same value, they will have the same hash, and with xor these will cancel eachother out. Tuples use a more complex calculation to mix hashes, see tuplehash
in tupleobject.c
.
Solution 2
Documentation for object.__hash__(self)
The only required property is that objects which compare equal have the same hash value; it is advised to mix together the hash values of the components of the object that also play a part in comparison of objects by packing them into a tuple and hashing the tuple. Example
def __hash__(self):
return hash((self.name, self.nick, self.color))
Solution 3
It's dangerous to write
def __eq__(self, other):
return other and self.a == other.a and self.b == other.b
because if your rhs (i.e., other
) object evaluates to boolean False, it will never compare as equal to anything!
In addition, you might want to double check if other
belongs to the class or subclass of AClass
. If it doesn't, you'll either get exception AttributeError
or a false positive (if the other class happens to have the same-named attributes with matching values). So I would recommend to rewrite __eq__
as:
def __eq__(self, other):
return isinstance(other, self.__class__) and self.a == other.a and self.b == other.b
If by any chance you want an unusually flexible comparison, which compares across unrelated classes as long as attributes match by name, you'd still want to at least avoid AttributeError
and check that other
doesn't have any additional attributes. How you do it depends on the situation (since there's no standard way to find all attributes of an object).
Related videos on Youtube
user2756501
Master geek, developer, avid reader and one of the minds behind novlet.com and bitlet.org :P@abahgat
Updated on July 08, 2022Comments
-
user2756501 almost 2 years
When implementing a class with multiple properties (like in the toy example below), what is the best way to handle hashing?
I guess that the
__eq__
and__hash__
should be consistent, but how to implement a proper hash function that is capable of handling all the properties?class AClass: def __init__(self): self.a = None self.b = None def __eq__(self, other): return other and self.a == other.a and self.b == other.b def __ne__(self, other): return not self.__eq__(other) def __hash__(self): return hash((self.a, self.b))
I read on this question that tuples are hashable, so I was wondering if something like the example above was sensible. Is it?
-
Feuermurmel about 10 yearsJust make sure to use
hash()
on a tuple with exactly the elements that are compared in__eq__()
and friends (exactly as you did) and you're good to go.
-
-
Björn Pollex over 13 yearsAs you said hash functions usually only make sense for immutable objects. Hence it is possible to calculate the hash-value once in
__init__
. -
Scott Griffiths over 13 years+1 for the
return 0
hash function - I've always thought that anything else is premature optimisation :-). (I'm only half kidding). -
eigenein about 13 yearsIt will work, but it's bad that if you exchange
self.a
andself.b
then you'll get the same hash while it will be the other "object". -
user1066101 about 13 years"somehow mix together (e.g. using exclusive or" is a pretty flexible set of requirements. If it actually matters, then
(hash(self.a)<<1) ^ hash(self.b)
might be better. There's no general answer, just a general guideline that has to be modified based on the specific application. -
max about 12 years@BjörnPollex Rather than doing it in
__init__
, you can just cache the value in__hash__
. That way if__hash__
is never called, you didn't waste either time or memory. I assume checking whether the value has been already cached isn't expensive is it? (Not sure if it's best through exception or explicitif
). -
Fred Foo almost 12 yearsIt's unfortunate that Python does not make a
combine_hashes
function available. -
Aaron Barclay almost 11 yearsYour first statement seems incorrect. "It also shouldn't change over the lifetime of the object; generally you only implement it for immutable objects." It is implemented in mutable objects as well no?
-
javawizard almost 11 yearsIt's not implemented in things like dict or list, the justification being that changing the hash of an object that already belongs to, e.g., a set wreaks havoc on the set's internal data structures.
-
max about 9 years@eigenein if many cases, it's an advantage that the hash is unchanged when the order is changed. if you try to hash a
dict
orset
, the hash that depends on the order of iteration is invalid. OTOH, the hash that simply causes extra collisions once in a while is, at worst, inefficient. -
nightpool over 8 yearswhy not just hash a tuple value? hash((self.a, self.b))
-
Carl Meyer almost 8 years@AaronBarclay in Python 2, all user-defined classes inherit the default
object.__hash__
by default, so they are always hashable, but frequently break the hash-equality invariant if they define a custom__eq__
. So in Py2 if you define__eq__
, you should really always either set__hash__ = None
(if instances are mutable) or define__hash__
(if immutable). In Py3 defining__eq__
automatically sets__hash__
toNone
, so just avoid defining__hash__
unless instances are immutable. See asmeurer.github.io/blog/posts/… -
PM 2Ring over 7 years
-
AXO almost 7 yearsFor those whom are interested, here is the bug that led to removal of XOR recommendation: bugs.python.org/issue28383
-
fersarr over 6 years@axo we should update this answer with the hash of tuples
-
Mad Physicist over 6 yearsUseful info, but unrelated to the primary question about hashing.
-
Leo Ufimtsev about 5 yearsNot related, but thank you for posting this never the less. +1.
-
Zaya almost 5 years@AXO, thanks for providing the link! Very interesting stuff.
-
ShadowRanger over 4 years@FredFoo: But they do. That's what
hash
called on atuple
does, combine the hashes of the values in thetuple
. The cost of making a smalltuple
is trivial (Python optimizestuple
s specially since they're created implicitly in many different ways to support non-tuple
-specific functionality), and callinghash
on it combines the hashes of its component parts. Any reasonable implementation of acombine_hashes
function would do the same thing, constructing atuple
of the arguments (built-ins receive their arguments as atuple
implicitly), then callinghash
on it. -
ShadowRanger over 4 years@FredFoo:
combine_hashes
would actually be more complex than that, because if you're going to expose such a function, you'd need to make an order-insensitive version as well (or allow passing an argument to make it an order-insensitive hash). Python's solution for that case is to just dohash(frozenset({attributes, to, hash}))
, which gets the canonical order-insensitive hash. -
ShadowRanger over 4 yearsThis is a bad
__eq__
implementation, as it does not delegate to the right hand side's__eq__
if the left hand does not know how to do the comparison. Ifint
had a__eq__
like this,1 == MyNumericType(1)
would always beFalse
, even ifMyNumericType(1) == 1
would returnTrue
. If you don't recognize the type ofother
, alwaysreturn NotImplemented
, don't justreturn False
. -
max over 4 years@ShadowRanger Agreed.
-
Chris Graf over 3 yearsCould someone elaborate on
return 0
being always correct, but performing badly? How can this always be correct? Shouldn'thash
values be different for different objects? What about the performance part? Why does it perform badly?