Python - class __hash__ method and set

14,889

Solution 1

Your reading is incorrect. The __eq__ method is used for equality checks. The documents just state that the __hash__ value must also be the same for 2 objects a and b for which a == b (i.e. a.__eq__(b)) is true.

This is a common logic mistake: a == b being true implies that hash(a) == hash(b) is also true. However, an implication does not necessarily mean equivalence, that in addition to the prior, hash(a) == hash(b) would mean that a == b.

To make all instances of MyClass compare equal to each other, you need to provide an __eq__ method for them; otherwise Python will compare their identities instead. This might do:

class MyClass(object):
    def __hash__(self):
        return 0
    def __eq__(self, other):
        # another object is equal to self, iff 
        # it is an instance of MyClass
        return isinstance(other, MyClass)

Now:

>>> result = set()
>>> result.add(MyClass())
>>> result.add(MyClass())
1

In reality you'd base the __hash__ on those properties of your object that are used for __eq__ comparison, for example:

class Person
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn

    def __eq__(self, other):
        return isinstance(other, Person) and self.ssn == other.ssn

    def __hash__(self):
        # use the hashcode of self.ssn since that is used
        # for equality checks as well
        return hash(self.ssn)

p = Person('Foo Bar', 123456789)
q = Person('Fake Name', 123456789)
print(len({p, q})  # 1

Solution 2

Sets need two methods to make an object hashable: __hash__ and __eq__. Two instances must return the same hash value when they are considered equal. An instance is considered already present in a set if both the hash is present in the set and the instance is considered equal to one of the instances with that same hash in the set.

Your class doesn't implement __eq__, so the default object.__eq__ is used instead, which only returns true if obj1 is obj2 is also true. In other words, two instances are only considered equal if they are the exact same instance.

Just because their hashes match, doesn't make them unique as far as a set is concerned; even objects with different hashes can end up in the same hash table slot, as the modulus of the hash against the table size is used.

Add your a custom __eq__ method that returns True when two instances are supposed to be equal:

def __eq__(self, other):
    if not isinstance(other, type(self)):
        return False
    # all instances of this class are considered equal to one another
    return True
Share:
14,889
Aida.Mirabadi
Author by

Aida.Mirabadi

Python Developer at Farakav co.

Updated on June 06, 2022

Comments

  • Aida.Mirabadi
    Aida.Mirabadi about 2 years

    I'm using set() and __hash__ method of python class to prevent adding same hash object in set. According to python data-model document, set() consider same hash object as same object and just add them once.

    But it behaves different as below:

    class MyClass(object):
    
        def __hash__(self):
            return 0
    
    result = set()
    result.add(MyClass())
    result.add(MyClass())
    
    print(len(result)) # len = 2
    

    While in case of string value, it works correctly.

    result.add('aida')
    result.add('aida')
    
    print(len(result)) # len = 1
    

    My question is: why the same hash objects are not same in set?