How does hashing work for python sets

18,534

Solution 1

An (immutable) object will always have the same hash for its entire lifetime. The hash will be the same between set1 and set2 -- This allows for operations like union and intersection to take place between sets, as well as determining if a member is unique within the set, among other things.

>>> first_dict = {"s":"s".__hash__(), "t":"t".__hash__()}
>>> second_dict = {"s":"s".__hash__()}
>>> first_dict
{'s': -638743784, 't': 711199131}
>>> second_dict
{'s': -638743784}

You can have identical strings that are different objects in memory. But since they're identical strings, they will hash the same.

>>> id(a)
7858120
>>> id(b)
7858624
>>> a.__hash__()
-1117164856
>>> b.__hash__()
-1117164856

An object is hashable if it has a hash value which never changes during its lifetime (it needs a hash() method), and can be compared to other objects (it needs an eq() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their id().

Solution 2

I suggest you read this previous question, It will explain in details the hashing in Python.

Solution 3

When Python creates a string, it is based on the string value. Therefore, both 's' and 's' assigned to different variables have equal hashes. Hence:

>>> a = 's'
>>> b = 's'
>>> hash(a) == hash(b)
True

However, note that they may not always point to the same string in memory:

>>> a = 'this is some really long string'
>>> b = 'this is some really long string'
>>> id(a)
47002352L
>>> id(b)
47002608L
Share:
18,534
memelord23
Author by

memelord23

Updated on June 09, 2022

Comments

  • memelord23
    memelord23 almost 2 years

    I am completely familiar on how hashtables and hashes work but I am trying to fully understand how the O(1) completely comes from.

    set1 = {'s','t'}
    print('x' in set1)
    print('s' in set1)
    set2 = {'s'}
    print('s' in set2)
    

    I am told that to check if 's' is in set1, if will check the memory allocation of the hash of 's', and check if it is in set1 in O(1) and return the boolean. Therefore two O(1) operations, but my question is: How does the hashes actually work indepth. What I mean by this is, when you hash 's', does that hash have something like set1 and set2 and you're checking if set1 is either set1 or set2, or does each set have a different hash of 's' and you're checking the hash of 's' for each different set.