How does hashing work for python sets
Solution 1
An (immutable) object will always have the same hash for its entire lifetime. The hash will be the same between set1
and set2
-- This allows for operations like union
and intersection
to take place between sets, as well as determining if a member is unique within the set, among other things.
>>> first_dict = {"s":"s".__hash__(), "t":"t".__hash__()}
>>> second_dict = {"s":"s".__hash__()}
>>> first_dict
{'s': -638743784, 't': 711199131}
>>> second_dict
{'s': -638743784}
You can have identical strings that are different objects in memory. But since they're identical strings, they will hash the same.
>>> id(a)
7858120
>>> id(b)
7858624
>>> a.__hash__()
-1117164856
>>> b.__hash__()
-1117164856
An object is hashable if it has a hash value which never changes during its lifetime (it needs a hash() method), and can be compared to other objects (it needs an eq() method). Hashable objects which compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their id().
Solution 2
I suggest you read this previous question, It will explain in details the hashing in Python.
Solution 3
When Python creates a string, it is based on the string value. Therefore, both 's'
and 's'
assigned to different variables have equal hashes. Hence:
>>> a = 's'
>>> b = 's'
>>> hash(a) == hash(b)
True
However, note that they may not always point to the same string in memory:
>>> a = 'this is some really long string'
>>> b = 'this is some really long string'
>>> id(a)
47002352L
>>> id(b)
47002608L
memelord23
Updated on June 09, 2022Comments
-
memelord23 almost 2 years
I am completely familiar on how hashtables and hashes work but I am trying to fully understand how the O(1) completely comes from.
set1 = {'s','t'} print('x' in set1) print('s' in set1) set2 = {'s'} print('s' in set2)
I am told that to check if
's'
is in set1, if will check the memory allocation of the hash of's'
, and check if it is in set1 in O(1) and return the boolean. Therefore two O(1) operations, but my question is: How does the hashes actually work indepth. What I mean by this is, when you hash's'
, does that hash have something likeset1
andset2
and you're checking ifset1
is eitherset1
orset2
, or does each set have a different hash of's'
and you're checking the hash of's'
for each different set.