Hashable, immutable

31,513

Solution 1

Hashing is the process of converting some large amount of data into a much smaller amount (typically a single integer) in a repeatable way so that it can be looked up in a table in constant-time (O(1)), which is important for high-performance algorithms and data structures.

Immutability is the idea that an object will not change in some important way after it has been created, especially in any way that might change the hash value of that object.

The two ideas are related because objects which are used as hash keys must typically be immutable so their hash value doesn't change. If it was allowed to change then the location of that object in a data structure such as a hashtable would change and then the whole purpose of hashing for efficiency is defeated.

To really grasp the idea you should try to implement your own hashtable in a language like C/C++, or read the Java implementation of the HashMap class.

Solution 2

  • Are there mutable objects that are hashable or immutable objects that are not hashable?

In Python, tuple is immutable, but it is hashable only if all its elements are hashable.

>>> tt = (1, 2, (30, 40))
>>> hash(tt)
8027212646858338501
>>> tl = (1, 2, [30, 40])
>>> hash(tl)
TypeError: unhashable type: 'list'

Hashable Types

  • The atomic immutable types are all hashable, such as str, bytes, numeric types
  • A frozen set is always hashable(its elements must be hashable by definition)
  • A tuple is hashable only if all its elements are hashable
  • User-defined types are hashable by default because their hash value is their id()

Solution 3

From the Python Glossary:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal, and their hash value is their id().

Dicts and sets must use a hash for efficient lookup in a hash table; the hash values must be immutable, because changing the hash will mess up the data structures and cause the dict or set to fail. The easiest way to make the hash value immutable is to make the whole object immutable, which is why the two are often mentioned together.

While none of the built-in mutable objects are hashable, it is possible to make a mutable object with a hash value that's not mutable. It's common for only a portion of the object to represent its identity, while the rest of the object contains properties that are free to change. As long as the hash value and comparison functions are based on the identity but not the mutable properties, and the identity never changes, you've met the requirements.

Solution 4

Technically, hashable means that the class defines __hash__(). According to the docs:

__hash__() should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to somehow mix together (e.g. using exclusive or) the hash values for the components of the object that also play a part in comparison of objects.

I think that for the Python builtin types, all hashable types are also immutable.

It would be difficult but perhaps not impossible to have a mutable object that nonetheless defined __hash__().

Solution 5

There is an implicit even if there is no explicit relationship forced between immutable and hashable due the interplay between

  1. Hashable objects which compare equal must have the same hash value
  2. An object is hashable if it has a hash value which never changes during its lifetime.

There is no problem here unless you redefine __eq__ so the the objects class defines equivalence on value.

Once you've done that you need to find a stable hash function which always returns the same value for objects which represent the same value (eg, where __eq__) returns True, and never changes during the lifetime of an object.

It hard to see an application where this is possible, consider a possible class A which meets these requirements. Although there is the obvious degenerate case where __hash__ returns a constant.

Now:-

>>> a = A(1)
>>> b = A(1)
>>> c = A(2)
>>> a == b
True
>>> a == c
False
>>> hash(a) == hash(b)
True
>>> a.set_value(c)
>>> a == c
True
>>> assert(hash(a) == hash(c)) # Because a == c => hash(a) == hash(c)
>>> assert(hash(a) == hash(b)) # Because hash(a) and hash(b) have compared equal 
                                 before and the result must stay static over the objects lifetime.

In fact this means at creation hash(b) == hash(c), despite the fact there are never compared equal. I struggle to see anyway to usefully define __hash__() for a mutable object which defines compare by value.

Note: __lt__, __le__ , __gt__ and __ge__ comparsions aren't affected so you can still define an ordering of hashable objects, mutable or otherwise based on their value.

Share:
31,513
joaquin
Author by

joaquin

a research guy

Updated on July 08, 2022

Comments

  • joaquin
    joaquin almost 2 years

    From a recent SO question (see Create a dictionary in python which is indexed by lists) I realized I probably had a wrong conception of the meaning of hashable and immutable objects in python.

    • What does hashable mean in practice?
    • What is the relation between hashable and immmutable?
    • Are there mutable objects that are hashable or immutable objects that are not hashable?
  • joaquin
    joaquin about 14 years
    @javier: 'In Python they're mostly interchangeable' My doubts refer to the small part not included in 'mostly'
  • Ishbir
    Ishbir about 14 years
    @Andrey: frozensets are hashable, sets aren't; both can only contain hashable items. In the places that Mark mentioned sets, he was correct, so I don't think he meant frozensets.
  • Scott Griffiths
    Scott Griffiths about 14 years
    User defined classes by default define hashable types (the hash is just the object's id). This can't change during the object's lifetime, so it is hashable, but that doesn't mean you can't define mutable types! Sorry, but hashability does not imply immutability.
  • Scott Griffiths
    Scott Griffiths about 14 years
    It's worth noting that __hash__ is defined by default to return the object's id; you have to go out of your way to set __hash__ = None to make it unhashable. Also as Mark Ransom mentions there's an extra condition that it's only hashable if the hash value can never change!
  • endolith
    endolith almost 10 years
    "Hashable objects which compare equal must have the same hash value." Why? I can create objects that compare equal but do not have the same hash value.
  • user2622016
    user2622016 almost 10 years
    It is possible to create such objects, but it would be a violation of a concept defined in Python documentation. The idea is that, in fact, we can use this requirement to derive such (logically equivalent) implication: If hashes are not equal, then the objects are not equal. Very useful. Many implementations, containers and algorithms rely on this implication to speed up things.
  • sleblanc
    sleblanc over 9 years
    Common cases where comparison != identity is when comparing "invalid" values together like float("nan") == float("nan"), or interned strings from slices: "apple" is "apple" vs. "apple" is "crabapple"[4:]
  • Admin
    Admin over 8 years
    Moreover, it is not possible for hash tables to detect when hash of their keys change (in any efficient way at least). It is a common pitfall e.g. in Java where a HashMap becomes broken if you modify an object used as a key in it: neither old nor new key can be found, even though if you print the map, it can be seen there.
  • Mark Ransom
    Mark Ransom about 8 years
    @ScottGriffiths I don't know why it took me 6 years to see your comment, but better late than never. I don't know how I could have been so far off, given that I've lamented the inability to put mutable objects into a C++ set. I hope my edit fixes things.
  • Matti Lyra
    Matti Lyra about 7 years
    I think the answer is a little misleading, list defines __hash__ in the sense that hasattr([1,2,3], "__hash__") returns True, however calling hash([1,2,3]) raises a TypeError (Python 3), so it's not exactly hashable. Relying on the existence of __hash__ is not sufficient to determine if something is a) hashable b) immutable
  • Pranjal Mittal
    Pranjal Mittal about 7 years
    Hashable and Immutable are somewhat related but not the same. E.g. Instances created from custom classes inheriting object are hashable but not immutable. These instances can be used has keys of a dict, but they can still be modified if passed around.
  • ThatNewGuy
    ThatNewGuy about 3 years
    Note: the Python Glossary has been updated. Not all built-in objects are hashable. Immutable containers (such as tuples and frozensets) are only hashable if their elements are hashable.