List unhashable, but tuple hashable?


Solution 1

Mainly, because tuples are immutable. Assume the following works:

>>> l = [1, 2, 3]
>>> t = (1, 2, 3)
>>> x = {l: 'a list', t: 'a tuple'}

Now, what happens when you do l.append(4)? You've modified the key in your dictionary! From afar! If you're familiar with how hashing algorithms work, this should frighten you. Tuples, on the other hand, are absolutely immutable. t += (1,) might look like it's modifying the tuple, but really it's not: it simply creates a new tuple and rebinds the name t to it, leaving your dictionary key unchanged.
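To see this concretely, here is a minimal sketch (the variable names are illustrative, not part of the original answer). It shows that a list is rejected as a key outright, and that t += (1,) only rebinds the name t while the key stored in the dictionary stays as it was.

```python
# A list is rejected as a dictionary key; a tuple is accepted.
try:
    {[1, 2, 3]: 'a list'}
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)   # message mentions "unhashable type"

t = (1, 2, 3)
x = {t: 'a tuple'}
original = t            # second reference to the original tuple object

t += (1,)               # builds a new tuple and rebinds the name t

key_unchanged = list(x) == [(1, 2, 3)]   # the stored key is untouched
rebound = t is not original              # t now names a different object
```

If you already have a list and need a key, tuple(my_list) gives you a frozen snapshot of its current contents.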

Solution 2

You could totally make that work, but I bet you wouldn't like the effects.

from functools import reduce
from operator import xor

class List(list):
    def __hash__(self):
        # XOR of all elements; the start value 0 keeps an empty List hashable
        return reduce(xor, self, 0)

Now let's see what happens:

>>> l = List([23,42,99])
>>> hash(l)
94
>>> d = {l: "Hello"}
>>> d[l]
'Hello'
>>> l.append(7)
>>> d
{[23, 42, 99, 7]: 'Hello'}
>>> l
[23, 42, 99, 7]
>>> d[l]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: [23, 42, 99, 7]
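What went wrong can be checked directly. The sketch below redefines the List class so that it is self-contained, then compares the hash before and after the mutation. The entry is still physically in the dictionary, but a lookup now searches under the new hash and misses it. The helper names (hash_before, still_stored and so on) are mine, and the behaviour described is that of CPython's dict.

```python
from functools import reduce
from operator import xor

class List(list):
    def __hash__(self):
        # Content-based hash: XOR of all elements (0 for an empty list).
        return reduce(xor, self, 0)

l = List([23, 42, 99])
hash_before = hash(l)      # 23 ^ 42 ^ 99
d = {l: "Hello"}

l.append(7)
hash_after = hash(l)       # 23 ^ 42 ^ 99 ^ 7, a different value

still_stored = any(k is l for k in d)   # the key object is still inside d
reachable = l in d                      # lookup uses hash_after and misses
```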

edit: So I thought about this some more. You could make the above example work, if you return the list's id as its hash value:

class List(list):
    def __hash__(self):
        return id(self)

In that case, d[l] will give you 'Hello', but neither d[[23,42,99,7]] nor d[List([23,42,99,7])] will (because you're creating a new [Ll]ist).
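A short sketch of that trade-off, using the same id-based List class (the result variable names are illustrative). Note that this class compares by contents but hashes by identity, which goes against the usual rule that objects comparing equal should have equal hashes, so treat it as a demonstration rather than a recommendation.

```python
class List(list):
    def __hash__(self):
        # Hash on identity: stable for the object's lifetime, ignores contents.
        return id(self)

l = List([23, 42, 99])
d = {l: "Hello"}
l.append(7)

same_object = d[l]        # still found: id(l) did not change

try:
    d[List([23, 42, 99, 7])]   # equal contents, but a different object
    equal_copy_found = True
except KeyError:
    equal_copy_found = False
```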

Solution 3

Since a list is mutable, if you modify it you would modify its hash too, which ruins the point of having a hash (like in a set or a dict key).

Edit: I'm surprised this answer regularly gets new upvotes, it was really quickly written. I feel I need to make it better now.

The built-in set and dict data structures are implemented as hash maps. A data type in Python may define the magic method __hash__(), which is used when building and searching the hash map.

Among the built-in types, only the immutable ones (int, str, tuple, ...) have a usable __hash__(), and their hash value is based on the data and not on the identity of the object. You can check this with:
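The same kind of check can be done for the __hash__() method itself. This is a small sketch of documented CPython behaviour: list sets __hash__ to None, and a tuple is hashable only if all of its elements are. One nuance worth knowing: instances of a plain user-defined class (the hypothetical Point below) are hashable by default even though they are mutable, because their default hash is based on identity rather than on data.

```python
from collections.abc import Hashable

list_hash_slot = list.__hash__            # None: lists opt out of hashing
tuple_is_hashable = isinstance((0, 1), Hashable)
list_is_hashable = isinstance([0, 1], Hashable)

# A tuple is only as hashable as its contents.
try:
    hash((0, [1, 2]))
    nested_ok = True
except TypeError:
    nested_ok = False

class Point:            # hypothetical example class
    def __init__(self, x):
        self.x = x

p = Point(1)
h1 = hash(p)
p.x = 2                 # mutate the instance
h2 = hash(p)            # unchanged: the default hash ignores the attributes
```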

>>> a = (0,1)
>>> b = (0,1)
>>> a is b
False # Different objects
>>> hash(a) == hash(b)
True # Same hash

If we follow this logic, mutating the data would change the hash, but then what's the point of a hash that changes? It defeats the whole purpose of sets, dicts, and other uses of hashes.

Fun fact: if you try the example with short strings or ints in the range -5 <= i <= 256, a is b can return True because of micro-optimizations (in CPython at least).

Solution 4

Because lists are mutable and tuples aren't.

Solution 5

The answers are good. The reason is mutability. If we could use a list (or any other mutable object) as a dict key, we would be able to change the key by mutating it, either accidentally or intentionally. That would change the key's hash value, so we could no longer retrieve the value from the dictionary using that key. Hash tables make large collections easy to search by mapping each key's hash value to the index of the slot that stores the actual entry.

Read more about them here:

Hash Tables & Hash Functions & Associative Arrays
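The mapping from hash value to index can be sketched with a toy table. Everything here is hypothetical and heavily simplified (ToyHashTable, content_hash, a fixed number of buckets, no resizing); it is not how CPython's dict is actually implemented, only an illustration of why a key whose hash changes can no longer be found.

```python
class ToyHashTable:
    """Hypothetical, simplified hash table: a fixed list of buckets."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key_hash):
        # Map a hash value to a bucket index.
        return key_hash % len(self.buckets)

    def put(self, key_hash, key, value):
        self.buckets[self._index(key_hash)].append((key, value))

    def get(self, key_hash, key):
        for k, v in self.buckets[self._index(key_hash)]:
            if k == key:
                return v
        raise KeyError(key)


def content_hash(items):
    # Stand-in hash based on the contents (an assumption for this sketch).
    return sum(items)


key = [1, 2, 3]
table = ToyHashTable()
table.put(content_hash(key), key, "value")    # hash 6, stored in bucket 6

found_before = table.get(content_hash(key), key)

key.append(4)                                 # hash is now 10, bucket 2
try:
    table.get(content_hash(key), key)
    found_after = True
except KeyError:
    found_after = False
```

The entry never left bucket 6; the lookup simply goes to bucket 2 because the mutated key now hashes differently.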


Author: gsamaras

Yahoo! Machine Learning and Computer Vision team, San Francisco, California. Masters in Data Science. Received Stackoverflow Swag, Good Samaritan SO swag and "10 years Stackoverflow" Swag x2! In Top 10 users of my country.

Updated on January 21, 2020

Comments

  • gsamaras
    gsamaras over 4 years

    In How to hash lists? I was told that I should convert to a tuple first, e.g. [1,2,3,4,5] to (1,2,3,4,5).

    So the first cannot be hashed, but the second can. Why*?


    *I am not really looking for a detailed technical explanation, but rather for an intuition

  • gsamaras
    gsamaras almost 8 years
    val, great explanation! What are you trying to say here: From afar ! ? Do you think the question was so bad to get a downvote?
  • val
    val almost 8 years
    I mean that you've modified the key of your dictionary from outside the dictionary: since hashtables rely on the 1:1(ish) correspondence of keys and hashes, modifying the key behind the hash's back is a very bad idea indeed.
  • Dunes
    Dunes almost 8 years
    You've not really said why modifying a key is bad -- because it changes the hash value of the key, meaning the place where the key/value pair is stored becomes invalid, meaning you can't retrieve the key/value pair any more. Also, hashtables will work with a ∞:1 key to hash correspondence (all keys having the same hash value). All that is affected is their performance.
  • gsamaras
    gsamaras almost 8 years
    @Dunes can you expand on that?
  • Keith
    Keith almost 6 years
    True, so you need to make those embedded lists into tuples.
  • Vicrobot
    Vicrobot over 5 years
    can you provide more legal citation and content on your sayings?
  • polku
    polku over 4 years
    Better late than never, I guess: the O'Reilly book "High Performance Python" has a description of how the built-in data structures are implemented.