Is a Python dictionary an example of a hash table?

225,510

Solution 1

Yes, it is a hash mapping or hash table. You can read a description of python's dict implementation, as written by Tim Peters, here.

That's why you can't use something 'not hashable' as a dict key, like a list:

>>> a = {}
>>> b = ['some', 'list']
>>> hash(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable
>>> a[b] = 'some'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable

You can read more about hash tables or check how it has been implemented in python and why it is implemented that way.

Solution 2

There must be more to a Python dictionary than a table lookup on hash(). By brute experimentation I found this hash collision:

>>> hash(1.1)
2040142438
>>> hash(4504.1)
2040142438

Yet it doesn't break the dictionary:

>>> d = { 1.1: 'a', 4504.1: 'b' }
>>> d[1.1]
'a'
>>> d[4504.1]
'b'

Sanity check:

>>> for k,v in d.items(): print(hash(k))
2040142438
2040142438

Possibly there's another lookup level beyond hash() that avoids collisions between dictionary keys. Or maybe dict() uses a different hash.

(By the way, this in Python 2.7.10. Same story in Python 3.4.3 and 3.5.0 with a collision at hash(1.1) == hash(214748749.8).)

(I haven't found any collisions in Python 3.9.6. Since the hashes are bigger -- hash(1.1) == 230584300921369601 -- I estimate it would take my desktop a thousand years to find one. So I'll get back to you on this.)

Solution 3

Yes. Internally it is implemented as open hashing based on a primitive polynomial over Z/2 (source).

Solution 4

To expand upon nosklo's explanation:

a = {}
b = ['some', 'list']
a[b] = 'some' # this won't work
a[tuple(b)] = 'some' # this will, same as a['some', 'list']
Share:
225,510

Related videos on Youtube

Tommy Herbert
Author by

Tommy Herbert

Experience in Python, C++, Android and iOS; currently making CRM software using Java, JavaScript, XML and SQL. he/him

Updated on February 18, 2022

Comments

  • Tommy Herbert
    Tommy Herbert about 2 years

    One of the basic data structures in Python is the dictionary, which allows one to record "keys" for looking up "values" of any type. Is this implemented internally as a hash table? If not, what is it?

    • Torsten Marek
      Torsten Marek over 15 years
      If you're interested in the technical details, one article in Beautiful Code deals with the internals of Python's dict implementation.
    • Mooh
      Mooh over 15 years
      That was one of my favorite chapters in Beautiful Code.
    • chandola
      chandola almost 7 years
      Here is a talk by Brandon Craig Rhodes discussing how python dictionary works, youtube.com/watch?v=C4Kc8xzcA68.
    • Chen A.
      Chen A. about 6 years
      I looked for a diagram representing a dict for a while now, which decipt the implementation in memory and CPython. Thanks for referencing the book!
  • Matt Alcock
    Matt Alcock almost 12 years
    The Tim Peters link seams to be broken, is there a clean link out there?
  • Martijn Pieters
    Martijn Pieters over 11 years
    @MattAlcock: I've updated the link. Sometimes (usually due to someone wanting their email address removed somewhere) the python list archives are rebuilt and the ids of emails change, thus breaking these links. The pydotorg admins generally try to avoid that these days.
  • noɥʇʎԀʎzɐɹƆ
    noɥʇʎԀʎzɐɹƆ about 8 years
    But using .keys() can retrieve a list of keys. A real hash table wouldn't store keys, just hashes to save space.
  • TurnipEntropy
    TurnipEntropy almost 7 years
    So collisions are unavoidable. Set S may contain an infinitely large number of items, and you want it to hash to a number a computer can store. Every usable implementation of a hash table resolves collisions, with two of the most frequent methods being a) open addressing and b) chaining. Just because it doesn't utilize a perfect hash doesn't mean it's not a hash table.
  • Daniel Goldfarb
    Daniel Goldfarb almost 7 years
    More complete description of python dict implementation here: laurentluce.com/posts/python-dictionary-implementation
  • Yanfeng Liu
    Yanfeng Liu over 5 years
    Collisions will happen in general, because there are infinite possible hashable values and finite hash codes. Even a hash table would have to handle collision somehow.
  • Bob Stein
    Bob Stein over 5 years
    @YanfengLiu I believe those are exactly the same points TurnipEntropy made.
  • Will Croxford
    Will Croxford almost 5 years
    In Python 3.7, it looks like there are 2E20 minus 1 possible hash values, in fact. From -1E20 minus 1 to (+)1E20 minus 1. Try hash('I wandered lonely as a cloud, that drifts on high o\'er vales and hills, when all at once, I saw a crowd, a host of golden daffodils.') This gives a 19-digit decimal - -4037225020714749784 if you're geeky enough to care. Continue in your own words, kids, and the hash is still a 19-digit number. I assume there is a limit on length of string you can hash in Python, but safe to say many more possible strings than possible values. And hash(False) = 0 by the way.
  • nosklo
    nosklo almost 4 years
    @noɥʇʎԀʎzɐɹƆ - the key itself is not stored, only a reference to it and the hash.
  • TrancendentalObject
    TrancendentalObject over 3 years
    The reason why it doesn't break the dictionary is because under the hood the duplicate values are implemented using a linked list, and they are stored alongside a pointer back to the key they were generated from.
  • Nitish Agarwal
    Nitish Agarwal about 2 years
    There are many. hash(0) = 0; hash(2 ** 61 - 1) = 0; hash(-2 ** 61 + 1) = 0