Why is dictionary ordering non-deterministic?

11,781

Solution 1


Update: In Python 3.6, dict has a new implementation which preserves insertion order. From Python 3.7, this order-preserving behaviour is guaranteed:

the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec.


This is the result of a security fix from 2012, which was enabled by default in Python 3.3 (scroll down to "Security improvements").

From the announcement:

Hash randomization causes the iteration order of dicts and sets to be unpredictable and differ across Python runs. Python has never guaranteed iteration order of keys in a dict or set, and applications are advised to never rely on it. Historically, dict iteration order has not changed very often across releases and has always remained consistent between successive executions of Python. Thus, some existing applications may be relying on dict or set ordering. Because of this and the fact that many Python applications which don't accept untrusted input are not vulnerable to this attack, in all stable Python releases mentioned here, HASH RANDOMIZATION IS DISABLED BY DEFAULT.

As noted above, the last, capitalized bit is no longer true in Python 3.3.

See also: object.__hash__() documentation ("Note" sidebar).

If absolutely necessary, you can disable hash randomization in versions of Python affected by this behaviour by setting the PYTHONHASHSEED environment variable to 0.


Your counterexample:

list({str(i): i for i in range(10)}.keys())

… does not in fact always give the same result in Python 3.3, although the number of different orderings is limited due to the way hash collisions are handled:

$ for x in {0..999}
> do
>   python3.3 -c "print(list({str(i): i for i in range(10)}.keys()))"
> done | sort | uniq -c
     61 ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
     73 ['1', '0', '3', '2', '5', '4', '7', '6', '9', '8']
     62 ['2', '3', '0', '1', '6', '7', '4', '5', '8', '9']
     59 ['3', '2', '1', '0', '7', '6', '5', '4', '9', '8']
     58 ['4', '5', '6', '7', '0', '1', '2', '3', '8', '9']
     55 ['5', '4', '7', '6', '1', '0', '3', '2', '9', '8']
     62 ['6', '7', '4', '5', '2', '3', '0', '1', '8', '9']
     63 ['7', '6', '5', '4', '3', '2', '1', '0', '9', '8']
     60 ['8', '9', '0', '1', '2', '3', '4', '5', '6', '7']
     66 ['8', '9', '2', '3', '0', '1', '6', '7', '4', '5']
     65 ['8', '9', '4', '5', '6', '7', '0', '1', '2', '3']
     53 ['8', '9', '6', '7', '4', '5', '2', '3', '0', '1']
     62 ['9', '8', '1', '0', '3', '2', '5', '4', '7', '6']
     52 ['9', '8', '3', '2', '1', '0', '7', '6', '5', '4']
     73 ['9', '8', '5', '4', '7', '6', '1', '0', '3', '2']
     76 ['9', '8', '7', '6', '5', '4', '3', '2', '1', '0']

As noted at the beginning of this answer, that's no longer the case in Python 3.6:

$ for x in {0..999}
> do
>   python3.6 -c "print(list({str(i): i for i in range(10)}.keys()))"
> done | sort | uniq -c
   1000 ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

Solution 2

Note that Python 3.7 still has non-deterministic sets however. dicts preserve insertion order but sets do not. Sets can exhibit the same random behavior.

python3 -c "print({str(i) for i in range(9)})"

still gives different results from one run to the next.

Share:
11,781

Related videos on Youtube

Anaphory
Author by

Anaphory

Computational researcher (geography and diversity linguistics) in my day life, and some of my spare time.

Updated on June 06, 2022

Comments

  • Anaphory
    Anaphory about 2 years

    I recently switched from Python 2.7 to Python 3.3, and it seems that while in Python 2 the ordering of dictionary keys was arbitrary but consistent, in Python 3 the ordering of the keys of a dictionary obtained with e.g. vars() appears non-deterministic.

    If I run:

    class Test(object): pass
    parameters = vars(Test)
    print(list(parameters.keys()))
    

    in both Python 2.7 and Python 3.3, then:

    • Python 2.7 consistently gives me

      ['__dict__', '__module__', '__weakref__', '__doc__']
      
    • With Python 3.3, I can get any random order – for example:

      ['__weakref__', '__module__', '__qualname__', '__doc__', '__dict__']
      ['__doc__', '__dict__', '__qualname__', '__module__', '__weakref__']
      ['__dict__', '__module__', '__qualname__', '__weakref__', '__doc__']
      ['__weakref__', '__doc__', '__qualname__', '__dict__', '__module__']
      

    Where does this non-determinism come from? And why is something like

    list({str(i): i for i in range(10)}.keys())
    

    … consistent between runs, always giving

    ['3', '2', '1', '0', '7', '6', '5', '4', '9', '8']
    

    … ?

  • Anaphory
    Anaphory over 11 years
    So why does this not apply to something like {str(i): i for i in range(10)}?
  • nmz787
    nmz787 almost 8 years
    So how do we disable this randomization?
  • Zero Piraeus
    Zero Piraeus almost 8 years
  • user541686
    user541686 about 5 years
    Came here to post an answer because I found this out the hard way... then I saw this one... +1