Are there any reasons not to use an OrderedDict?

Solution 1

OrderedDict is a subclass of dict, and needs more memory to keep track of the order in which keys are added. This isn't trivial. The implementation adds a second dict under the covers, and a doubly-linked list of all the keys (that's the part that remembers the order), and a bunch of weakref proxies. It's not a lot slower, but at least doubles the memory over using a plain dict.

But if it's appropriate, use it! That's why it's there :-)
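
A rough way to check the memory overhead on your own interpreter is sketched below. It assumes CPython, where `sys.getsizeof` is available and, in recent versions, `OrderedDict` is implemented in C rather than with the pure-Python structures described in this answer, so the ratio you see may differ from "at least double". `sys.getsizeof` counts only the container itself, not the keys and values.

```python
import sys
from collections import OrderedDict

# Build a plain dict and an OrderedDict holding the same keys.
keys = range(10_000)
plain = dict.fromkeys(keys)
ordered = OrderedDict.fromkeys(keys)

# Size of each container object only (keys and values are not included).
plain_size = sys.getsizeof(plain)
ordered_size = sys.getsizeof(ordered)

print(f"dict:        {plain_size} bytes")
print(f"OrderedDict: {ordered_size} bytes")
print(f"ratio:       {ordered_size / plain_size:.2f}")
```

The exact numbers depend on the Python version and platform, so treat them as an indication rather than a benchmark.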

How it works

The base dict is just an ordinary dict mapping keys to values - it's not "ordered" at all. When a <key, value> pair is added, the key is appended to a list. The list is the part that remembers the order.

But if this were a Python list, deleting a key would take O(n) time twice over: O(n) time to find the key in the list, and O(n) time to remove the key from the list.

So it's a doubly-linked list instead. That makes deleting a key constant (O(1)) time. But we still need to find the doubly-linked list node belonging to the key. To make that operation O(1) time too, a second - hidden - dict maps keys to nodes in the doubly-linked list.

So adding a new <key, value> pair requires adding the pair to the base dict, creating a new doubly-linked list node to hold the key, appending that new node to the doubly-linked list, and mapping the key to that new node in the hidden dict. A bit over twice as much work, but still O(1) (expected case) time overall.

Similarly, deleting a key that's present is also a bit over twice as much work but O(1) expected time overall: use the hidden dict to find the key's doubly-linked list node, delete that node from the list, and remove the key from both dicts.

Etc. It's quite efficient.
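
The steps above can be sketched as a toy class. This is not the standard library's code: the names `_Node` and `SketchOrderedDict` are made up for illustration, only a few methods are shown, and the sketch uses ordinary strong references between nodes where the real pure-Python implementation uses weakref proxies.

```python
class _Node:
    """One link in the doubly-linked list: a key plus its two neighbours."""
    __slots__ = ("prev", "next", "key")


class SketchOrderedDict:
    """Toy model: base dict + doubly-linked list + hidden key->node dict."""

    def __init__(self):
        self._values = {}   # base dict: key -> value (not "ordered" by itself)
        self._nodes = {}    # hidden dict: key -> that key's list node
        # Sentinel node: root.next is the oldest key, root.prev the newest.
        self._root = _Node()
        self._root.key = None
        self._root.prev = self._root.next = self._root

    def __setitem__(self, key, value):
        if key not in self._values:
            # New key: append a node to the end of the list, O(1).
            node = _Node()
            node.key = key
            last = self._root.prev
            node.prev, node.next = last, self._root
            last.next = node
            self._root.prev = node
            self._nodes[key] = node
        self._values[key] = value

    def __getitem__(self, key):
        # Lookups only touch the base dict.
        return self._values[key]

    def __delitem__(self, key):
        del self._values[key]          # raises KeyError if the key is absent
        node = self._nodes.pop(key)    # O(1) lookup of the node to unlink
        node.prev.next = node.next
        node.next.prev = node.prev

    def __iter__(self):
        # Walk the list from oldest to newest.
        node = self._root.next
        while node is not self._root:
            yield node.key
            node = node.next

    def __len__(self):
        return len(self._values)


d = SketchOrderedDict()
for k in ("b", "a", "c"):
    d[k] = k.upper()
del d["a"]
d["a"] = "again"   # re-inserted keys go to the end
print(list(d))     # ['b', 'c', 'a']
```

Every insertion and deletion touches both dicts and the list, which is where the "bit over twice as much work" comes from.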

Solution 2

Since Python 3.7, all dictionaries are guaranteed to preserve insertion order (CPython 3.6 already did so as an implementation detail). The Python contributors determined that making dict ordered would not hurt performance. I don't know how the performance of OrderedDict compares to dict in Python >= 3.7, but I imagine they would be comparable since they are both ordered.

Note that there are still differences between the behaviour of OrderedDict and dict. See also: Will OrderedDict become redundant in Python 3.7?
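
A few of those differences can be seen in a short example, assuming Python 3.7 or later. It only uses documented behaviour: order-sensitive equality between two OrderedDicts, `move_to_end`, and `popitem(last=False)`.

```python
from collections import OrderedDict

plain_a = {"x": 1, "y": 2}
plain_b = {"y": 2, "x": 1}
ordered_a = OrderedDict(x=1, y=2)
ordered_b = OrderedDict(y=2, x=1)

print(list(plain_b))           # ['y', 'x']  (insertion order is preserved)
print(plain_a == plain_b)      # True   (dict equality ignores order)
print(ordered_a == ordered_b)  # False  (OrderedDict equality checks order)

# Reordering methods that a plain dict does not have.
ordered_a.move_to_end("x")
print(list(ordered_a))         # ['y', 'x']
first = ordered_a.popitem(last=False)  # pop from the front instead of the back
print(first)                   # ('y', 2)
```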

Solution 3

Multithreading

Avoid OrderedDict if your dictionary is accessed from multiple threads without a lock, especially if it serves as a synchronisation point.

In CPython, individual operations on a vanilla dict are atomic, whereas operations on any type extended in Python are not.

In fact, I'm not even certain OrderedDict is thread-safe (without a lock), although I cannot discount the possibility that it was very carefully coded and satisfies the definition of reentrancy.

Lesser devils

Memory usage, if you create tons of these dictionaries.

CPU usage, if all your code does is munge these dictionaries.
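
If an OrderedDict does have to be shared between threads, the conservative approach is to guard every compound operation with a lock. The following is a minimal sketch under that assumption; `touch`, `shared`, `THREADS` and `KEYS` are illustrative names, not part of any library.

```python
import threading
from collections import OrderedDict

THREADS = 4
KEYS = 100

shared = OrderedDict()
shared_lock = threading.Lock()


def touch(key):
    """Count a use of `key` and mark it as most recently used."""
    # The read-modify-write and the reorder must happen as one unit,
    # so both are done while holding the lock.
    with shared_lock:
        shared[key] = shared.get(key, 0) + 1
        shared.move_to_end(key)


def worker():
    for key in range(KEYS):
        touch(key)


threads = [threading.Thread(target=worker) for _ in range(THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared), set(shared.values()))
```

With the lock in place, every key ends up counted once per thread regardless of how the threads interleave.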

Solution 4

"why shouldn't I always use this instead of a normal dictionary?"

In Python 2.7, normal OrderedDict usage will create reference cycles. So any use of OrderedDict requires the garbage collector to be enabled in order to free the memory. Yes, the garbage collector is on by default in CPython, but disabling it has its uses.

For example, with CPython 2.7.14:

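from __future__ import print_function

import collections
import gc

if __name__ == '__main__':
    d = collections.OrderedDict([('key', 'val')])
    gc.collect()  # clear out any garbage that existed before the test
    del d  # drop the only reference to the OrderedDict
    # DEBUG_LEAK reports collectable objects and keeps them in gc.garbage
    gc.set_debug(gc.DEBUG_LEAK)
    gc.collect()
    for i, obj in enumerate(gc.garbage):
        print(i, obj)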

This outputs:

gc: collectable <list 00000000033E7908>
gc: collectable <list 000000000331EC88>
0 [[[...], [...], 'key'], [[...], [...], 'key'], None]
1 [[[...], [...], None], [[...], [...], None], 'key']

Even if you just create an empty OrderedDict (d = collections.OrderedDict()) and don't add anything to it, or you explicitly try to clean it up by calling the clear method (d.clear() before del d), you will still get one self-referencing list:

gc: collectable <list 0000000003ABBA08>
0 [[...], [...], None]

This seems to have been the case since this commit removed the __del__ method in order to prevent the potential for OrderedDict to cause uncollectable cycles, which are arguably worse. As noted in the changelog for that commit:

Issue #9825: removed __del__ from the definition of collections.OrderedDict. This prevents user-created self-referencing ordered dictionaries from becoming permanently uncollectable GC garbage. The downside is that removing __del__ means that the internal doubly-linked list has to wait for GC collection rather than freeing memory immediately when the refcnt drops to zero.


Note that in Python 3, the fix for the same issue was made differently and uses weakref proxies to avoid cycles:

Issue #9825: Using __del__ in the definition of collections.OrderedDict made it possible for the user to create self-referencing ordered dictionaries which become permanently uncollectable GC garbage. Reinstated the Py3.1 approach of using weakref proxies so that reference cycles never get created in the first place.
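
To check the Python 3 behaviour on your own interpreter, a sketch along these lines may help. It assumes a recent CPython 3 and relies on `gc.collect()` returning the number of unreachable objects it found; the helper name `cyclic_garbage_after_delete` is made up for this example.

```python
import collections
import gc


def cyclic_garbage_after_delete():
    """Return how many unreachable objects the GC finds after dropping an OrderedDict."""
    gc.collect()  # clear out any garbage that existed before the test
    d = collections.OrderedDict([("key", "val")])
    del d         # with no cycles, reference counting frees everything here
    return gc.collect()


# Disable automatic collection so only the explicit calls above run.
gc_was_enabled = gc.isenabled()
gc.disable()
try:
    leftover = cyclic_garbage_after_delete()
finally:
    if gc_was_enabled:
        gc.enable()

print("unreachable objects found:", leftover)
```

If no reference cycles are created, the second collection has nothing to find, so memory is released as soon as the last reference goes away even with the collector disabled.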

Author: temporary_user_name

Updated on June 17, 2022

Comments

  • temporary_user_name
    temporary_user_name almost 2 years

    I'm referring to the OrderedDict from the collections module, which is an ordered dictionary.

    If it has the added functionality of being orderable, which I realize may often not be necessary but even so, are there any downsides? Is it slower? Is it missing any functionality? I didn't see any missing methods.

    In short, why shouldn't I always use this instead of a normal dictionary?

  • Grijesh Chauhan
    Grijesh Chauhan over 10 years
    Nice answer, but we need a link to read further. Where did you retrieve this information from?
  • Tim Peters
    Tim Peters over 10 years
    @GrijeshChauhan, I read the source code - I'm a core Python developer, so that's how I answer most questions I have - LOL ;-) You can find the code in Lib/collections/__init__.py in your Python source tree.
  • temporary_user_name
    temporary_user_name over 10 years
    Wait...YOU'RE THE GUY WHO WROTE TIMSORT!!! Unexpected descent from python heaven to answer my lowly question. THANKS!
  • Tim Peters
    Tim Peters over 10 years
    LOL! You're very welcome, @Aerovistae - it was a worthy question ;-)
  • abarnert
    abarnert over 10 years
    I find when I tell people "you can find the code in your Python source tree" they never look, but when I link to the hg repo they sometimes do. (Usually only when reading the source leads them to a question that's over my head.)
  • temporary_user_name
    temporary_user_name over 10 years
    @TimPeters you should take a look here stackoverflow.com/questions/19351065/… :)
  • Games Brainiac
    Games Brainiac about 10 years
    @GrijeshChauhan Go to your python interpreter, type import this then press enter, the guy who wrote it, is the guy who answered this question.
  • Anshul Goyal
    Anshul Goyal almost 10 years
    Hi Tim, can you explain further upon "It's not a lot slower, but at least doubles the memory over using a plain dict"? I have asked a question on the same here
  • onepiece
    onepiece almost 9 years
    @TimPeters What if I give the OrderedDict an ordering scheme other than the default (time of insertion)? For example, sorted alphanumerically by key. Then I believe adding will take O(n) at least, because the key must now be added to the sorted doubly-linked list, which I think is done by traversing and comparing to each node's key.
  • martineau
    martineau about 8 years
    @onepiece: OrderedDicts are ordered by insertion order so there's no way to directly insert something in the middle. To achieve this, you'll need to rebuild the entire thing from scratch in the desired order to insert one (or more) items into it. I suppose you could derive your own dict subclass that allowed insertions, but that would require updating the internals maintained by base class (like the hidden dict and doubly-linked list).
  • std''OrgnlDave
    std''OrgnlDave almost 7 years
    @onepiece there are many algorithms devoted specifically to keeping sorted lists (for searching), such as red-black trees, which are used to create dicts. They 'sort on insertion,' and their running time is usually around O(log n). There's also my favorite, the fiendishly clever pypi.python.org/pypi/sortedcontainers which takes advantage of TimPeters sorting algorithm over lists to accomplish this.
  • Boris Verkhovskiy
    Boris Verkhovskiy over 4 years
    Dicts are ordered on CPython 3.6 and all other Python implementations starting with Python 3.7.
  • johan
    johan over 2 years
    Why not store <key, value> in the doubly-linked list though? It seems the base dictionary is not necessary.
  • Tim Peters
    Tim Peters over 2 years
    @johan, the base dictionary is needed to support O(1) lookup time. Looking up a key in a bare doubly linked list would take time linear in the length of the list.
  • TheLogicGuy
    TheLogicGuy over 2 years
    @timpeters so asking for value at index x would be O(n) because it will traverse the double linked list until it finds the key?
  • piepi
    piepi over 2 years
    @TheLogicGuy No, get operation will happen on the base unordered dict taking O(1) time. The potential O(n) time is taken if you have to delete a node. That is where the second hidden dict comes in. You simply delete from the base dict. You find the node in the LL via the hidden dict. Delete that node from the LL. Delete that entry from the hidden dict. Done!
  • piepi
    piepi over 2 years
    @TimPeters Why not have the node in the base dict itself thus eliminating the hidden dict? If the delete operation is slow, what is the alternative? Most LRUs seem to use a hashmap and a doubly linked list.
  • Tim Peters
    Tim Peters over 2 years
    @piepi, don't mean to ignore you, but I have no more interest in this question, which is about collections.OrderedDict. Starting several releases ago, the built-in dict type is ordered now. It uses an entirely different implementation, with no "hidden dict".
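
The rebuild approach martineau describes in the comments (there is no way to insert into the middle, so the whole thing is rebuilt in the desired order) might look like the sketch below. `insert_sorted` is a hypothetical helper written for this example, not a standard library function, and it costs O(n log n) per call because it re-sorts every item.

```python
from collections import OrderedDict


def insert_sorted(od, key, value):
    """Return a new OrderedDict containing (key, value), with keys in sorted order."""
    items = dict(od)      # copy so the original mapping is left untouched
    items[key] = value
    return OrderedDict(sorted(items.items()))


od = OrderedDict([("a", 1), ("c", 3)])
result = insert_sorted(od, "b", 2)
print(list(result.items()))  # [('a', 1), ('b', 2), ('c', 3)]
```

For frequent insertions a structure designed to stay sorted, such as the sortedcontainers package mentioned in the comments, is likely a better fit than rebuilding each time.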