Are there any reasons not to use an OrderedDict?
Solution 1
OrderedDict
is a subclass of dict
, and needs more memory to keep track of the order in which keys are added. This isn't trivial. The implementation adds a second dict
under the covers, and a doubly-linked list of all the keys (that's the part that remembers the order), and a bunch of weakref proxies. It's not a lot slower, but at least doubles the memory over using a plain dict
.
But if it's appropriate, use it! That's why it's there :-)
How it works
The base dict is just an ordinary dict mapping keys to values - it's not "ordered" at all. When a <key, value>
pair is added, the key
is appended to a list. The list is the part that remembers the order.
But if this were a Python list, deleting a key would take O(n)
time twice over: O(n)
time to find the key in the list, and O(n)
time to remove the key from the list.
So it's a doubly-linked list instead. That makes deleting a key constant (O(1)
) time. But we still need to find the doubly-linked list node belonging to the key. To make that operation O(1)
time too, a second - hidden - dict maps keys to nodes in the doubly-linked list.
So adding a new <key, value>
pair requires adding the pair to the base dict, creating a new doubly-linked list node to hold the key, appending that new node to the doubly-linked list, and mapping the key to that new node in the hidden dict. A bit over twice as much work, but still O(1)
(expected case) time overall.
Similarly, deleting a key that's present is also a bit over twice as much work but O(1)
expected time overall: use the hidden dict to find the key's doubly-linked list node, delete that node from the list, and remove the key from both dicts.
Etc. It's quite efficient.
Solution 2
Since Python 3.7, all dictionaries are guaranteed to be ordered. The Python contributors determined that switching to making dict
ordered would not have a negative performance impact. I don't know how the performance of OrderedDict
compares to dict
in Python >= 3.7, but I imagine they would be comparable since they are both ordered.
Note that there are still differences between the behaviour of OrderedDict
and dict
. See also: Will OrderedDict become redundant in Python 3.7?
Solution 3
multithreading
if your dictionary is accessed from multiple threads without a lock, especially as a synchronisation point.
vanilla dict operations are atomic, and any type extended in Python is not.
In fact, I'm not even certain OrderedDict is thread-safe (without a lock), although I cannot discount the possibility that it was very carefully coded and satisfies definition of reentrancy.
lesser devils
memory usage if you create tons of these dictionaries
cpu usage if all your code does is munge these dictionaries
Solution 4
why shouldn't I always use this instead of a normal dictionary
In Python 2.7, normal OrderedDict
usage will create reference cycles. So any use of OrderedDict
requires the garbage collector to be enabled in order to free the memory. Yes, the garbage collector is on by default in cPython, but disabling it has its uses.
e.g. With cPython 2.7.14
from __future__ import print_function
import collections
import gc
if __name__ == '__main__':
d = collections.OrderedDict([('key', 'val')])
gc.collect()
del d
gc.set_debug(gc.DEBUG_LEAK)
gc.collect()
for i, obj in enumerate(gc.garbage):
print(i, obj)
outputs
gc: collectable <list 00000000033E7908>
gc: collectable <list 000000000331EC88>
0 [[[...], [...], 'key'], [[...], [...], 'key'], None]
1 [[[...], [...], None], [[...], [...], None], 'key']
Even if you just create an empty OrderedDict
(d = collections.OrderedDict()
) and don't add anything to it, or you explicitly try to clean it up by calling the clear
method (d.clear()
before del d
), you will still get one self-referencing list:
gc: collectable <list 0000000003ABBA08>
0 [[...], [...], None]
This seems to have been the case since this commit removed the __del__
method in order to prevent the potential for OrderedDict
to cause uncollectable cycles, which are arguably worse. As noted in the changelog for that commit:
Issue #9825: removed __del__ from the definition of collections.OrderedDict. This prevents user-created self-referencing ordered dictionaries from becoming permanently uncollectable GC garbage. The downside is that removing __del__ means that the internal doubly-linked list has to wait for GC collection rather than freeing memory immediately when the refcnt drops to zero.
Note that in Python 3, the fix for the same issue was made differently and uses weakref proxies to avoid cycles:
Issue #9825: Using __del__ in the definition of collections.OrderedDict made it possible for the user to create self-referencing ordered dictionaries which become permanently uncollectable GC garbage. Reinstated the Py3.1 approach of using weakref proxies so that reference cycles never get created in the first place.
Comments
-
temporary_user_name almost 2 years
I'm referring to the OrderedDict from the
collections
module, which is an ordered dictionary.If it has the added functionality of being orderable, which I realize may often not be necessary but even so, are there any downsides? Is it slower? Is it missing any functionality? I didn't see any missing methods.
In short, why shouldn't I always use this instead of a normal dictionary?
-
Grijesh Chauhan over 10 yearsNice answer, but we need a link to read further, where from you retried this information.
-
Tim Peters over 10 years@GrijeshChauhan, I read the source code - I'm a core Python developer, so that's how I answer most questions I have - LOL ;-) You can find the code in
Lib/collections/__init__.py
in your Python source tree. -
temporary_user_name over 10 yearsWait...YOU'RE THE GUY WHO WROTE TIMSORT!!! Unexpected descent from python heaven to answer my lowly question. THANKS!
-
Tim Peters over 10 yearsLOL! You're very welcome, @Aerovistae - it was a worthy question ;-)
-
abarnert over 10 yearsI find when I tell people "you can find the code in your Python source tree" they never look, but when I link to the hg repo they sometimes do. (Usually only when reading the source leads them to a question that's over my head.)
-
temporary_user_name over 10 years@TimPeters you should take a look here stackoverflow.com/questions/19351065/… :)
-
Games Brainiac about 10 years@GrijeshChauhan Go to your python interpreter, type
import this
then press enter, the guy who wrote it, is the guy who answered this question. -
Anshul Goyal almost 10 yearsHi Tim, can you explain further upon
It's not a lot slower, but at least doubles the memory over using a plain dict
? I have asked a question on the same here -
onepiece almost 9 years@TimPeters What if I give the OrderedDict an ordering scheme other than the default (time of of insertion)? For example, sorted alphanumerically by key. Then I believe adding will take
O(n)
at least, because the key must now be added to the sorted double-linked list, which I think is done by traversing and comparing to each node's key. -
martineau about 8 years@onepiece: OrderedDicts are ordered by insertion order so there's no way to directly insert something in the middle. To achieve this, you'll need to rebuild the entire thing from scratch in the desired order to insert one (or more) items into it. I suppose you could derive your own dict subclass that allowed insertions, but that would require updating the internals maintained by base class (like the hidden dict and doubly-linked list).
-
std''OrgnlDave almost 7 years@onepiece there are many algorithms devoted specifically to keeping sorted lists (for searching), such as red-black trees, which are used to create dicts. They 'sort on insertion,' and their running time is usually around O(log n). There's also my favorite, the fiendishly clever pypi.python.org/pypi/sortedcontainers which takes advantage of TimPeters sorting algorithm over lists to accomplish this.
-
Boris Verkhovskiy over 4 yearsDicts are ordered on CPython 3.6 and all other Python implementations starting with Python 3.7.
-
johan over 2 yearsWhy not storing <key, value> in the doubly linked list though? It seems the base dictionary is not necessary.
-
Tim Peters over 2 years@johan, the base dictionary is needed to support
O(1)
lookup time. Looking up a key in a bare doubly linked list would take time linear in the length of the list. -
TheLogicGuy over 2 years@timpeters so asking for value at index x would be O(n) because it will traverse the double linked list until it finds the key?
-
piepi over 2 years@TheLogicGuy No,
get
operation will happen on the baseunordered
dict taking O(1) time. The potentialO(n)
time is taken if you have todelete
a node. That is where the second hidden dict comes in. You simply delete from the base dict. You find the node in the LL via the hidden dict. Delete that node from the LL. Delete that entry from the hidden dict. Done! -
piepi over 2 years@TimPeters Why not have the node in the base dict itself thus eliminating the hidden dict? If the delete operation is slow, what is the alternative? Most LRUs seem to use a hashmap and a doubly linked list.
-
Tim Peters over 2 years@piepi, don't mean to ignore you, but I have no more interest in this question, which is about
collections.OrderedDict
. Starting several releases ago, the built-in dict type is ordered now, It uses an entirely different implementation, with no "hidden dict".