Python equivalent of std::set and std::multimap
Solution 1
For the sorted dictionary, you can (ab)use the stable nature of python's timsort: basically, keep the items partially sorted, append items at the end when needed, switching a "dirty" flag, and sort the remaining before iterating. See this entry for details and implementation (A Martelli's answer): Key-ordered dict in Python
Solution 2
You should use sort(key=...)
.
The key function you use will be related to the cmp you are already using. The advantage is that the key function is called n times whereas the cmp is called nlog n times, and typically key does half the work that cmp does
If you can include your __cmp__()
we can probably show you how to convert it to a key function
If you are doing lots of iterations between modifications, you should cache the value of the sorted items.
Solution 3
Python does not have built-in data-structures for this, though the bisect
module provides functionality for keeping a sorted list with appropriately efficient algorithms.
If you have a list of sorted keys, you can couple it with a collections.defaultdict(list)
to provide multimap-like functionality.
EMP
Updated on June 14, 2022Comments
-
EMP almost 2 years
I'm porting a C++ program to Python. There are some places where it uses
std::set
to store objects that define their own comparison operators. Since the Python standard library has no equivalent ofstd::set
(a sorted key-value mapping data structure) I tried using a normal dictionary and then sorting it when iterating, like this:def __iter__(self): items = self._data.items() items.sort() return iter(items)
However, profiling has shown that all the calls from
.sort()
to__cmp__
are a serious bottleneck. I need a better data structure - essentially a sorted dictionary. Does anyone know of an existing implementation? Failing that, any recommendations on how I should implement this? Read performance is more important than write performance and time is more important than memory.Bonus points if it supports multiple values per key, like the C++
std::multimap
.Note that the
OrderedDict
class doesn't fit my needs, because it returns items in the order of insertion, whereas I need them sorted using their__cmp__
methods. -
EMP over 14 yearsWhile not directly answering the question about data structures this has certainly helped to improve performance. +1