Python equivalent of std::set and std::multimap

11,801

Solution 1

For the sorted dictionary, you can (ab)use the stable nature of python's timsort: basically, keep the items partially sorted, append items at the end when needed, switching a "dirty" flag, and sort the remaining before iterating. See this entry for details and implementation (A Martelli's answer): Key-ordered dict in Python

Solution 2

You should use sort(key=...).
The key function you use will be related to the cmp you are already using. The advantage is that the key function is called n times whereas the cmp is called nlog n times, and typically key does half the work that cmp does

If you can include your __cmp__() we can probably show you how to convert it to a key function

If you are doing lots of iterations between modifications, you should cache the value of the sorted items.

Solution 3

Python does not have built-in data-structures for this, though the bisect module provides functionality for keeping a sorted list with appropriately efficient algorithms.

If you have a list of sorted keys, you can couple it with a collections.defaultdict(list) to provide multimap-like functionality.

Share:
11,801
EMP
Author by

EMP

Updated on June 14, 2022

Comments

  • EMP
    EMP almost 2 years

    I'm porting a C++ program to Python. There are some places where it uses std::set to store objects that define their own comparison operators. Since the Python standard library has no equivalent of std::set (a sorted key-value mapping data structure) I tried using a normal dictionary and then sorting it when iterating, like this:

    def __iter__(self):
        items = self._data.items()
        items.sort()
        return iter(items)
    

    However, profiling has shown that all the calls from .sort() to __cmp__ are a serious bottleneck. I need a better data structure - essentially a sorted dictionary. Does anyone know of an existing implementation? Failing that, any recommendations on how I should implement this? Read performance is more important than write performance and time is more important than memory.

    Bonus points if it supports multiple values per key, like the C++ std::multimap.

    Note that the OrderedDict class doesn't fit my needs, because it returns items in the order of insertion, whereas I need them sorted using their __cmp__ methods.

  • EMP
    EMP over 14 years
    While not directly answering the question about data structures this has certainly helped to improve performance. +1