When to use a SortedList<TKey, TValue> over a SortedDictionary<TKey, TValue>?

Solution 1

I'm not sure how accurate the MSDN documentation is on SortedList and SortedDictionary. It seems to be saying both are implemented using a binary search tree. But if the SortedList uses a binary search tree, why would it be much slower on additions than SortedDictionary?

Anyway, here are some performance test results.

Each test operates on a SortedList / SortedDictionary containing 10,000 int32 keys. Each test is repeated 1,000 times (Release build, Start without Debugging).

The first group of tests adds keys in sequence from 0 to 9,999. The second group adds randomly shuffled keys between 0 and 9,999 (every number is added exactly once).
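For anyone who wants to reproduce the setup described above, here is a rough sketch of such a test. It is not the original test code: the names SortedCollectionBenchmark, Measure and ShuffledKeys are made up for illustration, the repetition count is reduced, and the collections are accessed through IDictionary<int, int>, which adds a small dispatch overhead of its own.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class SortedCollectionBenchmark
{
    // Adds every key to a fresh collection, then looks every key up again,
    // accumulating the time spent in each phase over all repetitions.
    public static (long addMs, long getMs) Measure(
        Func<IDictionary<int, int>> create, int[] keys, int repetitions)
    {
        var addTimer = new Stopwatch();
        var getTimer = new Stopwatch();
        long checksum = 0; // kept alive below so the lookups are not optimised away

        for (int r = 0; r < repetitions; r++)
        {
            IDictionary<int, int> map = create();

            addTimer.Start();
            foreach (int key in keys) map.Add(key, key);
            addTimer.Stop();

            getTimer.Start();
            foreach (int key in keys) checksum += map[key];
            getTimer.Stop();
        }

        GC.KeepAlive(checksum);
        return (addTimer.ElapsedMilliseconds, getTimer.ElapsedMilliseconds);
    }

    // Returns the keys 0..count-1 in random order (Fisher-Yates shuffle),
    // so every key appears exactly once.
    public static int[] ShuffledKeys(int count, int seed)
    {
        int[] keys = Enumerable.Range(0, count).ToArray();
        var random = new Random(seed);
        for (int i = keys.Length - 1; i > 0; i--)
        {
            int j = random.Next(i + 1);
            (keys[i], keys[j]) = (keys[j], keys[i]);
        }
        return keys;
    }

    static void Main()
    {
        int[] sorted = Enumerable.Range(0, 10_000).ToArray();
        int[] shuffled = ShuffledKeys(10_000, seed: 42);
        const int repetitions = 100; // assumption: fewer runs than the 1,000 used above

        Console.WriteLine("SortedDictionary sorted:   " + Measure(() => new SortedDictionary<int, int>(), sorted, repetitions));
        Console.WriteLine("SortedList sorted:         " + Measure(() => new SortedList<int, int>(), sorted, repetitions));
        Console.WriteLine("SortedDictionary unsorted: " + Measure(() => new SortedDictionary<int, int>(), shuffled, repetitions));
        Console.WriteLine("SortedList unsorted:       " + Measure(() => new SortedList<int, int>(), shuffled, repetitions));
    }
}
```

Run it as a Release build without a debugger attached, as in the original tests; absolute timings will differ by machine and runtime.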

***** Tests.PerformanceTests.SortedTest

SortedDictionary Add sorted: 4411 ms
SortedDictionary Get sorted: 2374 ms


SortedList Add sorted: 1422 ms
SortedList Get sorted: 1843 ms

***** Tests.PerformanceTests.UnsortedTest

SortedDictionary Add unsorted: 4640 ms
SortedDictionary Get unsorted: 2903 ms


SortedList Add unsorted: 36559 ms
SortedList Get unsorted: 2243 ms

As with any profiling, the important thing is the relative performance, not the actual numbers.

As you can see, on sorted data the SortedList is faster than the SortedDictionary. On unsorted data the SortedList is slightly quicker on retrieval, but about 8 times slower on adding.

If both are using binary trees internally, it is quite surprising that the Add operation on unsorted data is so much slower for SortedList. It is possible that SortedList is also adding items to a sorted linear data structure at the same time, which would slow it down.

However, you would then expect the memory usage of a SortedList to be at least equal to that of a SortedDictionary, which contradicts what the MSDN documentation says.

Solution 2

I don't know why MSDN says that SortedList<TKey, TValue> uses a binary tree for its implementation, because if you look at the code with a decompiler like Reflector you realize it's not true.

SortedList<TKey, TValue> is simply an array that grows over time.

Every time you insert an element, it first checks whether the array has enough capacity. If not, a bigger array is allocated and the old elements are copied into it (like List<T>).

After that, it searches for the position at which to insert the element, using a binary search (this is possible since the array is indexable and already sorted).

To keep the array sorted, it moves (or pushes) all the elements situated after the insertion position by one position (using Array.Copy()).

E.g.:

// we want to insert "3" 

2  
4  <= 3
5
8
9
.      
.      
.  

// we have to move some elements first

2
.  <= 3
4 
5  |
8  v
9
.
.

That explains why the performance of SortedList is so bad when you insert unsorted elements: it has to re-copy some elements on almost every insertion. The only case where no copying is needed is when the element is inserted at the end of the array.
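The insertion steps described above can be sketched as follows. This is an illustration only, not the decompiled framework code: SortedIntArray is a made-up class that stores bare ints and tolerates duplicates, whereas the real SortedList<TKey, TValue> keeps separate key and value arrays and rejects duplicate keys.

```csharp
using System;

// Hypothetical, simplified stand-in for the array-backed idea behind SortedList.
sealed class SortedIntArray
{
    private int[] _items = new int[4];

    public int Count { get; private set; }

    public void Add(int value)
    {
        // 1. Grow the backing array when it is full (old elements are copied over).
        if (Count == _items.Length)
            Array.Resize(ref _items, _items.Length * 2);

        // 2. Binary search for the insertion point. A negative result is the
        //    bitwise complement of the index of the first larger element.
        int index = Array.BinarySearch(_items, 0, Count, value);
        if (index < 0) index = ~index;

        // 3. Shift the tail one slot to the right, then store the new value.
        //    When index == Count (append at the end) nothing is moved.
        Array.Copy(_items, index, _items, index + 1, Count - index);
        _items[index] = value;
        Count++;
    }

    // Returns a copy of the stored values in ascending order.
    public int[] ToArray()
    {
        var result = new int[Count];
        Array.Copy(_items, result, Count);
        return result;
    }
}
```

Adding 2, 4, 5, 8, 9 and then 3 reproduces the diagram: the binary search finds slot 1, and the four larger elements are shifted before 3 is stored. The shift in step 3 is what makes inserting into the middle O(n).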

SortedDictionary<TKey, TValue> is different and uses a binary tree to insert and retrieve elements. It also has some cost on insert, because sometimes the tree needs to be re-balanced (but not on every insertion).

Performance is quite similar when searching for an element with SortedList or SortedDictionary, because both lookups are O(log n): a binary search over the array in one case, a walk down the tree in the other.


In my opinion, you should never use SortedList just to sort an array. Unless you have very few elements, it will always be faster to insert the values into a list (or array) and then call the Sort() method.
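A minimal sketch of that "fill first, sort once" approach, assuming plain int values; SortOnce and its methods are illustrative names, not framework APIs:

```csharp
using System;
using System.Collections.Generic;

static class SortOnce
{
    // Copies the values into a List<int> and sorts it a single time at the end.
    public static List<int> SortedCopy(IEnumerable<int> values)
    {
        var list = new List<int>(values); // no per-element shifting while filling
        list.Sort();                      // one O(n log n) sort
        return list;
    }

    // Once the list is sorted, membership tests can use a binary search.
    public static bool ContainsSorted(List<int> sorted, int value)
    {
        return sorted.BinarySearch(value) >= 0;
    }
}
```

This only fits when all values are known before the first lookup; if inserts and lookups are interleaved, the list would have to be re-sorted, and a sorted collection becomes the more natural choice.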

SortedList is mostly useful when you have a list of values that is already sorted (e.g. from a database), you want to keep it sorted, and you want to perform operations that take advantage of the ordering (e.g. the ContainsKey() method of SortedList performs a binary search instead of a linear search).

SortedDictionary offers the same advantages as SortedList but performs better if the values to insert are not already sorted.


EDIT : If you are using .NET Framework 4.5, an alternative to SortedDictionary<TKey, TValue> is SortedSet<T>. It works the same way as SortedDictionary, using a binary tree, but keys and values are the same here.
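A short sketch of SortedSet<T> in use, for the case where there is no separate value to store. SortedSetDemo and its methods are illustrative names; the SortedSet<T> members used (Add, Min, Max, GetViewBetween) are part of the framework class.

```csharp
using System;
using System.Collections.Generic;

static class SortedSetDemo
{
    // Builds a set from unsorted input; duplicates are silently ignored by Add.
    public static SortedSet<int> Build(IEnumerable<int> values)
    {
        var set = new SortedSet<int>();
        foreach (int value in values)
            set.Add(value);
        return set;
    }

    // Formats the elements within [low, high], in ascending order.
    public static string Range(SortedSet<int> set, int low, int high)
    {
        return string.Join(",", set.GetViewBetween(low, high));
    }
}
```

For example, building a set from 5, 2, 9, 2 leaves three elements, with Min equal to 2 and Max equal to 9, and enumerating the set always yields them in ascending order.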

Solution 3

Are they meant for two different purposes?

There is not much semantic difference between these two collection types in .NET. They both offer keyed lookup and keep the entries in sort order of the keys. In most cases you will be OK with either of them. Perhaps the only differentiator is the indexed retrieval that SortedList permits.

But performance?

However there is a performance difference which might be a stronger factor to choose between them. Here is a tabular view of their asymptotic complexity.

+------------------+---------+----------+--------+----------+----------+---------+
| Collection       | Indexed | Keyed    | Value  | Addition |  Removal | Memory  |
|                  | lookup  | lookup   | lookup |          |          |         |
+------------------+---------+----------+--------+----------+----------+---------+
| SortedList       | O(1)    | O(log n) | O(n)   | O(n)*    | O(n)     | Lesser  |
| SortedDictionary | n/a     | O(log n) | O(n)   | O(log n) | O(log n) | Greater |
+------------------+---------+----------+--------+----------+----------+---------+

* Insertion is O(1) for data that are already in sort order, so that each 
  element is added to the end of the list (assuming no resize is required).
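To make the "Indexed lookup" column concrete, here is a small sketch. IndexedLookupDemo and its method names are made up for illustration; Keys, Values and IndexOfKey are real SortedList<TKey, TValue> members, and ElementAt is the LINQ extension method, which has to walk the SortedDictionary from the start.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class IndexedLookupDemo
{
    public static SortedList<string, int> BuildList()
    {
        // Ordinal comparison keeps the sort order independent of the current culture.
        return new SortedList<string, int>(StringComparer.Ordinal)
        {
            ["pear"] = 3,
            ["apple"] = 1,
            ["fig"] = 2,
        };
    }

    // SortedList: position -> key is a direct array access.
    public static string KeyAt(SortedList<string, int> list, int index)
    {
        return list.Keys[index];
    }

    // SortedDictionary has no indexer by position; ElementAt enumerates
    // from the first entry, so this is O(n) rather than O(1).
    public static string KeyAt(SortedDictionary<string, int> dictionary, int index)
    {
        return dictionary.ElementAt(index).Key;
    }
}
```

With the three entries above, the keys sort as apple, fig, pear, so list.Keys[0] is "apple", list.Values[0] is 1, and list.IndexOfKey("fig") is 1.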

Summary

To roughly summarize, you want a SortedList<K, V> when:

  1. you require indexed look-up.
  2. it's desirable to have lesser memory overhead.
  3. your input data is already sorted (say you get it already ordered from db).

You would instead want to prefer a SortedDictionary<K, V> when:

  1. relative overall performance matters (with respect to scaling).
  2. your input data is unordered.

Writing code

Both SortedList<K, V> and SortedDictionary<K, V> implement IDictionary<K, V>, so in your code you can return IDictionary<K, V> from a method or declare a variable as IDictionary<K, V>. Basically, hide the implementation detail and code against the interface.

IDictionary<K, V> x = new SortedDictionary<K, V>(); //for eg. 

In future, it's easier to switch from one to the other in case you're not happy with the performance characteristics of one collection.
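One way this could look in practice is a small factory that picks the implementation in a single place. OrderedMapFactory and its inputIsPreSorted parameter are hypothetical names for this sketch; callers only ever see IDictionary<TKey, TValue>.

```csharp
using System;
using System.Collections.Generic;

static class OrderedMapFactory
{
    // Chooses the concrete collection in one place; callers depend only on the interface.
    public static IDictionary<TKey, TValue> Create<TKey, TValue>(bool inputIsPreSorted)
    {
        if (inputIsPreSorted)
            return new SortedList<TKey, TValue>();   // cheap appends, less memory

        return new SortedDictionary<TKey, TValue>(); // O(log n) inserts in any order
    }
}
```

Code that needs SortedList-only features such as Keys[i] would have to keep the concrete type, so this only works where keyed access and ordered enumeration are enough.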


For more info on the two collection types see the original question linked.

Solution 4

Visual representation of performance differences.

[image not preserved]

Solution 5

That's all there is to it. Retrieval of keys is comparable, but addition is much faster with Dictionaries.

I try to use SortedList as much as possible because it allows me to iterate over the keys and value collections. This is not possible with SortedDictionary as far as I know.

I'm not sure about this, but as far as I know Dictionaries store data in Tree structures, whereas Lists store data in linear arrays. That explains why insertion and removal are much faster with dictionaries, since less memory has to be shifted around. It also explains why you can iterate over SortedLists but not SortedDictionary.

Author by

Scott Dorman

Updated on June 27, 2020

Comments

  • Scott Dorman
    Scott Dorman about 4 years

    This may appear to be a duplicate of this question, which asks "What’s the difference between SortedList and SortedDictionary?" Unfortunately, the answers do nothing more than quote the MSDN documentation (which clearly states that there are performance and memory use differences between the two) but don't actually answer the question.

    In fact (and so this question doesn't get the same answers), according to MSDN:

    The SortedList<TKey, TValue> generic class is a binary search tree with O(log n) retrieval, where n is the number of elements in the dictionary. In this, it is similar to the SortedDictionary<TKey, TValue> generic class. The two classes have similar object models, and both have O(log n) retrieval. Where the two classes differ is in memory use and speed of insertion and removal:

    • SortedList<TKey, TValue> uses less memory than SortedDictionary<TKey, TValue>.

    • SortedDictionary<TKey, TValue> has faster insertion and removal operations for unsorted data, O(log n) as opposed to O(n) for SortedList<TKey, TValue>.

    • If the list is populated all at once from sorted data, SortedList<TKey, TValue> is faster than SortedDictionary<TKey, TValue>.

    So, clearly this would indicate that SortedList<TKey, TValue> is the better choice unless you need faster insert and remove operations for unsorted data.

    The question still remains, given the information above what are the practical (real-world, business case, etc.) reasons for using a SortedDictionary<TKey, TValue>? Based on the performance information, it would imply that there really is no need to have SortedDictionary<TKey, TValue> at all.

  • jerryjvl
    jerryjvl almost 15 years
    SortedDictionary has Keys and Values collections to iterate over. The only thing it lacks is indexed access to the elements of these two collections, which the SortedList does allow.
  • David Rutten
    David Rutten almost 15 years
    Sorry, yes. You can foreach them, but I almost never use foreach loops, which is why I mistakenly thought it wasn't possible at all.
  • Jørgen Fogh
    Jørgen Fogh almost 14 years
    Their complexity bounds would be consistent with an implementation of SortedList using an array. Then lookups would be performed using a binary search in O(log n). Insertions would be in O(n).
  • gatopeich
    gatopeich over 12 years
    I would add that SortedList is actually faster with smaller lists, even in "unsorted" scenario, the threshold appearing around ~700 elements in my own tests. Thus, a rule of thumb would be "use SortedList unless you need to store more than 1000 elements".
  • Jeppe Stig Nielsen
    Jeppe Stig Nielsen about 11 years
    The newest version of the SortedList<,> doc says: The SortedList<TKey, TValue> generic class is an array of key/value pairs – It also emphasizes that with SortedList<,> you can do things like string v = mySortedList.Values[3];, i.e. index by integer like an array.
  • Aidin
    Aidin over 10 years
    Well if you read any basic algorithms book you would realize that one of the ways of implementing a binary tree is using an array webdocs.cs.ualberta.ca/~holte/T26/tree-as-array.html
  • IDK
    IDK over 10 years
    I would guess What tigrou means is SortedList is an array implementation whereas SortedDictionary is a Linked implementation, which would explain what he sees in the reverse engineered code and what Ash sees in his test
  • AaronHS
    AaronHS over 10 years
    "I'm not sure about this, but as far as I know Dictionaries store data in Tree structures" this is incorrect. The standard dictionary class in .net uses an array.
  • Qwertie
    Qwertie over 8 years
    @gatopeich: are you talking about the speed of retrieval or of insertion? I'd expect the threshold to be more like 10 to 30 elements rather than 700 in the insertion scenario. In any case, adding (or removing) random items to SortedList gets extremely slow for large lists, so even if there's only a 1% chance of encountering a list of 10,000 elements, you should use SortedDictionary instead.
  • JSF
    JSF about 7 years
    How is this visual?
  • Markus
    Markus about 3 years
    I had to use my eyes to see it :)