.Net Data structures: ArrayList, List, HashTable, Dictionary, SortedList, SortedDictionary -- Speed, memory, and when to use each?

167,967

Solution 1

Off the top of my head:

  • Array* - represents an old-school memory array - kind of like a alias for a normal type[] array. Can enumerate. Can't grow automatically. I would assume very fast insert and retrival speed.

  • ArrayList - automatically growing array. Adds more overhead. Can enum., probably slower than a normal array but still pretty fast. These are used a lot in .NET

  • List - one of my favs - can be used with generics, so you can have a strongly typed array, e.g. List<string>. Other than that, acts very much like ArrayList

  • Hashtable - plain old hashtable. O(1) to O(n) worst case. Can enumerate the value and keys properties, and do key/val pairs

  • Dictionary - same as above only strongly typed via generics, such as Dictionary<string, string>

  • SortedList - a sorted generic list. Slowed on insertion since it has to figure out where to put things. Can enum., probably the same on retrieval since it doesn't have to resort, but deletion will be slower than a plain old list.

I tend to use List and Dictionary all the time - once you start using them strongly typed with generics, its really hard to go back to the standard non-generic ones.

There are lots of other data structures too - there's KeyValuePair which you can use to do some interesting things, there's a SortedDictionary which can be useful as well.

Solution 2

If at all possible, use generics. This includes:

  • List instead of ArrayList
  • Dictionary instead of HashTable

Solution 3

First, all collections in .NET implement IEnumerable.

Second, a lot of the collections are duplicates because generics were added in version 2.0 of the framework.

So, although the generic collections likely add features, for the most part:

  • List is a generic implementation of ArrayList.
  • Dictionary<T,K> is a generic implementation of Hashtable

Arrays are a fixed size collection that you can change the value stored at a given index.

SortedDictionary is an IDictionary<T,K> that is sorted based on the keys. SortedList is an IDictionary<T,K> that is sorted based on a required IComparer.

So, the IDictionary implementations (those supporting KeyValuePairs) are:

  • Hashtable
  • Dictionary<T,K>
  • SortedList<T,K>
  • SortedDictionary<T,K>

Another collection that was added in .NET 3.5 is the Hashset. It is a collection that supports set operations.

Also, the LinkedList is a standard linked-list implementation (the List is an array-list for faster retrieval).

Solution 4

Here are a few general tips for you:

  • You can use foreach on types that implement IEnumerable. IList is essentially an IEnumberable with Count and Item (accessing items using a zero-based index) properties. IDictionary on the other hand means you can access items by any-hashable index.

  • Array, ArrayList and List all implement IList. Dictionary, SortedDictionary, and Hashtable implement IDictionary.

  • If you are using .NET 2.0 or higher, it is recommended that you use generic counterparts of mentioned types.

  • For time and space complexity of various operations on these types, you should consult their documentation.

  • .NET data structures are in System.Collections namespace. There are type libraries such as PowerCollections which offer additional data structures.

  • To get a thorough understanding of data structures, consult resources such as CLRS.

Solution 5

.NET data structures:

More to conversation about why ArrayList and List are actually different

Arrays

As one user states, Arrays are the "old school" collection (yes, arrays are considered a collection though not part of System.Collections). But, what is "old school" about arrays in comparison to other collections, i.e the ones you have listed in your title (here, ArrayList and List(Of T))? Let's start with the basics by looking at Arrays.

To start, Arrays in Microsoft .NET are, "mechanisms that allow you to treat several [logically-related] items as a single collection," (see linked article). What does that mean? Arrays store individual members (elements) sequentially, one after the other in memory with a starting address. By using the array, we can easily access the sequentially stored elements beginning at that address.

Beyond that and contrary to programming 101 common conceptions, Arrays really can be quite complex:

Arrays can be single dimension, multidimensional, or jadded (jagged arrays are worth reading about). Arrays themselves are not dynamic: once initialized, an array of n size reserves enough space to hold n number of objects. The number of elements in the array cannot grow or shrink. Dim _array As Int32() = New Int32(100) reserves enough space on the memory block for the array to contain 100 Int32 primitive type objects (in this case, the array is initialized to contain 0s). The address of this block is returned to _array.

According to the article, Common Language Specification (CLS) requires that all arrays be zero-based. Arrays in .NET support non-zero-based arrays; however, this is less common. As a result of the "common-ness" of zero-based arrays, Microsoft has spent a lot of time optimizing their performance; therefore, single dimension, zero-based (SZs) arrays are "special" - and really the best implementation of an array (as opposed to multidimensional, etc.) - because SZs have specific intermediary language instructions for manipulating them.

Arrays are always passed by reference (as a memory address) - an important piece of the Array puzzle to know. While they do bounds checking (will throw an error), bounds checking can also be disabled on arrays.

Again, the biggest hindrance to arrays is that they are not re-sizable. They have a "fixed" capacity. Introducing ArrayList and List(Of T) to our history:

ArrayList - non-generic list

The ArrayList (along with List(Of T) - though there are some critical differences, here, explained later) - is perhaps best thought of as the next addition to collections (in the broad sense). ArrayList inherit from the IList (a descendant of 'ICollection') interface. ArrayLists, themselves, are bulkier - requiring more overhead - than Lists.

IList does enable the implementation to treat ArrayLists as fixed-sized lists (like Arrays); however, beyond the additional functionallity added by ArrayLists, there are no real advantages to using ArrayLists that are fixed size as ArrayLists (over Arrays) in this case are markedly slower.

From my reading, ArrayLists cannot be jagged: "Using multidimensional arrays as elements... is not supported". Again, another nail in the coffin of ArrayLists. ArrayLists are also not "typed" - meaning that, underneath everything, an ArrayList is simply a dynamic Array of Objects: Object[]. This requires a lot of boxing (implicit) and unboxing (explicit) when implementing ArrayLists, again adding to their overhead.

Unsubstantiated thought: I think I remember either reading or having heard from one of my professors that ArrayLists are sort of the bastard conceptual child of the attempt to move from Arrays to List-type Collections, i.e. while once having been a great improvement to Arrays, they are no longer the best option as further development has been done with respect to collections

List(Of T): What ArrayList became (and hoped to be)

The difference in memory usage is significant enough to where a List(Of Int32) consumed 56% less memory than an ArrayList containing the same primitive type (8 MB vs. 19 MB in the above gentleman's linked demonstration: again, linked here) - though this is a result compounded by the 64-bit machine. This difference really demonstrates two things: first (1), a boxed Int32-type "object" (ArrayList) is much bigger than a pure Int32 primitive type (List); second (2), the difference is exponential as a result of the inner-workings of a 64-bit machine.

So, what's the difference and what is a List(Of T)? MSDN defines a List(Of T) as, "... a strongly typed list of objects that can be accessed by index." The importance here is the "strongly typed" bit: a List(Of T) 'recognizes' types and stores the objects as their type. So, an Int32 is stored as an Int32 and not an Object type. This eliminates the issues caused by boxing and unboxing.

MSDN specifies this difference only comes into play when storing primitive types and not reference types. Too, the difference really occurs on a large scale: over 500 elements. What's more interesting is that the MSDN documentation reads, "It is to your advantage to use the type-specific implementation of the List(Of T) class instead of using the ArrayList class...."

Essentially, List(Of T) is ArrayList, but better. It is the "generic equivalent" of ArrayList. Like ArrayList, it is not guaranteed to be sorted until sorted (go figure). List(Of T) also has some added functionality.

Share:
167,967

Related videos on Youtube

jlemley
Author by

jlemley

Pretzel is a bit Twisted and sometimes a little Salty; He always goes great with beer! Quote of the Moment: Physicists stand on each others’ shoulders while computer scientists stand on each others' toes. (Sad, but often true.)

Updated on January 07, 2022

Comments

  • jlemley
    jlemley over 2 years

    .NET has a lot of complex data structures. Unfortunately, some of them are quite similar and I'm not always sure when to use one and when to use another. Most of my C# and VB books talk about them to a certain extent, but they never really go into any real detail.

    What's the difference between Array, ArrayList, List, Hashtable, Dictionary, SortedList, and SortedDictionary?

    Which ones are enumerable (IList -- can do 'foreach' loops)? Which ones use key/value pairs (IDict)?

    What about memory footprint? Insertion speed? Retrieval speed?

    Are there any other data structures worth mentioning?

    I'm still searching for more details on memory usage and speed (Big-O notation)

    • Admin
      Admin almost 16 years
      You should break this question apart. You're asking twenty different things, half of which a simple google search can answer. Please be more specific; its hard to help when your question is so scattered.
    • jlemley
      jlemley over 15 years
      I thought about breaking it up, but realized that someone would likely be able to consolidate all these answers into one place. In fact, if someone can come up with a table profiling everything, it might become a wonderful resource on this site.
    • BozoJoe
      BozoJoe over 13 years
      Can this question be turned into a wiki?
    • htm11h
      htm11h about 7 years
      Ryan, the articles at that link are 14 years old, (12 at the time of post). Side note I have been reading them for the last week myself. but they also do not include newer technology and desperately need updating. And more performance metrics and examples.
    • Kurkula
      Kurkula almost 7 years
      Any place for LinkedList in your question? Just asking.
    • Hogan
      Hogan over 3 years
      @Dracolyte I believe this question is what you wanted to ask and it was deleted.
  • Ilya Ryzhenkov
    Ilya Ryzhenkov almost 16 years
    There is no such thing as "performance". The complexity depends on operation. For example, if you insert n elements into Dictionary<>, it will not be O(1) due to rehashing.
  • Justin Bozonier
    Justin Bozonier over 15 years
    Hash Table is O(1), worst case (with collisions) can be O(n)
  • supercat
    supercat over 13 years
    FYI, even with rehashing, Dictionary is still O(1). Consider the scenario just before the Dictionary expands. Half the elements--those that were added since the last expansion--will have been hashed once. Half of the remainder will have been hashed twice. Half of the remainder from that, three times, etc. The average number of hashing operations performed on each element will be 1+1/2+1/4+1/8...=2. The situation immediately after expansion is essentially the same, but with every element having been hashed one extra time (so average hash count is three). All other scenarios are between those.
  • DarthVader
    DarthVader almost 13 years
    There are many other data structures you need to add here. like LinkedList, Skip List, Stack, Queue, Heap, Trees, Graphs. These are very important data structures as well.
  • Bryan Menard
    Bryan Menard almost 13 years
    This is partly true. The Hashtable is safe to use with only one writer and multiple readers concurrently. On the other hand, it is safe to use the Dictionary with multiple readers as long as it is not modified concurrently.
  • Rob
    Rob almost 13 years
    Definitely. In the trading space however, we're concurrently reading from live market data and running analytics that include the appended entries. It also depends on how many traders are utilizing the system - if it's just you, it obviously doesn't matter.
  • Rob
    Rob almost 13 years
    .NET 4.0 provides a ConcurrentDictionary<TKey, TValue>
  • Harindaka
    Harindaka over 11 years
    ConcurrentDictionary added in .Net 4.0 provides a generic dictionary with Thread Safety
  • Harindaka
    Harindaka over 11 years
    Also BlockingCollection<T> provides a thread safe producer/consumer implementation
  • Sam Harwell
    Sam Harwell about 11 years
    ArrayList uses virtual methods, but List<T> does not. ArrayList has been largely replaced with List<T> for standard collections and Collection<T> as a base class for custom collections. Hashtable has been largely replaced by Dictionary<TKey, TValue>. I would recommend avoiding ArrayList and Hashtable for new code.
  • Hogan
    Hogan about 11 years
    @280Z28 - You really should write a new answer -- Sam's was great but it is 5 years old at this point.
  • Haim Bendanan
    Haim Bendanan over 7 years
    from msdn, it seems like sortedList implement IDictionnary - not IList
  • vorillaz
    vorillaz over 7 years
    Fixed. thanks for the comment. Seems like SortedList keeps a list of key/values, so it basically represents a dictionary's data. Don't remember how this class worked when I first wrote the answer ...
  • RBT
    RBT over 7 years
    Efficiency of hash tables is amortized O(1), not O(1). More details here