A faster replacement to the Dictionary<TKey, TValue>

38,319

Solution 1

Chances are you're seeing JIT compilation. On my box, I see:

00:00:00.0000360
00:00:00.0000060

when I run it twice in quick succession within the same process - and not in the debugger. (Make sure you're not running it in the debugger, or it's a pointless test.)

Now, measuring any time that tiny is generally a bad idea. You'd need to iterate millions of times to get a better idea of how long it's taking.

Do you have good reason to believe it's actually slowing down your code - or are you basing it all on your original timing?

I doubt that you'll find anything significantly faster than Dictionary<TKey, TValue> and I'd be very surprised to find that it's the bottleneck.

EDIT: I've just benchmarked adding a million elements to a Dictionary<TKey, TValue> where all the keys were existing objects (strings in an array), reusing the same value (as it's irrelevant) and specifying a capacity of a million on construction - and it took about 0.15s on my two-year-old laptop.

Is that really likely to be a bottleneck for you, given that you've already said you're using some "old slow libraries" elsewhere in your app? Bear in mind that the slower those other libraries are, the less impact an improved collection class will have. If the dictionary changes are only accounting for 1% of your overall application time, then even if we could provide an instantaneous dictionary, you'd only speed up your app by 1%.

As ever, get a profiler - it'll give you a much better idea of where your time is going.

Solution 2

I agree with Jon Skeet's supposition that this is most likely JIT compilation.

That being said, I wanted to add some other information here:

Most of the speed issues relating to using Dictionary<T,U> are not related to the implementation of Dictionary. Dictionary<T,U> is VERY fast, out of the box. It would be difficult to beat it.

Speed issues relating to Dictionary instances are almost always actually hash code implementation issues. If you're having speed issues when using Dictionary<MyCustomClass,MyValue>, revisit the GetHashCode() implementation you have defined on MyCustomClass. This is even more critical if you're using a custom struct as your key.

In order to get good performance out of Dictionary, GetHashCode() should be:

  1. Fast
  2. Able to provide hash codes that generate few conflicts. Unique instances should, when possible, generate unique hash values.

If you get that right, I think you'll be very happy with the default Dictionary implementation.

Solution 3

Don't forget, you're timing the Dictionary constructor in that code as well. I did a test, moving the call to the constructor out of the measurement, and looped 10 times. Here's my test code:

for (int i = 0; i < 10; i++)
{
    Dictionary<string, string> test = new Dictionary<string, string>();

    System.Diagnostics.Stopwatch watch = System.Diagnostics.Stopwatch.StartNew();

    test.Add("fieldName", "fieldValue");
    test.Add("Title", "fieldavlkajlkdjflkjalkjslkdjfiajwelkrjelrkjavoijl");

    Console.WriteLine(watch.Elapsed);
}

Console.ReadKey();

Below are the results:

00:00:00.0000607
00:00:00.0000025
00:00:00.0000015
00:00:00.0000015
00:00:00.0000016
00:00:00.0000017
00:00:00.0000016
00:00:00.0000016
00:00:00.0000016
00:00:00.0000015

I'm not sure how much faster you could get than that...

Update

Looks like this mirrors Jon Skeets results too...JIT.

Solution 4

If you really need better performance, you're going to have to give up something major - like generics, dynamic memory allocation, etc. All those features sacrifice some performance.

I would avoid using Contains if at all possible and look at TryGetValue etc.

Solution 5

USE INTS AS KEYS FOR MAXIMUM PERFORMANCE:

For anyone who came here from Google, if you want to squeeze every last bit of performance out of a Dictionary, then use Ints as keys. Here's a benchmark comparing Int vs String Keys: https://jacksondunstan.com/articles/2527

The author of the article even mentions that converting strings to ints is worthwhile if you have such a need.

Also, note that this same behavior occurs in some other languages like PHP. Php associative arrays -are in fact- dictionaries, and if you use Ints in ascending order in PHP7, they outperform string keys tremendously.

Share:
38,319

Related videos on Youtube

Alon Gubkin
Author by

Alon Gubkin

Updated on August 02, 2020

Comments

  • Alon Gubkin
    Alon Gubkin over 3 years

    I need a fast replacement for the System.Collections.Generic.Dictionary<TKey, TValue>. My application should be really fast. So, the replacement should support:

    • Generics
    • Add
    • Get
    • Contains

    ... and that's it. I don't need any support in LINQ or anything. And it should be fast.

    A simple code like:

    Stopwatch stopWatch = Stopwatch.StartNew();
    
    Dictionary<string, string> dictionary = new Dictionary<string, string>();
    dictionary.Add("fieldName", "fieldValue");
    dictionary.Add("Title", "fieldVaaaaaaaaaaaaaaaaalue");
    
    Console.WriteLine(stopWatch.Elapsed);
    

    ... prints 00:00:00.0001274, which is alot of time for me, because my application is doing many other things, some of them from old slow libraries that I must to use and are not dependent on me.

    Any ideas on how to implement a faster one?

    Thank you.

    • AnthonyWJones
      AnthonyWJones over 14 years
      How frequently will be creating such a dictionary? Is there a reason you included the construction of the dictionary in your timing?
    • Erik Funkenbusch
      Erik Funkenbusch over 14 years
      Did you measure the time in a release build, not run under the debugger?
    • Ed S.
      Ed S. over 14 years
      Define "fast". Have you actually profiled any real code or is this just some contrived example?
    • Oliver Friedrich
      Oliver Friedrich over 14 years
      if you want it really fast, do not use Strings as keys - it is the number one performance-killer.
    • Nick Vaccaro
      Nick Vaccaro over 14 years
      +1 to the killer of Grendel. If you can use an enum instead, then go for it.
    • Walter Verhoeven
      Walter Verhoeven almost 5 years
      Provide your own compare, var dictionary = new Dictionary<string, string>(StringComparer.Ordinal); this way you improve over the default taken by the dictionary and you avoid a double cast in the Dictionary's TKey convertion
  • Alon Gubkin
    Alon Gubkin over 14 years
    I'm basing it all on my orginal timing.
  • Reed Copsey
    Reed Copsey over 14 years
    Dictionary can perform very poorly with custom classes, or even more likely, custom structs, as the key if the hash code implementation is poor.
  • Saar
    Saar over 14 years
    @Jon: I am executing same application in Visual Studio with Ctrl + F5. The lowest value I could get is ~ 00:00:00.0001552. Looks very big compare to yours one. Can you please bit elaborate in detail how to test. Thanks in advance. and sorry to bother you.
  • R. Martinho Fernandes
    R. Martinho Fernandes over 14 years
    @Saar: perhaps his machine is faster, or he is running less stuff, or a kabazillion other possibilities. And the time to run 2 Adds will fluctuate a lot. You have to run a kabazillion Adds to get some stable measure.
  • Jon Skeet
    Jon Skeet over 14 years
    @Saar: Run the same code twice in a row... what machine have you got, and what version of .NET are you using?
  • Saar
    Saar over 14 years
    @Jon: I executed in row 5-6 times, by hitting Ctrl + F5. My machine is Quad Core 9550 vista .net framework 3.5
  • Jon Skeet
    Jon Skeet over 14 years
    @Saar: No, run it in the same process more than once, i.e. put it in a method and call it twice. Otherwise you'll be JITting it each time.
  • sweetfa
    sweetfa over 10 years
    If you cannot have unique hash code values then the performance of your Equals method in your key class is also important