What is more efficient: Dictionary TryGetValue or ContainsKey+Item?

229,921

Solution 1

TryGetValue will be faster.

ContainsKey uses the same check as TryGetValue, which internally refers to the actual entry location. The Item property actually has nearly identical code functionality as TryGetValue, except that it will throw an exception instead of returning false.

Using ContainsKey followed by the Item basically duplicates the lookup functionality, which is the bulk of the computation in this case.

Solution 2

A quick benchmark shows that TryGetValue has a slight edge:

    static void Main() {
        var d = new Dictionary<string, string> {{"a", "b"}};
        var start = DateTime.Now;
        for (int i = 0; i != 10000000; i++) {
            string x;
            if (!d.TryGetValue("a", out x)) throw new ApplicationException("Oops");
            if (d.TryGetValue("b", out x)) throw new ApplicationException("Oops");
        }
        Console.WriteLine(DateTime.Now-start);
        start = DateTime.Now;
        for (int i = 0; i != 10000000; i++) {
            string x;
            if (d.ContainsKey("a")) {
                x = d["a"];
            } else {
                x = default(string);
            }
            if (d.ContainsKey("b")) {
                x = d["b"];
            } else {
                x = default(string);
            }
        }
   }

This produces

00:00:00.7600000
00:00:01.0610000

making the ContainsKey + Item access about 40% slower assuming an even blend of hits and misses.

Moreover, when I change the program to always miss (i.e. always looking up "b") the two versions become equally fast:

00:00:00.2850000
00:00:00.2720000

When I make it "all hits", however, the TryGetValue remains a clear winner:

00:00:00.4930000
00:00:00.8110000

Solution 3

Since none of the answers thus far actually answer the question, here is an acceptable answer I found after some research:

If you decompile TryGetValue you see that it’s doing this:

public bool TryGetValue(TKey key, out TValue value)
{
  int index = this.FindEntry(key);
  if (index >= 0)
  {
    value = this.entries[index].value;
    return true;
  }
  value = default(TValue);
  return false;
}

whereas the ContainsKey method is:

public bool ContainsKey(TKey key)
{
  return (this.FindEntry(key) >= 0);
}

so TryGetValue is just ContainsKey plus an array lookup if the item is present.

Source

It appears that TryGetValue will be almost twice as fast as ContainsKey+Item combination.

Solution 4

Who cares :-)

You're probably asking because TryGetValue is a pain to use - so encapsulate it like this with an extension method.

public static class CollectionUtils
{
    // my original method
    // public static V GetValueOrDefault<K, V>(this Dictionary<K, V> dic, K key)
    // {
    //    V ret;
    //    bool found = dic.TryGetValue(key, out ret);
    //    if (found)
    //    {
    //        return ret;
    //    }
    //    return default(V);
    // }


    // EDIT: one of many possible improved versions
    public static TValue GetValueOrDefault<K, V>(this IDictionary<K, V> dictionary, K key)
    {
        // initialized to default value (such as 0 or null depending upon type of TValue)
        TValue value;  

        // attempt to get the value of the key from the dictionary
        dictionary.TryGetValue(key, out value);
        return value;
    }

Then just call :

dict.GetValueOrDefault("keyname")

or

(dict.GetValueOrDefault("keyname") ?? fallbackValue) 

Solution 5

Why don't you test it?

But I'm pretty sure that TryGetValue is faster, because it only does one lookup. Of course this isn't guaranteed, i.e. different implementations might have different performance characteristics.

The way I'd implement a dictionary is by creating an internal Find function that finds the slot for an item, and then build the rest on top of that.

Share:
229,921

Related videos on Youtube

Peter Hlavatík
Author by

Peter Hlavatík

B.Sc. (Computer Science) M.Math (Artificial Intelligence) Professional working knowledge of C++, C#, Ruby, Perl, .NET, Oracle, SQL Server, MySQL, PHP, JavaScript, CSS Academic knowledge of Java, C, Assembler, Scala, F#

Updated on October 12, 2021

Comments

  • Peter Hlavatík
    Peter Hlavatík over 2 years

    From MSDN's entry on Dictionary.TryGetValue Method:

    This method combines the functionality of the ContainsKey method and the Item property.

    If the key is not found, then the value parameter gets the appropriate default value for the value type TValue; for example, 0 (zero) for integer types, false for Boolean types, and null for reference types.

    Use the TryGetValue method if your code frequently attempts to access keys that are not in the dictionary. Using this method is more efficient than catching the KeyNotFoundException thrown by the Item property.

    This method approaches an O(1) operation.

    From the description, it's not clear if it is more efficient or just more convenient than calling ContainsKey and then doing the lookup. Does the implementation of TryGetValue just call ContainsKey and then Item or is actually more efficient than that by doing a single lookup?

    In other words, what is more efficient (i.e. which one performs less lookups):

    Dictionary<int,int> dict;
    //...//
    int ival;
    if(dict.ContainsKey(ikey))
    {
      ival = dict[ikey];
    }
    else
    {
      ival = default(int);
    }
    

    or

    Dictionary<int,int> dict;
    //...//
    int ival;
    dict.TryGetValue(ikey, out ival);
    

    Note: I am not looking for a benchmark!

  • Luciano
    Luciano over 11 years
    I added a test case using Linq (Any()) instead of ContainsKey() and it ran about 3 times slower. So, between TryGetValue, ConstainsKey, and Linq Any(), stay away from "Linq Any()".
  • weston
    weston over 11 years
    @Luciano explain how you used Any - Like this: Any(i=>i.Key==key). In which case, yes, that's a bad linear search of the dictionary.
  • Luciano
    Luciano over 11 years
    @weston: Yep ! Any(i => i.Key == key)
  • Alastair Maw
    Alastair Maw over 11 years
    DateTime.Now will only be accurate to a few ms. Use the Stopwatch class in System.Diagnostics instead (which uses QueryPerformanceCounter under the covers to provide much higher accuracy). It's easier to use, too.
  • antiduh
    antiduh over 10 years
    In addition to Alastair and Ed's comments - DateTime.Now can go backwards, if you get a time update, such as that which occurs when the user updates their computer time, a time zone is crossed, or the time zone changes (DST, for instance). Try working on a system that has the system clock synced to time provided by some radio service like GPS or mobile phone networks. DateTime.Now will go all over the place, and DateTime.UtcNow only fixes one of those causes. Just use StopWatch.
  • Tim Schmelter
    Tim Schmelter over 8 years
    This is more subtle: if(dict.ContainsKey(ikey)) dict[ikey]++; else dict.Add(ikey, 0);. But i think that TryGetValue is still more efficient since the get and set of the indexer property is used, isn't it?
  • Reed Copsey
    Reed Copsey over 8 years
    @TimSchmelter Yes - The ContainsKey and the accessor both do the checks - if you used TryGetValue, you'd only be doing the get work once, effectively, instead of twice.
  • John Gardner
    John Gardner almost 8 years
    you can actually look at the .net source for it now too: referencesource.microsoft.com/#mscorlib/system/collections/… you can see that all 3 of TryGetValue, ContainsKey, and this[] call the same FindEntry method and do the same amount of work, only differing in how they answer the question: trygetvalue returns bool and the value, contains key only returns true/false, and this[] returns the value or throws an exception.
  • Reed Copsey
    Reed Copsey almost 8 years
    @JohnGardner Yes, which is what I said - but if you do ContainsKey then get Item, you're doing that work 2x instead of 1x.
  • John Gardner
    John Gardner almost 8 years
    i agree completely :) i was just pointing out that the actual source is available now. none of the other answers/etc had a link to the actual source :D
  • aruno
    aruno almost 8 years
    @Hüseyin I got very confused how I was stupid enough to post this without this but it turns out I have my method duplicated twice in my code base - once with and one without the this so that's why I never caught it! thanks for fixing!
  • Raphael Smit
    Raphael Smit over 7 years
    TryGetValue assigns a default value to the out value parameter if the key doesnt exist, so this could be simplified.
  • damian
    damian over 7 years
    I would not expect a test on a dictionary with only 2 entries to match real-world conditions :/
  • Joshua
    Joshua over 7 years
    Simplified version: public static TValue GetValueOrDefault<TKey, TValue>(this Dictionary<TKey, TValue> dict, TKey key) { TValue ret; dict.TryGetValue(key, out ret); return ret; }
  • Chris Berry
    Chris Berry about 7 years
    Slightly off topic, if you're accessing via an IDictionary in a multithreaded environment I would always use TryGetValue as the state may change from the time you call ContainsKey (there's no guarantee that TryGetValue will internally lock correctly either, but it's probably safer)
  • Shimmy Weitzhandler
    Shimmy Weitzhandler almost 7 years
    In C#7 this is really fun: if(!dic.TryGetValue(key, out value item)) item = dic[key] = new Item();
  • Dan Bechard
    Dan Bechard almost 7 years
    This is not a useful test. You need to add more than a single key/value pair to the dictionary before running your benchmark (otherwise O(1) and O(n) are equal and your results are off by a factor of n). Try generating a few hundred or few thousand pairs and running this again.
  • Sergey Kalinichenko
    Sergey Kalinichenko almost 7 years
    @Dan Both operations I am comparing are required to be O(1), this is not the point of the benchmark.
  • Dan Bechard
    Dan Bechard almost 7 years
    I don't think the implementation details can possibly change the guarantee that doing action X once is faster than or equal to doing action X twice. Best case they're identical, worse case the 2X version takes twice as long.
  • Dan Bechard
    Dan Bechard almost 7 years
    While I 100% agree with your argument about semantics, it is still worth doing the performance test. You never know when the API you're using has a suboptimal implementation such that the semantically correct thing happens to be slower, unless you do the test.
  • Dan Bechard
    Dan Bechard almost 7 years
    I feel like something fishy is going on here. I wonder if the optimizer might be removing or simplifying your ContainsKey() checks due to the fact that you never use the retrieved value.
  • Dan Bechard
    Dan Bechard almost 7 years
    @dasblinkenlight You can't test a single iteration of something timed in milliseconds and expect to get useful results. There's a reason every other benchmark answer on this page iterates over the operation a bunch of times to get realistic averages. Also, as Ed S. pointed out, DateTime is not a good sub-second timer for performance testing. Use the built-in performance timer: System.Diagnostics.StopWatch msdn.microsoft.com/en-us/library/…
  • Sergey Kalinichenko
    Sergey Kalinichenko almost 7 years
    @Dan My benchmark also iterates over the operation ten million times to get realistic results. Moreover, my results are very much in line with what everyone else is getting: for example, 45/26 ratio of davisoa is within 5% of my 0.811/0.493 ratio.
  • Dan Bechard
    Dan Bechard almost 7 years
    @dasblinkenlight Ah, I was admittedly slightly confused by the "Moreover, when I change the program..." part that isn't shown in the code. It helps if your code snippet matches your sample output (I'm assuming you just commented "a" and "b" parts for the other tests?). Nevertheless, a million iterations over a Dictionary with one element still seems kind of silly, but if you're sure that's what you were testing; so be it!
  • AxD
    AxD almost 7 years
    It just can't. ContainsKey() is in a compiled DLL. The optimizer doesn't know anything about what ContainsKey() actually does. It might cause side effects, so it has to be called and cannot be abridged.
  • Jaider
    Jaider over 6 years
    This could confuse even more people at your company, now they will have 3 options TryGetValue, ContainsKey, or GetValueOrDefault
  • aruno
    aruno over 6 years
    @Jaider it's just me :) plus i just cannot stand TryGetValue. Its clumsiness is just obtuse - admittedly made marginally better by the new ability to define an out parameter in place.
  • Jim Balter
    Jim Balter over 6 years
    Something is bogus here. The fact is that examining the .NET code shows that ContainsKey, TryGetValue, and this[] all call the same internal code, so TryGetValue is faster than ContainsKey + this[] when the entry exists.
  • Jim Balter
    Jim Balter over 6 years
    "a million iterations over a Dictionary with one element still seems kind of silly" -- only to the completely clueless. This is not a benchmark of the hashtable implementation. If that implementation happens to be particularly bad, then adding a bunch of entries to the table may unduly disadvantage ContainsKey+d[]. By using the simplest case, the benchmark gets at what it wants to measure and not what it doesn't.
  • Lucas
    Lucas about 5 years
    Note that users of the extension method can't tell the difference between a non-existent key and a key that exists but its value is default(T).
  • Lucas
    Lucas about 5 years
    Note that users of this extension method can't tell the difference between a non-existent key and a key that exists but its value is the default value.
  • Deven T. Corzine
    Deven T. Corzine over 4 years
    Ironically, the real source code already HAS a GetValueOrDefault() routine, but it's hidden... referencesource.microsoft.com/#mscorlib/system/collections/…
  • ToolmakerSteve
    ToolmakerSteve about 4 years
    UPDATE for C# 7: TryGetValue is now less burdensome, because the out param can be inlined: if (dict.TryGetValue(key, out TValue value) ... else .... For situations where you want special code on either of the if branches, this may be (slightly) more concise than using this answer's GetValueOrDefault.
  • aruno
    aruno about 4 years
    @ToolmakerSteve yes definitely when you have an of statement that is better. About 9 times out of 10 that I need to get a value I want to use it immediately.
  • debater
    debater about 4 years
    On a modern computer, if you call a subroutine twice in quick succession, it is unlikely to take twice as long as calling it once. This is because the CPU and caching architecture is very likely to cache a lot of the instructions and data associated with the first call, so the second call will be executed faster. On the other hand, calling twice is almost certain to take a bit longer than calling once, so there is still an advantage to eliminating the second call if possible.