Efficiency of using IEqualityComparer in Dictionary vs HashCode and Equals()

18,200

Solution 1

Is it Faster?

Coming from a gamedev perspective, if your key is a value type (struct, primitive, enum, etc.) providing your own EqualityComparer<T> is significantly faster - due to the fact the EqualityComparer<T>.Default boxes the value.

As a real-world example, the Managed DirectX billboard sample used to run at ~30% of the speed of the C++ version; where all the other samples were running at ~90%. The reason for this was that the billboards were being sorted using the default comparer (and thus being boxed), as it turns out 4MB of data was being copied around every frame thanks to this.

How does it work?

Dictionary<K,V> will provide EqualityComparer<T>.Default to itself via the default constructor. What the default equality comparer does is (basically, notice how much boxing occurs):

public void GetHashCode(T value)
{
   return ((object)value).GetHashCode();
}

public void Equals(T first, T second)
{
   return ((object)first).Equals((object)second);
}

Why would I ever use it?

It's quite common to see this kind of code (when trying to have case-insensitive keys):

var dict = new Dictionary<string, int>();
dict.Add(myParam.ToUpperInvariant(), fooParam);
// ...
var val = dict[myParam.ToUpperInvariant()];

This is really wasteful, it is better to just use a StringComparer on the constructor:

var dict = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

Is it faster (redux)?

In this specific scenario it is a lot faster, because ordinal string comparisons are the fastest type of string comparison you can do. A quick benchmark:

static void Main(string[] args)
{
    var d1 = new Dictionary<string, int>();
    var d2 = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    d1.Add("FOO", 1);
    d2.Add("FOO", 1);

    Stopwatch s = new Stopwatch();
    s.Start();
    RunTest1(d1, "foo");
    s.Stop();
    Console.WriteLine("ToUpperInvariant: {0}", s.Elapsed);

    s.Reset();
    s.Start();
    RunTest2(d2, "foo");
    s.Stop();
    Console.WriteLine("OrdinalIgnoreCase: {0}", s.Elapsed);

    Console.ReadLine();
}

static void RunTest1(Dictionary<string, int> values, string val)
{
    for (var i = 0; i < 10000000; i++)
    {
        values[val.ToUpperInvariant()] = values[val.ToUpperInvariant()];
    }
}

static void RunTest2(Dictionary<string, int> values, string val)
{
    for (var i = 0; i < 10000000; i++)
    {
        values[val] = values[val];
    }
}

// ToUpperInvariant: 00:00:04.5084119
// OrdinalIgnoreCase: 00:00:02.1211549
// 2x faster.

Reservations

It is possible to eliminate the boxing overhead by implementing an interface on a struct (such as IEquatable<T>). However, there are many surprising rules for when boxing occurs under these circumstances so I would recommend using the paired interface (e.g. IEqualityComparer<T> in this case) if at all possible.

Solution 2

Jonathan has a great answer that points out how, using the right equality comparer improves the performance and Jon clarifies in his great answer that Dictionary<K, V> always uses an IEqualityComparer<T> which is EqualityComparer<T>.Default unless you specify another.

The thing I'd like to touch upon is the role of IEquatable<T> interface when you use the default equality comparer.

When you call the EqualityComparer<T>.Default, it uses a cached comparer if there is one. If it's the first time you're using the default equality comparer for that type, it calls a method called CreateComparer and caches the result for later use. Here is the trimmed and simplified implementation of CreateComparer in .NET 4.5:

var t = (RuntimeType)typeof(T);

// If T is byte,
// return a ByteEqualityComparer.

// If T implements IEquatable<T>,
if (typeof(IEquatable<T>).IsAssignableFrom(t))
    return (EqualityComparer<T>)
           RuntimeTypeHandle.CreateInstanceForAnotherGenericParameter(
               (RuntimeType)typeof(GenericEqualityComparer<int>), t);

// If T is a Nullable<U> where U implements IEquatable<U>,
// return a NullableEqualityComparer<U>

// If T is an int-based Enum,
// return an EnumEqualityComparer<T>

// Otherwise return an ObjectEqualityComparer<T>

But what does it mean for types that implement IEquatable<T>?
Here, the definition of GenericEqualityComparer<T>:

internal class GenericEqualityComparer<T> : EqualityComparer<T>
    where T: IEquatable<T>
// ...

The magic happens in the generic type constraint (where T : IEquatable<T> part) because using it does not involve boxing if T is a value type, no casting like (IEquatable<T>)T is happening here, which is the primary benefit of generics.

So, let's say we want a dictionary that maps integers to strings.
What happens if we initialize one using the default constructor?

var dict = new Dictionary<int, string>();
  • We know that a dictionary uses EqualityComparer<T>.Default unless we specify another.
  • We know that EqualityComparer<int>.Default will check if int implements IEquatable<int>.
  • We know that int (Int32) implements IEquatable<Int32>.

First call to EqualityComparer<T>.Default will create and cache a generic comparer which may take a little but when initialized, it's a strongly typed GenericEqualityComparer<T> and using it will cause no boxing or unnecessary overhead whatsoever.

And all the subsequent calls to EqualityComparer<T>.Default will return the cached comparer, which means the overhead of initialization is one-time only for each type.


So what does it all mean?

  • Do implement a custom equality comparer if T does not implement IEquatable<T> or its implementation of IEquatable<T> does not do what you want it to do.
    (i.e. obj1.Equals(obj2) doesn`t give you the desired result.)

Using of StringComparer in Jonathan's answer is a great example why you would specify a custom equality comparer.

  • Do not implement a custom equality comparer for the sake of performance if T implements IEquatable<T> and the implementation of IEquatable<T> does what you want it to do.
    (i.e. obj1.Equals(obj2) gives you the desired result).

In the latter case, use EqualityComparer<T>.Default instead.

Solution 3

Dictionary<,> always uses an IEqualityComparer<TKey> - if you don't pass one, it uses EqualityComparer<T>.Default. So the efficiency will depend on how efficient your implementation is compared with EqualityComparer<T>.Default (which just delegates to Equals and GetHashCode).

Share:
18,200
Mikey S.
Author by

Mikey S.

Ruby on Rails Developer @ GetTaxi

Updated on June 06, 2022

Comments

  • Mikey S.
    Mikey S. almost 2 years

    The title is pretty much clear I think.

    I was wondering if there's a certain efficiency overhead when using IEqualityComparer in a Dictionary<K,V> how does it all work when providing one?

    Thanks

  • Jon Skeet
    Jon Skeet over 12 years
    @jtbandes: If you see this, could you stop changing my posts? I prefer to leave everything in ASCII...
  • csiu
    csiu over 12 years
    Whoop, sure. Can you consider using "--" then? It's more readable, at least in my opinion :)
  • Şafak Gür
    Şafak Gür about 11 years
    Great answer but I think you should have mentioned EqualityComparer<T>.Default first checks if the type implements IEquatable<T> and if so, uses the implementation; which means you don't have to supply a custom comparer just to avoid boxing if your value type implements IEquatable<T> interface.
  • Jonathan Dickinson
    Jonathan Dickinson about 11 years
    @ŞafakGür using interfaces to access value types will box them: stackoverflow.com/questions/7995606/…
  • Şafak Gür
    Şafak Gür over 10 years
    Calling Foo(IEquatable<int> obj) using an int will box it, but calling Bar<T>(T obj) where T : IEquatable<int> won't. This is pretty much why generics exist.
  • Jonathan Dickinson
    Jonathan Dickinson about 10 years
    @ŞafakGür nope. Compile it and check the IL - accessing struct methods via interfaces will box - the reason is that structs lack an ITable+VTable. This is why IEqualityComparer exists. I saved you the trouble: gist.github.com/jcdickinson/9051956#file-program-bar-il-L16 (IL0003 is a box instruction)
  • Şafak Gür
    Şafak Gür about 10 years
    It boxes the struct because your sample calls Object.Equals (see IL000E), not IEquatable<T>.Equals. How can compiler know if obj1 or obj2 is a MyEquatableThing with a signature like this: bool Bar<T>(T obj1, T obj2) where T : IEquatable<MyEquatableThing>? Change your generic type constraint to where T : IEquatable<T> and check the IL again, there is no box instruction.
  • Jonathan Dickinson
    Jonathan Dickinson about 10 years
    @ŞafakGür thanks, that got rid of the box instruction (updated the gist). I'll update the answer with a more correct conclusion when I get home.