Performance of Find() vs. FirstOrDefault()

92,880

Solution 1

I was able to mimic your results so I decompiled your program and there is a difference between Find and FirstOrDefault.

First off here is the decompiled program. I made your data object an anonmyous data item just for compilation

    List<\u003C\u003Ef__AnonymousType0<string>> source = Enumerable.ToList(Enumerable.Select(Enumerable.Range(0, 1000000), i =>
    {
      var local_0 = new
      {
        Name = Guid.NewGuid().ToString()
      };
      return local_0;
    }));
    source.Insert(999000, new
    {
      Name = diana
    });
    stopwatch.Restart();
    Enumerable.FirstOrDefault(source, c => c.Name == diana);
    stopwatch.Stop();
    Console.WriteLine("Diana was found in {0} ms with System.Linq.Enumerable.FirstOrDefault().", (object) stopwatch.ElapsedMilliseconds);
    stopwatch.Restart();
    source.Find(c => c.Name == diana);
    stopwatch.Stop();
    Console.WriteLine("Diana was found in {0} ms with System.Collections.Generic.List<T>.Find().", (object) stopwatch.ElapsedMilliseconds);

The key thing to notice here is that FirstOrDefault is called on Enumerable whereas Find is called as a method on the source list.

So, what is find doing? This is the decompiled Find method

private T[] _items;

[__DynamicallyInvokable]
public T Find(Predicate<T> match)
{
  if (match == null)
    ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
  for (int index = 0; index < this._size; ++index)
  {
    if (match(this._items[index]))
      return this._items[index];
  }
  return default (T);
}

So it's iterating over an array of items which makes sense, since a list is a wrapper on an array.

However, FirstOrDefault, on the Enumerable class, uses foreach to iterate the items. This uses an iterator to the list and move next. I think what you are seeing is the overhead of the iterator

[__DynamicallyInvokable]
public static TSource FirstOrDefault<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
  if (source == null)
    throw Error.ArgumentNull("source");
  if (predicate == null)
    throw Error.ArgumentNull("predicate");
  foreach (TSource source1 in source)
  {
    if (predicate(source1))
      return source1;
  }
  return default (TSource);
}

Foreach is just syntatic sugar on using the enumerable pattern. Look at this image

enter image description here.

I clicked on foreach to see what it's doing and you can see dotpeek wants to take me to the enumerator/current/next implementations which makes sense.

Other than that they are basically the same (testing the passed in predicate to see if an item is what you want)

Solution 2

I'm wagering that FirstOrDefault is running via the IEnumerable implementation, that is, it will use a standard foreach loop to do the checking. List<T>.Find() is not part of Linq (http://msdn.microsoft.com/en-us/library/x0b5b5bc.aspx), and is likely using a standard for loop from 0 to Count (or another fast internal mechanism probably operating directly on its internal/wrapped array). By getting rid of the overhead of enumerating through (and doing the version checks to ensure that the list hasn't been modified) the Find method is faster.

If you add a third test:

//3. System.Collections.Generic.List<T> foreach
Func<Customer, bool> dianaCheck = c => c.Name == diana;
watch.Restart();
foreach(var c in customers)
{
    if (dianaCheck(c))
        break;
}
watch.Stop();
Console.WriteLine("Diana was found in {0} ms with System.Collections.Generic.List<T> foreach.", watch.ElapsedMilliseconds);

That runs about the same speed as the first one (25ms vs 27ms for FirstOrDefault)

EDIT: If I add an array loop, it gets pretty close to the Find() speed, and given @devshorts peek at the source code, I think this is it:

//4. System.Collections.Generic.List<T> for loop
var customersArray = customers.ToArray();
watch.Restart();
int customersCount = customersArray.Length;
for (int i = 0; i < customersCount; i++)
{
    if (dianaCheck(customers[i]))
        break;
}
watch.Stop();
Console.WriteLine("Diana was found in {0} ms with an array for loop.", watch.ElapsedMilliseconds);

This runs only 5.5% slower than the Find() method.

So bottom line: looping through array elements is faster than dealing with foreach iteration overhead. (but both have their pros/cons, so just choose what makes sense for your code logically. Furthermore, only rarely will the small difference in speed ever cause an issue, so just use what makes sense for maintainability/readability)

Share:
92,880
Arman
Author by

Arman

Software Engineer

Updated on July 08, 2022

Comments

  • Arman
    Arman almost 2 years

    Similar Question:
    Find() vs. Where().FirstOrDefault()

    Got an interesting outcome searching for Diana within a large sequence of a simple reference type having a single string property.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    
    public class Customer{
        public string Name {get;set;}
    }
    
    Stopwatch watch = new Stopwatch();        
        const string diana = "Diana";
    
        while (Console.ReadKey().Key != ConsoleKey.Escape)
        {
            //Armour with 1000k++ customers. Wow, should be a product with a great success! :)
            var customers = (from i in Enumerable.Range(0, 1000000)
                             select new Customer
                             {
                                Name = Guid.NewGuid().ToString()
                             }).ToList();
    
            customers.Insert(999000, new Customer { Name = diana }); // Putting Diana at the end :)
    
            //1. System.Linq.Enumerable.DefaultOrFirst()
            watch.Restart();
            customers.FirstOrDefault(c => c.Name == diana);
            watch.Stop();
            Console.WriteLine("Diana was found in {0} ms with System.Linq.Enumerable.FirstOrDefault().", watch.ElapsedMilliseconds);
    
            //2. System.Collections.Generic.List<T>.Find()
            watch.Restart();
            customers.Find(c => c.Name == diana);
            watch.Stop();
            Console.WriteLine("Diana was found in {0} ms with System.Collections.Generic.List<T>.Find().", watch.ElapsedMilliseconds);
        }
    

    enter image description here

    Is this because of no Enumerator overhead in List.Find() or this plus maybe something else?

    Find() runs almost as twice as faster, hoping .Net team will not mark it Obsolete in the future.

    • Oded
      Oded over 11 years
      Try timing Find() before FirstOrDefault. What are the results then?
    • Arman
      Arman over 11 years
      @Oded did it. Exactly the same. I've also run FirstOrDefault sequentially two times but still the same 23-24 ms (on my iCore5). Looks it's not caching.
    • Daniel
      Daniel over 11 years
      Interesting. Does performance scale linearly with list size (does FirstOrDefault always take twice as long for other list sizes, or is there a fixed 10ms cost in using Linq)?
    • MBen
      MBen over 11 years
      On Mono it's even more : Diana was found in 30 ms with System.Collections.Generic.List<T>.Find(). Diana was found in 176 ms with System.Linq.Enumerable.FirstOrDefault().
    • Arman
      Arman over 11 years
      Yes, @Daniel, it does. It takes 134(FInd) vs 246(1stOrDefault)ms for 10000k items.
    • Jonathan Wood
      Jonathan Wood over 11 years
      Interesting. Here's some more info: stackoverflow.com/questions/9335015/…. Looks like Find() is not available on IEnumerable?
    • Arman
      Arman over 11 years
      Yes, it's available on List<T>, Array etc(C#2) but not Enumerable types that were added on C# 3 with Linq.
    • CodesInChaos
      CodesInChaos over 11 years
      Three indirect calls per item for FirstOrDefault, one indirect call for Find.
  • Arman
    Arman over 11 years
    good comparison with foreach and for. I'm always a fan of maintainability/readability over poor performance boost guys fighting for each and every nanosecond :) But for really large sequences (especially when code runs on servers) it's a common task to look-up for performance optimizations, and having choice of two time faster code makes me decide to stay with List<T>.Find() for heavy operations.
  • Arman
    Arman over 11 years
    It's now 100% obvious what is the only difference between them, I was hoping to see something else, like more tricky to identify. It's always interesting to see what's going on under the hood of .net framework. thanks!
  • Chris Sinclair
    Chris Sinclair over 11 years
    @Arman I mostly agree. In this case, you'd need to have 10 million entries to get a real perceptable performance hit (few hundred milliseconds). Really, at this point you should probably throw out this O(n) iteration for some O(1) lookup like a dictionary keyed to the name.
  • Arman
    Arman over 11 years
    @ChrisSinclar or even a better algorithm O(I'm sorry) :) I'm more surprised with the another comment, when he says it takes 176ms on Mono. And that only with the simplest single property class. What would happen with even 10k real Customers running on a server with 1000 concurrent clients (we deal with similar scenarios pretty often)? That's the cost of Linq, lambda, delegation, iterators, enumerables, reflection and other idioms that make our life easier with C#.
  • ditoslav
    ditoslav about 6 years
    Praying for the day C# gets higher order types
  • Suncat2000
    Suncat2000 over 5 years
    To help clarify the difference in performance, Find() on a List does not use LINQ. See @Chris Sinclair's answer.