Should LINQ be avoided because it's slow?

21,834

Solution 1

Should Linq be avoided because its slow?

No. It should be avoided if it is not fast enough. Slow and not fast enough are not at all the same thing!

Slow is irrelevant to your customers, your management and your stakeholders. Not fast enough is extremely relevant. Never measure how fast something is; that tells you nothing that you can use to base a business decision on. Measure how close to being acceptable to the customer it is. If it is acceptable then stop spending money on making it faster; it's already good enough.

Performance optimization is expensive. Writing code so that it can be read and maintained by others is expensive. Those goals are frequently in opposition to each other, so in order to spend your stakeholder's money responsibly you've got to ensure that you're only spending valuable time and effort doing performance optimizations on things that are not fast enough.

You've found an artificial, unrealistic benchmark situation where LINQ code is slower than some other way of writing the code. I assure you that your customers care not a bit about the speed of your unrealistic benchmark. They only care if the program you're shipping to them is too slow for them. And I assure you, your management cares not a bit about that (if they're competent); they care about how much money you're spending needlessly to make stuff that is fast enough unnoticably faster, and making the code more expensive to read, understand, and maintain in the process.

Solution 2

Why are you using Cast<T>()? You haven't given us enough code to really judge the benchmark, basically.

Yes, you can use LINQ to write slow code. Guess what? You can write slow non-LINQ code, too.

LINQ greatly aids expressiveness of code dealing with data... and it's not that hard to write code which performs well, so long as you take the time to understand LINQ to start with.

If anyone told me not to use LINQ (especially LINQ to Objects) for perceived reasons of speed I would laugh in their face. If they came up with a specific bottleneck and said, "We can make this faster by not using LINQ in this situation, and here's the evidence" then that's a very different matter.

Solution 3

Maybe I've missed something, but I'm pretty sure your benchmarks are off.

I tested with the following methods:

  • The Any extension method ("LINQ")
  • A simple foreach loop (your "optimized" method)
  • Using the ICollection.Contains method
  • The Any extension method using an optimized data structure (HashSet<T>)

Here is the test code:

class Program
{
    static void Main(string[] args)
    {
        var names = Enumerable.Range(1, 10000).Select(i => i.ToString()).ToList();
        var namesHash = new HashSet<string>(names);
        string testName = "9999";
        for (int i = 0; i < 10; i++)
        {
            Profiler.ReportRunningTimes(new Dictionary<string, Action>() 
            {
                { "Enumerable.Any", () => ExecuteContains(names, testName, ContainsAny) },
                { "ICollection.Contains", () => ExecuteContains(names, testName, ContainsCollection) },
                { "Foreach Loop", () => ExecuteContains(names, testName, ContainsLoop) },
                { "HashSet", () => ExecuteContains(namesHash, testName, ContainsCollection) }
            },
            (s, ts) => Console.WriteLine("{0, 20}: {1}", s, ts), 10000);
            Console.WriteLine();
        }
        Console.ReadLine();
    }

    static bool ContainsAny(ICollection<string> names, string name)
    {
        return names.Any(s => s == name);
    }

    static bool ContainsCollection(ICollection<string> names, string name)
    {
        return names.Contains(name);
    }

    static bool ContainsLoop(ICollection<string> names, string name)
    {
        foreach (var currentName in names)
        {
            if (currentName == name)
                return true;
        }
        return false;
    }

    static void ExecuteContains(ICollection<string> names, string name,
        Func<ICollection<string>, string, bool> containsFunc)
    {
        if (containsFunc(names, name))
            Trace.WriteLine("Found element in list.");
    }
}

Don't worry about the internals of the Profiler class. It just runs the Action in a loop and uses a Stopwatch to time it. It also makes sure to call GC.Collect() before each test to eliminate as much noise as possible.

Here were the results:

      Enumerable.Any: 00:00:03.4228475
ICollection.Contains: 00:00:01.5884240
        Foreach Loop: 00:00:03.0360391
             HashSet: 00:00:00.0016518

      Enumerable.Any: 00:00:03.4037930
ICollection.Contains: 00:00:01.5918984
        Foreach Loop: 00:00:03.0306881
             HashSet: 00:00:00.0010133

      Enumerable.Any: 00:00:03.4148203
ICollection.Contains: 00:00:01.5855388
        Foreach Loop: 00:00:03.0279685
             HashSet: 00:00:00.0010481

      Enumerable.Any: 00:00:03.4101247
ICollection.Contains: 00:00:01.5842384
        Foreach Loop: 00:00:03.0234608
             HashSet: 00:00:00.0010258

      Enumerable.Any: 00:00:03.4018359
ICollection.Contains: 00:00:01.5902487
        Foreach Loop: 00:00:03.0312421
             HashSet: 00:00:00.0010222

The data is very consistent and tells the following story:

  • Naïvely using the Any extension method is about 9% slower than naïvely using a foreach loop.

  • Using the most appropriate method (ICollection<string>.Contains) with an unoptimized data structure (List<string>) is approximately 50% faster than naïvely using a foreach loop.

  • Using an optimized data structure (HashSet<string>) completely blows any of the other methods out of the water in performance terms.

I have no idea where you got 243% from. My guess is it has something to do with all that casting. If you're using an ArrayList then not only are you using an unoptimized data structure, you're using a largely obsolete data structure.

I can predict what comes next. "Yeah, I know you can optimize it better, but this was just an example to compare the performance of LINQ vs. non-LINQ."

Yeah, but if you couldn't even be thorough in your example, how can you possibly expect to be this thorough in production code?

The bottom line is this:

How you architect and design your software is exponentially more important than what specific tools you use and when.

If you run into performance bottlenecks - which is every bit as likely to happen with LINQ vs. without - then solve them. Eric's suggestion of automated performance tests is an excellent one; that will help you to identify the problems early so that you can solve them properly - not by shunning an amazing tool that makes you 80% more productive but happens to incur a < 10% performance penalty, but by actually investigating the issue and coming up with a real solution that can boost your performance by a factor of 2, or 10, or 100 or more.

Creating high-performance applications is not about using the right libraries. It's about profiling, making good design choices, and writing good code.

Solution 4

Is LINQ a real-world bottleneck (either effecting the overall or perceived performance of the application)?

Will your application be performing this operation on 1,000,000,000+ records in the real-world? If so--then you might want to consider alternatives--if not then it's like saying "we can't buy this family sedan because it doesn't drive well at 180+ MPH".

If it's "just slow" then that's not a very good reason... by that reasoning you should be writing everything in asm/C/C++, and C# should be off the table for being "too slow".

Solution 5

While premature pessimization is (imho) as bad as premature optimization, you shouldn't rule out an entire technology based on absolute speed without taking usage context into consideration. Yes, if you're doing some really heavy number-crunching and this is a bottleneck, LINQ could be problematic - profile it.

An argument you could use in favour of LINQ is that, while you can probably outperform it with handwritten code, the LINQ version could likely be clearer and easier to maintain - plus, there's the advantage of PLINQ compared to complex manual parallelization.

Share:
21,834
user455095
Author by

user455095

Updated on February 07, 2020

Comments

  • user455095
    user455095 over 4 years

    I've had been told that since .net linq is so slow we shouldn't use it and was wondering anyone else has come up with the same conclusion, and example is:

    Took 1443ms to do 1000000000 compares non-LINQ.
    Took 4944ms to do 1000000000 compares with LINQ.
    (243% slower)

    the non-LINQ code:

    for (int i = 0; i < 10000; i++)
    {
        foreach (MyLinqTestClass1 item in lst1) //100000 items in the list
        {
            if (item.Name == "9999")
            {
                isInGroup = true;
                break;
            }
        }
    }
    

    Took 1443ms to do 1000000000 compares non-LINQ.

    LINQ code:

    for (int i = 0; i < 10000; i++)  
        isInGroup = lst1.Cast<MyLinqTestClass1>().Any(item => item.Name == "9999");  
    

    Took 4944ms to do 1000000000 compares with LINQ.

    I guess its possible to optimize the LINQ code but the thought was that its easily to get really slow LINQ code and given that it shouldn't be used. Given that LINQ is slow then it would also follow that PLINQ is slow and NHibernate LINQ would be slow so any kind on LINQ statement should not be used.

    Has anyone else found that LINQ is so slow that they wished they had never used it, or am I making a too general conclusion based on benchmarks like this?

  • user455095
    user455095 over 13 years
    there isnt a place in the application that has a performance issue but it more of a idea that it could be if for whatever reason the code is executed a billion times like the sample and given that , it shouldnt be used.
  • mqp
    mqp over 13 years
    What's the deal with the wishy-washiness? You're the one that knows what your code is doing and how it's doing it. If LINQ is slow at doing it, then don't use LINQ. If there's no performance issue, then it doesn't matter, does it?
  • user455095
    user455095 over 13 years
    yes, I gave the reply that taking out the cast would help but was told that it wasnt as slow but still slow. Our application need to be fast so I guess the idea is that anything that could be slow should be avoided but Im not sure if thats a valid assumption
  • user455095
    user455095 over 13 years
    actually they was to not allow parrallel linq too since its linq it could suffer from the same performance issues.
  • Jon Skeet
    Jon Skeet over 13 years
    @user455095: No, it's absolutely not a valid assumption. "Slow" is far from a precise term - and I very much doubt that this is a realistic benchmark to start with.
  • Abe Miessler
    Abe Miessler over 13 years
    @Fredrik, based on the fist sentence in his question I'd say this isn't his decision and he's trying to find some valid arguments for using LINQ. Doesn't seem wishy washy at all...
  • David Thornley
    David Thornley over 13 years
    @user455095: Does the code have a performance issue? If not, then everything in it is running fast enough, and it's not worthwhile to change something just because it's slow. If the code does have a performance issue, profile to see if the LINQ call is having a significant effect. If it is, test it both ways. If LINQ is more readable, faster to write, easier to get right, easier to maintain, or something like that, it's likely worth some performance hit.
  • user455095
    user455095 over 13 years
    there isnt performance issues but if there was ,instead of trying to tune any linq statement it was decided we should just not use linq given a standard foreach work faster even though its a few more lines of code and you dont have to worry about tuning linq code.
  • Sean Vieira
    Sean Vieira over 13 years
    +1 for the image of a family sedan driving at 180 miles an hour.
  • user455095
    user455095 over 13 years
    its more of an issue of what if, and trying to avoid things that could possibly cause problems in the future and I guess given that linq is another layer that we dont know how it will react and could cause performance problems we have to deal with in the future and its better to avoid them now and not have to deal with the cost and time of dealing with them in the furture. I think its hard to predict the future but I guess some can.
  • user2575001
    user2575001 over 13 years
    My response would be that it will also take 4944ms to do 3426195426 compares with non-LINQ code so on that basis you shouldn't use non-LINQ code either ...
  • STW
    STW over 13 years
    @MikeJ-UK -- or that the machine is clearly too slow. Throw some hardware at it and overclock as needed to maintain a consistent throughput
  • STW
    STW over 13 years
    +1 for sage wisdom, and a good one-liner to use as a proverbial slap for anyone insisting on marginal performance taking priority over code sanity
  • Dan Bryant
    Dan Bryant over 13 years
    The only time I ran into a case where I removed LINQ for performance reasons was for a a routine implementing AI in a game. This particular method was executed extremely frequently within a deep inner loop. The main impact I found wasn't actually LINQ, but rather the difference between indexing through an array directly versus indexing through an enumerator (my first attempt to improve was using foreach directly, which was of less benefit than switching to a classic for loop.) I only made this change because profiling identified that the code was spending 40% of its time here.
  • samiretas
    samiretas over 13 years
    @user455095, it's not about predicting the future. It's about dealing with it when it comes. You can write the app faster with linq. If it runs fast enough, then who cares if linq is a touch slower than an esoteric, convoluted solution. If in the future, you deem a linq query to be slow, you optimize that linq query, or change that query to something faster. You don't remove linq from the entire app because it was slower in one spot.
  • user455095
    user455095 over 13 years
    it actually is scientific and heavy on math and given that maybe it should be avoided in those situations but should it be applied everywhere. I guess you could make the case that you may need to do a 1 billion time loop anywhere.
  • Eric Lippert
    Eric Lippert over 13 years
    @user455095: So what you're saying is that the devil you know is better than the devil you don't. Which is fine, if you like making business decisions on the basis of old sayings. I think it is generally a better idea to make business decisions based on informed opinions derived from empirical measurements. You're going in the right direction here by making empirical measurements, so that's good. But you're comparing that measurement against the wrong thing. The things you should be measuring are customer satisfaction and shareholder cost, not microseconds per comparison.
  • Anthony Pegram
    Anthony Pegram over 13 years
    @user, is lst1 an Arraylist, per chance? (Could explain the Cast<>)
  • Nelson Rothermel
    Nelson Rothermel over 13 years
    @user455095 - Except in many circumstances parallel LINQ could actually greatly speed up the processing compared to non-LINQ. Sure, you can implement threading yourself and end up with 20 lines of code instead of one.
  • user455095
    user455095 over 13 years
    Well we have been in development over 3 years and it will probably be another year before customers see it. I guess its more of a question that given its a huge product with a massive amount of code we need to try and eliminate anything that could cause performance problems given if it does become an issue then it could be costly and time consuming to find. Again it come to predicting and controling the future.
  • STW
    STW over 13 years
    @user455095 -- you mean, the case could be made that taking an extra 2.5 seconds to process 1 billion items would be unacceptable. If they're concerned with this level of misinformation then you might want to pull the rug out from under them; take a look at using F#--if this is scientific/mathematical then F# might offer significant advantages
  • Nelson Rothermel
    Nelson Rothermel over 13 years
    What does this have to do with LINQ? Sure, anonymous types are used in LINQ, but they are not LINQ.
  • STW
    STW over 13 years
    It reminds me of Juval Lowy's "Every class as a WCF service" talk. He goes on a well-founded tirade about how using this type of scenario (using a raw-loop) to compare performance is not only meaningless, but often produces wrong results when compared to real-world measurements.
  • Beep beep
    Beep beep over 13 years
    @user455095 - I agree with STW. If performance is that critical, it would benefit them to ditch C# altogether for the heavy math routines and use the GPU or at least straight C.
  • Aaronaught
    Aaronaught over 13 years
    @user: How about instead of "eliminating anything that could cause performance problems", you start with code that's correct and easy to maintain, and then eliminating specific performance problems as they come (which should be very easy if your software is well-architected and well-designed)? I hate to be this blunt, but you're basically announcing to the world here that you don't know how to do your job. Part of your "problem" seems to be that the code is so "massive" - well, guess what, it wouldn't be so massive if you were using LINQ effectively.
  • Anthony Pegram
    Anthony Pegram over 13 years
    @user, I guarantee you there will be examples when LINQ will perform faster than code you would write yourself. Not because you couldn't write code that's faster than LINQ, it's just that you wouldn't be given the time to do so. You're spending all your time trying to meet the specification, integrate business rules that often contradict one another, find and squash bugs, etc. Meanwhile, Microsoft has a team of people optimizing the heck out of a Join operation. With LINQ, you're going to write more expressive code and you will write it faster. And it will perform well, too.
  • Aaronaught
    Aaronaught over 13 years
    You again. Don't you have anything better to do than troll software Q&A sites?
  • cjk
    cjk over 13 years
    @user455095: Linear execution of fast code is almost certianly to be slower than parallel execution of slightly slower code
  • user455095
    user455095 over 13 years
    we actually have a lot of our heavy scientific and math routine in C++ and even fortran .
  • tia
    tia over 13 years
    +1 for clear, brief and practical explanation.
  • cjk
    cjk over 13 years
    Your reasons are only valid if you don't know what you are doing. Experienced programmers don't / won't experience these problems as they will know how their types work.
  • Jon Hanna
    Jon Hanna over 13 years
    I wouldn't agree entirely with @ck, as there are times when parallel execution gains little (when something forces it to not really be parallel), but in general it's true. So, @user455095 you aren't allowed to use a more efficient method because one part of it is less efficient than one part of the less efficient method? Sorry, that's not optimisation, that's just superstition.
  • Beep beep
    Beep beep over 13 years
    @user455095 - LINQ sometimes makes it easier to read code, and sometimes makes it faster to write. If the client is willing to pay more for you to write your code, then by all means drop LINQ. In my experience, short LINQ statements are a huge benefit, while long/complex ones make it harder to understand code.
  • Richard Anthony Freeman-Hein
    Richard Anthony Freeman-Hein over 13 years
    "If anyone told me not to use LINQ (especially LINQ to Objects) for perceived reasons of speed I would laugh in their face." I've actually laughed in a few faces for the same reason. Also, for when they say, "I don't see the value".
  • Niki
    Niki over 13 years
    "Our application need to be fast so I guess the idea is that anything that could be slow should be avoided" - that's the dumbest thing I've ever heard. Strings can be slow. Arrays can be slow. virtual methods can be slow, interfaces can be slow. Anything can be slow, if you use it wrong, but that doesn't mean you should avoid all those things everywhere in your code.
  • Eric Lippert
    Eric Lippert over 13 years
    @user455095: It sounds like you are having some anxiety about future performance. The way to mitigate that anxiety is to institute a policy of nightly automated performance tests that measure the performance of the real product against real customer metrics. That way every single day you can track the trend of your performance numbers. If you see that there is a sudden spike in bad performance then you can examine all the change logs for that day and figure out what change caused the bad performance. The expense of fixing a perf problem rises the longer it is there without you knowing!
  • user455095
    user455095 over 13 years
    ya, I actaully forgot about that, we do have performance tests and if any of those test times change by a certain percentage then an alert will be raised so if any performance problems do come up then they would be caught there but thats something to thing about if anyone has these types of performance worries.
  • Lance Roberts
    Lance Roberts over 13 years
    +1 for "premature pessimization"
  • Eric Lippert
    Eric Lippert over 13 years
    @user455095: Then why all the worry? Try using LINQ today and tomorrow you will know whether it caused an unacceptably high performance degradation. You say that the uncertainty of using LINQ is too high; well, you're not going to become more certain until you spend some time using it and learn what works for you and what doesn't. If your question really is "should LINQ be avoided because my team doesn't understand how to use it effectively?" then that's a different question than the one you asked. My answer to that question would be "no; rather, learn how to use it effectively!"
  • Aaronaught
    Aaronaught over 13 years
    I forgot to mention, this was compiled with the TRACE flag off, so no overhead is being incurred there for this test.
  • user455095
    user455095 over 13 years
    unfortunatly if your team lead says you cant use it then its kind of hard but Ill pass him this thread at any rate.
  • Amy B
    Amy B over 13 years
    +1, good catch on an important point not addressed in other answers. OP doesn't understand that varying implementations will have varying performance characteristics and considerations.
  • eglasius
    eglasius over 13 years
    +1 for "If anyone told me not to use LINQ (especially LINQ to Objects) for perceived reasons of speed I would laugh in their face."
  • STW
    STW over 13 years
    @Dan - DONT SAY THAT!!! The next question will be "I stopped using foreach loops because they're slower than for loops, I need a for loop with basic transactions (break the loop if the collection changes)"
  • user455095
    user455095 over 13 years
    well, first is isnt a recomendation, it is more of a dictated rule . If you can use a foreach rather than a linq statement then why not use it and not take the risk that you are doing a linq statement 10000 times since foreach could be considered less complex(not my words). I guess this thread is changing more into what is good programming practice and that tends to be more subjective. If you get 10 programmers in a room and ask their opintion, what would you get? 10 different opinions..... and a lot of arguing.. Im starting to sound a little cynical.
  • Dan Bryant
    Dan Bryant over 13 years
    @STW, though I haven't measured it, my guess is that the significant improvement was due to the fact that it was directly accessing an array, which is given special treatment in IL and likely jits to very efficient code when indexing. I still had to do much more aggressive search tree pruning (use a better algorithm), as the sheer number of iterations was the biggest problem. There's little point in improving the efficiency at which you can do unnecessary work.
  • Nelson Rothermel
    Nelson Rothermel over 13 years
    Apparently it's not possible to get negative reputation. Does SO have an "overdraft" feature so all these negative votes actually count in the future? :)
  • Aaronaught
    Aaronaught over 13 years
    @user: "If you can use a foreach rather than a LINQ statement then why not use [the foreach statement] and not take the risk [...]?" - my reason not to use the foreach statement is that the risk of a performance problem is vanishingly small when weighed against the risk that a bug will eventually be introduced due to awkwardly-written imperative code. It's easy to write a fast program that doesn't work. It's also harder to fix the fast-but-wrong program than it is to speed up the slow-but-correct program.
  • Martin Törnwall
    Martin Törnwall over 13 years
    @user455095: I'm not aware of a "dictated rule" that says code should be written with execution speed as the primary point of consideration. There are certainly specific cases where execution speed should be favored over clarity and simplicity (think games, a chess engine, compression code, etc). In my experience, though, you'll greatly increase your chances of shipping a quality product by focusing on functionality rather than obsessing over speed. @Aaronaught: very well put -- you've nicely expressed the point I was trying to make.
  • James Dunne
    James Dunne over 13 years
    @user455095: I'm sorry to say that your arguments about some way-off future scenario that might cause some negligible performance issue hold no water. Listen to these people; they know what they are talking about. Eric Lippert works on the team that builds your C# compiler, in case you were not aware. Jon Skeet works for Google. Both of these gentlemen and others here are trying to tell you something. L I S T E N.
  • Doctor Jones
    Doctor Jones over 13 years
    +1 for "If anyone told me not to use LINQ (especially LINQ to Objects) for perceived reasons of speed I would laugh in their face". I wish you worked in my company Jon Skeet, I really do! :-)
  • Doctor Jones
    Doctor Jones over 13 years
    I'm still laughing at that quote now! I'd give another +1 if I could :-)
  • Andrei Rînea
    Andrei Rînea over 13 years
    Well, here's your problem. The Team Lead.
  • Water Cooler v2
    Water Cooler v2 almost 9 years
    Although I am not educated enough to take a position on this debate, and I greatly enjoy using LINQ and studying the LINQ Expression API and writing LINQ providers for practice, I do believe that this poster makes some very interesting and valid observations. This doesn't go to say that I don't agree with the counter arguments. I actually agree with all that Eric Lippert and Jon Skeet said. Even then, I just found this person's post to be an intelligent assimilation, which sadly, I suspect out of a mob reaction (you downvote when there's already another downvote), was left unheeded to.
  • Daniel B
    Daniel B over 5 years
    How about the cost of converting enumerable to hashset?