DataTable Select vs List<T> LINQ Performance

18,657

Solution 1

Funny there is no "SQL" tag associated with your question. I suggest, you learn how to use the SQL language and its benefits. From what you say, it is likely you are, with your code, creating a lot of Cartesian products, instead of leveraging the Relational Database facilities (joins, indexes, etc.)

Using cross joins of DataTables or Lists or anything similar will always lead to heavy performance degradation, whatever language or platform is used.

That said, you could use LINQ because it's capable of producing smart SQL (dynamically), but you still want to avoid all ToList(), ToArray() and similar extension methods on IEnumerable(T) that summon all the underlying data (keep it enumerable from end to end and leverage "object streaming" whenever possible). If you understand really what is a Relational Database and how to use it efficiently, you will be a better LINQ developer.

Solution 2

Almost anything will be faster than manipulating an ADO.NET DataTable -- they are not designed for fast retrieval in any sense. You should also put the objects into an appropriate data structure; a DataTable is a red-black binary tree of rows, so if you don't want that, you shouldn't use one.

If you're just using the DataTable as a sequential collection of rows with fields, then you'll probably see a factor of 2 or more speedup just by replacing the DataTable with a List<T> and replacing your Select calls with Where calls, although it depends on what you're doing with it.

EDIT: Actually, I changed my mind. Nothing you could be doing sorting-or-filtering-wise with 5000 items in a DataTable implies a cost of anywhere close to 300ms, so the bottleneck is probably unrelated.

Share:
18,657
Jon
Author by

Jon

Updated on June 04, 2022

Comments

  • Jon
    Jon almost 2 years

    I have an app that does an SQL and loads a set of data into a datatable. As part of the processing there are 6 or 7 DataTable.Select() to filter some data. Each item that needs processing takes 300ms. There are 5000 items to process so takes 25 mins. This is unacceptable.

    Would creating POCO's and loading them into a List and then using LINQ to query the list be quicker than using DataTable.Select?

    Thanks

    UPDATE: I have delved in a bit more and there are 2 datatables each with around 15000 records. The 2 queries used to populate the datatables take a second each. It then takes 25mins to loop over 5000 items in a Dictionary's values property and do 5 DataTable.Select's

    eg/

    foreach (OutputRecord Mailpiece in DictionaryMailpieces.Values)
    {
        try
        {
            DataRow[] R = DataTable1.Select("MAILPIECE = " + Mailpiece.MailpieceSetSequenceNumber + " AND (STATUS = 4034 OR STATUS = 4037)", "DAL_DATE desc");
            if (R != null && R.Length > 0)
            {
            }
        }
        catch
        {
        }
    }