DataView.RowFilter vs DataTable.Select() vs DataTable.Rows.Find()

Solution 1

You are looking for the "best approach on finding rows in a datatable", so I first have to ask: "best" for what? I think every technique has scenarios where it fits better than the others.

First, let's look at DataView.RowFilter. A DataView has some advantages in data binding. It's very view-oriented, so it has powerful sorting, filtering, and searching features, but it creates some overhead and is not optimized for performance. I would choose DataView.RowFilter for smaller recordsets and/or where you take advantage of its other features (such as binding directly to the view).
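As a rough illustration of the filtering and sorting features mentioned above, here is a minimal sketch. The table name, column names, and sample rows are assumptions made up for this example, not taken from the question.

```csharp
using System;
using System.Data;

// Hypothetical sample table; names and values are assumptions for illustration.
var table = new DataTable("Vendors");
table.Columns.Add("VendorNumber", typeof(int));
table.Columns.Add("Name", typeof(string));
table.Rows.Add(500, "Acme");
table.Rows.Add(501, "Globex");
table.Rows.Add(502, "Initech");

// A DataView is a filtered, sorted window onto the table.
// It can also serve as a data source for binding.
var view = new DataView(table)
{
    RowFilter = "VendorNumber >= 501",
    Sort = "Name DESC"
};

// Each item in the view is a DataRowView wrapping the underlying DataRow.
foreach (DataRowView rowView in view)
{
    Console.WriteLine($"{rowView["VendorNumber"]}: {rowView["Name"]}");
}
```

The view does not copy the rows; it only exposes the subset that matches RowFilter, in the order given by Sort.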

Most facts about the DataView, which you can read in older posts, still apply.

Second, you should prefer DataTable.Rows.Find over DataTable.Select if you want just a single hit by primary key. Why? DataTable.Rows.Find returns only a single row. Essentially, when you specify the primary key, a binary tree is created. This has some overhead associated with it, but tremendously speeds up retrieval.
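A small sketch of a primary-key lookup, assuming a hypothetical table keyed on an integer VendorNumber column (the names and data are illustrative only):

```csharp
using System;
using System.Data;

// Hypothetical sample table; names and values are assumptions for illustration.
var table = new DataTable("Vendors");
DataColumn keyColumn = table.Columns.Add("VendorNumber", typeof(int));
table.Columns.Add("Name", typeof(string));

// Rows.Find only works when the table has a primary key defined.
table.PrimaryKey = new[] { keyColumn };

table.Rows.Add(500, "Acme");
table.Rows.Add(501, "Globex");

// Find returns the matching row, or null when no row has that key.
var hit = table.Rows.Find(501);
var miss = table.Rows.Find(999);

Console.WriteLine(hit is null ? "not found" : (string)hit["Name"]);
Console.WriteLine(miss is null ? "not found" : (string)miss["Name"]);
```

Calling Find on a table without a primary key throws a MissingPrimaryKeyException, so the key has to be set before the lookup.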

DataTable.Select is slower, but can come in very handy if you have multiple criteria and don't care whether the rows are indexed: it can find basically everything but is not optimized for performance. Essentially, DataTable.Select has to walk the entire table and compare every record to the criteria that you passed in.
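For comparison, a minimal sketch of DataTable.Select with two criteria and a sort order. The table, the Col1/Col2/Amount columns, and the rows are assumptions chosen for illustration.

```csharp
using System;
using System.Data;

// Hypothetical sample table; names and values are assumptions for illustration.
var table = new DataTable("Orders");
table.Columns.Add("Col1", typeof(int));
table.Columns.Add("Col2", typeof(int));
table.Columns.Add("Amount", typeof(decimal));
table.Rows.Add(1, 4, 10.0m);
table.Rows.Add(1, 4, 25.5m);
table.Rows.Add(1, 5, 7.0m);
table.Rows.Add(2, 4, 3.0m);

// Select takes a filter expression and, optionally, a sort expression.
// It returns an array of every matching row, with no primary key required.
DataRow[] matches = table.Select("Col1 = 1 AND Col2 = 4", "Amount DESC");

foreach (DataRow row in matches)
{
    Console.WriteLine(row["Amount"]);
}
```

Unlike Rows.Find, this returns zero or more rows, so the caller checks matches.Length rather than testing for null.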

I hope you find this little overview helpful.

I'd suggest taking a look at this article; it was helpful for me regarding performance questions. This post contains some quotes from it.

A little UPDATE: This might seem a little out of scope for your question, but it's nearly always fastest to do the filtering and searching on the backend. If you want simplicity and have SQL Server as the backend and .NET 3.5+ on the client, go for LINQ-to-SQL. Searching LINQ objects is very comfortable and creates queries that are executed on the server side. LINQ-to-Objects is also very comfortable, but it is a slower technique.

Solution 2

Thomashaid's post sums it up nicely:

  • DataView.RowFilter is for binding.
  • DataTable.Rows.Find is for searching by primary key only.
  • DataTable.Select is for searching by multiple columns and also for specifying an order.

Avoid creating many DataViews in a loop and using their RowFilters to search for records. This will drastically reduce performance.

I wanted to add that DataTable.Select can take advantage of indexes. You can create an index on a DataTable by creating a DataView and specifying a sort order:

DataView dv = new DataView(dt);
dv.Sort = "Col1, Col2";

Then, when you call DataTable.Select(), it can use this index when running the query. We have used this technique to seriously improve performance in places where we run the same query many, many times. (Note that this was before LINQ existed.)

The trick is to define the sort order correctly for the Select statement. So if your query is "Col1 = 1 and Col2 = 4", then you'll want "Col1, Col2" like in the example above.

Note that the index creation may depend on the actual calls to create the DataView. We had to use the new DataView(DataTable dt) constructor, and then specify the Sort property in a separate step. The behavior may change slightly with different .NET versions.
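Putting the pattern described above together, here is a hedged sketch: the view is created once with the table-only constructor, Sort is set in a separate step, and the same Select query is then run repeatedly. The table contents and the loop counts are assumptions for illustration, and whether Select actually uses the index may vary by .NET version, as noted above.

```csharp
using System;
using System.Data;

// Hypothetical sample data; the values are assumptions for illustration.
var dt = new DataTable("Data");
dt.Columns.Add("Col1", typeof(int));
dt.Columns.Add("Col2", typeof(int));
for (int i = 0; i < 1000; i++)
{
    dt.Rows.Add(i % 10, i % 7);
}

// Create the view once, outside any loop, using the table-only constructor,
// then set Sort in a separate step so the sort order matches the query columns.
var dv = new DataView(dt);
dv.Sort = "Col1, Col2";

// Run the same query many times; the view is never recreated inside the loop.
int total = 0;
for (int k = 0; k < 100; k++)
{
    total += dt.Select("Col1 = 1 AND Col2 = 4").Length;
}

// Assumption: the index lives only as long as the view does,
// so the view is kept reachable until the queries are done.
GC.KeepAlive(dv);
Console.WriteLine(total);
```

The results are the same with or without the view; only the lookup cost is expected to differ, so it is worth timing both variants on real data before relying on this.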

Author: A G

Updated on November 04, 2020

Comments

  • A G
    A G over 3 years

    Considering the code below:

    DataView someView = new DataView(sometable);
    someView.RowFilter = someFilter;

    if (someView.Count > 0) { … }
    

    Quite a number of articles say DataTable.Select() is better than using DataViews, but these predate VS2008.

    Solved: The Mystery of DataView's Poor Performance with Large Recordsets
    Array of DataRecord vs. DataView: A Dramatic Difference in Performance

    Googling this topic, I found some articles and forum threads which mention that DataTable.Select() itself is quite buggy (I'm not sure about this) and underperforms in various scenarios.

    This topic on MSDN (Best Practices ADO.NET) suggests that if a primary key is defined on a DataTable, the FindRows() or Find() methods should be used instead of DataTable.Select().

    This article here (.NET 1.1) benchmarks all three approaches plus a couple more. But it is for version 1.1, so I'm not sure whether the results are still valid. According to it, DataRowCollection.Find() outperforms all other approaches, and DataTable.Select() outperforms DataView.RowFilter.

    So I am quite confused about what the best approach for finding rows in a DataTable might be. Or is there no single good way to do this, with multiple solutions depending on the scenario?

  • James
    James over 11 years
    I just found a case where the results differed between the .Select and RowFilter techniques. In my case Select returned 532 rows and RowFilter returned 540. I found the difference to be related to extra spaces in the table data, and resolved it by using Trim in the select statement: TRIM(VendorNumber) = '500'
  • Chris Smith
    Chris Smith over 10 years
    Whoa, this is super handy. I can't believe this isn't documented on MSDN. With like 1 line of code, I drastically improved the performance of my DataTable.Select() calls without doing all the silly FindRows() and Dictionary workarounds. THANKS
  • JohanLarsson
    JohanLarsson over 8 years
    super, that made my day. Now queries are 300% faster!
  • LMK
    LMK almost 8 years
    If you step through the underlying .Net source, you will see that often .Select() does create an index itself, if the conditions are right. Such as when a simple expression like "col1 = 3 and col2 = 4" is used. You can see this by examining the private [indexes] field of the table after the select. In those cases there is no need to create a DataView. The answer above also doesn't work for me, I need to create a DataView with just the table constructor, and then set the [Sort] property separately. Not sure why...
  • Paul Williams
    Paul Williams almost 8 years
    Our code has the same pattern: 1) create DataView with DataTable-only constructor 2) set Sort property. I will note that in the answer.
  • مسعود
    مسعود almost 8 years
    @Paul Suppose there is a lookup function in the same file that takes a DataTable as an argument. How can I use this concept in that function? Will it automatically use the view I created above, or do I have to pass the view to the function instead of the DataTable?
  • Paul Williams
    Paul Williams almost 8 years
    @Masood This sounds like a new question that you could ask on StackOverflow. The answer depends on your implementation. Just don't create the same index repeatedly in a loop. Note that creating the index may be as expensive as querying the table only one time.