Entity Framework Include performance

19,739

Solution 1

Your second approach relies on the EF navigation property fixup process. The problem is though that every

query.Include(q => q.ItemNavN).Load();

statement will also include all the master record data along with the related entity data.

Using the same basic idea, one potential improvement could be to execute one Load per each navigation property, replacing the Include with either Select (for references) or SelectMany (for collections) - something similar to how EF Core processes the Includes internally.

Taking your second approach example, you could try the following and compare the performance:

var query = ctx.Filters.Where(x => x.SessionId == id)
    .Join(ctx.Items, i => i.ItemId, fs => fs.Id, (f, fs) => fs);

query.Select(x => x.ItemNav1).Load();
query.Select(x => x.ItemNav2).Load();
query.Select(x => x.ItemNav3).Load();
query.Select(x => x.ItemNav4).Load();
query.Select(x => x.ItemNav5).Load();
query.Select(x => x.ItemNav6).Load();

var result = query.ToList();
// here all the navigation properties should be populated 

Solution 2

For everybody coming here, I want you to know following 2 things:

  1. .Select(x => x.NavProp).Load() does not actually load the navigation property if you have turned off tracking.

  2. Since version 3.0.0, each Include will cause an additional JOIN to be added to SQL queries produced by relational providers, whereas previous versions generated additional SQL queries. This can significantly change the performance of your queries, for better or worse. In particular, LINQ queries with an exceedingly high number of Include operators may need to be broken down into multiple separate LINQ queries in order to avoid the cartesian explosion problem.

Source for both statements: https://docs.microsoft.com/en-us/ef/core/querying/related-data

So it is not true that EF Core does Select and SelectMany in the backgruond. In my case we had a single entity with loads of navigation properties, and with Include it actually loaded over 15,000 rows (yes that's correct and what I would call a Cartesian Explosion problem). After I refactored the code to work with Select / SelectMany, that row count was reduced to 118. Query time reduced from 4s to below a second, even though we have exactly 20 includes)

Hope this helps somebody, and BIG thanks to Ivan.

Solution 3

There are many ways to increase performance.

I'll put here some and you can try each one to see who gives you the best results.

You can use the System.Diagnostics.StopWatch to get the elapsed execution time.

1. Indexes are missing (e.g. on foreign keys)

2. Write your query in a view in the database, it is much cheeper. You can also create indexed view for this query.

3. Try to load data in separate queries:

context.Configuration.LazyLoadingEnabled = false;
context.ContactTypes.Where(c => c.ContactID== contactId).Load();
context.ContactConnections.Where(c => c.ContactID== contactId).Load();
return context.Contacts.Find(contactId);

This loads all required data into the context's cache. Important: turn off lazy loading because the child collections are not marked as loaded in the entity state manager and EF will try to trigger lazy loading when you want to access them.

4. Replacing the Include with Select().Load():

var query = ctx.Users.Where(u => u.UserID== userId)
    .Join(ctx.Persons, p => p.PersonID, us => us.PersonID, (pr, ur) => ur);

query.Select(x => x.PersonIdentities).Load();
query.Select(x => x.PersonDetails).Load();
var result = query.ToList();

Remember: turn on tracking for to load the navigation property.

5. Separate includes, to multiple calls, limit to 2 includes in each call, and then loop to connect object properties.

Here is an example for single object fetch:

var contact= from c in db.Contacts
                        .Include(p=>p.ContactTypes)
                        .Include(p=>p.ContactConnections)
                        .FirstOrDefault();

var contact2= from c in db.Contacts
                    .Include(p=>p.ContactIdentities)
                    .Include(p=>p.Person)
                    .FirstOrDefault();
contact.ContactIdentities = contact2.ContactIdentities ;
contact.Person= contact2.Person;
return contact.
Share:
19,739
Corporalis
Author by

Corporalis

Updated on July 12, 2022

Comments

  • Corporalis
    Corporalis almost 2 years

    I have been looking at Entity Framework performance, particularly around the use of Includes and the time taken to both generate and execute the various queries.

    I am going to detail changes I have made, but please correct me if you think any of these assumptions are wrong.

    Firstly we have around 10,000 items (not many) in a DB and the database is significantly normalized (which results in a significant number of navigation properties). Currently, the approach is to lazy load everything, and given that requesting one item can spool off tens of DB requests, the performance is quite poor, particularly for larger sets of data. (This is an inherited project and step one is trying to improve performance without significant restructuring)

    So my first step was to take the results of a query and then apply the Includes for the navigation properties only to those results. I know this technically performs 2 queries, but if we have 10,000 items stored, but only want to return 10 items, it makes more sense to only include the navigation properties on those 10 items.

    Secondly, where multiple includes are used on a query result and that result set size is quite large, it still suffered from poor performance. I have been pragmatic about when to eager load and when to leave the lazy loading in place. My next change was to load query includes in batches, so performing:

    query.Include(q => q.MyInclude).Load();

    This once again significantly improved performance, although a few more DB calls (one for each batch of includes) it was quicker than a large query or at the very least reduced the overhead of Entity Framework trying to produce that large query.

    So the code now looks something like this:

        var query = ctx.Filters.Where(x => x.SessionId == id)
            .Join(ctx.Items, i => i.ItemId, fs => fs.Id, (f, fs) => fs);
        query
            .Include(x => x.ItemNav1)
            .Include(x => x.ItemNav2).Load();
    
        query
            .Include(x => x.ItemNav3)
            .Include(x => x.ItemNav4).Load();
    
        query
            .Include(x => x.ItemNav5)
            .Include(x => x.ItemNav6).Load();
                
    

    Now, this is reasonably performant, however, it would be nice to improve this further.

    I had considered using LoadAsync(), which after a bit more refactoring would be possible and would better fit with the rest of the architecture.

    However, you can only execute one query at a time on a DB context. So I was wondering if there was any way to possibly create a new DB context, perform LoadAsync() on each group of navigation properties (asynchronously) and then concatenate all of the results.

    I know technically how you might create a new context, fire off a LoadAsync() for each navigation group, but not how to concatenate the results, I don't know if it is definitely possible or whether it goes against good practice.

    So my question is; is this possible or, is there another way I can further improve performance? I'm trying to stick with what Entity Framework provides rather than crafting some stored procs. Thanks

    UPDATE

    Regarding the performance disparity, I'm seeing between using all Includes in one statement and Loading these in small groups. When running a query that returns 6000 items. (Using SQL profiler and VS diagnostics to determine times)

    Grouped Includes: In total takes ~8 seconds to execute the includes.

    Includes in one statement: SQL query is taking ~30 seconds to load. (Often getting timeouts)

    After a bit more investigation, I don't think there is much overhead when EF converts the SQL results to models. However we have seen nearly 500ms taken for EF to generate complex queries, which isn't ideal, but I'm not sure this can be resolved

    UPDATE 2

    With Ivan's help and following this https://msdn.microsoft.com/en-gb/data/hh949853.aspx we were able to improve things further, particularly using SelectMany. I would highly recommend the MSDN article to anyone attempting to improve their EF performance.

  • Corporalis
    Corporalis over 7 years
    I should have mentioned that we have gone through and indexed accordingly. I do think that the normalised database is probably the main reason why we are seeing these issues. When we had something similar to the answer you provided and the result set returned was more than 1000 items, there did seem to be a significant drop in performance. I possibly didn't make it clear in my question, but this is a search feature so it could return 10 items, it could return 5000. Ideally pagination would be implemented, but that's Step 2.
  • Corporalis
    Corporalis over 7 years
    I think what I'm getting at is, if a significant number of items are returned with a sizable number of includes, both the query and the EF query generation seem to suffer from poor performance. Thanks for the explanation
  • Vlad274
    Vlad274 over 7 years
    When you say the Query is slow, do you mean that it is literally slow (for example if you run via SSMS) or does it return slowly to the application (IE it's passing back a lot of data)?
  • Vlad274
    Vlad274 over 7 years
    Also, can you elaborate (preferable in the question so other people see it) what you mean by "the EF query generation"? This shouldn't have any issues, but might point to an issue in the entity configuration
  • Corporalis
    Corporalis over 7 years
    Thanks @Ivan I will try this when I'm back in the office.
  • Corporalis
    Corporalis over 7 years
    Thanks for the suggestion, I've spent some time looking at this today. I've had sql profiler running whilst comparing the original code your sample. When running your sample, I can see that the relevant queries are executed. The only difference being that the select part of the query only includes columns of the table specified within the navigation property. The original version is the same query with the properties on the original table as well. However, using your version when I attempt to access a property it actually lazy loads it as if the data wasn't pushed into the original query.
  • Corporalis
    Corporalis over 7 years
    Oddly the original code, although not producing vastly different SQL, results in the navigation properties being correctly eagerly loaded. Again thanks for your help.
  • Ivan Stoev
    Ivan Stoev over 7 years
    Weird. But yeah, the only difference in this attempt is not returning the fields of the main table along with the referenced table data, which is the whole point of executing separate queries. I'm pretty sure this loads the navigation properties, but probably doesn't mark them as loaded as with Include approach. Anyway, just out of curiosity, does it has any positive effect on performance of the loading part (ignoring the lazy load later), just the above code, what is the time compared to batch include approach?
  • Corporalis
    Corporalis over 7 years
    It is significantly better. The batched code takes 8~ seconds and your code is 5~ seconds. Predominantly there is one include which is significantly slow: batched: 3.3 seconds, your example: 2.5 seconds. That's a massive improvement, if I could just get the eager loading working, this would be a definite solution.
  • Ivan Stoev
    Ivan Stoev over 7 years
    I was playing with this before posting, and I'm pretty sure the loading worked similar to Include. My test was with simple 2 entities with one-to-many relationship, with query being on the many side entity. So the parent reference was loaded and no lazy load is triggered when accessing it (I have EF logging on). However, accessing the inverse collection property does trigger lazy loading, but the same happens with using Include. I'm running latest EF6.1.3 if that matters. Anyway, thanks for sharing the timings and good luck.
  • Corporalis
    Corporalis over 7 years
    Thanks for that info Ivan, I'll see if I can adapt things and get it working.
  • MrZander
    MrZander almost 7 years
    @Corporalis Did you ever get this method working? If so, could you post how?
  • MrZander
    MrZander almost 7 years
    Right now, I'm using your method of "batching" includes in pairs of two, and its significantly improved my query time (from 10s to 2.5s) but I know my dataset is going to grow and would like to improve it if possible.
  • Corporalis
    Corporalis almost 7 years
    @MrZander although this worked with our prototype applications, we encountered a few issues attempting to modify the vastly complex ef queries. After further consideration, what we actually ended up doing was making all ef calls asynchronous(multiple db contexts), and mapped the data manually, which performed significantly quicker again. More work, but it seemed like the best solution for our needs.
  • MrZander
    MrZander almost 7 years
    @Corporalis I was afraid that would be the answer. I find myself working around EF more often than working with it nowadays. Thanks!
  • Larcho
    Larcho about 3 years
    One observation I'd like to add is, this is heavily affected by many-to-many relationship items. The more your entity has these, the more efficient it gets separating them into individual queries. I believe it doesn't affect one-to-many items, so in those I still continue using Include (mainly because of the ThenInclude feature).
  • EluciusFTW
    EluciusFTW about 3 years
    You don't have to do this manually anymore, you can use .AsSplitQuery(), works like a charm!