How to use GroupBy in an asynchronous manner in EF Core 3.1?

11,100

Solution 1

I think the only way you have is just to do it something like this

var blogs = await context.Blogs
    .Where(blog => blog.Url.Contains("dotnet"))
    .ToListAsync();

var groupedBlogs = blogs.GroupBy(t => t.BlobNumber).Select(b => b).ToList();

Because GroupBy will be evaluated at client anyway

Solution 2

This query isn't trying to group data in the SQL/EF Core sense. There are no aggregations involved.

It's loading all detail rows and then batching them into different buckets on the client. EF Core isn't involved in this, this is a purely client-side operation. The equivalent would be :

var blogs = await context.Blogs
    .Where(blog => blog.Url.Contains("dotnet"))
    .ToListAsync();

var blogsByNum = blogs.ToLookup(t => t.BlobNumber);

Speeding up grouping

The batching/grouping/lookup operation is purely CPU bound, so the only way to accelerate it would be to parallelize it, ie use all CPUs to group the data eg :

var blogsByNum = blogs.AsParallel()
                      .ToLookup(t => t.BlobNumber);

ToLookup does more or less that GroupBy().ToList() does - it groups the rows into buckets based on a key

Grouping while loading

A different approach would be to load the results asynchronously and put them into buckets as they arrive. To do that, we need AsAsyncEnumerable(). ToListAsync() returns all the results at once, so it can't be used.

This approach is quite similar to what ToLookup does.


var blogs = await context.Blogs
    .Where(blog => blog.Url.Contains("dotnet"));

var blogsByNum=new Dictionary<string,List<Blog>>();

await foreach(var blog in blogs.AsAsyncEnumerable())
{
    if(blogsByNum.TryGetValue(blog.BlobNumber,out var blogList))
    {
        blogList.Add(blog);
    }
    else
    {
        blogsByNum[blog.BlobNumber=new List<Blog>(100){blog};
    }
}

The query is executed by the call to AsAsyncEnumerable(). The results arrive asynchronously though, so now we can add them to buckets while iterating.

The capacity parameter is used in the list constructor to avoid reallocations of the list's internal buffer.

Using System.LINQ.Async

Things would be a lot easier if we had LINQ operations for IAsyncEnumerable<> itself. This extension namespace provides just that. It's developed by the ReactiveX team. It's available through NuGet and the current major version is 4.0.

With this, we could just write :

var blogs = await context.Blogs
    .Where(blog => blog.Url.Contains("dotnet"));

var blogsByNum=await blogs.AsAsyncEnumerable()   individual rows asynchronously
                          .ToLookupAsync(blog=>blog.BlobNumber);

Or

var blogsByNum=await blogs.AsAsyncEnumerable()   
                          .GroupBy(blog=>blog.BlobNumber)
                          .Select(b=>b)
                          .ToListAsync();
Share:
11,100
Gary
Author by

Gary

Updated on June 06, 2022

Comments

  • Gary
    Gary almost 2 years

    When I use GroupBy as part of a LINQ query to EFCore, I get the error System.InvalidOperationException: Client-side GroupBy is not supported.

    This is because EF Core 3.1 attempts to evaluate queries on the server-side as much as possible, as opposed to evaluating them on the client-side, and the call cannot be translated to SQL.

    So the following statement does not work, and produces the error mentioned above:

    var blogs = await context.Blogs
        .Where(blog => blog.Url.Contains("dotnet"))
        .GroupBy(t => t.BlobNumber)
        .Select(b => b)
        .ToListAsync();
    

    Now apparently the solution is to use.AsEnumerable() or .ToList() before the call to GroupBy(), as that explicitly tells EF Core that you want to do the grouping client side. There is a discussion about this on GitHub and in the Microsoft docs.

    var blogs = context.Blogs
        .Where(blog => blog.Url.Contains("dotnet"))
        .AsEnumerable()
        .GroupBy(t => t.BlobNumber)
        .Select(b => b)
        .ToList();
    

    However, this is not asynchronous. How can I make it asynchronous?

    If I change AsEnumerable() to AsAsyncEnumerable(), I get an error. If I instead try to change AsEnumerable() to ToListAsync() then the GroupBy() command fails.

    I am thinking of wrapping it in a Task.FromResult, but would this actually be asynchronous? Or is the database query still synchronous and only the subsequent grouping is asynchronous?

    var blogs = await Task.FromResult(context.Blogs
        .Where(blog => blog.Url.Contains("dotnet"))
        .AsEnumerable()
        .GroupBy(t => t.BlobNumber)
        .Select(b => b)
        .ToList());
    

    Or if that doesn't work is there another way?