Parallel doesn't work with Entity Framework

Solution 1

The underlying database connections that Entity Framework uses are not thread-safe. You will need to create a new context for each operation that you perform on another thread.

Your concern about how to parallelize the operation is a valid one; that many contexts are going to be expensive to open and close.

Instead, you might want to invert how you're thinking about parallelizing the code. It seems you're looping over a number of items and then calling the stored procedures serially for each item.

If you can, create a new Task<TResult> (or Task, if you don't need a result) for each procedure and then in that Task<TResult>, open a single context, loop through all of the items, and then execute the stored procedure. This way, you only have a number of contexts equal to the number of stored procedures that you are running in parallel.

Let's assume you have a MyDbContext with two stored procedures, DoSomething1 and DoSomething2, both of which take an instance of a class, MyItem.

Implementing the above would look something like:

// You'd probably want to materialize this into an IList<T> to avoid
// warnings about multiple iterations of an IEnumerable<T>.
// You definitely *don't* want this to be an IQueryable<T>
// returned from a context.
IEnumerable<MyItem> items = ...;

// The first stored procedure is called here.
Task t1 = Task.Run(() => { 
    // Create the context.
    using (var ctx = new MyDbContext())
    // Cycle through each item.
    foreach (MyItem item in items)
    {
        // Call the first stored procedure.
        // You'd, of course, have to do something with item here.
        ctx.DoSomething1(item);
    }
});

// The second stored procedure is called here.
Task t2 = Task.Run(() => { 
    // Create the context.
    using (var ctx = new MyDbContext())
    // Cycle through each item.
    foreach (MyItem item in items)
    {
        // Call the second stored procedure.
        // You'd, of course, have to do something with item here.
        ctx.DoSomething2(item);
    }
});

// Do something when both of the tasks are done.
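// For example, block until both tasks have completed
// (or await Task.WhenAll(t1, t2) from an async method).
Task.WaitAll(t1, t2);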

If you can't execute the stored procedures in parallel (each one depends on being run in a certain order), then you can still parallelize your operations; it's just a little more complex.

You would look at creating custom partitions across your items (using the static Create method on the Partitioner class). This will give you the means to get IEnumerator<T> implementations (note that this is not IEnumerable<T>, so you can't foreach over it).

For each IEnumerator<T> instance you get back, you'd create a new Task<TResult> (if you need a result), and in the Task<TResult> body, you would create the context and then cycle through the items returned by the IEnumerator<T>, calling the stored procedures in order.

That would look like this:

// Get the partitioner (Partitioner lives in System.Collections.Concurrent).
OrderablePartitioner<MyItem> partitioner = Partitioner.Create(items);

// Get the partitions.
// You'll have to set the parameter for the number of partitions here.
// See the link for creating custom partitions for more
// creation strategies.
IList<IEnumerator<MyItem>> partitions = partitioner.GetPartitions(
    Environment.ProcessorCount);

// Create a task for each partition.
Task[] tasks = partitions.Select(p => Task.Run(() => { 
        // Create the context.
        using (var ctx = new MyDbContext())
        // Remember, the IEnumerator<T> implementation
        // might implement IDisposable.
        using (p)
        // While there are items in p.
        while (p.MoveNext())
        {
            // Get the current item.
            MyItem current = p.Current;

            // Call the stored procedures, in order, for this item.
            ctx.DoSomething1(current);
            ctx.DoSomething2(current);
        }
    })).
    // ToArray is needed (or something to materialize the list) to
    // avoid deferred execution.
    ToArray();
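
You'd then wait for all of these tasks to complete, for example with:

Task.WaitAll(tasks);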

Solution 2

EF is not thread-safe, so you cannot use a single context across Parallel operations; each thread needs its own context.

Take a look at Entity Framework and Multi threading, and this article.
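
A minimal sketch of that idea, reusing the MyDbContext and DoSomething1 names from Solution 1 and assuming items is an already-materialized list:

Parallel.ForEach(items, item =>
{
    // Each iteration creates and disposes its own context, so no
    // DbContext instance is ever shared between threads.
    using (var ctx = new MyDbContext())
    {
        ctx.DoSomething1(item);
    }
});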

Solution 3

This is what I use and it works great. It additionally collects any exceptions that are thrown, and it has a debug mode (which runs the loop serially) that makes it far easier to track things down.

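// Note: as an extension method this must be declared inside a static class;
// Partitioner and ConcurrentQueue<T> come from System.Collections.Concurrent.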
public static ConcurrentQueue<Exception> Parallel<T>(this IEnumerable<T> items, Action<T> action, int? parallelCount = null, bool debugMode = false)
{
    var exceptions = new ConcurrentQueue<Exception>();
    if (debugMode)
    {
        foreach (var item in items)
        {
            try
            {
                action(item);
            }
            // Store the exception and continue with the loop.                     
            catch (Exception e)
            {
                exceptions.Enqueue(e);
            }
        }
    }
    else
    {
        var partitions = Partitioner.Create(items)
            .GetPartitions(parallelCount ?? Environment.ProcessorCount)
            .Select(partition => Task.Factory.StartNew(() =>
        {
            while (partition.MoveNext())
            {
                try
                {
                    action(partition.Current);
                }
                // Store the exception and continue with the loop.                     
                catch (Exception e)
                {
                    exceptions.Enqueue(e);
                }
            }
        }));
        Task.WaitAll(partitions.ToArray());
    }
    return exceptions;
}

You use it like the following, where db is the original DbContext and db.CreateInstance() creates a new instance using the same connection string.

        var batch = db.Set<SomeListToIterate>().ToList();
        var exceptions = batch.Parallel((item) =>
        {
            using (var batchDb = db.CreateInstance())
            {
                var batchTime = batchDb.GetDBTime();
                var someData = batchDb.Set<Permission>().Where(x => x.ID == item.ID).ToList();
                //do stuff to someData
                item.WasMigrated = true; //note that this record is attached to db not batchDb and will only be saved when db.SaveChanges() is called
                batchDb.SaveChanges();        
            }                
        });
        if (exceptions.Count > 0)
        {
            logger.Error("ContactRecordMigration : Content: Error processing one or more records", new AggregateException(exceptions));
            throw new AggregateException(exceptions); //optionally throw an exception
        }
        db.SaveChanges(); //save the item modifications

Comments

  • m0fo
    m0fo over 2 years

    I have a list of IDs, and I need to run several stored procedures on each ID.

    When I am using a standard foreach loop, it works OK, but when I have many records, it runs pretty slowly.

    I wanted to convert the code to work with EF, but I am getting an exception: "The underlying provider failed on Open".

    I am using this code inside the Parallel.ForEach:

    using (XmlEntities osContext = new XmlEntities())
    {
        //The code
    }
    

    But it still throws the exception.

    Any idea how I can use Parallel with EF? Do I need to create a new context for every procedure I am running? I have around 10 procedures, so I think it's very bad to create 10 contexts, one for each.

    • Matthew
      Matthew over 11 years
      I'm not a multithreaded guru, but if you're using transactions or your read/write gets locked, you may not get better performance than just doing it serially.
    • rene
      rene over 11 years
      I doubt hammering the stressed sql server with more threads will help performance...
    • m0fo
      m0fo over 11 years
      The SQL server is not stressed at all, that's why I want to use Parallel, and it will surely run faster.
    • m0fo
      m0fo over 11 years
      On the first loop it's crashing.
    • Aron
      Aron about 9 years
      Why would you want to use .ToParallel() on EF? It won't improve performance.
    • Eric J.
      Eric J. over 8 years
      @IdoLazar: Good chance you would be wrong. For my particular application, I find that around 6 parallel tasks is ideal. Even that is limited by lock contention that the EF team is looking at.
  • Claies
    Claies about 9 years
    you do know that this question was posted in 2012, right?
  • Roni Axelrad
    Roni Axelrad almost 5 years
    your last piece of code is brilliant, and it's clear you wrote it without even trying it, judging by the typos :) .. nice!
  • Gert Arnold
    Gert Arnold over 2 years
    It's not clear how this changed the scope of the context. What does the replacement look like with surrounding code? And why Context = null;?
  • Marc Roussel
    Marc Roussel over 2 years
    @Gert Arnold I can't explain why, I just gave a solution that solved my problem. Sorry for nulling the Context; it's an old habit of mine to always null my instances, even with a Dispose. You may omit it if you want, but it doesn't change a thing anyway. I wrote the whole answer to show what solved the EF issue for me, without knowing exactly why it solved it.
  • Gert Arnold
    Gert Arnold over 2 years
    Well, the way you show it here is pretty meaningless; using or Dispose doesn't make a (technical) difference.
  • Marc Roussel
    Marc Roussel over 2 years
    If what you say is true, then why is the result correct without using? Do you have any reasonable explanation for that? The important thing here is that a problem occurs with EF during a Parallel.ForEach with a using statement, and it works like a charm without it. I honestly don't know why, but I just wanted to help others based on my issue.
  • Gert Arnold
    Gert Arnold over 2 years
    What I say (already in the first comment) is that without any wider context it's impossible to see how this would make a difference.
  • Marc Roussel
    Marc Roussel over 2 years
    I agree. If I find more details I'll come back and post them! Thanks for clarifying.