When should I call SaveChanges() when creating 1000's of Entity Framework objects? (like during an import)

Solution 1

I would test it first to be sure; performance doesn't have to be that bad.

If you need to enter all rows in one transaction, call SaveChanges() after all of the AddToClassName() calls. If rows can be entered independently, save changes after every row. Database consistency is what matters.

I don't like the second option. It would be confusing to me (from the end user's perspective) if I imported data into the system and it declined 10 rows out of 1000 just because one was bad. You can try importing 10 at a time and, if that fails, try them one by one and then log the failures.

Test whether it takes a long time. Don't write 'probably'; you don't know yet. Only when it is actually a problem should you think about another solution (marc_s).

EDIT

I've done some tests (time in milliseconds):

10000 rows:

SaveChanges() after 1 row: 18510.534
SaveChanges() after 100 rows: 4350.3075
SaveChanges() after 10000 rows: 5233.0635

50000 rows:

SaveChanges() after 1 row: 78496.929
SaveChanges() after 500 rows: 22302.2835
SaveChanges() after 50000 rows: 24022.8765

So it is actually faster to commit after every n rows than to commit only once after all rows.

My recommendation is to:

  • SaveChanges() after every n rows.
  • If one commit fails, retry the rows one by one to find the faulty one (a sketch of this pattern follows below).
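
A minimal sketch of that batch-then-fallback pattern, reusing the CamelTrapEntities/AddToTestTable names from the test class below (the ImportBatch helper name and the Trace logging are illustrative, not part of the measured code):

private void ImportBatch(TestTable[] batch)
{
    try
    {
        // One commit for the whole batch.
        using (var context = new CamelTrapEntities())
        {
            foreach (var item in batch)
                context.AddToTestTable(item);
            context.SaveChanges();
        }
    }
    catch (Exception)
    {
        // The batch failed: retry each row in its own context to isolate the faulty one(s).
        // (Depending on EF version/entity type, you may need to detach or re-create the
        // entities before re-adding them to a new context.)
        foreach (var item in batch)
        {
            try
            {
                using (var context = new CamelTrapEntities())
                {
                    context.AddToTestTable(item);
                    context.SaveChanges();
                }
            }
            catch (Exception ex)
            {
                // Log and skip the faulty row; the rest of the batch still gets imported.
                System.Diagnostics.Trace.TraceError("Row failed: " + ex.Message);
            }
        }
    }
}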

Test classes:

TABLE:

CREATE TABLE [dbo].[TestTable](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [SomeInt] [int] NOT NULL,
    [SomeVarchar] [varchar](100) NOT NULL,
    [SomeOtherVarchar] [varchar](50) NOT NULL,
    [SomeOtherInt] [int] NULL,
 CONSTRAINT [PkTestTable] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Class:

using System;
using System.Data.EntityClient;
using System.Web.Mvc;

public class TestController : Controller
{
    //
    // GET: /Test/
    private readonly Random _rng = new Random();
    private const string _chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private string RandomString(int size)
    {
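        // Note: the resulting length is itself random (0 to size - 1), not fixed at 'size'.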
        var randomSize = _rng.Next(size);

        char[] buffer = new char[randomSize];

        for (int i = 0; i < randomSize; i++)
        {
            buffer[i] = _chars[_rng.Next(_chars.Length)];
        }
        return new string(buffer);
    }


    public ActionResult EFPerformance()
    {
        string result = "";

        TruncateTable();
        result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(10000, 1).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 100 rows:" + EFPerformanceTest(10000, 100).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 10000 rows:" + EFPerformanceTest(10000, 10000).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(50000, 1).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 500 rows:" + EFPerformanceTest(50000, 500).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 50000 rows:" + EFPerformanceTest(50000, 50000).TotalMilliseconds + "<br/>";
        TruncateTable();

        return Content(result);
    }

    private void TruncateTable()
    {
        using (var context = new CamelTrapEntities())
        {
            var connection = ((EntityConnection)context.Connection).StoreConnection;
            connection.Open();
            var command = connection.CreateCommand();
            command.CommandText = @"TRUNCATE TABLE TestTable";
            command.ExecuteNonQuery();
        }
    }

    private TimeSpan EFPerformanceTest(int noOfRows, int commitAfterRows)
    {
        var startDate = DateTime.Now;

        using (var context = new CamelTrapEntities())
        {
            for (int i = 1; i <= noOfRows; ++i)
            {
                var testItem = new TestTable();
                testItem.SomeVarchar = RandomString(100);
                testItem.SomeOtherVarchar = RandomString(50);
                testItem.SomeInt = _rng.Next(10000);
                testItem.SomeOtherInt = _rng.Next(200000);
                context.AddToTestTable(testItem);

                if (i % commitAfterRows == 0) context.SaveChanges();
            }
        }

        var endDate = DateTime.Now;

        return endDate.Subtract(startDate);
    }
}

Solution 2

I just optimized a very similar problem in my own code and would like to point out an optimization that worked for me.

I found that much of the time spent in SaveChanges, whether processing 100 or 1000 records at once, is CPU bound. So, by processing the contexts with a producer/consumer pattern (implemented with BlockingCollection), I was able to make much better use of CPU cores and went from a total of 4000 changes/second (as reported by the return value of SaveChanges) to over 14,000 changes/second. CPU utilization moved from about 13% (I have 8 cores) to about 60%. Even using multiple consumer threads, I barely taxed the (very fast) disk I/O system, and CPU utilization of SQL Server was no higher than 15%.

By offloading the saving to multiple threads, you have the ability to tune both the number of records prior to commit and the number of threads performing the commit operations.

I found that creating 1 producer thread and (# of CPU Cores)-1 consumer threads allowed me to tune the number of records committed per batch such that the count of items in the BlockingCollection fluctuated between 0 and 1 (after a consumer thread took one item). That way, there was just enough work for the consuming threads to work optimally.

This scenario of course requires creating a new context for every batch, which I find to be faster even in a single-threaded scenario for my use case.
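
For readers asking what this looks like in code (see Foyzul Karim's comment below), here is a rough, hedged sketch of the producer/consumer idea, not Eric J.'s actual implementation. It reuses the CamelTrapEntities/TestTable names from Solution 1; CreateTestItem, batchSize and totalRows are placeholders for your own data source and tuning values:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Bounded collection of batches; the small capacity keeps producer and consumers in step.
var batches = new BlockingCollection<List<TestTable>>(boundedCapacity: 2);

// Consumers: (cores - 1) threads, each committing whole batches on its own, fresh context.
var consumers = Enumerable.Range(0, Math.Max(1, Environment.ProcessorCount - 1))
    .Select(_ => Task.Run(() =>
    {
        foreach (var batch in batches.GetConsumingEnumerable())
        {
            using (var context = new CamelTrapEntities())   // new context per batch
            {
                foreach (var item in batch)
                    context.AddToTestTable(item);
                context.SaveChanges();
            }
        }
    }))
    .ToArray();

// Producer: builds batches on a single thread and hands them off to the consumers.
var current = new List<TestTable>();
for (int i = 0; i < totalRows; i++)
{
    current.Add(CreateTestItem());                  // CreateTestItem(): your own entity factory
    if (current.Count == batchSize)
    {
        batches.Add(current);
        current = new List<TestTable>();
    }
}
if (current.Count > 0) batches.Add(current);
batches.CompleteAdding();                           // lets GetConsumingEnumerable() complete

Task.WaitAll(consumers);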

Solution 3

If you need to import thousands of records, I'd use something like SqlBulkCopy, and not the Entity Framework for that.
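
Not EF, but for completeness, a minimal SqlBulkCopy sketch against the TestTable from Solution 1 (itemsToImport and connectionString are placeholders for your own objects and connection settings):

using System.Data;
using System.Data.SqlClient;

// Build a DataTable whose columns match the destination table, then bulk-insert it in one go.
var table = new DataTable();
table.Columns.Add("SomeInt", typeof(int));
table.Columns.Add("SomeVarchar", typeof(string));
table.Columns.Add("SomeOtherVarchar", typeof(string));
table.Columns.Add("SomeOtherInt", typeof(int));

foreach (var item in itemsToImport)   // itemsToImport: the in-memory objects to save
    table.Rows.Add(item.SomeInt, item.SomeVarchar, item.SomeOtherVarchar, item.SomeOtherInt);

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.TestTable";
        bulkCopy.ColumnMappings.Add("SomeInt", "SomeInt");
        bulkCopy.ColumnMappings.Add("SomeVarchar", "SomeVarchar");
        bulkCopy.ColumnMappings.Add("SomeOtherVarchar", "SomeOtherVarchar");
        bulkCopy.ColumnMappings.Add("SomeOtherInt", "SomeOtherInt");
        bulkCopy.WriteToServer(table);   // single bulk insert instead of thousands of INSERTs
    }
}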

Solution 4

Sorry, I know this thread is old, but I think this could help other people with this problem.

I had the same problem, but there is a possibility to validate the changes before you commit them. My code looks like this and it works fine. With chUser.LastUpdated I check whether it is a new entry or only a change, because it is not possible to reload an entry that is not in the database yet.

// Validate Changes
var invalidChanges = _userDatabase.GetValidationErrors();
foreach (var ch in invalidChanges)
{
    // Delete invalid User or Change
    var chUser = (db_User)ch.Entry.Entity;
    if (chUser.LastUpdated == null)
    {
        // Invalid, new User
        _userDatabase.db_User.Remove(chUser);
        Console.WriteLine("!Failed to create User: " + chUser.ContactUniqKey);
    }
    else
    {
        // Invalid Change of an Entry
        _userDatabase.Entry(chUser).Reload();
        Console.WriteLine("!Failed to update User: " + chUser.ContactUniqKey);
    }                    
}

_userDatabase.SaveChanges();
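
(Per Gert Arnold's comment below, an alternative to reloading is to mark the entry as unchanged so its pending changes are simply not saved, e.g. _userDatabase.Entry(chUser).State = EntityState.Unchanged, where EntityState lives in the System.Data.Entity namespace in EF 5/6. This variation is not part of the original answer.)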

Solution 5

Use a stored procedure.

  1. Create a user-defined table type in SQL Server.
  2. Create and populate an array of this type in your code (very fast).
  3. Pass the array to your stored procedure with one call (very fast).

I believe this would be the easiest and fastest way to do this.
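
A hedged sketch of that approach with a table-valued parameter, reusing the TestTable columns from Solution 1 (the type name, procedure name, itemsToImport and connectionString are all illustrative):

// Assumes a user-defined table type and a stored procedure along these lines (illustrative names):
//   CREATE TYPE dbo.TestTableType AS TABLE
//     (SomeInt int, SomeVarchar varchar(100), SomeOtherVarchar varchar(50), SomeOtherInt int);
//   CREATE PROCEDURE dbo.ImportTestTable @rows dbo.TestTableType READONLY AS
//     INSERT INTO dbo.TestTable (SomeInt, SomeVarchar, SomeOtherVarchar, SomeOtherInt)
//     SELECT SomeInt, SomeVarchar, SomeOtherVarchar, SomeOtherInt FROM @rows;
using System.Data;
using System.Data.SqlClient;

// Fill a DataTable shaped like the table type (one row per object to import).
var rows = new DataTable();
rows.Columns.Add("SomeInt", typeof(int));
rows.Columns.Add("SomeVarchar", typeof(string));
rows.Columns.Add("SomeOtherVarchar", typeof(string));
rows.Columns.Add("SomeOtherInt", typeof(int));
foreach (var item in itemsToImport)
    rows.Rows.Add(item.SomeInt, item.SomeVarchar, item.SomeOtherVarchar, item.SomeOtherInt);

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.ImportTestTable", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    var parameter = command.Parameters.AddWithValue("@rows", rows);
    parameter.SqlDbType = SqlDbType.Structured;      // table-valued parameter
    parameter.TypeName = "dbo.TestTableType";
    connection.Open();
    command.ExecuteNonQuery();                        // the whole set goes over in one call
}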

Author: John B

Updated on June 14, 2020

Comments

  • John B
    John B almost 4 years

    I am running an import that will have 1000's of records on each run. Just looking for some confirmation on my assumptions:

    Which of these makes the most sense:

    1. Run SaveChanges() every AddToClassName() call.
    2. Run SaveChanges() every n number of AddToClassName() calls.
    3. Run SaveChanges() after all of the AddToClassName() calls.

    The first option is probably slow, right? Since it will need to analyze the EF objects in memory, generate SQL, etc.

    I assume that the second option is the best of both worlds, since we can wrap a try/catch around that SaveChanges() call and only lose n records at a time if one of them fails. Maybe store each batch in a List<>: if the SaveChanges() call succeeds, get rid of the list; if it fails, log the items.

    The last option would probably end up being very slow as well, since every single EF object would have to be in memory until SaveChanges() is called. And if the save failed nothing would be committed, right?

  • John B
    John B over 14 years
    I hate it when people don't answer my question :) Well, let's say I "need" to use EF. What then?
  • marc_s
    marc_s over 14 years
    Well, if you really MUST use EF, then I would try to commit after a batch of say 500 or 1000 records. Otherwise, you'll end up using too many resources, and a failure would potentially roll back all 99999 rows you've updated when the 100000th one fails.
  • John B
    John B over 14 years
    The reason I wrote "probably" is that I made an educated guess. To make it more clear that "I'm not sure", I made it into a question. Also, I think it makes complete sense to think about potential problems BEFORE I run into them. That is the reason I asked this question. I was hoping someone would know which method would be most efficient, and I could go with that, right off the bat.
  • John B
    John B over 14 years
    Awesome dude. Exactly what I was looking for. Thank you for taking the time to test this! I'm guessing that I can store each batch in memory, try the commit, and then if it fails go through each one individually as you said. Then once that batch is done, release the references to those 100 items so they can be cleared out of memory. Thanks again!
  • LukLed
    LukLed over 14 years
    Memory will not be freed, because all objects will be held by ObjectContext, but having 50000 or 100000 in context doesn't take much space these days.
  • Julien N
    Julien N almost 14 years
    With the same issue, I ended up using SqlBulkCopy, which is way more performant than EF in that case, although I don't like using several ways to access the database.
  • Dennis Ward
    Dennis Ward over 13 years
    I am also looking into this solution as I have the same problem... Bulk copy would be an excellent solution, but my hosting service disallows use of it (and I would guess that others would too), so this isn't a viable option for some people.
  • Eric J.
    Eric J. over 11 years
    @marc_s: How do you handle the need to enforce business rules inherent in business objects when using SqlBulkCopy? I don't see how to not use EF without redundantly implementing the rules.
  • Shawn de Wet
    Shawn de Wet over 10 years
    I've actually found that the performance degrades with each successive call to SaveChanges(). The solution to this is to dispose the context after each SaveChanges() call and instantiate a new one for the next batch of data to be added.
  • LukLed
    LukLed over 10 years
    @ShawndeWet: If you look at the test class, this is what I actually did.
  • Shawn de Wet
    Shawn de Wet over 10 years
    @LukLed not quite...you're calling SaveChanges inside your For loop...so the code could carry on adding more items to be saved inside the for loop on the same instance of ctx and call SaveChanges later again on that same instance.
  • Michael Blackburn
    Michael Blackburn almost 8 years
    Typically on SO, claims of "this is fastest" need to be substantiated with test code and results.
  • Admin
    Admin almost 7 years
    Yes, it's about the same problem, right? With this, you can add all 1000 records and, before you run SaveChanges(), you can delete the ones that would cause an error.
  • Gert Arnold
    Gert Arnold almost 7 years
    But the emphasis of the question is on how many inserts/updates to commit efficiently in one SaveChanges call. You don't address that problem. Note that there are more potential reasons for SaveChanges to fail than validation errors. By the way, you can also just mark entities as Unchanged instead of reloading/deleting them.
  • Admin
    Admin almost 7 years
    You're right, it doesn't directly address the question, but I think most of the people who stumble over this thread are having a problem with validation, although there are other reasons SaveChanges fails. And this solves that problem. If this post really disturbs you in this thread I can delete it; my problem is solved, I'm just trying to help others.
  • Foyzul Karim
    Foyzul Karim over 6 years
    Hi, @eric-j could you please elaborate a little on this line "by processing the contexts with a producer/consumer pattern (implemented with BlockingCollection)" so that I can try it with my code?
  • Jeancarlo Fontalvo
    Jeancarlo Fontalvo over 4 years
    I have a question about this one. When you call GetValidationErrors(), does it "fake" a call to the database and retrieve the errors, or what? Thanks for replying :)
  • Jeancarlo Fontalvo
    Jeancarlo Fontalvo over 4 years
    Awesome answer. Exactly what I needed, but I was just wondering how we can save individual entries when a batch fails? Could you explain the implementation of that? Thanks :)