Multiple concurrent calls to SqlCommand.BeginExecuteNonQuery using same SqlConnection


Solution 1

Well, at the extreme risk of receiving a lot of down votes, I have to comment on this one. Firstly, this is a good question, well stated to address the specific potential issue you mention. However, you have neglected to discuss the "lengthy" process you're trying to accomplish.

My experience has taught me one thing...

If the question you're asking is hard to answer, change the question.

Though I know very little of your specific problem, I think this applies neatly to your predicament. As others have mentioned: temp tables are nasty, creating your own tables for a specific task is nastier still, and updating large quantities of data in SQL is expensive.

Ask yourself "Can you avoid it all?"

Quite often people choose to implement extremely complicated logic in databases because they believe SQL can do it faster. In practice this is a flawed concept. Databases are storage/serialization devices: they are good at storing, updating, locating, and synchronizing access to data. They are not well equipped for processing complex operations. Even after Microsoft's (and others') bastardization of the database by injecting full development languages into it, it cannot perform as well as a well-written client (depending on the complexity of the operations, which I suspect you have surpassed).

As an example, suppose you have a database with around 2 GB of raw data and you want to produce a complex report or analysis on the entire data set. Simply put, 2 GB of memory is easy to come by: slurp the entire database (or the portion you need) into memory, using dictionaries or whatever to create the look-ups you will need. Depending on several factors, the whole thing will likely run several times faster than SQL, can easily be unit tested, and will be (IMHO) significantly easier to build, debug, and maintain than a nasty assortment of sprocs constructing dynamic SQL. Even with more than 2 GB of raw data, client caches can easily be created with several existing technologies (B-trees, ISAM, or the like).
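
A minimal sketch of that in-memory approach, assuming a hypothetical Orders table and a simple running total standing in for the complex calculation:

    // Sketch only: pull the working set into memory once and do the heavy
    // computation client-side with ordinary collections. The table, columns,
    // and aggregation below are hypothetical.
    using System.Collections.Generic;
    using System.Data.SqlClient;

    static class InMemorySketch
    {
        public static Dictionary<int, decimal> LoadTotalsPerCustomer(string connectionString)
        {
            var totals = new Dictionary<int, decimal>();
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("SELECT CustomerId, Amount FROM dbo.Orders", connection))
            {
                connection.Open();
                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        var customerId = reader.GetInt32(0);
                        var amount = reader.GetDecimal(1);

                        // The "complex calculation" would go here; a running total stands in for it.
                        decimal current;
                        totals.TryGetValue(customerId, out current);
                        totals[customerId] = current + amount;
                    }
                }
            }
            return totals;
        }
    }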

The product I work on today has 2.4 TB of data in the database, and we have not a single sproc, join statement, or even a non-equality WHERE clause.

But alas my advice may or may not be pertinent to your specific circumstances since I do not know your objectives or constraints. Hopefully, if nothing else, it will make you ask yourself:

"Am I asking the right question?"

Solution 2

You could use a producer-consumer pattern with two threads and two simultaneous but independent SQL connections.

The producer (first thread) owns the DataReader (first SQL connection) and writes its results to a blocking queue. The consumer (second thread) reads from the queue, issues the ExecuteNonQuery calls (second SQL connection), and writes to the temp table(s).
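
A minimal sketch of that pattern using BlockingCollection&lt;T&gt;; the source table, the stored procedure, and the calculation are hypothetical, and note that with two independent connections the write target cannot be a session-local #temp table:

    // Sketch only: the producer reads and computes on one connection while the
    // consumer writes on a second, independent connection.
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;
    using System.Threading.Tasks;

    static class ProducerConsumerSketch
    {
        public static void Run(string connectionString)
        {
            using (var queue = new BlockingCollection<KeyValuePair<int, decimal>>(boundedCapacity: 10000))
            {
                // Consumer: drains the queue and writes on its own connection.
                var consumer = Task.Factory.StartNew(() =>
                {
                    using (var writeConnection = new SqlConnection(connectionString))
                    {
                        writeConnection.Open();
                        foreach (var item in queue.GetConsumingEnumerable())
                        {
                            using (var update = new SqlCommand("dbo.SaveCalculatedValue", writeConnection))
                            {
                                update.CommandType = CommandType.StoredProcedure;
                                update.Parameters.AddWithValue("@Id", item.Key);
                                update.Parameters.AddWithValue("@Value", item.Value);
                                update.ExecuteNonQuery();
                            }
                        }
                    }
                });

                // Producer: reads and computes on a separate connection, then enqueues.
                using (var readConnection = new SqlConnection(connectionString))
                using (var select = new SqlCommand("SELECT Id, RawValue FROM dbo.SourceData", readConnection))
                {
                    readConnection.Open();
                    using (var reader = select.ExecuteReader())
                    {
                        while (reader.Read())
                        {
                            var id = reader.GetInt32(0);
                            var calculated = reader.GetDecimal(1) * 2m; // placeholder for the real calculation
                            queue.Add(new KeyValuePair<int, decimal>(id, calculated));
                        }
                    }
                }

                queue.CompleteAdding(); // tell the consumer no more work is coming
                consumer.Wait();
            }
        }
    }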

Another idea, in case your ExecuteNonQuery commands are basically multiple INSERTs: ExecuteNonQuery has an overload that takes a StringCollection, which sends multiple SQL statements as one operation.
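
If the statements really are plain INSERTs, a related sketch with stock System.Data.SqlClient is to concatenate several parameterized statements into a single CommandText so they reach the server in one round trip (the #Results table and its columns are hypothetical; SQL Server caps a command at 2100 parameters, so very large batches must be split):

    // Sketch only: several parameterized INSERTs sent as one batch.
    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Text;

    static class BatchInsertSketch
    {
        public static void InsertBatch(SqlConnection connection, IList<KeyValuePair<int, decimal>> rows)
        {
            if (rows.Count == 0) return;

            var sql = new StringBuilder();
            using (var command = new SqlCommand())
            {
                command.Connection = connection;
                for (var i = 0; i < rows.Count; i++)
                {
                    sql.AppendFormat("INSERT INTO #Results (Id, Value) VALUES (@id{0}, @val{0});", i);
                    command.Parameters.AddWithValue("@id" + i, rows[i].Key);
                    command.Parameters.AddWithValue("@val" + i, rows[i].Value);
                }
                command.CommandText = sql.ToString();
                command.ExecuteNonQuery(); // one round trip for the whole batch
            }
        }
    }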

Solution 3

There can be only one DataReader associated with a Command object, and there can be numerous Command objects associated with the same connection. The only thing you can't do here is use the same command with different parameters.

However, when you start a database transaction (implicit if not explicit), the resources associated with that transaction are locked until the transaction is committed or rolled back, and all processes that want to query those resources are put in a queue. SQL Server manages these queues pretty well. I had a few problems with deadlocks under high server load in SQL Server 2000, but there have been no such problems with later versions.

It is strange that you actually received a performance improvement. This makes me think you have a very large amount of data, which takes time to process when it is sent to SQL Server. When transmitting in chunks, less time is consumed because data transmission and data processing are performed concurrently.

Anyway, there shouldn't be any problem with that.

However, consider using CLR Assemblies (if this option is available) to process the information directly in the database engine, without TCP traffic.
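
A minimal SQLCLR sketch of that idea; the procedure name and the placeholder UPDATE are hypothetical, and the assembly still has to be catalogued with CREATE ASSEMBLY and exposed with CREATE PROCEDURE ... EXTERNAL NAME:

    // Sketch only: a CLR stored procedure that runs inside the database engine
    // and uses the in-process context connection, so the data never crosses the network.
    using System.Data.SqlClient;
    using Microsoft.SqlServer.Server;

    public class CalculationProcedures
    {
        [SqlProcedure]
        public static void ProcessResults()
        {
            // "context connection=true" reuses the caller's in-process connection.
            using (var connection = new SqlConnection("context connection=true"))
            {
                connection.Open();
                using (var command = new SqlCommand(
                    "UPDATE dbo.Results SET Value = Value * 2", connection)) // placeholder for the real logic
                {
                    command.ExecuteNonQuery();
                }
            }
        }
    }

Whether the complex calculation can be expressed reasonably in SQLCLR depends on what the C# client is doing today.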

Solution 4

Yes, really good question.

Perhaps you could use a feature that was introduced in SQL Server 2005, called MARS: http://msdn.microsoft.com/en-us/library/ms345109(v=sql.90).aspx

MARS allows for reusing the same connection for reads and writes, yet it has some limitations, and frankly, I don't know anyone who would use it.

From what I can see, though, it may be possible to look at your problem from a different angle. Instead of using temp tables and having to keep an eye on them through the whole process, which in the end has to be synchronous, you could create a set of permanent tables that contain an additional JobId column. Then you are not constrained to a single thread. You can have a table that keeps a history of the jobs; once you insert a row into it, you retrieve scope_identity() and pass that value through all elements of your algorithm. The tables can hold more than one copy of the results at a time, and any queries that read or update the data use the JobId as the set identifier. If you index the tables correctly, you will have a very smooth design that is far more scalable than the solution you are trying to implement now.
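
A minimal sketch of that design, assuming hypothetical Jobs and CalcResults tables keyed by JobId:

    // Sketch only: create a job row, capture its identity, and tag every
    // subsequent write with that JobId so each batch/thread can use its own connection.
    using System.Data.SqlClient;

    static class JobIdSketch
    {
        public static int StartJob(SqlConnection connection)
        {
            using (var command = new SqlCommand(
                "INSERT INTO dbo.Jobs (StartedAt) VALUES (GETDATE()); " +
                "SELECT CAST(SCOPE_IDENTITY() AS int);", connection))
            {
                return (int)command.ExecuteScalar();
            }
        }

        public static void SaveResult(SqlConnection connection, int jobId, int id, decimal value)
        {
            using (var command = new SqlCommand(
                "INSERT INTO dbo.CalcResults (JobId, Id, Value) VALUES (@jobId, @id, @value);", connection))
            {
                command.Parameters.AddWithValue("@jobId", jobId);
                command.Parameters.AddWithValue("@id", id);
                command.Parameters.AddWithValue("@value", value);
                command.ExecuteNonQuery();
            }
        }
    }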

Regards

Piotr

Solution 5

The question is whether running data modification statements asynchronously helps you at all. MARS is an acronym for Multiple Active Result Sets; a result set is the outcome of SELECT or FETCH statements, and in .NET it generally means you can have multiple DataReaders open on the same connection. But any data modification operation is considered atomic and must complete before another operation can be executed (or before retrieval from a result set can continue) - read about it here. So I think your asynchronous commands are, in the end, still executed sequentially.

You can use multiple connections if your main connection creates a global temporary table ##TempName instead of #Temp. A global temporary table is visible from other sessions while the main session is still active.
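
A minimal sketch of the global temp table idea; the table name and columns are hypothetical, and the main connection must stay open for the table's lifetime:

    // Sketch only: the main connection creates ##Results, so a second,
    // independent connection can write to it while the main connection stays open.
    using System.Data.SqlClient;

    static class GlobalTempSketch
    {
        public static void Run(string connectionString)
        {
            using (var mainConnection = new SqlConnection(connectionString))
            {
                mainConnection.Open();
                using (var create = new SqlCommand(
                    "CREATE TABLE ##Results (Id int PRIMARY KEY, Value decimal(18,4));", mainConnection))
                {
                    create.ExecuteNonQuery();
                }

                using (var workerConnection = new SqlConnection(connectionString))
                {
                    workerConnection.Open();
                    using (var insert = new SqlCommand(
                        "INSERT INTO ##Results (Id, Value) VALUES (1, 42.0);", workerConnection))
                    {
                        insert.ExecuteNonQuery();
                    }
                }

                // ##Results is dropped automatically once the last session referencing it closes.
            }
        }
    }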

Comments

  • robertburke
    robertburke almost 2 years

    I have some working C# code that uses a SqlConnection to create temp tables (e.g., #Foo), call stored procs to fill those temp tables and return results to the C# client, use C# to perform complex calculations on those results, and use the calculation results to update one of the temp tables created earlier.

    Because of the temp tables used throughout the process, we must have only one SqlConnection.

    I identified a performance bottleneck in updating the temp table with the calculation results. This code was already batching the updates to prevent the C# client from running out of memory. Each batch of calculated data was sent to a stored proc via SqlCommand.ExecuteNonQuery, and the sproc in turn updates the temp table. The code was spending most of its time in this call to ExecuteNonQuery.

    So, I changed it to BeginExecuteNonQuery, along with the code to wait on the threads and call EndExecuteNonQuery. This improved performance by about a third, but I am worried about having multiple concurrent calls to SqlCommand.BeginExecuteNonQuery using the same SqlConnection.

    Is this OK, or will I run into threading problems?

    Sorry for the long explanation.

    The MSDN docs state:

    The BeginExecuteNonQuery method returns immediately, but until the code executes the corresponding EndExecuteNonQuery method call, it must not execute any other calls that start a synchronous or asynchronous execution against the same SqlCommand object.

    This seems to imply that different SqlCommand objects can call BeginExecuteNonQuery before the first SqlCommand completes.

    Here is some code that illustrates the issue:

        private class SqlCommandData
        {
            public SqlCommand Command { get; set; }
            public IAsyncResult AsyncResult { get; set; }
        }
    
        public static void TestMultipleConcurrentBeginExecuteNonQueryCalls(string baseConnectionString)
        {
            var connectionStringBuilder = new SqlConnectionStringBuilder(baseConnectionString)
                                              {
                                                  MultipleActiveResultSets = true,
                                                  AsynchronousProcessing = true
                                              };
            using (var connection = new SqlConnection(connectionStringBuilder.ConnectionString))
            {
                connection.Open();
    
                // ELIDED - code that uses connection to do various Sql work
    
                SqlDataReader dataReader = null;
                    // in real code, this would be initialized from calls to SqlCommand.ExecuteReader, using same connection
    
                var commandDatas = new List<SqlCommandData>();
                var count = 0;
                const int maxCountPerJob = 10000;
                while (dataReader.Read())
                {
                    count++;
                    // ELIDED - do some calculations on data, too complex to do in SQL stored proc
                    if (count >= maxCountPerJob)
                    {
                        count = 0;
                        var commandData = new SqlCommandData
                                              {
                                                  Command = new SqlCommand {Connection = connection}
                                              };
                        // ELIDED - other initialization of command - used to send the results of calculation back to DB
                        commandData.AsyncResult = commandData.Command.BeginExecuteNonQuery();
                        commandDatas.Add(commandData);
                    }
                }
                dataReader.Close();
    
                WaitHandle.WaitAll(commandDatas.Select(c => c.AsyncResult.AsyncWaitHandle).ToArray());
                foreach (var commandData in commandDatas)
                {
                    commandData.Command.EndExecuteNonQuery(commandData.AsyncResult);
                    commandData.Command.Dispose();
                }
    
                // ELIDED - more code using same SqlConnection to do final work
    
                connection.Close();
            }
        }
    
  • robertburke
    robertburke almost 13 years
    Thanks, we are actually using MARS (see code above). I have been looking at SqlConnection with ILSpy, and it is certainly not thread-safe, so I will probably have to change from using a single connection. What I may do is create a unique table name for a global temp table, like ##Foo_{Guid}, and pass the table name to my procs. Then, I can open new SqlConnections and still access the global temp table, which will go away when the last SqlConnection is closed. The global temp tables are theoretically visible to everyone, but only my c# code knows the table name.
  • Piotr Rodak
    Piotr Rodak almost 13 years
    Indeed, I didn't notice. Why don't you go for permanent tables?
  • robertburke
    robertburke almost 13 years
    Global temp tables clean up automatically (although I will probably drop them when done with my process), and I think I should get close to or the same performance as permanent tables. I am taking incremental steps and trying not to change the database design much.
  • Piotr Rodak
    Piotr Rodak almost 13 years
    Forgive me, but I am not convinced this is the way to go :) While it will work, it will still be awkward to manage - you will have to ensure that at least one connection stays open so the tables are not deleted. If you add a GUID to the table names, you will have to use dynamic SQL to perform operations on the tables, which may have a negative impact on performance. If you receive a new requirement to store the data for audit/reporting reasons, you will have to copy it to permanent tables and write separate code to read from them. These are just a few reasons I see at the moment.
  • robertburke
    robertburke almost 13 years
    Thanks for the comments. I understand the concerns. In my case, the tables are truly temporary and are just used for some intermediate steps in a long chain of calculations.
  • robertburke
    robertburke almost 13 years
    Thanks, what you described is actually what I am doing, i.e., using multiple threadpool threads. For the purpose of this question, I simplified it to calling BeginExecuteNonQuery many times from one thread. In either case, I am using one SqlConnection, and that connection is being accessed by multiple threads, which is almost certainly a bad thing.
  • robertburke
    robertburke almost 13 years
    For example, I saw with ILSpy that there is an internal property on SqlConnection that holds the current SqlCommand, and SqlCommand sets this property to itself as part of BeginExecuteNonQuery and to null in EndExecuteNonQuery. Things like this will hose the state of the SqlConnection when SqlCommands sharing the same SqlConnection run concurrently and their calls are interleaved.
  • Davita
    Davita almost 13 years
    I wish I could upvote your answer 100 times. I received -5 for one of my answers, similar to yours :-)
  • oleschri
    oleschri almost 13 years
    Btw, be careful not to deadlock yourself by accessing the same objects from each of your connections!
  • csharptest.net
    csharptest.net almost 13 years
    @Davita Thx, and yes, answers like this are often beaten down ;) I wonder what the OP will do...
  • gordy
    gordy almost 13 years
    A database most certainly is not just for storage/serialization - if that's all you're using it for, then yes, as you say, you're probably better off with your own data structures. You mention B-trees and ISAM, so you must be aware that an RDBMS is simply a host of efficient implementations of those as well as a myriad of others (heaps, BSTs, TSTs, hashes, etc.), and if your complex problem can be reduced to one or more of those things, then the RDBMS is probably your best shot at solving it optimally.
  • csharptest.net
    csharptest.net almost 13 years
    @Gordy In an effort to steer away from the religious debate about what a database is and is not used for, I will admit an RDBMS can solve some problems with great efficiency. Still, there are many things it is not good at. The problem is people don't know when to stop: business logic gets 'sucked' into SQL like a black hole sucking the light out of the universe. I'm not talking about a select statement, or a group by, or even a single SQL statement. I'm talking about crap like this: bit.ly/na3cSi