What's the fastest way to bulk insert a lot of data in SQL Server (C# client)

Solution 1

Here's how you can disable/enable indexes in SQL Server:

--Disable Index
ALTER INDEX [IX_Users_UserID] ON SalesDB.Users DISABLE
GO
--Enable Index
ALTER INDEX [IX_Users_UserID] ON SalesDB.Users REBUILD

Here are some resources to help you find a solution:

Some bulk loading speed comparisons

Use SqlBulkCopy to Quickly Load Data from your Client to SQL Server

Optimizing Bulk Copy Performance

Definitely look into NOCHECK and TABLOCK options:

Table Hints (Transact-SQL)

INSERT (Transact-SQL)
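
As a rough illustration of the two options mentioned above, here is a minimal T-SQL sketch against the BulkData table from the question. The staging table [BulkDataStaging] is a hypothetical name introduced only for this example, and since BulkData as posted has no FOREIGN KEY or CHECK constraints, the NOCHECK statements would only matter if such constraints were added.

```sql
-- Assumption: [BulkDataStaging] is a hypothetical staging table with the
-- same columns as [BulkData]; it is not part of the original question.

-- Skip FOREIGN KEY and CHECK constraint validation during the load.
-- (NOCHECK does not affect PRIMARY KEY or UNIQUE constraints.)
ALTER TABLE [BulkData] NOCHECK CONSTRAINT ALL;

-- TABLOCK takes a table-level lock for the duration of the statement
-- instead of many row or page locks.
INSERT INTO [BulkData] WITH (TABLOCK)
    ([ContainerId], [BinId], [Sequence], [ItemId], [Left], [Top], [Right], [Bottom])
SELECT [ContainerId], [BinId], [Sequence], [ItemId], [Left], [Top], [Right], [Bottom]
FROM [BulkDataStaging]
ORDER BY [ContainerId], [BinId], [Sequence];

-- Re-enable the constraints and re-validate the existing rows.
ALTER TABLE [BulkData] WITH CHECK CHECK CONSTRAINT ALL;
```

On the C# side, the rough equivalent of the TABLOCK hint is passing SqlBulkCopyOptions.TableLock when constructing the SqlBulkCopy instance.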

Solution 2

You're already using SqlBulkCopy, which is a good start.

However, just using the SqlBulkCopy class does not necessarily mean that SQL will perform a bulk copy. In particular, there are a few requirements that must be met for SQL Server to perform an efficient bulk insert.

Further reading:

Out of curiosity, why is your index set up like that? It seems like ContainerId/BinId/Sequence is much better suited to be a nonclustered index. Is there a particular reason you wanted this index to be clustered?

Solution 3

My guess is that you'll see a dramatic improvement if you change that index to be nonclustered. This leaves you with two options:

  1. Change the index to nonclustered, and leave it as a heap table, without a clustered index
  2. Change the index to nonclustered, but then add a surrogate key (like "id") and make it an identity, primary key, and clustered index

Either one will speed up your inserts without noticeably slowing down your reads.

Think about it this way -- right now, you're telling SQL to do a bulk insert, but then you're asking SQL to reorder the entire table every time you add anything. With a nonclustered index, you'll add the records in whatever order they come in, and then build a separate index indicating their desired order.
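
Option 2 could look roughly like the sketch below. The column name [Id] and the constraint and index names are hypothetical, chosen only for illustration; bigint is an assumption to leave headroom for a large import.

```sql
-- Sketch of option 2: surrogate identity key as the clustered primary key,
-- with the original three-part key kept as a unique nonclustered index.
-- [Id], [PK_BulkData_Id] and [IX_BulkData_Container_Bin_Sequence] are
-- hypothetical names, not taken from the question.
CREATE TABLE [BulkData](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [ContainerId] [int] NOT NULL,
    [BinId] [smallint] NOT NULL,
    [Sequence] [smallint] NOT NULL,
    [ItemId] [int] NOT NULL,
    [Left] [smallint] NOT NULL,
    [Top] [smallint] NOT NULL,
    [Right] [smallint] NOT NULL,
    [Bottom] [smallint] NOT NULL,
    CONSTRAINT [PK_BulkData_Id] PRIMARY KEY CLUSTERED ([Id] ASC)
);

-- New rows are appended at the end of the clustered index; the desired
-- read order is maintained separately by this index.
CREATE UNIQUE NONCLUSTERED INDEX [IX_BulkData_Container_Bin_Sequence]
    ON [BulkData] ([ContainerId] ASC, [BinId] ASC, [Sequence] ASC);
```

For option 1, the same CREATE INDEX statement would apply to a table declared without any PRIMARY KEY CLUSTERED constraint.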

Solution 4

Have you tried using transactions?

From what you describe (the server spending 100% of its time committing to disk), it seems you are sending each row of data in its own atomic SQL statement, thus forcing the server to commit (write to disk) every single row.

If you used transactions instead, the server would only commit once at the end of the transaction.
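
A minimal sketch of the idea, assuming rows are being sent as individual INSERT statements against the BulkData table from the question. The literal values are made up for illustration.

```sql
-- Without an explicit transaction, each INSERT below would be committed
-- on its own. Wrapped like this, all three rows are committed together
-- at COMMIT.
BEGIN TRANSACTION;

INSERT INTO [BulkData]
    ([ContainerId], [BinId], [Sequence], [ItemId], [Left], [Top], [Right], [Bottom])
VALUES (1, 1, 0, 1001, 0, 0, 10, 10);

INSERT INTO [BulkData]
    ([ContainerId], [BinId], [Sequence], [ItemId], [Left], [Top], [Right], [Bottom])
VALUES (1, 1, 1, 1002, 10, 0, 20, 10);

INSERT INTO [BulkData]
    ([ContainerId], [BinId], [Sequence], [ItemId], [Left], [Top], [Right], [Bottom])
VALUES (1, 1, 2, 1003, 20, 0, 30, 10);

COMMIT TRANSACTION;
```

From C#, the same effect can be had by running the commands inside a SqlTransaction obtained from SqlConnection.BeginTransaction().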

For further help: What method are you using to insert data into the server? Updating a DataTable using a DataAdapter, or executing each statement as a string?

Solution 5

BCP - it's a pain to set up, but it's been around since the dawn of DBs and it's very very quick.

Unless you're inserting data in that order, the 3-part index will really slow things down. Applying it later will be slow too, but at least it happens in a separate second step.

Compound keys in SQL tend to be quite slow; the bigger the key, the slower.

Author by

Andrew

Updated on December 16, 2020

Comments

  • Andrew over 3 years

    I am hitting some performance bottlenecks with my C# client inserting bulk data into a SQL Server 2005 database and I'm looking for ways in which to speed up the process.

    I am already using the SqlClient.SqlBulkCopy (which is based on TDS) to speed up the data transfer across the wire which helped a lot, but I'm still looking for more.

    I have a simple table that looks like this:

     CREATE TABLE [BulkData](
     [ContainerId] [int] NOT NULL,
     [BinId] [smallint] NOT NULL,
     [Sequence] [smallint] NOT NULL,
     [ItemId] [int] NOT NULL,
     [Left] [smallint] NOT NULL,
     [Top] [smallint] NOT NULL,
     [Right] [smallint] NOT NULL,
     [Bottom] [smallint] NOT NULL,
     CONSTRAINT [PKBulkData] PRIMARY KEY CLUSTERED 
     (
      [ContainerId] ASC,
      [BinId] ASC,
      [Sequence] ASC
    ))
    

    I'm inserting data in chunks that average about 300 rows where ContainerId and BinId are constant in each chunk and the Sequence value is 0-n and the values are pre-sorted based on the primary key.

    The % Disk Time performance counter spends a lot of time at 100%, so it is clear that disk IO is the main issue, but the speeds I'm getting are several orders of magnitude below a raw file copy.

    Does it help any if I:

    1. Drop the Primary key while I am doing the inserting and recreate it later
    2. Do inserts into a temporary table with the same schema and periodically transfer them into the main table to keep the size of the table where insertions are happening small
    3. Anything else?

    -- Based on the responses I have gotten, let me clarify a little bit:

    Portman: I'm using a clustered index because when the data is all imported I will need to access data sequentially in that order. I don't particularly need the index to be there while importing the data. Is there any advantage to having a nonclustered PK index while doing the inserts as opposed to dropping the constraint entirely for import?

    Chopeen: The data is being generated remotely on many other machines (my SQL server can only handle about 10 currently, but I would love to be able to add more). It's not practical to run the entire process on the local machine because it would then have to process 50 times as much input data to generate the output.

    Jason: I am not doing any concurrent queries against the table during the import process. I will try dropping the primary key and see if that helps.