SqlBulkCopy performance


Why not use SSIS directly?

Anyway, if you're streaming from the parser into an IDataReader you're already on the right path. To optimize SqlBulkCopy itself you need to turn your focus to SQL Server. The key is minimally logged operations; the MSDN guidance on the subject (the prerequisites for minimal logging in bulk import, and the Data Loading Performance Guide) is required reading.
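On the client side the prerequisites boil down to very little code. Here is a minimal sketch, assuming a heap destination; the connection string, table name, and method name are placeholders, not anything from the original question:

```csharp
using System.Data;
using System.Data.SqlClient;

static class Loader
{
    // Minimal sketch, assuming a heap destination and placeholder names throughout.
    public static void BulkLoad(IDataReader source, string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            // TableLock requests the bulk-update (BU) lock that is one of the
            // prerequisites for minimally logged inserts into a heap.
            using (var bulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
            {
                bulk.DestinationTableName = "dbo.StagingTable"; // hypothetical target
                bulk.BulkCopyTimeout = 0;      // disable the 30-second default for large loads
                bulk.EnableStreaming = true;   // stream the IDataReader instead of buffering it
                bulk.WriteToServer(source);
            }
        }
    }
}
```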

If your target is a B-Tree (i.e. a clustered-indexed table), unfortunately one of the most important tenets of performant bulk insert, namely a sorted input rowset, cannot be declared. It's as simple as this: ADO.NET SqlClient has no equivalent of OleDb's SSPROP_FASTLOADOPTIONS -> ORDER(Column). Since the engine does not know the data is already sorted, it adds a Sort operator to the plan, which is not that bad except when it spills. To avoid spills, use a small batch size (~10k rows); see the sketch below. And see my original point: all of these are just options and clicks to set in SSIS rather than digging through the OleDb MSDN spec...
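For instance (sketch only, reusing the hypothetical names from the block above), the batch size is the one knob ADO.NET does expose for this:

```csharp
// Sketch only: with a clustered-index target, a modest batch size keeps each
// batch's Sort small enough to stay in memory rather than spill to tempdb.
using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.ClusteredTarget"; // hypothetical B-Tree target
    bulk.BatchSize = 10000; // ~10k rows per batch, per the spill-avoidance advice above
    bulk.WriteToServer(source);
}
```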

If your data stream is unsorted to start with, or the destination is a heap, then the point above is moot.

However, achieving minimal logging is still a must for decent performance.
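One server-side prerequisite is easy to sanity-check from the client: minimal logging is only possible under the SIMPLE or BULK_LOGGED recovery model. A sketch (the helper name is hypothetical; the catalog view is standard):

```csharp
using System;
using System.Data.SqlClient;

static class RecoveryCheck
{
    // Sketch: warn when the recovery model rules out minimal logging entirely.
    public static void CheckRecoveryModel(SqlConnection connection)
    {
        using (var cmd = new SqlCommand(
            "SELECT recovery_model_desc FROM sys.databases WHERE name = DB_NAME()",
            connection))
        {
            var model = (string)cmd.ExecuteScalar();
            if (model == "FULL")
                Console.WriteLine("FULL recovery model: the bulk load will be fully logged.");
        }
    }
}
```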


Comments

  • Michael S about 2 years

    I am working to increase the performance of bulk loads: hundreds of millions of records per day, and growing.

    I moved this over to use the IDataReader interface in lieu of data tables and got a noticeable performance boost (500,000 more records per minute). The current setup is:

    • A custom cached reader to parse the delimited files.
    • Wrapping the stream reader in a buffered stream.
    • A custom object reader class that enumerates over the objects and implements the IDataReader interface (see the sketch after this list).
    • SqlBulkCopy then writes to the server.
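    A minimal sketch of such an object reader (all names here are hypothetical, not the asker's actual class): with positional column mappings, SqlBulkCopy only needs Read(), FieldCount and GetValue(), so the remaining IDataReader members can simply throw.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Data;

    public sealed class ObjectDataReader<T> : IDataReader
    {
        private readonly IEnumerator<T> _rows;
        private readonly Func<T, object>[] _columns; // one accessor per destination column

        public ObjectDataReader(IEnumerable<T> rows, params Func<T, object>[] columns)
        {
            _rows = rows.GetEnumerator();
            _columns = columns;
        }

        // The members SqlBulkCopy actually uses:
        public int FieldCount => _columns.Length;
        public bool Read() => _rows.MoveNext();
        public object GetValue(int i) => _columns[i](_rows.Current) ?? DBNull.Value;

        public void Dispose() => _rows.Dispose();
        public void Close() { }
        public bool IsClosed => false;
        public int Depth => 0;
        public int RecordsAffected => -1;
        public bool NextResult() => false;
        public object this[int i] => GetValue(i);
        public bool IsDBNull(int i) => GetValue(i) is DBNull;

        // Everything below is unused with positional column mappings.
        public object this[string name] => throw new NotSupportedException();
        public string GetName(int i) => throw new NotSupportedException();
        public int GetOrdinal(string name) => throw new NotSupportedException();
        public DataTable GetSchemaTable() => throw new NotSupportedException();
        public Type GetFieldType(int i) => throw new NotSupportedException();
        public string GetDataTypeName(int i) => throw new NotSupportedException();
        public int GetValues(object[] values) => throw new NotSupportedException();
        public bool GetBoolean(int i) => (bool)GetValue(i);
        public byte GetByte(int i) => (byte)GetValue(i);
        public char GetChar(int i) => (char)GetValue(i);
        public DateTime GetDateTime(int i) => (DateTime)GetValue(i);
        public decimal GetDecimal(int i) => (decimal)GetValue(i);
        public double GetDouble(int i) => (double)GetValue(i);
        public float GetFloat(int i) => (float)GetValue(i);
        public Guid GetGuid(int i) => (Guid)GetValue(i);
        public short GetInt16(int i) => (short)GetValue(i);
        public int GetInt32(int i) => (int)GetValue(i);
        public long GetInt64(int i) => (long)GetValue(i);
        public string GetString(int i) => (string)GetValue(i);
        public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length) => throw new NotSupportedException();
        public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length) => throw new NotSupportedException();
        public IDataReader GetData(int i) => throw new NotSupportedException();
    }
    ```

    Usage would look something like `new ObjectDataReader<MyRecord>(records, r => r.Id, r => r.Name)` handed straight to WriteToServer.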

    The bulk of the performance bottleneck is directly in SqlBulkCopy.WriteToServer. If I unit test the process up to, but excluding, WriteToServer, it returns in roughly one minute; WriteToServer takes an additional 15+ minutes. The unit test runs on my local machine, on the same drive the database lives on, so it's not copying the data across the network.

    I am using a heap table (no indexes, clustered or nonclustered), and I have played around with various batch sizes without major differences in performance.

    There is a need to decrease the load times, so I am hoping someone might know a way to squeeze a little more blood out of this turnip.