How to use BULK INSERT when rows depend on foreign key values?

The basic idea is to bulk insert your data into a staging table that doesn't have any restrictions or constraints - just bulk load the data as fast as you can.
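
A minimal sketch of that first step might look like this (the staging table, its columns, and the file path are hypothetical placeholders, loosely modeled on the question's schema):

    -- Staging table: plain nullable columns - no keys, no constraints, no identity
    CREATE TABLE AddressesStaging
    (
        AddressId    int NULL,
        StreetNumber int NULL,
        StreetId     int NULL,
        CityId       int NULL,
        SectorId     int NULL
    );

    -- Raw load, as fast as possible - nothing is validated here
    BULK INSERT AddressesStaging
        FROM N'C:\SomeFolder\Addresses.csv'
        WITH (
            FIRSTROW = 2
            , FIELDTERMINATOR = N','
            , ROWTERMINATOR = N'\n'
        );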

Once you have the data in the staging table, you need to start worrying about constraints etc. as you insert the data from the staging table into the real tables.

Here, you could e.g.

  • insert only those rows into your real work tables that match all the criteria (and mark them as "successfully inserted" in your staging table - see the sketch below)

  • handle all rows left in the staging table that weren't successfully inserted with some error / recovery process - whatever that could be: printing a report with all the "problem" rows, tossing them into an "error bin", or whatever - totally up to you.

Key point is: the actual BULK INSERT should go into a totally unconstrained table - just load the data as fast as you can - and only then, in a second step, start to worry about constraints and lookup data and references and stuff like that.
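
Continuing the hypothetical sketch above (and borrowing the Addresses schema from the question), that second step could look like this - the FK existence checks stand in for "all the criteria", and the Imported marker column is an assumption added after the load:

    -- Add a marker column after the raw load, so the CSV layout stays untouched
    ALTER TABLE AddressesStaging ADD Imported bit NOT NULL DEFAULT 0;

    -- Copy across only the rows whose lookups all resolve
    SET IDENTITY_INSERT Addresses ON;

    INSERT INTO Addresses (AddressId, StreetNumber, StreetId, CityId, SectorId)
    SELECT s.AddressId, s.StreetNumber, s.StreetId, s.CityId, s.SectorId
    FROM AddressesStaging s
    WHERE EXISTS (SELECT 1 FROM Cities c WHERE c.CityId = s.CityId)
      AND (s.StreetId IS NULL OR EXISTS (SELECT 1 FROM Streets st WHERE st.StreetId = s.StreetId))
      AND (s.SectorId IS NULL OR EXISTS (SELECT 1 FROM Sectors se WHERE se.SectorId = s.SectorId));

    SET IDENTITY_INSERT Addresses OFF;

    -- Mark what made it across; anything still at 0 goes to the
    -- error / recovery process
    UPDATE s
    SET s.Imported = 1
    FROM AddressesStaging s
    WHERE EXISTS (SELECT 1 FROM Addresses a WHERE a.AddressId = s.AddressId);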


Comments

  • Will Marcouiller
    Will Marcouiller about 2 years

    My question is related to this one I asked on ServerFault.

    Based on this, I've considered the use of BULK INSERT. I now understand that I have to prepare a file for each entity I want to save into the database. No matter what, I still wonder whether this BULK INSERT will avoid the memory issue on my system, as described in the referenced question on ServerFault.

    As for the Streets table, it's quite simple! I have only two cities and five sectors to care about as the foreign keys. But then, how about the Addresses? The Addresses table is structured like this:

    CREATE TABLE Addresses
    (
        AddressId int not null identity(1,1) primary key,
        StreetNumber int null,
        NumberSuffix_Value int not null DEFAULT 0,
        StreetId int null references Streets (StreetId),
        CityId int not null references Cities (CityId),
        SectorId int null references Sectors (SectorId)
    )
    

    As I said on ServerFault, I have about 35,000 addresses to insert. Shall I memorize all the IDs? =P

    And then, I now have the citizens to insert, who have an association with the addresses.

    CREATE TABLE Citizens
    (
        PersonId int not null identity(1,1) primary key,
        Surname nvarchar not null,
        FirstName nvarchar not null,
        IsActive bit,
        AddressId int null references Addresses (AddressId)
    )
    

    The only thing I can think of is to force the IDs to static values, but then I lose any flexibility that I had with my former INSERT..SELECT strategy.

    What are then my options?

    1. I force the IDs to always be the same; then I have to SET IDENTITY_INSERT <table> ON so that I can force the values into the table. This way, I always have the same IDs for each of my rows, just as suggested here (see the sketch after this list).

    2. How to BULK INSERT with foreign keys? I can't find any docs on this anywhere. =(
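
    For option 1, a minimal sketch of forcing an ID might look like this (the Name column and the values are hypothetical, since the Streets schema isn't shown in the question):

    SET IDENTITY_INSERT Streets ON;

    -- An explicit column list is mandatory when supplying a value
    -- for the identity column
    INSERT INTO Streets (StreetId, Name)
    VALUES (1, N'Main Street');

    SET IDENTITY_INSERT Streets OFF;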

    Thanks for your kind assistance!

    EDIT

    I edited in order to include the BULK INSERT SQL instruction that finally worked for me!

    I had my Excel workbook ready with the information I needed to insert. So, I simply created a few supplemental worksheets and began to write formulas in order to "import" the data into these new sheets - one for each of my entities:

    1. Streets;
    2. Addresses;
    3. Citizens.

    As for the two other entities, it wasn't worth bulk inserting them, as I had only two cities and five sectors (city subdivisions) to insert. Once both the cities and sectors were inserted, I noted their respective IDs and prepared my record sets for the bulk insert. Using the power of Excel to compute the values and to "import" the foreign keys was a charm in itself, by the way. Afterwards, I saved each of the worksheets to a separate CSV file. My records were then ready to be bulk inserted.
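
    For illustration, each CSV looked something like this (the header names and values here are made up; the point is that the first row holds the column headers, and the ID column carries the exact value the foreign keys will point to):

    StreetId,Name
    1,Main Street
    2,Church Street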

    USE [DatabaseName]
    GO
    
    -- Empty the tables child-first, so no FK constraint is violated
    delete from Citizens
    delete from Addresses
    delete from Streets
    
    BULK INSERT Streets
        FROM N'C:\SomeFolder\SomeSubfolder\Streets.csv'
        WITH (
            FIRSTROW = 2
            , KEEPIDENTITY
            , FIELDTERMINATOR = N','
            , ROWTERMINATOR = N'\n'
            , CODEPAGE = N'ACP'
        )
    GO
    
    • FIRSTROW

      Indicates the row number at which to begin the load. In my situation, my CSVs contained the column headers, so the second row was the one to begin with. That said, one could start anywhere in the file - say, at the 15th row.

    • KEEPIDENTITY

      Allows one to bulk insert the entity IDs specified in the file even though the table has an identity column. This parameter has the same effect as issuing SET IDENTITY_INSERT my_table ON before a row insert when you wish to insert with a precise ID.

    As for the other parameters, they speak for themselves.

    Now that this is explained, the same code was repeated for each of the two remaining entities, Addresses and Citizens. And because KEEPIDENTITY was specified, all of my foreign key values remained intact, even though my primary keys were set as identities in SQL Server.

    Only a few tweaks, though - just the exact same thing marc_s said in his answer: import your data as fast as you can into a staging table with no restrictions at all. This way, you're gonna make your life much easier, while following good practices nevertheless. =)

  • DaveE
    DaveE over 12 years
    Our app uses this exact strategy to load millions of rows at a time.
  • Will Marcouiller
    Will Marcouiller over 12 years
    This makes sense, after all. Data rows will always be easier to manipulate once they are inserted. Thanks for your two cents, Marc! =)
  • Will Marcouiller
    Will Marcouiller over 12 years
    I've finally successfully bulk-inserted the thousands of rows! As for the foreign keys, there is a special parameter, KEEPIDENTITY, which keeps the ID values supplied in the CSV file, so one may specify foreign key values there and expect them to be inserted as-is.