BCP / Bulk Insert Fails (tab-delimited file)


Solution 1

I just saved the file as .csv and bulk inserted it with the following statement:

BULK INSERT dim_assessment FROM 'C:\Blabla\TestFile.csv'
WITH (
  FIRSTROW = 2,
  FIELDTERMINATOR = ',',
  ROWTERMINATOR = '\n'
);
GO

Returned Message

(22587 row(s) affected)

Loaded Data


Just notice that some data from the ICDName column has overflowed into the LoadDate column. To avoid this, use the pipe character (|) as the delimiter and run the same BULK INSERT statement with FIELDTERMINATOR = '|', and happy days.
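If you want to produce that pipe-delimited file without a manual Excel step (as asked in the comments below), a small script can re-delimit the export. This is a minimal sketch, not part of the original answer; the function name and file paths are hypothetical, and it assumes the source file is genuinely tab-separated:

```python
import csv

def tabs_to_pipes(src, dst):
    """Rewrite a tab-separated file as pipe-separated.

    csv.writer quotes any field that itself contains a pipe,
    so embedded delimiters in the data stay safe.
    """
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout, delimiter="|")
        for row in csv.reader(fin, delimiter="\t"):
            writer.writerow(row)
```

The converted file can then be loaded with the same BULK INSERT statement, changing only FIELDTERMINATOR to '|'.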

Solution 2

Your input file is in a terrible format.

Your format file and your BULK INSERT command both state that the end of a row is a carriage return and line feed combination, and that there are seven columns of data. However, if you open your CSV file in Notepad, you will quickly see that the carriage returns and line feeds are not observed correctly on Windows (meaning they must be something other than precisely \r\n). You can also see that there aren't actually seven columns of data, but five:

QueryType   QueryDate   APUID   AssessmentID    ICDCode ICDName LoadDate
PPIC    2013-11-20 10:23:14 11431   10963       Tremors
PPIC    2013-11-20 10:23:14 11431   11299       THUMB PAIN
PPIC    2013-11-20 10:23:14 11431   11348       Environmental allergies
...
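Before blaming the loader, it is worth confirming what the row terminators in the file actually are. A quick sketch (my own addition, with a hypothetical function name) that counts terminator bytes:

```python
def count_line_endings(path):
    """Count \r\n pairs, bare \n, and bare \r terminators in a file."""
    with open(path, "rb") as f:
        data = f.read()
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # newlines not preceded by \r
    cr = data.count(b"\r") - crlf   # carriage returns not followed by \n
    return {"\\r\\n": crlf, "\\n": lf, "\\r": cr}
```

If the counts show bare \n (or \r) terminators, then ROWTERMINATOR = '\r\n' in the format file will never match, and BCP will read straight past the intended row boundaries.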

Just looking at it visually, you can tell it isn't right; you need to get a better source file before throwing it over the wall at SQL Server and expecting it to be handled smoothly.


Author: DataRiver

Updated on December 12, 2020

Comments

  • DataRiver
    DataRiver over 3 years

I have been trying to import data (tab-delimited) into SQL Server. The source data is exported from IBM Cognos. Data can be downloaded from: sample data

    I have tried BCP / Bulk Insert, but it did not help. The original datafile contains a header row (which needs to be skipped).

    ================================== Schema:

    CREATE TABLE [dbo].[DIM_Assessment](
    [QueryType] [nvarchar](4000) NULL,
    [QueryDate] [nvarchar](4000) NULL,
    [APUID] [nvarchar](4000) NULL,
    [AssessmentID] [nvarchar](4000) NULL,
    [ICDCode] [nvarchar](4000) NULL,
    [ICDName] [nvarchar](4000) NULL,
    [LoadDate] [nvarchar](4000) NULL
    ) ON [PRIMARY]
    GO
    

    ============================= Format File generated using the following command

    bcp [dbname].dbo.dim_assessment format nul -c -f C:\config\dim_assessment.Fmt -S <IP>  -U sa -P Pwd
    

    Content of the format file:

    11.0
    7
    1       SQLCHAR             0       8000    "\t"     1     QueryType                    SQL_Latin1_General_CP1_CI_AS
    2       SQLCHAR             0       8000    "\t"     2     QueryDate                    SQL_Latin1_General_CP1_CI_AS
    3       SQLCHAR             0       8000    "\t"     3     APUID                        SQL_Latin1_General_CP1_CI_AS
    4       SQLCHAR             0       8000    "\t"     4     AssessmentID                 SQL_Latin1_General_CP1_CI_AS
    5       SQLCHAR             0       8000    "\t"     5     ICDCode                      SQL_Latin1_General_CP1_CI_AS
    6       SQLCHAR             0       8000    "\t"     6     ICDName                      SQL_Latin1_General_CP1_CI_AS
    7       SQLCHAR             0       8000    "\r\n"   7     LoadDate                     SQL_Latin1_General_CP1_CI_AS
    

    =============================

I tried importing data using BCP / Bulk Insert; however, none of them worked.

    bcp [dbname].dbo.dim_assessment IN C:\dim_assessment.dat -f C:\config\dim_assessment.Fmt -S <IP>  -U sa -P Pwd
    
    BULK INSERT dim_assessment FROM '\\dbserver\DIM_Assessment.dat'
    WITH (
      DATAFILETYPE = 'char',
      FIELDTERMINATOR = '\t',
      ROWTERMINATOR = '\r\n'
    );
    GO
    

Thank you in advance for your help!

  • DataRiver
    DataRiver over 10 years
    It is probably not feasible to add the delimiter (pipe).
  • DataRiver
    DataRiver over 10 years
Wow, I am excited to see this. When you say there should be an extra step, is there a way to save the tab-delimited file as a comma-delimited file in Excel, and then import the data from the command line? If that happens, I am gold!
  • DataRiver
    DataRiver over 10 years
Would you know of a way to save this tab-delimited file with pipes from the command line, or something that will not require manual intervention?
  • DataRiver
    DataRiver over 10 years
I don't know what to say. Probably, as others have explained, some columns are missing/blank, which is why you are seeing the data file in this bad format?
  • Adir D
    Adir D over 10 years
    The file was already labeled ".csv" but in tab-delimited format. Do you mean you opened it manually in Excel, and then re-saved it as CSV? Unfortunately you can't do that manually every time. The real fix is to get proper output in the first place, so you don't have to perform manual steps in between...
  • carveone
    carveone over 10 years
You should give this a try through the 'Save As' option of Excel. If you can do it manually, it means there's a way to do it programmatically. Plain T-SQL might not do the trick, but there is surely some command-line software available. Just google it.
  • DataRiver
    DataRiver over 10 years
I am searching! When I find something, I will post it here. Thanks.
  • DataRiver
    DataRiver over 10 years
OK, so I am able to save the file as CSV using VBA. When I import the file using bulk insert, it gets imported to the server. However, there is a small problem. Data in the file (when opened in Notepad) shows up as: PPIC,11/20/2013 10:23,11431,10963,,Tremors and PPIC,11/20/2013 10:23,11431,11592,,"Glioblastoma, Barin". The first row is inserted correctly. The second row inserts, but the last column is split across multiple columns (I believe it is because of the extra comma). How can I fix that?
  • carveone
    carveone over 10 years
If you are using VBA, then I advise you to go straight with Excel objects and tools and save the file as XLS (or try XML?). This will avoid the comma problem. You can then either continue with VBA and open a recordset based on your XLS file, or switch to an import procedure through SQL Server.
  • carveone
    carveone over 10 years
Another "greasy" solution would be to open your text file and change the comma into another character whenever the comma is between two quotes... you'll then be able to get a clean CSV file and bulk import it.
  • DataRiver
    DataRiver over 10 years
OK, so I managed to convert the source data (Excel CSV) to pipe-delimited using Windows PowerShell. The format file has now been updated to use "|" instead of "\t". While trying to do a bcp, I see the error: Starting copy... SQLState = S1000, NativeError = 0, Error = [Microsoft][SQL Server Native Client 11.0] Unexpected EOF encountered in BCP data-file. 21 rows copied. Network packet size (bytes): 4096. Clock Time (ms.) Total: 78, Average: (269.23 rows per sec.). Any help?
  • carveone
    carveone over 10 years
There is no specific CSV format for Excel. If you manage the file through Excel, save it as XML or XLSX. Don't leave it as CSV.
  • DataRiver
    DataRiver over 10 years
@AaronBertrand, so I have got the data saved as pipe-delimited (all programmatically). So far, I have an issue loading the data via bcp. I am getting the following error while trying to issue a bcp: Starting copy... SQLState = S1000, NativeError = 0, Error = [Microsoft][SQL Server Native Client 11.0] Unexpected EOF encountered in BCP data-file. 21 rows copied. Network packet size (bytes): 4096. Clock Time (ms.) Total: 78, Average: (269.23 rows per sec.)
  • DataRiver
    DataRiver over 10 years
It says 21 rows copied; however, in every row except the first, the first column has some junk values. I am at a loss as to what is going on here. It is a simple bcp in, with pipe delimiters specified.