How do you import UTF-8 flat files into SQL Server 2008 R2?


Solution 1

Not true: you simply need to choose code page 65001 (UTF-8) for the flat file source.


Solution 2

  1. Convert your data file to UTF-16 Little Endian (it must be little endian, not big endian).
  2. Import it with bcp using the -w option (Unicode character format).
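Since the OP wants to avoid converting files by hand, the conversion step can be scripted. The following is a minimal Python sketch, assuming the source files are UTF-8 `.csv` files in one folder. The function names, the separate output folder, and the choice to write a UTF-16 LE byte order mark are my own assumptions, not part of the answer. Raw bytes are decoded instead of opening the file in text mode, so CRLF row terminators are kept as-is.

```python
from __future__ import annotations

from pathlib import Path


def utf8_to_utf16le(src: Path, dst: Path, bom: bool = True) -> None:
    """Re-encode one UTF-8 text file as UTF-16 little endian."""
    # Decode the raw bytes (not text mode) so CRLF row terminators survive;
    # "utf-8-sig" also drops a leading UTF-8 BOM if the writer emitted one.
    text = src.read_bytes().decode("utf-8-sig")
    data = text.encode("utf-16-le")  # explicitly little endian, no BOM yet
    if bom:
        data = b"\xff\xfe" + data  # UTF-16 LE byte order mark
    dst.write_bytes(data)


def convert_folder(folder: Path, out_dir: Path, pattern: str = "*.csv") -> list[Path]:
    """Convert every file in folder matching pattern, writing copies to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(folder.glob(pattern)):
        dst = out_dir / src.name  # same file name, separate folder
        utf8_to_utf16le(src, dst)
        converted.append(dst)
    return converted
```

Each converted file could then be loaded with something like `bcp MyDb.dbo.MyTable in out\file.csv -w -t, -S MyServer -T`, where the database, table, server, and field terminator are placeholders for your own values. Writing to a separate folder keeps a second run from picking up already-converted files.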

Solution 3

Because it didn't work for me at first, I want to add to Arthur's answer what live-love mentioned in the comments: you should change the string data types to NVARCHAR. You can do this on the Advanced tab by selecting the affected columns and setting their data type to Unicode string (DT_WSTR).


Solution 4

Just for reference, in case someone googles this and lands here like I did.


I tried the accepted answer a dozen times, with no success. In my case, my data file was a .csv flat file with a lot of accented characters, such as ç é ã á.

I also noticed that no matter which encoding I chose, the import used the 1252 (ANSI - Latin 1) encoding.

So the solution was to convert my .csv file from UTF-8 to that same 1252 (ANSI - Latin 1) encoding before importing it. I did the conversion using Notepad++.

After converting it, I did the regular import (through the SSMS Tasks -> "Import Data" wizard), selecting the 1252 (ANSI - Latin 1) encoding, and everything was imported correctly.
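If Notepad++ isn't practical (for example, with many files), the same conversion can be scripted. This is a minimal Python sketch, assuming the input is UTF-8 and the target is Windows code page 1252, the "ANSI - Latin 1" code page that covers characters like ç é ã á. The function name is hypothetical. It refuses to write a file if any character has no equivalent in code page 1252, rather than silently losing data.

```python
from pathlib import Path


def utf8_to_cp1252(src: Path, dst: Path) -> None:
    """Re-encode a UTF-8 file as Windows-1252 (code page 1252, ANSI - Latin 1)."""
    # Decode the raw bytes so CRLF line endings survive; "utf-8-sig" drops a BOM.
    text = src.read_bytes().decode("utf-8-sig")
    # errors="strict" raises UnicodeEncodeError for characters that code page
    # 1252 cannot represent (e.g. Cyrillic) instead of replacing them.
    dst.write_bytes(text.encode("cp1252", errors="strict"))
```

To convert a whole folder, call it in a loop over `Path(folder).glob("*.csv")`. If the data contains characters outside Latin 1, a single-byte code page will not work, and a Unicode route (UTF-16 with NVARCHAR columns) is the better fit.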


Environment:

SQL Server Web 2016

SQL Server Management Studio v17.9.1

Notepad++ v7.7.1


This also answers the OP's original question:

Is there anything I can do in order to get these flat files into the database, either by converting them before an insert or with a process that runs during the insert?

Author: Fastidious

Updated on July 09, 2022

Comments

  • Fastidious, almost 2 years ago

    I have a bunch of UTF-8 encoded flat files that need to be imported into a SQL Server 2008 R2 database. Bulk inserts are not able to identify the delimiters, nor do they seem to accept UTF-8.

    I understand that there are a number of articles on how SQL Server 2008 deals with UTF-8 encoding, but I'm looking for more up-to-date answers, as most of those articles are old.

    Is there anything I can do in order to get these flat files into the database, either by converting them before an insert or with a process that runs during the insert?

    I want to stay away from manually converting each one. Furthermore, the SSIS packages I've attempted to create can read and separate the data; they just can't seem to move it. :(

    The flat files are generated by Java. Switching the Java environment's encoding from UTF-8 to any other encoding has been unsuccessful.

    NOTE

    I have no intention of storing UTF-8 data. My delimiter comes through garbled because it's UTF-8 encoded, so SQL Server cannot recognize it when separating the columns and rows. That's it.

  • Fastidious, over 10 years ago
    Is there a solution that does this for all files at once? Maybe something not tied to SQL Server? I have no interest in storing UTF-8 characters. It's just that my delimiter is UTF-8 encoded, so SQL Server can't separate the columns or rows.
  • Michael-O, over 10 years ago
    How can a delimiter be a character encoding? Maybe I am missing something. I have already laid out in the comments of the answers I mentioned that you can either use iconv or write UTF-16 files directly in Java: simply supply the encoding to your PrintWriter.
  • Fastidious, over 10 years ago
    I've tried making Java write in an encoding other than UTF-8, both by passing the environment settings and by defining the encoding within Maven. I've even set the environment variables in Windows (the environment I'm running Java in) to pass the same settings to Java. All have failed. As for iconv, I have no access to Linux or Unix environments, if iconv is restricted to *nix.
  • Fastidious, over 10 years ago
    I made an SSIS package last night where I set the source file as UTF-8. The package could read the data, but the delimiter was garbage. So I defined the delimiter as the garbage characters SQL Server was picking up on. That separated the columns, and all the other data (what I was importing) was clean. I imported that directly into the database using the SSIS package with no problems. If I try to convert the data from UTF-8 to anything else with SSIS, it errors.
  • live-love, about 4 years ago
    Make sure you select NVARCHAR when you edit the mappings.
  • Fiach Reid, about 2 years ago
    I followed a similar approach, but converted the file to UTF-16-LE-BOM in Notepad++ and it finally imported with the right encoding. (Cyrillic text)
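The "garbage" delimiter in this thread is what happens when a non-ASCII delimiter is written as UTF-8 but read with a single-byte code page: its two UTF-8 bytes are shown as two separate characters. The thread never says which delimiter the Java writer uses, so this minimal Python sketch assumes a hypothetical `¦` (U+00A6) delimiter and code page 1252 on the reading side. It shows why declaring the garbled sequence as the delimiter, as Fastidious did in SSIS, happens to work when the field values are plain ASCII.

```python
# Hypothetical delimiter: the thread never names the actual character,
# so U+00A6 BROKEN BAR stands in for it here.
DELIMITER = "\u00a6"  # "¦"

row = DELIMITER.join(["1", "Smith", "New York"])
raw = row.encode("utf-8")  # on disk, "¦" becomes the two bytes C2 A6

# A reader assuming code page 1252 maps each byte to one character,
# so the one-character delimiter turns into two "garbage" characters.
misread = raw.decode("cp1252")
print(misread)  # prints 1Â¦SmithÂ¦New York

# Workaround from the thread: declare the garbled sequence as the delimiter.
print(misread.split("\u00c2\u00a6"))  # prints ['1', 'Smith', 'New York']

# Decoding as UTF-8 (code page 65001) recovers the real single delimiter.
print(raw.decode("utf-8").split(DELIMITER))  # prints ['1', 'Smith', 'New York']
```

The workaround breaks down as soon as a field value contains non-ASCII text, because those values are garbled the same way; reading the file with the correct encoding avoids the problem entirely.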