Excel to CSV with UTF-8 encoding
Solution 1
A simple workaround is to use Google Spreadsheet. Paste the data (values only, if you have complex formulas) or import the sheet, then download it as CSV. I just tried a few characters and it works rather well.
NOTE: Google Sheets does have limitations when importing. See here.
NOTE: Be careful of sensitive data with Google Sheets.
EDIT: Another alternative - these approaches use a VBA macro or add-in to force saving as UTF-8. I have not tried any of these solutions, but they sound reasonable.
Solution 2
I've found OpenOffice's spreadsheet application, Calc, is really good at handling CSV data.
In the "Save As..." dialog, click "Format Options" to get different encodings for CSV. LibreOffice works the same way AFAIK.
Solution 3
Save the Excel sheet as "Unicode Text (*.txt)". The good news is that the international characters survive, because the file is encoded in UTF-16 (note: not UTF-8). However, the new "*.txt" file is TAB-delimited, not comma-delimited, and therefore is not a true CSV.
(optional) Unless you can use a TAB-delimited file for your import, use your favorite text editor and replace the TAB characters with commas ",".
Import the *.txt file into the target application. Make sure it accepts UTF-16.
Because UTF-16 (when properly implemented, with support for non-BMP code points) encodes the same set of code points as UTF-8, you can convert a UTF-16 file to UTF-8 without losing information. I leave it to you to find your favourite method of doing so.
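The steps above can be sketched on a Unix shell. The file names and data here are hypothetical, and the sample file is generated in place to stand in for Excel's "Unicode Text" export:

```shell
# Make a sample UTF-16LE tab-delimited file, standing in for the
# "Unicode Text (*.txt)" export from Excel (hypothetical data).
printf 'name\tcity\nJosé\tMálaga\n' | iconv -f utf-8 -t utf-16le > export.txt

# Convert UTF-16 to UTF-8, then turn the tabs into commas.
# (A real Excel export starts with a byte-order mark; "-f utf-16"
# instead of "-f utf-16le" lets iconv use the BOM to detect endianness.)
iconv -f utf-16le -t utf-8 export.txt | tr '\t' ',' > export.csv

cat export.csv
# name,city
# José,Málaga
```

Note that `tr '\t' ','` is a naive replacement: if any field itself contains a comma, you would need proper CSV quoting instead.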
I use this procedure to import data from Excel to Moodle.
Solution 4
I know this is an old question but I happened to come upon this question while struggling with the same issues as the OP.
Not having found any of the offered solutions a viable option, I set out to discover if there is a way to do this just using Excel.
Fortunately, I found that (in my case) the lost-character issue only happens when saving directly from xlsx format to csv format. I tried saving the xlsx file as xls first, then as csv. It actually worked.
Please give it a try and see if it works for you. Good luck.
Solution 5
You can use the iconv command under Unix (also available on Windows via libiconv).
After saving as CSV from Excel, run on the command line:
iconv -f cp1250 -t utf-8 file-encoded-cp1250.csv > file-encoded-utf8.csv
(remember to replace cp1250 with your file's actual encoding).
It works fast and is great for big files, such as a postal-code database, which cannot be imported into Google Docs (400,000-cell limit).
Jeff Treuting
Long time developer/consultant and co-creator of SharpRepository. Currently working at one of the best places for developers to call home, Fairway Technologies.
Updated on July 28, 2022
Comments
-
Jeff Treuting over 1 year
I have an Excel file that has some Spanish characters (tildes, etc.) that I need to convert to a CSV file to use as an import file. However, when I do Save As CSV it mangles the "special" Spanish characters that aren't ASCII characters. It also seems to do this with the left and right quotes and long dashes that appear to come from the original user creating the Excel file on a Mac.
Since CSV is just a text file I'm sure it can handle a UTF8 encoding, so I'm guessing it is an Excel limitation, but I'm looking for a way to get from Excel to CSV and keep the non-ASCII characters intact.
-
Jeff Treuting over 13 yearsI'm sure this works, I just don't happen to have OpenOffice already, so Google Docs was easier in my situation. But thanks for the suggestion!
-
charlax about 12 yearsThanks! It's a simple solution, far better than struggling with Excel.
-
ax. over 11 years"Unicode Text" saves in UTF-16 LE (Little Endian), not in UTF-8 as asked by the OP.
-
Olivier 'Ölbaum' Scherler over 11 yearsThe question was specifically about UTF-8.
-
nevets1219 over 11 yearsAdded another possible solution (untested) without having to use Google Spreadsheet.
-
Christian Davén over 11 yearsOpenOffice Calc has more and better options when opening and saving "CSV" files (field separators, encoding etc) than both Google Docs and Excel. Also, Google Docs currently suffers from a limit of 400,000 cells per spreadsheet, which OpenOffice Calc does not.
-
Zane about 11 yearsNo idea why you were downvoted. Notepad++ did it for me. Can't store my file in Google Spreadsheet as it is confidential.
-
mcNux almost 11 yearsYes, but it is the best way to get Unicode support for your x-separated-values in Excel. I've had all sorts of issues trying to get Excel to play ball with UTF-8!
-
Indolering almost 11 yearsPerhaps Joel Slotsky (former PM of Excel) could send them his decade-old post on The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets?
-
nikolas almost 11 yearsThat worked for me. I had a problem with Greek characters when I exported a MySQL db as CSV and imported it into Excel.
-
user56reinstatemonica8 over 10 yearsFor me, on Excel for Mac 2011, this works but only if I choose "Windows comma separated (CSV)". It doesn't work if I use the default or DOS CSV options - both replace the accented characters with random junk characters. Tested for characters including é, è, â... Don't know if it's real UTF8 but the characters aren't mangled.
-
user56reinstatemonica8 over 10 yearsQuick confirmation - the files produced with this method (on Excel for Mac 2011) are not UTF-8 CSVs, BUT they are CSVs that at least contain the correct characters and can therefore be painlessly converted to UTF-8 in a text editor, which is a big step forward from the ludicrous mangled junk that Excel spits out by default.
-
cbmanica over 10 yearsYes, agreed, this worked for me as well (Excel Mac 2011) and it really deserves more upvotes.
-
Fer over 10 yearsThe normal Windows Notepad (using Save As, then choosing UTF-8 in the encoding option) worked for me. That's the best approach in my case, since this must be done by users who have no admin rights on their machines, so installing additional software is not required.
-
Robert Cutajar over 10 yearsOpenOffice JustWorks(tm) with no hassle - I think it should be included
-
nevets1219 over 10 yearsI have no experience with OpenOffice but Yessus has submitted something in regards to OpenOffice - if it's not the same thing you should add it as answer
-
Mattias Lindberg about 10 yearsFinally something that worked! Tried the above Excel options on Excel 2013 without success. I just switched to using \t as split character when I parsed it and it worked perfect!
-
jameshfisher about 10 years"Utf8 is a subset of Unicode": this makes no sense. UTF-8 is an encoding of Unicode.
-
Avatar almost 10 yearsNeeded to export an XLS as CSV to import in MySQL. Using Excel 2003 I exported in format "Unicode Text (.txt)", then used Notepad++ to replace the TAB with ";", then imported the txt-file into phpMyAdmin with default "Character set of the file: utf-8", Format "CSV using LOAD DATA". All encoding was transferred correctly.
-
elomage almost 10 yearsFor smaller files and simple data I use this shortcut: select the area or columns that I want to export, then copy (Ctrl+C). Then go to my text editor (for example, Gedit in Ubuntu) and paste. The text editor has all the data as TAB delimited. Then save as a text file, usually in UTF8. The final format may depend on your text editor configuration.
-
Sebastian over 9 yearsThat is useless if your content contains characters which cannot be encoded in cp1250; a better way would be to export as "Unicode .txt" in Excel and use iconv to convert from UTF-16. Maybe also do a sed or tr to translate from '\t' to ','.
-
JinSnow over 9 years@nevets1219 I did import the CSV into Google Spreadsheet but it didn't affect my character issue. Forgetting excel/google and using open office calc did solve my UTF8 issue (inserting my CSV into Excel 2013 as a "text" also failed by the way)
-
claymation over 9 yearsExcel's default encoding seems to be CP858 when saving as CSV or MS-DOS CSV, and Windows 1252 when saving as Windows CSV (as tested on Excel for Mac 2011).
-
lrepolho over 9 yearsThat's the best workaround for me! Now I can import files via phpMyAdmin easily!
-
Seb over 9 yearsUhhh. I just had a shiver down my back. What if your Excel file has 200,000 lines? Or contains sensitive data you do not want to include in an Excel spreadsheet? Use OpenOffice/LibreOffice if you need to.
-
nevets1219 over 9 years@Seb use the solution that fits the problem. I'm pretty sure Google spreadsheet can handle large files and they are private if you want them to be. Still a local application will do better in both cases if it's available.
-
UKDataGeek over 9 yearsI just tried it - and fyi you need to replace the whole file, not just copy the contents as Excel seems to break the text format of the file not the contents. Not sure how they broke it but Google Spreadsheets worked perfectly and as expected.
-
Rupert Rawnsley over 9 yearsI can confirm LibreOffice also works: it offers character encoding options on export that are sadly lacking in Excel.
-
Sruit A.Suk over 9 years^ it's an application in Mac OSX
-
Oskar Limka over 8 yearsThank you. This makes sense. Why MS still refuses to use UTF as a standard beats me.
-
EM0 over 8 yearsThis didn't quite work for me (using Excel 2007). I had 2 non-ASCII characters in my file and one of them was saved OK this way, the other wasn't.
-
jogojapan over 8 yearsThis doesn't work for me. The way I understand it, it only takes effect when saving in a web format (HTML etc.), not when saving as CSV.
-
jogojapan over 8 yearsIt is an answer in the sense that it results in a tab-delimited (rather than comma-delimited) file, which is something most people (including me) can work with.
-
Elia Weiss over 8 yearsWorks for me - office 2007
-
Thomas Andrews over 8 yearsObviously (2) has a limitation - if any column has a comma (,) in it, then you have to quote that column. (Or does the international option quote all columns?)
-
Matthew Rodatus over 8 yearsThis doesn't work because your instructions don't include the necessary step of converting tabs to commas in the file data.
-
Vadzim about 8 yearsHere is direct link to Jaimon's UnicodeCSV Addin: jaimonmathew.wordpress.com/2011/08/23/…
-
Ivo Pereira about 8 yearsGoogle Spreadsheet does not let you choose the delimiter when exporting, so you need to stick with comma if you export a CSV.
-
nevets1219 about 8 yearsCSV gives you comma delimited and text gives you tab delimited. But I do believe you are correct about delimiters.
-
Flimm about 8 yearsI completely agree that this is useless, as when Excel saves in .csv format, it loses information when it comes to Unicode code points that can't be encoded in a one-byte-per-code-point encoding.
-
Flimm about 8 yearsExcel's "Unicode text" is UTF-16, not UTF-8 as asked for in the question.
-
Flimm about 8 yearsThe question is about converting an Excel file to a UTF-8 encoded CSV file. This answer starts with a UTF-8 encoded CSV file!
-
Flimm about 8 yearsI'm not sure I can find this "Encoding -> Convert to Ansi" option.
-
Flimm about 8 yearsNope, because UTF-16 is not the same as UTF-8, which is what the question is about.
-
Flimm about 8 yearsThe trouble with this solution is that it is lossy. If you have code points that don't fit in a one-byte-per-code-point encoding, then those characters will get lost in the transition.
-
Flimm about 8 years"Unicode text" in Excel means UTF-16, and not UTF-8, which is what the question asked for.
-
Flimm about 8 years@OskarLimka: When you say "UTF", do you mean UTF-8 or UTF-16? Because Microsoft does use UTF-16 quite a bit.
-
Flimm about 8 yearsNote that the resulting CSV file will be in UTF-16, not UTF-8 as the question asked.
-
Flimm about 8 yearsThere is no reason why UTF-8 would not work with Spanish characters.
-
Flimm about 8 yearsThe problem with this is how you generate the CSV file in the first place. If you simply save as CSV file in Excel, it will save it in cp1252, which is a one-byte-per-code-point encoding, and therefore loses information.
-
Flimm about 8 yearsThe trouble with this answer is how you generate the CSV file in the first place. If you simply save as CSV from Excel, the encoding will be cp1252, which is a one-byte-per-code-point encoding, and therefore will lose information for characters that don't fit into that. Also, at the end, you should convert to UTF-8, and not to Ansi, if you want to do what the question asked for.
-
Flimm about 8 yearsThis is incorrect, because of step 2, saving as CSV. The trouble with this is that Excel save the CSV file in cp1252, which is a one-byte-per-code-point encoding. This leads to loss of information for the characters which can't fit in one byte.
-
Flimm about 8 yearsThe trouble with this answer is how you generated the CSV file in the first place. If you did that by simpling saving as CSV in Excel, the CSV file will be in cp1252, which is a one-byte-per-code-point encoding. This will lead to loss of information when it comes to characters that cannot fit in one byte.
-
mpowered about 8 yearsOr choose a different output encoding, can't fix everyone's problems with a single answer :-)
-
Flimm about 8 yearsI know it's unfortunate, but the fact is that this answer will cause problems for some users. The fact that you've only ever run this with characters that fit in cp1252 is just lucky, you provided no warning about this risk in the answer, you just said that it "worked without a hitch". You've not been asked to fix everyone's problems, just OP's, which a lot of people share.
-
Flimm about 8 yearsThere you go, I fixed the issue in the answer post and removed the downvote.
-
RedYeti about 8 yearsWell, that's a problem with any of these answers. And will be with any answer to how to convert to UTF-8 since there's no way to know or control what the original encoding was.
-
Oskar Limka about 8 years@Flimm, I meant UTF-8, I'm not very familiar with UTF-16 as in my work I need to have full compatibility with previous ASCII and ISO-8859-1 files. I'm not sure UTF-16 is backward compatible in this respect.
-
Flimm about 8 yearsSome of the answers do talk about how to get round that issue, for instance, stackoverflow.com/a/15500052/247696
-
MLE over 7 yearsWhat if the file size is big? Google Sheets has a size limit of around 75 MB if I remember correctly. I have a file that is 700 MB.
-
felipeaf over 7 yearsIt worked for me, but I cannot understand why. My file generated by JavaScript is always read incorrectly by Excel (as a different encoding). But when I open it in Notepad and save as UTF-8, it works fine! So it looks like there is metadata with the encoding. How does it work? If Notepad can save UTF-8 CSV files and Excel can read them, is it possible for my program to generate correct UTF-8 CSV files that Excel can read?
-
Henry Rusted over 7 yearsThis is exactly what worked for me. The question is, for most people, not whether the file is encoded in UTF-8 but whether the 'special' characters are preserved. The fact that Excel doesn't do this, in 2016, forces us to jump through hoops.
-
Julian about 7 yearsI have updated the implementation a bit and created a gist gist.github.com/julianthome/2d8546e7bed869079ab0f409ae0faa87
-
Rade_303 about 7 yearsSome characters got converted to TAB characters in the file when I tried this method.
-
dof1985 almost 7 yearsFor a tab delimited file it may be better to use a .txt extension; .csv, i.e. comma separated values, is just confusing.
-
Nathan over 6 yearsThis is a great option for those working with sensitive data. The OpenOffice suite can be a portable install for those with locked down machines.
-
RolfBly over 6 yearsWorked for me. In an XSLX converted from Access. Diacritics and LF-only and ISO-date formats are all OK. One or another did not work with more popular solutions.
-
Jason Williams over 6 yearsThe text of non-ASCII Spanish characters will be changed from the original Spanish characters to the encoded strings used by ANSI. So the text will not be the same as the original.
-
Jason Williams over 6 yearsSimply opening an ANSI CSV in Notepad++ and changing the encoding of the file to UTF-8 causes the original non-ASCII Spanish characters to be converted into encoded strings which do not match the original text.
-
MrE over 6 yearsalso be careful with Numbers as it has limitation in the number of rows, and I have converted data like this before not realizing it had cropped some of it. Excel/CSV has much higher limits.
-
JD-V about 6 yearsIt doesn't help.
-
absolute about 6 yearsThis method worked perfectly for a CSV file with over 15,000 records, with lines of more than the 1024 character limit imposed by Notepad. Takes seconds, and doesn't use third party software. Thanks!
-
Felk about 6 yearsNot sure about old versions of NP++, but in the current version you can just select Encoding > Convert to UTF-8. replaces steps 2-5
-
rovyko about 6 yearsThis is by far the easiest answer IMO. I'm using Excel 2016 and found it saves to ANSI encoding by default, but got it into UTF-8 which is what I wanted.
-
rovyko about 6 yearsYou should provide some instructions on how to use that option.
-
Admin about 6 years"The good news is that all the international characters are in UTF16 (note, not in UTF8).": complete nonsense. UTF-8 and UTF-16 are two ways to encode the whole set of Unicode code points.
-
Admin about 6 years@dexgecko sure: it's called the "Save" button/menu. There are now four CSV export types in Excel 2016 on Windows: CSV saves as "ANSI" (more or less equivalent to Latin1, but that may be different with a non-western OS install, I'm not sure), UTF-8 CSV saves as UTF-8 with BOM, CSV (DOS) saves as CP850 (again, may depend on the install?) and CSV (Mac) saves as MacRoman.
-
Admin about 6 yearsHowever, Excel will only be able to read UTF-8 and ANSI CSV files (Excel can use the BOM to choose). To import from another encoding, rename to .txt, open from Excel (you then have a long list of encodings to choose from), and since it won't interpret the separator correctly, use the "convert" button to split the rows. You can also use this trick when importing a CSV from a different language convention (in French for instance, the field separator is a semicolon, as the comma is already used as the decimal separator).
-
Admin about 6 yearsA bit OT, but anyway: it is simpler with VBA, as you can use the Origin argument of Workbook.Open when you open a CSV file. But this will only work with the few encodings available for CSV. One may use Workbook.OpenText, for which the Origin can be any encoding, alas that does not work for a CSV file encoded in UTF-8 without BOM! But if the file has the .txt extension, it works! Looks like a bug to me.
-
rovyko about 6 years@Jean-ClaudeArbaut Strange, I don't see a UTF-8 CSV option in my Excel 2016.
-
Admin about 6 years@dexgecko Strange, indeed. According to this page (in French, sorry), the functionality was added in November 2016, in build version 1610. I currently have the 1802 version. However, I thought only Office 365 had that kind of evolution, and I'm using Office Pro 2016 (not the 365 flavor). Maybe try to update your Office.
-
Admin about 6 years@dexgecko A quick check on products.office.com seems to show the professional "one time" version is not sold any longer though. I don't know what happens to those like me who have this product. Thanks for having pointed this out!
-
James about 6 years@Kai Noack, if you are using Notepad++ anyway, you can just open the original non utf-8 csv and change the encoding from the Encoding option on the menu bar (might save you a few steps).
-
Daniel Maurer over 5 yearsI may get this wrong, but you can't save the file as a ".csv" in Notepad++ and that's what this is about.
-
Chloe over 5 yearsYes you can. You are only just saving a text file, and .csv is a text file. This answer will open the CSV file corrupted by Excel, fix it, then re-save it with the proper encoding.
-
Ryan over 5 yearsAhh, even when I tried using Google Sheets, I ran into the same problem. So maybe this PowerShell strategy would have worked. Here was my problem: for certain characters such as certain emojis, you need to use CHARACTER SET utf8mb4 as described here: stackoverflow.com/a/10959780/470749
-
Don Cruickshank over 5 years@Ryan That issue is specific to MySQL. I just tried the problematic character in that question and it worked fine in both Excel and PowerShell.
-
Don Cruickshank over 5 yearsThis option isn't available in my copy of Excel 2016. Are you using the Office 365 version?
-
Don Cruickshank about 5 yearsThis works because Excel provides a TSV version of the selection via the clipboard. You can equally well use Notepad instead of SublimeText but just remember to save with UTF-8 encoding if you do!
-
RolfBly over 4 yearsFWIW, This works on Excel 2016 version 1909 build 12026.20334. I'm pretty sure it did not work properly until a few weeks ago, cause I had to resort to Nick's solution above to keep diacritics intact. Possibly the fix was not in Excel itself, but in a Windows update.
-
Hart almost 4 yearsNot working on Office 2010, choosing any CSV profile.
-
thymaro almost 4 yearsWhat you are communicating has far reaching consequences. Employing this eminent method, we can attain a higher data quality! Bravo! (Quiz: why is this comment so quizzical?)
-
user85361 over 3 yearsIt doesn't work in my case
-
ztvmark over 3 yearstechlandia.com/guardar-csv-utf8-como_163552 "Save as File Type" -> "CSV (Comma delimited)" -> "Tools" -> "Web Options" "Encoding" > "Save this document as" -> "Unicode (UTF-8)"
-
ClemM over 3 yearsOld answer but still valid 10 years later. In my case I wanted to keep the semicolon separator, so I pasted the CSV content into Google Docs (because there was no choice of separator in Sheets), exported as .txt and renamed to .csv. Works.
-
Sjlver about 3 yearsWorked wonderfully! It's also possible to pipe the two commands together directly: Import-Csv ... | Export-Csv ... without using an intermediate $data variable.