Excel to CSV with UTF8 encoding


Solution 1

A simple workaround is to use Google Spreadsheet. Paste into it (values only, if you have complex formulas) or import the sheet, then download the CSV. I just tried a few characters and it works rather well.

NOTE: Google Sheets does have limitations when importing. See here.

NOTE: Be careful of sensitive data with Google Sheets.

EDIT: Another alternative: basically, these use a VBA macro or add-ins to force the save as UTF-8. I have not tried any of these solutions, but they sound reasonable.

Solution 2

I've found OpenOffice's spreadsheet application, Calc, is really good at handling CSV data.

In the "Save As..." dialog, click "Format Options" to get different encodings for CSV. LibreOffice works the same way AFAIK.

[Screenshot: Calc's Save As dialog]

Solution 3

  1. Save the Excel sheet as "Unicode Text (.txt)". The good news is that all the international characters are preserved, encoded as UTF-16 (note: not UTF-8). However, the new "*.txt" file is TAB delimited, not comma delimited, and therefore is not a true CSV.

  2. (optional) Unless you can use a TAB delimited file for import, use your favorite text editor and replace the TAB characters with commas ",".

  3. Import your *.txt file in the target application. Make sure it can accept the UTF-16 format.

If UTF-16 has been properly implemented with support for non-BMP code points, then you can convert a UTF-16 file to UTF-8 without losing information. I leave it to you to find your favourite method of doing so.

I use this procedure to import data from Excel to Moodle.
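Steps 2 and 3 above (replacing tabs with commas, then converting UTF-16 to UTF-8) can be combined on the command line. A minimal sketch, assuming iconv is available, that no field itself contains an embedded tab or comma, and with data.txt/data.csv as placeholder filenames:

```shell
# Convert Excel's "Unicode Text" export (UTF-16, tab-delimited)
# into a UTF-8, comma-delimited CSV.
# Assumes no field contains an embedded tab or comma.
iconv -f UTF-16 -t UTF-8 data.txt | tr '\t' ',' > data.csv
```

If fields may contain commas, a proper CSV writer (e.g. a spreadsheet application) is safer than tr, since those fields would need quoting.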

Solution 4

I know this is an old question but I happened to come upon this question while struggling with the same issues as the OP.

Not having found any of the offered solutions a viable option, I set out to discover if there is a way to do this just using Excel.

Fortunately, I have found that the lost character issue only happens (in my case) when saving from xlsx format to csv format. I tried saving the xlsx file to xls first, then to csv. It actually worked.

Please give it a try and see if it works for you. Good luck.

Solution 5

You can use iconv command under Unix (also available on Windows as libiconv).

After saving as CSV under Excel in the command line put:

iconv -f cp1250 -t utf-8 file-encoded-cp1250.csv > file-encoded-utf8.csv

(remember to replace cp1250 with your encoding).

Works fast and great for big files, such as a postal-code database, which cannot be imported into Google Docs (400,000-cell limit).
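If you have several exported files, the same iconv conversion can be scripted in a loop. A minimal sketch, assuming the source files are all cp1250 (substitute your actual encoding):

```shell
# Convert every CSV in the current directory from cp1250 to UTF-8.
# cp1250 is an assumption; replace it with your real source encoding.
for f in *.csv; do
  iconv -f cp1250 -t utf-8 "$f" > "${f%.csv}-utf8.csv"
done
```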

Author: Jeff Treuting

Long time developer/consultant and co-creator of SharpRepository. Currently working at one of the best places for developers to call home, Fairway Technologies.

Updated on July 28, 2022

Comments

  • Jeff Treuting
    Jeff Treuting over 1 year

    I have an Excel file that has some Spanish characters (tildes, etc.) that I need to convert to a CSV file to use as an import file. However, when I do Save As CSV it mangles the "special" Spanish characters that aren't ASCII characters. It also seems to do this with the left and right quotes and long dashes that appear to be coming from the original user creating the Excel file in Mac.

    Since CSV is just a text file I'm sure it can handle a UTF8 encoding, so I'm guessing it is an Excel limitation, but I'm looking for a way to get from Excel to CSV and keep the non-ASCII characters intact.

  • Jeff Treuting
    Jeff Treuting over 13 years
    I'm sure this works; I just don't happen to have OpenOffice already, so Google Docs was easier in my situation. But thanks for the suggestion!
  • charlax
    charlax about 12 years
    Thanks! It's a simple solution, far better than struggling with Excel.
  • ax.
    ax. over 11 years
    "Unicode Text" saves in UTF-16 LE (Little Endian), not in UTF-8 as asked by the OP.
  • Olivier 'Ölbaum' Scherler
    Olivier 'Ölbaum' Scherler over 11 years
    The question was specifically about UTF-8.
  • nevets1219
    nevets1219 over 11 years
    Added another possible solution (untested) without having to use Google Spreadsheet.
  • Christian Davén
    Christian Davén over 11 years
    OpenOffice Calc has more and better options when opening and saving "CSV" files (field separators, encoding etc) than both Google Docs and Excel. Also, Google Docs currently suffers from a limit of 400,000 cells per spreadsheet, which OpenOffice Calc does not.
  • Zane
    Zane about 11 years
    No idea why you were downvoted. Notepad++ did it for me. Can't store my file in Google Spreadsheet as it is confidential.
  • mcNux
    mcNux almost 11 years
    Yes, but it is the best way to get Unicode support for your x-separated-values in Excel. I've had all sorts of issues trying to get Excel to play ball with UTF-8!
  • Indolering
    Indolering almost 11 years
    Perhaps Joel Slotsky (former PM of Excel) could send them his decade-old post on The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets?
  • nikolas
    nikolas almost 11 years
    That worked for me. I had a problem with Greek characters when I exported a MySQL db as CSV and imported it into Excel.
  • user56reinstatemonica8
    user56reinstatemonica8 over 10 years
    For me, on Excel for Mac 2011, this works but only if I choose Windows comma separated (CSV). It doesn't work if I use the default or DOS CSV options - both these replace the accented characters with random junk characters. Tested for characters including é,è,â... Don't know if it's real UTF8 but the characters aren't mangled.
  • user56reinstatemonica8
    user56reinstatemonica8 over 10 years
    Quick confirmation - the files produced with this method (on Excel for Mac 2011) are not UTF-8 CSVs, BUT they do at least contain the correct characters and can therefore be painlessly converted to UTF-8 in a text editor, which is a big step forward from the ludicrous mangled junk that Excel spits out by default.
  • cbmanica
    cbmanica over 10 years
    Yes, agreed, this worked for me as well (Excel Mac 2011) and it really deserves more upvotes.
  • Fer
    Fer over 10 years
    The normal Windows Notepad (using Save As, then choosing UTF-8 in the encoding option) worked for me. That's the best approach in my case, since this must be done by users who have no admin rights on their machines, so installing additional software is not required.
  • Robert Cutajar
    Robert Cutajar over 10 years
    OpenOffice JustWorks(tm) with no hassle - I think it should be included
  • nevets1219
    nevets1219 over 10 years
    I have no experience with OpenOffice but Yessus has submitted something in regards to OpenOffice - if it's not the same thing you should add it as answer
  • Mattias Lindberg
    Mattias Lindberg about 10 years
    Finally, something that worked! I tried the above Excel options on Excel 2013 without success. I just switched to using \t as the split character when I parsed it, and it worked perfectly!
  • jameshfisher
    jameshfisher about 10 years
    "Utf8 is a subset of Unicode": this makes no sense. UTF-8 is an encoding of Unicode.
  • Avatar
    Avatar almost 10 years
    Needed to export an XLS as CSV to import in MySQL. Using Excel 2003 I exported in format "Unicode Text (.txt)", then used Notepad++ to replace the TAB with ;, then imported the txt-file into phpmyadmin with default "Character set of the file: utf-8", Format "CSV using LOAD DATA". All encoding was transferred correctly.
  • elomage
    elomage almost 10 years
    For smaller files and simple data I use this shortcut: select the area or columns that I want to export, then copy (Ctrl+C). Then go to my text editor (for example, Gedit in Ubuntu) and paste. The text editor has all the data as TAB delimited. Then save as a text file, usually in UTF8. The final format may depend on your text editor configuration.
  • Sebastian
    Sebastian over 9 years
    That is useless if your content contains characters which cannot be encoded in cp1250. A better way would be to export as "Unicode .txt" in Excel and use iconv to convert from UTF-16. Maybe also do a sed or tr to translate '\t' to ','.
  • JinSnow
    JinSnow over 9 years
    @nevets1219 I did import the CSV into Google Spreadsheet but it didn't affect my character issue. Forgetting excel/google and using open office calc did solve my UTF8 issue (inserting my CSV into Excel 2013 as a "text" also failed by the way)
  • claymation
    claymation over 9 years
    Excel's default encoding seems to be CP858 when saving as CSV or MS-DOS CSV, and Windows 1252 when saving as Windows CSV (as tested on Excel for Mac 2011).
  • lrepolho
    lrepolho over 9 years
    That's the best workaround for me! Now I can import files via phpMyAdmin easily!
  • Seb
    Seb over 9 years
    Uhhh. I just had a shiver down my back. What if your Excel file has 200,000 lines? Or contains sensitive data you do not want to include in an Excel spreadsheet? Use OpenOffice/LibreOffice if you need to.
  • nevets1219
    nevets1219 over 9 years
    @Seb use the solution that fits the problem. I'm pretty sure Google spreadsheet can handle large files and they are private if you want them to be. Still a local application will do better in both cases if it's available.
  • UKDataGeek
    UKDataGeek over 9 years
    I just tried it - and fyi you need to replace the whole file, not just copy the contents as Excel seems to break the text format of the file not the contents. Not sure how they broke it but Google Spreadsheets worked perfectly and as expected.
  • Rupert Rawnsley
    Rupert Rawnsley over 9 years
    I can confirm LibreOffice also works: it offers character encoding options on export that are sadly lacking in Excel.
  • Sruit A.Suk
    Sruit A.Suk over 9 years
    ^ it's an application in Mac OSX
  • Oskar Limka
    Oskar Limka over 8 years
    Thank you. This makes sense. Why MS still refuses to use UTF as a standard beats me.
  • EM0
    EM0 over 8 years
    This didn't quite work for me (using Excel 2007). I had 2 non-ASCII characters in my file and one of them was saved OK this way, the other wasn't.
  • jogojapan
    jogojapan over 8 years
    This doesn't work for me. The way I understand it it only takes effect when saving as a web format (HTML etc.), not when saving as CSV.
  • jogojapan
    jogojapan over 8 years
    It is an answer in the sense that it results in a tab-delimited (rather than comma-delimited) file, which is something most people (including me) can work with.
  • Elia Weiss
    Elia Weiss over 8 years
    Works for me - office 2007
  • Thomas Andrews
    Thomas Andrews over 8 years
    Obviously (2) has a limitation - if any column has a comma (,) in it, then you have to quote that column. (Or does the international option quote all columns?)
  • Matthew Rodatus
    Matthew Rodatus over 8 years
    This doesn't work because your instructions don't include the necessary step of converting tabs to commas in the file data.
  • Vadzim
    Vadzim about 8 years
    Here is direct link to Jaimon's UnicodeCSV Addin: jaimonmathew.wordpress.com/2011/08/23/…
  • Ivo Pereira
    Ivo Pereira about 8 years
    Google Spreadsheet does not let you choose the delimiter when exporting, so you need to stick with comma if you export a CSV.
  • nevets1219
    nevets1219 about 8 years
    CSV gives you comma delimited and text gives you tab delimited. But I do believe you are correct about delimiters.
  • Flimm
    Flimm about 8 years
    I completely agree that this is useless, as when Excel saves in .csv format, it loses information when it comes to Unicode code points that can't be encoded in a one-byte-per-code-point encoding.
  • Flimm
    Flimm about 8 years
    Excel's "Unicode text" is UTF-16, not UTF-8 as asked for in the question.
  • Flimm
    Flimm about 8 years
    The question is about converting an Excel file to a UTF-8 encoded CSV file. This answer starts with a UTF-8 encoded CSV file!
  • Flimm
    Flimm about 8 years
    I'm not sure I can find this "Encoding -> Convert to Ansi" option.
  • Flimm
    Flimm about 8 years
    Nope, because UTF-16 is not the same as UTF-8, which is what the question is about.
  • Flimm
    Flimm about 8 years
    The trouble with this solution is that it is lossy. If you have code points that don't fit in a one-byte-per-code-point encoding, then those characters will get lost in the transition.
  • Flimm
    Flimm about 8 years
    "Unicode text" in Excel means UTF-16, and not UTF-8, which is what the question asked for.
  • Flimm
    Flimm about 8 years
    @OskarLimka: When you say "UTF", do you mean UTF-8 or UTF-16? Because Microsoft does use UTF-16 quite a bit.
  • Flimm
    Flimm about 8 years
    Note that the resulting CSV file will be in UTF-16, not UTF-8 as the question asked.
  • Flimm
    Flimm about 8 years
    There is no reason why UTF-8 would not work with Spanish characters.
  • Flimm
    Flimm about 8 years
    The problem with this is how you generate the CSV file in the first place. If you simply save as CSV file in Excel, it will save it in cp1252, which is a one-byte-per-code-point encoding, and therefore loses information.
  • Flimm
    Flimm about 8 years
    The trouble with this answer is how you generate the CSV file in the first place. If you simply save as CSV from Excel, the encoding will be cp1252, which is a one-byte-per-code-point encoding, and therefore will lose information for characters that don't fit into that. Also, at the end, you should convert to UTF-8, and not to Ansi, if you want to do what the question asked for.
  • Flimm
    Flimm about 8 years
    This is incorrect, because of step 2, saving as CSV. The trouble with this is that Excel save the CSV file in cp1252, which is a one-byte-per-code-point encoding. This leads to loss of information for the characters which can't fit in one byte.
  • Flimm
    Flimm about 8 years
    The trouble with this answer is how you generated the CSV file in the first place. If you did that by simpling saving as CSV in Excel, the CSV file will be in cp1252, which is a one-byte-per-code-point encoding. This will lead to loss of information when it comes to characters that cannot fit in one byte.
  • mpowered
    mpowered about 8 years
    Or choose a different output encoding, can't fix everyone's problems with a single answer :-)
  • Flimm
    Flimm about 8 years
    I know it's unfortunate, but the fact is that this answer will cause problems for some users. The fact that you've only ever run this with characters that fit in cp1252 is just lucky, you provided no warning about this risk in the answer, you just said that it "worked without a hitch". You've not been asked to fix everyone's problems, just OP's, which a lot of people share.
  • Flimm
    Flimm about 8 years
    There you go, I fixed the issue in the answer post and removed the downvote.
  • RedYeti
    RedYeti about 8 years
    Well, that's a problem with any of these answers. And will be with any answer to how to convert to UTF-8 since there's no way to know or control what the original encoding was.
  • Oskar Limka
    Oskar Limka about 8 years
    @Flimm, I meant UTF-8, I'm not very familiar with UTF-16 as in my work I need to have full compatibility with previous ASCII and ISO-8859-1 files. I'm not sure UTF-16 is backward compatible in this respect.
  • Flimm
    Flimm about 8 years
    Some of the answers do talk about how to get round that issue, for instance, stackoverflow.com/a/15500052/247696
  • MLE
    MLE over 7 years
    What if the file size is big? Google Sheets has a size limit of around 75 MB, if I remember correctly. I have a file that is 700 MB.
  • felipeaf
    felipeaf over 7 years
    It worked for me, but I cannot understand why. My file generated by JavaScript is always read incorrectly in Excel (as a different encoding). But when I open it in Notepad and save as UTF-8, it works fine! So it looks like there is metadata with the encoding. How does it work? If Notepad can save UTF-8 CSV files and Excel can read them, is it possible for my program to generate correct UTF-8 CSV files that Excel can read?
  • Henry Rusted
    Henry Rusted over 7 years
    This is exactly what worked for me. The question is, for most people, not whether the file is encoded in UTF-8 but whether the 'special' characters are preserved. The fact that Excel doesn't do this, in 2016, forces us to jump through hoops.
  • Julian
    Julian about 7 years
    I have updated the implementation a bit and created a gist gist.github.com/julianthome/2d8546e7bed869079ab0f409ae0faa87
  • Rade_303
    Rade_303 about 7 years
    Some characters got converted to a TAB character in the file when I tried this method.
  • dof1985
    dof1985 almost 7 years
    For a tab-delimited file it may be better to use a .txt extension; calling it .csv, i.e. comma-separated values, is just confusing.
  • Nathan
    Nathan over 6 years
    This is a great option for those working with sensitive data. The OpenOffice suite can be a portable install for those with locked down machines.
  • RolfBly
    RolfBly over 6 years
    Worked for me. In an XSLX converted from Access. Diacritics and LF-only and ISO-date formats are all OK. One or another did not work with more popular solutions.
  • Jason Williams
    Jason Williams over 6 years
    The text of non-ascii Spanish characters will be changed from the original Spanish character to the encode string used by ANSI. So, the text will not be the same as the original.
  • Jason Williams
    Jason Williams over 6 years
    Simply opening an ANSI CSV in Notepad++ and changing the encoding of the file to UTF-8 causes the original non-ASCII Spanish characters to be converted into encoded strings which do not match the original text.
  • MrE
    MrE over 6 years
    also be careful with Numbers as it has limitation in the number of rows, and I have converted data like this before not realizing it had cropped some of it. Excel/CSV has much higher limits.
  • JD-V
    JD-V about 6 years
    It doesn't help.
  • absolute
    absolute about 6 years
    This method worked perfectly for a CSV file with over 15,000 records, with lines of more than the 1024 character limit imposed by Notepad. Takes seconds, and doesn't use third party software. Thanks!
  • Felk
    Felk about 6 years
    Not sure about old versions of NP++, but in the current version you can just select Encoding > Convert to UTF-8, which replaces steps 2-5.
  • rovyko
    rovyko about 6 years
    This is by far the easiest answer IMO. I'm using Excel 2016 and found it saves to ANSI encoding by default, but I got it into UTF-8, which is what I wanted.
  • rovyko
    rovyko about 6 years
    You should provide some instructions on how to use that option.
  • Admin
    Admin about 6 years
    "The good news is that all the international characters are in UTF16 (note, not in UTF8).": complete nonsense. UTF-8 and UTF-16 are two ways to encode the whole set of Unicode code points.
  • Admin
    Admin about 6 years
    @dexgecko sure: it's called the "Save" button/menu. There are now four CSV export types in Excel 2016 on Windows: CSV saves as "ANSI" (more or less equivalent to Latin-1, but that may be different with a non-Western OS install, I'm not sure), UTF-8 CSV saves as UTF-8 with BOM, CSV (DOS) saves as CP850 (again, this may depend on the install?) and CSV (Mac) saves as MacRoman.
  • Admin
    Admin about 6 years
    However, Excel will only be able to read UTF-8 and ANSI CSV files (Excel can use the BOM to choose). To import from another encoding, rename as .txt, open from Excel (you then have a long list of encodings to choose from), and since it won't interpret the separator correctly, use the "convert" button to split the rows. You can also use this trick when importing a CSV from a different language convention (in French, for instance, the field separator is a semicolon, as the comma is already used as the decimal separator).
  • Admin
    Admin about 6 years
    A bit OT, but anyway: it is simpler with VBA, as you can use the Origin argument of Workbook.Open when you open a CSV file. But this will only work with the few encodings available for CSV. One may use Workbook.OpenText, for which the Origin can be any encoding, alas that does not work for a CSV file encoded in UTF-8 without BOM! But if the file has the .txt extension, it works! Looks like a bug to me.
  • rovyko
    rovyko about 6 years
    @Jean-ClaudeArbaut Strange, I don't see a UTF-8 CSV option in my Excel 2016.
  • Admin
    Admin about 6 years
    @dexgecko Strange, indeed. According to this page (in French, sorry), the functionality was added in November 2016, in build version 1610. I currently have the 1802 version. However, I thought only Office 365 had that kind of evolution, and I'm using Office Pro 2016 (not the 365 flavor). Maybe try updating your Office.
  • Admin
    Admin about 6 years
    @dexgecko A quick check on products.office.com seems to show the professional "one time" version is not sold any longer though. I don't know what happens to those like me who have this product. Thanks for having pointed this out!
  • James
    James about 6 years
    @Kai Noack, if you are using Notepad++ anyway, you can just open the original non utf-8 csv and change the encoding from the Encoding option on the menu bar (might save you a few steps).
  • Daniel Maurer
    Daniel Maurer over 5 years
    I may be getting this wrong, but you can't save the file as a ".csv" in Notepad++, and that's what this is about.
  • Chloe
    Chloe over 5 years
    Yes you can. You are only just saving a text file, and .csv is a text file. This answer will open the CSV file corrupted by Excel, fix it, then re-save it with the proper encoding.
  • Ryan
    Ryan over 5 years
    Ahh, even when I tried using Google Sheets, I ran into the same problem. So maybe this PowerShell strategy would have worked. Here was my problem. For certain characters such as certain emojis, you need to use CHARACTER SET utf8mb4 as described here: stackoverflow.com/a/10959780/470749
  • Don Cruickshank
    Don Cruickshank over 5 years
    @Ryan That issue is specific to MySQL. I just tried the problematic character in that question and it worked fine in both Excel and PowerShell.
  • Don Cruickshank
    Don Cruickshank over 5 years
    This option isn't available in my copy of Excel 2016. Are you using the Office 365 version?
  • Don Cruickshank
    Don Cruickshank about 5 years
    This works because Excel provides a TSV version of the selection via the clipboard. You can equally well use Notepad instead of SublimeText but just remember to save with UTF-8 encoding if you do!
  • RolfBly
    RolfBly over 4 years
    FWIW, This works on Excel 2016 version 1909 build 12026.20334. I'm pretty sure it did not work properly until a few weeks ago, cause I had to resort to Nick's solution above to keep diacritics intact. Possibly the fix was not in Excel itself, but in a Windows update.
  • Hart
    Hart almost 4 years
    Not working on Office 2010 with any CSV profile I choose.
  • thymaro
    thymaro almost 4 years
    What you are communicating has far reaching consequences. Employing this eminent method, we can attain a higher data quality! Bravo! (Quiz: why is this comment so quizzical?)
  • user85361
    user85361 over 3 years
    It doesn't work in my case
  • ztvmark
    ztvmark over 3 years
    techlandia.com/guardar-csv-utf8-como_163552: "Save as File Type" -> "CSV (Comma delimited)" -> "Tools" -> "Web Options" -> "Encoding" -> "Save this document as" -> "Unicode (UTF-8)"
  • ClemM
    ClemM over 3 years
    Old answer but still valid 10 years later. In my case I wanted to keep the semicolon separator, so I pasted the CSV content into Google Docs (because Sheets doesn't offer this choice of separator), exported as .txt and renamed to .csv. Works.
  • Sjlver
    Sjlver about 3 years
    Worked wonderfully! It's also possible to pipe the two commands together directly: Import-Csv ... | Export-Csv ... without using an intermediate $data variable.