Which encoding opens CSV files correctly with Excel on both Mac and Windows?


Solution 1

The lowdown is: there is no solution. Excel 2011/Mac cannot correctly interpret a CSV file containing umlauts and diacritical marks, no matter what encoding or hoop-jumping you try. I'd be glad to hear someone tell me otherwise!

Solution 2

Excel Encodings

I found the WINDOWS-1252 encoding to be the least frustrating when dealing with Excel. Since it's basically Microsoft's own proprietary character set, one can assume it will work in both the Mac and the Windows versions of MS Excel. Both versions at least include a corresponding "File origin" or "File encoding" selector which reads the data correctly.

Depending on your system and the tools you use, this encoding could also be named CP1252, ANSI, Windows (ANSI), MS-ANSI or just Windows, among other variations.

This encoding is a superset of ISO-8859-1 (aka LATIN1, among other names), so you can fall back to ISO-8859-1 if you cannot use WINDOWS-1252 for some reason. Be advised that ISO-8859-1 is missing some characters from WINDOWS-1252, as shown here:

| Char | ANSI | Unicode | ANSI Hex | Unicode Hex | HTML entity | Unicode Name                               | Unicode Range            |
|------|------|---------|----------|-------------|-------------|--------------------------------------------|--------------------------|
| €    | 128  | 8364    | 0x80     | U+20AC      | &euro;      | euro sign                                  | Currency Symbols         |
| ‚    | 130  | 8218    | 0x82     | U+201A      | &sbquo;     | single low-9 quotation mark                | General Punctuation      |
| ƒ    | 131  | 402     | 0x83     | U+0192      | &fnof;      | Latin small letter f with hook             | Latin Extended-B         |
| „    | 132  | 8222    | 0x84     | U+201E      | &bdquo;     | double low-9 quotation mark                | General Punctuation      |
| …    | 133  | 8230    | 0x85     | U+2026      | &hellip;    | horizontal ellipsis                        | General Punctuation      |
| †    | 134  | 8224    | 0x86     | U+2020      | &dagger;    | dagger                                     | General Punctuation      |
| ‡    | 135  | 8225    | 0x87     | U+2021      | &Dagger;    | double dagger                              | General Punctuation      |
| ˆ    | 136  | 710     | 0x88     | U+02C6      | &circ;      | modifier letter circumflex accent          | Spacing Modifier Letters |
| ‰    | 137  | 8240    | 0x89     | U+2030      | &permil;    | per mille sign                             | General Punctuation      |
| Š    | 138  | 352     | 0x8A     | U+0160      | &Scaron;    | Latin capital letter S with caron          | Latin Extended-A         |
| ‹    | 139  | 8249    | 0x8B     | U+2039      | &lsaquo;    | single left-pointing angle quotation mark  | General Punctuation      |
| Œ    | 140  | 338     | 0x8C     | U+0152      | &OElig;     | Latin capital ligature OE                  | Latin Extended-A         |
| Ž    | 142  | 381     | 0x8E     | U+017D      |             | Latin capital letter Z with caron          | Latin Extended-A         |
| ‘    | 145  | 8216    | 0x91     | U+2018      | &lsquo;     | left single quotation mark                 | General Punctuation      |
| ’    | 146  | 8217    | 0x92     | U+2019      | &rsquo;     | right single quotation mark                | General Punctuation      |
| “    | 147  | 8220    | 0x93     | U+201C      | &ldquo;     | left double quotation mark                 | General Punctuation      |
| ”    | 148  | 8221    | 0x94     | U+201D      | &rdquo;     | right double quotation mark                | General Punctuation      |
| •    | 149  | 8226    | 0x95     | U+2022      | &bull;      | bullet                                     | General Punctuation      |
| –    | 150  | 8211    | 0x96     | U+2013      | &ndash;     | en dash                                    | General Punctuation      |
| —    | 151  | 8212    | 0x97     | U+2014      | &mdash;     | em dash                                    | General Punctuation      |
| ˜    | 152  | 732     | 0x98     | U+02DC      | &tilde;     | small tilde                                | Spacing Modifier Letters |
| ™    | 153  | 8482    | 0x99     | U+2122      | &trade;     | trade mark sign                            | Letterlike Symbols       |
| š    | 154  | 353     | 0x9A     | U+0161      | &scaron;    | Latin small letter s with caron            | Latin Extended-A         |
| ›    | 155  | 8250    | 0x9B     | U+203A      | &rsaquo;    | single right-pointing angle quotation mark | General Punctuation      |
| œ    | 156  | 339     | 0x9C     | U+0153      | &oelig;     | Latin small ligature oe                    | Latin Extended-A         |
| ž    | 158  | 382     | 0x9E     | U+017E      |             | Latin small letter z with caron            | Latin Extended-A         |
| Ÿ    | 159  | 376     | 0x9F     | U+0178      | &Yuml;      | Latin capital letter Y with diaeresis      | Latin Extended-A         |

Note in particular that the euro sign is missing from ISO-8859-1. This table can be found on Alan Wood's site.
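
To see the difference in practice, here is a quick Python 3 check (my own illustration, not part of the original table) showing that the euro sign encodes fine in WINDOWS-1252 (cp1252 in Python) but fails in ISO-8859-1:

    # The euro sign exists in WINDOWS-1252 but not in ISO-8859-1.
    euro = "\u20ac"  # €

    print(euro.encode("cp1252"))      # b'\x80' - byte 0x80, as in the table above
    try:
        euro.encode("latin-1")        # Python's name for ISO-8859-1
    except UnicodeEncodeError as err:
        print("Not representable in ISO-8859-1:", err)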

Conversion

Conversion is done differently in every tool and language. However, suppose you have a file query_result.csv which you know is UTF-8 encoded. Convert it to WINDOWS-1252 using iconv:

iconv -f UTF-8 -t WINDOWS-1252 query_result.csv > query_result-win.csv
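
If shelling out to iconv is not convenient, a rough Python equivalent of the command above might look like this (my own sketch; it assumes the input really is UTF-8 and reuses the query_result.csv file name from the example):

    # Re-encode a UTF-8 CSV as WINDOWS-1252, roughly what the iconv call above does.
    with open("query_result.csv", "r", encoding="utf-8") as src:
        data = src.read()

    # errors="strict" mirrors iconv's default behaviour: abort on any character
    # that has no WINDOWS-1252 equivalent (e.g. Chinese text).
    with open("query_result-win.csv", "w", encoding="cp1252", errors="strict") as dst:
        dst.write(data)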

Solution 3

With UTF-16LE with BOM, if you use tab characters as your delimiters instead of commas, Excel will recognise the fields. The reason this works is that Excel actually ends up using its Unicode *.txt parser.

Caveat: If the file is edited in Excel and saved, it will be saved as tab-delimited ASCII. The problem now is that when you re-open the file Excel assumes it's real CSV (with commas), sees that it's not Unicode, so parses it as comma-delimited - and hence will make a hash of it!

Update: The above caveat doesn't appear to happen for me today, in Excel 2010 (Windows) at least, although there does appear to be a difference in saving behaviour if:

  • you edit and quit Excel (it tries to save as 'Unicode *.txt')

compared to:

  • editing and closing just the file (works as expected).

Solution 4

You have only tried comma-separated and semicolon-separated CSV. If you had tried tab-separated CSV (also called TSV), you would have found the answer:

UTF-16LE with BOM (byte order mark), tab-separated


But: in a comment you mention that TSV is not an option for you (I haven't been able to find this requirement in your question, though). That's a pity. It often means that you allow manual editing of TSV files, which is probably not a good idea. Visual checking of TSV files is not a problem, and editors can be set to display a special character to mark tabs.

And yes, I tried this out on Windows and Mac.
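
To make the recipe concrete, here is a minimal Python sketch of producing such a file (my own illustration; the file name and rows are made up). Writing with the utf-16-le codec plus an explicit BOM, and using tabs as delimiters, gives exactly the format described above:

    import csv

    # Made-up rows with non-ASCII characters, just for illustration.
    rows = [
        ["id", "name", "city"],
        [1, "Müller", "Zürich"],
        [2, "Curaçao", "Willemstad"],
    ]

    # Keep the .csv extension so double-clicking still opens Excel.
    with open("export.csv", "w", encoding="utf-16-le", newline="") as f:
        f.write("\ufeff")                        # byte order mark (BOM)
        writer = csv.writer(f, delimiter="\t")   # tabs instead of commas
        writer.writerows(rows)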

Solution 5

The best workaround for reading UTF-8 CSV files on Mac is to convert them into XLSX format. I have found a script made by Konrad Foerstner, which I have improved a little by adding support for different delimiter characters.

Download the script from GitHub: https://github.com/brablc/clit/blob/master/csv2xlsx.py. In order to run it you will need to install the Python module openpyxl for Excel file manipulation: sudo easy_install openpyxl.
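
If you prefer not to rely on the external script, the core of the conversion is small. Here is a minimal sketch of the same idea using openpyxl (my own code, not Konrad Foerstner's script; the file names and the delimiter argument are just examples):

    import csv
    from openpyxl import Workbook

    def csv_to_xlsx(csv_path, xlsx_path, delimiter=","):
        """Read a UTF-8 CSV and write its rows into an XLSX workbook."""
        wb = Workbook()
        ws = wb.active
        with open(csv_path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f, delimiter=delimiter):
                ws.append(row)
        wb.save(xlsx_path)

    csv_to_xlsx("query_result.csv", "query_result.xlsx")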

Comments

  • Timm
    Timm over 4 years

    We have a web app that exports CSV files containing foreign characters with UTF-8, no BOM. Both Windows and Mac users get garbage characters in Excel. I tried converting to UTF-8 with BOM; Excel/Win is fine with it, Excel/Mac shows gibberish. I'm using Excel 2003/Win and Excel 2011/Mac. Here are all the encodings I tried:

    Encoding  BOM      Win                            Mac
    --------  ---      ----------------------------   ------------
    utf-8     --       scrambled                      scrambled
    utf-8     BOM      WORKS                          scrambled
    utf-16    --       file not recognized            file not recognized
    utf-16    BOM      file not recognized            Chinese gibberish
    utf-16LE  --       file not recognized            file not recognized
    utf-16LE  BOM      characters OK,                 same as Win
                       row data all in first field
    

    The best one is UTF-16LE with BOM, but the CSV is not recognized as such. The field separator is a comma, but using a semicolon doesn't change things.

    Is there any encoding that works in both worlds?

  • Timm
    Timm over 12 years
    Thanks @royce23, but I'm just offering the CSV file for download. I can't present it through HTTP because the sheer size of the markup would slow the response to a crawl - the exported table may contain millions of rows...
  • royce3
    royce3 over 12 years
    with css your html would only be a tiny fraction larger than csv, for example: <r><c>id</c><c>name</c><c>phone</c></r>
  • Timm
    Timm over 12 years
    Not sure if I understand, but I'm saving the CSV on the server and offering a download link. Generating an HTML response gobbles up too much PHP memory...
  • Timm
    Timm over 12 years
    Nice, but the caveat breaks the solution for me; the end users will not be happy with broken Excel sheets.
  • Duncan Smart
    Duncan Smart over 12 years
    Possibly if you change the initial file extension to *.txt it would work, but then you lose association between the filetype and Excel: i.e. they can't double-click the file and have it open in Excel automatically.
  • Timm
    Timm over 12 years
    That won't work for me. The end users are not computer-savvy; they need to open the file in Excel without any hurdles.
  • Timm
    Timm over 12 years
    What language are you using @user525081 ? Can you translate it to PHP?
  • Ashish Datta
    Ashish Datta over 11 years
    @Timm that looks like a Java sample but in PHP you can use iconv to do the conversion - de3.php.net/manual/en/function.iconv.php
  • Timm
    Timm over 11 years
    OK @user525081, same deal as the other answers. This caters to Mac users, leaving Windows people in the lurch; and it doesn't answer the original question - an encoding that works on both platforms. Thanks.
  • Geek Stocks
    Geek Stocks over 10 years
    I have my ".csv" Excel sheets looking good with special characters and separated fields. I start my output string with "\ufeff" as a byte order mark (BOM), then using "\t" tabs in place of commas for field separation, and encoding the file with "utf-16LE". Works like a charm, thanks to this page!
  • mikezter
    mikezter over 10 years
    I found the WINDOWS-1252 or ISO-8859-1 encodings to work. Please see my answer.
  • Tim Groeneveld
    Tim Groeneveld about 10 years
    The solution is to use UTF-16LE and ensure that you're using tabs to separate columns instead of commas.
  • Timm
    Timm about 10 years
    Did you really try this on Win and Mac Tim? As I mentioned, TSV is not an option in my case.
  • Fergie
    Fergie almost 10 years
    A bit of a faff, but this does seem to be the answer for importing .csv files with European characters into Excel on Mac OS X.
  • Pierre Arnaud
    Pierre Arnaud almost 10 years
    For me, export works fine if I use WIN-1252 encoding, both on Mac and on Windows versions of Excel. @Timm, you should consider changing the accepted answer.
  • mikezter
    mikezter almost 10 years
    True. It answers the OP's question instead. In your case you would first have to know (or guess) the encoding used in your ".csv file with European characters". Then you can convert it to WINDOWS-1252, which will most probably be correctly interpreted by both Mac and Windows Excel.
  • Bill Leeper
    Bill Leeper over 9 years
    For those who found this working: did you actually have extended (e.g. Chinese) characters in your datasets? The WINDOWS-1252 encoding breaks on those because they are out of range.
  • Walter Tross
    Walter Tross about 9 years
    This is no real solution; sooner or later you will come across a character which is not convertible to WINDOWS-1252.
  • Walter Tross
    Walter Tross about 9 years
    If you have a UTF-8 file without BOM, iconv will convert it to UTF-16LE without BOM (and unfortunately there is no way to tell iconv to add one)
  • XWang
    XWang almost 9 years
    WINDOWS-1252 will fail if there are Chinese characters. So it seems that UTF-16LE with BOM is the only option.
  • QuestionC
    QuestionC over 8 years
    WIN-1252 doesn't work exporting "Curaçao" on Excel 2010.
  • cropredy
    cropredy almost 8 years
    This will work (for the UTF-8 chars), but if you have embedded line breaks within cells (a br tag), Excel for Mac ignores the CSS mso-data-placement:same-cell; (which works with Windows).
  • motorbaby
    motorbaby over 7 years
    This works well for SQL data exports with diacritics.
  • CodeManX
    CodeManX over 7 years
    A first line of sep=, and UTF-16LE encoding worked for me and did not require a different separator character (it remained a comma). Opening the file by double-click loaded it correctly, with special characters and line breaks within cells intact. Downside: the sep=, header is not recognized by any program except Excel as far as I've seen. But OpenOffice / LibreOffice do not require this hack anyway (line breaks in cell contents work just fine, whereas loading from a text file / using the Text to Columns assistant in Excel does not properly handle line breaks in cells).
  • Donald
    Donald about 5 years
    This is the only thing that worked for me on OS X 10.14.2 (and Excel 2011)
  • swarna
    swarna over 2 years
    @mikezter Will it also work on Linux? If not, what is the common encoding for Windows & Linux CSV files? Thank you.
  • mikezter
    mikezter over 2 years
    @swarna Yes, you can use iconv on Linux, too. Some tools on Linux might assume UTF-8 encoding on files of unknown origin. You can then either use iconv to convert them to UTF-8, or find a way to tell the tool in question which encoding your file uses.