c#, Excel + csv: how to get the correct encoding?

34,497

Solution 1

This solution is written up as a fix for a Java application however you should be able to do something similar in C#. You may also want to look at the documentation on the StreamWriter class, in the remarks it refers to the Byte Order Mark (BOM).

Solution 2

This worked perfect for me:

private const int WIN_1252_CP = 1252; // Windows ANSI codepage 1252

    this._writer = new StreamWriter(fileName, false, Encoding.GetEncoding(WIN_1252_CP));

CSV encoding issues (Microsoft Excel)

Solution 3

try the following:

using (var sw = File.Create(Path.Combine(txtPath.Text, "UTF8.csv")))
{
  var preamble = Encoding.UTF8.GetPreamble();
  sw.Write(preamble, 0, preamble.Length);
  var data = Encoding.UTF8.GetBytes("懘荧,\"Hello\",text");
  sw.Write(data, 0, data.Length);
}

It writes the proper UTF8 preamble to the file before writing the UTF8 encoded CSV.

Solution 4

Trevor Germain's helped me to save in the correct encoded format

using (var sw = File.Create(Path.Combine(txtPath.Text, "UTF8.csv")))
{
    var preamble = Encoding.UTF8.GetPreamble();  
    sw.Write(preamble, 0, preamble.Length);  
    var data = Encoding.UTF8.GetBytes("懘荧,\"Hello\",text");  
    sw.Write(data, 0, data.Length);
}

Solution 5

"ANSI as UTF8"(WTF?)

NotePad++ is probably correct. The encoding is UTF8 (i.e., correct Unicode header), but only contains ANSI data (i.e., é is not encoded in correct UTF8 way, which would mean two bytes).

Or: it is the other way around. It is ANSI (no file header BOM), but the encoding of the individual characters is, or looks like, UTF8. This would explain the ü and other characters expanding in more than one other character. You can fix this by forcing the file to be read as Unicode.

If it's possible to post (part of) your CSV, we may be able to help fixing it at the source.

Edit

Now that we've seen your code: can you remove the StreamWriter and replace it with a TextWriter? Also, remove the hand-encoding of the BOM, it is not necessary. When you create a TextWriter, you can specify the encoding (don't use ASCII, try UTF8).

Share:
34,497
yas4891
Author by

yas4891

I'm working for one of Germany's biggest employers as a team leader. I have a Master's degree in Electrical Engineering and Applied Computer Technology. I started programming when I was 14 years old in a programming language called GWBasic and since have finished projects in TurboPascal, Java, PHP, C/C++,C# and Ruby on Rails. I strive to learn more about programming, software engineering and project management.

Updated on January 11, 2020

Comments

  • yas4891
    yas4891 over 4 years

    I've been trying this for quite a while now, but can't figure it out. I'm trying to export data to Excel via a *.csv file. It works great so far, but I have some encoding problems when opening the files in Excel.

    (original string on the left, EXCEL result on the right):

    Messwert(µm / m) ==> Messwert(µm / m)
    
    Dümme Mässöng ==> Dümme Mässöng
    

    Notepad++ tells me that the file is encoded "ANSI as UTF8"(WTF?)

    So here are different ways I tried to get a valid result: obvious implementation:

    tWriter.Write(";Messwert(µm /m)");
    

    more sophisticated one (tried probably a dozen or more encoding combinations:)

    tWriter.Write(Encoding.Default.GetString(Encoding.Unicode.GetBytes(";Messwert(µm /m)")));
    tWriter.Write(Encoding.ASCII.GetString(Encoding.Unicode.GetBytes(";Messwert(µm /m)")));
    

    and so on

    Whole source code for the method creating the data:

        MemoryStream tStream = new MemoryStream();
        StreamWriter tWriter = new StreamWriter(tStream);
        tWriter.Write("\uFEFF");
    
        tWriter.WriteLine(string.Format("{0}", aMeasurement.Name));
        tWriter.WriteLine(aMeasurement.Comment);
        tWriter.WriteLine();
        tWriter.WriteLine("Zeit in Minuten;Messwert(µm / m)");
    
        TimeSpan tSpan;
        foreach (IMeasuringPoint tPoint in aMeasurement)
        {
            tSpan = new TimeSpan(tPoint.Time - aMeasurement[0].Time);
            tWriter.WriteLine(string.Format("{0};{1};", (int)tSpan.TotalMinutes, getMPString(tPoint)));
        }
    
        tWriter.Flush();
        return tStream;
    

    Generated CSV file:

    Dümme Mössäng
    Testmessung die erste
    
    Zeit in Minuten;Messwert(µm / m)
    0;-703;
    0;-381;
    1;1039;
    1;1045;
    2;1457;
    2;1045;