ANSI vs Shift-JIS vs UTF-8 in C#

As far as code pages are concerned, "ANSI" (and Encoding.Default in .NET Framework) simply means "the non-Unicode code page used by this system". Exactly which code page that is depends on how the system is configured; on a Western European system, it is likely to be Windows-1252. (On .NET Core and .NET 5+, Encoding.Default always returns UTF-8 instead.)

On the system where that text comes from, "ANSI" appears to mean Shift-JIS, so unless your system uses the same code page, you'll need to tell your code to read the text as Shift-JIS.

Assuming you're reading the file with a StreamReader, there are various constructors that take an Encoding, so just grab a Shift-JIS encoding with Encoding.GetEncoding("shift_jis") or Encoding.GetEncoding(932) and use it to construct your StreamReader.
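
As a rough sketch of the above, the helper below reads a file as Shift-JIS through a StreamReader and writes it back the same way. The class name ShiftJisFile and its methods are made up for illustration. It assumes .NET Core or .NET 5+, where legacy code pages have to be registered through CodePagesEncodingProvider before Encoding.GetEncoding can find them; on .NET Framework the RegisterProvider line should not be needed and can be dropped.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of .NET: reads and writes text files as Shift-JIS.
static class ShiftJisFile
{
    private static Encoding GetShiftJis()
    {
        // Assumption: running on .NET Core / .NET 5+, where code page 932
        // is only available after registering the code pages provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("shift_jis"); // same encoding as GetEncoding(932)
    }

    // Decodes the file's bytes as Shift-JIS and returns an ordinary .NET string.
    public static string Read(string path)
    {
        using (var reader = new StreamReader(path, GetShiftJis()))
        {
            return reader.ReadToEnd();
        }
    }

    // Encodes the string as Shift-JIS and overwrites the file with those bytes.
    public static void Write(string path, string text)
    {
        using (var writer = new StreamWriter(path, false, GetShiftJis()))
        {
            writer.Write(text);
        }
    }
}
```

Once Read has returned, the string is plain Unicode, so it can be shown on a web page or stored like any other string. Shift-JIS only matters again when the text is written back to a file that other tools expect in that encoding.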

Author by

remo

Updated on June 23, 2022

Comments

  • remo
    remo almost 2 years

    I have been trying to figure out the difference for quite some time now. The issue is with a file in ANSI encoding that contains Japanese characters, which show up like: ­‚È‚­‚Æ‚à1‚‚ÌINCREMENTs‚ª•K—v‚Å‚·. Its equivalent in Shift-JIS is 少なくとも1つのINCREMENT行が必要です, which is the expected Japanese text.

    I need to display these characters on a webpage after reading them from the file (in ANSI). Other files, which are in UTF-8, display their characters correctly and do not show this problem. I am finding it difficult to figure out what the difference is and how to change the encoding to do the right thing here. I use C# to read and display this file, and I also need to write the string back to the file if it is modified on the web. Are there any encoding and decoding schemes for this?

  • remo
    remo about 12 years
    Does this mean that if we store this in a database and display it on the web, it will display the right Japanese characters? Does the web take care of it?
  • Michael Madsen
    Michael Madsen about 12 years
    @remo: Yes. C# always works with Unicode (specifically UTF-16LE) internally, so once it knows it's reading Shift-JIS from your file, it can convert the string correctly, and it can be stored correctly in your database (as long as the database also uses Unicode). Similarly, your web page can read the data and output it correctly (typically using UTF-8)
  • Michael Madsen
    Michael Madsen about 12 years
    @remo: I don't understand your question. You don't need to convert Unicode to Shift-JIS unless you're processing the data with something that only expects Shift-JIS. All browsers use Unicode internally, they don't need to convert to Shift-JIS to display Japanese characters.
  • remo
    remo about 12 years
    Thanks for your help. I got that after some more research on my end.
  • radu florescu
    radu florescu about 11 years
    Can you please post the encodings for Simplified Chinese and/or Korean?
  • Michael Madsen
    Michael Madsen about 11 years
    @Floradu88: If you click the "Encoding" link in my post, you'll find a list of all of the encodings known to .NET. IIRC, you'll want code page 949 for Korean and code page 936 for simplified Chinese.
  • user145610
    user145610 over 7 years
    Won't UTF-8 or UTF-16 encoding be able to read Chinese or Japanese characters?
  • Michael Madsen
    Michael Madsen over 7 years
    @user145610 When writing data, an encoding is a function which takes a character and converts it to a specific sequence of byte values. When reading data, it works in reverse: it takes a sequence of bytes and converts them to characters. Consequently, you need to read with the same encoding as was used to write the data in the first place (in this particular case, that was Shift-JIS). While both UTF-8 and UTF-16 can store any piece of text, you can't store a text with UTF-8 and then read it back using UTF-16 - the bytes will be converted to different characters than you started with.
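
The point made in the comments, that text must be read with the same encoding it was written with, can be tried out with a small sketch like the one below. EncodingRoundTrip, CodePage and EncodeThenDecode are hypothetical names chosen for this example, and the RegisterProvider call again assumes .NET Core or .NET 5+.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of .NET.
static class EncodingRoundTrip
{
    // Looks up a legacy code page, e.g. 932 (Shift-JIS), 936 (Simplified Chinese)
    // or 949 (Korean), as mentioned in the comments above.
    public static Encoding CodePage(int codePage)
    {
        // Assumption: .NET Core / .NET 5+, where code pages must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage);
    }

    // "Writes" text to bytes with one encoding and "reads" it back with another.
    public static string EncodeThenDecode(string text, Encoding writeWith, Encoding readWith)
    {
        byte[] bytes = writeWith.GetBytes(text);  // characters -> bytes
        return readWith.GetString(bytes);         // bytes -> characters
    }
}
```

Calling EncodeThenDecode("必要です", CodePage(932), CodePage(932)) should return the original text, while passing CodePage(1252) as the second encoding should return garbled characters like the ones in the question. Writing with Encoding.UTF8 and reading with Encoding.Unicode (UTF-16) fails in the same way, even though both can represent the text.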