ANSI vs Shift-JIS vs UTF-8 in C#
As far as code pages are concerned, "ANSI" (and Encoding.Default in .NET Framework; on .NET Core and later, Encoding.Default is always UTF-8) basically just means "the non-Unicode code page used by this system". Exactly which code page that is depends on how the system is configured, but on a Western European system it's likely to be Windows-1252.
On the system where that text comes from, "ANSI" appears to mean Shift-JIS, so unless your system uses the same code page, you'll need to tell your code to read the text as Shift-JIS.
Assuming you're reading the file with a StreamReader, there are various constructors that take an Encoding, so just get a Shift-JIS encoding with Encoding.GetEncoding("shift_jis") or Encoding.GetEncoding(932) and use it to construct your StreamReader.
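As a minimal sketch of the approach above: the class and method names here are made up for illustration. It assumes .NET Core 3.0 or later (or .NET 5+), where legacy code pages such as 932 must be registered through CodePagesEncodingProvider before GetEncoding can find them. On .NET Framework the registration line should not be needed.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, named for illustration only.
public static class ShiftJisReader
{
    // Returns the Shift-JIS encoding (code page 932).
    public static Encoding GetShiftJis()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // are only available after this registration. Calling it repeatedly is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("shift_jis"); // equivalent to GetEncoding(932)
    }

    // Reads the whole file, decoding its bytes as Shift-JIS instead of the system default.
    public static string ReadAllText(string path)
    {
        using (var reader = new StreamReader(path, GetShiftJis()))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Create a small Shift-JIS file in the temp directory so the sketch is self-contained.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, GetShiftJis().GetBytes("少なくとも1つのINCREMENT行が必要です."));

        // The string is now ordinary Unicode text inside .NET.
        string text = ReadAllText(path);
        Console.WriteLine(text.Length);

        File.Delete(path);
    }
}
```

Once the text has been read this way, it can be handled like any other .NET string.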
remo
Updated on June 23, 2022
Comments
-
remo almost 2 years
I have been trying to figure out the difference for quite some time now. The issue is with a file in ANSI encoding that contains Japanese characters, which show up like this:
‚È‚‚Æ‚à1‚‚ÌINCREMENTs‚ª•K—v‚Å‚·.
Its equivalent in Shift-JIS is 少なくとも1つのINCREMENT行が必要です.
which is the expected Japanese text. I need to display these characters on a web page after reading them from the file (in ANSI). Some other files are in UTF-8 and display correctly, without this problem. I am finding it difficult to figure out what the difference is and how to change the encoding to handle this correctly. I use C# to read and display this file, and I also need to write the string back to the file if it is modified on the web. Are there any encoding and decoding schemes for this?
-
remo about 12 years
Does this mean that if we store this in a database and display it on the web, the right Japanese characters are displayed? Does the web take care of it?
-
Michael Madsen about 12 years
@remo: Yes. C# always works with Unicode (specifically UTF-16LE) internally, so once it knows it's reading Shift-JIS from your file, it can convert the string correctly, and it can be stored correctly in your database (as long as the database also uses Unicode). Similarly, your web page can read the data and output it correctly (typically using UTF-8).
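The flow described in this comment can be sketched roughly as follows: decode the file as Shift-JIS, work with a normal .NET string, encode to UTF-8 for the web, and encode back to Shift-JIS when saving. The class and method names are hypothetical, and the code assumes .NET Core 3.0+ or .NET 5+ (hence the provider registration).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, named for illustration only.
public static class EncodingRoundTrip
{
    static readonly Encoding ShiftJis = CreateShiftJis();

    static Encoding CreateShiftJis()
    {
        // Assumption: .NET Core / .NET 5+, where code page 932 must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(932);
    }

    // Decodes a Shift-JIS file into a (UTF-16) .NET string.
    public static string Load(string path) => File.ReadAllText(path, ShiftJis);

    // Writes the (possibly edited) string back to the file as Shift-JIS.
    public static void Save(string path, string text) => File.WriteAllText(path, text, ShiftJis);

    // Produces the UTF-8 bytes that would typically be sent to a browser.
    public static byte[] ToUtf8(string text) => Encoding.UTF8.GetBytes(text);

    public static void Main()
    {
        string path = Path.GetTempFileName();
        Save(path, "少なくとも1つのINCREMENT行が必要です.");

        string text = Load(path);       // Unicode string inside .NET
        byte[] forWeb = ToUtf8(text);   // same text, different bytes
        Console.WriteLine(forWeb.Length != new FileInfo(path).Length); // byte counts differ between encodings

        File.Delete(path);
    }
}
```

Only the boundaries (file in, file out, HTTP response) deal with a specific encoding; everything in between is a plain string.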
-
Michael Madsen about 12 years
@remo: I don't understand your question. You don't need to convert Unicode to Shift-JIS unless you're processing the data with something that only expects Shift-JIS. All browsers use Unicode internally; they don't need to convert to Shift-JIS to display Japanese characters.
-
remo about 12 years
Thanks for your help. I got that after some more research on my end.
-
radu florescu about 11 years
Can you please post the encodings for Simplified Chinese and/or Korean?
-
Michael Madsen about 11 years
@Floradu88: If you click the "Encoding" link in my post, you'll find a list of all of the encodings known to .NET. IIRC, you'll want code page 949 for Korean and code page 936 for simplified Chinese.
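For illustration, the code pages mentioned here can be looked up by number in the same way as 932. This is only a sketch: the class name is hypothetical, and the registration call assumes .NET Core 3.0+ or .NET 5+.

```csharp
using System;
using System.Text;

// Hypothetical helper class, named for illustration only.
public static class LegacyCodePages
{
    // Looks up a legacy code page by number (932 Japanese, 936 simplified Chinese, 949 Korean).
    public static Encoding Get(int codePage)
    {
        // Assumption: .NET Core / .NET 5+, where these code pages must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage);
    }

    public static void Main()
    {
        foreach (int codePage in new[] { 932, 936, 949 })
        {
            Encoding encoding = Get(codePage);
            // WebName and EncodingName are the names .NET itself reports for the code page.
            Console.WriteLine($"{codePage}: {encoding.WebName} ({encoding.EncodingName})");
        }
    }
}
```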
-
user145610 over 7 years
Wouldn't UTF-8 or UTF-16 encoding be able to read Chinese or Japanese characters?
-
Michael Madsen over 7 years
@user145610 When writing data, an encoding is a function which takes a character and converts it to a specific sequence of byte values. When reading data, it works in reverse: it takes a sequence of bytes and converts them to characters. Consequently, you need to read with the same encoding as was used to write the data in the first place (in this particular case, that was Shift-JIS). While both UTF-8 and UTF-16 can store any piece of text, you can't store a text with UTF-8 and then read it back using UTF-16 - the bytes will be converted to different characters than you started with.
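A small sketch of that last point, using only the built-in Encoding.UTF8 and Encoding.Unicode (UTF-16LE); the class and method names are hypothetical. The same text is encoded once and then decoded with a matching and a mismatched encoding.

```csharp
using System;
using System.Text;

// Hypothetical helper class, named for illustration only.
public static class EncodingMismatch
{
    // Encodes the text with one encoding and decodes the resulting bytes with another.
    public static string Transcode(string text, Encoding writeWith, Encoding readWith)
    {
        byte[] bytes = writeWith.GetBytes(text);
        return readWith.GetString(bytes);
    }

    public static void Main()
    {
        string original = "日本語";

        // Written and read with the same encoding: the text survives.
        string same = Transcode(original, Encoding.UTF8, Encoding.UTF8);
        Console.WriteLine(same == original);   // True

        // Written as UTF-8 but read as UTF-16: the bytes map to different characters.
        string mixed = Transcode(original, Encoding.UTF8, Encoding.Unicode);
        Console.WriteLine(mixed == original);  // False
    }
}
```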