C#: bytes to UTF-8 string conversion. Why doesn't it work?
Solution 1
Write the string to a file using UTF-8 instead of printing it to the console. The code below shows one way to do it. When the resulting file is opened in Notepad, the character 𤭢 is shown correctly:
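using System.IO;
using System.Text;

string c = "𤭢";
var bytes = Encoding.UTF8.GetBytes(c);      // F0 A4 AD A2
var cBack = Encoding.UTF8.GetString(bytes); // round-trips back to the original string
// Write with an explicit UTF-8 encoding (with Encoding.UTF8 the file starts with a BOM).
using (var writer = new StreamWriter(@"c:\temp\char.txt", false, Encoding.UTF8))
{
    writer.WriteLine(cBack);
}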
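To check that the decoding step is not where the character gets lost, the decoded string can be inspected directly. The following is only a minimal sketch; the class and method names (`DecodeCheck`, `Describe`) are made up for illustration. It relies on the fact that U+24B62 lies outside the Basic Multilingual Plane, so a .NET string stores it as a surrogate pair (two `char` values).

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of the original answer.
public static class DecodeCheck
{
    // Decodes UTF-8 bytes and reports the string length and the first code point.
    public static string Describe(byte[] utf8Bytes)
    {
        string s = Encoding.UTF8.GetString(utf8Bytes);
        int codePoint = char.ConvertToUtf32(s, 0);
        return $"chars={s.Length}, codePoint=U+{codePoint:X}";
    }

    public static void Main()
    {
        byte[] data = { 0xF0, 0xA4, 0xAD, 0xA2 };
        // The output is ASCII-only, so it is readable in any console code page.
        Console.WriteLine(Describe(data)); // chars=2, codePoint=U+24B62
    }
}
```

If this reports the expected code point, the string in memory is correct and the damage happens later, when the text is encoded for output.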
Solution 2
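By default the console does not encode its output as Unicode. It uses the system's OEM code page, and any character that this code page cannot represent is replaced with a question mark. To make the console output Unicode, set:
Console.OutputEncoding = System.Text.Encoding.Unicode;
before writing to it.
This can still fail on Windows: on older versions the assignment itself may throw (the comments below report an IOException on Windows 7), and the console font must also contain the glyph.
So, for testing purposes, it is better to write the output to a file.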
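The approach above can be sketched as follows. This is a hedged example, not a definitive fix: it uses `Encoding.UTF8` rather than `Encoding.Unicode` on the assumption that the redirected output will be opened as UTF-8, as in the question, and the helper name `TrySetUtf8Output` is hypothetical. It catches the `IOException` that the comments report on older Windows versions.

```csharp
using System;
using System.IO;
using System.Text;

public static class ConsoleUtf8
{
    // Hypothetical helper: returns true if the console accepted UTF-8 output.
    public static bool TrySetUtf8Output()
    {
        try
        {
            Console.OutputEncoding = Encoding.UTF8;
            return true;
        }
        catch (IOException)
        {
            // Reported in the comments on older Windows versions.
            return false;
        }
    }

    public static void Main()
    {
        byte[] data = { 0xF0, 0xA4, 0xAD, 0xA2 };
        string abc = Encoding.UTF8.GetString(data);

        if (!TrySetUtf8Output())
        {
            Console.Error.WriteLine("Could not switch console output to UTF-8.");
        }
        Console.WriteLine("Test: description = {0}", abc);
    }
}
```

Whether the glyph is actually drawn in a console window still depends on the console font; the setting only controls how the text is encoded.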
Racoon
Updated on June 08, 2022
Comments
-
Racoon almost 2 years
There is a Chinese character 𤭢 which is represented in UTF-8 as F0 A4 AD A2. This character is described here: http://en.wikipedia.org/wiki/UTF-8
𤭢 U+24B62 F0 A4 AD A2
When I run this code in C# ...
byte[] data = { 0xF0, 0xA4, 0xAD, 0xA2 };
string abc = Encoding.UTF8.GetString(data);
Console.WriteLine("Test: description = {0}", abc);
... I redirect the output to a text file and then open it with notepad.exe, choosing UTF-8 encoding. I expect to get 𤭢 in the output, but I get two question marks (??) instead.
The byte sequence is right. It works in Perl:
print "\xF0\xA4\xAD\xA2";
In the output, I get 𤭢
So my question is: why do I get "??" instead of "𤭢" in C#?
P.S. There is nothing special about this character: I get the same result for any multi-byte character (2, 3 or 4 bytes long).
-
Security Hound about 11 years
If it is possible to set the encoding in a console application, it should also be possible to set the encoding when a command prompt is launched. I don't disagree that the output should be redirected to a file, of course.
-
Racoon about 11 years
This command produces an exception:
Generic Exception Handler: System.IO.IOException: The parameter is incorrect.
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.__Error.WinIOError()
at System.Console.set_OutputEncoding(Encoding value)
at tpam_multibyte.Program.Main(String[] args)
-
Sasha about 11 years
As I said, it will fail on Windows (at least up to Windows 7), because the Windows console doesn't support Unicode. That's why you are getting that error.
-
Racoon about 11 years
Thanks. Since .NET is only intended for the Windows world (if we forget about Mono), it means that I can't use this solution. But thanks, anyway.
-
Sasha about 11 years
Please read the post to the end. The recommended solution was to write to a file, not to the console. By the way, you can use that solution with another encoding if you need to (but currently you probably don't).
-
Jakob Christensen about 11 years
@Racoon: Glad I could help :-)
-
Paul over 10 years
How can this be done with the console?
-
Paul over 10 years
@Oleksandr Pshenychnyy: The Windows console DOES support Unicode. To convince yourself, install Far Manager and use it with the Consolas font.
-
Tushar about 9 years
@Paul If you want to do the same thing using the console, you first need to switch the console to the UTF-8 code page using the "chcp" command:
chcp 65001
Then run the application binary and redirect the standard output to a file.
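When the output is always redirected to a file, another option is to bypass the console's text encoding and write the UTF-8 bytes straight to the standard output stream, which is roughly what the Perl one-liner in the question does. This is a minimal sketch under that assumption; the class and method names (`RawUtf8Output`, `WriteUtf8Line`) are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class RawUtf8Output
{
    // Hypothetical helper: encodes text as UTF-8 (no BOM) and writes the raw bytes.
    public static void WriteUtf8Line(Stream output, string text)
    {
        byte[] bytes = new UTF8Encoding(false).GetBytes(text + Environment.NewLine);
        output.Write(bytes, 0, bytes.Length);
    }

    public static void Main()
    {
        byte[] data = { 0xF0, 0xA4, 0xAD, 0xA2 };
        string abc = Encoding.UTF8.GetString(data);

        // The stream from OpenStandardOutput passes bytes through unchanged,
        // so Console.OutputEncoding does not affect what lands in the file.
        using (Stream stdout = Console.OpenStandardOutput())
        {
            WriteUtf8Line(stdout, "Test: description = " + abc);
        }
    }
}
```

With the program's standard output redirected to a file, the file should then contain the same four bytes (F0 A4 AD A2) for the character, regardless of the active console code page.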