C#: bytes to UTF-8 string conversion. Why doesn't it work?

Solution 1

You need to write to the file using UTF-8 encoding. The code below shows one way to do it. When the resulting file is opened in Notepad, the character 𤭢 is displayed correctly:

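using System.IO;
using System.Text;

string c = "𤭢";
// Round-trip the string through UTF-8 bytes.
var bytes = Encoding.UTF8.GetBytes(c);
var cBack = Encoding.UTF8.GetString(bytes);
// Write the decoded string to a file, explicitly encoded as UTF-8.
using (var writer = new StreamWriter(@"c:\temp\char.txt", false, Encoding.UTF8))
{
    writer.WriteLine(cBack);
}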

Solution 2

By default, the console does not encode output as Unicode. It typically uses a legacy code page, so characters outside that code page come out as question marks. To make it output Unicode, set:

Console.OutputEncoding = System.Text.Encoding.Unicode;

before writing to it.

However, this can still fail on Windows, because the Windows command line has only limited Unicode support of its own.

So, for testing purposes, it is better to write the output to a file.
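As a rough sketch of what that can look like for the question's own example, the snippet below first tries Console.OutputEncoding with Encoding.UTF8 (an assumption on my part; the answer above uses Encoding.Unicode) and, if the setter throws the IOException reported in the comments, falls back to writing the UTF-8 bytes directly to standard output. The fallback and the variable names are illustrative, not part of the original answer. It is written as top-level statements (C# 9 or later); with an older compiler, put the body inside a Main method.

```csharp
using System;
using System.IO;
using System.Text;

byte[] data = { 0xF0, 0xA4, 0xAD, 0xA2 };
string text = Encoding.UTF8.GetString(data);
string line = "Test: description = " + text + Environment.NewLine;

try
{
    // Ask the console to encode output as UTF-8 rather than the default code page.
    Console.OutputEncoding = Encoding.UTF8;
    Console.Write(line);
}
catch (IOException)
{
    // Assumed fallback: bypass the console encoder and write raw UTF-8 bytes to stdout.
    byte[] raw = Encoding.UTF8.GetBytes(line);
    using (Stream stdout = Console.OpenStandardOutput())
    {
        stdout.Write(raw, 0, raw.Length);
    }
}
```

When the output is redirected to a file, either path should leave UTF-8 bytes in that file, which is what Notepad expects when the file is opened as UTF-8.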

Author: Racoon

Updated on June 08, 2022

Comments

  • Racoon
    Racoon almost 2 years

    There is a Chinese character 𤭢 which is represented in UTF-8 as F0 A4 AD A2. This character is described here: http://en.wikipedia.org/wiki/UTF-8

    𤭢 U+24B62 F0 A4 AD A2

    When I run this code in C# ...

    byte[] data = { 0xF0, 0xA4, 0xAD, 0xA2 };
    string abc = Encoding.UTF8.GetString(data);
    Console.WriteLine("Test: description = {0}", abc);
    

    ... I redirect the output to a text file and then open it with notepad.exe, choosing UTF-8 encoding. I expect to get 𤭢 in the output, but I get two question marks (??) instead.

    The byte sequence is right. It works in Perl:

    print "\xF0\xA4\xAD\xA2";
    

    In the output, I get 𤭢

    So my question is: why do I get "??" instead of "𤭢" in C#?

    P.S. There is nothing special about this character: I get the same result for any multibyte character (2, 3 or 4 bytes long).

  • Security Hound
    Security Hound about 11 years
    If it is possible to set the encoding in a console application, it should also be possible to set the encoding when a command prompt is launched. I don't disagree that the output should be redirected to a file, of course.
  • Racoon
    Racoon about 11 years
    This command produces an exception:

    Generic Exception Handler: System.IO.IOException: The parameter is incorrect.
       at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
       at System.IO.__Error.WinIOError()
       at System.Console.set_OutputEncoding(Encoding value)
       at tpam_multibyte.Program.Main(String[] args)
  • Sasha
    Sasha about 11 years
    As I said, it will fail on Windows (at least up to Windows 7), because the Windows console doesn't support Unicode. That's why you are getting that error.
  • Racoon
    Racoon about 11 years
    Thanks. Since .NET is only intended for the Windows world (if we forget about Mono), it means that I can't use this solution. But thanks, anyway.
  • Sasha
    Sasha about 11 years
    Please read the post to the end. The recommended solution was to write to a file, not to the console. By the way, you can use that solution with another encoding if you need to (though you probably don't right now).
  • Jakob Christensen
    Jakob Christensen about 11 years
    @Racoon: Glad I could help :-)
  • Paul
    Paul over 10 years
    How do you do this with the console?
  • Paul
    Paul over 10 years
    @Oleksandr Pshenychnyy: The Windows console DOES support Unicode. To see for yourself, install Far Manager and use it with the Consolas font.
  • Tushar
    Tushar about 9 years
    @Paul If you want to do the same thing using the console, you need to change the console code page to the UTF-8 code page with the "chcp" command (e.g. chcp 65001 for UTF-8), then run the application binary and redirect the standard output to a file.
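To check the question's claim that the byte sequence itself is right, here is a small sketch showing that the decoding step round-trips and that the string holds a single code point stored as two UTF-16 code units. It prints only ASCII, so it does not depend on the console's encoding. The variable names codePoint and roundTrip are illustrative; the snippet uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Text;

byte[] data = { 0xF0, 0xA4, 0xAD, 0xA2 };
string abc = Encoding.UTF8.GetString(data);

// U+24B62 lies outside the Basic Multilingual Plane, so the string holds a surrogate pair.
int codePoint = char.ConvertToUtf32(abc, 0);
Console.WriteLine("UTF-16 code units: {0}", abc.Length);   // 2
Console.WriteLine("Code point: U+{0:X}", codePoint);       // U+24B62

// Re-encoding the string yields the original four bytes.
byte[] roundTrip = Encoding.UTF8.GetBytes(abc);
Console.WriteLine(BitConverter.ToString(roundTrip));       // F0-A4-AD-A2
```

If these values come out as expected, the string is decoded correctly and the question marks are introduced only when the text is encoded for output.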