ASCIIEncoding.ASCII.GetBytes() Returning Unexpected Value

17,685

Solution 1

Because \u00C0 is not in the ASCII range (0–127). As a result it is encoded as a question mark, ? (0x3F).

See the MSDN article on ASCIIEncoding:

ASCIIEncoding corresponds to the Windows code page 20127. Because ASCII is a 7-bit encoding, ASCII characters are limited to the lowest 128 Unicode characters, from U+0000 to U+007F. If you use the default encoder returned by the Encoding.ASCII property or the ASCIIEncoding constructor, characters outside that range are replaced with a question mark (?) before the encoding operation is performed.
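You can observe the replacement behavior described above directly, and, if you would rather fail loudly than get a silent substitution, you can ask for an exception fallback instead. A minimal sketch using the standard Encoding.GetEncoding overload that takes fallback objects:

```csharp
using System;
using System.Text;

class FallbackDemo
{
    static void Main()
    {
        string s = "\u00C0";

        // Default ASCII encoder: the non-ASCII character is replaced with '?' (0x3F).
        byte[] lossy = Encoding.ASCII.GetBytes(s);
        Console.WriteLine(BitConverter.ToString(lossy)); // 3F

        // Strict variant: request an exception instead of silent substitution.
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strictAscii.GetBytes(s);
        }
        catch (EncoderFallbackException)
        {
            Console.WriteLine("U+00C0 cannot be encoded as ASCII");
        }
    }
}
```

The exception fallback is often the safer default in production code, since it surfaces data loss at the point of encoding rather than leaving stray question marks in the output.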

Solution 2

It seems that you want a byte sequence that represents a string of Unicode characters. Obviously, the bytes depend on the encoding. Since you expect C0 to be one of the bytes, that narrows the options down a bit. Here is UTF-16LE, which is two bytes, since \u00C0 represents a single BMP character:

string s = "\u00C0";
byte[] bytes = Encoding.Unicode.GetBytes(s); // Encoding.Unicode is UTF-16LE
Trace.WriteLine(BitConverter.ToString(bytes)); // C0-00
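If the goal is really a single C0 byte, an 8-bit encoding such as ISO-8859-1 (Latin-1) produces exactly that, since it maps U+0000 through U+00FF one-to-one onto byte values. A small sketch:

```csharp
using System;
using System.Text;

class Latin1Demo
{
    static void Main()
    {
        string s = "\u00C0";
        // ISO-8859-1 maps U+00C0 directly to the single byte 0xC0.
        byte[] latin1 = Encoding.GetEncoding("iso-8859-1").GetBytes(s);
        Console.WriteLine(BitConverter.ToString(latin1)); // C0
    }
}
```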

You should read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky.

Solution 3

First you assign a Unicode character to a string; then you encode it to ASCII, even though the character is outside the ASCII range; then you try to decode the bytes back.

The following example runs through both encodings to make this clearer:

    static void Main(string[] args)
    {
        string s = "\u00C0";
        Console.WriteLine(s);

        // ASCII: U+00C0 is outside 0-127, so it is replaced with '?' (0x3F).
        byte[] bytes = ASCIIEncoding.ASCII.GetBytes(s);
        Console.WriteLine(BitConverter.ToString(bytes));
        Console.WriteLine(ASCIIEncoding.ASCII.GetString(bytes));

        Console.WriteLine("Again");

        // UTF-8: U+00C0 is encoded losslessly as the two bytes C3 80.
        bytes = Encoding.UTF8.GetBytes(s);
        Console.WriteLine(BitConverter.ToString(bytes));
        Console.WriteLine(Encoding.UTF8.GetString(bytes));

        Console.ReadLine();
    }

And the output is:

À
3F
?
Again
C3-80
À

Btw, the definition of BitConverter.ToString is:

Converts the numeric value of each element of a specified array of bytes to its equivalent hexadecimal string representation.
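That definition can be illustrated in one line:

```csharp
using System;

class ToStringDemo
{
    static void Main()
    {
        // Each byte becomes a two-digit hex pair, joined by hyphens.
        byte[] bytes = { 0x3F, 0xC3, 0x80 };
        Console.WriteLine(BitConverter.ToString(bytes)); // 3F-C3-80
    }
}
```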

Author: Verax

Updated on June 04, 2022

Comments

  • Verax almost 2 years

    This C# code...

    string s = "\u00C0";
    byte[] bytes = ASCIIEncoding.ASCII.GetBytes(s);
    Trace.WriteLine(BitConverter.ToString(bytes));
    

    produces the following output:

    3F
    

    Why is the output not C0?