Sending a string containing special characters through a TcpClient (byte[])

c# .net encoding tcp special-characters

24,138

Solution 1

Never too late to answer a question I think, hope someone will find answers here.

C# uses 16-bit chars (UTF-16 code units), while ASCII only defines the 7-bit range 0-127, so Encoding.ASCII cannot represent a character like "é" and substitutes "?" for it. After some research, I found UTF-8 to be the best encoding for special characters.

// Data to send via TCP or any stream/file
byte[] message_to_send = Encoding.UTF8.GetBytes("amé");

// When receiving, pass the received array to this call to get the string back
string received_string = Encoding.UTF8.GetString(message_to_send);
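
A slightly fuller sketch of the same round trip, in case it helps. It assumes a `MemoryStream` as a stand-in for the `NetworkStream` you would get from `TcpClient.GetStream()`, and the variable names are made up for illustration. The one thing it adds is decoding only the bytes that were actually read, instead of the whole receive buffer.

```csharp
using System;
using System.IO;
using System.Text;

// Stand-in for the TcpClient's NetworkStream (assumption for this sketch).
using var stream = new MemoryStream();

// Sender side: encode the text as UTF-8 and write the bytes.
byte[] outgoing = Encoding.UTF8.GetBytes("am\u00E9"); // "amé"
stream.Write(outgoing, 0, outgoing.Length);

// Receiver side: read into a fixed buffer and remember how many bytes arrived.
stream.Position = 0;
byte[] buffer = new byte[1024];
int bytesRead = stream.Read(buffer, 0, buffer.Length);

// Decode only the bytes that were read.
string received = Encoding.UTF8.GetString(buffer, 0, bytesRead);

// Decoding the whole buffer also turns the unused zero bytes into '\0' chars.
string padded = Encoding.UTF8.GetString(buffer);

Console.WriteLine(received.Length); // 3
Console.WriteLine(padded.Length);   // 1023
```

With a real network stream, a single `Read` may return only part of a message, so a length prefix or delimiter is usually needed on top of this.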

Solution 2

Your problem appears to be the Encoding.ASCII.GetBytes("amé"); and Encoding.ASCII.GetString(buffer); calls, as hinted at by '500 - Internal Server Error' in his comments.

The é character is a multi-byte character which is encoded in UTF-8 with the byte sequence C3 A9. When you use the Encoding.ASCII class to encode and decode, the é character is converted to a question mark since it does not have a direct ASCII encoding. This is true of any character that has no direct coding in ASCII.

Change your code to use Encoding.UTF8.GetBytes() and Encoding.UTF8.GetString() and it should work for you.
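
To make the difference visible, here is a small sketch that encodes the same string both ways and prints the bytes. The variable names are only illustrative, and the string is written with a `\u00E9` escape so the result does not depend on how the source file is saved.

```csharp
using System;
using System.Text;

string text = "am\u00E9"; // "amé"

// ASCII has no code for 'é', so the default fallback substitutes '?' (0x3F).
byte[] asciiBytes = Encoding.ASCII.GetBytes(text);
string asciiRoundTrip = Encoding.ASCII.GetString(asciiBytes);

// UTF-8 encodes 'é' (U+00E9) as the two bytes C3 A9.
byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
string utf8RoundTrip = Encoding.UTF8.GetString(utf8Bytes);

Console.WriteLine(BitConverter.ToString(asciiBytes)); // 61-6D-3F
Console.WriteLine(BitConverter.ToString(utf8Bytes));  // 61-6D-C3-A9
Console.WriteLine(asciiRoundTrip == "am?");           // True
Console.WriteLine(utf8RoundTrip == text);             // True
```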

Author by

Philippe Paré

Updated on July 09, 2022

Comments

Philippe Paré almost 2 years
I'm trying to send a string containing special characters through a TcpClient (byte[]). Here's an example:
- Client enters "amé" in a textbox
- Client converts string to byte[] using a certain encoding (I've tried all the predefined ones plus some like "iso-8859-1")
- Client sends byte[] through TCP
- Server receives and outputs the string reconverted with the same encoding (to a listbox)
Edit :

I forgot to mention that the resulting string was "am?".

Edit-2 (as requested, here's some code):

@DJKRAZE here's a bit of code :
```
byte[] buffer = Encoding.ASCII.GetBytes("amé");
((TcpClient)server).Client.Send(buffer);
```
On the server side:
```
byte[] buffer = new byte[1024];
Client.Receive(buffer);
string message = Encoding.ASCII.GetString(buffer);
ListBox1.Items.Add(message);
```
The string that appears in the listbox is "am?"

=== Solution ===
```
Encoding encoding = Encoding.GetEncoding("iso-8859-1");
byte[] message = encoding.GetBytes("babé");
```
Update:

Simply using Encoding.UTF8.GetBytes("ééé"); works like a charm.
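
For anyone wondering why "iso-8859-1" also worked for "babé": a rough sketch, under the assumption that both sides use the same encoding. Latin-1 has a single-byte code for "é", but unlike UTF-8 it cannot represent characters outside its 256-entry table. The sample strings and variable names here are just illustrations.

```csharp
using System;
using System.Text;

Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

// 'é' (U+00E9) is inside Latin-1, so it maps to the single byte E9.
string original = "bab\u00E9"; // "babé"
byte[] latin1Bytes = latin1.GetBytes(original);
string latin1RoundTrip = latin1.GetString(latin1Bytes);

Console.WriteLine(BitConverter.ToString(latin1Bytes)); // 62-61-62-E9
Console.WriteLine(latin1RoundTrip == original);        // True

// A character outside Latin-1 (here U+4E2D, a CJK ideograph) does not survive.
string cjk = "\u4E2D";
string latin1Cjk = latin1.GetString(latin1.GetBytes(cjk));
string utf8Cjk = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(cjk));

Console.WriteLine(latin1Cjk == cjk); // False
Console.WriteLine(utf8Cjk == cjk);   // True
```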
Philippe Paré about 11 years

Tried implementing this, not working... I get errors saying that the string is not in base64...
Philippe Paré about 11 years

Alright! Found a way around this huge problem. I'm now using the "iso-8859-1" encoding. Here's a bit of code for anyone interested in the future. Encoding encoding = Encoding.GetEncoding("iso-8859-1"); byte[] message = encoding.GetBytes("babé"); The result server side : "babé" ! Thanks anyways for all the answers :)
Scott Chamberlain over 9 years

You said here that you tried that already and it did not work. What changed?
Tom Blodget over 9 years

No. C#'s char data type holds one UTF-16 code unit, one or two of which encode a Unicode codepoint. UTF-8 encodes a Unicode codepoint in 1 to 4 bytes. It doesn't matter which encoding you use as long as you use the same one on both sides and the encoding does not cause you to lose data by not being able to represent the characters you need. If it can't, GetBytes() will take some action. The standard action is to substitute "?"; throwing an exception is also common; truncation is not common, but you could code it that way if you wanted to cause data corruption.
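
As an illustration of the "throw an exception" option mentioned above, here is a minimal sketch. It assumes you want an ASCII encoder that fails loudly instead of substituting "?"; the name `strictAscii` is made up for the example.

```csharp
using System;
using System.Text;

// ASCII encoding configured to throw instead of substituting '?'.
Encoding strictAscii = Encoding.GetEncoding(
    "us-ascii",
    new EncoderExceptionFallback(),
    new DecoderExceptionFallback());

bool threw = false;
try
{
    strictAscii.GetBytes("am\u00E9"); // "amé" contains a non-ASCII character
}
catch (EncoderFallbackException ex)
{
    threw = true;
    Console.WriteLine($"Cannot encode U+{(int)ex.CharUnknown:X4} at index {ex.Index}");
}

// Plain ASCII text still encodes normally with the strict encoder.
byte[] plain = strictAscii.GetBytes("abc");
Console.WriteLine(threw);        // True
Console.WriteLine(plain.Length); // 3
```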
Philippe Paré over 9 years

Scott, clearly I had something else wrong about the code. Utf-8 encoding works perfectly when used on both sides. I updated the question so that people don't get misled with me saying the utf-8 doesn't work.
Philippe Paré over 9 years

Tom, what I meant to say is that however C# stores the char itself, it's 2 bytes and therefore ASCII doesn't help with spécial characters like "é"
Scott Chamberlain over 9 years

@PhilippeParé and what Tom is saying is C# uses UTF-16 internally which could be 2 or 4 bytes in size. For example U+1D11E (MUSICAL SYMBOL G CLEF) is representable but it would be the four bytes D8 34 DD 1E in memory.
Philippe Paré over 9 years

That's interesting! Never saw that happen. I guess it would store all chars as 4 bytes when only one of the chars in the string uses, let's say, U+1D11E?
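
A quick sketch that may help with that last question. As far as the UTF-16 rules go, only the character outside the Basic Multilingual Plane takes two 16-bit code units (a surrogate pair); the other characters in the same string keep using one. The variable names are illustrative.

```csharp
using System;
using System.Text;

string clef = "\U0001D11E"; // MUSICAL SYMBOL G CLEF
string mixed = "a" + clef;

// One code point, but two UTF-16 code units: D834 (high) + DD1E (low).
Console.WriteLine(clef.Length);                   // 2
Console.WriteLine(char.IsSurrogatePair(clef, 0)); // True

// 'a' still occupies a single code unit in the same string.
Console.WriteLine(mixed.Length);                  // 3

// Byte view: big-endian UTF-16 versus UTF-8.
string utf16Bytes = BitConverter.ToString(Encoding.BigEndianUnicode.GetBytes(clef));
string utf8Bytes = BitConverter.ToString(Encoding.UTF8.GetBytes(clef));
Console.WriteLine(utf16Bytes); // D8-34-DD-1E
Console.WriteLine(utf8Bytes);  // F0-9D-84-9E
```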