How to convert UTF-8 byte[] to string

c# .net arrays string type-conversion

1,307,213

Solution 1

string result = System.Text.Encoding.UTF8.GetString(byteArray);
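
As a minimal sketch of that call in context, the snippet below decodes a short UTF-8 sample and encodes it again. The class name Utf8RoundTrip, the method name Decode, and the sample bytes are illustrative assumptions, not part of the original answer.

```csharp
using System;
using System.Text;

public static class Utf8RoundTrip
{
    // Decodes a byte array that is known to contain UTF-8 text.
    public static string Decode(byte[] utf8Bytes)
    {
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    public static void Main()
    {
        // "é" is U+00E9, which UTF-8 encodes as the two bytes 0xC3 0xA9.
        byte[] utf8Bytes = { 0x63, 0x61, 0x66, 0xC3, 0xA9 }; // "café"
        string text = Decode(utf8Bytes);

        Console.WriteLine(text.Length);                         // 4 chars decoded from 5 bytes
        Console.WriteLine(Encoding.UTF8.GetBytes(text).Length); // 5 bytes when encoded again
    }
}
```

Because the input is valid UTF-8, encoding the result again yields the same five bytes.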

Solution 2

There are at least four different ways to do this conversion.

1. Encoding's GetString. You won't be able to get the original bytes back if they are not valid text in that encoding, for example arbitrary binary data decoded as UTF-8.
2. BitConverter.ToString. The output is a "-" delimited hex string, but there's no built-in .NET method that converts this delimited string back to a byte array.
3. Convert.ToBase64String. You can easily convert the output string back to a byte array by using Convert.FromBase64String. Note: The output string could contain '+', '/' and '='. If you want to use the string in a URL, you need to explicitly encode it.
4. HttpServerUtility.UrlTokenEncode. You can easily convert the output string back to a byte array by using HttpServerUtility.UrlTokenDecode. The output string is already URL friendly! The downside is that it needs the System.Web assembly if your project is not a web project.

A full example:

byte[] bytes = { 130, 200, 234, 23 }; // A byte array containing non-ASCII (or non-readable) bytes

string s1 = Encoding.UTF8.GetString(bytes); // ���
byte[] decBytes1 = Encoding.UTF8.GetBytes(s1);  // decBytes1.Length == 10 !!
// decBytes1 not same as bytes
// Using UTF-8 or other Encoding object will get similar results

string s2 = BitConverter.ToString(bytes);   // 82-C8-EA-17
String[] tempAry = s2.Split('-');
byte[] decBytes2 = new byte[tempAry.Length];
for (int i = 0; i < tempAry.Length; i++)
    decBytes2[i] = Convert.ToByte(tempAry[i], 16);
// decBytes2 same as bytes

string s3 = Convert.ToBase64String(bytes);  // gsjqFw==
byte[] decBytes3 = Convert.FromBase64String(s3);
// decBytes3 same as bytes

string s4 = HttpServerUtility.UrlTokenEncode(bytes);    // gsjqFw2
byte[] decBytes4 = HttpServerUtility.UrlTokenDecode(s4);
// decBytes4 same as bytes
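
The first case above loses data silently because the default UTF-8 decoder substitutes U+FFFD for malformed input. One way to detect that up front, sketched here under the assumption that you would rather fail than get replacement characters, is a UTF8Encoding constructed with throwOnInvalidBytes set to true. The class name StrictUtf8 and the method name TryDecode are hypothetical.

```csharp
using System;
using System.Text;

public static class StrictUtf8
{
    // throwOnInvalidBytes: true makes the decoder raise an exception
    // instead of substituting U+FFFD for malformed input.
    private static readonly UTF8Encoding Strict =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Returns true and the decoded text when the bytes are valid UTF-8,
    // otherwise returns false and an empty string.
    public static bool TryDecode(byte[] bytes, out string text)
    {
        try
        {
            text = Strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            text = string.Empty;
            return false;
        }
    }
}
```

With the sample array { 130, 200, 234, 23 } this returns false, which signals that a binary-safe representation such as Base64 is the better choice for that data.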

Solution 3

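A general solution to convert from byte array to string when you don't know the encoding (StreamReader detects the encoding from a byte order mark if one is present, and otherwise assumes UTF-8):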

static string BytesToStringConverted(byte[] bytes)
{
    using (var stream = new MemoryStream(bytes))
    {
        using (var streamReader = new StreamReader(stream))
        {
            return streamReader.ReadToEnd();
        }
    }
}
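
To illustrate what the StreamReader approach does and does not detect, here is a small sketch. It relies on StreamReader's default behaviour of checking for a byte order mark (BOM). The names BomDemo, ReadWithStreamReader and WithBom are made up for this example.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class BomDemo
{
    // Same technique as the answer: StreamReader looks for a BOM
    // and falls back to UTF-8 when none is present.
    public static string ReadWithStreamReader(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }

    // Helper for the demo: prefixes the encoded text with the encoding's BOM.
    public static byte[] WithBom(Encoding encoding, string text)
    {
        return encoding.GetPreamble().Concat(encoding.GetBytes(text)).ToArray();
    }

    public static void Main()
    {
        byte[] utf16 = WithBom(Encoding.Unicode, "hi");
        byte[] utf8 = WithBom(Encoding.UTF8, "hi");

        Console.WriteLine(ReadWithStreamReader(utf16));            // hi (UTF-16 detected from the BOM)
        Console.WriteLine(ReadWithStreamReader(utf8).Length);      // 2 (BOM consumed)
        Console.WriteLine(Encoding.UTF8.GetString(utf8).Length);   // 3 (BOM kept as U+FEFF)
    }
}
```

Bytes in some other encoding without a BOM are still read as UTF-8, so this does not identify an unknown encoding in general.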

Solution 4

Definition:

public static string ConvertByteToString(this byte[] source)
{
    return source != null ? System.Text.Encoding.UTF8.GetString(source) : null;
}

Using:

string result = input.ConvertByteToString();

Solution 5

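Converting a byte[] to a string seems simple, but decoding with the wrong encoding is likely to mess up the output string. This little function maps each byte to the char with the same value, so every byte survives unchanged; note that it does not decode multi-byte UTF-8 sequences: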

private string ToString(byte[] bytes)
{
    string response = string.Empty;

    foreach (byte b in bytes)
        response += (Char)b;

    return response;
}
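
The loop above maps each byte to the char with the same numeric value (the ISO-8859-1 mapping), so it is reversible but is not a UTF-8 decoder. The sketch below keeps that mapping, swaps the string concatenation for a StringBuilder so large arrays are not copied repeatedly, and adds the inverse. CastDemo, BytesToChars and CharsToBytes are hypothetical names.

```csharp
using System;
using System.Text;

public static class CastDemo
{
    // Same mapping as the loop in the answer: each byte becomes the char
    // with that code point. StringBuilder avoids reallocating the string
    // on every iteration.
    public static string BytesToChars(byte[] bytes)
    {
        var builder = new StringBuilder(bytes.Length);
        foreach (byte b in bytes)
            builder.Append((char)b);
        return builder.ToString();
    }

    // Reverses BytesToChars. Throws OverflowException if any char is above 0xFF.
    public static byte[] CharsToBytes(string text)
    {
        var bytes = new byte[text.Length];
        for (int i = 0; i < text.Length; i++)
            bytes[i] = checked((byte)text[i]);
        return bytes;
    }

    public static void Main()
    {
        byte[] utf8ForEAcute = { 0xC3, 0xA9 };                              // UTF-8 for "é"
        Console.WriteLine(BytesToChars(utf8ForEAcute).Length);              // 2 chars, not 1
        Console.WriteLine(Encoding.UTF8.GetString(utf8ForEAcute).Length);   // 1
    }
}
```

This is a reasonable fit for carrying binary data in a string, as long as the receiving side reverses it the same way; for text that is known to be UTF-8, Encoding.UTF8.GetString remains the appropriate call.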


Author by

Tjkoopa

For me, fun is taking a program, language, whatever and making it do things its designer never envisioned it doing. It comes in handy when I'm asked to do something a little outside the norm as part of my job.

Updated on November 03, 2021

Comments

Tjkoopa over 2 years

I have a byte[] array that is loaded from a file that I happen to know contains UTF-8.

In some debugging code, I need to convert it to a string. Is there a one-liner that will do this?

Under the covers it should be just an allocation and a memcopy, so even if it is not implemented, it should be possible.
Tom Blodget over 7 years

"should be just an allocation and a memcopy": is not correct because a .NET string is UTF-16 encoded. A Unicode character might be one UTF-8 code unit and one UTF-16 code unit, another might be two UTF-8 code units and one UTF-16 code unit, another might be three UTF-8 code units and one UTF-16 code unit, another might be four UTF-8 code units and two UTF-16 code units. A memcopy might be able to widen but it wouldn't be able to handle UTF-8 to UTF-16 conversion.
drtf almost 10 years

LINQ it: var decBytes2 = str.Split('-').Select(ch => Convert.ToByte(ch, 16)).ToArray();
maazza almost 9 years

how does it handle null ended strings ?
david.pfx almost 9 years

But not UTF-8 methinks?
Hi-Angel almost 9 years

@maazza for some unknown reason it doesn't at all. I'm calling it like System.Text.Encoding.UTF8.GetString(buf).TrimEnd('\0').
Luaan over 8 years

@Hi-Angel Unknown reason? The only reason null-terminated strings ever became popular was the C language - and even that was only because of a historical oddity (CPU instructions that dealt with null-terminated strings). .NET only uses null-terminated strings when interopping with code that uses null-terminated strings (which are finally disappearing). It's perfectly valid for a string to contain NUL characters. And of course, while null-terminated strings are dead simple in ASCII (just build until you get the first zero byte), other encodings, including UTF-8, are not so simple.
plugwash over 8 years

One of the beautiful features of UTF-8 is that a shorter sequence is never a subsequence of a longer sequence. So a null terminated UTF-8 string is simple.
Erik Bergstedt over 8 years

I received System.FormatException using your method when I unpacked it with Convert.FromBase64String.
Erik Bergstedt over 8 years

Well, good luck unpacking it if it has non-ascii. Just use Convert.ToBase64String.
Nyerguds over 7 years

UnicodeEncoding is the worst class name ever; unicode isn't an encoding at all. That class is actually UTF-16. The little-endian version, I think.
CodeCaster over 7 years

This converts the byte array to a hexadecimal string representing each byte, which is generally not what you want when converting bytes to a string. If you do, then that's another question, see for example How do you convert Byte Array to Hexadecimal String, and vice versa?.
Assimilater almost 7 years

Example demonstrating this does not terminate with null characters. Encoding.ASCII yields the same results.
Winter almost 7 years

Not what OP asked
Nyerguds over 6 years

Mine does, actually. byteArr.TakeWhile(x => x != 0) is a quick and easy way to solve the null termination problem.
Sebastian Zander over 6 years

But this assumes that there is either an encoding BOM in the byte stream or that it is in UTF-8. But you can do the same with Encoding anyway. It doesn't magically solve the problem when you don't know the encoding.
user3841581 over 6 years

@AndrewJE this will take forever to compute if you have a large byte array like the ones used for pictures.
Marco Pardo over 5 years

Didn't have one. But this function is in use for binary transmission in our company network, and so far 20 TB have been encoded and decoded correctly. So for me this function works :)
elnaz jangi almost 4 years

I am very happy to be able to use your knowledge, dear friends. Good luck and thank you for your individual explanations and answers. @Hi-Angel May I ask why did you use TrimEnd ??
Hi-Angel almost 4 years

@elnazjangi I haven't used C# for a long time, but AFAIR in C# a null byte is a valid element of a string. Not a useful one though, so the .TrimEnd('\0') call simply removes these if they're found at the end. Regarding, why it's expected to be there: in C and C++ langs a null byte has special meaning, it marks the end of the string. So if you know you are under circumstances where the string you're getting from the buffer can be a zero-terminated one, you'd use this function call.
Nyerguds over 3 years

Don't you need to specifically strip the BOM off the start though? As far as I know, even if you use a UTF8Encoding with BOM, it will not strip that off automatically.
Antonio Leonardo about 3 years

@Nyerguds, the UTF8Encoding object with "false" value at parameter is without BOM.
Nyerguds about 3 years

No, I mean, if the text has a BOM, even the System.Text.Encoding.UTF8 will not automatically strip that off. Try it out.
dimitar.bogdanov about 3 years

This should be the accepted answer. It perfectly illustrates the output of multiple methods. The current accepted answer shows only one, which may be problematic for some developers who don't scroll this far down. - unless you sort by votes, of course.
Peter Mortensen over 2 years

What do you mean by "null termination"? Null bytes in the input array? Can you define exactly what you mean in your answer? (But without "Edit:", "Update:", or similar - the answer should appear as if it was written today.)
Assimilater over 2 years

I don't feel the need to edit the answer. In low-level systems that use byte arrays for ASCII-encoded strings, the array itself doesn't contain information about the length of the string. The most common practice is to terminate the string with a value of 0 (aka null). Failing to do so is the cause of the famous buffer overflow exploit. As for this answer specifically, I haven't used C# in a few years, so I don't remember if it just wasn't copying the null byte or failing to stop copying until and including the null byte. But that's null termination in a nutshell.
Assimilater over 2 years

I think maybe when it was continuing to copy past the null terminator without this code maybe....but again I don't remember
Wai Ha Lee over 2 years

UTF8 is a static property on the Encoding class (of which ASCIIEncoding is a derived type). This code is the same as using Encoding.UTF8.GetString, which is already suggested by numerous other answers. Please don't post duplicate answers. From review
variable over 2 years

If I use this then it returns diamonds in the result, whereas this works: Convert.ToBase64String. Why is that?
variable over 2 years

This gives me diamonds, whereas Convert.ToBase64String works.