How to convert UTF-8 byte[] to string

Solution 1

string result = System.Text.Encoding.UTF8.GetString(byteArray);
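Below is a minimal round-trip sketch of this one-liner. It assumes the bytes really are valid UTF-8, and the sample bytes are made up for illustration. It also shows why the conversion is more than a memcopy: six UTF-8 bytes become three UTF-16 chars.

```csharp
using System;
using System.Linq;
using System.Text;

// Sample bytes (illustrative): 'h' (1 byte), 'é' (2 bytes) and '€' (3 bytes) in UTF-8.
byte[] utf8Bytes = { 0x68, 0xC3, 0xA9, 0xE2, 0x82, 0xAC };

string text = Encoding.UTF8.GetString(utf8Bytes);   // "h\u00E9\u20AC"
byte[] roundTripped = Encoding.UTF8.GetBytes(text); // back to the same six bytes

Console.WriteLine($"{utf8Bytes.Length} bytes -> {text.Length} chars"); // 6 bytes -> 3 chars
Console.WriteLine(roundTripped.SequenceEqual(utf8Bytes));              // True
```

The round trip is only guaranteed for valid input. Bytes that are not valid UTF-8 are replaced with U+FFFD and cannot be recovered.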

Solution 2

There are at least four different ways to do this conversion:

  1. Encoding's GetString
    You won't be able to get the original bytes back if they are not valid text in that encoding (for example, arbitrary binary data containing non-ASCII bytes).

  2. BitConverter.ToString
    The output is a "-" delimited string, but there's no built-in .NET method to convert that string back to a byte array.

  3. Convert.ToBase64String
    You can easily convert the output string back to a byte array by using Convert.FromBase64String.
    Note: The output string can contain '+', '/' and '='. If you want to use the string in a URL, you need to encode it explicitly.

  4. HttpServerUtility.UrlTokenEncode
    You can easily convert the output string back to a byte array by using HttpServerUtility.UrlTokenDecode. The output string is already URL-friendly. The downside is that it requires a reference to the System.Web assembly if your project is not a web project, and System.Web is only available on .NET Framework.

A full example:

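using System;
using System.Text;
using System.Web; // HttpServerUtility: .NET Framework only, needs a reference to System.Web.dll

byte[] bytes = { 130, 200, 234, 23 }; // A byte array that contains non-ASCII (non-printable) bytes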

string s1 = Encoding.UTF8.GetString(bytes); // ���
byte[] decBytes1 = Encoding.UTF8.GetBytes(s1);  // decBytes1.Length == 10 !!
// decBytes1 is not the same as bytes
// Other Encoding objects give similar results

string s2 = BitConverter.ToString(bytes);   // 82-C8-EA-17
String[] tempAry = s2.Split('-');
byte[] decBytes2 = new byte[tempAry.Length];
for (int i = 0; i < tempAry.Length; i++)
    decBytes2[i] = Convert.ToByte(tempAry[i], 16);
// decBytes2 same as bytes

string s3 = Convert.ToBase64String(bytes);  // gsjqFw==
byte[] decBytes3 = Convert.FromBase64String(s3);
// decBytes3 same as bytes

string s4 = HttpServerUtility.UrlTokenEncode(bytes);    // gsjqFw2
byte[] decBytes4 = HttpServerUtility.UrlTokenDecode(s4);
// decBytes4 same as bytes
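Item 1 of the list notes that GetString cannot give you the original bytes back when they aren't valid text. If you would rather detect that case than silently get U+FFFD replacement characters, one option is a UTF8Encoding constructed with throwOnInvalidBytes: true. This is a minimal sketch, and the helper name TryDecodeUtf8 is hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper: reports invalid UTF-8 instead of silently substituting U+FFFD.
static bool TryDecodeUtf8(byte[] bytes, out string result)
{
    // No BOM when encoding; throw DecoderFallbackException on invalid bytes when decoding.
    var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
    try
    {
        result = strictUtf8.GetString(bytes);
        return true;
    }
    catch (DecoderFallbackException)
    {
        result = null;
        return false;
    }
}

byte[] binary = { 130, 200, 234, 23 };              // the bytes from the example: not valid UTF-8
byte[] utf8 = Encoding.UTF8.GetBytes("h\u00E9llo"); // valid UTF-8

Console.WriteLine(TryDecodeUtf8(binary, out _));            // False
Console.WriteLine(TryDecodeUtf8(utf8, out string decoded)); // True
Console.WriteLine(decoded == "h\u00E9llo");                 // True
```

For data that is not text at all, Base64 (method 3) remains the safer choice.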

Solution 3

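A general solution to convert from a byte array to a string when you don't know the encoding. StreamReader picks the encoding from a byte order mark (BOM) if there is one, and strips the BOM. If there is no BOM, it assumes UTF-8: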

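using System.IO;

static string BytesToStringConverted(byte[] bytes)
{
    using (var stream = new MemoryStream(bytes))
    {
        // Detects a UTF-8/UTF-16/UTF-32 byte order mark if present; otherwise assumes UTF-8.
        using (var streamReader = new StreamReader(stream))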
        {
            return streamReader.ReadToEnd();
        }
    }
}
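The advantage of this approach over Encoding.UTF8.GetString is mostly about byte order marks. The small sketch below uses assumed sample input, and the helper name ReadWithStreamReader is hypothetical. StreamReader uses the BOM to choose UTF-16 and strips it, while GetString keeps a UTF-8 BOM as a leading U+FEFF character. Without a BOM, StreamReader just assumes UTF-8, so it cannot recognize, say, Windows-1252 text.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static string ReadWithStreamReader(byte[] bytes)
{
    using var stream = new MemoryStream(bytes);
    using var reader = new StreamReader(stream); // detects BOMs, defaults to UTF-8
    return reader.ReadToEnd();
}

// "hi" as UTF-8 with a BOM (EF BB BF) and as UTF-16 LE with a BOM (FF FE).
byte[] utf8WithBom = Encoding.UTF8.GetPreamble().Concat(Encoding.UTF8.GetBytes("hi")).ToArray();
byte[] utf16WithBom = Encoding.Unicode.GetPreamble().Concat(Encoding.Unicode.GetBytes("hi")).ToArray();

Console.WriteLine(ReadWithStreamReader(utf8WithBom).Length);    // 2: BOM stripped
Console.WriteLine(Encoding.UTF8.GetString(utf8WithBom).Length); // 3: leading '\uFEFF' kept
Console.WriteLine(ReadWithStreamReader(utf16WithBom) == "hi");  // True
```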

Solution 4

Definition:

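// Extension methods must live in a non-generic static class (the class name is arbitrary).
public static class ByteArrayExtensions
{
    public static string ConvertByteToString(this byte[] source)
    {
        return source != null ? System.Text.Encoding.UTF8.GetString(source) : null;
    }
}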

Usage:

string result = input.ConvertByteToString();

Solution 5

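Converting a byte[] to a string seems simple, but any encoding can mangle the output if the bytes are not valid text in that encoding. This little function maps each byte directly to the char with the same value (0 to 255), so no byte is lost, and casting each char back to byte restores the original array. Note that it does not decode UTF-8: a character stored as several UTF-8 bytes comes out as several separate characters.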

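private string ToString(byte[] bytes)
{
    // Each byte becomes the char with the same value (U+0000 to U+00FF).
    // A StringBuilder avoids the quadratic cost of repeated string concatenation.
    var response = new System.Text.StringBuilder(bytes.Length);

    foreach (byte b in bytes)
        response.Append((char)b);

    return response.ToString();
}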
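Here is a quick sketch of what this byte-to-char mapping does, using sample bytes chosen for illustration. The helper names are hypothetical. Casting each char back to byte recovers the original array, but UTF-8 text is not decoded, so the two bytes of 'é' come out as two separate characters.

```csharp
using System;
using System.Linq;
using System.Text;

// Same idea as the ToString function: byte value n becomes char U+00nn.
static string BytesToChars(byte[] bytes) => new string(bytes.Select(b => (char)b).ToArray());

// Reverse mapping; only lossless for chars in the range U+0000 to U+00FF.
static byte[] CharsToBytes(string s) => s.Select(c => (byte)c).ToArray();

byte[] binary = { 130, 200, 234, 23 };
Console.WriteLine(CharsToBytes(BytesToChars(binary)).SequenceEqual(binary)); // True

byte[] utf8 = Encoding.UTF8.GetBytes("\u00E9");          // 'é' is C3 A9 in UTF-8
Console.WriteLine(BytesToChars(utf8) == "\u00C3\u00A9"); // True: "Ã©", not "é"
```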
Author: Tjkoopa

For me, fun is taking a program, language, whatever, and making it do things its designer never envisioned it doing. It comes in handy when I'm asked to do something a little outside the norm as part of my job.

Updated on November 03, 2021

Comments

  • Tjkoopa over 2 years

    I have a byte[] array that is loaded from a file that I happen to know contains UTF-8.

    In some debugging code, I need to convert it to a string. Is there a one-liner that will do this?

    Under the covers it should be just an allocation and a memcopy, so even if it is not implemented, it should be possible.

    • Tom Blodget over 7 years
      "Should be just an allocation and a memcopy" is not correct, because a .NET string is UTF-16 encoded. A Unicode character might be one UTF-8 code unit or one UTF-16 code unit; another might be two UTF-8 code units or one UTF-16 code unit; another might be three UTF-8 code units or one UTF-16 code unit; another might be four UTF-8 code units or two UTF-16 code units. A memcopy might be able to widen, but it wouldn't be able to handle UTF-8 to UTF-16 conversion.
  • drtf almost 10 years
    LINQ it: var decBytes2 = str.Split('-').Select(ch => Convert.ToByte(ch, 16)).ToArray();
  • maazza almost 9 years
    How does it handle null-terminated strings?
  • david.pfx almost 9 years
    But not UTF-8 methinks?
  • Hi-Angel almost 9 years
    @maazza For an unknown reason, it doesn't at all. I'm calling it like System.Text.Encoding.UTF8.GetString(buf).TrimEnd('\0');
  • Luaan over 8 years
    @Hi-Angel Unknown reason? The only reason null-terminated strings ever became popular was the C language - and even that was only because of a historical oddity (CPU instructions that dealt with null-terminated strings). .NET only uses null-terminated strings when interopping with code that uses null-terminated strings (which are finally disappearing). It's perfectly valid for a string to contain NUL characters. And of course, while null-terminated strings are dead simple in ASCII (just build until you get the first zero byte), other encodings, including UTF-8, are not so simple.
  • plugwash over 8 years
    One of the beautiful features of UTF-8 is that a shorter sequence is never a subsequence of a longer sequence. So a null terminated UTF-8 string is simple.
  • Erik Bergstedt over 8 years
    I received System.FormatException using your method when I unpacked it with Convert.FromBase64String.
  • Erik Bergstedt over 8 years
    Well, good luck unpacking it if it has non-ASCII characters. Just use Convert.ToBase64String.
  • Nyerguds over 7 years
    UnicodeEncoding is the worst class name ever; Unicode isn't an encoding at all. That class is actually UTF-16. The little-endian version, I think.
  • CodeCaster over 7 years
    This converts the byte array to a hexadecimal string representing each byte, which is generally not what you want when converting bytes to a string. If you do, then that's another question, see for example How do you convert Byte Array to Hexadecimal String, and vice versa?.
  • Assimilater almost 7 years
    Example demonstrating that this does not terminate at null characters. Encoding.ASCII yields the same results.
  • Winter almost 7 years
    Not what OP asked
  • Nyerguds over 6 years
    Mine does, actually. byteArr.TakeWhile(x => x != 0) is a quick and easy way to solve the null termination problem.
  • Sebastian Zander over 6 years
    But this assumes that there is either an encoding BOM in the byte stream or that it is in UTF-8. But you can do the same with Encoding anyway. It doesn't magically solve the problem when you don't know the encoding.
  • user3841581 over 6 years
    @AndrewJE This will take forever to compute if you have a large byte array, like the ones used for pictures.
  • Marco Pardo over 5 years
    Didn't have one. But this function is in use for binary transmission in our company network, and so far 20 TB have been decoded and re-encoded correctly. So for me this function works :)
  • elnaz jangi almost 4 years
    I am very happy to be able to use your knowledge, dear friends. Good luck, and thank you for your individual explanations and answers. @Hi-Angel May I ask why you used TrimEnd?
  • Hi-Angel almost 4 years
    @elnazjangi I haven't used C# for a long time, but AFAIR in C# a null byte is a valid element of a string. Not a useful one though, so the .TrimEnd('\0') call simply removes these if they're found at the end. Regarding, why it's expected to be there: in C and C++ langs a null byte has special meaning, it marks the end of the string. So if you know you are under circumstances where the string you're getting from the buffer can be a zero-terminated one, you'd use this function call.
  • Nyerguds over 3 years
    Don't you need to specifically strip the BOM off the start though? As far as I know, even if you use a UTF8Encoding with BOM, it will not strip that off automatically.
  • Antonio Leonardo about 3 years
    @Nyerguds, a UTF8Encoding object constructed with false as its parameter does not emit a BOM.
  • Nyerguds about 3 years
    No, I mean, if the text has a BOM, even the System.Text.Encoding.UTF8 will not automatically strip that off. Try it out.
  • dimitar.bogdanov about 3 years
    This should be the accepted answer. It perfectly illustrates the output of multiple methods. The current accepted answer shows only one, which may be problematic for some developers who don't scroll this far down. - unless you sort by votes, of course.
  • Peter Mortensen over 2 years
    What do you mean by "null termination"? Null bytes in the input array? Can you define exactly what you mean in your answer? (But without "Edit:", "Update:", or similar - the answer should appear as if it was written today.)
  • Assimilater over 2 years
    I don't feel the need to edit the answer. In low-level systems that use byte arrays for ASCII-encoded strings, the array itself doesn't contain information about the length of the string. The most common practice is to terminate the string with a value of 0 (aka null). Failing to do so is the cause of the famous buffer overflow exploit. As for this answer specifically, I haven't used C# in a few years, so I don't remember whether it just wasn't copying the null byte or was failing to stop copying until and including the null byte. But that's null termination in a nutshell.
  • Assimilater over 2 years
    I think maybe it was continuing to copy past the null terminator without this code... but again, I don't remember.
  • Wai Ha Lee over 2 years
    UTF8 is a static property on the Encoding class (of which ASCIIEncoding is a derived type). This code is the same as using Encoding.UTF8.GetString, which is already suggested by numerous other answers. Please don't post duplicate answers. From review
  • variable over 2 years
    If I use this, it returns diamonds in the result, whereas Convert.ToBase64String works. Why is that?
  • variable over 2 years
    This gives me diamonds, whereas Convert.ToBase64String works.
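Several comments discuss buffers that hold a null-terminated string, for example a fixed-size buffer filled by native code. The minimal sketch below assumes the string ends at the first zero byte, finds that byte, and decodes only the bytes before it. This is the same idea as Nyerguds' TakeWhile(x => x != 0). As plugwash points out, a zero byte never occurs inside a multi-byte UTF-8 sequence, so cutting at the first 0 is safe. TrimEnd('\0') only removes zeros at the very end, so any leftover bytes after the terminator survive. The helper name and sample buffer are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper: decode up to (not including) the first 0 byte.
static string FromNullTerminatedUtf8(byte[] buffer)
{
    int length = Array.IndexOf(buffer, (byte)0);
    if (length < 0) length = buffer.Length; // no terminator: use the whole buffer
    return Encoding.UTF8.GetString(buffer, 0, length);
}

// "abc", a terminator, then leftover bytes "xy" and zero padding.
byte[] buffer = { 0x61, 0x62, 0x63, 0x00, 0x78, 0x79, 0x00, 0x00 };

Console.WriteLine(FromNullTerminatedUtf8(buffer));                       // abc
Console.WriteLine(Encoding.UTF8.GetString(buffer).TrimEnd('\0').Length); // 6: "abc\0xy"
```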