How to convert UTF-8 byte[] to string
Solution 1
string result = System.Text.Encoding.UTF8.GetString(byteArray);
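For reference, here is a minimal sketch of the same call wrapped in a helper, plus the overload that decodes only part of a buffer. The class and method names (Utf8Demo, Decode) are made up for illustration; only Encoding.UTF8.GetString comes from the answer above.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class Utf8Demo
{
    // Decodes a whole UTF-8 byte array into a .NET (UTF-16) string.
    public static string Decode(byte[] utf8Bytes)
    {
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    // Decodes only `count` bytes starting at `offset`, which is useful
    // when a read call filled just the first part of a larger buffer.
    public static string Decode(byte[] buffer, int offset, int count)
    {
        return Encoding.UTF8.GetString(buffer, offset, count);
    }
}
```

For example, Utf8Demo.Decode(new byte[] { 0x68, 0xC3, 0xA9 }) should give "hé", since 0xC3 0xA9 is the two-byte UTF-8 sequence for U+00E9.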
Solution 2
There are at least four different ways of doing this conversion.
Encoding's GetString: decodes the bytes as text, but you won't be able to get the original bytes back if they are not valid in that encoding (for example, arbitrary binary data decoded as UTF-8).
BitConverter.ToString: the output is a "-" delimited hexadecimal string, but there's no built-in .NET method that converts this delimited string back to a byte array.
Convert.ToBase64String: you can easily convert the output string back to a byte array by using Convert.FromBase64String. Note: the output string can contain '+', '/' and '='. If you want to use the string in a URL, you need to encode it explicitly.
HttpServerUtility.UrlTokenEncode: you can easily convert the output string back to a byte array by using HttpServerUtility.UrlTokenDecode. The output string is already URL friendly. The downside is that it needs the System.Web assembly if your project is not a web project.
A full example:
byte[] bytes = { 130, 200, 234, 23 }; // Arbitrary binary data that is not valid UTF-8 text
string s1 = Encoding.UTF8.GetString(bytes); // ���
byte[] decBytes1 = Encoding.UTF8.GetBytes(s1); // decBytes1.Length == 10 !!
// decBytes1 is not the same as bytes
// Other Encoding objects can give similar results
string s2 = BitConverter.ToString(bytes); // 82-C8-EA-17
string[] tempAry = s2.Split('-');
byte[] decBytes2 = new byte[tempAry.Length];
for (int i = 0; i < tempAry.Length; i++)
    decBytes2[i] = Convert.ToByte(tempAry[i], 16);
// decBytes2 is the same as bytes
string s3 = Convert.ToBase64String(bytes); // gsjqFw==
byte[] decBytes3 = Convert.FromBase64String(s3);
// decBytes3 is the same as bytes
string s4 = HttpServerUtility.UrlTokenEncode(bytes); // gsjqFw2
byte[] decBytes4 = HttpServerUtility.UrlTokenDecode(s4);
// decBytes4 is the same as bytes
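If you want a URL-friendly string without referencing System.Web, one possible approach is to post-process the Base64 output yourself. The sketch below is an assumption on my part, not something from the answer: the UrlSafeBase64 class is hypothetical, and its output uses the common "-" and "_" substitution with the padding removed, so it is not the same format as UrlTokenEncode (which appends a padding count).

```csharp
using System;

// Hypothetical helper, for illustration only; not part of .NET.
public static class UrlSafeBase64
{
    // Base64-encodes the bytes, then swaps the URL-unsafe characters
    // and drops the '=' padding.
    public static string Encode(byte[] bytes)
    {
        return Convert.ToBase64String(bytes)
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }

    // Reverses Encode: restores '+' and '/', re-adds the padding,
    // then decodes with the built-in method.
    public static byte[] Decode(string text)
    {
        string s = text.Replace('-', '+').Replace('_', '/');
        switch (s.Length % 4)
        {
            case 2: s += "=="; break;
            case 3: s += "="; break;
        }
        // A remainder of 1 is never valid Base64; FromBase64String throws FormatException.
        return Convert.FromBase64String(s);
    }
}
```

With the bytes from the example above, Encode should give gsjqFw, and Decode should return the original four bytes.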
Solution 3
A more general solution for when you are not sure of the encoding. Note that StreamReader picks the encoding from a byte order mark (BOM) if one is present and otherwise assumes UTF-8:
static string BytesToStringConverted(byte[] bytes)
{
    using (var stream = new MemoryStream(bytes))
    {
        using (var streamReader = new StreamReader(stream))
        {
            return streamReader.ReadToEnd();
        }
    }
}
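To make the behaviour of this approach easier to check, here is a small sketch that spells out the StreamReader defaults explicitly (UTF-8 fallback, BOM detection on). The ReaderDemo class name is hypothetical; the three-argument StreamReader constructor is the standard one.

```csharp
using System.IO;
using System.Text;

// Hypothetical wrapper, for illustration only.
public static class ReaderDemo
{
    // Reads the bytes as text. The reader switches encoding if it finds a
    // byte order mark and falls back to UTF-8 when there is none.
    public static string Read(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Bytes FF FE 68 00 69 00 (UTF-16 LE with a BOM) should come back as "hi", while the same text without the BOM (68 00 69 00) is read as UTF-8 and comes back with embedded NUL characters. So the encoding is only detected when a BOM is present.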
Solution 4
Definition:
public static string ConvertByteToString(this byte[] source)
{
    return source != null ? System.Text.Encoding.UTF8.GetString(source) : null;
}
Using:
string result = input.ConvertByteToString();
Solution 5
Converting a byte[] to a string seems simple, but decoding arbitrary bytes with a text encoding can alter them. This little function maps each byte to the char with the same numeric value, so no byte is lost. Note that it does not decode multi-byte UTF-8 sequences:
private string ToString(byte[] bytes)
{
    string response = string.Empty;
    foreach (byte b in bytes)
        response += (Char)b;
    return response;
}
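As far as I can tell, casting each byte to a char gives the same result as decoding with ISO-8859-1 (Latin-1), which maps byte values 0 to 255 onto the code points with the same numbers. A sketch under that assumption follows; the Latin1Demo class is hypothetical. It avoids the repeated string concatenation and also gives you the reverse direction.

```csharp
using System.Text;

// Hypothetical helper, for illustration only.
public static class Latin1Demo
{
    // ISO-8859-1 maps every byte value to the code point with the same number.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // One char per byte; every byte value survives.
    public static string ToText(byte[] bytes)
    {
        return Latin1.GetString(bytes);
    }

    // Reverse direction; lossless only for strings produced by ToText.
    public static byte[] ToBytes(string text)
    {
        return Latin1.GetBytes(text);
    }
}
```

This round-trips arbitrary bytes, but it is not a UTF-8 decoder: the UTF-8 bytes C3 A9 (for "é") come out as the two characters "Ã©".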
Tjkoopa
Updated on November 03, 2021
Comments
-
Tjkoopa over 2 years
I have a byte[] array that is loaded from a file that I happen to know contains UTF-8. In some debugging code, I need to convert it to a string. Is there a one-liner that will do this?
Under the covers it should be just an allocation and a memcopy, so even if it is not implemented, it should be possible.
-
Tom Blodget over 7 years
"should be just an allocation and a memcopy" is not correct, because a .NET string is UTF-16 encoded. One Unicode character might take one UTF-8 code unit and one UTF-16 code unit; another might take two UTF-8 code units and one UTF-16 code unit; another, three UTF-8 code units and one UTF-16 code unit; another, four UTF-8 code units and two UTF-16 code units. A memcopy might be able to widen, but it wouldn't be able to handle the UTF-8 to UTF-16 conversion.
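A quick way to see these counts for yourself is the sketch below. The CodeUnits class and its method names are made up for illustration; Encoding.UTF8.GetByteCount and string.Length are the standard members.

```csharp
using System.Text;

// Hypothetical helper, for illustration only.
public static class CodeUnits
{
    // Number of UTF-8 code units (bytes) needed to encode the string.
    public static int Utf8Units(string s)
    {
        return Encoding.UTF8.GetByteCount(s);
    }

    // Number of UTF-16 code units; this is what string.Length reports.
    public static int Utf16Units(string s)
    {
        return s.Length;
    }
}
```

"A" gives 1 and 1, "\u00E9" (é) gives 2 and 1, "\u20AC" (€) gives 3 and 1, and "\U0001F600" (an emoji outside the Basic Multilingual Plane) gives 4 and 2.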
-
drtf almost 10 years
LINQ it:
var decBytes2 = str.Split('-').Select(ch => Convert.ToByte(ch, 16)).ToArray();
-
maazza almost 9 years
How does it handle null-terminated strings?
-
david.pfx almost 9 years
But not UTF-8, methinks?
-
Hi-Angel almost 9 years
@maazza For some unknown reason it doesn't at all. I'm calling it like System.Text.Encoding.UTF8.GetString(buf).TrimEnd('\0');
-
Luaan over 8 years
@Hi-Angel Unknown reason? The only reason null-terminated strings ever became popular was the C language, and even that was only because of a historical oddity (CPU instructions that dealt with null-terminated strings). .NET only uses null-terminated strings when interoperating with code that uses null-terminated strings (which are finally disappearing). It's perfectly valid for a string to contain NUL characters. And of course, while null-terminated strings are dead simple in ASCII (just build until you get the first zero byte), other encodings, including UTF-8, are not so simple.
-
plugwash over 8 years
One of the beautiful features of UTF-8 is that a shorter sequence is never a subsequence of a longer sequence. So a null-terminated UTF-8 string is simple.
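Building on that property, here is a minimal sketch of decoding a null-terminated UTF-8 buffer by cutting at the first zero byte before decoding. The NullTerminated class is hypothetical, and I'm assuming the buffer holds UTF-8 rather than UTF-16, where zero bytes occur inside ordinary characters.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class NullTerminated
{
    // Decodes the bytes before the first 0x00, or the whole buffer
    // if it contains no zero byte.
    public static string Decode(byte[] buffer)
    {
        int end = Array.IndexOf(buffer, (byte)0);
        if (end < 0)
        {
            end = buffer.Length;
        }
        return Encoding.UTF8.GetString(buffer, 0, end);
    }
}
```

Unlike TrimEnd('\0'), this also discards anything that follows the terminator, which matters when the buffer is reused and still holds stale data.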
-
Erik Bergstedt over 8 years
I received a System.FormatException using your method when I unpacked it with Convert.FromBase64String.
-
Erik Bergstedt over 8 years
Well, good luck unpacking it if it has non-ASCII. Just use Convert.ToBase64String.
-
Nyerguds over 7 years
UnicodeEncoding is the worst class name ever; Unicode isn't an encoding at all. That class is actually UTF-16. The little-endian version, I think.
-
CodeCaster over 7 years
This converts the byte array to a hexadecimal string representing each byte, which is generally not what you want when converting bytes to a string. If you do, then that's another question; see for example "How do you convert Byte Array to Hexadecimal String, and vice versa?".
-
Assimilater almost 7 years
Example demonstrating that this does not terminate at null characters. Encoding.ASCII yields the same results.
-
Winter almost 7 years
Not what the OP asked.
-
Nyerguds over 6 years
Mine does, actually. byteArr.TakeWhile(x => x != 0) is a quick and easy way to solve the null termination problem.
-
Sebastian Zander over 6 years
But this assumes that there is either an encoding BOM in the byte stream or that it is in UTF-8. But you can do the same with Encoding anyway. It doesn't magically solve the problem when you don't know the encoding.
-
user3841581 over 6 years
@AndrewJE This will take forever to compute if you have a large byte array, like the ones used for pictures.
-
Marco Pardo over 5 years
Didn't have one. But this function is in use for binary transmission in our company network, and so far 20 TB were re- and encoded correctly. So for me this function works :)
-
elnaz jangi almost 4 years
I am very happy to be able to use your knowledge, dear friends. Good luck, and thank you for your individual explanations and answers. @Hi-Angel May I ask why you used TrimEnd?
-
Hi-Angel almost 4 years
@elnazjangi I haven't used C# for a long time, but AFAIR in C# a null byte is a valid element of a string. Not a useful one though, so the .TrimEnd('\0') call simply removes these if they're found at the end. As for why they are expected to be there: in C and C++ a null byte has a special meaning; it marks the end of the string. So if you know you are in circumstances where the string you're getting from the buffer can be a zero-terminated one, you'd use this function call.
-
Nyerguds over 3 years
Don't you need to specifically strip the BOM off the start though? As far as I know, even if you use a UTF8Encoding with BOM, it will not strip that off automatically.
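A minimal sketch of stripping it by hand, assuming the bytes are UTF-8 and may start with the three-byte BOM EF BB BF. The BomDemo class is hypothetical. Encoding.UTF8.GetString decodes the BOM into the character U+FEFF, so the sketch checks for that character at index 0.

```csharp
using System.Text;

// Hypothetical helper, for illustration only.
public static class BomDemo
{
    // Decodes UTF-8 bytes and removes a leading byte order mark, if any.
    public static string DecodeWithoutBom(byte[] bytes)
    {
        string text = Encoding.UTF8.GetString(bytes);
        // GetString keeps the BOM as U+FEFF at the start of the string.
        if (text.Length > 0 && text[0] == '\uFEFF')
        {
            return text.Substring(1);
        }
        return text;
    }
}
```

Note that this only handles a UTF-8 BOM; other encodings would need their own check.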
-
Antonio Leonardo about 3 years
@Nyerguds, a UTF8Encoding object constructed with "false" as its parameter is without BOM.
-
Nyerguds about 3 years
No, I mean, if the text has a BOM, even System.Text.Encoding.UTF8 will not automatically strip that off. Try it out.
-
dimitar.bogdanov about 3 years
This should be the accepted answer. It perfectly illustrates the output of multiple methods. The current accepted answer shows only one, which may be problematic for some developers who don't scroll this far down (unless you sort by votes, of course).
-
Peter Mortensen over 2 years
What do you mean by "null termination"? Null bytes in the input array? Can you define exactly what you mean in your answer? (But without "Edit:", "Update:", or similar; the answer should appear as if it was written today.)
-
Assimilater over 2 years
I don't feel the need to edit the answer. In low-level systems that use byte arrays for ASCII-encoded strings, the array itself doesn't contain information about the length of the string. The most common practice is to terminate the string with a value of 0 (aka null). Failing to do so is the cause of the famous buffer overflow exploit. As for this answer specifically, I haven't used C# in a few years, so I don't remember if it just wasn't copying the null byte or failing to stop copying until and including the null byte. But that's null termination in a nutshell.
-
Assimilater over 2 years
I think maybe it was continuing to copy past the null terminator without this code, but again, I don't remember.
-
Wai Ha Lee over 2 years
UTF8 is a static property on the Encoding class (of which ASCIIEncoding is a derived type). This code is the same as using Encoding.UTF8.GetString, which is already suggested by numerous other answers. Please don't post duplicate answers. From review
-
variable over 2 years
If I use this, it returns diamonds in the result, whereas Convert.ToBase64String works. Why is that?
-
variable over 2 years
This gives me diamonds, whereas Convert.ToBase64String works.