Why is File.ReadAllBytes result different than when using File.ReadAllText?
Solution 1
As explained in ReadAllText's documentation:
This method attempts to automatically detect the encoding of a file based on the presence of byte order marks. Encoding formats UTF-8 and UTF-32 (both big-endian and little-endian) can be detected.
So the file contains a BOM (byte order mark), and the ReadAllText method correctly interprets it, while the first method just reads the raw bytes without interpreting them at all.
Encoding.GetString says that it only:
decodes all the bytes in the specified byte array into a string
(emphasis mine). This is of course not entirely conclusive, but your example shows that it is to be taken literally.
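The difference can be reproduced without the original file. The following is a minimal sketch, assuming the file on disk consists of the UTF-8 BOM bytes EF BB BF followed by "test"; the class and method names (BomDemo, ReadBothWays) are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Writes the UTF-8 BOM (EF BB BF) followed by "test" to a temporary file,
    // then reads the file back both ways.
    public static (string Raw, string Detected) ReadBothWays()
    {
        string path = Path.GetTempFileName();
        try
        {
            byte[] fileBytes = { 0xEF, 0xBB, 0xBF, (byte)'t', (byte)'e', (byte)'s', (byte)'t' };
            File.WriteAllBytes(path, fileBytes);

            // GetString decodes every byte, so the BOM turns into the character U+FEFF.
            string raw = Encoding.UTF8.GetString(File.ReadAllBytes(path));

            // ReadAllText detects the BOM, uses it to choose the encoding, and drops it.
            string detected = File.ReadAllText(path);

            return (raw, detected);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Under that assumption, Raw has five characters and (int)Raw[0] is 65279 (U+FEFF), which the console typically renders as "?", while Detected is the four-character string "test".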
Solution 2
You are probably seeing the Unicode BOM (byte order mark) at the beginning of the file. File.ReadAllText knows how to strip this off, but Encoding.UTF8.GetString does not.
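If you need to keep working with the byte array, one option is to skip the BOM yourself before decoding. This is only a sketch under the assumption that the data is UTF-8; the helper name DecodeUtf8WithoutBom is hypothetical, not a framework method.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BomStripper
{
    // Decodes UTF-8 bytes and skips a leading BOM if one is present.
    public static string DecodeUtf8WithoutBom(byte[] bytes)
    {
        // For Encoding.UTF8 the preamble is the three bytes EF BB BF.
        byte[] preamble = Encoding.UTF8.GetPreamble();

        // Take yields fewer elements when the input is shorter than the
        // preamble, in which case SequenceEqual returns false.
        bool hasBom = bytes.Take(preamble.Length).SequenceEqual(preamble);

        int offset = hasBom ? preamble.Length : 0;
        return Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);
    }
}
```

With this helper, BomStripper.DecodeUtf8WithoutBom(File.ReadAllBytes(path)) should give the same "test" that File.ReadAllText(path) returns for the file in the question.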
Solution 3
It's the UTF-8 encoding prefix (the byte order mark). It marks the file as UTF-8 encoded. ReadAllText doesn't return it because it's a parsing instruction, not part of the text.
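To see what that prefix looks like, you can ask an Encoding for its preamble. A small sketch, with the hypothetical helper name PreambleHex:

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    // Returns the encoding's preamble (its BOM) as dash-separated hex bytes.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }
}
```

PreambleDemo.PreambleHex(Encoding.UTF8) gives "EF-BB-BF", Encoding.Unicode (UTF-16 little-endian) gives "FF-FE", and new UTF8Encoding(false), which is configured not to emit a BOM, gives an empty string.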
Dragon
Updated on July 24, 2022
Comments
-
Dragon almost 2 years
I have a text file (UTF-8 encoding) with the contents "test". I try to get the byte array from this file and convert it to a string, but the result contains one strange character. I use the following code:
var path = @"C:\Users\Tester\Desktop\test\test.txt"; // UTF-8
var bytes = File.ReadAllBytes(path);
var contents1 = Encoding.UTF8.GetString(bytes);
var contents2 = File.ReadAllText(path);
Console.WriteLine(contents1); // result is "?test"
Console.WriteLine(contents2); // result is "test"
contents1 is different from contents2 - why?
-
kpull1 over 9 years
If you check the first character, (int)contents1[0], you will see that this char is the BOM character. More info: stackoverflow.com/questions/6784799/what-is-this-char-65279
-
Thomas Weller over 2 years
All the documentation is rubbish... It will detect not only UTF-8 and UTF-32 but also UTF-16.