How can I detect if a .NET StreamReader found a UTF8 BOM on the underlying stream?
Solution 1
Rather than hardcoding the BOM bytes (EF BB BF), it is cleaner to get them from the API via Encoding.GetPreamble():
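// Requires: using System; using System.Linq; using System.Text;
public string ConvertFromUtf8(byte[] bytes)
{
    var enc = new UTF8Encoding(true);
    // The UTF-8 preamble, i.e. EF BB BF
    var preamble = enc.GetPreamble();
    // Reject input that is shorter than the BOM or does not start with it
    if (bytes.Length < preamble.Length || preamble.Where((p, i) => p != bytes[i]).Any())
        throw new ArgumentException("Not utf8-BOM");
    // GetString does not strip the BOM, so decode only the bytes after it
    return enc.GetString(bytes, preamble.Length, bytes.Length - preamble.Length);
}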
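As a sketch of the same GetPreamble comparison applied to the stream in the question, the helper below peeks at the first bytes of a seekable stream and rewinds it before a StreamReader is created. The class and method names (Utf8BomSketch, HasUtf8Bom) are hypothetical, and it assumes the stream supports seeking, as a FileStream opened on a regular file does.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper; assumes a seekable stream such as a FileStream.
public static class Utf8BomSketch
{
    public static bool HasUtf8Bom(Stream stream)
    {
        if (!stream.CanSeek)
            throw new ArgumentException("Stream must be seekable", nameof(stream));

        // EF BB BF, taken from the API instead of hardcoded
        byte[] preamble = new UTF8Encoding(true).GetPreamble();
        byte[] head = new byte[preamble.Length];

        long start = stream.Position;
        int total = 0;
        // Stream.Read may return fewer bytes than requested, so loop until full or end of stream
        while (total < head.Length)
        {
            int n = stream.Read(head, total, head.Length - total);
            if (n == 0) break;
            total += n;
        }

        // Rewind so a StreamReader created afterwards sees the whole stream, BOM included
        stream.Position = start;

        return total == preamble.Length && head.SequenceEqual(preamble);
    }
}
```

With the question's setup, one would call Utf8BomSketch.HasUtf8Bom on the FileStream first and then construct StreamReader(stream, true) as before; because the position is restored, the reader still sees and skips the BOM itself.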
Solution 2
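You can detect whether the StreamReader encountered a BOM by initializing it with a BOM-less UTF8 encoding and checking whether CurrentEncoding changes after the first read. The check has to come after a read because StreamReader only runs byte-order-mark detection when it first fills its buffer.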
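// Requires: using System; using System.IO; using System.Text;
// "file" is a path or an open Stream, e.g. the FileStream from the question.
var utf8NoBom = new UTF8Encoding(false);
using (var reader = new StreamReader(file, utf8NoBom))
{
    // Byte-order-mark detection runs on the first read, not in the constructor.
    reader.Read();
    // If a BOM was found, CurrentEncoding was replaced (e.g. by Encoding.UTF8 for EF BB BF).
    if (Equals(reader.CurrentEncoding, utf8NoBom))
    {
        Console.WriteLine("No BOM");
    }
    else
    {
        Console.WriteLine("BOM detected");
    }
}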
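The same check can be wrapped in a reusable method and exercised on in-memory data. This is a minimal sketch assuming .NET Framework 4.5+ or .NET Core (for the StreamReader overload with a leaveOpen parameter); StreamReaderBomSketch and ReaderSawBom are hypothetical names.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper wrapping the CurrentEncoding comparison.
public static class StreamReaderBomSketch
{
    // Returns true if StreamReader's byte-order-mark detection replaced the
    // BOM-less UTF-8 encoding it was given, i.e. the stream began with a BOM.
    public static bool ReaderSawBom(Stream stream)
    {
        var utf8NoBom = new UTF8Encoding(false);
        // detectEncodingFromByteOrderMarks: true; leaveOpen: true so the caller keeps the stream
        using (var reader = new StreamReader(stream, utf8NoBom, true, 1024, true))
        {
            // Detection happens when the reader first fills its buffer
            reader.Read();
            return !utf8NoBom.Equals(reader.CurrentEncoding);
        }
    }
}
```

A UTF-16 or UTF-32 BOM also changes CurrentEncoding, so a true result means "some BOM", not specifically UTF-8. To narrow it down, one could additionally check that reader.CurrentEncoding.CodePage is 65001 (UTF-8).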
Author: bookclub
Updated on June 15, 2022
Comments
-
bookclub almost 2 years
I get a FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite) and then a StreamReader(stream, true). Is there a way I can check if the stream started with a UTF8 BOM? I am noticing that files without the BOM are read as UTF8 by the StreamReader.
How can I tell them apart?
-
Cameron Taggart almost 9 years
I never would have thought that this would work. Thanks! It is really too bad that the opposite isn't true: you can't pass in UTF8Encoding(true) and have it return UTF8Encoding(false) when there is no BOM.
-
Martin over 4 years
@carlo-v-dango, I'd recommend adding some kind of length check, since bytes may be empty if the file is empty:
if (preamble.Where((p, i) => bytes.Length > i && p != bytes[i]).Any())
or whatever floats your boat.
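To illustrate the asymmetry Cameron Taggart points out, the sketch below reads one character with a given starting encoding and returns whatever CurrentEncoding ends up as. In our reading of StreamReader's behavior, starting from UTF8Encoding(true) gives a result that compares equal to the starting encoding whether or not the data has a BOM, which is why Solution 2 has to start from the BOM-less encoding. BomAsymmetrySketch and EncodingAfterFirstRead are hypothetical names.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper for comparing the reader's encoding before and after detection.
public static class BomAsymmetrySketch
{
    // Reads one char from the data using the given starting encoding
    // (with BOM detection enabled) and returns the encoding the reader ended up with.
    public static Encoding EncodingAfterFirstRead(byte[] data, Encoding initial)
    {
        using (var reader = new StreamReader(new MemoryStream(data), initial, true))
        {
            reader.Read();
            return reader.CurrentEncoding;
        }
    }
}
```

Starting from UTF8Encoding(false) instead, the UTF-8 BOM case yields an encoding that no longer compares equal to the starting one, which is the signal Solution 2 relies on.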