How to best detect encoding in XML file?
OK, I should have thought of this earlier. Both XmlTextReader (which gives us the Encoding) and XmlReader.Create (which allows us to specify the encoding) accept a Stream. So how about opening a FileStream first and then using it with both XmlTextReader and XmlReader, like this:
    using (var txtreader = new FileStream(filepath, FileMode.Open))
    {
        using (var xmlreader = new XmlTextReader(txtreader))
        {
            // Read in the encoding info
            xmlreader.MoveToContent();
            var encoding = xmlreader.Encoding;

            // Rewind to the beginning
            txtreader.Seek(0, SeekOrigin.Begin);

            var settings = new XmlReaderSettings { NameTable = new NameTable() };
            var xmlns = new XmlNamespaceManager(settings.NameTable);
            var context = new XmlParserContext(null, xmlns, "", XmlSpace.Default,
                                               encoding);

            using (var reader = XmlReader.Create(txtreader, settings, context))
            {
                return XElement.Load(reader);
            }
        }
    }
This works like a charm. Reading XML files in an encoding-independent way should have been more elegant, but at least I'm getting away with opening the file only once.
Author: Peter Lillevold
Updated on June 21, 2022
Comments
-
Peter Lillevold almost 2 years
To load XML files with arbitrary encoding I have the following code:
    Encoding encoding;
    using (var reader = new XmlTextReader(filepath))
    {
        reader.MoveToContent();
        encoding = reader.Encoding;
    }

    var settings = new XmlReaderSettings { NameTable = new NameTable() };
    var xmlns = new XmlNamespaceManager(settings.NameTable);
    var context = new XmlParserContext(null, xmlns, "", XmlSpace.Default, encoding);

    using (var reader = XmlReader.Create(filepath, settings, context))
    {
        return XElement.Load(reader);
    }
This works, but it seems a bit inefficient to open the file twice. Is there a better way to detect the encoding such that I can do:
- Open file
- Detect encoding
- Read XML into an XElement
- Close file
-
petr k. about 11 years
Would just calling the XmlReader.Create(Stream) overload work the same way in terms of detecting the encoding?
-
Peter Lillevold about 11 years
@petrk. - I'm using XmlTextReader explicitly since that's the class providing the Encoding property. Not sure what else you had in mind?
-
petr k. about 11 years
Right, let me explain. It seems that XElement.Load(XmlReader.Create(new FileStream(filepath, FileMode.Open))) should do the same thing (disposing of resources omitted for brevity). The documentation for XmlReader.Create(Stream) says: "The XmlReader scans the first bytes of the stream looking for a byte order mark or other sign of encoding. When encoding is determined, the encoding is used to continue reading the stream, and processing continues parsing the input as a stream of (Unicode) characters." I was wondering if your explicit
-
petr k. about 11 years
encoding detection is any different from what the XmlReader.Create(Stream) overload does.
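For reference, the one-liner from the comment above could be written out with proper disposal roughly as follows. This is only a sketch: the class and method names (XmlLoading, LoadWithAutoDetection) are made up for illustration, and it assumes that the detection performed by XmlReader.Create(Stream) is sufficient for the files at hand, which is exactly what this thread leaves open.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class XmlLoading
{
    // Hypothetical helper: opens the file once and lets
    // XmlReader.Create(Stream) work out the encoding from the byte order
    // mark or the XML declaration, without an explicit XmlParserContext.
    public static XElement LoadWithAutoDetection(string filepath)
    {
        using (var stream = new FileStream(filepath, FileMode.Open, FileAccess.Read))
        using (var reader = XmlReader.Create(stream))
        {
            return XElement.Load(reader);
        }
    }
}
```

If this turns out to handle all the files in question, the XmlTextReader pass and the rewind become unnecessary; if it does not, the explicit XmlParserContext approach remains the fallback.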
-
Peter Lillevold about 11 years
@petrk. interesting... I'm sure I had a situation back then where XmlReader alone didn't work and I had to specify the encoding explicitly via the parser context to make it work. I should have recorded more of my scenario here because now I cannot remember all the details :)
-
petr k. about 11 years
I am in the exact same situation, also having something similar to your sample in my codebase. I remember trying a lot of things before getting to that solution, but now it seems I could have just used the most straightforward way instead. Not sure if there's a risk of breaking anything, since I have a lot of code depending on this.
-
Peter Lillevold about 11 years
@petrk. - the only way to be sure is to build some test cases with files in various encodings.
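A minimal sketch of such test cases might look like the following. It is not from the original thread: the names EncodingRoundTrip, RoundTrip and AllRoundTrip are hypothetical, it uses a MemoryStream instead of real files, and it only covers documents written by XmlWriter (which emits an XML declaration naming the encoding). Files from other sources, for example ones with a missing or wrong declaration, would need their own cases.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class EncodingRoundTrip
{
    // Writes <root>text</root> in the given encoding, then reads it back
    // through XmlReader.Create(Stream) with no explicit encoding supplied.
    public static string RoundTrip(Encoding encoding, string text)
    {
        using (var stream = new MemoryStream())
        {
            var settings = new XmlWriterSettings { Encoding = encoding };
            using (var writer = XmlWriter.Create(stream, settings))
            {
                new XElement("root", text).Save(writer);
            }

            // Rewind so the reader starts at the first byte.
            stream.Position = 0;

            using (var reader = XmlReader.Create(stream))
            {
                return XElement.Load(reader).Value;
            }
        }
    }

    // Returns true if the sample text survives a round trip in every
    // encoding listed here; extend the list with whatever the real files use.
    public static bool AllRoundTrip()
    {
        const string sample = "\u00e6\u00f8\u00e5"; // non-ASCII sample text
        var encodings = new[]
        {
            Encoding.UTF8,
            Encoding.Unicode,
            Encoding.BigEndianUnicode,
            Encoding.GetEncoding("iso-8859-1"),
        };

        foreach (var encoding in encodings)
        {
            if (RoundTrip(encoding, sample) != sample)
            {
                return false;
            }
        }
        return true;
    }
}
```

Swapping the MemoryStream for actual sample files from the codebase would give a more faithful check of the scenario discussed above.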