XDocument: saving XML to file without BOM


Solution 1

Use an XmlTextWriter and pass it to the XDocument's Save() method; that way you have more control over the encoding that is used:

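using System.Text;
using System.Xml;
using System.Xml.Linq;

var doc = new XDocument(
    new XDeclaration("1.0", "utf-8", null),
    new XElement("root", new XAttribute("note", "boogers"))
);
// UTF8Encoding(false): UTF-8 without the byte order mark
using (var writer = new XmlTextWriter(".\\boogers.xml", new UTF8Encoding(false)))
{
    doc.Save(writer);
}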

The UTF8Encoding constructor has an overload that takes a Boolean (encoderShouldEmitUTF8Identifier) specifying whether to emit the BOM (byte order mark); in your case, pass false.
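To see what that Boolean actually changes, the sketch below compares the preamble (the bytes a writer emits before any text) of three UTF-8 Encoding instances. It assumes a .NET 5+ console project using top-level statements; the variable names are illustrative only.

```csharp
using System;
using System.Text;

// UTF8Encoding(false): encoderShouldEmitUTF8Identifier = false, so the preamble is empty.
byte[] noBom = new UTF8Encoding(false).GetPreamble();
// UTF8Encoding(true) and the shared Encoding.UTF8 instance both use EF BB BF as preamble.
byte[] withBom = new UTF8Encoding(true).GetPreamble();
byte[] defaultUtf8 = Encoding.UTF8.GetPreamble();

Console.WriteLine($"UTF8Encoding(false): {BitConverter.ToString(noBom)}");       // (empty)
Console.WriteLine($"UTF8Encoding(true):  {BitConverter.ToString(withBom)}");     // EF-BB-BF
Console.WriteLine($"Encoding.UTF8:       {BitConverter.ToString(defaultUtf8)}"); // EF-BB-BF
```

The takeaway is that "UTF-8" alone does not decide whether a BOM is written; the specific Encoding instance handed to the writer does.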

The result of this code was verified using Notepad++ to inspect the file's encoding.
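If Notepad++ is not at hand, a minimal programmatic check is to read the first three bytes of the saved file and compare them with the UTF-8 BOM (EF BB BF). This sketch assumes .NET 5+ top-level statements; HasUtf8Bom is a hypothetical helper, and the file is written to the temp folder rather than the current directory.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: true if the file starts with the UTF-8 BOM (EF BB BF).
static bool HasUtf8Bom(string path)
{
    using var stream = File.OpenRead(path);
    var head = new byte[3];
    int read = stream.Read(head, 0, 3);
    return read == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
}

string path = Path.Combine(Path.GetTempPath(), "boogers.xml"); // assumed location
var doc = new XDocument(
    new XDeclaration("1.0", "utf-8", null),
    new XElement("root", new XAttribute("note", "boogers")));

// Same approach as above: XmlTextWriter with a BOM-less UTF-8 encoding.
using (var writer = new XmlTextWriter(path, new UTF8Encoding(false)))
{
    doc.Save(writer);
}

bool hasBom = HasUtf8Bom(path);
Console.WriteLine(hasBom ? "BOM present" : "no BOM");
```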

Solution 2

First of all: according to the XML specification, the service provider MUST handle it, because the spec states that a BOM may be present in UTF-8 encoded XML.
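As a quick illustration of that requirement, this sketch gives .NET's own parser a UTF-8 document that starts with a BOM. XDocument.Load(Stream) detects the encoding from the BOM and does not treat it as content. It assumes .NET 5+ top-level statements, and it says nothing about how the online service's parser behaves.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

// Build an in-memory UTF-8 document that does start with a BOM.
byte[] bom = new UTF8Encoding(true).GetPreamble(); // EF BB BF
byte[] body = new UTF8Encoding(false).GetBytes("<root note=\"boogers\"/>");
byte[] bytes = new byte[bom.Length + body.Length];
bom.CopyTo(bytes, 0);
body.CopyTo(bytes, bom.Length);

// A conforming parser treats the BOM as an encoding signature, not as markup.
XDocument parsed;
using (var stream = new MemoryStream(bytes))
{
    parsed = XDocument.Load(stream);
}
Console.WriteLine(parsed.Root?.Attribute("note")?.Value); // boogers
```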

You can force the XML to be saved without a BOM like this:

using System.Text;
using System.Xml;

// doc is the XDocument you want to save
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = new UTF8Encoding(false); // false means: do not emit the BOM.
using (XmlWriter w = XmlWriter.Create("my.xml", settings))
{
    doc.Save(w);
}

(Found via Google, from here: http://social.msdn.microsoft.com/Forums/en/xmlandnetfx/thread/ccc08c65-01d7-43c6-adf3-1fc70fdb026a)
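Several comments below point out that this approach writes everything on one line. Here is a sketch of the same settings with Indent = true added, writing to a MemoryStream so the result can be inspected in place. The in-memory target is an assumption for illustration; pass a file name to XmlWriter.Create, as above, to write to disk. It also assumes .NET 5+ top-level statements.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

var doc = new XDocument(
    new XDeclaration("1.0", "utf-8", null),
    new XElement("root", new XAttribute("note", "boogers")));

var settings = new XmlWriterSettings
{
    Encoding = new UTF8Encoding(false), // false: do not emit the BOM
    Indent = true                       // keep the output human-readable
};

byte[] output;
using (var buffer = new MemoryStream())
{
    using (XmlWriter w = XmlWriter.Create(buffer, settings))
    {
        doc.Save(w);
    } // disposing the writer flushes it into the buffer (CloseOutput is false by default)
    output = buffer.ToArray();
}

Console.WriteLine(Encoding.UTF8.GetString(output));
```

Because the first byte written is the `<` of the XML declaration, a parser that chokes on a leading BOM should accept this output.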

Author: systempuntoout

I'm a software architect, living and working in Italy. My Google App Engine project: - StackPrinter

Updated on December 03, 2020

Comments

  • systempuntoout
    systempuntoout over 3 years

    I'm generating a UTF-8 XML file using XDocument.

    XDocument xml_document = new XDocument(
        new XDeclaration("1.0", "utf-8", null),
        new XElement(ROOT_NAME,
            new XAttribute("note", note)
        )
    );
    ...
    xml_document.Save(@file_path);
    

    The file is generated correctly and validates successfully against an XSD file.

    When I try to upload the XML file to an online service, the service reports that my file is invalid at line 1; I discovered that the problem is caused by the BOM in the first bytes of the file.

    Do you know why the BOM is written at the start of the file, and how I can save the file without it?

    As stated in the Wikipedia article on the byte order mark:

    While Unicode standard allows BOM in UTF-8 it does not require or recommend it. Byte order has no meaning in UTF-8 so a BOM only serves to identify a text stream or file as UTF-8 or that it was converted from another format that has a BOM

    Is this an XDocument problem, or should I contact the online service provider and ask them to upgrade their parser?

  • systempuntoout
    systempuntoout about 13 years
    "BOM may be present in case of UTF-8 representation": can you point me to that specific document?
  • Dercsár
    Dercsár about 13 years
    Here you go: w3.org/TR/2006/REC-xml-20060816/#charencoding, first paragraph: "All XML processors MUST be able to read entities in both the UTF-8 and UTF-16 encodings." UTF-8 permits (though does not require) a BOM (see Joe's comment below), so XML processors must be able to process UTF-8 files with a BOM.
  • Quick Joe Smith
    Quick Joe Smith about 13 years
    "While Unicode standard allows BOM in UTF-8, it does not require or recommend it. Byte order has no meaning in UTF-8" - en.wikipedia.org/wiki/Byte_order_mark
  • systempuntoout
    systempuntoout about 13 years
    When you open it with Notepad++, is it still UTF-8 even when using new UTF8Encoding(false)?
  • Quick Joe Smith
    Quick Joe Smith about 13 years
    I thought you wanted it in UTF-8, just without the BOM?
  • systempuntoout
    systempuntoout about 13 years
    Yep, that's correct. I was just asking whether new UTF8Encoding(false) could have any other implications.
  • Quick Joe Smith
    Quick Joe Smith about 13 years
    Nope, the boolean value passed to the UTF8Encoding's constructor simply controls whether it includes a BOM. true to include, false to omit.
  • littlebroccoli
    littlebroccoli about 10 years
    Consider adding writer.Formatting = Formatting.Indented;
  • Quick Joe Smith
    Quick Joe Smith about 10 years
    Kevin, that would depend entirely on whether the file is intended to be read by humans; otherwise it's just wasted bytes. The question did not provide enough detail to presume either way.
  • Stéphane Gourichon
    Stéphane Gourichon almost 8 years
    Warning: Dercsár's solution is better. "Starting with the .NET Framework 2.0, we recommend that you create XmlWriter instances by using the XmlWriter.Create method and the XmlWriterSettings class to take advantage of new functionality.". Source: XmlTextWriter Constructor (String, Encoding) (System.Xml)
  • Stéphane Gourichon
    Stéphane Gourichon almost 8 years
    Warning: doing this instead of just doc.Save(filename) has a side-effect: everything is written on one line. If you'd like your file to remain human-readable, consider adding settings.Indent = true; in this answer's code.
  • SvenL
    SvenL almost 6 years
    Just use XmlWriter.Create with XmlWriterSettings.Indent = true; that lets you format the output however you see fit.
  • Gert Arnold
    Gert Arnold over 2 years
    This has already been answered multiple times, and it's also very unclear why you're using a StringBuilder here when the document can save itself, as shown in other answers. Please explain.
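On the question of why the BOM appears at all: a plausible explanation, which should be checked against your .NET version rather than taken as documented XDocument behavior, is that XDocument.Save(string) ends up writing with a UTF-8 Encoding instance that emits a BOM. The two likely sources of such an instance can be inspected directly: the default XmlWriterSettings.Encoding, which is documented as Encoding.UTF8, and the encoding returned when "utf-8" (as spelled in the XDeclaration) is looked up by name. This sketch assumes .NET 5+ top-level statements.

```csharp
using System;
using System.Text;
using System.Xml;

// XmlWriterSettings starts out with Encoding.UTF8, whose preamble is the 3-byte BOM.
var defaults = new XmlWriterSettings();
int defaultPreamble = defaults.Encoding.GetPreamble().Length;

// Looking up "utf-8" by name also yields an encoding that emits a BOM.
int byNamePreamble = Encoding.GetEncoding("utf-8").GetPreamble().Length;

Console.WriteLine($"default settings: {defaultPreamble} BOM bytes, by name: {byNamePreamble} BOM bytes");
```

Either way, the remedy is the same as in both answers: supply new UTF8Encoding(false) explicitly instead of relying on a default.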