Force XDocument to write to String with UTF-8 encoding


Solution 1

Try this:

using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

class Test
{
    static void Main()
    {
        XDocument doc = XDocument.Load("test.xml",
                                       LoadOptions.PreserveWhitespace);
        doc.Declaration = new XDeclaration("1.0", "utf-8", null);
        StringWriter writer = new Utf8StringWriter();
        doc.Save(writer, SaveOptions.None);
        Console.WriteLine(writer);
    }

    private class Utf8StringWriter : StringWriter
    {
        public override Encoding Encoding { get { return Encoding.UTF8; } }
    }
}

Of course, you haven't shown us how you're building the document, which makes it hard to test... I've just tried with a hand-constructed XDocument and that contains the relevant whitespace too.
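For anyone who wants to try this without a test.xml file, here is a minimal sketch using a hand-constructed document. The class name Utf8XmlDemo and the helper ToUtf8String are made up for illustration. The element names are borrowed from the question.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class Utf8XmlDemo
{
    // Same trick as above: a StringWriter that advertises UTF-8 instead of
    // its default UTF-16, so the XML declaration says encoding="utf-8".
    private sealed class Utf8StringWriter : StringWriter
    {
        public override Encoding Encoding => Encoding.UTF8;
    }

    // Hypothetical helper: serializes the document, declaration included.
    public static string ToUtf8String(XDocument doc)
    {
        using (var writer = new Utf8StringWriter())
        {
            // SaveOptions.None lets the writer indent the output.
            doc.Save(writer, SaveOptions.None);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        var doc = new XDocument(
            new XDeclaration("1.0", "utf-8", "yes"),
            new XElement("root", new XElement("node", "blah")));

        Console.WriteLine(ToUtf8String(doc));
    }
}
```

Note that the resulting string is still a normal .NET string in memory. Only the declaration changes, so encode it as UTF-8 when you eventually write it to a file or stream.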

Solution 2

Try XmlWriterSettings:

XmlWriterSettings xws = new XmlWriterSettings();
xws.OmitXmlDeclaration = false;
xws.Indent = true;

Then pass it to the writer like this:

using (XmlWriter xw = XmlWriter.Create(sb, xws))
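One caveat with the snippet above: assuming sb is a StringBuilder, the declaration will report utf-16, because the writer takes its encoding from the underlying string writer rather than from the settings. A rough sketch that keeps the XmlWriterSettings approach but combines it with the Utf8StringWriter from Solution 1 might look like this (IndentedUtf8Demo and Serialize are illustrative names, not part of any library):

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class IndentedUtf8Demo
{
    // StringWriter that advertises UTF-8 so the declaration says utf-8.
    private sealed class Utf8StringWriter : StringWriter
    {
        public override Encoding Encoding => Encoding.UTF8;
    }

    // Hypothetical helper: writes the document with a declaration and indentation.
    public static string Serialize(XDocument doc)
    {
        var xws = new XmlWriterSettings
        {
            OmitXmlDeclaration = false,
            Indent = true
        };

        using (var sw = new Utf8StringWriter())
        {
            using (XmlWriter xw = XmlWriter.Create(sw, xws))
            {
                doc.Save(xw);
            } // disposing the XmlWriter flushes everything into sw

            return sw.ToString();
        }
    }

    public static void Main()
    {
        var doc = new XDocument(
            new XDeclaration("1.0", "utf-8", "yes"),
            new XElement("root", new XElement("node", "blah")));

        Console.WriteLine(Serialize(doc));
    }
}
```

The settings object is mainly useful if you want to control things like Indent or IndentChars yourself. Otherwise doc.Save(writer) on its own already indents.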
Author by

Chris

Updated on September 05, 2020

Comments

  • Chris
    Chris over 3 years

    I want to be able to write XML to a String with the declaration and with UTF-8 encoding. This seems mighty tricky to accomplish.

    I have read around a bit and tried some of the popular answers for this, but they all have issues. My current code correctly outputs as UTF-8 but does not maintain the original formatting of the XDocument (i.e. indents / whitespace)!

    Can anyone offer some advice please?

    XDocument xml = new XDocument(new XDeclaration("1.0", "utf-8", "yes"), xelementXML);
    
    MemoryStream ms = new MemoryStream();
    using (XmlWriter xw = new XmlTextWriter(ms, Encoding.UTF8))
    {
        xml.Save(xw);
        xw.Flush();
    
        StreamReader sr = new StreamReader(ms);
        ms.Seek(0, SeekOrigin.Begin);
    
        String xmlString = sr.ReadToEnd();
    }
    

    The XML requires the formatting to be identical to the way .ToString() would format it, i.e.

    <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <root>
        <node>blah</node>
    </root>
    

    What I'm currently seeing is

    <?xml version="1.0" encoding="utf-8" standalone="yes"?><root><node>blah</node></root>
    

    Update: I have managed to get this to work by adding XmlWriterSettings... It seems VERY clunky though!

    MemoryStream ms = new MemoryStream();
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Encoding = Encoding.UTF8;
    settings.ConformanceLevel = ConformanceLevel.Document;
    settings.Indent = true;
    using (XmlWriter xw = XmlWriter.Create(ms, settings))
    {
        xml.Save(xw);
        xw.Flush();
    
        StreamReader sr = new StreamReader(ms);
        ms.Seek(0, SeekOrigin.Begin);
        String blah = sr.ReadToEnd();
    }
    
  • Chris
    Chris over 13 years
    Works a treat, thanks - is there no way to get the encoding sorted without inheriting from StringWriter?
  • Jon Skeet
    Jon Skeet over 13 years
    @Chris: It's possible that there is some way of getting the TextWriter overload to ignore the encoding that the TextWriter advertises, but I've found this to be a really simple hack to get the job done. (You only need it in one place...)
  • Chris
    Chris over 13 years
    Yeah I like it - it's FAR better than the method I came up with. Thanks
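On the question of avoiding the StringWriter subclass: one option, sketched here and essentially a tidied version of the update in the question, is to write to a MemoryStream with the encoding set in XmlWriterSettings and then decode the bytes. The names StreamUtf8Demo and Serialize are made up for this example, and using UTF8Encoding(false) to skip the byte order mark is an assumption about what you want in the resulting string.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class StreamUtf8Demo
{
    // Hypothetical helper: serializes via a byte stream, so no custom
    // StringWriter is needed to get encoding="utf-8" in the declaration.
    public static string Serialize(XDocument doc)
    {
        var settings = new XmlWriterSettings
        {
            // false = do not emit a byte order mark at the start of the stream
            Encoding = new UTF8Encoding(false),
            Indent = true
        };

        using (var ms = new MemoryStream())
        {
            using (XmlWriter xw = XmlWriter.Create(ms, settings))
            {
                doc.Save(xw);
            } // disposing the XmlWriter flushes everything into ms

            // Decode the UTF-8 bytes back into a .NET string.
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    public static void Main()
    {
        var doc = new XDocument(
            new XDeclaration("1.0", "utf-8", "yes"),
            new XElement("root", new XElement("node", "blah")));

        Console.WriteLine(Serialize(doc));
    }
}
```

This costs an extra encode/decode round trip compared with the Utf8StringWriter hack, which is probably why the subclass is the more popular choice.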