Streaming XML serialization in .NET


Solution 1

The XmlWriter class is a fast streaming API for XML generation. It is rather low-level; MSDN has an article on instantiating a validating XmlWriter using XmlWriter.Create().

Here is sample code from the article:

async Task TestWriter(Stream stream) 
{
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Async = true;

    using (XmlWriter writer = XmlWriter.Create(stream, settings)) {
        await writer.WriteStartElementAsync("pf", "root", "http://ns");
        await writer.WriteStartElementAsync(null, "sub", null);
        await writer.WriteAttributeStringAsync(null, "att", null, "val");
        await writer.WriteStringAsync("text");
        await writer.WriteEndElementAsync();
        await writer.WriteCommentAsync("cValue");
        await writer.WriteCDataAsync("cdata value");
        await writer.WriteEndElementAsync();
        await writer.FlushAsync();
    }
}
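
For the scenario in the question (a lazy sequence that should not be buffered), the same writer can be combined with XmlSerializer: write the root element by hand, then serialize each item into the already open writer. What follows is a minimal sketch and not code from the MSDN article. `MyObject` and the `MyObjects` root name come from the question; the `Id` property, `StreamingExample`, and `WriteAll` are hypothetical names made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical item type standing in for the question's MyObject.
public class MyObject
{
    public int Id { get; set; }
}

public static class StreamingExample
{
    // Writes <MyObjects>...</MyObjects>, pulling one item at a time from the sequence.
    public static void WriteAll(Stream output, IEnumerable<MyObject> items)
    {
        var serializer = new XmlSerializer(typeof(MyObject));

        // An empty prefix/namespace pair keeps xmlns:xsi and xmlns:xsd off each item element.
        var noNamespaces = new XmlSerializerNamespaces();
        noNamespaces.Add("", "");

        var settings = new XmlWriterSettings { Indent = true };
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("MyObjects");

            foreach (MyObject item in items)
            {
                // The writer is already inside the root element, so the serializer
                // emits only the item element and no second XML declaration.
                serializer.Serialize(writer, item, noNamespaces);
            }

            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
    }
}
```

As long as `items` is evaluated lazily, only the item currently being serialized needs to be alive; everything already written has gone to the output stream.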

Solution 2

Here's what I use:

using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Serialization;
using System.Text;
using System.IO;

namespace Utils
{
    public class XMLSerializer
    {
        public static Byte[] StringToUTF8ByteArray(String xmlString)
        {
            return new UTF8Encoding().GetBytes(xmlString);
        }

        public static String SerializeToXML<T>(T objectToSerialize)
        {
            StringBuilder sb = new StringBuilder();

            // A StringBuilder target always produces UTF-16 text, so an Encoding
            // setting would be ignored here; only Indent has an effect.
            XmlWriterSettings settings = new XmlWriterSettings {Indent = true};

            using (XmlWriter xmlWriter = XmlWriter.Create(sb, settings))
            {
                new XmlSerializer(typeof(T)).Serialize(xmlWriter, objectToSerialize);
            }

            return sb.ToString();
        }

        public static void DeserializeFromXML<T>(string xmlString, out T deserializedObject) where T : class
        {
            XmlSerializer xs = new XmlSerializer(typeof (T));

            // Read the string directly: its XML declaration says utf-16, so
            // re-encoding it as UTF-8 bytes can make the reader fail.
            using (StringReader stringReader = new StringReader(xmlString))
            {
                deserializedObject = xs.Deserialize(stringReader) as T;
            }
        }
    }
}

Then just call:

string xml = Utils.XMLSerializer.SerializeToXML(myObjectsIEnumerable);

I haven't tried it with, for example, an IEnumerable that fetches objects one at a time remotely, or any other weird use cases, but it works perfectly for List<T> and other collections that are in memory.

EDIT: Based on your comments in response to this, you could use XmlDocument.LoadXml to load the resulting XML string into an XmlDocument, save the first one to a file, and use that as your master XML file. For each item in the IEnumerable, use LoadXml again to create a new in-memory XmlDocument, grab the nodes you want, append them to the master document, and save it again, getting rid of the new one.

After you're finished, there may be a way to wrap all of the nodes in your root tag. You could also use XSL and XslCompiledTransform to write another XML file with the objects properly wrapped in the root tag.
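
The merge described above can be sketched roughly as follows. This variation creates the root element up front instead of wrapping the nodes afterwards, and it returns the master document rather than saving it after every item. `MasterDocumentExample`, `Merge`, and the `MyObjects` root name are hypothetical; the input strings are assumed to be per-item XML such as the output of SerializeToXML.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class MasterDocumentExample
{
    // Hypothetical helper: merges per-item XML strings under a single root element.
    public static XmlDocument Merge(IEnumerable<string> itemXmlStrings)
    {
        var master = new XmlDocument();
        master.AppendChild(master.CreateElement("MyObjects"));

        foreach (string itemXml in itemXmlStrings)
        {
            // Temporary document for one item; it becomes garbage after this iteration.
            var itemDoc = new XmlDocument();
            itemDoc.LoadXml(itemXml);

            // A node belongs to the document that created it, so it has to be
            // imported into the master document before it can be appended.
            XmlNode imported = master.ImportNode(itemDoc.DocumentElement, true);
            master.DocumentElement.AppendChild(imported);
        }

        return master;
    }
}
```

Calling `Merge(...).Save(path)` writes the combined file. Note that the master document keeps every appended node in memory, so this approach does not stream.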

Solution 3

You can do this by implementing the IXmlSerializable interface on the large class. The implementation of the WriteXml method can write the start tag, then simply loop over the IEnumerable<MyObject> and serialize each MyObject to the same XmlWriter, one at a time.

In this implementation, there won't be any in-memory data to get rid of (past what the garbage collector will collect).
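
A minimal sketch of that idea, under some assumptions: `MyObjectStream` is a hypothetical wrapper class, `MyObject` and its `Name` property stand in for the real item type, and deserialization is left unimplemented. XmlSerializer writes the enclosing root element for the wrapper itself, so WriteXml only has to write the items.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical item type standing in for the question's MyObject.
public class MyObject
{
    public string Name { get; set; }
}

[XmlRoot("MyObjects")]
public class MyObjectStream : IXmlSerializable
{
    private readonly IEnumerable<MyObject> _items;

    // XmlSerializer requires a parameterless constructor.
    public MyObjectStream() : this(new MyObject[0]) { }

    public MyObjectStream(IEnumerable<MyObject> items)
    {
        _items = items;
    }

    public XmlSchema GetSchema()
    {
        return null;
    }

    public void ReadXml(XmlReader reader)
    {
        // Reading is out of scope for this write-only sketch.
        throw new NotSupportedException("This sketch only supports writing.");
    }

    public void WriteXml(XmlWriter writer)
    {
        // The root start tag has already been written by the outer serializer;
        // each item is serialized into the same writer, one at a time.
        var itemSerializer = new XmlSerializer(typeof(MyObject));
        foreach (MyObject item in _items)
        {
            itemSerializer.Serialize(writer, item);
        }
    }
}
```

To use it, pass the wrapper to an ordinary serializer, for example `new XmlSerializer(typeof(MyObjectStream)).Serialize(stream, new MyObjectStream(myObjectsIEnumerable))`, where `stream` is any writable Stream.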

Author: Luca Martinetti

Monkey at Playhaven

Updated on June 04, 2022

Comments

  • Luca Martinetti almost 2 years

    I'm trying to serialize a very large IEnumerable<MyObject> using an XmlSerializer without keeping all the objects in memory.

    The IEnumerable<MyObject> is actually lazy.

    I'm looking for a streaming solution that will:

    1. Take an object from the IEnumerable<MyObject> and serialize it to the underlying stream using the standard serialization (I don't want to handcraft the XML here!)
    2. Discard the in-memory data and move on to the next object

    I'm trying with this code:

    using (var writer = new StreamWriter(filePath))
    {
        var xmlSerializer = new XmlSerializer(typeof(MyObject));
        foreach (var myObject in myObjectsIEnumerable)
        {
            xmlSerializer.Serialize(writer, myObject);
        }
    }
    

    but I'm getting multiple XML headers and I cannot specify a root tag <MyObjects> so my XML is invalid.

    Any ideas?

    Thanks