Streaming XML serialization in .NET
Solution 1
The XmlWriter class is a fast streaming API for XML generation. It is rather low-level; MSDN has an article on instantiating a validating XmlWriter using XmlWriter.Create().
Edit: link fixed. Here is sample code from the article:
async Task TestWriter(Stream stream)
{
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Async = true;
    using (XmlWriter writer = XmlWriter.Create(stream, settings)) {
        await writer.WriteStartElementAsync("pf", "root", "http://ns");
        await writer.WriteStartElementAsync(null, "sub", null);
        await writer.WriteAttributeStringAsync(null, "att", null, "val");
        await writer.WriteStringAsync("text");
        await writer.WriteEndElementAsync();
        await writer.WriteCommentAsync("cValue");
        await writer.WriteCDataAsync("cdata value");
        await writer.WriteEndElementAsync();
        await writer.FlushAsync();
    }
}
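To connect this to the question's scenario, the same XmlWriter can carry both a hand-written root element and the output of an XmlSerializer for each item. The following is a minimal sketch, not taken from the MSDN article: MyObject, its properties, and the StreamingExample class are hypothetical names, and the empty XmlSerializerNamespaces is an assumption made only to keep the xsi/xsd namespace declarations off every item.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical item type standing in for the asker's MyObject.
public class MyObject
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class StreamingExample
{
    // Writes <MyObjects> ... </MyObjects>, pulling one item at a time
    // from the (possibly lazy) sequence.
    public static void WriteAll(Stream stream, IEnumerable<MyObject> items)
    {
        var serializer = new XmlSerializer(typeof(MyObject));

        // Suppresses the default xmlns:xsi / xmlns:xsd attributes on each item.
        var noNamespaces = new XmlSerializerNamespaces();
        noNamespaces.Add("", "");

        var settings = new XmlWriterSettings { Indent = true };
        using (XmlWriter writer = XmlWriter.Create(stream, settings))
        {
            writer.WriteStartDocument();            // single XML declaration
            writer.WriteStartElement("MyObjects");  // single root tag

            foreach (MyObject item in items)
            {
                // The writer is already inside the root element, so the
                // serializer emits only the item's element here.
                serializer.Serialize(writer, item, noNamespaces);
            }

            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
    }
}
```

Because each item is written as soon as it is pulled from the sequence, nothing beyond the current item needs to stay referenced by this method.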
Solution 2
Here's what I use:
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Serialization;
using System.Text;
using System.IO;

namespace Utils
{
    public class XMLSerializer
    {
        public static Byte[] StringToUTF8ByteArray(String xmlString)
        {
            return new UTF8Encoding().GetBytes(xmlString);
        }

        public static String SerializeToXML<T>(T objectToSerialize)
        {
            StringBuilder sb = new StringBuilder();
            // Note: output written to a StringBuilder is UTF-16, so the Encoding
            // setting is ignored here and the XML declaration says "utf-16".
            XmlWriterSettings settings =
                new XmlWriterSettings {Encoding = Encoding.UTF8, Indent = true};
            using (XmlWriter xmlWriter = XmlWriter.Create(sb, settings))
            {
                new XmlSerializer(typeof(T)).Serialize(xmlWriter, objectToSerialize);
            }
            return sb.ToString();
        }

        public static void DeserializeFromXML<T>(string xmlString, out T deserializedObject) where T : class
        {
            XmlSerializer xs = new XmlSerializer(typeof (T));
            // Read the string directly. Re-encoding it as UTF-8 bytes would conflict
            // with the "utf-16" declaration produced by SerializeToXML.
            using (StringReader stringReader = new StringReader(xmlString))
            {
                deserializedObject = xs.Deserialize(stringReader) as T;
            }
        }
    }
}
Then just call:
string xml = Utils.XMLSerializer.SerializeToXML(myObjectsIEnumerable);
I haven't tried it with, for example, an IEnumerable that fetches objects one at a time remotely, or any other weird use cases, but it works perfectly for List<T> and other collections that are in memory.
EDIT: Based on your comments in response to this, you could use XmlDocument.LoadXml to load the resulting XML string into an XmlDocument, save the first one to a file, and use that as your master XML file. For each item in the IEnumerable, use LoadXml again to create a new in-memory XmlDocument, grab the nodes you want, append them to the master document (importing them with ImportNode first, since a node cannot be appended directly to a different document), and save it again, getting rid of the new one.
After you're finished, there may be a way to wrap all of the nodes in your root tag. You could also use XSL and XslCompiledTransform to write another XML file with the objects properly wrapped in the root tag.
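The append step could look roughly like the sketch below. It is an illustration under assumptions, not the exact procedure described above: the MasterDocumentExample class and AppendItem method are hypothetical names, and the master document is assumed to already have a root element (for example an empty MyObjects element) rather than being wrapped afterwards.

```csharp
using System.Xml;

public static class MasterDocumentExample
{
    // Parses one serialized item and appends its root element
    // to the root element of the master document.
    public static void AppendItem(XmlDocument master, string itemXml)
    {
        var itemDocument = new XmlDocument();
        itemDocument.LoadXml(itemXml);

        // A node belongs to the document that created it, so it has to be
        // imported (deep copy) before it can be appended to another document.
        XmlNode imported = master.ImportNode(itemDocument.DocumentElement, true);
        master.DocumentElement.AppendChild(imported);
    }
}
```

A caller would load or create the master once, call AppendItem for each serialized item, and then call master.Save with a file path. Note that the master XmlDocument itself stays fully in memory, so this approach trades the streaming goal for simplicity.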
Solution 3
You can do this by implementing the IXmlSerializable interface on the large class. The implementation of the WriteXml method can write the start tag, then simply loop over the IEnumerable<MyObject> and serialize each MyObject to the same XmlWriter, one at a time.
In this implementation, there won't be any in-memory data to get rid of (past what the garbage collector will collect).
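A minimal sketch of that idea follows, under stated assumptions: MyObjectStream is a hypothetical wrapper class, MyObject and its properties are placeholders, and reading back is deliberately left unsupported. In this sketch the wrapper is itself the root, so XmlSerializer writes the MyObjects start and end tags and WriteXml only has to emit the items.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical item type standing in for the asker's MyObject.
public class MyObject
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical wrapper around a (possibly lazy) sequence of items.
[XmlRoot("MyObjects")]
public class MyObjectStream : IXmlSerializable
{
    private static readonly XmlSerializer ItemSerializer =
        new XmlSerializer(typeof(MyObject));

    private readonly IEnumerable<MyObject> _items;

    // XmlSerializer requires a public parameterless constructor.
    public MyObjectStream() : this(new List<MyObject>()) { }

    public MyObjectStream(IEnumerable<MyObject> items)
    {
        _items = items;
    }

    public XmlSchema GetSchema()
    {
        return null;
    }

    public void WriteXml(XmlWriter writer)
    {
        // Keeps xmlns:xsi / xmlns:xsd declarations off each item element.
        var noNamespaces = new XmlSerializerNamespaces();
        noNamespaces.Add("", "");

        // Items are pulled and written one at a time to the same writer.
        foreach (MyObject item in _items)
        {
            ItemSerializer.Serialize(writer, item, noNamespaces);
        }
    }

    public void ReadXml(XmlReader reader)
    {
        // Out of scope for this sketch, which only covers writing.
        throw new NotSupportedException("This sketch only supports serialization.");
    }
}
```

Serializing is then an ordinary call such as new XmlSerializer(typeof(MyObjectStream)).Serialize(stream, new MyObjectStream(myObjectsIEnumerable)), which produces one XML declaration and one root element.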
Comments
- Luca Martinetti almost 2 years
I'm trying to serialize a very large IEnumerable<MyObject> using an XmlSerializer without keeping all the objects in memory. The IEnumerable<MyObject> is actually lazy. I'm looking for a streaming solution that will:
- Take an object from the IEnumerable<MyObject>
- Serialize it to the underlying stream using the standard serialization (I don't want to handcraft the XML here!)
- Discard the in-memory data and move to the next
I'm trying with this code:
using (var writer = new StreamWriter(filePath))
{
    var xmlSerializer = new XmlSerializer(typeof(MyObject));
    foreach (var myObject in myObjectsIEnumerable)
    {
        xmlSerializer.Serialize(writer, myObject);
    }
}
but I'm getting multiple XML headers and I cannot specify a root tag <MyObjects>, so my XML is invalid. Any idea?
Thanks