Convert XDocument to byte array (and byte array to XDocument)


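That sounds like the buffer of one of the streams or writers was not flushed during the read or write. Wrap them in using (...) blocks so they are flushed and disposed automatically, and check that every place where you finish reading or writing calls .Flush(). For an XmlWriter over a MemoryStream, the writer must be flushed (or disposed) before you call memoryStream.ToArray().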
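To illustrate the flushing problem, here is a minimal sketch. The class and method names FlushDemo, WriteWithoutFlush and WriteWithFlush are hypothetical, made up for this example. XmlWriter keeps its output in an internal buffer, so reading memoryStream.ToArray() before the writer is flushed or disposed can return only part of the document:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class, for illustration only.
public static class FlushDemo
{
    // Mirrors the truncated version: the XmlWriter is never flushed or disposed,
    // so anything still sitting in its internal buffer is missing from the result.
    public static byte[] WriteWithoutFlush(XDocument doc)
    {
        var memoryStream = new MemoryStream();
        var xmlWriter = XmlWriter.Create(memoryStream);
        doc.WriteTo(xmlWriter);
        return memoryStream.ToArray();
    }

    // Same write, but disposing the writer (end of the inner using block)
    // flushes its buffer into memoryStream before the bytes are read.
    public static byte[] WriteWithFlush(XDocument doc)
    {
        using (var memoryStream = new MemoryStream())
        {
            using (var xmlWriter = XmlWriter.Create(memoryStream))
            {
                doc.WriteTo(xmlWriter);
            }
            return memoryStream.ToArray();
        }
    }
}
```

With a small document the unflushed array is typically empty or nearly so; with a large one it would be cut off wherever the writer last emptied its buffer, which is consistent with a truncation point that varies from document to document.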

Author: Owen

Updated on June 30, 2022

Comments

  • Owen

    I've taken over a system that stores large XML documents in SQL Server in binary format.

    Currently the data is saved by converting it to a string, then converting that string to a byte array. But recently, with some large XML documents, I'm getting out-of-memory exceptions when attempting the conversion to a string, so I want to bypass this step and go straight from the XDocument to a byte array.

    The Entity Framework class holding the XML has been extended so that the binary data is accessible as a string like this:

    partial class XmlData
    {
        public string XmlString { get { return Encoding.UTF8.GetString(XmlBinary); } set { XmlBinary = Encoding.UTF8.GetBytes(value); } }
    }
    

    I want to further extend the class to look something like this:

    partial class XmlData
    {
        public string XmlString { get { return Encoding.UTF8.GetString(XmlBinary); } set { XmlBinary = Encoding.UTF8.GetBytes(value); } }
    
        public XDocument XDoc
        {
            get
            {
                // Convert XmlBinary to XDocument
            }
            set
            {
                // Convert XDocument to XmlBinary
            }
        }
    }
    

    I think I've nearly figured out the conversion, but when I use the partial class's XmlString property to read the XML back from the database, the XML is always cut off near the end, at a different character count each time:

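    // Note: xmlWriter is never flushed or disposed here, so any output still
    // buffered inside the writer never reaches memoryStream.
    var memoryStream = new MemoryStream();
    var xmlWriter = XmlWriter.Create(memoryStream);
    myXDocument.WriteTo(xmlWriter);
    XmlData.XmlBinary = memoryStream.ToArray();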
    

    SOLUTION

    Here's the basic conversion:

    var settings = new XmlWriterSettings { OmitXmlDeclaration = true, Encoding = Encoding.UTF8 };
    using (var memoryStream = new MemoryStream())
    using (var xmlWriter = XmlWriter.Create(memoryStream, settings))
    {
        myXDocument.WriteTo(xmlWriter);
        xmlWriter.Flush();
        XmlData.XmlBinary = memoryStream.ToArray();
    }
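One side effect of this conversion is worth checking: Encoding.UTF8 carries a three-byte preamble (EF BB BF), which XmlWriter writes at the start of the stream, and Encoding.UTF8.GetString decodes those bytes as the character U+FEFF rather than skipping them. Passing new UTF8Encoding(false) instead omits the preamble. A minimal sketch comparing the two (BomDemo and ToBytes are hypothetical names for this example):

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class, for illustration only.
public static class BomDemo
{
    // Same conversion as above, but with the encoding passed in so the
    // output of Encoding.UTF8 and new UTF8Encoding(false) can be compared.
    public static byte[] ToBytes(XDocument doc, Encoding encoding)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true, Encoding = encoding };
        using (var memoryStream = new MemoryStream())
        {
            using (var xmlWriter = XmlWriter.Create(memoryStream, settings))
            {
                doc.WriteTo(xmlWriter);
            } // disposing the writer flushes it
            return memoryStream.ToArray();
        }
    }
}
```

With Encoding.UTF8 the array starts with the BOM bytes; with new UTF8Encoding(false) it starts directly with '<'.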
    

    But for some reason this process adds some odd non-ASCII characters to the output (most likely the UTF-8 byte order mark, which the writer emits because Encoding.UTF8 includes a preamble). My previous XmlString property would load those characters and XDocument.Parse() would fail, so my new partial class looks like this:

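    using System.IO;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.Xml;
    using System.Xml.Linq;

    partial class XmlData
    {
        public string XmlString 
        { 
            get 
            {
                var xml = Encoding.UTF8.GetString(XmlBinary);
                // Removes all non-ASCII characters. Caution: this also strips legitimate
                // non-ASCII text (e.g. accented letters), not only the stray characters.
                xml = Regex.Replace(xml, @"[^\u0000-\u007F]", string.Empty);
                return xml;
            } 
            set 
            { 
                // Same caution: any non-ASCII content is discarded before storing.
                value = Regex.Replace(value, @"[^\u0000-\u007F]", string.Empty);
                XmlBinary = Encoding.UTF8.GetBytes(value); 
            } 
        }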
    
        public XDocument XDoc
        {
            get
            {
                using (var memoryStream = new MemoryStream(XmlBinary))
                using (var xmlReader = XmlReader.Create(memoryStream))
                {
                    var xml = XDocument.Load(xmlReader);
                    return xml;
                }
            }
            set
            {
                var settings = new XmlWriterSettings { OmitXmlDeclaration = true, Encoding = Encoding.UTF8 };
                using (var memoryStream = new MemoryStream())
                using (var xmlWriter = XmlWriter.Create(memoryStream, settings))
                {
                    value.WriteTo(xmlWriter);
                    xmlWriter.Flush();
                    XmlBinary = memoryStream.ToArray();
                }
            }
        }
    }
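If the stray characters are indeed the byte order mark, a possible alternative to the regex filtering is to write without a BOM and load the XDocument straight from the bytes, which keeps legitimate non-ASCII text intact and never builds the full XML string. This is a minimal sketch, not the author's code; XmlBinaryConverter, ToBytes and FromBytes are hypothetical names:

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class, for illustration only.
public static class XmlBinaryConverter
{
    // UTF-8 without a byte order mark, so the stored bytes start directly with '<'.
    private static readonly Encoding Utf8NoBom = new UTF8Encoding(false);

    public static byte[] ToBytes(XDocument doc)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true, Encoding = Utf8NoBom };
        using (var memoryStream = new MemoryStream())
        {
            using (var xmlWriter = XmlWriter.Create(memoryStream, settings))
            {
                doc.WriteTo(xmlWriter);
            } // disposing the writer flushes its buffer into memoryStream
            return memoryStream.ToArray();
        }
    }

    public static XDocument FromBytes(byte[] bytes)
    {
        // XDocument.Load(Stream) detects the encoding from the bytes themselves.
        using (var memoryStream = new MemoryStream(bytes))
        {
            return XDocument.Load(memoryStream);
        }
    }
}
```

Inside XmlData, the XDoc property could then delegate: get { return XmlBinaryConverter.FromBytes(XmlBinary); } set { XmlBinary = XmlBinaryConverter.ToBytes(value); }.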