Split XML document into chunks


Solution 1

Another naive solution, this time for .NET 2.0. It should give you an idea of how to go about what you want. It uses XPath expressions instead of LINQ to XML. It chunks a 100-order docket into 10 dockets in under a second on my dev box.

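    using System;
    using System.Collections.Generic;
    using System.Xml;

    public List<XmlDocument> ChunkDocket(XmlDocument docket, int chunkSize)
    {
        // A non-positive chunk size would never advance chunkStart and loop forever.
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException("chunkSize");

        List<XmlDocument> newDockets = new List<XmlDocument>();
        int orderCount = docket.SelectNodes("//docket/order").Count;
        int chunkStart = 0;

        while (chunkStart < orderCount)
        {
            XmlDocument newDocket = new XmlDocument();
            // Import the original <docket> element without its children so the new
            // root keeps the xmlns:xsi and xsi:noNamespaceSchemaLocation attributes.
            XmlNode root = newDocket.ImportNode(docket.DocumentElement, false);
            newDocket.AppendChild(root);

            // XPath position() is 1-based: this selects orders chunkStart+1 .. chunkStart+chunkSize.
            XmlNodeList chunk = docket.SelectNodes(String.Format(
                "//docket/order[position() > {0} and position() <= {1}]",
                chunkStart, chunkStart + chunkSize));

            chunkStart += chunkSize;

            foreach (XmlNode c in chunk)
            {
                // Nodes from another document must be imported before they can be appended.
                root.AppendChild(newDocket.ImportNode(c, true));
            }

            newDockets.Add(newDocket);
        }

        return newDockets;
    }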
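Because the position() query is re-run for every chunk, each pass rescans the whole order list. A minimal single-pass sketch, still limited to .NET 2.0 XmlDocument APIs, selects the orders once and starts a new document every chunkSize orders. The class and method names (DocketChunker, ChunkDocketSinglePass) are hypothetical, and it assumes the orders are direct children of the root <docket> element.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class DocketChunker
{
    // Splits the <order> children of <docket> into new documents of at most
    // chunkSize orders each, selecting the orders only once.
    public static List<XmlDocument> ChunkDocketSinglePass(XmlDocument docket, int chunkSize)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException("chunkSize");

        List<XmlDocument> newDockets = new List<XmlDocument>();
        XmlNodeList orders = docket.SelectNodes("/docket/order");
        XmlDocument newDocket = null;
        XmlNode root = null;

        for (int i = 0; i < orders.Count; i++)
        {
            if (i % chunkSize == 0)
            {
                // Start a new chunk: a shallow import copies the <docket> element
                // and its attributes (xmlns:xsi etc.) but none of its children.
                newDocket = new XmlDocument();
                root = newDocket.ImportNode(docket.DocumentElement, false);
                newDocket.AppendChild(root);
                newDockets.Add(newDocket);
            }
            // Deep-import the order into the current chunk and append it under the root.
            root.AppendChild(newDocket.ImportNode(orders[i], true));
        }

        return newDockets;
    }
}
```

Called as `DocketChunker.ChunkDocketSinglePass(doc, 100)`, this should return one XmlDocument per 100 orders, with a smaller final chunk if the count is not a multiple of 100; a docket with no orders yields an empty list.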

Solution 2

Naive and iterative, but it works. [EDIT: requires .NET 3.5 or later, since it uses LINQ to XML.]

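    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml.Linq;

    public List<XDocument> ChunkDocket(XDocument docket, int chunkSize)
    {
        // Take(0) never empties the list, so a non-positive size would loop forever.
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException("chunkSize");

        var newDockets = new List<XDocument>();
        // Work on a copy so the caller's document is not emptied.
        var d = new XDocument(docket);
        // Lazy query: re-evaluated on each use, so it always sees the remaining orders.
        var orders = d.Root.Elements("order");

        while (orders.Any())
        {
            // Copy the root's name and attributes (xmlns:xsi etc.) onto the new docket.
            var newDocket = new XDocument(new XElement(d.Root.Name, d.Root.Attributes()));
            var chunk = orders.Take(chunkSize);
            // Adding elements that already have a parent clones them into the new document...
            newDocket.Root.Add(chunk);
            // ...then the originals are removed from the working copy.
            chunk.Remove();
            newDockets.Add(newDocket);
        }

        return newDockets;
    }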
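For very large dockets (the question mentions thousands of orders processed in a Windows service), both solutions load the whole file into memory first. A different technique, not part of either answer, is to stream the input with XmlReader and XNode.ReadFrom (.NET 3.5 or later) so that only the current chunk is held in memory. This is a sketch under stated assumptions: StreamingDocketChunker is a hypothetical name, and it assumes the orders are direct children of the root element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class StreamingDocketChunker
{
    // Yields one XDocument per chunk of at most chunkSize <order> elements.
    // The source is read forward-only, so only the current chunk is held in memory.
    public static IEnumerable<XDocument> ChunkDocket(TextReader input, int chunkSize)
    {
        // Iterator method: this check runs when enumeration starts, not at call time.
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException("chunkSize");

        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent(); // now positioned on the root <docket> element
            XName rootName = XName.Get(reader.LocalName, reader.NamespaceURI);

            // Capture the root's attributes, including namespace declarations such
            // as xmlns:xsi, so every chunk gets an identical <docket> wrapper.
            List<XAttribute> rootAttributes = new List<XAttribute>();
            while (reader.MoveToNextAttribute())
            {
                XName attrName = reader.Name == "xmlns"
                    ? XName.Get("xmlns")                                // default namespace declaration
                    : XName.Get(reader.LocalName, reader.NamespaceURI); // xmlns:prefix, prefixed or plain attribute
                rootAttributes.Add(new XAttribute(attrName, reader.Value));
            }
            reader.MoveToElement();

            XElement current = null;
            int count = 0;
            reader.Read(); // step past the <docket> start tag

            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Depth == 1 && reader.LocalName == "order")
                {
                    // ReadFrom consumes the whole <order> subtree and leaves the reader on the next node.
                    XElement order = (XElement)XNode.ReadFrom(reader);
                    if (current == null)
                        current = new XElement(rootName, rootAttributes.Select(a => new XAttribute(a)));
                    current.Add(order);
                    if (++count == chunkSize)
                    {
                        yield return new XDocument(current);
                        current = null;
                        count = 0;
                    }
                }
                else
                {
                    reader.Read(); // skip whitespace, the closing tag and anything else
                }
            }

            if (current != null)
                yield return new XDocument(current); // final, partially filled chunk
        }
    }
}
```

Each chunk can be processed (or written out with XDocument.Save) before the next one is read. XmlReader.Create does not close the TextReader by default, so the caller should still dispose it, e.g. by wrapping File.OpenText("docket.xml") in a using block.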
Author: ChrisCa

Updated on June 04, 2022

Comments

  • ChrisCa

    I have a large XML document that needs to be processed 100 records at a time.

    It is being done within a Windows Service written in C#.

    The structure is as follows:

    <docket xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="docket.xsd">
        <order>
            <Date>2008-10-13</Date>
            <orderNumber>050758023</orderNumber>
            <ParcelID/>
            <CustomerName>sddsf</CustomerName>
            <DeliveryName>dsfd</DeliveryName>
            <Address1>sdf</Address1>
            <Address2>sdfsdd</Address2>
            <Address3>sdfdsfdf</Address3>
            <Address4>dffddf</Address4>
            <PostCode/>
    
        </order>
        <order>
            <Date>2008-10-13</Date>
            <orderNumber>050758023</orderNumber>
            <ParcelID/>
            <CustomerName>sddsf</CustomerName>
            <DeliveryName>dsfd</DeliveryName>
            <Address1>sdf</Address1>
            <Address2>sdfsdd</Address2>
            <Address3>sdfdsfdf</Address3>
            <Address4>dffddf</Address4>
            <PostCode/>
    
        </order>
    
        .....
    
        .....
    
    </docket>
    

    There could be thousands of orders in a docket.

    I need to split this into chunks of 100 orders.

    However, each chunk of 100 orders still needs to be wrapped in the parent "docket" node and keep the same namespace etc.

    Is this possible?

  • Jim Burger
    I know it's horribly inefficient.