Comparing two XML files & generating a third with XMLDiff in C#

Solution 1

Okay... I finally opted for a pure C# solution to compare the two XML files, without using the XML Diff/Patch .dll and without even needing XSL transforms. I will need XSL transforms in the next step, though, to convert the XML into HTML for viewing purposes, but I have figured out an algorithm that uses nothing but System.Xml and System.Xml.XPath.

Here is my algorithm:

private void CompareXml(string file1, string file2)
{
    // Load the documents
    XmlDocument docXml1 = new XmlDocument();
    docXml1.Load(file1);
    XmlDocument docXml2 = new XmlDocument();
    docXml2.Load(file2);


    // Get a list of all player nodes
    XmlNodeList nodes1 = docXml1.SelectNodes("/Stats/Player");
    XmlNodeList nodes2 = docXml2.SelectNodes("/Stats/Player");

    // Define a single node
    XmlNode node1;
    XmlNode node2;

    // Get the root Xml element
    XmlElement root1 = docXml1.DocumentElement;
    XmlElement root2 = docXml2.DocumentElement;

    // Get a list of all player names
    XmlNodeList nameList1 = root1.GetElementsByTagName("Name");
    XmlNodeList nameList2 = root2.GetElementsByTagName("Name");

    // Get a list of all teams
    XmlNodeList teamList1 = root1.GetElementsByTagName("Team");
    XmlNodeList teamList2 = root2.GetElementsByTagName("Team");

    // Create an XmlWriterSettings object with the correct options. 
    XmlWriter writer = null;
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Indent = true;
    settings.IndentChars = ("  ");
    settings.OmitXmlDeclaration = false;

    // Create the XmlWriter object and write some content.
    writer = XmlWriter.Create(StatsFile.XmlDiffFilename, settings);
    writer.WriteStartElement("StatsDiff");


    // The compare algorithm: for each player in the first file, search the
    // second file for the player with the same name and team.
    try
    {
        for (int i = 0; i < nodes1.Count; i++)
        {
            // Restart the search from the top of the second list for every player,
            // so an unmatched player cannot stall the remaining iterations
            for (int j = 0; j < nodes2.Count; j++)
            {
                // There is a match if the player name and team are the same in both lists
                if (nameList1.Item(i).InnerText == nameList2.Item(j).InnerText &&
                    teamList1.Item(i).InnerText == teamList2.Item(j).InnerText)
                {
                    node1 = nodes1.Item(i);
                    node2 = nodes2.Item(j);
                    // Call to the calculator and Xml writer
                    this.CalculateDifferential(node1, node2, writer);
                    break;
                }
            }
        }
        // end Xml document
        writer.WriteEndElement();
        writer.Flush();
    }
    finally
    {
        if (writer != null)
            writer.Close();
    }
}
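
The nested loop above compares every pair of players. As an alternative technique (not the one used in this answer), the players of the second file could be indexed by name and team first, so that each lookup is a dictionary access. This is a minimal sketch; the class and method names (PlayerMatchSketch, MatchPlayers, Key) are hypothetical, and it assumes every Player element has exactly one Name and one Team child.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class PlayerMatchSketch
{
    // Builds the "name|team" key used to identify a player across both files.
    // Assumption: every Player element has a Name and a Team child.
    private static string Key(XmlNode player)
    {
        return player["Name"].InnerText + "|" + player["Team"].InnerText;
    }

    // Returns (file1 player, file2 player) pairs for players present in both documents.
    public static List<KeyValuePair<XmlNode, XmlNode>> MatchPlayers(XmlDocument doc1, XmlDocument doc2)
    {
        // Index the second document once by name and team.
        var byKey = new Dictionary<string, XmlNode>();
        foreach (XmlNode player in doc2.SelectNodes("/Stats/Player"))
            byKey[Key(player)] = player;

        // Look up each player of the first document in that index.
        var pairs = new List<KeyValuePair<XmlNode, XmlNode>>();
        foreach (XmlNode player in doc1.SelectNodes("/Stats/Player"))
        {
            XmlNode other;
            if (byKey.TryGetValue(Key(player), out other))
                pairs.Add(new KeyValuePair<XmlNode, XmlNode>(player, other));
        }
        return pairs;
    }
}
```

Each returned pair could then be handed to CalculateDifferential() in place of the node1/node2 variables.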

XML Results:

<?xml version="1.0" encoding="utf-8"?>
<StatsDiff>    
  <Player Rank="1">
    <Name>Sidney Crosby</Name>
    <Team>PIT</Team>
    <Pos>C</Pos>
    <GP>0</GP>
    <G>0</G>
    <A>0</A>
    <Points>0</Points>
    <PlusMinus>0</PlusMinus>
    <PIM>0</PIM>
    <PP>0</PP>
    <SH>0</SH>
    <GW>0</GW>
    <OT>0</OT>
    <Shots>0</Shots>
    <ShotPctg>0</ShotPctg>
    <ShiftsPerGame>0</ShiftsPerGame>
    <FOWinPctg>0</FOWinPctg>
  </Player>

  <Player Rank="2">
    <Name>Steven Stamkos</Name>
    <Team>TBL</Team>
    <Pos>C</Pos>
    <GP>1</GP>
    <G>0</G>
    <A>0</A>
    <Points>0</Points>
    <PlusMinus>0</PlusMinus>
    <PIM>2</PIM>
    <PP>0</PP>
    <SH>0</SH>
    <GW>0</GW>
    <OT>0</OT>
    <Shots>4</Shots>
    <ShotPctg>-0,6000004</ShotPctg>
    <ShiftsPerGame>-0,09999847</ShiftsPerGame>
    <FOWinPctg>0,09999847</FOWinPctg>
  </Player>
[...]
</StatsDiff>

I have left out the implementation of the CalculateDifferential() method; it is rather cryptic, but it is fast and efficient. This way I could obtain the wanted results using only the strict minimum of references, without having to use XSL...
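
Since the method itself is not shown, here is a hedged sketch of what a CalculateDifferential() along these lines might look like. It is not the author's implementation: the class name StatsDiffSketch, the CopiedFields set, and the choice of decimal arithmetic with the invariant culture are all assumptions made for illustration. It copies Name, Team and Pos unchanged, writes the difference (second file minus first file) for every child that parses as a number, and copies any other value (for example "21:54") from the second file as-is.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Xml;

public static class StatsDiffSketch
{
    // Identifying fields that are copied instead of subtracted (assumption based on the question).
    private static readonly HashSet<string> CopiedFields =
        new HashSet<string> { "Name", "Team", "Pos" };

    // Hypothetical stand-in for the omitted CalculateDifferential() method.
    // oldPlayer comes from the first file, newPlayer from the second file.
    public static void CalculateDifferential(XmlNode oldPlayer, XmlNode newPlayer, XmlWriter writer)
    {
        writer.WriteStartElement("Player");
        XmlAttribute rank = newPlayer.Attributes["Rank"];
        if (rank != null)
            writer.WriteAttributeString("Rank", rank.Value);

        foreach (XmlNode newChild in newPlayer.ChildNodes)
        {
            if (newChild.NodeType != XmlNodeType.Element)
                continue; // skip whitespace, comments, etc.

            // First child element of the old player with the same name, or null.
            XmlElement oldChild = oldPlayer[newChild.Name];
            string text = newChild.InnerText;

            decimal oldValue, newValue;
            if (!CopiedFields.Contains(newChild.Name) && oldChild != null
                && decimal.TryParse(oldChild.InnerText, NumberStyles.Float,
                                    CultureInfo.InvariantCulture, out oldValue)
                && decimal.TryParse(newChild.InnerText, NumberStyles.Float,
                                    CultureInfo.InvariantCulture, out newValue))
            {
                // Numeric field: write the difference, formatted culture-independently.
                text = (newValue - oldValue).ToString(CultureInfo.InvariantCulture);
            }
            writer.WriteElementString(newChild.Name, text);
        }
        writer.WriteEndElement();
    }
}
```

Applied to the two sample Player nodes from the question, this sketch would write GP 3, G 3, A 1, PlusMinus 2 and PIM 1, which matches the wanted result. Using decimal rather than float, together with the invariant culture, is a deliberate choice here to avoid values such as "0,09999847" in the output.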

Solution 2

There are two immediate solutions:

Solution 1.

You can first apply a simple transform to the two documents that deletes the elements that should not be compared. Then compare the two resulting documents, using exactly your current code. Here is the transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="Name|Team|Pos"/>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<Stats Date="2011-01-01">
    <Player Rank="1">
        <Name>Sidney Crosby</Name>
        <Team>PIT</Team>
        <Pos>C</Pos>
        <GP>39</GP>
        <G>32</G>
        <A>33</A>
        <PlusMinus>20</PlusMinus>
        <PIM>29</PIM>
        <PP>10</PP>
        <SH>1</SH>
        <GW>3</GW>
        <Shots>0</Shots>
        <ShotPctg>154</ShotPctg>
        <TOIPerGame>20.8</TOIPerGame>
        <ShiftsPerGame>21:54</ShiftsPerGame>
        <FOWinPctg>22.6</FOWinPctg>
    </Player>
</Stats>

the wanted resulting document is produced:

<Stats Date="2011-01-01">
   <Player Rank="1">
      <GP>39</GP>
      <G>32</G>
      <A>33</A>
      <PlusMinus>20</PlusMinus>
      <PIM>29</PIM>
      <PP>10</PP>
      <SH>1</SH>
      <GW>3</GW>
      <Shots>0</Shots>
      <ShotPctg>154</ShotPctg>
      <TOIPerGame>20.8</TOIPerGame>
      <ShiftsPerGame>21:54</ShiftsPerGame>
      <FOWinPctg>22.6</FOWinPctg>
   </Player>
</Stats>
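
For readers who have not run XSLT from C# before, the following is a minimal sketch of applying such a stylesheet with XslCompiledTransform. The class and method names (StripFieldsSketch, Transform) are hypothetical, and the stylesheet and input are passed as strings only to keep the example self-contained; in practice they would probably be files.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class StripFieldsSketch
{
    // Applies the given XSLT 1.0 stylesheet to the given XML text and returns the result.
    public static string Transform(string stylesheetXml, string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(stylesheetXml)))
        {
            xslt.Load(styleReader);
        }

        var output = new StringWriter();
        using (XmlReader inputReader = XmlReader.Create(new StringReader(inputXml)))
        // OutputSettings carries what the stylesheet's xsl:output element asked for.
        using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(inputReader, writer);
        }
        return output.ToString();
    }
}
```

Running both input files through this helper would give the two stripped documents that can then be handed to XmlDiff.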

Solution 2.

This is a complete XSLT 1.0 solution (for convenience only, the second XML document is embedded in the transformation code):

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:variable name="vrtfDoc2">
  <Stats Date="2011-01-01">
    <Player Rank="2">
        <Name>John Smith</Name>
        <Team>NY</Team>
        <Pos>D</Pos>
        <GP>38</GP>
        <G>32</G>
        <A>33</A>
        <PlusMinus>15</PlusMinus>
        <PIM>29</PIM>
        <PP>10</PP>
        <SH>1</SH>
        <GW>4</GW>
        <Shots>0</Shots>
        <ShotPctg>158</ShotPctg>
        <TOIPerGame>20.8</TOIPerGame>
        <ShiftsPerGame>21:54</ShiftsPerGame>
        <FOWinPctg>22.6</FOWinPctg>
    </Player>
  </Stats>
 </xsl:variable>

 <xsl:variable name="vDoc2" select=
  "document('')/*/xsl:variable[@name='vrtfDoc2']/*"/>

 <xsl:template match="node()|@*" name="identity">
  <xsl:param name="pDoc2"/>
  <xsl:copy>
   <xsl:apply-templates select="node()|@*">
    <xsl:with-param name="pDoc2" select="$pDoc2"/>
   </xsl:apply-templates>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="/">
  <xsl:apply-templates select="*">
   <xsl:with-param name="pDoc2" select="$vDoc2"/>
  </xsl:apply-templates>

  -----------------------

  <xsl:apply-templates select="$vDoc2">
   <xsl:with-param name="pDoc2" select="/*"/>
  </xsl:apply-templates>
 </xsl:template>

 <xsl:template match="Player/*">
  <xsl:param name="pDoc2"/>
  <xsl:if test=
   "not(. = $pDoc2/*/*[name()=name(current())])">
   <xsl:call-template name="identity"/>
  </xsl:if>
 </xsl:template>

 <xsl:template match="Name|Team|Pos" priority="20"/>
</xsl:stylesheet>

When this transformation is applied to the same first document as above, the correct diffgrams are produced:

<Stats Date="2011-01-01">
   <Player Rank="1">
      <GP>39</GP>
      <PlusMinus>20</PlusMinus>
      <GW>3</GW>
      <ShotPctg>154</ShotPctg>
   </Player>
</Stats>

  -----------------------

  <Stats xmlns:xsl="http://www.w3.org/1999/XSL/Transform" Date="2011-01-01">
   <Player Rank="2">
      <GP>38</GP>
      <PlusMinus>15</PlusMinus>
      <GW>4</GW>
      <ShotPctg>158</ShotPctg>
   </Player>
</Stats>

How this works:

  1. The transformation is applied to the first document, passing the second document as a parameter.

  2. This produces an XML document whose only leaf element nodes are the ones that have a different value from the corresponding leaf element nodes in the second document.

  3. The same processing is performed as in 1. above, but this time on the second document, passing the first document as a parameter.

  4. This produces a second diffgram: an XML document whose only leaf element nodes are the ones that have a different value from the corresponding leaf element nodes in the first document.
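
The comments on this answer suggest passing the URL of the second document as a parameter instead of embedding it, so that $vDoc2 becomes document($pDoc2Url)/*. Here is a hedged C# sketch of the calling side. It assumes the stylesheet has been changed to declare a global <xsl:param name="pDoc2Url"/> and to define vDoc2 that way; the class and method names (DiffgramSketch, Transform) are hypothetical.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class DiffgramSketch
{
    // Runs the stylesheet on the first document and passes the URL of the
    // second document in the global parameter "pDoc2Url".
    public static string Transform(string stylesheetPath, string doc1Path, string doc2Url)
    {
        var xslt = new XslCompiledTransform();
        // The document() function is disabled by default: enable it here,
        // but keep embedded script blocks disabled.
        xslt.Load(stylesheetPath, new XsltSettings(true, false), new XmlUrlResolver());

        var args = new XsltArgumentList();
        args.AddParam("pDoc2Url", "", doc2Url);

        var output = new StringWriter();
        using (XmlReader input = XmlReader.Create(doc1Path))
        using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            // The last argument is the resolver used for document() calls at run time.
            xslt.Transform(input, args, writer, new XmlUrlResolver());
        }
        return output.ToString();
    }
}
```

Passing an absolute file or http URL for doc2Url avoids surprises, since a relative URL given to document() is resolved against the stylesheet's own location rather than the current directory.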

Author: JF Beaulieu

Updated on July 05, 2022

Comments

  • JF Beaulieu
    JF Beaulieu almost 2 years

    I am trying to write a simple algorithm to read two XML files that have exactly the same nodes and structure, but not necessarily the same data inside the child nodes and not the same order. How could I create a simple implementation that produces a third, temporary XML file holding the differential between the first two, using Microsoft's XML Diff .DLL?

    XML Diff on MSDN:

    XML Diff and Patch Tool

    XML Diff and Patch GUI Tool

    sample XML code of the two different XML files to compare:

    <?xml version="1.0" encoding="utf-8" ?> 
    <Stats Date="2011-01-01">
     <Player Rank="1">
      <Name>Sidney Crosby</Name> 
      <Team>PIT</Team> 
      <Pos>C</Pos> 
      <GP>39</GP> 
      <G>32</G> 
      <A>33</A> 
      <PlusMinus>20</PlusMinus> 
      <PIM>29</PIM> 
     </Player>
    </Stats>
    
    <?xml version="1.0" encoding="utf-8" ?> 
    <Stats Date="2011-01-10">
     <Player Rank="1">
      <Name>Sidney Crosby</Name> 
      <Team>PIT</Team> 
      <Pos>C</Pos> 
      <GP>42</GP> 
      <G>35</G> 
      <A>34</A> 
      <PlusMinus>22</PlusMinus> 
      <PIM>30</PIM> 
     </Player>
    </Stats>
    

    Result wanted (difference between the two)

    <?xml version="1.0" encoding="utf-8" ?> 
    <Stats Date="2011-01-10">
     <Player Rank="1">
      <Name>Sidney Crosby</Name> 
      <Team>PIT</Team> 
      <Pos>C</Pos> 
      <GP>3</GP> 
      <G>3</G> 
      <A>1</A> 
      <PlusMinus>2</PlusMinus> 
      <PIM>1</PIM> 
     </Player>
    </Stats>
    

    In this case, I would probably use XSLT to convert the resulting XML "differential" file into a sorted HTML file, but I am not there yet. All I want to do is to display in the third XML file the difference of every numerical value of each node, starting from the "GP" child node.

    C# code I have so far:

    private void CompareXml(string file1, string file2)
    {
    
        XmlReader reader1 = XmlReader.Create(new StringReader(file1));
        XmlReader reader2 = XmlReader.Create(new StringReader(file2));
    
        string diffFile = StatsFile.XmlDiffFilename;
        StringBuilder differenceStringBuilder = new StringBuilder();
    
        FileStream fs = new FileStream(diffFile, FileMode.Create);
        XmlWriter diffGramWriter = XmlWriter.Create(fs);
    
        XmlDiff xmldiff = new XmlDiff(XmlDiffOptions.IgnoreChildOrder |
                                XmlDiffOptions.IgnoreNamespaces |
                                XmlDiffOptions.IgnorePrefixes);
        bool bIdentical = xmldiff.Compare(file1, file2, false, diffGramWriter);
    
        diffGramWriter.Close();
    
        // cleaning up after we are done with the xml diff file
        File.Delete(diffFile);
    }
    

    That's what I have so far, but the result is garbage... Note that for each "Player" node, the first three children must NOT be compared... How can I implement this?

    • Dimitre Novatchev
      Dimitre Novatchev over 13 years
      Good question, +1. See my answer for two solutions: one with an auxiliary XSLT transformation to create two new XML documents having only the elements that should be compared, the other solution is completely XSLT. :)
  • JF Beaulieu
    JF Beaulieu over 13 years
    Great solution... How would I go about passing the second document as a parameter without embedding it in the XSL transformation code?
  • JF Beaulieu
    JF Beaulieu over 13 years
    P.S.: I have modified my initial post with a more detailed look at what the resulting XML file needs to be as a function of the first two. I have never experimented with XSL, but I have succeeded in applying a first transform to the two XML documents. I can still keep track of which player I am manipulating because of the "Rank" attribute in the "Player" node. Now, I can't figure out how to implement solution 2. using XSL and C#...
  • JF Beaulieu
    JF Beaulieu over 13 years
    P.P.S.: A problem here would be that after doing the XSL transform of the two Xml documents, there will be no way to identify which player is which because the ranks of the player might change. The only way to match two players is by matching : 1. their name and 2. their team, but those fields have been deleted after the transform. The only way to identify the players this way would be to refer to the two previous Xml documents and match the rank with the rank in the transformed Xml docs.. phhhheeeewww!!!
  • Dimitre Novatchev
    Dimitre Novatchev over 13 years
    @Kaeso: You can put identifying info within a comment -- I think XmlDiff has an option to ignore comment nodes.
  • Dimitre Novatchev
    Dimitre Novatchev over 13 years
    @Kaeso: You can pass the file-or-http url of the 2nd document as a parameter to the transformation, Then $vDoc2 will be: document($pDoc2Url)/*
  • Admin
    Admin over 13 years
    @Kaeso: You just need to modify the patterns in the empty rule from Name|Team|Pos, to only Pos
  • Niloofar
    Niloofar over 6 years
    can you show the CalculateDifferential() method as well? what does it do?
  • Luc Morin
    Luc Morin over 3 years
    As the previous user commented, showing the CalculateDifferential() method would make this a usable answer, not half of one.