Best way to compare 2 XML documents in Java


Solution 1

Sounds like a job for XMLUnit

Example:

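import org.custommonkey.xmlunit.XMLTestCase;
import org.custommonkey.xmlunit.XMLUnit;

// XMLTestCase is a JUnit 3 TestCase, so the method is discovered by its "test" prefix
public class SomeTest extends XMLTestCase {
  public void test() throws Exception {
    String xml1 = ...
    String xml2 = ...

    XMLUnit.setIgnoreWhitespace(true); // ignore whitespace differences

    // can also compare xml Documents, InputSources, Readers, Diffs
    assertXMLEqual(xml1, xml2);  // assertXMLEqual comes from XMLTestCase
  }
}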

Solution 2

The following will check if the documents are equal using standard JDK libraries.

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setCoalescing(true);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();

Document doc1 = db.parse(new File("file1.xml"));
doc1.normalizeDocument();

Document doc2 = db.parse(new File("file2.xml"));
doc2.normalizeDocument();

Assert.assertTrue(doc1.isEqualNode(doc2));

normalizeDocument() puts each document into a normal form, as if it had gone through a save-and-load cycle (for a freshly parsed document it technically shouldn't change anything).

The above code still requires the whitespace inside elements to be the same in both documents, because the parser preserves it and isEqualNode() evaluates it. The standard XML parser that comes with Java does not let you set a feature to produce a canonical version or to understand xml:space. If that is going to be a problem, you may need a replacement XML parser such as Xerces, or use JDOM.
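To make the whitespace caveat concrete, here is a minimal JDK-only sketch. The class DomCompare and its helpers are hypothetical names invented for illustration, not part of any answer above. It assumes that removing text nodes made up entirely of whitespace is acceptable for your documents, which is not the case for mixed content where whitespace matters. The sketch skips setIgnoringElementContentWhitespace(true), since the JDK documentation says that setting relies on the content model and requires a validating parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class DomCompare {

    // Parses an XML string into a namespace-aware DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setCoalescing(true);       // CDATA sections become plain text nodes
        dbf.setIgnoringComments(true); // comments are dropped while parsing
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        doc.normalizeDocument();
        return doc;
    }

    // Recursively removes text nodes that contain nothing but whitespace.
    static void stripWhitespaceOnlyText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespaceOnlyText(child);
            }
            child = next;
        }
    }

    // Compares two XML strings with isEqualNode after stripping indentation.
    static boolean equalIgnoringIndentation(String xml1, String xml2) throws Exception {
        Document d1 = parse(xml1);
        Document d2 = parse(xml2);
        stripWhitespaceOnlyText(d1);
        stripWhitespaceOnlyText(d2);
        return d1.isEqualNode(d2);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(equalIgnoringIndentation(
                "<a><b>x</b></a>", "<a>\n  <b>x</b>\n</a>"));
        System.out.println(equalIgnoringIndentation(
                "<root>name</root>", "<root> name </root>"));
    }
}
```

With this sketch, indentation between elements and attribute order no longer matter, but padded text such as " name " versus "name" still counts as a difference, and so do different namespace prefixes bound to the same URI, because isEqualNode also compares node names and prefixes.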

Solution 3

XOM has a Canonicalizer utility which writes your documents out in a canonical form, which you can then stringify and compare. So regardless of whitespace irregularities or attribute ordering, you can get regular, predictable comparisons of your documents.

This works especially well in IDEs that have dedicated visual String comparators, like Eclipse. You get a visual representation of the semantic differences between the documents.
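The "stringify and compare" idea can be sketched with JDK classes alone. This is not XOM's Canonicalizer and not real XML canonicalization: the class XmlStringify and its methods are hypothetical, and the sketch only removes whitespace-only text nodes, comments, and the XML declaration before serializing. It makes no attempt to normalize attribute order or namespace prefixes, which a real canonicalizer is there to handle.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class XmlStringify {

    // Parses the XML, drops whitespace-only text nodes, and serializes it
    // back to a string without an XML declaration.
    static String toComparableString(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setIgnoringComments(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        stripWhitespaceOnlyText(doc);

        // Identity transform: copies the DOM tree to text unchanged.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Recursively removes text nodes that contain nothing but whitespace.
    static void stripWhitespaceOnlyText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespaceOnlyText(child);
            }
            child = next;
        }
    }
}
```

In a test, passing both strings through toComparableString() and handing them to assertEquals gives the IDE's string diff viewer two similarly formatted strings to compare.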

Solution 4

The latest version of XMLUnit can help with asserting that two XML documents are equal. Depending on your case, XMLUnit.setIgnoreWhitespace() and XMLUnit.setIgnoreAttributeOrder() may also be necessary.

Below is a simple working example of XMLUnit use.

import java.util.List;

import org.custommonkey.xmlunit.DetailedDiff;
import org.custommonkey.xmlunit.XMLUnit;
import org.junit.Assert;

public class TestXml {

    public static void main(String[] args) throws Exception {
        String result = "<abc             attr=\"value1\"                title=\"something\">            </abc>";
        // passes: only whitespace differs, and whitespace is ignored
        assertXMLEquals("<abc attr=\"value1\" title=\"something\"></abc>", result);
    }

    public static void assertXMLEquals(String expectedXML, String actualXML) throws Exception {
        XMLUnit.setIgnoreWhitespace(true);
        XMLUnit.setIgnoreAttributeOrder(true);

        DetailedDiff diff = new DetailedDiff(XMLUnit.compareXML(expectedXML, actualXML));

        List<?> allDifferences = diff.getAllDifferences();
        Assert.assertEquals("Differences found: "+ diff.toString(), 0, allDifferences.size());
    }

}

If using Maven, add this to your pom.xml:

<dependency>
    <groupId>xmlunit</groupId>
    <artifactId>xmlunit</artifactId>
    <version>1.4</version>
</dependency>

Solution 5

Building on Tom's answer, here's an example using XMLUnit v2.

It uses these Maven dependencies:

    <dependency>
        <groupId>org.xmlunit</groupId>
        <artifactId>xmlunit-core</artifactId>
        <version>2.0.0</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.xmlunit</groupId>
        <artifactId>xmlunit-matchers</artifactId>
        <version>2.0.0</version>
        <scope>test</scope>
    </dependency>

...and here's the test code:

import static org.junit.Assert.assertThat;
import static org.xmlunit.matchers.CompareMatcher.isIdenticalTo;

import org.junit.Test;
import org.xmlunit.builder.Input;
import org.xmlunit.input.WhitespaceStrippedSource;

// plain JUnit 4 test; XMLUnit 1's XMLTestCase base class is not needed with the matchers
public class SomeTest {
    @Test
    public void test() {
        String result = "<root></root>";
        String expected = "<root>  </root>";

        // ignore whitespace differences
        // https://github.com/xmlunit/user-guide/wiki/Providing-Input-to-XMLUnit#whitespacestrippedsource
        assertThat(result, isIdenticalTo(new WhitespaceStrippedSource(Input.from(expected).build())));

        assertThat(result, isIdenticalTo(Input.from(expected).build())); // will fail due to whitespace differences
    }
}

The documentation that outlines this is https://github.com/xmlunit/xmlunit#comparing-two-documents


Author: Mike Deck


Updated on February 25, 2022

Comments

  • Mike Deck
    Mike Deck over 2 years

    I'm trying to write an automated test of an application that basically translates a custom message format into an XML message and sends it out the other end. I've got a good set of input/output message pairs so all I need to do is send the input messages in and listen for the XML message to come out the other end.

    When it comes time to compare the actual output to the expected output I'm running into some problems. My first thought was just to do string comparisons on the expected and actual messages. This doesn't work very well because the example data we have isn't always formatted consistently and there are often different aliases used for the XML namespace (and sometimes namespaces aren't used at all).

    I know I can parse both strings and then walk through each element and compare them myself and this wouldn't be too difficult to do, but I get the feeling there's a better way or a library I could leverage.

    So, boiled down, the question is:

    Given two Java Strings which both contain valid XML how would you go about determining if they are semantically equivalent? Bonus points if you have a way to determine what the differences are.

  • Mike Deck
    Mike Deck almost 16 years
    I knew something like this had to be out there. I can't believe Google didn't find it for me. Thanks.
  • skaffman
    skaffman almost 16 years
    I've had problems with XMLUnit in the past, it's been hyper-twitchy with XML API versions and hasn't proven reliable. It's been a while since I ditched it for XOM, though, so maybe it's improved since.
  • Pimin Konstantin Kefaloukos
    Pimin Konstantin Kefaloukos over 13 years
    Only minus is that it is not free (99€ for a pro license), with 30 day trial.
  • dma_k
    dma_k over 13 years
    I have found only the utility (altova.com/diffdog/diff-merge-tool.html); nice to have a library.
  • Kartoch
    Kartoch almost 13 years
    Not an answer but a question.
  • aberrant80
    aberrant80 over 11 years
    Quite late, but just wanted to note that this piece of code has a bug: in diffNodes(), node2 is not referenced; the second loop reuses node1 incorrectly (I edited the code to fix this). Also, it has one limitation: due to the way the child maps are keyed, this diff does not support the case where element names are not unique, i.e. elements containing repeatable child elements.
  • Stew
    Stew over 11 years
    For beginners to XMLUnit, note that, by default, myDiff.similar() will return false if the control and test documents differ in indentation/newlines. I expected this behavior from myDiff.identical(), and not from myDiff.similar(). Include XMLUnit.setIgnoreWhitespace(true); in your setUp method to change the behavior for all tests in your test class, or use it in an individual test method to change the behavior for only that test.
  • koppor
    koppor over 11 years
    This works perfectly for XML without namespaces or with "normalized" namespace prefixes. I doubt that it works if one XML is <ns1:a xmlns:ns1="ns" /> and the other is <ns2:a xmlns:ns2="ns" />
  • Andy B
    Andy B almost 10 years
    This is perfect for folks who need to compare from a static method.
  • Jay
    Jay almost 10 years
    @Stew thanks for your comment, just starting with XMLUnit and am sure would have faced this issue. +1
  • Yngvar Kristiansen
    Yngvar Kristiansen over 8 years
    In case you are trying this with XMLUnit 2 on GitHub: version 2 is a complete rewrite, so this example is for XMLUnit 1 on SourceForge. Also, the SourceForge page states "XMLUnit for Java 1.x will still be maintained".
  • Yngvar Kristiansen
    Yngvar Kristiansen over 8 years
    This example is JUnit 3, which uses extends TestCase, which is really old school. A JUnit 4 example would be much appreciated.
  • Yngvar Kristiansen
    Yngvar Kristiansen over 8 years
    This works fine in two tests I did just now, with the same XML and with different XML. With the IntelliJ diff, the differences in the compared XML are easy to spot.
  • Yngvar Kristiansen
    Yngvar Kristiansen over 8 years
    By the way, you'll need this dependency if you use Maven: <dependency> <groupId>org.apache.santuario</groupId> <artifactId>xmlsec</artifactId> <version>2.0.6</version> </dependency>
  • Miklos Krivan
    Miklos Krivan almost 8 years
    dbf.setIgnoringElementContentWhitespace(true) does not have the result I would expect: <root>name</root> is not equal to <root> name </root> with this solution (padded with two spaces), but XMLUnit gives the equal result in this case (JDK8)
  • limonik
    limonik over 7 years
    This is a perfect answer, thank you. However, I need to ignore nodes that do not exist, since I do not want to see output like this in the result: Expected presence of child node "null" but was ...... How can I do that? Regards. @acdcjunior
  • Bevor
    Bevor about 7 years
    XMLUnit.setIgnoreAttributeOrder(true); does not work. If some nodes have a different order, the comparison will fail.
  • Bevor
    Bevor about 7 years
    [UPDATE] this solution works: stackoverflow.com/questions/33695041/…
  • acdcjunior
    acdcjunior about 7 years
    You do realize "IgnoreAttributeOrder" means ignore attribute order and not ignore nodes order, right?
  • Flyout91
    Flyout91 over 6 years
    For me it doesn't ignore line breaks, which is a problem.
  • Archimedes Trajano
    Archimedes Trajano over 6 years
    setIgnoringElementContentWhitespace(false)
  • user2818782
    user2818782 over 6 years
    The method is assertXMLEqual, from XMLAssert.java.
  • Admin
    Admin about 6 years
    @skaffman admittedly 10 years later.... but xmlunit works very well for me in JUnit 4 tests.
  • Ben
    Ben almost 5 years
    Any context? Library reference?
  • Abhijit Bashetti
    Abhijit Bashetti almost 5 years
    @acdcjunior If the nodes are at different places... will this work?
  • acdcjunior
    acdcjunior almost 5 years
    @AbhijitBashetti Apparently only when you also add .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byName)).
  • Abhijit Bashetti
    Abhijit Bashetti almost 5 years
    DiffBuilder.compare(bufferedReaderExistingFile) .withTest(bufferedReaderNewFile) .ignoreComments() .ignoreWhitespace() .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byName)) .checkForSimilar() .build();
  • Abhijit Bashetti
    Abhijit Bashetti almost 5 years
    @acdcjunior: I have the above code, but it still says the files are different
  • acdcjunior
    acdcjunior almost 5 years
    @AbhijitBashetti hm... that's bad... I'm sorry, I got nothing else 😕
  • Alexander Vasiljev
    Alexander Vasiljev almost 4 years
    Yet a trivial namespace prefix difference between two documents makes AssertJ fail. AssertJ is a great tool, but the job is really for XMLUnit.