How to compare two XML files that have the same data in a different order?


I had a similar problem and I eventually found: https://superuser.com/questions/79920/how-can-i-diff-two-xml-files

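That post suggests canonicalizing both XML files and then diffing the results. Note that canonicalization normalizes formatting details such as attribute order and quoting; it does not reorder sibling elements. The following should work on Linux or macOS, or on Windows with something like Cygwin installed: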

$ xmllint --c14n File1.xml > 1.xml
$ xmllint --c14n File2.xml > 2.xml
$ diff 1.xml 2.xml
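
For readers without xmllint, the same canonicalize-then-compare idea can be sketched in Python. This is only a minimal sketch, assuming Python 3.8 or later (for xml.etree.ElementTree.canonicalize). The helper name canonical and the sample strings are made up for illustration. The output is C14N 2.0, so it is not guaranteed to match xmllint --c14n byte for byte.

```python
import xml.etree.ElementTree as ET


def canonical(xml_text: str) -> str:
    """Return a canonical (C14N 2.0) form of xml_text.

    strip_text=True drops leading/trailing whitespace in character data,
    so indentation differences between the inputs are ignored.
    """
    return ET.canonicalize(xml_text, strip_text=True)


# Hypothetical sample data: same content, different attribute order,
# quoting, and indentation.
a = '<Identity b="2" a="1"><Id>1</Id></Identity>'
b = "<Identity a='1'   b='2'>\n  <Id>1</Id>\n</Identity>"
print(canonical(a) == canonical(b))

# Sibling order is preserved by canonicalization, so these still differ.
x = "<r><x>1</x><y>2</y></r>"
y = "<r><y>2</y><x>1</x></r>"
print(canonical(x) == canonical(y))
```

The second comparison illustrates why canonicalization alone may not be enough when whole records appear in a different order.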

Author: Rui F Ribeiro

Updated on September 18, 2022

Comments

  • Rui F Ribeiro almost 2 years

    I have two files that contain the same data, but in a different order.

    File 1:

    <Identities>
        <Identity>
            <Id>048206031415072010Comcast.USR8JR</Id>
            <UID>ccp_test_79</UID>
            <DisplayName>JOSH CCP</DisplayName>
            <FirstName>JOSH</FirstName>
            <LastName>CCP</LastName>
            <Role>P</Role>
            <LoginStatus>C</LoginStatus>
        </Identity>
        <Identity>
            <Id>089612381523032011Comcast.USR1JR</Id>
            <UID>94701_account1</UID>
            <DisplayName>account1</DisplayName>
            <FirstName>account1</FirstName>
            <LastName>94701</LastName>
            <Role>S</Role>
            <LoginStatus>C</LoginStatus>
        </Identity>
    </Identities>
    

    File 2 :

    <Identities>
        <Identity>
            <Id>089612381523032011Comcast.USR1JR</Id>
            <UID>94701_account1</UID>
            <DisplayName>account1</DisplayName>
            <FirstName>account1</FirstName>
            <LastName>94701</LastName>
            <Role>S</Role>
            <LoginStatus>C</LoginStatus>
        </Identity>
        <Identity>
            <Id>048206031415072010Comcast.USR8JR</Id>
            <UID>ccp_test_79</UID>
            <DisplayName>JOSH CCP</DisplayName>
            <FirstName>JOSH</FirstName>
            <LastName>CCP</LastName>
            <Role>P</Role>
            <LoginStatus>C</LoginStatus>
        </Identity>
    </Identities>
    

    If I run diff file1 file2, I get the following output:

    1,10d0
    <     <Identities>
    <         <Identity>
    <             <Id>048206031415072010Comcast.USR8JR</Id>
    <             <UID>ccp_test_79</UID>
    <             <DisplayName>JOSH CCP</DisplayName>
    <             <FirstName>JOSH</FirstName>
    <             <LastName>CCP</LastName>
    <             <Role>P</Role>
    <             <LoginStatus>C</LoginStatus>
    <         </Identity>
    20a11,20
    >     <Identities>
    >         <Identity>
    >             <Id>048206031415072010Comcast.USR8JR</Id>
    >             <UID>ccp_test_79</UID>
    >             <DisplayName>JOSH CCP</DisplayName>
    >             <FirstName>JOSH</FirstName>
    >             <LastName>CCP</LastName>
    >             <Role>P</Role>
    >             <LoginStatus>C</LoginStatus>
    >         </Identity>
    

    But I need diff to report no difference, because the files contain the same data, just in a different order.

    • Admin over 11 years
      By sorting them linewise and comparing, you can check if they are not equal. Of course, equal after sorting does not mean that they are really equal as sorting destroys the XML syntax.
    • Admin over 11 years
      I don't know how to solve it. The files differ only in order: file1 has a then b, and file2 has b then a. You can make this visible with diff -y -B -Z -b --strip-trailing-cr file1 file2
    • Admin over 11 years
      You could try xmldiff, but I think that will still notice the order changing, as order is relevant in generic XML. I think your best approach is to use an XML parser & generator to put each file in a canonical order and format, then use xmldiff or diff. A job for your favorite scripting language (Perl, Ruby, Python, etc.).
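
The last comment's idea (parse each file, put it into a canonical order and format, then compare) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes Python 3.8 or later, that the records are direct children of the root element, and that each record has a unique Id child to sort by. The function name normalized, the key_tag parameter, and the shortened sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def normalized(xml_text: str, key_tag: str = "Id") -> str:
    """Sort the root's direct children by the text of key_tag, then canonicalize."""
    root = ET.fromstring(xml_text)
    # Reorder the records in place; records without key_tag sort first.
    root[:] = sorted(root, key=lambda rec: rec.findtext(key_tag, default=""))
    # Canonicalize so indentation and attribute order do not matter either.
    return ET.canonicalize(ET.tostring(root, encoding="unicode"), strip_text=True)


# Shortened, hypothetical versions of File 1 and File 2 from the question.
file1 = """<Identities>
    <Identity><Id>048206031415072010Comcast.USR8JR</Id><FirstName>JOSH</FirstName></Identity>
    <Identity><Id>089612381523032011Comcast.USR1JR</Id><FirstName>account1</FirstName></Identity>
</Identities>"""
file2 = """<Identities>
    <Identity><Id>089612381523032011Comcast.USR1JR</Id><FirstName>account1</FirstName></Identity>
    <Identity><Id>048206031415072010Comcast.USR8JR</Id><FirstName>JOSH</FirstName></Identity>
</Identities>"""

print(normalized(file1) == normalized(file2))
```

To get a readable diff instead of a yes/no answer, the two normalized strings could be written to files and compared with diff, as in the answer above. Sorting only the top-level records is a deliberate choice here, since element order can be meaningful in generic XML.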