Programmatically comparing Word documents


Solution 1

I agree with Joseph about diffing the strings. I would also recommend a purpose-built diffing engine (several are listed in "Any decent text diff/merge engine for .NET?"), which can help you avoid some of the common pitfalls in diffing.
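
To make the string-diffing idea concrete, here is a minimal sketch. It assumes the text you care about lives in the `w:t` runs of `word/document.xml` inside the .docx package. The class and method names (`DocxTextDiff`, `ExtractParagraphs`, `Diff`) are made up for illustration, and the naive LCS diff is only a stand-in for a real diffing engine. Paragraphs nested inside text boxes may show up twice with this approach.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class DocxTextDiff
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one string per paragraph (w:p) found in word/document.xml.
    public static List<string> ExtractParagraphs(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            var entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found");

            using (var part = entry.Open())
            {
                return XDocument.Load(part)
                    .Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
                    .ToList();
            }
        }
    }

    // Naive LCS-based diff: "  " marks unchanged, "- " removed, "+ " added paragraphs.
    public static List<string> Diff(IList<string> a, IList<string> b)
    {
        // lcs[i, j] = length of the longest common subsequence of a[i..] and b[j..].
        var lcs = new int[a.Count + 1, b.Count + 1];
        for (int i = a.Count - 1; i >= 0; i--)
            for (int j = b.Count - 1; j >= 0; j--)
                lcs[i, j] = a[i] == b[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        var result = new List<string>();
        int x = 0, y = 0;
        while (x < a.Count && y < b.Count)
        {
            if (a[x] == b[y]) { result.Add("  " + a[x]); x++; y++; }
            else if (lcs[x + 1, y] >= lcs[x, y + 1]) { result.Add("- " + a[x]); x++; }
            else { result.Add("+ " + b[y]); y++; }
        }
        while (x < a.Count) result.Add("- " + a[x++]);
        while (y < b.Count) result.Add("+ " + b[y++]);
        return result;
    }
}
```

You would call `ExtractParagraphs` on each document stream and pass the two lists to `Diff`. The table is quadratic in the number of paragraphs, so for large documents a dedicated diff library is the better choice.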

Solution 2

Use the Document class to compare the files and open the result in a new Word document.

using OfficeWord = Microsoft.Office.Interop.Word;

object fileToOpen = (object)@"D:\doc1.docx";
string fileToCompare = @"D:\doc2.docx";

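// Global.OfficeFile.WordApp is an internal variable of my application;
// substitute your own Microsoft.Office.Interop.Word.Application instance.
var app = Global.OfficeFile.WordApp;

object missing = System.Type.Missing;
object readOnly = false;
object addToRecent = false;
object visible = false;

OfficeWord.Document docZero = app.Documents.Open(ref fileToOpen, ref missing, ref readOnly, ref addToRecent, Visible: ref visible);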

docZero.Final = false;
docZero.TrackRevisions = true;
docZero.ShowRevisions = true;
docZero.PrintRevisions = true;

// WdCompareTarget.wdCompareTargetNew puts the comparison result in a new document, which you then have to save yourself;
// use another WdCompareTarget value to change where Word places the result.
docZero.Compare(fileToCompare, missing, OfficeWord.WdCompareTarget.wdCompareTargetNew, true, false, false, false, false);

Solution 3

My requirements were to use a .NET library and to work with streams rather than actual files.

ZipArchive is in the System.IO.Compression namespace.

What worked out quite nicely was using ZipArchive from .NET to compare the contents entry by entry, skipping the .rels files because their content seems to be randomly generated each time a file is created. Here's my snippet:

    // Requires: using System.IO; using System.IO.Compression;
    private static bool AreWordFilesSame(byte[] wordA, byte[] wordB)
    {
        using (var streamA = new MemoryStream(wordA))
        using (var streamB = new MemoryStream(wordB))
        using (var zipA = new ZipArchive(streamA))
        using (var zipB = new ZipArchive(streamB))
        {
            // A different number of parts means the documents differ.
            if (zipA.Entries.Count != zipB.Entries.Count)
            {
                return false;
            }

            for (int i = 0; i < zipA.Entries.Count; ++i)
            {
                // Entries are compared by position, so both archives must list their parts in the same order.
                if (zipA.Entries[i].FullName != zipB.Entries[i].FullName)
                {
                    return false;
                }

                if (zipA.Entries[i].Name.EndsWith(".rels")) // These files seem to contain autogenerated hashes
                {
                    continue;
                }

                using (var readerA = new StreamReader(zipA.Entries[i].Open()))
                using (var readerB = new StreamReader(zipB.Entries[i].Open()))
                {
                    var textA = readerA.ReadToEnd();
                    var textB = readerB.ReadToEnd();

                    // An empty part is also treated as a failure.
                    if (textA != textB || textA.Length == 0)
                    {
                        return false;
                    }
                }
            }

            return true;
        }
    }
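
If you would rather know which parts differ and not depend on the order of the zip entries, a variation along these lines might help. This is only a sketch and not the snippet above: `DocxPartComparer` and `FindDifferingParts` are hypothetical names, it compares raw bytes instead of text, and it keeps the same assumption that .rels files can be skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class DocxPartComparer
{
    // Reads every part except *.rels into a dictionary keyed by its full path in the package.
    private static Dictionary<string, byte[]> ReadParts(byte[] docx)
    {
        var parts = new Dictionary<string, byte[]>(StringComparer.Ordinal);
        using (var stream = new MemoryStream(docx))
        using (var zip = new ZipArchive(stream, ZipArchiveMode.Read))
        {
            foreach (var entry in zip.Entries)
            {
                if (entry.FullName.EndsWith(".rels", StringComparison.OrdinalIgnoreCase))
                {
                    continue; // assumption carried over from the answer: .rels content is not stable
                }

                using (var source = entry.Open())
                using (var buffer = new MemoryStream())
                {
                    source.CopyTo(buffer);
                    parts[entry.FullName] = buffer.ToArray();
                }
            }
        }
        return parts;
    }

    // Returns the names of parts that are missing on one side or whose bytes differ.
    public static List<string> FindDifferingParts(byte[] docxA, byte[] docxB)
    {
        var a = ReadParts(docxA);
        var b = ReadParts(docxB);

        var differing = new List<string>();
        foreach (var name in a.Keys.Union(b.Keys).OrderBy(n => n, StringComparer.Ordinal))
        {
            byte[] x, y;
            if (!a.TryGetValue(name, out x) || !b.TryGetValue(name, out y) || !x.SequenceEqual(y))
            {
                differing.Add(name);
            }
        }
        return differing;
    }
}
```

An empty result means every compared part is byte-identical. Anything else gives you the part names to look at, for example word/document.xml for the body text.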
Author by

user20358

Updated on July 30, 2022

Comments

  • user20358
    user20358 almost 2 years

    I need to compare two Office documents, in this case two Word documents, and provide a difference, somewhat similar to what is shown in SVN. Not to that extent, but at least be able to highlight the differences.

    I tried using the Office COM DLL and got this far:

    object fileToOpen = (object)@"D:\doc1.docx";
    string fileToCompare = @"D:\doc2.docx";
    
    WRD.Application WA = new WRD.Application();
    
    Document wordDoc = null;
    
    wordDoc = WA.Documents.Open(ref fileToOpen, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
    wordDoc.Compare(fileToCompare, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
    

    Any tips on how to proceed further? This will be a web application with a lot of hits. Is using the Office COM object the right way to go, or are there other things I can look at?

  • user20358
    user20358 over 12 years
    everything, even if the image is different. But I am going to try and relax that requirement.
  • ditoslav
    ditoslav over 6 years
    Hi @anderson-rissardi! What does the Compare method actually do? Does it open some file somewhere? Because I'm not seeing anything when I run this in my unit test. How am I supposed to get the result since the method returns void?
  • Anderson Rissardi
    Anderson Rissardi over 6 years
    Hi @ditoslav. It opens a new file. It is the 'Compare' button inside Word: open MS Word -> 'Review' tab -> 'Compare' button. It is the same functionality, and a new document is generated. You must save this new document yourself.
  • echan00
    echan00 about 5 years
    Do you know whether XmlPowerTools can generate a resulting document with the differences as "tracked changes"?
  • Slagmoth
    Slagmoth over 4 years
    Where did Global.OfficeFile.WordApp go? Using VS 2019 it is apparently no longer part of Office.Interop.Word.
  • Anderson Rissardi
    Anderson Rissardi over 4 years
    @Slagmoth Global.OfficeFile.WordApp is an internal variable. You should use your app's own Microsoft.Office.Interop.Word.Application instance.