Mercurial and Word or PDF documents

Solution 1

Yes, but of course you won't be able to diff them in any meaningful way, and the files will be treated as binary during merges.

Mercurial is perfectly capable of tracking binary files:

Mercurial generally makes no assumptions about file contents. Thus, most things in Mercurial work fine with any type of file.
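To make "treated as binary" concrete: to my knowledge, Mercurial's own check (`binary()` in its `stringutil` module) simply looks for a NUL byte in the data. The sketch below re-implements that heuristic in plain Python and applies it to a tiny in-memory ZIP archive standing in for a .docx. The helper names, file name and XML content are made up for illustration.

```python
import io
import zipfile


def looks_binary(data: bytes) -> bool:
    """NUL-byte heuristic, modelled on the check Mercurial uses for binary data."""
    return bool(data) and b"\0" in data


def fake_docx_bytes() -> bytes:
    """Build a tiny ZIP archive in memory as a hypothetical stand-in for a .docx.

    A real .docx contains many more parts; one XML entry is enough here.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", "<w:document><w:body/></w:document>")
    return buffer.getvalue()


text_file = b"Plain text notes\nsecond line\n"
docx_like = fake_docx_bytes()

print("text file binary?", looks_binary(text_file))
print("docx-like binary?", looks_binary(docx_like))
```

ZIP headers contain zero bytes (for example in the version and flag fields), so any .docx trips this check, while ordinary text files do not.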

Mercurial stores a binary diff regardless of the file type. The problem with PDF and Word files is that a small change to their content usually causes a large difference in their binary representation on disk. .docx documents are stored as zipped XML, and because of the compression, changing even a single character in the XML can make the zip archive look completely different.
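A quick way to see this effect is to compress two versions of an XML document that differ by one character and measure how much of the output changes. This sketch uses Python's `zlib`, which implements DEFLATE, the compression ZIP-based .docx files typically use. The XML is a hypothetical, heavily simplified stand-in for `word/document.xml`, and `differing_span` is an illustrative helper, not part of Mercurial.

```python
import zlib


def differing_span(a: bytes, b: bytes) -> int:
    """Bytes left over after removing the longest common prefix and suffix."""
    shortest = min(len(a), len(b))
    prefix = 0
    while prefix < shortest and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix would overlap the already-matched prefix.
    while suffix < shortest - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return max(len(a), len(b)) - prefix - suffix


# Hypothetical, simplified document body with 200 similar paragraphs.
paragraphs = "".join(
    f"<w:p><w:r><w:t>Paragraph {i}: the quick brown fox.</w:t></w:r></w:p>"
    for i in range(200)
)
original_xml = f"<w:document><w:body>{paragraphs}</w:body></w:document>".encode()
# Change one character near the start: "brown" -> "crown" in paragraph 0 only.
edited_xml = original_xml.replace(b"brown", b"crown", 1)

# zlib wraps DEFLATE output in a 2-byte header and a 4-byte Adler-32 checksum;
# the checksum changes too, so the differing region runs to the end.
original_zipped = zlib.compress(original_xml, 9)
edited_zipped = zlib.compress(edited_xml, 9)

raw_span = differing_span(original_xml, edited_xml)
compressed_span = differing_span(original_zipped, edited_zipped)

print("uncompressed bytes that differ:", raw_span)
print("compressed differing region:", compressed_span, "of", len(edited_zipped), "bytes")
```

The uncompressed XML differs in exactly one byte, while the differing region of the compressed output stretches from near the start to the end of the stream. That is why a delta between two stored .docx revisions stays large.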

If you don't grow your repository too large, you probably won't experience any issues using Mercurial.

Solution 2

Yes.

You will be able to do meaningful diffs for MS Word documents.

  • If you have TortoiseHg installed and you have set up a repository, right-click the file for which you want to check the diffs.

  • On the context menu, click TortoiseHg > Visual Diffs.

  • In the Visual Diffs dialog, select docdiff instead of kdiff3.

  • Double-click the file in the Visual Diffs dialog.

MS Word will open a Compare Result Word document, which will show the differences between the current version of the document and the previous version as Tracked Changes.

Solution 3

Beware of the suggested configuration:

cmd.pdfdiff = [\path\to\diffpdf.exe]
opts.pdfdiff= -a $local $other

$local and $other have no meaning in the extdiff context: the literal strings "$local" and "$other", not the file names, are passed to "diffpdf.exe". I found this out the hard way.

cmd.pdfdiff = [\path\to\diffpdf.exe]
opts.pdfdiff= -a

will work, and the two files will be passed as arguments following the "-a". Cf. https://www.mercurial-scm.org/wiki/ExtdiffExtension, which states:

Each custom diff commands can have two parts: a 'cmd' and an 'opts' part. The cmd.xxx option defines the name of an executable program that will be run, and opts.xxx defines a set of command-line options which will be inserted to the command between the program name and the files/directories to diff
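The ordering described in that quote (program, then opts, then the files to diff) can be modelled in a few lines. `build_extdiff_argv` is a hypothetical helper, not Mercurial's implementation, and the executable and file paths are placeholders. Because extdiff does not know `$local` or `$other`, the sketch treats them like any other text, which reproduces the problem described above.

```python
import shlex


def build_extdiff_argv(cmd: str, opts: str, old_path: str, new_path: str) -> list[str]:
    """Toy model of the quoted rule: program name, then opts, then the two files.

    Hypothetical helper: it performs no variable expansion at all, so tokens such
    as "$local" are passed through unchanged. shlex uses POSIX quoting rules,
    which only approximate how Windows splits a command line.
    """
    return [cmd, *shlex.split(opts), old_path, new_path]


exe = r"C:\path\to\diffpdf.exe"  # placeholder path

broken = build_extdiff_argv(exe, "-a $local $other", "old/report.pdf", "new/report.pdf")
fixed = build_extdiff_argv(exe, "-a", "old/report.pdf", "new/report.pdf")

print(broken)  # diffpdf would receive "$local" and "$other" as extra arguments
print(fixed)   # only the option, followed by the two files
```

With the broken opts, diffpdf.exe sees four arguments after the program name instead of the intended option plus two files.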

Solution 4

For PDF files, I was able to hook up the GPL-licensed DiffPDF to compare different revisions.

I added the following to my mercurial.ini file:

[extdiff]
cmd.pdfdiff = [\path\to\diffpdf.exe]
; extdiff appends the two files after these options; $local/$other would be passed literally
opts.pdfdiff = -a

[diff-patterns]
**.pdf=pdfdiff

Now when I click on a PDF file in TortoiseHg (or run hg pdfdiff on the command line), it opens the two revisions for comparison. Since my PDFs tend to contain images, I use the appearance comparer (-a in opts). If your PDFs are mostly text, you can use -w instead.

By default, DiffPDF highlights the differences. I prefer the Src Xor Dest display option, but I don't think there is a command-line option for it.
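The `[diff-patterns]` entry is what routes PDF files to `pdfdiff` while other files keep the default visual diff tool. The lookup can be pictured as a glob match. This is a toy model, not TortoiseHg's code: `pick_diff_tool` is hypothetical, "kdiff3" as the fallback is an assumption, and Python's `fnmatch` only approximates Mercurial's glob syntax.

```python
from fnmatch import fnmatchcase


def pick_diff_tool(path: str, patterns: dict[str, str], default: str) -> str:
    """Return the tool for the first pattern matching ``path``, else ``default``.

    Toy model of a [diff-patterns] lookup. In fnmatch, "*" already crosses "/",
    so "**.pdf" here simply means "any path ending in .pdf".
    """
    for pattern, tool in patterns.items():
        if fnmatchcase(path, pattern):
            return tool
    return default


# Mirrors the [diff-patterns] section in the configuration; fallback is assumed.
diff_patterns = {"**.pdf": "pdfdiff"}

for path in ["docs/report.pdf", "notes.txt"]:
    print(path, "->", pick_diff_tool(path, diff_patterns, default="kdiff3"))
```

This toy version returns the first match; how TortoiseHg resolves several overlapping patterns is not covered here.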

Question by andrew0007 (updated on June 11, 2022):

Is it possible to use Mercurial version control to track Word or PDF files? Are there any limitations or problems?