Given two directory trees, how can I find out which files differ by content?

Solution 1

Try:

diff --brief --recursive dir1/ dir2/

Or alternatively, with the short flags -qr:

diff -qr dir1/ dir2/

If you also want files that exist in only one of the directories to be compared as if they were present but empty (so they are reported as differing):

diff --brief --recursive --new-file dir1/ dir2/  # with long options
diff -qrN dir1/ dir2/                            # with short flag aliases
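
To see what the two variants report, here is a minimal sketch that builds a throwaway pair of trees and runs both commands. The file names and the `work` temp directory are made up for illustration, and the exact wording of the messages may vary between diff implementations.

```shell
# Build two small trees to compare (hypothetical file names).
work=$(mktemp -d)
mkdir -p "$work/dir1/sub" "$work/dir2/sub"
echo "same" > "$work/dir1/same.txt";  echo "same" > "$work/dir2/same.txt"
echo "old"  > "$work/dir1/sub/a.txt"; echo "new"  > "$work/dir2/sub/a.txt"
echo "only" > "$work/dir1/only1.txt"   # exists in dir1 only

# diff exits with status 1 when it finds differences, hence the "|| true".
# Without --new-file: sub/a.txt is reported as differing, only1.txt as "Only in".
brief=$(cd "$work" && LC_ALL=C diff --brief --recursive dir1/ dir2/ || true)

# With --new-file: the missing only1.txt is treated as an empty file,
# so it is reported as differing too.
brief_new=$(cd "$work" && LC_ALL=C diff --brief --recursive --new-file dir1/ dir2/ || true)

printf '%s\n---\n%s\n' "$brief" "$brief_new"
rm -rf "$work"
```

Identical files (same.txt here) are not mentioned in either report.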

Solution 2

The command I use is:

diff -qr dir1/ dir2/

It is exactly the same command as Mark's :) But his answer bothered me because it used a different style of flags, and it made me look twice. Using Mark's more verbose long flags, it would be:

diff --brief --recursive dir1/ dir2/

I apologise for posting when the other answer is perfectly acceptable. Could not stop myself... working on being less pedantic.

Solution 3

I like to use git diff --no-index dir1/ dir2/, because it can show the differences in color (if you have that option set in your git config) and because it shows all of the differences in a long paged output using "less".
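
As a rough sketch of the git approach (assuming git is installed; the file names and the `work` temp directory are invented for the example), adding --name-status, as one commenter suggests, reduces the output to a list of file names with a status letter:

```shell
# Two small trees to compare (hypothetical file names).
work=$(mktemp -d)
mkdir -p "$work/dir1" "$work/dir2"
echo "same" > "$work/dir1/same.txt"; echo "same" > "$work/dir2/same.txt"
echo "old"  > "$work/dir1/a.txt";    echo "new"  > "$work/dir2/a.txt"

if command -v git >/dev/null 2>&1; then
  # --no-index compares two paths outside of any repository.
  # Like diff, it exits with status 1 when differences exist, hence "|| true".
  names=$(cd "$work" && git diff --no-index --name-status dir1/ dir2/ || true)
else
  names="git not installed"
fi

printf '%s\n' "$names"
rm -rf "$work"
```

Dropping --name-status gives the full, optionally colored, diff described above.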

Solution 4

Using rsync:

rsync --dry-run --recursive --delete --links --checksum --verbose /dir1/ /dir2/ > dirdiff_2.txt

Alternatively, using diff:

diff --brief --recursive --no-dereference --new-file --no-ignore-file-name-case /dir1 /dir2 > dirdiff_1.txt

They serve the same purpose, but which one is faster depends on where the directories reside:

  • If the directories are on the same drive, rsync is faster.
  • If the directories reside on two separate drives, diff is faster.

This is because diff reads from both directories in parallel, putting an almost equal load on each and keeping both drives busy. rsync calculates checksums in large chunks before actually comparing them. That groups the I/O operations into large chunks, which is more efficient when everything happens on a single drive.
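
A small sketch of the rsync dry run on a throwaway pair of trees, assuming rsync is installed (the file names and the `work` temp directory are hypothetical). It uses the same flags as the command above and then checks that the dry run left dir2 untouched:

```shell
# Two small trees to compare (hypothetical file names).
work=$(mktemp -d)
mkdir -p "$work/dir1" "$work/dir2"
echo "same" > "$work/dir1/same.txt"; echo "same" > "$work/dir2/same.txt"
echo "old"  > "$work/dir1/a.txt";    echo "new"  > "$work/dir2/a.txt"
echo "x"    > "$work/dir1/only1.txt"   # exists in dir1 only
echo "y"    > "$work/dir2/only2.txt"   # exists in dir2 only

if command -v rsync >/dev/null 2>&1; then
  # --dry-run comes first so it is hard to forget; nothing is copied or deleted.
  report=$(cd "$work" && rsync --dry-run --recursive --delete --links --checksum --verbose dir1/ dir2/ || true)
else
  report="rsync not installed"
fi
printf '%s\n' "$report"

# Confirm the dry run did not actually delete the dir2-only file.
if [ -f "$work/dir2/only2.txt" ]; then still_there=yes; else still_there=no; fi
echo "only2.txt still present: $still_there"
rm -rf "$work"
```

Files that rsync would copy from dir1 are listed by name, and files that exist only in dir2 appear on "deleting" lines.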

Solution 5

Meld is also a great tool for comparing two directories:

meld dir1/ dir2/

Meld has many options for comparing files or directories. If two files differ, it's easy to enter file comparison mode and see the exact differences.

Author: Mansoor Siddiqui

Updated on May 01, 2021

Comments

  • Mansoor Siddiqui
    Mansoor Siddiqui about 3 years

    If I want to find the differences between two directory trees, I usually just execute:

    diff -r dir1/ dir2/
    

    This outputs exactly what the differences are between corresponding files. I'm interested in just getting a list of corresponding files whose content differs. I assumed that this would simply be a matter of passing a command line option to diff, but I couldn't find anything on the man page.

    Any suggestions?

  • Dan Dascalescu
    Dan Dascalescu about 10 years
    Neat. Who would've guessed that git can diff arbitrary directories, not just the repo against its files?
  • Felipe Alvarez
    Felipe Alvarez about 10 years
    Perl script colordiff is very useful here, can be used with svn and normal diff.
  • sobi3ch
    sobi3ch almost 9 years
    Nice. But diff -qr dir1/ dir2/ is shorter, and my extended version is diff -qr dir1/ dir2/ | grep ' differ'
  • sobi3ch
    sobi3ch almost 9 years
    ..so does it make sense to post different answers with JUST a different flavour? IMHO no! Does it make sense to combine both answers into one consistent answer? Yes! ;)
  • sobi3ch
    sobi3ch almost 9 years
    If you are comparing (like me) 2 dirs that are separate git projects/repos, then you need to add --no-index; more at stackoverflow.com/a/1792477/473390. I've updated @alan-porter's answer.
  • skv
    skv over 8 years
    @sobi3ch your version does not report files only in one directory
  • sobi3ch
    sobi3ch over 8 years
    @skv why? It's the same command as the answer. I've only changed --brief to its shortcut -q.
  • skv
    skv over 8 years
    @sobi3ch I am no expert :) I just ran it and it only told me about files that differ, not files present in only one location
  • Mark Loeser
    Mark Loeser over 8 years
    @skv Not exactly what the original question asked, but updating the answer to accommodate this question as well.
  • phk
    phk over 7 years
    So that's --new-file/-N which makes diff consider missing files to be empty and --text/-a which causes it to consider all binary input to be text. I don't see the upsides for this particular use case.
  • kramer65
    kramer65 over 7 years
    Just a question; what does the q stand for? Is it an abbreviation of something? I can't find any logic behind the q..
  • FPC
    FPC over 7 years
    @kramer65 - it is the same as "--brief", but I guess you wonder why q? Perhaps for quick? "-b" is taken by "ignore changes in the amount of white space" according to the man page.
  • FPC
    FPC over 7 years
    @sobi3ch You are right, I apologise again. In my defence, I do not think I had the ability to edit the other answer at the time.
  • David Tonhofer
    David Tonhofer about 7 years
    Nice. I have written a simple perl script to perform comparison over trees but I am hitting limitations. This seems to be the ticket.
  • Gogeta70
    Gogeta70 almost 7 years
    @kramer65 I believe the q is for quiet, generally meaning less verbose.
  • Matija Nalis
    Matija Nalis almost 7 years
    rsync is not only faster for files on a single drive, but also allows comparing files in subdirs; for example, rsync --options /usr /bin /var /sbin /lib /old_root will effectively compare the current root / (by specifying all subdirs in it) with /old_root (containing, for example, some older backup of /), which is something diff -r can't do. And if you assume that files with the same size, permissions and timestamps probably have not changed, leaving out --checksum gives you an extremely fast (if not so thorough) check of which files might have changed.
  • Tom Hale
    Tom Hale almost 7 years
    What is the purpose of --delete with rsync?
  • Thomas Munk
    Thomas Munk almost 7 years
    The purpose of --delete is to delete existing files in destination-dir which are not (any longer) present in source-dir
  • mata
    mata over 6 years
    In this case (with the --dry-run flag) nothing is really deleted, rsync only prints which files are in dir1 but not in dir2
  • Dave Rager
    Dave Rager about 6 years
    I'd recommend putting --dry-run first always as to not accidentally forget it.
  • Mike Maxwell
    Mike Maxwell over 5 years
    When I run this (-brief -r), I get diff: conflicting output style options diff: Try 'diff --help' for more information. The -qr method (in the answer below) works ok.
  • daboross
    daboross over 5 years
    @MikeMaxwell It needs to be --brief. -brief is interpreted as -b -r -i -e -f, in other words as a set of flags not as a single option.
  • Mike Maxwell
    Mike Maxwell over 5 years
    @daboross: wow, I've been using Unix/Linux for a l o n g time, and I never realized there was that distinction between '--' and '-'. (I don't think '--' existed when I got started.) Thanks for the explanation!
  • River Tam
    River Tam over 5 years
    @MikeMaxwell The "good" news is this is just convention. Programs can interpret them however they want. The "bad" news is, yes, this is a very common convention among almost all Unix tools. =)
  • DeanM
    DeanM over 5 years
    The two (diff and rsync) actually produce slightly different results. Consider two directory trees in which testing123/A/f1 is missing, testing456/A/B/f4 is missing, and the files /A/B/C/f9 are different. diff <flags> testing123/ testing456/ produces 3 lines stating that f9, f4, and f1 differ. rsync <flags> testing123/ testing456/ produces: deleting A/f1 A/B/f4 A/B/C/f9 At least I know that f1 is missing on the left, but I still need to see why f4 and f9 differ.
  • DeanM
    DeanM over 5 years
    The only problem is that it does not lend itself to scripting since it is a graphical app. But it is nice if you don't mind the GUI! Thanks.
  • Elijah Lynn
    Elijah Lynn over 5 years
    --brief and -q are the same option. Your statement makes it sound like they are different but they aren't.
  • Elijah Lynn
    Elijah Lynn over 5 years
    - options are called "UNIX options" and -- options are called "GNU long options" according to man ps. "You should make every program accept long options if it uses any options, for this takes little extra work and helps beginners remember how to use the program." Source: gnu.org/software/libc/manual/html_node/Getopt-Long-Options.html, also google.com/search?q=gnu+long+options
  • Elijah Lynn
    Elijah Lynn over 5 years
    The comments here demonstrate why we should also use long options in our examples. Long options are mostly self-documenting. When one uses short options they explain what it does outside the code, but why not just put it in the code as a more readable example in the first place? The page for GNU Long Options even says: "You should make every program accept long options if it uses any options, for this takes little extra work and helps beginners remember how to use the program." Source: gnu.org/software/libc/manual/html_node/Getopt-Long-Options.html
  • Francesco Frassinelli
    Francesco Frassinelli about 5 years
    The rsync solution is very useful if you need to compare a local with a remote directory accessible over ssh
  • Tom Hale
    Tom Hale almost 5 years
    Consider adding --no-dereference.
  • Tom Hale
    Tom Hale almost 5 years
    --no-ignore-file-name-case is not needed: "The --no-ignore-file-name-case option cancels the effect of the --ignore-file-name-case option, reverting to the default behavior."
  • Popup
    Popup almost 5 years
    I find that meld becomes horribly sluggish if used on large directories though. Is there anything that handles large directories better?
  • Alexander
    Alexander almost 5 years
    @Popup, not that I know of. You could find differing filenames with something like this, though: find dir1 dir2 | cut -d/ -f2- | sort | uniq --unique
  • Popup
    Popup almost 5 years
    @Alexander - In that case I find that meld <(find dir1 -ls ) <(find dir2 -ls) works pretty well, using bash process substitution. (zsh's =(command) works even better.)
  • gzh
    gzh over 4 years
    I like this one, I also find that if you add --name-status to the command line, it will just show the file name list with "M/A/D" flags for Modified/Added/Deleted status.
  • Silidrone
    Silidrone about 4 years
    It so happens that both directories contain a .git folder; how can I exclude it from the comparison?
  • ebk
    ebk almost 4 years
    Use -x or -X to exclude specific files by shell patterns if needed.
  • Lomefin
    Lomefin over 3 years
    While it may work, using rsync adds a layer of complexity, because now you need that dependency. It is a nice bonus, but it uses a little more than just Linux, in my opinion.
  • Mogens TrasherDK
    Mogens TrasherDK over 3 years
    @Lomefin I don't see how rsync is less Linux than diff. @Kickaha You definitely want a backup of your target directory, before launching that command.
  • Leonardo Dagnino
    Leonardo Dagnino over 3 years
    @ElijahLynn that's specific to ps - Unix ps has only short options, while GNU's version additionally provides long options. In general they're just called short options and long options. Although they are just a convention - find, for example, uses - for long options.
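
Two suggestions from the comments above can be combined: excluding a directory such as .git with -x (spelled --exclude in GNU diff's long form) and filtering the report down to the files whose content differs with grep ' differ'. The following is only a sketch on made-up trees; the file names and the `work` temp directory are hypothetical.

```shell
# Two small trees, each with its own .git directory (hypothetical contents).
work=$(mktemp -d)
mkdir -p "$work/dir1/.git" "$work/dir2/.git"
echo "ref-a" > "$work/dir1/.git/HEAD"; echo "ref-b" > "$work/dir2/.git/HEAD"
echo "old"   > "$work/dir1/a.txt";     echo "new"   > "$work/dir2/a.txt"
echo "only"  > "$work/dir1/only1.txt"  # exists in dir1 only

# --exclude=.git skips anything named .git; grep keeps only the
# "Files ... differ" lines and drops the "Only in ..." lines.
changed=$(cd "$work" && LC_ALL=C diff --brief --recursive --exclude=.git dir1/ dir2/ | grep ' differ$' || true)

printf '%s\n' "$changed"
rm -rf "$work"
```

Here only a.txt is listed: the differing .git/HEAD files are excluded, and only1.txt is filtered out because it has no counterpart in dir2.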