fdupes - delete files after comparing two directories

6,820

Solution 1

Filter When Recursing with Fdupes

If you have more than one duplicate then you might end up with something like:

srv/foo                               
srv/a/b/foo
watchfolder/foo
watchfolder/c/foo
watchfolder/d/foo

In such a case, you need to feed the list of duplicates into a filter or shell script to apply some smarter rules, unless you only want to preserve the very first duplicate found (e.g. the least deeply-nested match in srv). If that's all you want, then:

fdupes --recurse --delete srv/ watchfolder/

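would work, although --delete on its own prompts you to choose which files to keep in each set; adding --noprompt makes fdupes keep the first file of each set and delete the rest without asking. For more complex situations, such as wanting to preserve everything in srv/, consider a filter like: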

fdupes --recurse srv/ watchfolder/ | sed '/^srv/d; /^$/! s/.*/"&"/' | xargs rm
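The sed filter above removes every watchfolder/ path that fdupes lists, including sets where the duplicates live only inside watchfolder/ and no copy exists in srv/. A more cautious variant could look like the sketch below. It assumes the relative srv/ and watchfolder/ prefixes shown in the example output, and the helper name protected_dupes is made up for illustration; it only prints candidates, so nothing is deleted until you pipe the list onward yourself.

```shell
# Hypothetical helper: reads fdupes-style output on stdin (one path per line,
# a blank line between duplicate sets) and prints only the watchfolder/ paths
# from sets that also contain at least one srv/ path.
protected_dupes() {
    awk '
        # Print the collected watchfolder/ paths if the set had a srv/ copy,
        # then reset the state for the next set.
        function emit(   i) {
            if (has_keep)
                for (i = 1; i <= n; i++) print victims[i]
            n = 0; has_keep = 0
        }
        /^$/             { emit(); next }          # blank line ends a set
        /^srv\//         { has_keep = 1; next }    # a copy we want to keep
        /^watchfolder\// { victims[++n] = $0 }     # a deletion candidate
        END              { emit() }                # last set may lack a blank line
    '
}

# Dry run: review the list before deleting anything.
#   fdupes --recurse srv/ watchfolder/ | protected_dupes
# Deletion, assuming no path contains a newline and xargs supports -0:
#   fdupes --recurse srv/ watchfolder/ | protected_dupes | tr '\n' '\0' | xargs -0 rm --
```

Converting newlines to NUL bytes before xargs avoids the quoting step and copes with spaces and quote characters in file names.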

Solution 2

fdupes keeps the first file in each set of duplicates, and "first" here means the file with the earliest timestamp, not the file from the first directory named on the command line. The help text is a bit misleading on this point.

$ ll foo/ bar/
bar/:
total 12
-rw-rw-r--. 1 BriGuy BriGuy   2 Jul 23 16:10 a
-rw-rw-r--. 1 BriGuy BriGuy 102 Jul 23 16:22 b
-rw-rw-r--. 1 BriGuy BriGuy 610 Jul 23 16:23 c

foo/:
total 12
-rw-rw-r--. 1 BriGuy BriGuy   2 Jul 23 16:10 a
-rw-rw-r--. 1 BriGuy BriGuy 102 Jul 23 16:11 b
-rw-rw-r--. 1 BriGuy BriGuy 610 Jul 23 16:22 c

$ fdupes foo/ bar/
foo/b                                   
bar/b

foo/c
bar/c
# in above foo/b and foo/c would be kept

$ cp bar/c foo/c
$ fdupes foo/ bar/
bar/c                                   
foo/c

foo/b
bar/b
# in above foo/b and bar/c would be kept,
# as bar/c has an earlier timestamp than foo/c now
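To check ahead of time which copy would survive, the first path of every set can be pulled out of the listing. This is only a sketch: first_of_each_set is a hypothetical helper name, and it assumes, as described above, that the first path printed in each set is the one fdupes preserves.

```shell
# Hypothetical helper: print the first path of every duplicate set in
# fdupes-style output (sets are separated by a blank line).
first_of_each_set() {
    awk 'BEGIN { first = 1 }              # the very first line starts a set
         /^$/  { first = 1; next }        # a blank line means a new set follows
         first { print; first = 0 }'      # print only the leading path of a set
}

# Usage:
#   fdupes foo/ bar/ | first_of_each_set
```

If any path printed this way is outside the directory you want to protect, filter the list instead of relying on the default order.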
Author by

Chris Terrific

Updated on September 18, 2022

Comments

  • Chris Terrific over 1 year

    I'm currently trying to solve an issue using fdupes. I'd like to compare two folders with each other, and afterwards delete all duplicate files within one of these directories.

    Example:

    Files are being stored automatically in /srv/—lots of duplicates there. They shall be all left untouched. I also have a dir called /watchfolder/ and I want to remove all files in watchfolder if they are existent in /srv/.

    I've tried fdupes -r srv/ watchfolder/ and the other way around. But it keeps messing with my files in srv/.

  • syntaxerror over 8 years
    A good addendum to this (good) answer would also be the case when bar and foo get swapped: $fdupes bar/ foo/. Because, unlike you might think, the output is likely to be the very same as in $fdupes foo/ bar/, since timestamp is the only thing it cares for, as you correctly pointed out. This can drive you totally nuts if you want to keep one folder as-is no matter what. And since fdupes can only protect the first file (top-down), the order WILL be important. Anyways, I consider any method involving grep or sed an ugly workaround for a badly-conceived tool.
  • Bombazook almost 8 years
    I wouldn't consider the last example good practice. If watchfolder/ contains duplicates of a file that has no copy in srv/, running it leads to data loss. Be careful and make backups before taking that action.