fdupes - delete files after comparing two directories
Solution 1
Filter When Recursing with Fdupes
If you have more than one duplicate then you might end up with something like:
srv/foo
srv/a/b/foo
watchfolder/foo
watchfolder/c/foo
watchfolder/d/foo
In such a case, you need to feed the list of duplicates into a filter or shell script to apply some smarter rules, unless you only want to preserve the very first duplicate found (e.g. the least deeply-nested match in srv). If that's all you want, then:
fdupes --recurse --delete --noprompt srv/ watchfolder/
would work. For more complex situations, such as wanting to preserve everything in srv/, consider a filter like:
fdupes --recurse srv/ watchfolder/ | sed '/^srv/d; /^$/! s/.*/"&"/' | xargs rm
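The sed filter above removes every non-srv path from every duplicate set. That includes sets with no srv/ member at all, which is the data-loss case raised in the comments. The helper below is a minimal sketch of a stricter filter. It assumes bash and fdupes' default output format: one path per line, with each set ended by a blank line. watch_dupes_of_srv is a hypothetical name, and the srv/ and watchfolder/ prefixes are hard-coded to match the command line above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads fdupes output on stdin (one path per line,
# each duplicate set terminated by a blank line) and prints the
# watchfolder/ paths of every set that also contains an srv/ path.
# Sets without an srv/ member are skipped, so their files are never listed.
watch_dupes_of_srv() {
  local line has_srv=0
  local -a watch=()
  # The extra echo appends one more line break, so the last set is
  # flushed even if the input has no trailing blank line.
  { cat; echo; } | while IFS= read -r line; do
    if [[ -z $line ]]; then
      # End of a set: print its watchfolder/ files only if srv/ had a copy.
      if (( has_srv && ${#watch[@]} )); then
        printf '%s\n' "${watch[@]}"
      fi
      has_srv=0
      watch=()
    elif [[ $line == srv/* ]]; then
      has_srv=1
    elif [[ $line == watchfolder/* ]]; then
      watch+=("$line")
    fi
  done
}
```

Run fdupes --recurse srv/ watchfolder/ | watch_dupes_of_srv first to review the list. Once it looks right, append | xargs -d '\n' rm -- to delete. The -d option is specific to GNU xargs. It keeps paths with spaces intact without the quoting trick used in the sed version.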
Solution 2
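fdupes will keep the first file in each set, meaning the file with the earliest modification timestamp. That is not necessarily the file from the first directory given on the command line. The help text is a bit misleading on this point.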
$ ll foo/ bar/
bar/:
total 12
-rw-rw-r--. 1 BriGuy BriGuy 2 Jul 23 16:10 a
-rw-rw-r--. 1 BriGuy BriGuy 102 Jul 23 16:22 b
-rw-rw-r--. 1 BriGuy BriGuy 610 Jul 23 16:23 c
foo/:
total 12
-rw-rw-r--. 1 BriGuy BriGuy 2 Jul 23 16:10 a
-rw-rw-r--. 1 BriGuy BriGuy 102 Jul 23 16:11 b
-rw-rw-r--. 1 BriGuy BriGuy 610 Jul 23 16:22 c
$ fdupes foo/ bar/
foo/b
bar/b

foo/c
bar/c

# in the above, foo/b and foo/c would be kept
$ cp bar/c foo/c
$ fdupes foo/ bar/
bar/c
foo/c

foo/b
bar/b

# in the above, foo/b and bar/c would be kept,
# as bar/c now has an earlier timestamp than foo/c
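Before running --delete, you can predict which copy in a set will survive by comparing modification times yourself. This is a minimal sketch that assumes, as this answer observes, that fdupes orders each set oldest-first by mtime. oldest_file is a hypothetical name. Ties keep the earlier argument, which may not match fdupes' own tie-breaking.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given the files of one duplicate set, print the one
# with the oldest modification time, i.e. the file this answer says fdupes
# lists first (and therefore keeps with --delete --noprompt).
# Ties keep the earlier argument; fdupes' own tie-breaking may differ.
oldest_file() {
  local f oldest=$1
  shift
  for f in "$@"; do
    # -ot is bash's "older than" test, comparing modification times.
    if [[ $f -ot $oldest ]]; then
      oldest=$f
    fi
  done
  printf '%s\n' "$oldest"
}
```

Take the transcript above as an example. After cp bar/c foo/c, running oldest_file foo/c bar/c should print bar/c regardless of argument order, which matches the second fdupes listing.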
Chris Terrific
Updated on September 18, 2022
Comments
Chris Terrific over 1 year
I'm currently trying to solve an issue using fdupes. I'd like to compare two folders with each other and afterwards delete all duplicate files within one of these directories. Example: files are being stored automatically in /srv/, and there are lots of duplicates there. They should all be left untouched. I also have a directory called /watchfolder/, and I want to remove all files in watchfolder if they also exist in /srv/. I've tried fdupes -r srv/ watchfolder/ and the other way around, but it keeps messing with my files in srv/.
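If you would rather not depend on fdupes' ordering at all, the requirement above can be sketched directly: delete files in watchfolder that also exist in srv, and never touch srv. This sketch swaps fdupes for content hashing and assumes bash 4+ and GNU coreutils' sha256sum. files_duplicated_in is a hypothetical name. Files are matched by SHA-256 digest only, without fdupes' final byte-by-byte comparison. The function only prints paths, so nothing is deleted until you pipe the list to rm yourself.

```bash
#!/usr/bin/env bash
# Hypothetical alternative to fdupes: print every regular file under
# target_dir whose SHA-256 digest matches some file under keep_dir.
# keep_dir is only read, never modified. Assumes bash 4+ (associative
# arrays) and GNU coreutils (sha256sum).
files_duplicated_in() {
  local keep_dir=$1 target_dir=$2 sum path
  local -A known=()
  # Record the digest of every regular file in the protected tree.
  # "sha256sum < file" prints "<digest>  -"; keep only the digest.
  while IFS= read -r -d '' path; do
    sum=$(sha256sum < "$path")
    known[${sum%% *}]=1
  done < <(find "$keep_dir" -type f -print0)
  # Print each target file whose digest was seen in the protected tree.
  while IFS= read -r -d '' path; do
    sum=$(sha256sum < "$path")
    if [[ -n ${known[${sum%% *}]:-} ]]; then
      printf '%s\n' "$path"
    fi
  done < <(find "$target_dir" -type f -print0)
}
```

For example, files_duplicated_in /srv /watchfolder lists the candidates. Appending | xargs -d '\n' rm -- (GNU xargs) would remove them. Because only the protected tree's digests are used for matching, duplicates that exist solely within watchfolder/ are never listed.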
syntaxerror over 8 years
A good addendum to this (good) answer would be the case where bar and foo are swapped: $ fdupes bar/ foo/. Contrary to what you might think, the output is likely to be the very same as for $ fdupes foo/ bar/, since the timestamp is the only thing it cares about, as you correctly pointed out. This can drive you totally nuts if you want to keep one folder as-is no matter what. And since fdupes can only protect the first file (top-down), the order WILL be important. Anyway, I consider any method involving grep or sed an ugly workaround for a badly conceived tool.
Bombazook almost 8 years
I wouldn't consider the last example good practice. If watchfolder/ contains duplicates of a file that has no copy in srv/, it leads to data loss. Be careful and make backups before doing that.