Why does 'rsync --delete-before' delete files from the target that still exist at the source?


I figured it out: there is nothing wrong with rsync. In the second step, where I transfer the data in parallel, I was using this:

find /some/folder/structure/ -type f -mmin +60 | parallel -j4 'echo "starting `date` {}";rsync -av --no-compress --no-whole-file --quiet {} somehost.com::backup/somefolder/;echo "done `date` {}"'

That causes every file to be written directly into 'somefolder' at the destination, regardless of the directory structure at the source. On the next run of the script, the first step finds files in places where they do not exist at the source, so it deletes them. The first rsync would then transfer them to the correct place, but that step is only meant to delete files that no longer exist, and it gets killed before it gets that far. Then the second rsync runs and, since it was incorrect, puts the files in the wrong place again. Rinse and repeat.
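To make the flattening concrete: when rsync is handed a single file without --relative, only the last path component is kept at the destination. The sketch below models that rule with basename; it does not call rsync, and flat_dest is a hypothetical helper. The file names are taken from the dry-run output quoted in the question.

```shell
#!/bin/sh
# Model of where a single-file transfer lands without --relative:
# only the file name survives, the source directories are dropped.
# flat_dest is a hypothetical helper, not an rsync feature.
flat_dest() {
    printf 'somefolder/%s\n' "$(basename "$1")"
}

flat_dest /some/folder/structure/bareos/fiFI.20150914.1316
flat_dest /some/folder/structure/bareos/fiFI.20150914.1317
```

Both files land directly under somefolder/ instead of somefolder/bareos/, which is why the next delete pass reports them as extraneous.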

The fix is to use relative paths: mark the end of the base directory with /./ and pass --relative, so that rsync recreates everything after the marker at the destination:

find /some/folder/structure/ -type f -mmin +60 | sed 's|^/some/folder/structure/|/some/folder/structure/./|' | parallel -j4 'echo "starting `date` {}";rsync -av --relative --no-compress --no-whole-file --quiet {} somehost.com::backup/somefolder/;echo "done `date` {}"'

With that, the files end up in the right place. Nothing gets deleted on the next run (unless it no longer exists at the source), and pigs can fly after all.
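The path rewriting is easier to check when it is split out of the pipeline. What follows is a sketch under assumptions, not the exact script from this answer: the function names are hypothetical, --relative is included because rsync only honours the /./ marker with that option, and the base directory is assumed to end in a slash and to contain no characters that are special to sed.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of rsync or parallel.

# Insert the "/./" marker right after the base directory of each path on stdin.
# Assumes $1 ends with "/" and contains no "|" or regex metacharacters.
mark_base() {
    sed "s|^$1|$1./|"
}

# Model of what rsync --relative keeps: everything after the "/./" marker.
kept_path() {
    printf '%s\n' "${1#*/./}"
}

# Transfer one marked path. Not called here, since it needs a reachable daemon.
# RSYNC can be overridden (for example RSYNC=echo) to inspect the arguments.
send_one() {
    "${RSYNC:-rsync}" -a --relative --no-whole-file "$1" "$2"
}

# Possible use in the pipeline:
#   find /some/folder/structure/ -type f -mmin +60 |
#       mark_base /some/folder/structure/ |
#       parallel -j4 rsync -a --relative {} somehost.com::backup/somefolder/
```

For /some/folder/structure/bareos/my.20150917.1230, mark_base yields /some/folder/structure/./bareos/my.20150917.1230, and the part rsync recreates under somefolder/ is bareos/my.20150917.1230.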


Author: Max

Updated on September 18, 2022

Comments

  • Max
    Max almost 2 years

    I have a CentOS 7.1 Linux box with rsync 3.1.1. There are files on it that I want to transfer to a FreeNAS 9.10 machine. For this I've set up an rsync daemon on FreeNAS, and transferring files works fine. But when files get deleted at the source, I want them to be deleted from the target as well, so I've added --delete-before to the rsync command that I run on the Linux box.

    Why 'before' and not a normal delete? Because I use parallel to speed up the sync by running several rsyncs at the same time. A parallelised rsync can't be combined with a delete, because each rsync instance sees only a small part of the file set and would delete lots of files, possibly even files that other instances have just put there. So instead I first run an rsync with --delete-before, kill it after a couple of seconds so that it has had enough time to do the deletes, and then run the parallel rsync commands. This is all a bit of a hack, but it should work. However, when I run the rsync command with --dry-run, I can see it will delete files from the target that still exist at the source.

    This is the rsync command I'm running:

    rsync -av --delete-before --dry-run -P /some/folder/structure/ remotebackup.machine.com::backup/somefolder/
    

    The output of which is:

    building file list ...
    415 files to consider
    deleting fiFI.20150914.1317
    deleting fiFI.20150914.1316
    deleting my.20150914.1317
    ./
    bareos/
    bareos/my.20150917.1230
    bareos/prod.20150918.0530
    bareos/front01.20151101.0545
    bareos/my.20160224.1504
    bareos/fiFI.20150914.1316
    bareos/fiFI.20150914.1317
    bareos/fiFI.20150915.1311
    bareos/fiFI.20150920.1230
    bareos/fiFI.20150921.1231
    bareos/fiFI.20150922.1230
    bareos/fiFI.20151101.1230
    <snip>
    

    As you can see, rsync intends to delete some fiFI files, but later it intends to transfer those same files. That's different from what the rsync manual seems to say --delete-before should do (only delete when the file no longer exists at the source), and it would be quite inefficient, since more data needs to be transferred.

    I have verified that the files do still exist at both the source and the destination, so I would expect rsync to just transfer the updates, not delete the target file first.

    Because of the volume of data I'm trying to transfer (5TB) and the need to parallelise the transfer (because of throughput), running a normal delete with a non-parallel rsync is not an option. I have looked at other methods of syncing the data but came back to rsync. It is a very robust tool and should be able to do this just fine. It's behaving differently from what I expect, and apparently differently from what the manual says it should do.

    Is this normal behaviour? Am I doing something wrong? Why is it doing this (delete before transfer)?

    Interestingly, if I run the initial rsync that deletes the files and syncs them, and then run the same rsync again, the files get deleted and transferred again.
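The "run --delete-before, then kill it" step described above could possibly be replaced by a pass that only deletes. The rsync manual notes that combining --existing with --ignore-existing means no files are updated, which it suggests for the case where the only goal is to delete extraneous files. The following is a minimal sketch under that assumption, not the setup used in the question; delete_pass and the RSYNC override are hypothetical conveniences.

```shell
#!/bin/sh
# Sketch of a delete-only pass; delete_pass is a hypothetical wrapper.
# --existing skips creating new files on the receiver and --ignore-existing
# skips updating files that are already there, so only --delete has an effect.
# RSYNC can be overridden (RSYNC=echo) to print the arguments instead of running.
delete_pass() {
    "${RSYNC:-rsync}" -r --delete --existing --ignore-existing "$1" "$2"
}

# Preview the arguments without contacting the daemon:
RSYNC=echo delete_pass /some/folder/structure/ remotebackup.machine.com::backup/somefolder/
```

Adding --dry-run to the real call first shows what would be removed. This only helps once the destination layout matches the source; with the flattened layout it would still remove the misplaced files.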