Delete duplicate files with rsync

Solution 1

rsync (checked with version 3.0.9) has an option called --remove-source-files, which does what it says: it removes files from the source once they have been successfully duplicated on the target. If you only want to delete the files that have already been transferred, and not transfer any additional files, you also need the option `--existing`.

Unfortunately, it seems that rsync doesn't output which files it is deleting, even if the options --verbose, --itemize-changes and --stats are used.

Example

# create source and target dirs
mkdir /tmp/source
mkdir /tmp/target
# create a test file in source
touch /tmp/source/test
# rsync source and target
rsync --archive --itemize-changes --verbose --stats /tmp/source/ /tmp/target
# verify that test has been copied to target
[ -f /tmp/target/test ] && echo "Found" || echo "Not found"
# create another file in source
touch /tmp/source/test2
# delete files on source which are already existing on target
rsync --archive --itemize-changes --verbose --stats --remove-source-files --existing /tmp/source/ /tmp/target
# verify that test has been deleted on source
[ -f /tmp/source/test ] && echo "Found" || echo "Not found"
# verify that test2 still exists on source and was not transferred to target
[ -f /tmp/source/test2 ] && echo "Found" || echo "Not found"
[ -f /tmp/target/test2 ] && echo "Found" || echo "Not found"
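
Since rsync does not seem to report these deletions itself, one way to see what was removed is to compare a listing of the source directory before and after the run. The following is only a minimal sketch, not something rsync provides: report_removed is a hypothetical helper name, it assumes file names without newlines, and it sends the wrapped command's own output to stderr so that stdout contains just the list of removed files.

```shell
# Hypothetical helper (not part of rsync): run a command and print the
# regular files that disappeared from a directory while it ran.
# Usage: report_removed DIR COMMAND [ARGS...]
# Assumption: file names below DIR contain no newline characters.
report_removed() (
    dir=$1
    shift
    before=$(mktemp) && after=$(mktemp) || exit 1

    # Sorted listing of regular files before the command runs
    (cd "$dir" && find . -type f | LC_ALL=C sort) > "$before"

    # Run the command; its normal output goes to stderr so that stdout
    # carries only the list of removed files
    "$@" >&2
    status=$?

    # Sorted listing after the command has finished
    (cd "$dir" && find . -type f | LC_ALL=C sort) > "$after"

    # Lines present only in the "before" listing are the removed files
    LC_ALL=C comm -23 "$before" "$after"

    rm -f "$before" "$after"
    exit "$status"
)

# Example call, using the directories from the example above:
# report_removed /tmp/source rsync --archive --remove-source-files --existing /tmp/source/ /tmp/target
```

With the test directories from the example, and assuming rsync behaves as described in this answer, the call in the last comment would be expected to list ./test and nothing else.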

Solution 2

As written before, rsync's --delete option only deletes files on the destination, not on the source.

In your case, I would generate MD5 hashes of the files on the mirror server, then check on the primary server if the hashes are correct and remove those files.

I.e.:

mirror$ find . -type f -print0 | xargs -0 md5sum > mirror.md5

..transfer mirror.md5 to primary server...

primary$ md5sum -c mirror.md5

Check for any FAILED files, then remove the files that have been transferred successfully. You could automate it like this:

md5sum -c mirror.md5 | grep ': OK$' | sed -e 's/: OK$//' | while IFS= read -r FILE; do rm -- "$FILE"; done

This will filter all files with a good hash, chop off the 'OK' part from md5sum and remove the files one by one.

Needless to say, after this you don't want to use the --delete option from rsync to transfer the second half of your files...
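
Parsing the output of md5sum -c can misfire on unusual file names, so an alternative is to read the checksum list directly and compare each hash before deleting. What follows is a rough sketch under stated assumptions, not a tested production script: remove_verified is a hypothetical helper name, the list is assumed to be in the plain "hash, two characters, name" format that md5sum writes, and it is run from the directory the relative paths refer to. Names that md5sum escapes (those containing backslashes or newlines) will not match and are simply left in place.

```shell
# Hypothetical helper: delete local files whose MD5 matches the entry in
# a checksum list created with md5sum on the mirror.
# Usage: cd /path/to/files && remove_verified mirror.md5
remove_verified() (
    list=$1
    while IFS= read -r line; do
        expected=${line%% *}   # the hash is the text before the first space
        name=${line#* }        # drop the hash and the first space
        name=${name#?}         # drop the mode character (' ' or '*')

        # Skip entries that do not exist locally
        [ -f "$name" ] || continue

        # Hash the local copy; reading from stdin keeps the name out of the output
        actual=$(md5sum < "$name")
        actual=${actual%% *}

        if [ "$actual" = "$expected" ]; then
            rm -- "$name" && printf 'removed %s\n' "$name"
        else
            printf 'kept (checksum differs) %s\n' "$name" >&2
        fi
    done < "$list"
)
```

Removed files are listed on stdout and mismatches on stderr, so the two can be logged separately. Replacing the rm call with a plain printf is an easy way to do a trial run first.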

Author: devrim deri

Updated on September 18, 2022

Comments

  • devrim deri, almost 2 years ago

    Hey, I am trying to use the PUT method as in the code below. I need to have a body like this:

    {
      "lines": [
        {
          "lineId": 300198921,
          "quantity": 1
        }],
      "params": {
      },
      "status": "Picking"
    }
    

    When I run the request on talend API there is no problem, but in C# I always get an error:

    public static void TrendyolDurumGüncelle()
            {
                client = new RestClient("https://api.trendyol.com/sapigw/suppliers/{supplierId}/shipment-packages/{Id}");
                client.Authenticator = new HttpBasicAuthenticator(TrendyolUserName, TrendyolPassword);
    
                request = new RestRequest("https://api.trendyol.com/sapigw/suppliers/{supplierId}/shipment-packages/{Id}", Method.PUT);
                request.AddUrlSegment("supplierId", TrendyolMerchantId);
                request.AddUrlSegment("Id", "173657633");
    
    
    
                var body = new { lines = new {lineId = "300198921", quantity = 3 } , @params = new { } ,status = "Picking" }; 
    
                request.AddJsonBody(body); 
    
                var response = client.Execute(request);
                var content = response.Content;
    
    
    
            }
    
    • Paul R, almost 11 years ago
      Just make sure you do a "dry run" with the -n option first. Also backups are a good idea.
    • twalberg, almost 11 years ago
      I don't believe that rsync will "delete from source after transfer". You could write a script that transfers batches of files and then deletes after verifying successful transfer, though.
    • Guru Stron, about 4 years ago
      What error are you getting?