Using RSYNC with Amazon S3

Solution 1

I recently stumbled across this thread on Google and it looks like the landscape has changed a bit since the question was asked. Most of the solutions suggested here are either no longer maintained or have turned commercial.

After some frustrations working with FUSE and some of the other solutions out there, I decided to write my own command-line rsync "clone" for S3 and Google Storage using Python.

You can check out the project on GitHub: http://github.com/seedifferently/boto_rsync

Another project which I was recently made aware of is "duplicity." It looks a little more elaborate and it can be found here: http://duplicity.nongnu.org/

Hope this helps.

UPDATE

The Python team at AWS has been working hard on a boto-based CLI project for their cloud services. Among the tools included is an interface for S3 which duplicates (and in many ways supersedes) most of the functionality provided by boto_rsync:

https://github.com/aws/aws-cli

In particular, the sync command can be configured to function almost exactly like rsync:

http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
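
For example, a nightly one-way sync roughly equivalent to an rsync push could look something like this (the bucket name and paths here are just placeholders):

$ aws s3 sync /local/backups s3://my-bucket/backups --delete --exclude "*.tmp"

The --delete flag removes remote objects that no longer exist locally (similar to rsync's --delete), and adding --dryrun lets you preview what would be transferred first.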

Solution 2

I've also had good luck with S3cmd and S3sync, both of which are free.
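
As a rough sketch (again, bucket name and paths are placeholders), an rsync-style one-way sync with S3cmd would look something like:

$ s3cmd sync --delete-removed /local/backups/ s3://my-bucket/backups/

The --delete-removed option deletes remote objects that were removed locally, much like rsync's --delete.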

Solution 3

Depending on how your Acronis images are created, I'm not sure any kind of rsync would save you bandwidth. Acronis images are single files, so rsync wouldn't be able to read inside them to back up only what changed. I'm also not sure what kind of server images you're creating, but since you said 100GB I'm going to assume they're full images. An incremental image would cut down the nightly image size greatly, thus saving bandwidth. You could also consider saving the images to a location other than S3, such as tape media, and storing that off-site.

Solution 4

I never tried S3rsync.

I'm using duplicity for our off-site backups. It supports incremental backups to S3, although it doesn't really save bandwidth, since the Amazon S3 storage protocol forces you to upload the whole file again whenever it is modified. Even so, duplicity only uploads the differences from the last incremental backup.

With duplicity you won't need to go through another server as S3sync does; nonetheless, if you encrypt your data, it may be worth giving S3sync a try.
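
As a rough illustration (the bucket name and paths are placeholders, and the exact S3 URL scheme can differ between duplicity versions), a full backup followed by nightly incrementals might look like:

$ export AWS_ACCESS_KEY_ID=<your-access-key>          # placeholder credentials
$ export AWS_SECRET_ACCESS_KEY=<your-secret-key>
$ duplicity full /local/backups s3+http://my-bucket/backups    # initial full backup
$ duplicity /local/backups s3+http://my-bucket/backups         # later runs upload only the increments

Duplicity also encrypts the backup volumes with GnuPG by default (it prompts for a passphrase unless you disable encryption), which is part of why it can stand in for encrypting the data yourself before using S3sync.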

Solution 5

You can try the Minio client, aka "mc". mc provides minimal tools for working with Amazon S3-compatible cloud storage and filesystems.

mc implements the following commands:

  ls        List files and folders.
  mb        Make a bucket or folder.
  cat       Display contents of a file.
  pipe      Write contents of stdin to one or more targets. When no target is specified, it writes to stdout.
  share     Generate URL for sharing.
  cp        Copy one or more objects to a target.
  mirror    Mirror folders recursively from a single source to many destinations.
  diff      Compute differences between two folders.
  rm        Remove file or bucket [WARNING: Use with care].
  access    Manage bucket access permissions.
  session   Manage saved sessions of cp and mirror operations.
  config    Manage configuration file.
  update    Check for a new software update.
  version   Print version.

You can use the mirror command for this, where "localdir" is your local directory, "S3" is the alias for Amazon S3, and "remoteDir" is the name of your bucket on S3:

$ mc mirror localdir/ S3/remoteDir

You can also run this from a cron job (a sketch follows below). In case of a network outage, you can use "mc session" to resume the upload from where it left off.
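
A minimal crontab sketch (the paths are placeholders, and the "S3" alias is assumed to have been set up already with "mc config"):

# mirror the local backup directory to S3 every night at 2:00 AM
0 2 * * * /usr/local/bin/mc mirror /path/to/localdir S3/remoteDir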

PS: I contribute to the Minio project and would love to get your feedback and contributions. Hope this helps.

Comments

  • Lior Kesos almost 2 years

    I am interested in using Amazon S3 to back up our ~100 GB server images (created via Acronis backup tools).

    Obviously, uploading this to S3 every night would be expensive in terms of bandwidth and cost. I'm considering using rsync with S3 and came across s3rsync. I was just wondering if anybody has any experience using it, or any other utility?

    • dana about 13 years
      One thing I noticed about s3rsync is that you are currently limited to 10GB bucket sizes (check the FAQ). You can have multiple buckets, but you have to split your data into 10GB chunks.
  • Paul over 14 years
    How do you suppose they "load" a 128GB flash drive? I picture the world's largest USB hub: a floor-to-ceiling patch panel of USB connectors, 3/4 full of customer-supplied flash drives, all going into the back of a single blade server.
  • Alan Donnelly over 13 years
    No, rsync doesn't work like that. It works with any file type and doesn't need any knowledge of the internals of the file it's syncing. Instead it compares hashes of chunks of the file and transfers only those chunks that differ. en.wikipedia.org/wiki/Rsync
  • iainlbc over 12 years
    Great contribution! Thanks, and I will give your code a shot soon. Do you have any must-reads for learning Python/Django? Cheers
  • James McMahon about 12 years
    What advantages / differences does your program have compared to S3cmd and S3sync?
  • fnkr over 10 years
    +1 for S3cmd -.-
  • Stanislav about 7 years
    There is an s3fs FUSE filesystem, github.com/s3fs-fuse/s3fs-fuse, which works pretty well and can be combined with rsync, though I am not sure how efficiently.
  • trusktr over 5 years
    It would be awesome if you could explain how "the sync command can be configured to function almost exactly like rsync".
  • Tuxie over 5 years
    S3cmd has an issue with large file counts (> 300k files). It uses about 1 GB of working memory per 100k files, so that limitation is good to keep in mind.
  • Oleg Belousov over 4 years
    I would like to use this command to copy a filesystem entirely. Which flags would I need to use for that purpose? Is it possible to tunnel the result to a `tar.gz` destination? Thanks in advance :)