Using RSYNC with Amazon S3
Solution 1
I recently stumbled across this thread on Google and it looks like the landscape has changed a bit since the question was asked. Most of the solutions suggested here are either no longer maintained or have turned commercial.
After some frustrations working with FUSE and some of the other solutions out there, I decided to write my own command-line rsync "clone" for S3 and Google Storage using Python.
You can check out the project on GitHub: http://github.com/seedifferently/boto_rsync
Another project which I was recently made aware of is "duplicity." It looks a little more elaborate and it can be found here: http://duplicity.nongnu.org/
Hope this helps.
UPDATE
The Python team at AWS has been working hard on a boto-based CLI project for their cloud services. Among the tools included is an interface for S3 which duplicates (and in many ways supersedes) most of the functionality provided by boto-rsync:
https://github.com/aws/aws-cli
In particular, the sync command can be configured to function almost exactly like rsync:
http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
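As a sketch, a nightly sync might look like the following (the bucket name and paths here are placeholders, and the command assumes AWS credentials are already configured):

```shell
# Upload everything new or changed under the local directory, and delete
# remote objects that no longer exist locally (rsync --delete style):
aws s3 sync /path/to/backups s3://my-backup-bucket/backups --delete

# Preview what would be transferred without actually doing it:
aws s3 sync /path/to/backups s3://my-backup-bucket/backups --delete --dryrun
```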
Solution 2
I've also had good luck with S3cmd and S3sync, both of which are free.
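For reference, an S3cmd-based sync might look like this (the bucket name and paths are placeholders; s3cmd must first be set up with s3cmd --configure):

```shell
# Mirror a local directory to a bucket, removing remote files that
# were deleted locally:
s3cmd sync --delete-removed /path/to/backups/ s3://my-backup-bucket/backups/
```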
Solution 3
Depending on how your Acronis images are created, I'm not sure any kind of rsync would save you bandwidth. Acronis images are single files, so rsync wouldn't be able to read inside them to back up only what changed. I'm also not sure what kind of server images you're creating, but since you said 100 GB I'll assume they're full images. An incremental image would cut down the nightly image size greatly, thus saving bandwidth. You could also consider saving the images to a location other than S3, such as tape media, stored off-site.
Solution 4
I never tried S3rsync.
I'm using duplicity for our off-site backups. It supports incremental backups to S3, though it doesn't really save bandwidth: the Amazon S3 storage protocol requires that any modified file be re-uploaded in full. In any case, duplicity only uploads the differences from the last incremental backup.
With duplicity you won't need to go through another server as you do with S3sync; nonetheless, if you encrypt your data, it may be worth giving S3sync a try.
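A minimal duplicity invocation might look like this (the bucket name and paths are placeholders; the exact S3 URL scheme varies between duplicity versions, and AWS credentials are read from the environment):

```shell
# First run produces a full backup; subsequent runs are incremental.
# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set beforehand.
duplicity /path/to/backups s3+http://my-backup-bucket/backups

# Prune old backup chains to keep storage costs down:
duplicity remove-older-than 30D --force s3+http://my-backup-bucket/backups
```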
Solution 5
You can try the Minio client, aka "mc". mc provides minimal tools for working with Amazon S3-compatible cloud storage and filesystems.
mc implements the following commands:
ls List files and folders.
mb Make a bucket or folder.
cat Display contents of a file.
pipe Write contents of stdin to one or more targets. When no target is specified, it writes to stdout.
share Generate URL for sharing.
cp Copy one or more objects to a target.
mirror Mirror folders recursively from a single source to many destinations.
diff Compute differences between two folders.
rm Remove file or bucket [WARNING: Use with care].
access Manage bucket access permissions.
session Manage saved sessions of cp and mirror operations.
config Manage configuration file.
update Check for a new software update.
version Print version.
You can use the mirror command for your operation, where "localdir" is your local directory, "S3" is an alias configured for Amazon S3, and "remoteDir" is the name of your bucket:
$ mc mirror localdir/ S3/remoteDir
You can also set up a cron job for this. In case of a network outage, you can use "mc session" to resume the upload from where it left off.
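A cron job for the command above might look like this (the paths and the S3 alias are placeholders):

```shell
# crontab entry: run the mirror every night at 02:00
0 2 * * * /usr/local/bin/mc mirror /path/to/localdir S3/remoteDir
```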
PS: I contribute to the Minio project and would love your feedback and contributions. Hope it helps.
Updated on September 17, 2022
Comments
-
Lior Kesos, almost 2 years:
I am interested in using Amazon S3 to back up our ~100 GB server images (created via Acronis backup tools).
Obviously, uploading this to S3 every night would be expensive in terms of bandwidth and cost. I'm considering using rsync with S3 and came across s3rsync. I was just wondering if anybody had any experience using it, or any other utility?
-
dana, about 13 years: One thing I noticed about s3rsync is that you are currently limited to 10 GB bucket sizes (check the FAQ). You can have multiple buckets, but you have to split your data into 10 GB chunks.
-
Paul, over 14 years: How do you suppose they "load" a 128 GB flash drive? I picture the world's largest USB hub, a floor-to-ceiling patch panel of USB connectors, 3/4 full of customer-supplied flash drives, all going into the back of a single blade server.
-
Alan Donnelly, over 13 years: No, rsync doesn't work like that. It works with any file type and doesn't need any knowledge of the internals of the file it's syncing. Instead it compares hashes of chunks of the file and transfers only those chunks that differ. en.wikipedia.org/wiki/Rsync
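A toy sketch of that chunk-comparison idea (real rsync uses rolling checksums, so matching chunks can be found at any offset rather than only at fixed positions):

```shell
# Split two versions of a file into fixed-size chunks and compare them;
# only the chunks that differ would need to be transferred.
printf 'AAAABBBBCCCCDDDD' > old.bin
printf 'AAAAXXXXCCCCDDDD' > new.bin      # only bytes 4-7 changed
split -b 4 old.bin old_chunk_
split -b 4 new.bin new_chunk_
changed=""
for c in aa ab ac ad; do
    cmp -s "old_chunk_$c" "new_chunk_$c" || changed="$changed $c"
done
echo "differing chunks:$changed"         # prints: differing chunks: ab
```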
-
iainlbc, over 12 years: Great contribution! Thanks, and I will give your code a shot soon. Do you have any must-reads for learning Python/Django? Cheers
-
James McMahon, about 12 years: What advantages/differences does your program have compared to S3cmd and S3sync?
-
fnkr, over 10 years: +1 for S3cmd -.-
-
Stanislav, about 7 years: There is an s3fs FUSE filesystem, github.com/s3fs-fuse/s3fs-fuse, which works pretty well and can be combined with rsync, although I'm not sure how efficiently.
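As a sketch of that combination (the bucket name and mount point are placeholders; s3fs credentials must already be configured):

```shell
# Mount the bucket as a local filesystem, then rsync into the mount:
s3fs my-backup-bucket /mnt/s3
rsync -av --delete /path/to/backups/ /mnt/s3/backups/
```

Note that rsync's delta algorithm gains little here: a changed file is still rewritten to S3 in full, since S3 objects cannot be partially updated.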
-
trusktr, over 5 years: It would be awesome if you could explain how "the sync command can be configured to function almost exactly like rsync".
-
Tuxie, over 5 years: S3cmd has an issue with large file counts (> 300k files). It uses about 1 GB of working memory per 100k files, so that limitation is good to keep in mind.
-
Oleg Belousov, over 4 years: I would like to use this command to copy a filesystem entirely. Which flags would I need to use for that purpose? Is it possible to tunnel the result to a tar.gz destination? Thanks in advance :)