Backup AWS EFS to S3

13,631

Solution 1

I think S3 sync is what you want. You could set up cron on one of the EC2 instances that mounts the EFS and invoke S3 sync that way. Are you using ECS as well? I have a cron container that does the job pretty well. For those reading who are not familiar with the AWS CLI (https://aws.amazon.com/cli/), the syntax for S3 sync is:

aws s3 sync /path/to/source/ s3://bucket/destination/
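
As a rough sketch of the cron approach, the command above could be wrapped in a small script. The mount point /mnt/efs, the bucket name, the script path and the lock file below are hypothetical placeholders, and AWS_BIN is an illustrative override that lets you preview the command with echo before running it for real.

```shell
#!/bin/sh
# Sketch only: paths and bucket names are hypothetical placeholders.
set -eu

# AWS_BIN can be overridden (e.g. AWS_BIN=echo) to print the command
# instead of running it.
AWS_BIN="${AWS_BIN:-aws}"

backup_efs_to_s3() {
  # $1 = local EFS mount point, $2 = S3 destination URI
  # Normalise both arguments so each ends in exactly one slash.
  src="${1%/}/"
  dest="${2%/}/"
  "$AWS_BIN" s3 sync "$src" "$dest"
}

# Run the backup only when both arguments are supplied.
if [ "$#" -eq 2 ]; then
  backup_efs_to_s3 "$1" "$2"
fi

# Example crontab entry (02:00 daily). flock -n skips the run if the
# previous one still holds the lock:
# 0 2 * * * flock -n /tmp/efs-backup.lock /usr/local/bin/efs-backup.sh /mnt/efs s3://my-bucket/efs-backup/
```

Running it from a single instance avoids several machines syncing the same files at once.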

Solution 2

  1. Back up EFS using a tool such as Attic to create a compressed, incremental, de-duplicated backup on one EC2 instance.
  2. Use S3FS or the S3 API to upload those files to S3. Personally I use a Dropbox upload script, which works fine as well.

Note that Attic runs at whatever interval you schedule, but keeps only the checkpoints you specify. For example, you might take daily backups but keep only monthly ones after the first month, and yearly ones after the first year. To do this, Attic deletes files from its repository. If those files are not also deleted from the uploaded copy it won't hurt, but you will use more storage than required. That's why a sync of the Attic backup files that also removes deleted files (aws s3 sync with --delete, for example) might be better than a copy.
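
A minimal sketch of the two steps plus pruning, assuming the Attic repository has already been created with attic init. The repository path, bucket, archive name and retention numbers are hypothetical, and ATTIC_BIN / AWS_BIN are illustrative overrides for previewing the commands.

```shell
#!/bin/sh
# Sketch only: repository path, bucket and retention values are assumptions.
set -eu

ATTIC_BIN="${ATTIC_BIN:-attic}"
AWS_BIN="${AWS_BIN:-aws}"

attic_backup_and_sync() {
  # $1 = EFS mount point, $2 = Attic repository path,
  # $3 = S3 destination URI, $4 = archive name (e.g. today's date)

  # Step 1: create a compressed, de-duplicated archive of the EFS mount.
  "$ATTIC_BIN" create "$2::$4" "$1"

  # Thin out old archives: dailies for a month, then monthlies, then yearlies.
  "$ATTIC_BIN" prune "$2" --keep-daily=30 --keep-monthly=12 --keep-yearly=5

  # Step 2: mirror the repository to S3. --delete removes objects whose
  # local files were deleted by pruning, so S3 does not keep stale segments.
  "$AWS_BIN" s3 sync "$2" "$3" --delete
}

# Example call (hypothetical values):
# attic_backup_and_sync /mnt/efs /backup/efs.attic s3://my-bucket/attic/ "$(date +%F)"
```

Setting ATTIC_BIN=echo and AWS_BIN=echo prints the three commands without touching any data, which is a cheap way to check the arguments first.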

Author by

wahtye

Updated on September 18, 2022

Comments

  • wahtye
    wahtye over 1 year

    I've been desperately trying to find a way to back up my AWS EFS file system to S3, but cannot seem to find one.

    There are several EC2 instances running, all with access to the mentioned EFS. To reduce traffic, I already tried launching a Lambda function that SSHs into those instances and runs "aws s3 sync ...". Unfortunately, SSHing from Lambda doesn't seem like a good production-ready solution.

    I've also tried adapting Data Pipeline, but launching additional instances just for backups seems like a hassle, too.

    Isn't there some easy way of backing up EFS to S3?
    Any suggestions appreciated.

  • wahtye
    wahtye over 7 years
    I've read that S3FS isn't stable enough for a production environment. We'll have to think about saving to S3 directly through the S3 API.
  • Tim
    Tim over 7 years
    There's also s3tools.org/s3cmd-sync and a host of other options. The actual AWS commands are probably going to be more reliable though.
  • Michael - sqlbot
    Michael - sqlbot over 7 years
    @wahtye s3fs is indeed a little bit delicate. I use an older version of it in production to enable me to use S3 as the backing store for my ProFTPd server, but would never trust it for making backups. For that, I use my own code which is extremely pedantic and takes advantage of all the features S3 offers for ensuring data integrity, such as the Content-MD5 upload header -- if S3 receives an upload with a payload not matching this, it outright refuses to even store the content. A sad number of libraries and utilities seem to just not bother with this, since it is technically "optional."
  • ren.rocks
    ren.rocks almost 6 years
    What about burst credits? The floppy-like I/O on a large EFS volume will make this sync unbearable, right?
  • womble
    womble over 4 years
    Some detail as to how to configure the exact situation the question describes would, no doubt, be appreciated.
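
The Content-MD5 check mentioned in the comments above can be sketched with the AWS CLI's lower-level s3api command. This is only an illustration, not the commenter's own code: the function names are hypothetical, and it assumes openssl is installed. The header value is the base64 encoding of the file's binary MD5 digest.

```shell
#!/bin/sh
# Sketch only: function names, bucket and key are hypothetical.
set -eu

AWS_BIN="${AWS_BIN:-aws}"

content_md5() {
  # Print the base64-encoded binary MD5 digest of file $1,
  # which is the format the Content-MD5 header expects.
  openssl dgst -md5 -binary "$1" | openssl base64
}

upload_with_md5() {
  # $1 = local file, $2 = bucket, $3 = object key
  md5=$(content_md5 "$1")
  # S3 rejects the upload if the received payload does not match the digest.
  "$AWS_BIN" s3api put-object --bucket "$2" --key "$3" \
    --body "$1" --content-md5 "$md5"
}

# Example call (hypothetical values):
# upload_with_md5 /backup/efs.attic/README my-bucket attic/README
```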