I'm trying to use "aws s3 sync" on my EC2 instance. Is the '--exclude' option broken?


Solution 1

The GitHub issue mentioned in the other answer has since been closed, and the sync command now honors --exclude.

I tried it, and adding --exclude ".git/*" works.

Note that the trailing /* is required: the pattern has to match the files inside the directory, not just the directory's name.
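To see why the trailing /* matters, the sketch below mimics the matching rule in bash. It assumes (this is not confirmed by the answers above) that newer aws-cli versions prefix each --exclude pattern with the source directory and then match it against every file's full path using Python's fnmatch. The function is_excluded and the sample paths are hypothetical illustrations. A bash case statement stands in for fnmatch because in both, * can match across /.

```shell
#!/usr/bin/env bash
# Rough emulation (an assumption, not the aws-cli source) of how --exclude
# is applied to a local source: the pattern is joined onto the source
# directory and glob-matched against each file's full path. In a bash
# 'case' pattern, as in Python's fnmatch, '*' also matches '/', so a
# single '*' can span several directory levels.
SRC=/var/www

# is_excluded PATTERN FILE: succeeds if FILE would be skipped by --exclude PATTERN.
is_excluded() {
  local pattern="$SRC/$1"
  case "$2" in
    $pattern) return 0 ;;  # unquoted on purpose so it is treated as a glob
    *) return 1 ;;
  esac
}

top=/var/www/.git/config
nested=/var/www/site/.git/objects/7b/e3cdf203d34a0d7eff30a96a78d20eacee8d77

# Print a small table: pattern, verdict, file.
for p in '.git' '.git/*' '*/.git/*' '*.git/*'; do
  for f in "$top" "$nested"; do
    if is_excluded "$p" "$f"; then verdict=excluded; else verdict=kept; fi
    printf '%-10s %-9s %s\n' "$p" "$verdict" "$f"
  done
done
```

Under these assumptions, '.git' excludes neither file, because it would only match a file literally named /var/www/.git. '.git/*' catches only a repository at the top of the source directory, and '*/.git/*' catches only nested ones. '*.git/*', the pattern from the question, catches both, but it also matches any directory whose name merely ends in .git.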

Solution 2

It's broken for me too, and there is already an open issue for this bug: https://github.com/aws/aws-cli/issues/434.


Author: user158845 (updated on September 18, 2022)

Question:

    I'm trying to back up my EC2 instance to S3 using Amazon's official tools, which come preinstalled on the instance. I'm having difficulty getting the sync command to exclude directories. The documentation makes it sound easy with the '--exclude' option, but I've been unable to get it to work.

    I'm running the latest version available through yum:

    [root@HOSTNAME ~]# aws --version
    aws-cli/1.1.1 Python/2.6.8 Linux/3.4.57-48.42.amzn1.x86_64
    

    This is the command I'm having trouble with:

    aws s3 sync /var/www s3://backup-bucket/var/www/ --exclude '*.git/*'
    

    I want to exclude all folders named '.git' and all files in those folders from the sync.

    Judging from the documentation, the pattern I gave to '--exclude' should work. However, the entire .git directory and all of its files are still synced. Here's an example line of output:

    upload: ../var/www/site/.git/objects/7b/e3cdf203d34a0d7eff30a96a78d20eacee8d77 to s3://backup-bucket/var/www/site/.git/objects/7b/e3cdf203d34a0d7eff30a96a78d20eacee8d77
    

    I tried the following commands, which all failed to exclude folders named '.git':

    aws s3 sync /var/www s3://backup-bucket/var/www/ --exclude *.git/*
    aws s3 sync /var/www s3://backup-bucket/var/www/ --exclude */.git/*
    aws s3 sync /var/www s3://backup-bucket/var/www/ --exclude '.git'
    aws s3 sync /var/www s3://backup-bucket/var/www/ --exclude '*\.git/*'
    aws s3 sync /var/www s3://backup-bucket/var/www/ --exclude '.*\.git/.*'
    

    And, perhaps most disturbingly, this command doesn't exclude anything from syncing:

    aws s3 sync /var/www s3://backup-bucket/var/www/ --exclude '*'
    

    In fact, '--exclude' only seems to work when the pattern is an absolute path with the wildcard at the end. For example, this excludes the .git folder at /var/www/site/.git:

    aws s3 sync /var/www s3://backup-bucket/var/www/ --exclude '/var/www/site/.git/*'
    

    but this doesn't exclude anything:

    aws s3 sync /var/www s3://backup-bucket/var/www/ --exclude '*/www/site/.git/*'
    

    I have many sites in /var/www and many folders, such as '.git', 'parsed', and 'cache', that I want to exclude from the sync. Having to provide absolute paths for all of them would be impractical.

    I'm hoping that there's something simple that I'm doing wrong here. Thank you all for your help.
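For the many-folder case described in the question, --exclude can be repeated once per directory name instead of listing absolute paths. The sketch below assumes a fixed aws-cli version in which patterns are relative to the source directory. SKIP_DIRS and EXCLUDES are hypothetical variable names, and the command is only printed, not executed, so nothing is uploaded by accident.

```shell
#!/usr/bin/env bash
# Sketch: build one --exclude per directory name so that .git, parsed and
# cache are skipped in every site under /var/www. Assumes an aws-cli
# version where exclude patterns are relative to the source directory.
SRC=/var/www
DEST=s3://backup-bucket/var/www/
SKIP_DIRS=(.git parsed cache)   # hypothetical list, taken from the question

EXCLUDES=()
for name in "${SKIP_DIRS[@]}"; do
  # '*/NAME/*' matches NAME inside any subdirectory of $SRC (e.g. site/.git/...),
  # but not a NAME sitting directly at the top of $SRC.
  EXCLUDES+=(--exclude "*/$name/*")
done

# Printed rather than executed: remove 'echo' to run it, and keep --dryrun
# until the listed uploads look right.
echo aws s3 sync "$SRC" "$DEST" "${EXCLUDES[@]}" --dryrun
```

Keeping the names in one array means a new folder to skip is a one-word change. If a directory with one of these names can also sit directly under /var/www, add a matching 'NAME/*' pattern for it as well.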