How to change permissions recursively on a folder with AWS s3 or AWS s3api


Solution 1

You will need to run the command individually for every object.

You might be able to short-cut the process by using:

aws s3 cp --acl bucket-owner-full-control --metadata Key=Value --profile <original_account_profile> s3://bucket/path s3://bucket/path

That is, you copy the files to themselves, but with the added ACL that grants permissions to the bucket owner.

If you have sub-directories, then add --recursive.
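
If you prefer to do the same thing from Python instead of the CLI, here is a minimal boto3 sketch of the copy-in-place idea for a single object; the bucket name, key and metadata value below are placeholders:

import boto3

s3 = boto3.client('s3')

# Copy the object onto itself while adding the bucket-owner-full-control ACL.
# MetadataDirective='REPLACE' makes the self-copy legal, but it replaces any
# existing user-defined metadata with the values supplied here.
s3.copy_object(
    Bucket='bucket',                       # placeholder bucket name
    Key='path/to/object',                  # placeholder key
    CopySource={'Bucket': 'bucket', 'Key': 'path/to/object'},
    ACL='bucket-owner-full-control',
    Metadata={'Key': 'Value'},             # placeholder, mirrors --metadata Key=Value
    MetadataDirective='REPLACE',
)

As with the CLI command above, run this with credentials for the account that currently owns the objects.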

Solution 2

This can only be achieved by using pipes. Try:

aws s3 ls s3://bucket/path/ --recursive | awk '{cmd="aws s3api put-object-acl --acl bucket-owner-full-control --bucket bucket --key "$4; system(cmd)}'
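
If you would rather not spawn one CLI process per object (note also that the awk field $4 truncates keys that contain spaces), a rough boto3 equivalent of the same loop might look like this; the bucket and prefix names are placeholders:

import boto3

s3 = boto3.client('s3')
bucket = 'bucket'    # placeholder bucket name
prefix = 'path/'     # placeholder prefix

# List every key under the prefix and grant the bucket owner full control.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get('Contents', []):
        s3.put_object_acl(Bucket=bucket, Key=obj['Key'],
                          ACL='bucket-owner-full-control')
        print(obj['Key'])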

Solution 3

The other answers are ok, but the FASTEST way to do this is to use the aws s3 cp command with the option --metadata-directive REPLACE, like this:

aws s3 cp --recursive --acl bucket-owner-full-control s3://bucket/folder s3://bucket/folder --metadata-directive REPLACE

This gives speeds of between 50 MiB/s and 80 MiB/s.

The suggestion from John R's comment, to use a 'dummy' option like --storage-class STANDARD, also works, but it only gave me copy speeds of between 5 MiB/s and 11 MiB/s.

The inspiration for trying this came from AWS's support article on the subject: https://aws.amazon.com/premiumsupport/knowledge-center/s3-object-change-anonymous-ownership/

NOTE: If you encounter 'Access Denied' for some of your objects, it is likely because you are using AWS credentials for the bucket-owning account, whereas you need to use credentials for the account the files were copied from.

Solution 4

Use Python to set the permissions recursively:

#!/usr/bin/env python3
import sys

import boto3

client = boto3.client('s3')
BUCKET = 'enter-bucket-name'


def get_account_canonical_id():
    """Return the canonical user ID of the account the credentials belong to."""
    return client.list_buckets()["Owner"]["ID"]


def set_acl(key):
    """Grant the calling account full control over a single object."""
    client.put_object_acl(
        GrantFullControl="id=%s" % get_account_canonical_id(),
        Bucket=BUCKET,
        Key=key,
    )


def process_s3_objects(prefix):
    """Set the ACL on every key under the given prefix."""
    failures = []
    paginator = client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get('Contents', []):
            try:
                print(obj['Key'])
                set_acl(obj['Key'])
            except Exception:
                failures.append(obj['Key'])

    print("failures:", failures)


process_s3_objects(sys.argv[1])
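
Assuming the script is saved as set_acl.py (a made-up file name) and BUCKET has been edited to your bucket, it takes the prefix to process as its only argument:

python set_acl.py path/to/prefix/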

Solution 5

I had a similar issue with taking ownership of log objects in quite a large bucket: 3,290,956 objects in total, 1.4 TB in size.

The solutions I was able to find were far too sluggish for that number of objects. I ended up writing some code that does the job several times faster than

aws s3 cp

You will need to install requirements:

pip install pathos boto3 click

#!/usr/bin/env python3
import logging
import os
import sys
import boto3
import botocore
import click
from time import time
from botocore.config import Config
from pathos.pools import ThreadPool as Pool

logger = logging.getLogger(__name__)

streamformater = logging.Formatter("[*] %(levelname)s: %(asctime)s: %(message)s")
logstreamhandler = logging.StreamHandler()
logstreamhandler.setFormatter(streamformater)


def _set_log_level(ctx, param, value):
    if value:
        ctx.ensure_object(dict)
        ctx.obj["log_level"] = value
        logger.setLevel(value)
        if value <= 20:
            logger.info(f"Logger set to {logging.getLevelName(logger.getEffectiveLevel())}")
    return value


@click.group(chain=False)
@click.version_option(version='0.1.0')
@click.pass_context
def cli(ctx):
    """
        Take object ownership of S3 bucket objects.
    """
    ctx.ensure_object(dict)
    ctx.obj["aws_config"] = Config(
        retries={
            'max_attempts': 10,
            'mode': 'standard'
        }
    )


@cli.command("own")
@click.argument("bucket", type=click.STRING)
@click.argument("prefix", type=click.STRING, default="/")
@click.option("--profile", type=click.STRING, default="default", envvar="AWS_DEFAULT_PROFILE", help="Configuration profile from ~/.aws/{credentials,config}")
@click.option("--region", type=click.STRING, default="us-east-1", envvar="AWS_DEFAULT_REGION", help="AWS region")
@click.option("--threads", "-t", type=click.INT, default=40, help="Threads to use")
@click.option("--loglevel", "log_level", hidden=True, flag_value=logging.INFO, callback=_set_log_level, expose_value=False, is_eager=True, default=True)
@click.option("--verbose", "-v", "log_level", flag_value=logging.DEBUG, callback=_set_log_level, expose_value=False, is_eager=True, help="Increase log_level")
@click.pass_context
def command_own(ctx, *args, **kwargs):
    ctx.obj.update(kwargs)
    profile_name = ctx.obj.get("profile")
    region = ctx.obj.get("region")
    bucket = ctx.obj.get("bucket")
    prefix = ctx.obj.get("prefix").lstrip("/")
    threads = ctx.obj.get("threads")
    pool = Pool(nodes=threads)
    logger.addHandler(logstreamhandler)
    logger.info(f"Getting ownership of all objects in s3://{bucket}/{prefix}")
    start = time()

    try:
        SESSION: boto3.Session = boto3.session.Session(profile_name=profile_name)
    except botocore.exceptions.ProfileNotFound as e:
        logger.warning(f"Profile {profile_name} was not found.")
        logger.warning(f"Falling back to environment variables for AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN")
        AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID", "")
        AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY", "")
        AWS_SESSION_TOKEN = os.environ.get("AWS_SESSION_TOKEN", "")
        if AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY:
            if AWS_SESSION_TOKEN:
                SESSION: boto3.Session = boto3.session.Session(aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
                                                               aws_session_token=AWS_SESSION_TOKEN)
            else:
                SESSION: boto3.Session = boto3.session.Session(aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
        else:
            logger.error("Unable to find AWS credentials.")
            sys.exit(1)

    s3c = SESSION.client('s3', config=ctx.obj["aws_config"])

    def bucket_keys(Bucket, Prefix='', StartAfter='', Delimiter='/'):
        Prefix = Prefix[1:] if Prefix.startswith(Delimiter) else Prefix
        if not StartAfter:
            del StartAfter
            if Prefix.endswith(Delimiter):
                StartAfter = Prefix
        del Delimiter
        for page in s3c.get_paginator('list_objects_v2').paginate(Bucket=Bucket, Prefix=Prefix):
            for content in page.get('Contents', ()):
                yield content['Key']

    def worker(key):
        logger.info(f"Processing: {key}")
        s3c.copy_object(Bucket=bucket, Key=key,
                        CopySource={'Bucket': bucket, 'Key': key},
                        ACL='bucket-owner-full-control',
                        StorageClass="STANDARD"
                        )

    object_keys = bucket_keys(bucket, prefix)
    pool.map(worker, object_keys)
    end = time()
    logger.info(f"Completed for {end - start:.2f} seconds.")


if __name__ == '__main__':
    cli()

Usage:

get_object_ownership.py own -v my-big-aws-logs-bucket /prefix

The bucket mentioned above was processed in ~7 hours using 40 threads.

[*] INFO: 2021-08-05 19:53:55,542: Completed for 25320.45 seconds.

Some more speed comparison, using the AWS CLI vs this tool on the same subset of data:

aws s3 cp --recursive --acl bucket-owner-full-control --metadata-directive 53.59s user 7.24s system 20% cpu 5:02.42 total

vs

[*] INFO: 2021-08-06 09:07:43,506: Completed for 49.09 seconds.


Comments

  • gc5, about 2 years

    I am trying to grant permissions to an existing account in s3.

    The bucket is owned by the account, but the data was copied from another account's bucket.

    When I try to grant permissions with the command:

    aws s3api put-object-acl --bucket <bucket_name> --key <folder_name> --profile <original_account_profile> --grant-full-control emailaddress=<destination_account_email>
    

    I receive the error:

    An error occurred (NoSuchKey) when calling the PutObjectAcl operation: The specified key does not exist.
    

    while if I do it on a single file the command is successful.

    How can I make it work for a full folder?

    • Bui Anh Tuan, over 6 years
      Object ACLs only apply to objects and buckets, not to folders, so you cannot define an ACL for a folder. The simplest solution is to define access at the bucket level, for example: "Resource": "arn:aws:s3:::BUCKET_NAME/*"
  • Jaffadog, about 6 years
    Copying a file to itself will fail with "This copy request is illegal because it is trying to copy an object to itself without changing the object's metadata, storage class, website redirect location or encryption attributes"; but add another dummy change like setting the storage class, and you are good to go: aws s3 cp --recursive --acl bucket-owner-full-control s3://bucket/path s3://bucket/path --storage-class STANDARD
  • Mabyn, about 5 years
    This helped me but I had to adapt it. One of my main changes was to replace GrantFullControl="id=%s" % get_account_canonical_id with ACL='bucket-owner-full-control'. This was needed because I wanted to change ACLs for objects in a bucket in a different AWS account.
  • woodyiii, about 5 years
    @Jaffadog has the most concise answer and it worked for me. Exactly right about --storage-class as a dummy attribute too. Thanks!
  • Trevor Reid, almost 5 years
    Can you elaborate on how and why this code is more efficient and compared to what?
  • Homme Zwaagstra, almost 5 years
    Can you also elaborate on why you've just copied a previous answer?
  • sskular, over 4 years
    Keep in mind that if you have a large number of files it might take a while. I didn't time it, but it took me over an hour for circa 1.6k files.
  • im7mortal, almost 3 years
    I agree with @sskular. If you have a lot of objects and you need them handled ASAP, as often happens, then prefer the force copy command from stackoverflow.com/a/63804619/2908138