How to update metadata of an existing object in AWS S3 using python boto3?

Solution 1

It can be done using the copy_from() method -

import boto3

s3 = boto3.resource('s3')
s3_object = s3.Object('bucket-name', 'key')
s3_object.metadata.update({'id':'value'})
s3_object.copy_from(
    CopySource={'Bucket': 'bucket-name', 'Key': 'key'},
    Metadata=s3_object.metadata,
    MetadataDirective='REPLACE',
)
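One detail worth knowing with this approach: S3 stores user-defined metadata as x-amz-meta-* HTTP headers, so keys come back lowercased regardless of how you set them. A small pure-Python sketch (normalize_user_metadata is a hypothetical helper, not part of boto3) to normalize keys before comparing or merging:

```python
def normalize_user_metadata(metadata):
    # S3 returns user-defined metadata keys in lowercase, so lowercase
    # locally-held keys too before merging, or 'Id' and 'id' will collide
    return {k.lower(): v for k, v in metadata.items()}
```

For example, `normalize_user_metadata({'Id': 'value'})` gives `{'id': 'value'}`, which matches what S3 would return after the copy.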

Solution 2

You can do this using copy_from() on the resource (as this answer mentions), but you can also use the client's copy_object() and specify the same source and destination. The methods are equivalent and invoke the same code underneath.

import boto3
s3 = boto3.client("s3")
src_key = "my-key"
src_bucket = "my-bucket"
s3.copy_object(Key=src_key, Bucket=src_bucket,
               CopySource={"Bucket": src_bucket, "Key": src_key},
               Metadata={"my_new_key": "my_new_val"},
               MetadataDirective="REPLACE")

The 'REPLACE' value specifies that the metadata passed in the request should overwrite the source metadata entirely. If you mean to only add new key-value pairs, or delete only some keys, you have to first read the original metadata, edit it locally, and then make the copy call.

To replace only a subset of the metadata correctly:

  1. Retrieve the original metadata with head_object(Key=src_key, Bucket=src_bucket). Also take note of the ETag in the response.
  2. Make desired changes to the metadata locally.
  3. Call copy_object as above to upload the new metadata, but pass CopySourceIfMatch=original_etag in the request to ensure the remote object has the metadata you expect before overwriting it. original_etag is the value you got in step 1. If the metadata (or the data itself) has changed since head_object was called (e.g. by another program running simultaneously), copy_object will fail with an HTTP 412 error.

Reference: boto3 issue 389

Solution 3

Similar to this answer, but with the existing metadata preserved while modifying only what is needed. Of the system-defined metadata, I've only preserved ContentType and ContentDisposition in this example; other system-defined metadata can be preserved similarly.

import boto3

s3 = boto3.client('s3')
bucket_name = 'my-bucket'   # placeholder: your bucket
object_name = 'my-key'      # placeholder: your object key
response = s3.head_object(Bucket=bucket_name, Key=object_name)
response['Metadata']['new_meta_key'] = "new_value"
response['Metadata']['existing_meta_key'] = "new_value"
result = s3.copy_object(Bucket=bucket_name, Key=object_name,
                        CopySource={'Bucket': bucket_name,
                                    'Key': object_name},
                        Metadata=response['Metadata'],
                        MetadataDirective='REPLACE', TaggingDirective='COPY',
                        ContentDisposition=response['ContentDisposition'],
                        ContentType=response['ContentType'])
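The same pattern extends to the other system-defined fields that copy_object accepts as top-level parameters. A sketch of a helper (preserved_system_args is a hypothetical name) that forwards whichever of them are actually present in the head_object response:

```python
# System-defined fields accepted by copy_object as top-level parameters.
SYSTEM_FIELDS = ('ContentType', 'ContentDisposition', 'ContentEncoding',
                 'ContentLanguage', 'CacheControl', 'Expires')

def preserved_system_args(head_response):
    # Only forward fields the source object actually has set; indexing
    # a missing key (as the snippet above does) raises a KeyError.
    return {f: head_response[f] for f in SYSTEM_FIELDS if f in head_response}
```

You would then call `s3.copy_object(..., MetadataDirective='REPLACE', **preserved_system_args(response))`, which also avoids the KeyError the snippet above hits when, say, ContentDisposition was never set on the source.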

Solution 4

You can either add new metadata or update an existing metadata value with a new one. Here is the code I am using:

from boto3 import client

param_1 = YOUR_ACCESS_KEY
param_2 = YOUR_SECRET_KEY
param_3 = YOUR_END_POINT
param_4 = YOUR_BUCKET

# Create the S3 client
s3ressource = client(
    service_name='s3',
    endpoint_url=param_3,
    aws_access_key_id=param_1,
    aws_secret_access_key=param_2,
    use_ssl=True,
    )
# Build a list of objects per bucket
def BuildObjectListPerBucket(variablebucket):
    global listofObjectstobeanalyzed
    listofObjectstobeanalyzed = []
    extensions = ('.jpg', '.png')
    for key in s3ressource.list_objects(Bucket=variablebucket)["Contents"]:
        onemoreObject = key['Key']
        if onemoreObject.endswith(extensions):
            listofObjectstobeanalyzed.append(onemoreObject)
        else:
            # beware: objects that do not match the extensions are deleted
            s3ressource.delete_object(Bucket=variablebucket, Key=onemoreObject)
    return listofObjectstobeanalyzed

# for a given existing object, create metadata
def createmetadata(bucketname, objectname):
    s3ressource.upload_file(objectname, bucketname, objectname,
                            ExtraArgs={"Metadata": {"metadata1": "ImageName",
                                                    "metadata2": "ImagePROPERTIES",
                                                    "metadata3": "ImageCREATIONDATE"}})

# for a given existing object, add new metadata
def ADDmetadata(bucketname, objectname):
    k = s3ressource.head_object(Bucket=bucketname, Key=objectname)
    m = k["Metadata"]
    m["new_metadata"] = "ImageNEWMETADATA"
    s3ressource.copy_object(Bucket=bucketname, Key=objectname,
                            CopySource=bucketname + '/' + objectname,
                            Metadata=m, MetadataDirective='REPLACE')

# for a given existing object, update a metadata key with a new value
def CHANGEmetadata(bucketname, objectname):
    k = s3ressource.head_object(Bucket=bucketname, Key=objectname)
    m = k["Metadata"]
    m.update({'watson_visual_rec_dic': 'ImageCREATIONDATEEEEEEEEEEEEEEEEEEEEEEEEEE'})
    s3ressource.copy_object(Bucket=bucketname, Key=objectname,
                            CopySource=bucketname + '/' + objectname,
                            Metadata=m, MetadataDirective='REPLACE')

# for a given existing object, read its metadata
def readmetadata(bucketname, objectname):
    ALLDATAOFOBJECT = s3ressource.get_object(Bucket=bucketname, Key=objectname)
    print(ALLDATAOFOBJECT['Metadata'])



# create the list of objects on a per-bucket basis
BuildObjectListPerBucket(param_4)

# call the functions to see the results
for objectitem in listofObjectstobeanalyzed:
    readmetadata(param_4, objectitem)
    ADDmetadata(param_4, objectitem)
    readmetadata(param_4, objectitem)
    CHANGEmetadata(param_4, objectitem)
    readmetadata(param_4, objectitem)
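One caveat with BuildObjectListPerBucket above: list_objects returns at most 1000 keys per call. A paginator-based variant (a sketch with the same filtering logic, minus the delete; build_object_list is a hypothetical name) scales to larger buckets:

```python
def build_object_list(s3_client, bucket, extensions=('.jpg', '.png')):
    # get_paginator transparently follows continuation tokens, so
    # buckets with more than 1000 objects are listed in full
    matching = []
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket):
        # 'Contents' is absent on empty pages, hence the .get() default
        for obj in page.get('Contents', []):
            if obj['Key'].endswith(extensions):
                matching.append(obj['Key'])
    return matching
```

Called as `build_object_list(s3ressource, param_4)`, it returns the same kind of key list without mutating a global.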

Author: arc000

Updated on July 09, 2022
Comments

  • arc000
    arc000 almost 2 years

    boto3 documentation does not clearly specify how to update the user metadata of an already existing S3 Object.

  • Philip Colmer
    Philip Colmer almost 4 years
    This doesn't seem to work entirely correctly if there is some non-user metadata on the object, e.g. Content-Type. The metadata dict only contains user metadata but, when you call copy_object with REPLACE, the non-user metadata is lost. It is necessary to explicitly "reset" those entries with the additional parameters to copy_object.
  • jckuester
    jckuester almost 4 years
    How does this command behave performance-wise? Is the file really copied or is the API smart enough to just update the metadata in place?
  • Chins Kuriakose
    Chins Kuriakose over 3 years
    Hey @jckuester , did you get any answer for this question?
  • jckuester
    jckuester over 3 years
    @ChinsKuriakose the AWS docs say "You can set object metadata at the time you upload it. After you upload the object, you cannot modify object metadata." (docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html). So, metadata is basically immutable and therefore the object is really copied when updating it.
  • Prateek Naik
    Prateek Naik almost 3 years
    Can we update the system-defined metadata? I tried the solution, but it goes in as user-defined. Any help would be appreciated.