Delete all versions of an object in S3 using python?

18,372

Solution 1

As a supplement to @jarmod's answer, here is a workaround I developed for "hard deleting" an object, delete markers included:

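import boto3

s3 = boto3.client('s3')

def get_all_versions(bucket, filename):
    # Collect the version IDs of both regular versions and delete markers.
    keys = ["Versions", "DeleteMarkers"]
    results = []
    # Prefix narrows the listing; a single call returns at most 1000 entries,
    # so a key with more versions than that needs pagination.
    response = s3.list_object_versions(Bucket=bucket, Prefix=filename)
    for k in keys:
        # A key is absent from the response when it has no entries.
        to_delete = [r["VersionId"] for r in response.get(k, []) if r["Key"] == filename]
        results.extend(to_delete)
    return results

bucket = "YOUR BUCKET NAME"
file = "YOUR FILE"

for version in get_all_versions(bucket, file):
    s3.delete_object(Bucket=bucket, Key=file, VersionId=version)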

Solution 2

The other answers delete objects individually. It is more efficient to batch the deletes with the boto3 delete_objects call. The code below collects every object version and delete marker in the bucket (not just those of a single key) and deletes them in batches of 1000, the maximum a single delete_objects request accepts:

import boto3

bucket = 'bucket-name'
s3_client = boto3.client('s3')
object_response_paginator = s3_client.get_paginator('list_object_versions')

delete_marker_list = []
version_list = []

for object_response_itr in object_response_paginator.paginate(Bucket=bucket):
    if 'DeleteMarkers' in object_response_itr:
        for delete_marker in object_response_itr['DeleteMarkers']:
            delete_marker_list.append({'Key': delete_marker['Key'], 'VersionId': delete_marker['VersionId']})

    if 'Versions' in object_response_itr:
        for version in object_response_itr['Versions']:
            version_list.append({'Key': version['Key'], 'VersionId': version['VersionId']})

for i in range(0, len(delete_marker_list), 1000):
    response = s3_client.delete_objects(
        Bucket=bucket,
        Delete={
            'Objects': delete_marker_list[i:i+1000],
            'Quiet': True
        }
    )
    print(response)

for i in range(0, len(version_list), 1000):
    response = s3_client.delete_objects(
        Bucket=bucket,
        Delete={
            'Objects': version_list[i:i+1000],
            'Quiet': True
        }
    )
    print(response)
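
One of the comments asks whether a single combined list would work instead of two. Since delete_objects only needs Key/VersionId pairs, it should. Below is a minimal sketch under that assumption. The helper names (collect_identifiers, batches, delete_all) are hypothetical, and client is assumed to be a boto3.client('s3') instance passed in by the caller.

```python
def collect_identifiers(pages):
    """Flatten list_object_versions pages into {'Key', 'VersionId'} dicts."""
    identifiers = []
    for page in pages:
        # Either field is absent when a page has no entries of that kind.
        for entry in page.get('Versions', []) + page.get('DeleteMarkers', []):
            identifiers.append({'Key': entry['Key'], 'VersionId': entry['VersionId']})
    return identifiers


def batches(items, size=1000):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_all(client, bucket):
    """Delete every version and delete marker in `bucket`; return reported errors."""
    paginator = client.get_paginator('list_object_versions')
    identifiers = collect_identifiers(paginator.paginate(Bucket=bucket))
    errors = []
    for batch in batches(identifiers):
        response = client.delete_objects(
            Bucket=bucket, Delete={'Objects': batch, 'Quiet': True})
        # In quiet mode the response lists only the deletions that failed.
        errors.extend(response.get('Errors', []))
    return errors
```

Calling delete_all(boto3.client('s3'), 'bucket-name') would empty the bucket. Returning the collected errors instead of printing each response makes it easier to check whether anything failed.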

Solution 3

I had trouble using the other solutions to this question so here's mine.

import boto3
bucket = "bucket name goes here"
filename = "filename goes here"

client = boto3.client('s3')
paginator = client.get_paginator('list_object_versions')
response_iterator = paginator.paginate(Bucket=bucket)
for response in response_iterator:
    versions = response.get('Versions', [])
    versions.extend(response.get('DeleteMarkers', []))
    for version_id in [x['VersionId'] for x in versions
                       if x['Key'] == filename and x['VersionId'] != 'null']:
        print('Deleting {} version {}'.format(filename, version_id))
        client.delete_object(Bucket=bucket, Key=filename, VersionId=version_id)

This code deals with the cases where

  • object versioning isn't actually turned on
  • there are DeleteMarkers
  • there are no DeleteMarkers
  • there are more versions of a given file than fit in a single API response

Mahesh Mogal's answer doesn't delete DeleteMarkers. Mangohero1's answer fails if the object is missing a DeleteMarker. Hari's answer repeats 10 times (to work around missing pagination logic).

Solution 4

The documentation is helpful here:

  1. When versioning is enabled in an S3 bucket, a simple DeleteObject request cannot permanently delete an object from that bucket. Instead, Amazon S3 inserts a delete marker (which is effectively a new version of the object with its own version ID).
  2. When you try to GET an object whose current version is a delete marker, S3 behaves as if the object has been deleted (even though it has not) and returns a 404 error.
  3. To permanently delete an object from a versioned bucket, use DeleteObject, with the relevant version ID, for each and every version of the object (and that includes the delete markers).
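
The second point can be checked from code. Below is a minimal sketch, assuming client is a boto3.client('s3') instance supplied by the caller. The function name is_soft_deleted is hypothetical. It relies on the IsLatest flag that list_object_versions reports for each delete marker.

```python
def is_soft_deleted(client, bucket, key):
    """Return True if the current version of `key` is a delete marker."""
    paginator = client.get_paginator('list_object_versions')
    # Prefix narrows the listing, but it also matches longer keys,
    # so the exact key is compared below.
    for page in paginator.paginate(Bucket=bucket, Prefix=key):
        for marker in page.get('DeleteMarkers', []):
            if marker['Key'] == key and marker['IsLatest']:
                return True
    return False
```

If this returns True, a plain GET for the key fails with a 404 even though the older versions are still stored, so each version still needs its own DeleteObject call.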

Solution 5

You can use object_versions.

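import boto3
from typing import Optional

def delete_all_versions(bucket_name: str, prefix: Optional[str] = None):
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    if prefix is None:
        # No prefix: delete every version and delete marker in the bucket.
        bucket.object_versions.delete()
    else:
        # Delete every version and delete marker whose key starts with the prefix.
        bucket.object_versions.filter(Prefix=prefix).delete()

delete_all_versions("my_bucket", None) # empties the entire bucket
delete_all_versions("my_bucket", "my_prefix/") # deletes all versions of every object whose key starts with the prefix (a single object if only one key matches)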
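
Because Prefix matches every key that starts with the given string, a call meant for one object can also remove others (a hypothetical "a.txt" would also match "a.txt.bak"). Below is a minimal sketch of an exact-key variant. It assumes bucket is a boto3.resource('s3').Bucket(...) object passed in by the caller, and the function name delete_exact_key is hypothetical.

```python
def delete_exact_key(bucket, key):
    """Delete every version and delete marker whose key equals `key`.

    Returns the number of versions deleted.
    """
    deleted = 0
    # filter(Prefix=...) also returns longer keys that start with `key`,
    # so each version's key is compared before deleting it.
    for version in bucket.object_versions.filter(Prefix=key):
        if version.object_key == key:
            version.delete()
            deleted += 1
    return deleted
```

This issues one request per version, so it trades the batching of the collection-level delete() for an exact match on the key.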

rooscous
Author by

rooscous

I like to code and learn new things.

Updated on September 18, 2022

Comments

  • rooscous
    rooscous over 1 year

    I have a versioned bucket and would like to delete the object (and all of its versions) from the bucket. However, when I try to delete the object from the console, S3 simply adds a delete marker but does not perform a hard delete.

    Is it possible to delete all versions of the object (hard delete) with a particular key?

    s3resource = boto3.resource('s3')
    bucket = s3resource.Bucket('my_bucket')
    obj = bucket.Object('my_object_key')
    
    # I would like to delete all versions for the object like so:
    obj.delete_all_versions()
    
    # or delete all versions for all objects like so:
    bucket.objects.delete_all_versions()
    
    • rooscous
      rooscous over 6 years
      ok, I will have to test this. It has an optional VersionId parameter, so that makes me think that if I do not explicitly provide the version id for each object, it will just perform a soft delete (delete marker only).
    • jarmod
      jarmod over 6 years
      I highly recommend the documentation: docs.aws.amazon.com/AmazonS3/latest/dev/….
  • rooscous
    rooscous over 6 years
    Hi jarmod. Thanks for the response -- I did read the documentation, but I wanted to know if this is really the only way to do it (i.e. get a list of all versions of the object and delete each one iteratively).
  • gene_wood
    gene_wood over 5 years
    This solution doesn't delete DeleteMarkers. I added a solution here that accounts for the case where a file does or doesn't have a DeleteMarker
  • gene_wood
    gene_wood about 5 years
    Excellent, I didn't know about delete_objects. Out of curiosity, why do you build the two lists, delete_marker_list and version_list independently and then iterate over them independently? Would it work to just build one list combining all versions and delete markers, then iterate over that single list in steps of 1000?
  • AndrewC
    AndrewC about 5 years
    Just for clarity.
  • San
    San almost 5 years
    This should really be upvoted more as it sheds light on the actual workings through documentation.
  • Evgeny Goldin
    Evgeny Goldin about 4 years
    Thanks a lot! Used it to nuke all not latest versions (not version['IsLatest']) after disabling bucket versioning.
  • Emma Y
    Emma Y almost 4 years
    This is a really great example. Thank you.