How to save S3 object to a file using boto3
Solution 1
There is a customization that went into Boto3 recently which helps with this (among other things). It is currently exposed on the low-level S3 client, and can be used like this:
import boto3

s3_client = boto3.client('s3')

# Create a local file to upload (note the 'w' mode)
with open('hello.txt', 'w') as f:
    f.write('Hello, world!')

# Upload the file to S3
s3_client.upload_file('hello.txt', 'MyBucket', 'hello-remote.txt')

# Download the file from S3
s3_client.download_file('MyBucket', 'hello-remote.txt', 'hello2.txt')
print(open('hello2.txt').read())
These functions will automatically handle reading/writing files as well as doing multipart uploads in parallel for large files.
Note that s3_client.download_file won't create a directory for you. If needed, create it first with pathlib.Path('/path/to/file.txt').parent.mkdir(parents=True, exist_ok=True).
Solution 2
boto3 now has a nicer interface than the client:
import boto3

resource = boto3.resource('s3')
my_bucket = resource.Bucket('MyBucket')
# key is the S3 object key, local_filename the destination path
my_bucket.download_file(key, local_filename)
This by itself isn't tremendously better than the client in the accepted answer (although the docs say that it does a better job retrying uploads and downloads on failure), but considering that resources are generally more ergonomic (for example, the s3 bucket and object resources are nicer than the client methods), this does allow you to stay at the resource layer without having to drop down.
Resources can generally be created in the same way as clients; they take all or most of the same arguments and just forward them to their internal clients.
Solution 3
For those of you who would like to simulate the boto2-style set_contents_from_string method, you can try:
import boto3
from cStringIO import StringIO
s3c = boto3.client('s3')
contents = 'My string to save to S3 object'
target_bucket = 'hello-world.by.vor'
target_file = 'data/hello.txt'
fake_handle = StringIO(contents)

# fake_handle behaves like a file object, so .read() returns the contents
s3c.put_object(Bucket=target_bucket, Key=target_file, Body=fake_handle.read())
For Python 3:
In Python 3, the StringIO and cStringIO modules are gone. Import StringIO from io instead:
from io import StringIO
To support both versions:
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
Solution 4
# Preface: the object is a JSON file with contents: {"name": "Android", "status": "ERROR"}
import boto3
import io
import json

s3 = boto3.resource('s3')
obj = s3.Object('my-bucket', 'key-to-file.json')
data = io.BytesIO()
obj.download_fileobj(data)
# data now holds the raw bytes; convert them to a dict:
new_dict = json.loads(data.getvalue().decode("utf-8"))
print(new_dict['status'])
# Should print "ERROR"
Solution 5
If you wish to download a version of a file, you need to use get_object.
import boto3

bucket = 'bucketName'
prefix = 'path/to/file/'
filename = 'fileName.ext'

s3c = boto3.client('s3')
s3r = boto3.resource('s3')

if __name__ == '__main__':
    for version in s3r.Bucket(bucket).object_versions.filter(Prefix=prefix + filename):
        file = version.get()
        version_id = file.get('VersionId')
        obj = s3c.get_object(
            Bucket=bucket,
            Key=prefix + filename,
            VersionId=version_id,
        )
        # Write each version to its own local file, streamed in chunks
        with open(f"{filename}.{version_id}", 'wb') as f:
            for chunk in obj['Body'].iter_chunks(chunk_size=4096):
                f.write(chunk)
Ref: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/response.html
Vor
Updated on October 28, 2020
Comments
-
Vor over 3 years
I'm trying to do a "hello world" with the new boto3 client for AWS.
The use-case I have is fairly simple: get object from S3 and save it to the file.
In boto 2.X I would do it like this:
import boto
key = boto.connect_s3().get_bucket('foo').get_key('foo')
key.get_contents_to_filename('/tmp/foo')
In boto 3, I can't find a clean way to do the same thing, so I'm manually iterating over the "Streaming" object:
import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'wb') as f:
    chunk = key['Body'].read(1024 * 8)
    while chunk:
        f.write(chunk)
        chunk = key['Body'].read(1024 * 8)
or
import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'wb') as f:
    for chunk in iter(lambda: key['Body'].read(4096), b''):
        f.write(chunk)
And it works fine. I was wondering is there any "native" boto3 function that will do the same task?
-
Rahul KP over 8 years
@Daniel: Thanks for your reply. Could you update the answer to show how to upload a file using multipart upload in boto3?
-
Daniel over 8 years
@RahulKumarPatle the upload_file method will automatically use multipart uploads for large files.
-
blehman over 8 years
@Daniel - Regarding multipart_upload, I created a SO question. The upload_file method doesn't seem to automatically use multipart upload for file sizes that exceed the multipart_threshold configuration; at least, I haven't been able to get it to work that way. I'd love to be wrong! Any help is greatly appreciated.
-
JHowIX over 8 years
How do you pass your credentials using this approach?
-
Daniel over 8 years
@JHowIX you can either configure the credentials globally (e.g. see boto3.readthedocs.org/en/latest/guide/…) or you can pass them when creating the client. See boto3.readthedocs.org/en/latest/reference/core/… for more info on available options!
-
Vlad Nikiporoff almost 8 years
It's beyond my understanding how the .upload_file and .download_file argument orders are not the same.
-
jkdev over 7 years
@VladNikiporoff "Upload from source to destination". "Download from source to destination".
-
jkdev over 7 years
That's the answer. Here's the question: "How do you save a string to an S3 object using boto3?"
-
Miles Erickson about 7 years
Never put your AWS_ACCESS_KEY_ID or your AWS_SECRET_ACCESS_KEY in your code. These should be defined with the awscli aws configure command, and they will be found automatically by botocore.
-
Felix about 7 years
For Python 3 I had to use: import io; fake_handle = io.StringIO(contents)
-
SMX almost 7 years
Great example, and to add in since the original question asks about saving an object, the relevant method here is my_bucket.upload_file() (or my_bucket.upload_fileobj() if you have a BytesIO object).
-
Asclepius over 4 years
Exactly where do the docs say that resource does a better job at retrying? I couldn't find any such indication.
-
Dave Liu about 4 years
Doesn't work. NameError: name '_s3_path_split' is not defined
-
Martin Thoma about 4 years
@DaveLiu Thank you for the hint; I've adjusted the code. The package should have worked before, though.
-
Marilu almost 4 years
This code will not download from inside an S3 folder; is there a way to do that using this approach?
-
some_programmer almost 3 years
Hey @Daniel, a follow-up question to the statement "s3_client.download_file won't create a directory". What happens when this is called using API Gateway + Lambda? Where will it store the file if I set the Filename parameter to /tmp/filename?