how to download S3 file in Serverless Lambda (Python)

Solution 1

If you need to download the object to disk, you can use tempfile and download_fileobj to save it:

import tempfile

with tempfile.TemporaryFile() as f:
    s3.meta.client.download_fileobj(const.bucket_name,
                                    'class/raw/photo/' + message['photo_name'],
                                    f)
    f.seek(0)
    # continue processing f

Note that Lambda's writable temporary storage (/tmp, which tempfile uses by default) is limited to 512 MB.
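If a downstream library needs a real file path rather than a file object, keep in mind that /tmp is the only writable directory in the Lambda filesystem. A minimal sketch of pointing download_file there, reusing the bucket and key variables from the question:

import os

# /tmp is the only writable location in the Lambda execution environment
local_path = os.path.join('/tmp', message['photo_name'])
s3.meta.client.download_file(const.bucket_name,
                             'class/raw/photo/' + message['photo_name'],
                             local_path)
# continue processing the file at local_path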

I would argue an even better way is to process it all in memory. Instead of tempfile, you can use io in a very similar fashion:

import io

data_stream = io.BytesIO()
s3.meta.client.download_fileobj(const.bucket_name,
                                'class/raw/photo/' + message['photo_name'],
                                data_stream)
data_stream.seek(0)

This way, the data never needs to be written to disk, which is (a) faster and (b) lets you process bigger files, essentially until you reach Lambda's memory limit of 3008 MB.
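Since the question's goal is to resize the image and upload the result to another bucket, here is a minimal end-to-end sketch of that in-memory flow. It assumes Pillow is bundled with the deployment package, and the bucket names, keys and target size are hypothetical placeholders:

import io

import boto3
from PIL import Image  # assumes Pillow is included in the deployment package

s3 = boto3.resource('s3')

def resize_photo(source_bucket, source_key, dest_bucket, dest_key, size=(800, 600)):
    # Download the object entirely into memory
    data_stream = io.BytesIO()
    s3.meta.client.download_fileobj(source_bucket, source_key, data_stream)
    data_stream.seek(0)

    # Resize with Pillow, still in memory
    image = Image.open(data_stream).convert('RGB')
    image.thumbnail(size)
    out_stream = io.BytesIO()
    image.save(out_stream, format='JPEG')
    out_stream.seek(0)

    # Upload the resized image to the destination bucket
    s3.meta.client.upload_fileobj(out_stream, dest_bucket, dest_key)

# e.g. resize_photo('source-bucket', 'class/raw/photo/Student001.JPG',
#                   'resized-bucket', 'class/resized/photo/Student001.JPG')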

Solution 2

In one of my projects I converted webp files to jpg. You can refer to the following GitHub repository to get some understanding:

https://github.com/adjr2/webp-to-jpg/blob/master/codes.py

You can directly access the file you download inside the Lambda function. I am not sure whether you can create a new folder (I am fairly new to all this myself), but you can certainly manipulate the file and upload it back to the same (or a different) S3 bucket.
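For that upload step, a minimal sketch assuming the converted file was written to /tmp; the local filename, target bucket and target key are hypothetical placeholders:

import boto3

s3_client = boto3.client('s3')

# Upload the converted file from /tmp back to S3
s3_client.upload_file('/tmp/converted.jpg',             # local file written earlier
                      'my-destination-bucket',          # target bucket
                      'class/processed/converted.jpg')  # target key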

Hope it helps. Cheers!

Comments

  • Phong Vu
    Phong Vu almost 2 years

    I created a Lambda in Python (using Serverless) that is triggered by an SQS message.

    handler.py

    import json
    import logging

    import boto3

    import const

    s3 = boto3.resource('s3')
    
    def process(event, context):
        response = None
        # for record in event['Records']:
        record = event['Records'][0]
        message = dict()
        try:
            message = json.loads(record['body'])
    
            s3.meta.client.download_file(const.bucket_name, 'class/raw/photo/' + message['photo_name'], const.raw_filepath + message['photo_name'])    
    
            ...
    
            response = {
                "statusCode": 200,
                "body": json.dumps(event)
            }
    
        except Exception as ex:
            error_msg = 'JOB_MSG: {}, EXCEPTION: {}'.format(message, ex)
            logging.error(error_msg)
    
            response = {
                    "statusCode": 500,
                    "body": json.dumps(ex)
                }
    
        return response
    

    const.py

    bucket_name = 'test'
    raw_filepath = '/var/task/raw/'
    

    I created a folder "raw" at the same level as handler.py, then deployed the Serverless Lambda.

    I get an error (from CloudWatch) when the Lambda is triggered:

    No such file or directory: u'/var/task/raw/Student001.JPG.94BBBAce'
    

    As I understand it, the Lambda deployment directory is not writable, or folders cannot be created in it at runtime.

    In case there is a better practice I should follow, these are the objectives of the Lambda:

    • download S3 raw file
    • resize file and upload new file to another S3 bucket

    Any suggestion is appreciated.

  • scottlittle
    scottlittle over 3 years
    For my method, I needed to consume data_stream with read(): package.my_method(data_stream.read())
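
    That is, some downstream functions expect raw bytes rather than a file-like object. A minimal sketch of the distinction, with hypothetical process_bytes / process_fileobj helper names:

    # read() returns the buffered bytes for APIs that expect raw bytes
    process_bytes(data_stream.read())

    # other APIs accept the file-like object itself; rewind it first
    data_stream.seek(0)
    process_fileobj(data_stream)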