Reading data from S3 using Lambda


Solution 1

You can use bucket.objects.all() to get all the objects in the bucket. There are also alternative methods such as filter, page_size and limit, depending on your needs.

These methods return an iterable collection of S3.ObjectSummary objects. From there you can call object.get() on each one to retrieve the file.
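A minimal sketch of Solution 1, assuming `bucket` is a boto3 Bucket resource such as `boto3.resource("s3").Bucket("jsondata")` (the bucket name from the question). The `prefix` parameter and the rule of skipping keys that do not end in `.json` are illustrative assumptions, not part of the original answer.

```python
import json


def read_json_objects(bucket, prefix=""):
    """Yield (key, parsed JSON) for every .json object under `prefix`.

    Assumes `bucket` is a boto3 Bucket resource, e.g.
    boto3.resource("s3").Bucket("jsondata").
    """
    # objects.filter() lazily pages through the listing and yields
    # ObjectSummary items; Prefix narrows the listing on the S3 side.
    for summary in bucket.objects.filter(Prefix=prefix):
        if not summary.key.endswith(".json"):
            continue  # assumption: only JSON files are of interest
        # ObjectSummary.get() issues a GetObject call; "Body" is a stream.
        body = summary.get()["Body"].read()
        # json.loads accepts the raw bytes (UTF-8) directly.
        yield summary.key, json.loads(body)
```

Usage would look like `for key, results in read_json_objects(bucket): ...`. Because the collection is lazy, nothing is listed or downloaded until you iterate.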

Solution 2

import boto3

s3 = boto3.client('s3')
# bucket and key identify the object to read, e.g. taken from the S3 event record
response = s3.get_object(Bucket=bucket, Key=key)
content = response['Body'].read().decode('utf-8')
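Since the files in the question are JSON, the decoded text from Solution 2 still needs to be parsed. A minimal sketch, assuming `s3_client` is `boto3.client("s3")`; the helper name `load_json_object` is hypothetical.

```python
import json


def load_json_object(s3_client, bucket, key):
    """Fetch one object with GetObject and parse it as JSON.

    Assumes `s3_client` is boto3.client("s3").
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # response["Body"] is a StreamingBody; read() returns bytes.
    text = response["Body"].read().decode("utf-8")
    # For a file containing a single list, this returns a Python list.
    return json.loads(text)
```

Passing the client in as a parameter (rather than creating it inside the function) makes it easy to reuse one client across calls.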
Author: LearningSlowly

PhD student and civil engineer, now lost in the world of computers.

Updated on July 09, 2022

Comments

  • LearningSlowly (almost 2 years)

    I have a range of JSON files stored in an S3 bucket on AWS.

    I want to use an AWS Lambda function written in Python to parse this JSON and send the parsed results to an AWS RDS MySQL database.

    I have a stable Python script that does the parsing and writes to the database. I need the Lambda script to iterate through the JSON files as they are added.

    Each JSON file contains a single list: results = [content]

    In pseudo-code what I want is:

    1. Connect to the S3 bucket (jsondata)
    2. Read the contents of the JSON file (results)
    3. Execute my script for this data (results)
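The three steps above map onto a Lambda handler triggered by an S3 "object created" notification. This is a sketch under stated assumptions: `process_results` is a hypothetical stand-in for the existing parse-and-write-to-MySQL script, and the optional `s3_client` parameter is added only so the handler can be exercised with a stub. It also assumes the bucket's event notification is configured to invoke the function and that the function's role allows `s3:GetObject`.

```python
import json
from urllib.parse import unquote_plus


def process_results(results):
    """Hypothetical placeholder for the existing parsing/database script."""
    return len(results)


def handler(event, context, s3_client=None):
    """Lambda entry point for S3 object-created notifications."""
    if s3_client is None:
        # Imported lazily so the handler can be tested with a stub client.
        import boto3
        s3_client = boto3.client("s3")
    processed = []
    for record in event["Records"]:
        # 1. Identify the bucket and object; keys in S3 events are URL-encoded.
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        # 2. Read the JSON file straight from S3 (no local download needed).
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        results = json.loads(body)
        # 3. Hand the parsed list to the existing script.
        processed.append(process_results(results))
    return processed
```

Lambda calls `handler(event, context)`, so the extra parameter keeps its default. In a real deployment you would usually create the client once at module level so warm invocations reuse it.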

    I can list the buckets I have by:

    import boto3
    
    s3 = boto3.resource('s3')
    
    for bucket in s3.buckets.all():
        print(bucket.name)
    

    Giving:

    jsondata
    

    But I cannot access this bucket to read its results.

    There doesn't appear to be a read or load function.

    I wish for something like

    for bucket in s3.buckets.all():
       print(bucket.contents)
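The closest thing to the hoped-for `bucket.contents` is listing the bucket's object keys. A minimal sketch using the client API, assuming `s3_client` is `boto3.client("s3")`; the helper name `list_keys` is hypothetical. A single ListObjectsV2 call returns at most 1,000 keys, so a paginator is used to follow continuation tokens.

```python
def list_keys(s3_client, bucket, prefix=""):
    """Return every object key in `bucket` under `prefix`.

    Assumes `s3_client` is boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from a page when nothing matches.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

With the resource API used in the question, `s3.Bucket('jsondata').objects.all()` gives the same listing as ObjectSummary items.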
    

    EDIT

    I am misunderstanding something. Rather than reading the file directly in S3, the Lambda function must download it itself.

    From here it seems that you must give Lambda a download path, from which it can access the files itself:

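    import os
    import uuid
    from urllib.parse import unquote_plus
    
    import boto3
    
    s3_client = boto3.client('s3')
    
    def process_file(path):
        # hypothetical placeholder for the existing parsing/database code
        pass
    
    def handler(event, context):
        for record in event['Records']:
            bucket = record['s3']['bucket']['name']
            # object keys in S3 event notifications are URL-encoded
            key = unquote_plus(record['s3']['object']['key'])
            # keys may contain '/', so use only the file name for the local path
            download_path = '/tmp/{}{}'.format(uuid.uuid4(), os.path.basename(key))
            s3_client.download_file(bucket, key, download_path)
            process_file(download_path)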
    
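If you do take the download route, the local file still has to be opened and parsed, and it is worth removing it afterwards because /tmp is limited in size and persists across warm invocations of the same execution environment. A minimal sketch, assuming `s3_client` is `boto3.client("s3")`; the helper name `download_and_parse` and the `tmp_dir` parameter are hypothetical. Reading with `get_object`, as in Solution 2, avoids the download entirely.

```python
import json
import os
import uuid


def download_and_parse(s3_client, bucket, key, tmp_dir="/tmp"):
    """Download one object to Lambda's writable /tmp space and parse it as JSON."""
    # Keys may contain "/", so only the final path component is used locally.
    name = "{}-{}".format(uuid.uuid4(), os.path.basename(key))
    local_path = os.path.join(tmp_dir, name)
    s3_client.download_file(bucket, key, local_path)
    try:
        with open(local_path, encoding="utf-8") as f:
            return json.load(f)
    finally:
        # Free the space even if parsing fails.
        os.remove(local_path)
```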
  • ScottMcC (over 6 years)
    It should also be noted that you need to create an S3 client to use in your answer, i.e. s3 = boto3.client('s3')