Python - How to read CSV file retrieved from S3 bucket?


Solution 1

csv.reader does not require a file. It can use anything that iterates through lines, including files and lists.

So you don't need a filename. Just pass the lines from response['Body'] directly into the reader. One way to do that is:

# On Python 3, read() returns bytes, so decode first (assuming the file is UTF-8)
lines = response['Body'].read().decode('utf-8').splitlines(True)
reader = csv.reader(lines)
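
To make the point concrete without touching AWS, here is a minimal sketch that feeds a list of decoded lines to csv.reader. The helper name parse_csv_bytes and the sample bytes are made up for illustration; the bytes stand in for whatever response['Body'].read() would return, and UTF-8 is an assumed encoding.

```python
import csv


def parse_csv_bytes(data, encoding="utf-8"):
    """Decode raw CSV bytes and return the rows as lists of strings."""
    # splitlines(True) keeps the line endings, so a quoted field that
    # spans several lines is still reassembled correctly by csv.reader.
    lines = data.decode(encoding).splitlines(True)
    return list(csv.reader(lines))


# Hypothetical payload standing in for response['Body'].read()
sample = b'name,note\r\nAda,"line one\nline two"\r\nLinus,plain\r\n'
rows = parse_csv_bytes(sample)
for row in rows:
    print(row)
```

No file object or filename is involved at any point: the reader only ever sees a list of strings.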

Solution 2

To retrieve and read a CSV file from an S3 bucket, you can use the following code:

import csv
import boto3
from django.conf import settings

bucket_name = "your-bucket-name"
file_name = "your-file-name-exists-in-that-bucket.csv"

s3 = boto3.resource('s3', aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)

bucket = s3.Bucket(bucket_name)

obj = bucket.Object(key=file_name)

response = obj.get()
lines = response['Body'].read().decode('utf-8').splitlines(True)

reader = csv.DictReader(lines)
for row in reader:
    # csv_header_key1 and csv_header_key2 are column names from your CSV header row
    print(row['csv_header_key1'], row['csv_header_key2'])
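
The code above reads the whole object into memory before splitting it. For larger files, one possible alternative is to decode the body incrementally with codecs.getreader, which only needs an object with a read() method (StreamingBody provides one). This is a sketch, not part of the original answer: iter_dict_rows is a hypothetical helper, io.BytesIO stands in for response['Body'], and UTF-8 is assumed.

```python
import codecs
import csv
import io


def iter_dict_rows(body, encoding="utf-8"):
    """Yield CSV rows as dicts, decoding the byte stream as it is read."""
    # codecs.getreader returns a StreamReader class; wrapping the body with
    # it gives an iterator over decoded text lines that csv can consume.
    text_stream = codecs.getreader(encoding)(body)
    return csv.DictReader(text_stream)


# io.BytesIO is a stand-in for response['Body'] so the sketch runs locally
fake_body = io.BytesIO(b"id,name\n1,Ada\n2,Linus\n")
streamed_rows = list(iter_dict_rows(fake_body))
for row in streamed_rows:
    print(row["id"], row["name"])
```

With a real response you would call iter_dict_rows(response['Body']) and loop over the result instead of building a list.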
Author by

Louis

Software Development Engineer.

Updated on June 09, 2022

Comments

  • Louis, almost 2 years ago

    There's a CSV file in an S3 bucket that I want to parse and turn into a dictionary in Python. Using Boto3, I called s3.get_object(Bucket=<bucket_name>, Key=<key>), which returns a dictionary that includes a "Body": StreamingBody() key-value pair that apparently contains the data I want.

    In my Python file, I've added import csv. The examples I see online for reading a CSV file all pass a file name, such as:

    with open(<csv_file_name>, mode='r') as file:
        reader = csv.reader(file)

    However, I'm not sure how to retrieve the CSV file name from the StreamingBody, if that's even possible. If not, is there a better way for me to read the CSV file in Python? Thanks!

    Edit: Wanted to add that I'm doing this in AWS Lambda and there are documented issues with using pandas in Lambda, so this is why I wanted to use the csv library and not pandas.
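
For the setup described in the question (a boto3 client and get_object, with only the standard csv module), the pieces could be combined roughly as follows. This is a sketch under assumptions: load_csv_as_dicts and FakeS3Client are hypothetical names, the bucket and key are placeholders, and the file is assumed to be UTF-8. The fake client exists only so the example runs without AWS credentials.

```python
import csv
import io


def load_csv_as_dicts(s3_client, bucket, key, encoding="utf-8"):
    """Fetch one object and return its CSV rows as a list of dicts."""
    # get_object takes keyword arguments and returns a dict whose
    # "Body" entry is a readable byte stream.
    response = s3_client.get_object(Bucket=bucket, Key=key)
    text = response["Body"].read().decode(encoding)
    # newline="" leaves line endings untouched so csv can interpret them
    return list(csv.DictReader(io.StringIO(text, newline="")))


class FakeS3Client:
    """Hypothetical stand-in that mimics the shape of get_object's result."""

    def __init__(self, payload):
        self._payload = payload

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self._payload)}


records = load_csv_as_dicts(
    FakeS3Client(b"id,name\n1,Ada\n2,Linus\n"), "demo-bucket", "people.csv"
)
print(records)
```

In real use you would pass boto3.client("s3") in place of the fake client; no pandas is required.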