Read excel file from S3 into Pandas DataFrame

10,229

Solution 1

It is perfectly normal! obj is a dictionnary, have u tried ?

df = pd.read_excel(obj['body'], header=2)

Solution 2

Pandas now supports s3 URL as a file path so it can read the excel file directly from s3 without downloading it first.

See here for a CSV example - https://stackoverflow.com/a/51777553/52954

Solution 3

try pd.read_excel(obj['Body'].read())

Share:
10,229
Raj
Author by

Raj

Updated on June 27, 2022

Comments

  • Raj
    Raj almost 2 years

    I have a SNS notification setup that triggers a Lambda function when a .xlsx file is uploaded to S3 bucket.

    The lambda function reads the .xlsx file into Pandas DataFrame.

    import os 
    import pandas as pd
    import json
    import xlrd
    import boto3
    
    def main(event, context):
        message = event['Records'][0]['Sns']['Message']
        parsed_message = json.loads(message)
        src_bucket = parsed_message['Records'][0]['s3']['bucket']['name']
        filepath = parsed_message['Records'][0]['s3']['object']['key']
    
        s3 = boto3.resource('s3')
        s3_client = boto3.client('s3')
    
        obj = s3_client.get_object(Bucket=src_bucket, Key=filepath)
        print(obj['Body'])
    
        df = pd.read_excel(obj, header=2)
        print(df.head(2))
    

    I get an error as below:

    Invalid file path or buffer object type: <type 'dict'>: ValueError
    Traceback (most recent call last):
    File "/var/task/handler.py", line 26, in main
    df = pd.read_excel(obj, header=2)
    File "/var/task/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
    File "/var/task/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
    File "/var/task/pandas/io/excel.py", line 307, in read_excel
    io = ExcelFile(io, engine=engine)
    File "/var/task/pandas/io/excel.py", line 376, in __init__
    io, _, _, _ = get_filepath_or_buffer(self._io)
    File "/var/task/pandas/io/common.py", line 218, in get_filepath_or_buffer
    raise ValueError(msg.format(_type=type(filepath_or_buffer)))
    ValueError: Invalid file path or buffer object type: <type 'dict'>
    

    How can I resolve this?

  • Raj
    Raj over 5 years
    That was it. df = pd.read_excel(obj['body'], header=2). Your post is missing closing ] for 'body'. Thank you for the help.
  • Tarik Elkalai
    Tarik Elkalai over 5 years
    My pleasure :) P.S: I have added the ]