How to load a pickle file from S3 to use in AWS Lambda?

Solution 1

As shown in the documentation for download_fileobj, you need to open a local file in binary write mode and download into it first. Once the file is downloaded, you can reopen it in binary read mode and unpickle it.

import pickle
import boto3

s3 = boto3.resource('s3')
with open('oldscreenurls.pkl', 'wb') as data:
    s3.Bucket("pythonpickles").download_fileobj("oldscreenurls.pkl", data)

with open('oldscreenurls.pkl', 'rb') as data:
    old_list = pickle.load(data)

download_fileobj takes the name of an object in S3 plus a handle to a local file, and saves the contents of that object to the file. There is also a version of this function called download_file that takes a filename instead of an open file handle and handles opening it for you.
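
Since the question targets Lambda, keep in mind that /tmp is the only writable location there. A minimal sketch of the download_file variant, using the same bucket and key as above (the local path is just an example):

import pickle
import boto3

s3 = boto3.resource('s3')

# download_file takes a destination path instead of a file handle
# and opens/closes the file for you; in Lambda, write under /tmp
local_path = '/tmp/oldscreenurls.pkl'
s3.Bucket("pythonpickles").download_file("oldscreenurls.pkl", local_path)

with open(local_path, 'rb') as data:
    old_list = pickle.load(data)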

In this case, though, it would probably be better to use S3.Client.get_object to avoid writing and then immediately reading a file. You could also write to an in-memory BytesIO object, which acts like a file but doesn't actually touch the disk. That would look something like this:

import pickle
import boto3
from io import BytesIO

s3 = boto3.resource('s3')
with BytesIO() as data:
    s3.Bucket("pythonpickles").download_fileobj("oldscreenurls.pkl", data)
    data.seek(0)    # move back to the beginning after writing
    old_list = pickle.load(data)
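
Alternatively, the get_object route mentioned above skips the file-like object entirely. A short sketch using the low-level client, with the same bucket and key as above:

import pickle
import boto3

s3_client = boto3.client('s3')

# get_object returns a streaming body; read() pulls the pickled bytes into memory
response = s3_client.get_object(Bucket="pythonpickles", Key="oldscreenurls.pkl")
old_list = pickle.loads(response['Body'].read())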

Solution 2

Super simple solution

import pickle
import boto3

s3 = boto3.resource('s3')
my_pickle = pickle.loads(s3.Bucket("bucket_name").Object("key_to_pickle.pickle").get()['Body'].read())
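
Since this is meant to run in Lambda, here is a minimal sketch of how the one-liner might sit inside a handler. The handler name and return value are just illustrative assumptions, and the bucket and key are the placeholders from the snippet above:

import pickle
import boto3

# create the resource at module scope so warm invocations reuse it
s3 = boto3.resource('s3')

def lambda_handler(event, context):
    # read the pickled bytes straight from S3 and deserialize in memory
    body = s3.Bucket("bucket_name").Object("key_to_pickle.pickle").get()['Body'].read()
    my_pickle = pickle.loads(body)
    # the question says the pickle is a list, so report its length as an example
    return {"items": len(my_pickle)}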

Solution 3

This is the easiest solution. Using S3FileSystem from the s3fs package, you can load the data without even downloading the file locally:

import pickle
from s3fs.core import S3FileSystem

s3_file = S3FileSystem()

# s3fs opens files in binary read mode ('rb') by default
with s3_file.open('{}/{}'.format(bucket_name, file_path)) as f:
    data = pickle.load(f)
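
For the reverse direction, a hedged sketch of writing a pickle back to S3 with the same S3FileSystem object (bucket_name, file_path, and my_object are placeholders):

import pickle
from s3fs.core import S3FileSystem

s3_file = S3FileSystem()

# 'wb' streams the pickled bytes straight to the S3 object
with s3_file.open('{}/{}'.format(bucket_name, file_path), 'wb') as f:
    pickle.dump(my_object, f)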

Comments

  • mifin (almost 2 years ago)

    I am currently trying to load a pickled file from S3 in an AWS Lambda function and store it in a list (the pickle is a list).

    Here is my code:

    import pickle
    import boto3
    
    s3 = boto3.resource('s3')
    with open('oldscreenurls.pkl', 'rb') as data:
        old_list = s3.Bucket("pythonpickles").download_fileobj("oldscreenurls.pkl", data)
    

    I get the following error even though the file exists:

    FileNotFoundError: [Errno 2] No such file or directory: 'oldscreenurls.pkl'
    

    Any ideas?

  • mifin (about 6 years ago)
    EDIT: I don't know how to use code blocks in comments. I read this and tried the get_object route before you posted your example, and the code below worked! Thanks! response = s3client.get_object(Bucket="pythonpickles", Key="oldscreenurls.pkl"); pickled_list = response['Body'].read(); old_list = pickle.loads(pickled_list)
  • Z.Wei (almost 5 years ago)
    I like this answer as it is working, simple, and straightforward.
  • Mark_Anderson (over 4 years ago)
    This is an excellent solution. It makes pickles on S3 almost as accessible as pandas' pd.read_csv integration with S3 paths.