Writing a pickle file to an s3 bucket in AWS
Solution 1
I've found the solution: for pickle files you need to use a BytesIO buffer instead of StringIO (which is for text formats such as CSV).
import io
import boto3

bucket = 'your_bucket_name'
key = 'your_pickle_filename.pkl'
pickle_buffer = io.BytesIO()  # binary buffer; StringIO is text-only
s3_resource = boto3.resource('s3')
# new_df is the dataframe from the question; on older pandas versions,
# pass compression=None here to avoid "Unrecognized compression type: infer"
new_df.to_pickle(pickle_buffer)
s3_resource.Object(bucket, key).put(Body=pickle_buffer.getvalue())
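The key point here, that pickled data must go through a binary buffer, can be checked locally without touching S3. This is a minimal sketch using only the standard library (the names data and buf are illustrative):

```python
import io
import pickle

data = {"a": [1, 2, 3]}
buf = io.BytesIO()        # binary buffer: pickle writes bytes, not str
pickle.dump(data, buf)    # an io.StringIO here would raise a TypeError
restored = pickle.loads(buf.getvalue())
assert restored == data
```

The same getvalue() bytes are what gets handed to put(Body=...) in the snippet above.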
Solution 2
Further to your answer, you don't need to convert to CSV. The pickle.dumps method returns a bytes object; see https://docs.python.org/3/library/pickle.html
import boto3
import pickle

bucket = 'your_bucket_name'
key = 'your_pickle_filename.pkl'
pickle_byte_obj = pickle.dumps([var1, var2, ..., varn])  # any picklable objects
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, key).put(Body=pickle_byte_obj)
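Because pickle.dumps returns bytes, it round-trips cleanly as an object body. A minimal local check, with no S3 call involved:

```python
import pickle

payload = pickle.dumps({"x": 1, "y": [2, 3]})
assert isinstance(payload, bytes)   # suitable to pass as an S3 Body
assert pickle.loads(payload) == {"x": 1, "y": [2, 3]}
```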
Solution 3
This worked for me with pandas 0.23.4 and boto3 1.7.80:
import boto3

bucket = 'your_bucket_name'
key = 'your_pickle_filename.pkl'
new_df.to_pickle(key)  # writes the pickle to a local file first
s3_resource = boto3.resource('s3')
with open(key, 'rb') as f:  # closes the file handle after the upload
    s3_resource.Object(bucket, key).put(Body=f)
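The write-to-a-local-file-then-upload pattern in this answer can be sketched with the standard library alone; here tempfile stands in for the local pickle file and no S3 call is made (df_like is a stand-in for the dataframe):

```python
import os
import pickle
import tempfile

df_like = [("a", 1), ("b", 2)]                 # stands in for a dataframe
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as f:
    pickle.dump(df_like, f)                    # write pickle to local disk
with open(f.name, "rb") as fh:                 # reopen in binary mode,
    body = fh.read()                           # as the upload step does
os.unlink(f.name)                              # clean up the local file
assert pickle.loads(body) == df_like
```

The downside of this approach versus the buffer-based ones is the temporary file left on local disk, which you must clean up yourself.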
Solution 4
This solution (using s3fs) worked perfectly and elegantly for my team:

import s3fs
from pickle import dump

fs = s3fs.S3FileSystem(anon=False)
bucket = 'bucket1'
key = 'your_pickle_filename.pkl'
# use a context manager so the S3 file is flushed and closed,
# which is what completes the upload
with fs.open(f's3://{bucket}/{key}', 'wb') as f:
    dump(data, f)
Comments
- himi64 almost 3 years
  I'm trying to write a pandas dataframe as a pickle file into an s3 bucket in AWS. I know that I can write the dataframe new_df as a CSV to an s3 bucket as follows:
  bucket='mybucket'
  key='path'
  csv_buffer = StringIO()
  s3_resource = boto3.resource('s3')
  new_df.to_csv(csv_buffer, index=False)
  s3_resource.Object(bucket, path).put(Body=csv_buffer.getvalue())
  I've tried using the same code as above with to_pickle(), but with no success.
- Sip over 5 years
  Do you have a suggestion for how to use this with a pandas DataFrame? I tried pickle_byte_obj = df.to_pickle(None).encode(), but it doesn't seem to work.
- whs2k about 5 years
  import s3fs, and then you can df.to_csv('s3://bucket/path/fn.csv')
- Falc over 4 years
  I get the error ValueError: Unrecognized compression type: infer when using this code.
- TheProletariat almost 4 years
  I get the error ValueError: I/O operation on closed file.
- TheProletariat almost 4 years
  I think you mean to put key instead of path, so that it reads s3_resource.Object(bucket, key).put(Body=open(key, 'rb')), right? Also, this worked for me and did not throw an I/O operation on a closed file error once I replaced 'path'. Thanks!
- Mehrad Eslami over 3 years
  If you are getting ValueError: infer, change the compression to None; it's set to infer by default: df.to_pickle(buffer_pickle, compression=None)
- Med Zamrik over 3 years
  Using this method caused the ValueError: I/O operation on closed file. error for me as well; I used buffer = pickle.dumps(df) and then used buffer as the Body for the s3 put.