How to save a dataframe to a pickle file using PySpark
saveAsPickleFile is a method of RDD, not of a data frame.
See the documentation: http://spark.apache.org/docs/latest/api/python/pyspark.html?highlight=pickle
So you can just call:
df.rdd.saveAsPickleFile(filename)
To load it from file, run:
pickleRdd = sc.pickleFile(filename).collect()
df2 = spark.createDataFrame(pickleRdd)
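The two steps above can be wrapped into a small round trip. This is only a sketch under a few assumptions: the helper names `save_df_as_pickle`, `load_df_from_pickle` and `demo`, the local master setting, and the `/tmp/people_pickle` path are made up for illustration and are not part of the original answer. It passes the RDD returned by `pickleFile` straight to `createDataFrame` rather than calling `collect()` first, so the rows are not pulled onto the driver, and it optionally reuses the original schema instead of relying on inference.

```python
def save_df_as_pickle(df, path):
    """Write the DataFrame's rows to `path` as a pickle file."""
    # saveAsPickleFile is defined on the RDD, so go through df.rdd.
    # Spark writes a directory of part files; the path must not already exist.
    df.rdd.saveAsPickleFile(path)


def load_df_from_pickle(spark, path, schema=None):
    """Read pickled rows back from `path` and rebuild a DataFrame."""
    rows = spark.sparkContext.pickleFile(path)  # RDD of the pickled Row objects
    # Passing the RDD directly avoids collect(), which would bring
    # every row to the driver before the DataFrame is rebuilt.
    return spark.createDataFrame(rows, schema)


def demo(path="/tmp/people_pickle"):  # hypothetical path, for illustration only
    # Imported here so the helpers above can be reused without starting Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("pickle-demo")
        .getOrCreate()
    )
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    save_df_as_pickle(df, path)
    # Reusing df.schema keeps column names and types identical after loading.
    df2 = load_df_from_pickle(spark, path, schema=df.schema)
    df2.show()
    spark.stop()
```

Calling `demo()` on a machine with PySpark installed should write the pickle directory and print the reloaded rows. If `schema` is left as `None`, `createDataFrame` infers the schema from the data instead.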
Author: adil blanco
Updated on July 02, 2022
Comments
-
adil blanco about 2 years
I have to save a dataframe to a pickle file, but it returns an error:
df.saveAsPickleFile(path)
AttributeError: 'DataFrame' object has no attribute 'saveAsPickleFile'
-
adil blanco over 6 years
Thank you for your answer. So if I understand correctly, I need to convert my dataframe to an RDD and then save it as a pickle file?
-
Meike almost 3 years
Yes and no. Yes, the rdd step is necessary, because saveAsPickleFile is an RDD method. No, it is not a conversion: the RDD is the type that lies one abstraction layer below the DataFrame, so there is no cost for 'converting'.