How to save dataframe to pickle file using Pyspark


saveAsPickleFile is a method of RDD, not of DataFrame.

See the documentation: http://spark.apache.org/docs/latest/api/python/pyspark.html?highlight=pickle

So you can just call:

df.rdd.saveAsPickleFile(filename)

To load it back from the file, run:

# pickleFile returns an RDD of Row objects, so no collect() is needed
pickleRdd = sc.pickleFile(filename)
df2 = spark.createDataFrame(pickleRdd)
Author by

adil blanco

Updated on July 02, 2022

Comments

  • adil blanco
    adil blanco about 2 years

    I need to save a DataFrame to a pickle file, but the call raises an error:

    df.saveAsPickleFile(path)
    

    AttributeError: 'DataFrame' object has no attribute 'saveAsPickleFile'

  • adil blanco
    adil blanco over 6 years
    Thank you for your answer. So, if I understand correctly, I need to convert my DataFrame to an RDD and then save it as a pickle file?
  • Meike
    Meike almost 3 years
    Yes and no. Yes, the rdd step is necessary, because saveAsPickleFile is an RDD method. No, it is not a conversion: the RDD is the type that lies one abstraction layer below the DataFrame, so there is no cost for "converting".