PySpark: Create a DataFrame from an RDD of key/value pairs


Yes, it's possible (tested with Spark 1.3.1):

>>> rdd = sc.parallelize([(0,1), (0,1), (0,2), (1,2), (1,10), (1,20), (3,18), (3,18), (3,18)])
>>> sqlContext.createDataFrame(rdd, ["id", "score"])
Out[2]: DataFrame[id: bigint, score: bigint]
Author: theMadKing
Updated on July 13, 2022

Comments

  • theMadKing, almost 2 years

    If I have an RDD of key/value pairs (the key being the column index), is it possible to load it into a DataFrame? For example:

    (0,1), (0,1), (0,2), (1,2), (1,10), (1,20), (3,18), (3,18), (3,18)
    

    And have the dataframe look like:

    1,2,18
    1,10,18
    2,20,18
    
  • Frozen Flame, almost 8 years
    Is this equivalent to rdd.toDF(["id", "score"])?
  • Jack Daniel, over 7 years
    I am facing this error: 'RDD' object has no attribute 'toDF'.
  • Jack Daniel, over 7 years
    I am using Spark 1.6 with PySpark. I am unable to load sql.SQLContext and create a DataFrame from it.
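
The multi-column layout asked for in the first comment is a pivot: group the values by their key (the column index), then zip the groups together into rows. A minimal pure-Python sketch of that reshaping (no Spark involved):

```python
from collections import defaultdict

# The key/value pairs from the question; the key is the target column index.
pairs = [(0, 1), (0, 1), (0, 2), (1, 2), (1, 10), (1, 20), (3, 18), (3, 18), (3, 18)]

# Group values by key, preserving the order in which they appear.
columns = defaultdict(list)
for key, value in pairs:
    columns[key].append(value)

# Zip the per-key value lists, ordered by key, to form the rows.
rows = list(zip(*(columns[k] for k in sorted(columns))))
print(rows)  # [(1, 2, 18), (1, 10, 18), (2, 20, 18)]
```

In Spark the same idea could be expressed with groupByKey followed by zipping the collected columns, though that assumes every key has the same number of values.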