PySpark: Create DataFrame from RDD with Key/Value
Yes, it's possible (tested with Spark 1.3.1):
>>> rdd = sc.parallelize([(0,1), (0,1), (0,2), (1,2), (1,10), (1,20), (3,18), (3,18), (3,18)])
>>> sqlContext.createDataFrame(rdd, ["id", "score"])
Out[2]: DataFrame[id: bigint, score: bigint]
Author: theMadKing
Updated on July 13, 2022
Comments
-
theMadKing almost 2 years
If I have an RDD of key/value pairs (the key being the column index), is it possible to load it into a DataFrame? For example:
(0,1), (0,1), (0,2), (1,2), (1,10), (1,20), (3,18), (3,18), (3,18)
And have the dataframe look like:
1,2,18
1,10,18
2,20,18
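One way to get the pivoted layout asked for here is to collect the values per key and zip the i-th values together into rows. Below is a minimal sketch of that logic in plain Python so it runs without a cluster; in PySpark the grouping step would be something like `rdd.groupByKey().sortByKey()`, with the caveat that `groupByKey` does not guarantee value order, and the helper name `pivot_rows` is hypothetical:

```python
from collections import defaultdict

def pivot_rows(pairs):
    # Group values by key, preserving the order the pairs arrive in
    # (the PySpark analogue would be rdd.groupByKey().sortByKey(),
    # though Spark does not guarantee value order within a key).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # One column per key, sorted by key; zip the i-th entries into rows.
    columns = [groups[k] for k in sorted(groups)]
    return list(zip(*columns))

pairs = [(0, 1), (0, 1), (0, 2), (1, 2), (1, 10), (1, 20),
         (3, 18), (3, 18), (3, 18)]
print(pivot_rows(pairs))  # [(1, 2, 18), (1, 10, 18), (2, 20, 18)]
```

Note this assumes every key has the same number of values; with ragged groups, `zip` silently truncates to the shortest column.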
-
Frozen Flame almost 8 years: Is this equivalent to rdd.toDF(["id", "score"])?
-
Jack Daniel over 7 years: I'm facing this error: 'RDD' object has no attribute 'toDF'.
-
Jack Daniel over 7 years: I am using Spark 1.6 with PySpark. I am unable to load sql.SQLContext and create a DataFrame from it.
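The "'RDD' object has no attribute 'toDF'" error in the comments usually means no SQLContext (or, in later versions, SparkSession) has been created yet: constructing one attaches toDF to the RDD class as a side effect. Here is a toy plain-Python illustration of that monkey-patching pattern; the Rdd and Context classes are stand-ins for demonstration, not the real PySpark classes:

```python
class Rdd:
    """Toy stand-in for pyspark.RDD -- starts with no toDF method."""
    def __init__(self, data):
        self.data = data

class Context:
    """Toy stand-in for SQLContext: its constructor patches a toDF
    method onto the Rdd class, roughly how the real SQLContext makes
    rdd.toDF(...) available in PySpark."""
    def __init__(self):
        def toDF(rdd_self, columns):
            # The real method builds a DataFrame; here we just pair
            # each tuple with the given column names.
            return [dict(zip(columns, row)) for row in rdd_self.data]
        Rdd.toDF = toDF

rdd = Rdd([(0, 1), (1, 2)])
print(hasattr(rdd, "toDF"))  # False before any context exists
Context()                    # creating the context adds toDF
print(rdd.toDF(["id", "score"]))
```

So in a plain `pyspark` 1.x shell, `sqlContext` is created for you and `rdd.toDF([...])` works, but in a standalone script you must construct an SQLContext (or SparkSession) first.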