PySpark: Create a DataFrame from an RDD of key/value pairs


Yes, it's possible (tested with Spark 1.3.1):

>>> rdd = sc.parallelize([(0,1), (0,1), (0,2), (1,2), (1,10), (1,20), (3,18), (3,18), (3,18)])
>>> sqlContext.createDataFrame(rdd, ["id", "score"])
Out[2]: DataFrame[id: bigint, score: bigint]
Author: theMadKing
Updated on July 13, 2022

Comments

  • theMadKing, almost 2 years

    If I have an RDD of key/value pairs (the key being the column index), is it possible to load it into a DataFrame? For example:

    (0,1), (0,1), (0,2), (1,2), (1,10), (1,20), (3,18), (3,18), (3,18)
    

    And have the dataframe look like:

    1,2,18
    1,10,18
    2,20,18
    
  • Frozen Flame, almost 8 years
    Is this equivalent to rdd.toDF(["id", "score"])?
  • Jack Daniel, over 7 years
    I am facing this error: 'RDD' object has no attribute 'toDF'.
  • Jack Daniel, over 7 years
    I am using Spark 1.6 with PySpark. I am unable to load sql.SQLContext and create a DataFrame from it.
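
The multi-column layout asked for in the first comment is a pivot: group the values by their key (the column index), then zip the groups together into rows. A minimal pure-Python sketch of that reshaping (no Spark involved):

```python
from collections import defaultdict

# The key/value pairs from the question; the key is the target column index.
pairs = [(0, 1), (0, 1), (0, 2), (1, 2), (1, 10), (1, 20), (3, 18), (3, 18), (3, 18)]

# Group values by key, preserving the order in which they appear.
columns = defaultdict(list)
for key, value in pairs:
    columns[key].append(value)

# Zip the per-key value lists, ordered by key, to form the rows.
rows = list(zip(*(columns[k] for k in sorted(columns))))
print(rows)  # [(1, 2, 18), (1, 10, 18), (2, 20, 18)]
```

In Spark the same idea could be expressed with groupByKey followed by zipping the collected columns, though that assumes every key has the same number of values.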