How to convert a DataFrame to an RDD in Scala?
Solution 1
Simply:
val rows: RDD[Row] = df.rdd
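This one-liner can be sketched end to end as follows; the SparkSession name `spark` and the sample data are assumptions (they match a standard spark-shell session):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

// Illustrative DataFrame; any DataFrame works the same way
val df = spark.createDataFrame(Seq(("a", 1), ("b", 2))).toDF("name", "count")

// .rdd does not copy the data eagerly; it exposes the same rows as an RDD[Row]
val rows: RDD[Row] = df.rdd
```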
Solution 2
Use
df.rdd.map(row => ...)
if you want to map each row to a different RDD element. (In Spark 1.x, df.map itself returned an RDD; since Spark 2.x, df.map returns a Dataset, so go through .rdd first.) For example
df.rdd.map(row => (row(0), row(1)))
gives you a pair RDD where the first column of the df is the key and the second column of the df is the value (Row fields are 0-indexed).
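To make the key/value mapping concrete, here is a minimal sketch; the column names and sample data are illustrative, and a spark-shell style `spark` session is assumed:

```scala
import org.apache.spark.rdd.RDD

val df = spark.createDataFrame(Seq(("k1", 10), ("k2", 20))).toDF("key", "value")

// Untyped access: row(i) returns Any
val pairs: RDD[(Any, Any)] = df.rdd.map(row => (row(0), row(1)))

// Typed access via the Row getters is usually preferable
val typedPairs: RDD[(String, Int)] =
  df.rdd.map(row => (row.getString(0), row.getInt(1)))
```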
Solution 3
I was just looking for my answer and found this post.
Jean's answer is absolutely correct. Adding to that: "df.rdd" will return an RDD[Row]. I needed to apply split() once I had the RDD, so I had to convert the RDD[Row] to an RDD[String]:
val opt = spark.sql("select tags from cvs").map(x => x.toString()).rdd
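A note on this snippet: `.map` on the result of `spark.sql` needs `import spark.implicits._` in scope for the String encoder, and `Row.toString` wraps the value in brackets (e.g. `[foo,bar]`). A sketch of a cleaner variant, assuming the `cvs` table and `tags` column from the answer and a comma delimiter (both illustrative):

```scala
import spark.implicits._  // brings the String encoder used by .map into scope

// getString(0) extracts the raw column value, avoiding Row.toString's brackets
val tags = spark.sql("select tags from cvs").map(x => x.getString(0)).rdd

// Now split() applies directly to plain strings
val tokens = tags.flatMap(_.split(","))  // RDD[String] of individual tags
```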
Author: Vajra
Updated on July 09, 2022

Comments
- Vajra almost 2 years: Can someone please share how one can convert a dataframe to an RDD?
- Boern almost 8 years: if you get "type not found" for either RDD or Row, this might help:
val rows: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = df.rdd
- Ravi over 7 years: To extend Boern's answer, add the following two import statements: import org.apache.spark.rdd.RDD and import org.apache.spark.sql.Row
- matanster over 4 years: Would this change anything about the data Spark holds in memory, or does it only create a lightweight new object pointing at the same data? I hope it's the lighter of the two, but I'm not sure from the source-code comments.