How to convert DataFrame to RDD in Scala?

Solution 1

Simply:

val rows: RDD[Row] = df.rdd
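Putting that one-liner in context, a minimal sketch (assuming an existing `SparkSession` named `spark` and a DataFrame `df` with a hypothetical string column "name"):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

// Convert the DataFrame to an RDD of Rows
val rows: RDD[Row] = df.rdd

// Each element is a Row; fields can be read by position or by name
val names: RDD[String] = rows.map(r => r.getAs[String]("name"))
```

Note that `df.rdd` is a lazy transformation like any other; no data is materialized until an action such as `collect()` or `count()` runs.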

Solution 2

Use df.map(row => ...) to convert the DataFrame to an RDD if you want to map each row to a different RDD element. For example,

df.map(row => (row(1), row(2)))

gives you a paired RDD where the first column of the df is the key and the second column of the df is the value.
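A sketch of how such a pair RDD might be used, assuming a hypothetical DataFrame whose second column is a string key and third column a numeric value (in Spark 2.x, `df.map` operates on a Dataset and needs an encoder, so going through `df.rdd.map` is often simpler):

```scala
import org.apache.spark.rdd.RDD

// Build a pair RDD of (key, value) from columns 1 and 2
val pairs: RDD[(String, Double)] =
  df.rdd.map(row => (row.getString(1), row.getDouble(2)))

// Pair-RDD operations such as reduceByKey are then available,
// e.g. summing the values per key:
val totals: RDD[(String, Double)] = pairs.reduceByKey(_ + _)
```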

Solution 3

I was just looking for my answer and found this post.

Jean's answer is absolutely correct. Adding to that, df.rdd will return an RDD[Row]. I needed to apply split() once I got the RDD. For that we need to convert RDD[Row] to RDD[String]:

val opt = spark.sql("select tags from cvs").rdd.map(_.toString)
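Continuing that idea, a sketch of applying split() afterwards (the table name `cvs` comes from the answer above; the comma delimiter and the use of `getString(0)` instead of `toString` are assumptions for illustration, `getString` avoiding the bracket wrapping that `Row.toString` adds):

```scala
import org.apache.spark.rdd.RDD

// Pull the single "tags" column out as plain strings
val tags: RDD[String] = spark.sql("select tags from cvs")
  .rdd
  .map(_.getString(0))

// Split each comma-separated tags string into individual tags
val individualTags: RDD[String] = tags.flatMap(_.split(","))
```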

Author by Vajra

Updated on July 09, 2022

Comments

  • Vajra
    Vajra almost 2 years

    Can someone please share how one can convert a dataframe to an RDD?

  • Boern
    Boern almost 8 years
    if you get "type not found" for either RDD or Row this might help: val rows: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = df.rdd
  • Ravi
    Ravi over 7 years
    To extend Boern's answer, add the following two import statements: import org.apache.spark.rdd.RDD and import org.apache.spark.sql.Row
  • matanster
    matanster over 4 years
    Would this change anything in Spark memory holding the data, or only more lightly create a new object pointing at the same data? I hope it's the lighter of the two but not sure from the source code comments.