Scala Spark: How to create an RDD from a list of strings and convert it to a DataFrame
A DataFrame has a schema with a fixed number of columns, so it's not natural to make a row out of a variable-length list. Still, you can create your DataFrame from an RDD[Row] using the existing schema, like this:
val rdd = sqlContext.sparkContext.parallelize(Seq(rowValues))
val rowRdd = rdd.map(v => Row(v: _*))
val newRow = sqlContext.createDataFrame(rowRdd, df.schema)
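Put together, a minimal end-to-end sketch might look like the following. The SparkSession setup, the five-column schema, and the seed DataFrame `df` are assumptions added here for illustration; only the `Row(v: _*)` conversion and `createDataFrame(rowRdd, df.schema)` come from the answer above.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumed setup for illustration: a local SparkSession and an existing DataFrame `df`
val spark = SparkSession.builder().appName("list-to-row").master("local[*]").getOrCreate()

val schema = StructType(
  List("name", "gender", "age", "city", "zip").map(StructField(_, StringType, nullable = true))
)
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row("bob", "m", "25", "earth", "11111"))),
  schema
)

// A variable-length List[String]; its size must still match df's column count at runtime
val rowValues = List("ann", "f", "90", "world", "23456")

// Row(v: _*) splats the list's elements into a Row, so no fixed-arity Tuple is needed;
// the one-row DataFrame is then built against the existing schema
val rowRdd = spark.sparkContext.parallelize(Seq(rowValues)).map(v => Row(v: _*))
val newRow = spark.createDataFrame(rowRdd, df.schema)

// Append and display (union replaces the deprecated unionAll in Spark 2.x)
df.union(newRow).show()
```

Note that nothing checks the list length against the schema until the job runs, so a mismatched `rowValues` fails at execution time, not at `createDataFrame`.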
Author: NehaM
Updated on April 22, 2020

Comments
- NehaM, about 4 years ago:
I want to create a DataFrame from a list of strings that could match an existing schema. Here is my code.
val rowValues = List("ann", "f", "90", "world", "23456") // fails
val rowValueTuple = ("ann", "f", "90", "world", "23456") // works
val newRow = sqlContext.sparkContext.parallelize(Seq(rowValueTuple)).toDF(df.columns: _*)
val newdf = df.unionAll(newRow).show()
The same code fails if I use the List of String. I see the difference is in how rowValueTuple, a Tuple, is created. Since the size of the rowValues list changes dynamically, I cannot manually create a Tuple* object. How can I do this? What am I missing? How can I flatten this list to meet the requirement? Appreciate your help, please.
- NehaM, about 8 years ago: Thanks @vitality. I did try this but missed something. I agree with your point, but I want to perform this for a given pair of a DataFrame and a list of row values as parameters. The number of columns of the DataFrame and the length of the row values are assumed to be the same.
- Rylan, over 7 years ago: Just a note here, the last line should be val newRow = sqlContext.createDataFrame(rowRdd, df.schema). At least that's what worked for me.
- Dinosaurius, almost 7 years ago: @Rylan: What is df here?
- Geoffrey Anderson, almost 6 years ago: If using PySpark instead of Scala, what would the map line of code be?