Converting dataframe columns into list of tuples
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> val df1 = hiveContext.sql("select id, name from class_db.students")
scala> df1.show()
+----+-------+
| id| name|
+----+-------+
|1001| John|
|1002|Michael|
+----+-------+
scala> df1.select("id", "name").rdd.map(x => (x.get(0), x.get(1))).collect()
res3: Array[(Any, Any)] = Array((1001,John), (1002,Michael))
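The tuples above come back as `(Any, Any)` because `Row`'s untyped `get` is used. When the column types are known, `Row`'s typed getters give a properly typed array — a minimal sketch, assuming `id` is an integer column and `name` a string column:

```scala
// Sketch assuming id is IntegerType and name is StringType;
// use the getter that matches each column's actual type
// (getLong, getDouble, getString, ...), or it will throw at runtime.
val typed: Array[(Int, String)] =
  df1.rdd.map(r => (r.getInt(0), r.getString(1))).collect()
```

In Spark 2.x with a `SparkSession`, the same result can likely be had without touching the RDD API, via a typed Dataset: `df1.as[(Int, String)].collect()` (after `import spark.implicits._`).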
Comments
- Gsquare (about 2 years ago): I have:

  val DF1 = sparkSession.sql("select col1,col2,col3 from table")
  val tupleList = DF1.select("col1","col2").rdd.map(r => (r(0),r(1))).collect()
  tupleList.foreach(x => x.productIterator.foreach(println))

  But I do not get all the tuples in the output. Where is the issue? The data is:

  col1   col2
  AA     CCC
  AA     BBB
  DD     CCC
  AB     BBB
  Others BBB
  GG     ALL
  EE     ALL
  Others ALL
  ALL    BBB
  NU     FFF
  NU     Others
  Others Others
  C      FFF

  The output I get is: CCC AA BBB AA Others AA Others DD ALL Others ALL GG ALL ALL
- WestCoastProjects (over 7 years ago): Maybe add prose: "convert the DataFrame to RDD and then map to a tuple".
- ashley (about 4 years ago): I get "invalid syntax" :(