Converting DataFrame columns into a list of tuples

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> val df1 = hiveContext.sql("select id, name from class_db.students")
scala> df1.show()
+----+-------+
|  id|   name|
+----+-------+
|1001|   John|
|1002|Michael|
+----+-------+

To get a list of tuples, convert the DataFrame to an RDD and map each Row to a tuple:

scala> df1.select("id", "name").rdd.map(x => (x.get(0), x.get(1))).collect()
res3: Array[(Any, Any)] = Array((1001,John), (1002,Michael))
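
The result above is `Array[(Any, Any)]` because `Row.get` is untyped. A minimal sketch of getting typed tuples instead, assuming the same `df1` as above and that `id` is an integer column, using `Row.getAs`:

```scala
// Typed extraction: Row.getAs[T] returns the column value as T,
// so the collected result is Array[(Int, String)] rather than Array[(Any, Any)].
// Assumes df1 from the example above, with an integer "id" column.
val typed: Array[(Int, String)] =
  df1.select("id", "name")
     .rdd
     .map(row => (row.getAs[Int]("id"), row.getAs[String]("name")))
     .collect()
```

Note that `getAs[T]` throws a `ClassCastException` at runtime if the column's actual type does not match `T`, so check the schema (`df1.printSchema()`) first.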
Author: Gsquare, Developer. Updated on June 07, 2022.

Comments

  • Gsquare, about 2 years ago

    I have:

    val DF1 = sparkSession.sql("select col1,col2,col3 from table");
    val tupleList = DF1.select("col1","col2").rdd.map(r => (r(0),r(1))).collect()
    
    tupleList.foreach(x=> x.productIterator.foreach(println))
    

    But I do not get all the tuples in the output. Where is the issue?

    col1    col2
    AA      CCC
    AA      BBB
    DD      CCC
    AB      BBB
    Others  BBB
    GG      ALL
    EE      ALL
    Others  ALL
    ALL     BBB
    NU      FFF
    NU      Others
    Others  Others
    C       FFF
    

    The output I get is: CCC AA BBB AA Others AA Others DD ALL Others ALL GG ALL ALL

  • WestCoastProjects, over 7 years ago
    Maybe add some prose: "convert the DataFrame to an RDD and then map each Row to a tuple".
  • ashley, about 4 years ago
    I get "invalid syntax" :(
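
Regarding the comment about missing tuples: printing via `productIterator` puts each field of each tuple on its own line, which makes the output hard to read and easy to misjudge. A sketch of a clearer way to inspect the result, assuming the same `sparkSession` and table as in the comment above:

```scala
// Assumes the sparkSession and table from the comment above.
val DF1 = sparkSession.sql("select col1, col2, col3 from table")
val tupleList = DF1.select("col1", "col2")
  .rdd
  .map(r => (r.getString(0), r.getString(1)))
  .collect()

// Print each tuple whole, one per line (e.g. (AA,CCC)), instead of
// flattening fields with productIterator.
tupleList.foreach(println)

// If tuples still seem to be missing, compare counts: collect() returns
// every row of the RDD, unlike show(), which displays only 20 by default.
println(s"collected ${tupleList.length} tuples; table has ${DF1.count()} rows")
```

If the two counts match, all tuples were collected and the apparent loss was only in how the output was printed or copied.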