Spark DataFrames: How can I change the order of columns in Java/Scala?


Solution 1

In Scala you can use the "splat" (: _*) syntax to pass a variable-length list of columns to the DataFrame.select() method.

To address your example, you can get the list of existing columns via DataFrame.columns, which returns an array of strings. Sort that array, convert each name to a Column, and then splat the result into select():

import org.apache.spark.sql.functions.col

// columns Array[String] (b, a, c, d, e) -> sorted Array[Column] (a, b, c, d, e)
val mySortedCols = myDF.columns.sorted.map(str => col(str))

val myNewDF = myDF.select(mySortedCols: _*)
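The same splat pattern also works for an arbitrary order rather than an alphabetical one. Here is a minimal sketch, assuming the same myDF as above; desiredOrder and reorderedDF are hypothetical names introduced for illustration:

import org.apache.spark.sql.functions.col

// desiredOrder is a hypothetical, explicitly chosen column order
val desiredOrder = Seq("a", "b", "c", "d", "e")
val reorderedDF = myDF.select(desiredOrder.map(col): _*)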

Solution 2

One way to do this is to reorder the columns with a select after your join:

import spark.implicits._   // needed for .toDF outside the spark-shell

case class Person(name: String, age: Int)
val persons = Seq(Person("test", 10)).toDF

persons.show
+----+---+
|name|age|
+----+---+
|test| 10|
+----+---+

persons.select("age", "name").show

+---+----+
|age|name|
+---+----+
| 10|test|
+---+----+
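Applied to the join from the question, the same idea looks like the following minimal sketch. The input frames df1 and df2 are hypothetical placeholders, and spark.implicits._ is assumed to be in scope (it is by default in the spark-shell):

import spark.implicits._

// hypothetical inputs: df1 has columns [a, b], df2 has columns [b, c, d, e]
val df1 = Seq((10, 1)).toDF("a", "b")
val df2 = Seq((1, "c1", "d1", "e1")).toDF("b", "c", "d", "e")

// joining on b puts the join column first, then the remaining
// left columns, then the remaining right columns: [b, a, c, d, e] ...
val joined = df1.join(df2, "b")

// ... so reorder explicitly afterwards
val reordered = joined.select("a", "b", "c", "d", "e")
reordered.show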

Comments

  • jest jest, almost 2 years ago

    After joining two dataframes, I find that the column order has changed from what I expected it to be.

    Ex: Joining two data frames with columns [b,c,d,e] and [a,b] on b yields a column order of [b,a,c,d,e].

    How can I change the order of the columns (e.g., to [a,b,c,d,e])? I've found ways to do it in Python/R, but not in Scala or Java. Are there any methods that allow swapping or reordering of dataframe columns?