Unpacking a list to select multiple columns from a Spark data frame
Solution 1
Use df.select(cols.head, cols.tail: _*)
Let me know if it works :)
The key is the method signature of select:
select(col: String, cols: String*)
The cols: String* entry takes a variable number of arguments. The : _* ascription unpacks a sequence so that its elements can be passed to this argument, very similar to unpacking in Python with *args. Here cols.head fills the first col parameter and cols.tail: _* supplies the rest. See here and here for other examples.
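To see the mechanism without a Spark session, here is a minimal plain-Scala sketch. The select function is a hypothetical stand-in that only copies the parameter shape of Spark's select(col: String, cols: String*). It returns the names it received instead of a DataFrame.

```scala
object VarargsDemo {
  // Hypothetical stand-in with the same parameter shape as Spark's
  // select(col: String, cols: String*); it just collects the names it receives.
  def select(col: String, cols: String*): Seq[String] = col +: cols

  def main(args: Array[String]): Unit = {
    val cols = List("b", "c")

    // select(cols) would not compile: a List[String] is not a String.
    // cols.head fills the first parameter; `: _*` spreads cols.tail into the vararg.
    val picked = select(cols.head, cols.tail: _*)
    println(picked.mkString(", ")) // b, c
  }
}
```

Because List().head throws a NoSuchElementException, the head/tail pattern assumes the list has at least one column name.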
Solution 2
You can convert each column name String to a Spark Column with the col function, like this:
import org.apache.spark.sql.functions._
df.select(cols.map(col): _*)
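This version relies on the select(cols: Column*) overload, where every argument is a Column, so the whole mapped list can be spread without splitting off a head. The sketch below is plain Scala and assumes hypothetical stand-ins for Spark's Column, functions.col and that overload; they only record column names.

```scala
object ColumnVarargsDemo {
  // Hypothetical stand-ins for Spark's Column, functions.col and the
  // select(cols: Column*) overload; they only record column names.
  final case class Column(name: String)
  def col(name: String): Column = Column(name)
  def select(cols: Column*): Seq[String] = cols.map(_.name)

  def main(args: Array[String]): Unit = {
    val cols = List("b", "c")

    // Every argument is a Column, so the whole mapped list is spread with `: _*`.
    println(select(cols.map(col): _*).mkString(", ")) // b, c

    // No head/tail split is needed, so an empty list does not throw in this sketch.
    println(select(List.empty[String].map(col): _*).isEmpty) // true
  }
}
```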
Solution 3
Another option that I've just learnt.
import org.apache.spark.sql.functions.col
val columns = Seq[String]("col1", "col2", "col3")
val colNames = columns.map(name => col(name))
val selectedDf = df.select(colNames: _*) // use a new name: defining df in terms of itself does not compile
Solution 4
In Java, first convert the String array to a List of Spark Column objects:
String[] strColNameArray = new String[]{"a", "b", "c", "d"};
List<Column> colNames = new ArrayList<>();
for(String strColName : strColNameArray){
colNames.add(new Column(strColName));
}
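Then convert the List to a Scala collection with JavaConversions inside the select call. This needs the following import:
import scala.collection.JavaConversions;
Dataset<Row> selectedDF = df.select(JavaConversions.asScalaBuffer(colNames));
Note that scala.collection.JavaConversions is deprecated since Scala 2.12 and was removed in 2.13. Since select(Column...) can be called from Java as a varargs method, the list can also be passed as an array: df.select(colNames.toArray(new Column[0])).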
Solution 5
Yes, you can use .select in Scala.
Use .head and .tail to pass all the values in the List:
Example
val cols = List("b", "c")
df.select(cols.head, cols.tail: _*)
Ben
Updated on March 27, 2020
Comments
Ben about 4 years
I have a Spark data frame df. Is there a way of sub-selecting a few columns using a list of these columns?
scala> df.columns
res0: Array[String] = Array("a", "b", "c", "d")
I know I can do something like df.select("b", "c"). But suppose I have a list containing a few column names:
val cols = List("b", "c")
Is there a way to pass this to df.select? df.select(cols) throws an error. I am looking for something like df.select(*cols), as in Python.
Ben over 8 years
Thanks! Worked like a charm. Could you explain a bit more about the syntax? Specifically, what does cols.tail: _* do?
Ben over 8 years
I think I understand now. The key is the method signature of select:
select(col: String, cols: String*)
The cols: String* entry takes a variable number of arguments. : _* unpacks arguments so that they can be handled by this argument, very similar to unpacking in Python with *args. See here and here for other examples.
Shagun Sodhani over 8 years
Cool! You got it right :) Sorry, I got both notifications just now, so I couldn't reply earlier. :)
Ben over 8 years
No problem. Thanks again!
MaxU - stop genocide of UA over 6 years
What about a slightly shorter version:
df.select(cols.map(df(_)): _*)
?
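The shorter form works because df(name) is shorthand for df.apply(name), which returns a Column, and the placeholder df(_) expands to name => df(name). A minimal plain-Scala sketch, assuming a hypothetical FakeFrame class in place of a real DataFrame:

```scala
object ApplyDemo {
  final case class Column(name: String)

  // Hypothetical stand-in for a DataFrame: apply(colName) returns a Column,
  // so df("b") is shorthand for df.apply("b").
  final class FakeFrame(val columns: Seq[String]) {
    def apply(colName: String): Column = Column(colName)
    def select(cols: Column*): FakeFrame = new FakeFrame(cols.map(_.name))
  }

  def main(args: Array[String]): Unit = {
    val df = new FakeFrame(Seq("a", "b", "c", "d"))
    val cols = List("b", "c")

    // df(_) expands to name => df.apply(name), giving a List[Column] to spread.
    val selected = df.select(cols.map(df(_)): _*)
    println(selected.columns.mkString(", ")) // b, c
  }
}
```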
user1326784 about 4 years
Can you please share how to do the same (pass the column names) in Java while doing dataframeResult = inpDataframe.select("col1","col2",....)
Olfa2 about 2 years
Can you elaborate, please?