Passing data frame as optional function parameter in Scala
Yes, you can pass a DataFrame as a parameter to a function.
Say you have a DataFrame defined as
import sqlContext.implicits._
val df = Seq(
  (1, 2, 3),
  (1, 2, 3)
).toDF("col1", "col2", "col3")
which is
+----+----+----+
|col1|col2|col3|
+----+----+----+
|1   |2   |3   |
|1   |2   |3   |
+----+----+----+
You can pass it to a function as follows:
import org.apache.spark.sql.DataFrame
def test(sampleDF: DataFrame): DataFrame = {
  sampleDF.select("col1", "col2") // some operation on the DataFrame
}
val testdf = test(df)
testdf would be
+----+----+
|col1|col2|
+----+----+
|1   |2   |
|1   |2   |
+----+----+
Edited
As eliasah pointed out, @Garipaso wanted an optional argument. This can be done by giving the parameter a default value:
def test(sampleDF: DataFrame = sqlContext.emptyDataFrame): DataFrame = {
  if (sampleDF.count() > 0) sampleDF.select("col1", "col2") // some operation on the DataFrame
  else sqlContext.emptyDataFrame
}
If we pass a valid DataFrame, as in
test(df).show(false)
the output is
+----+----+
|col1|col2|
+----+----+
|1   |2   |
|1   |2   |
+----+----+
But if we don't pass an argument, as in
test().show(false)
we get an empty DataFrame:
++
||
++
++
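A possible alternative, sketched here under some assumptions, is to make the parameter an Option[DataFrame] that defaults to None. The function can then tell "nothing was passed" apart from "an empty DataFrame was passed" without calling count(), which runs a Spark job just to make that decision. The name testOpt is hypothetical, and the local SparkSession only stands in for the sqlContext used above; the snippet is meant for spark-shell or a Scala script.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumption: a local SparkSession replaces the sqlContext from the answer.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("optional-dataframe-sketch")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  (1, 2, 3),
  (1, 2, 3)
).toDF("col1", "col2", "col3")

// testOpt is a hypothetical name; None means no DataFrame was supplied.
def testOpt(sampleDF: Option[DataFrame] = None): DataFrame =
  sampleDF match {
    case Some(d) => d.select("col1", "col2") // same operation as in test
    case None    => spark.emptyDataFrame     // nothing passed: return an empty DataFrame
  }

testOpt(Some(df)).show(false) // caller supplies a DataFrame
testOpt().show(false)         // caller omits the argument
```

The cost is that callers have to wrap the argument in Some(...), so the default-value version above stays simpler when an empty DataFrame and a missing one should be treated the same way.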
I hope the answer is helpful
Author by
Garipaso
Updated on June 04, 2022
Comments
-
Garipaso almost 2 years
Is there a way that I can pass a data frame as an optional input function parameter in Scala? Ex:
def test(sampleDF: DataFrame = df.sqlContext.emptyDataFrame): DataFrame = { }
df.test(sampleDF)
Though I am passing a valid data frame here, it is always assigned an empty data frame. How can I avoid this?
-
eliasah almost 7 years
This shouldn't even compile (body of the function aside).
-
philantrovert almost 7 years
You have just set a default parameter for your function; if you pass a valid data frame to test, it should work. Why are you using df.test here? What is df?
-
Peter Krauss over 4 years
Hi @RameshMaharjan, about performance and default behaviour: is it a copy or a reference? (Like the classic "array as parameter" problem, where it is never clear whether the cost is low, from copying a reference, or high, from copying all the data.)