Passing data frame as optional function parameter in Scala


Yes, you can pass a DataFrame as a parameter to a function.

Let's say you have a DataFrame:

import sqlContext.implicits._

val df = Seq(
  (1, 2, 3),
  (1, 2, 3)
).toDF("col1", "col2", "col3")

which looks like this:

+----+----+----+
|col1|col2|col3|
+----+----+----+
|1   |2   |3   |
|1   |2   |3   |
+----+----+----+

You can pass it to a function like this:

import org.apache.spark.sql.DataFrame
def test(sampleDF: DataFrame): DataFrame = {
  sampleDF.select("col1", "col2") // perform some operation on the DataFrame
}

val testdf = test(df)

testdf would be:

+----+----+
|col1|col2|
+----+----+
|1   |2   |
|1   |2   |
+----+----+
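Passing a DataFrame this way hands the function a reference to the same object: no rows are copied when `test(df)` is called, and `select` returns a new, lazily evaluated DataFrame while leaving `df` unchanged. The sketch below checks both points. It assumes Spark 2.0+ (`SparkSession` instead of the `sqlContext` used above), and `PassByReferenceSketch` and `passThrough` are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PassByReferenceSketch {
  // Same operation as `test` above: keep two columns.
  def test(sampleDF: DataFrame): DataFrame = sampleDF.select("col1", "col2")

  // Hypothetical helper: returns its argument untouched so identity can be checked.
  def passThrough(sampleDF: DataFrame): DataFrame = sampleDF

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("pass-df").getOrCreate()
    import spark.implicits._
    val df = Seq((1, 2, 3), (1, 2, 3)).toDF("col1", "col2", "col3")

    // The parameter is the very same JVM object as `df`; nothing was copied.
    println(passThrough(df) eq df) // true

    // `select` builds a new DataFrame; the original keeps all three columns.
    val testdf = test(df)
    println(testdf.columns.mkString(",")) // col1,col2
    println(df.columns.mkString(","))     // col1,col2,col3

    spark.stop()
  }
}
```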

Edit

As eliasah pointed out, @Garipaso wanted an optional argument. This can be done by giving the parameter a default value:

def test(sampleDF: DataFrame = sqlContext.emptyDataFrame): DataFrame = {
  // head(1) fetches at most one row, which is cheaper than a full count()
  if (sampleDF.head(1).nonEmpty) sampleDF.select("col1", "col2") // perform some operation on the DataFrame
  else sqlContext.emptyDataFrame
}
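A more idiomatic Scala alternative, not the approach this answer uses, is to model the missing argument as `Option[DataFrame]` with a default of `None`. That avoids running a Spark job just to find out whether an argument was passed, and it keeps "no input" distinct from "input with zero rows". This is a sketch assuming Spark 2.0+; `OptionalDataFrameSketch` is a hypothetical name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object OptionalDataFrameSketch {
  // None means "no DataFrame was passed"; no Spark job runs to find that out.
  def test(sampleDF: Option[DataFrame] = None): Option[DataFrame] =
    sampleDF.map(_.select("col1", "col2")) // only transforms when a DataFrame is present

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("optional-df").getOrCreate()
    import spark.implicits._
    val df = Seq((1, 2, 3), (1, 2, 3)).toDF("col1", "col2", "col3")

    // With an argument: shows the two-column table.
    test(Some(df)).foreach(_.show(false))

    // Without an argument: the caller gets None and can branch on it.
    println(test().isEmpty) // true

    // If downstream code needs a DataFrame, fall back explicitly.
    val result: DataFrame = test().getOrElse(spark.emptyDataFrame)
    println(result.columns.length) // 0

    spark.stop()
  }
}
```

Callers then write `test(Some(df))` or `test()`, and `getOrElse(spark.emptyDataFrame)` recovers the empty-DataFrame behaviour of the answer where that is what later code expects.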

If we pass a valid DataFrame:

test(df).show(false)

the output is:

+----+----+
|col1|col2|
+----+----+
|1   |2   |
|1   |2   |
+----+----+

But if we don't pass an argument:

test().show(false)

we get an empty DataFrame:

++
||
++
++
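`emptyDataFrame` has no columns as well as no rows, which is why `show` prints only borders. It also means that calling `select("col1", "col2")` directly on the default value would fail during analysis, so the `if` guard in `test` is doing real work. The sketch below checks both facts; it assumes Spark 2.0+ (`SparkSession.emptyDataFrame`), and `EmptyDataFrameSketch` and its methods are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.util.Try

object EmptyDataFrameSketch {
  // Number of columns and rows in the session's built-in empty DataFrame.
  def emptyShape(spark: SparkSession): (Int, Long) = {
    val empty: DataFrame = spark.emptyDataFrame
    (empty.columns.length, empty.count())
  }

  // Selecting named columns from a DataFrame with no columns fails when the
  // query is analyzed, so Try captures the exception.
  def selectOnEmptyFails(spark: SparkSession): Boolean =
    Try(spark.emptyDataFrame.select("col1", "col2")).isFailure

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("empty-df").getOrCreate()
    println(emptyShape(spark))         // (0,0)
    println(selectOnEmptyFails(spark)) // true
    spark.emptyDataFrame.show(false)   // prints the bordered empty table
    spark.stop()
  }
}
```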

I hope the answer is helpful.


Author: Garipaso

Updated on June 04, 2022

Comments

  • Garipaso (almost 2 years)

    Is there a way that I can pass a data frame as an optional input function parameter in Scala? For example:

    def test(sampleDF: DataFrame = df.sqlContext.emptyDataFrame): DataFrame = {
    
    
    }
    
    
    df.test(sampleDF)
    

    Though I am passing a valid data frame here, it is always assigned to an empty data frame. How can I avoid this?

    • eliasah (almost 7 years)
      This shouldn't even compile. (Body of the function aside)
    • philantrovert (almost 7 years)
      You have just set a default parameter for your function, if you pass a valid data frame to test, it should work. Why are you using df.test here? What is df?
  • Peter Krauss (over 4 years)
    Hi @RameshMaharjan, about performance and default behaviour: is it a copy or a reference? (As with the classic "array as parameter" problem, it is never clear whether the cost is low, copying only a reference, or high, copying all the data.)