Methods of max() and sum() undefined in the Java Spark Dataframe API (1.4.1)


Solution 1

The import didn't work for me; the Eclipse IDE still showed the compilation error.

However, calling the functions by their fully qualified names worked:

df.groupBy("Gender").agg(org.apache.spark.sql.functions.max(df.col("Id")), org.apache.spark.sql.functions.sum(df.col("Income")));

If the aggregation involves only one field, the following shorter syntax also works:

df.groupBy("Gender").max("Income");

Solution 2

import static org.apache.spark.sql.functions.*;

Try this static import to bring in all the functions, including max and sum.
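To show why the static import makes a difference, here is a minimal sketch that runs without Spark. It uses java.lang.Math as a stand-in for org.apache.spark.sql.functions (an assumption made purely for illustration; the class and method names below are hypothetical). The point is only that an unqualified call such as max(...) compiles once the method is statically imported, while the fully qualified form needs no import at all.

```java
// Static import of a single static method. A wildcard form,
// `import static java.lang.Math.*;`, would bring in every static member,
// much like the wildcard import suggested in Solution 2.
import static java.lang.Math.max;

public class StaticImportDemo {
    // Fully qualified call: compiles without any static import
    // (compare the fully qualified calls in Solution 1).
    public static int qualified(int a, int b) {
        return java.lang.Math.max(a, b);
    }

    // Unqualified call: resolves only because of the static import above.
    public static int imported(int a, int b) {
        return max(a, b);
    }

    public static void main(String[] args) {
        System.out.println(qualified(3, 7)); // prints 7
        System.out.println(imported(3, 7));  // prints 7
    }
}
```

Removing the static import line makes the unqualified max(a, b) call fail to compile, which is the same kind of "method undefined" error described in the question.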

Solution 3

Try import org.apache.spark.sql.functions._

EDIT:

From what I've noticed, you are using Scala syntax, trying to access columns via the apply method. In Java, you have to pass columns using the .col method, like this:

df.groupBy("department").agg(max(df.col("age")), sum(df.col("expense")));

See Java example here

Author by

Jingyu Zhang

Updated on July 29, 2022

Comments

  • Jingyu Zhang
    Jingyu Zhang almost 2 years

    I put the sample code for DataFrame.groupBy() into my code, but it shows the methods max() and sum() as undefined.

    df.groupBy("department").agg(max("age"), sum("expense"));
    

    Which Java package should I import to use the max() and sum() methods?

    Is the syntax of this sample code correct?

  • Jingyu Zhang
    Jingyu Zhang over 8 years
    Thanks, but Java (1.7) does not support "_", and there is no "org.apache.spark.sql.functions.*" in the API.
  • TheMP
    TheMP over 8 years
    Answer fixed. Try now.
  • Jingyu Zhang
    Jingyu Zhang over 8 years
    In the Eclipse IDE, use "import static org.apache.spark.sql.functions.*;".