Methods of max() and sum() undefined in the Java Spark Dataframe API (1.4.1)
Solution 1
The import didn't work for me; Eclipse still showed the compilation error.
But the following method call, using fully qualified names, worked:
df.groupBy("Gender").agg(org.apache.spark.sql.functions.max(df.col("Id")), org.apache.spark.sql.functions.sum(df.col("Income")));
If the aggregation involves only one column, you can also use the following shorthand:
df.groupBy("Gender").max("Income");
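To make clear what the agg(max(...), sum(...)) call above computes, here is a plain-Java sketch that needs no Spark dependency. It is only an analogue, not the Spark API: the Person class and the sample rows are hypothetical stand-ins for a DataFrame with Gender, Id and Income columns. It keeps the maximum Id and the total Income per gender, which is what the grouped aggregation returns. It avoids lambdas so it also compiles on Java 7.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupByAggSketch {

    // Hypothetical row type standing in for a DataFrame row (Gender, Id, Income).
    static final class Person {
        final String gender;
        final int id;
        final long income;

        Person(String gender, int id, long income) {
            this.gender = gender;
            this.id = id;
            this.income = income;
        }
    }

    // Per-group accumulator: max(Id) and sum(Income).
    static final class Agg {
        int maxId = Integer.MIN_VALUE;
        long sumIncome = 0L;
    }

    // Mirrors df.groupBy("Gender").agg(max(df.col("Id")), sum(df.col("Income"))).
    static Map<String, Agg> groupByGender(List<Person> rows) {
        Map<String, Agg> result = new TreeMap<>(); // sorted keys for stable output
        for (Person p : rows) {
            Agg a = result.get(p.gender);
            if (a == null) {
                a = new Agg();
                result.put(p.gender, a);
            }
            a.maxId = Math.max(a.maxId, p.id);
            a.sumIncome += p.income;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Person> rows = Arrays.asList(
            new Person("F", 1, 5000),
            new Person("M", 2, 4000),
            new Person("F", 3, 6000));
        for (Map.Entry<String, Agg> e : groupByGender(rows).entrySet()) {
            System.out.println(e.getKey() + " max(Id)=" + e.getValue().maxId
                + " sum(Income)=" + e.getValue().sumIncome);
        }
        // prints:
        // F max(Id)=3 sum(Income)=11000
        // M max(Id)=2 sum(Income)=4000
    }
}
```

In Spark the same work is distributed across partitions, but the per-group result has the same meaning.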
Solution 2
import static org.apache.spark.sql.functions.*;
Try this to import all the static functions, including max and sum.
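The reason a plain import is not enough: in Java, max and sum are static methods of the functions class, and only a static import brings static methods into scope so they can be called unqualified. Java's wildcard form is import static Class.*; whereas Scala uses import pkg.functions._ for the same thing. The sketch below uses java.lang.Math as a stand-in for org.apache.spark.sql.functions so it runs without Spark on the classpath; the class and method names are hypothetical.

```java
// Static-import-on-demand: makes every static member of java.lang.Math
// callable without the "Math." prefix.
import static java.lang.Math.*;

public class StaticImportSketch {

    // max(...) and abs(...) resolve to Math.max / Math.abs through the static
    // import, the same way max(...) and sum(...) resolve to
    // org.apache.spark.sql.functions after the static import shown above.
    static int largestMagnitude(int a, int b) {
        return max(abs(a), abs(b));
    }

    public static void main(String[] args) {
        System.out.println(largestMagnitude(-7, 3)); // prints 7
    }
}
```

Without the static import line, the calls would have to be written as Math.max(Math.abs(a), Math.abs(b)), just like the fully qualified org.apache.spark.sql.functions.max(...) call in Solution 1.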
Solution 3
Try import org.apache.spark.sql.functions._ (this is the Scala form of the import).
EDIT:
From what I've noticed, you are using Scala syntax, trying to access columns via the apply method. In Java, you have to pass columns using the .col method, like this:
df.groupBy("department").agg(max(df.col("age")), sum(df.col("expense")));
See Java example here
Author: Jingyu Zhang
Updated on July 29, 2022
Comments
-
Jingyu Zhang almost 2 years
I put the sample code of DataFrame.groupBy() into my code, but it showed the methods max() and sum() as undefined:
df.groupBy("department").agg(max("age"), sum("expense"));
Which Java package should I import if I want to use the max() and sum() methods? Is the syntax of this sample code correct?
-
Jingyu Zhang over 8 years
Thanks, but Java (1.7) does not support "_", and there is no "org.apache.spark.sql.functions.*" in the API.
-
TheMP over 8 years
Answer fixed. Try now.
-
Jingyu Zhang over 8 years
Use "import static org.apache.spark.sql.functions.*;" in Eclipse.