foreach function not working in Spark DataFrame

12,828

Solution 1

You can convert it to a Java RDD in order to use the lambda as you wish:

df.toJavaRDD().foreach(x->
   System.out.println(x)
);
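A likely reason the lambda is accepted here is that JavaRDD.foreach takes Spark's Java-facing VoidFunction, which has a single abstract method, while the comments below report that Function1<Row, BoxedUnit> does not fit Java lambdas in this setup. The sketch below does not use Spark at all. SingleMethodCallback, MultiMethodCallback and AbstractCallback are hypothetical stand-ins, and it assumes the Scala trait looks to Java like an interface with more than one abstract method:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LambdaTargetSketch {

    // Stand-in for a Java-friendly callback such as VoidFunction:
    // exactly one abstract method, so a lambda can target it.
    interface SingleMethodCallback<T> {
        void call(T value);
    }

    // Stand-in (assumption) for how a Scala function trait may look from Java:
    // more than one abstract method, so a lambda cannot target it.
    interface MultiMethodCallback<T, R> {
        R apply(T value);
        String describe();
    }

    // Adapter in the spirit of AbstractFunction1: implements everything
    // except apply, so callers only have to write one method.
    static abstract class AbstractCallback<T, R> implements MultiMethodCallback<T, R> {
        @Override
        public String describe() {
            return "<function1>";
        }
    }

    static <T> void forEachSingle(List<T> rows, SingleMethodCallback<T> f) {
        for (T row : rows) {
            f.call(row);
        }
    }

    static <T, R> void forEachMulti(List<T> rows, MultiMethodCallback<T, R> f) {
        for (T row : rows) {
            f.apply(row);
        }
    }

    static List<String> demo() {
        List<String> rows = Arrays.asList("a", "b");
        List<String> seen = new ArrayList<>();

        // Compiles: the parameter type has a single abstract method.
        forEachSingle(rows, x -> seen.add("single:" + x));

        // forEachMulti(rows, x -> seen.add(x));  // would not compile

        // Works: an anonymous subclass of the adapter supplies apply.
        forEachMulti(rows, new AbstractCallback<String, Void>() {
            @Override
            public Void apply(String x) {
                seen.add("multi:" + x);
                return null;
            }
        });
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The anonymous-class call at the end has the same shape as the AbstractFunction1 workaround mentioned in the comments.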

Solution 2

First, extend scala.runtime.AbstractFunction1 and implement Serializable, like this:

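import java.io.Serializable;
import scala.runtime.AbstractFunction1;

// AbstractFunction1 made Serializable so it can be passed to df.foreach
public abstract class SerializableFunction1<T, R>
        extends AbstractFunction1<T, R> implements Serializable {
}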

Now use this SerializableFunction1 class as follows:

df.foreach(new SerializableFunction1<Row,BoxedUnit>(){
        @Override
        public BoxedUnit apply(Row row) {
            System.out.println(row.get(0));
            return BoxedUnit.UNIT;
        }
});
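The Serializable part matters because Spark has to ship the function to the executors, and a plain AbstractFunction1 subclass is not serializable. A minimal sketch of that difference using only Java serialization, with no Spark involved. PlainFunction and SerializableFunction are hypothetical stand-ins for AbstractFunction1 and the SerializableFunction1 class above:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableFunctionSketch {

    // Stand-in for scala.runtime.AbstractFunction1: not Serializable.
    static abstract class PlainFunction<T, R> {
        public abstract R apply(T value);
    }

    // Stand-in for SerializableFunction1: same base class plus Serializable.
    static abstract class SerializableFunction<T, R>
            extends PlainFunction<T, R> implements Serializable {
    }

    // Returns true if Java serialization accepts the object.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out =
                     new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            // NotSerializableException is a subclass of IOException
            return false;
        }
    }

    static String demo() {
        PlainFunction<String, Integer> plain = new PlainFunction<String, Integer>() {
            @Override
            public Integer apply(String s) {
                return s.length();
            }
        };
        SerializableFunction<String, Integer> ser = new SerializableFunction<String, Integer>() {
            @Override
            public Integer apply(String s) {
                return s.length();
            }
        };
        return "plain=" + canSerialize(plain) + ", serializable=" + canSerialize(ser);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Both anonymous classes are created in a static method, so neither captures an enclosing instance. Inside an instance method the enclosing object would be captured as well and would also need to be serializable.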
Author by

user6325753

Java Developer

Updated on June 27, 2022

Comments

  • user6325753
    user6325753 almost 2 years

    According to the DataFrame API, the definition is:

    public void foreach(scala.Function1<Row,scala.runtime.BoxedUnit> f)
    

    Applies a function f to all rows.

    But when I try the following:

    Dataframe df = sql.read()
        .format("com.databricks.spark.csv")
        .option("header","true")
        .load("file:///home/hadoop/Desktop/examples.csv");
    
    df.foreach(x->
    {
       System.out.println(x);
    });
    

    I am getting a compile-time error. Is there a mistake?

    • C4stor
      C4stor over 7 years
      I downvoted because you need to add the error message to the question.
    • eliasah
      eliasah over 7 years
      I have downvoted this question for several reasons. First, the error message isn't included, so it falls into the "why isn't my code working" category. Second, for using foreach to print output. Third, because there is a typo: DataFrame is spelled with a capital F! I'm also voting to close the question for these reasons.
    • CMR
      CMR almost 7 years
      I upvoted this question because I ended up with the same problem. There are two observations about compilation: 1. the parameter Function1<Row, BoxedUnit> does not seem to fit Java lambdas; 2. if the call is df.foreach(new AbstractFunction1<Row, BoxedUnit>() { @Override public BoxedUnit apply(Row arg0) { return null; } });, it works just fine.
    • CMR
      CMR almost 7 years
      The second error is The method foreach(Function1<Row,BoxedUnit>) in the type DataFrame is not applicable for the arguments ((Row x) -> {}) (or ((Row x, BoxedUnit b) -> {}))
  • 54l3d
    54l3d over 7 years
    @user6325753 can you add the error message to your question, please?
  • Jim Bob
    Jim Bob over 6 years
    new VoidFunction should be new ForeachFunction instead.
  • wandermonk
    wandermonk almost 6 years
    Could you please elaborate on "BoxedUnit.UNIT"? What does it mean?
  • Andre
    Andre about 5 years
    Unit is Scala's equivalent of void in Java. BoxedUnit is an internal type related to the JVM that usually shouldn't be visible in the API, but as someone else put it, "sometimes it leaks to the interface".
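To make the Unit versus void point concrete, here is a small Java-only sketch. UnitValue is a hypothetical stand-in for scala.runtime.BoxedUnit and is not the real class. The idea is that a generic function type must return some object, so a function that returns nothing hands back a single placeholder value, whereas a Java Consumer simply returns void:

```java
import java.util.function.Consumer;
import java.util.function.Function;

public class UnitSketch {

    // Hypothetical stand-in for scala.runtime.BoxedUnit: one shared instance
    // that means "no meaningful result".
    static final class UnitValue {
        static final UnitValue UNIT = new UnitValue();

        private UnitValue() {
        }
    }

    static String demo() {
        StringBuilder log = new StringBuilder();

        // Java style: the callback returns void.
        Consumer<String> javaStyle = s -> log.append(s);

        // Scala style as seen from Java: the callback must return an object,
        // so it returns the singleton placeholder.
        Function<String, UnitValue> scalaStyle = s -> {
            log.append(s);
            return UnitValue.UNIT;
        };

        javaStyle.accept("a");
        UnitValue result = scalaStyle.apply("b");
        return log + ":" + (result == UnitValue.UNIT);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

This mirrors why the apply method in Solution 2 ends with return BoxedUnit.UNIT.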