select multiple elements with group by in spark.sql

Yes, it's possible, and the error message you attached describes all the options. You can either add userId to the GROUP BY clause:

val sqlDF = spark.sql("SELECT count(customerId), customerId, userId FROM GETBYID GROUP BY customerId, userId")
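
For reference, the same grouping expressed with the DataFrame API (a sketch; it assumes df is the DataFrame behind the GETBYID view, as in the question):

import org.apache.spark.sql.functions.count

// Group by both columns so each is a valid SELECT target.
val byBoth = df.groupBy("customerId", "userId")
  .agg(count("customerId").as("cnt"))
byBoth.show()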

or use first():

val sqlDF = spark.sql("SELECT count(customerId), customerId, first(userId) FROM GETBYID GROUP BY customerId")
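
Here is a minimal end-to-end sketch putting both options together; the JSON path is a hypothetical placeholder, and the column names are taken from the question:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("GroupByExample")
  .master("local[*]")
  .getOrCreate()

val df = spark.read.json("/path/to/data.json") // hypothetical path
df.createOrReplaceTempView("GETBYID")

// Option 1: make userId part of the grouping key.
spark.sql("SELECT count(customerId), customerId, userId FROM GETBYID GROUP BY customerId, userId").show()

// Option 2: keep one arbitrary userId per customerId via first().
spark.sql("SELECT count(customerId), customerId, first(userId) FROM GETBYID GROUP BY customerId").show()

Note that first() returns an arbitrary value within each group, so use it only when any userId will do.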

Comments

  • rahul, almost 2 years ago

    Is there any way to do a GROUP BY in Spark SQL that selects multiple columns? The code I am using:

    val df = spark.read.json("//path")
    df.createOrReplaceTempView("GETBYID")
    

    Now I do a group by like:

    val sqlDF = spark.sql(
      "SELECT count(customerId) FROM GETBYID group by customerId");
    

    But when I try:

    val sqlDF = spark.sql(
      "SELECT count(customerId),customerId,userId FROM GETBYID group by customerId");
    

    Spark gives an error:

    org.apache.spark.sql.AnalysisException: expression 'getbyid.userId' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;

    Is there any possible way to do that?