Create DataFrame from case class

12,772

Solution 1

To be able to use implicit conversion to DataFrame you have to import spark.implicits._ like :

val spark = SparkSession
      .builder
      .appName("test")
      .getOrCreate()

import spark.implicits._

This way the conversion should work.

In case you are using Spark Shell this is not needed as the Spark session is already created and the specific conversion functions imported.

Solution 2

There is no issue in the piece of code you copied from the link shared, as error explains it's something else (exact code copy result in my run below).

case class Employee(Name:String, Age:Int, Designation:String, Salary:Int, ZipCode:Int)
val EmployeesData = Seq( Employee("Anto",   21, "Software Engineer", 2000, 56798))
val Employee_DataFrame = EmployeesData.toDF
Employee_DataFrame.show()

Employee_DataFrame:org.apache.spark.sql.DataFrame = [Name: string, Age: integer ... 3 more fields]'

+----+---+-----------------+------+-------+
|Name|Age|      Designation|Salary|ZipCode|
+----+---+-----------------+------+-------+
|Anto| 21|Software Engineer|  2000|  56798|
+----+---+-----------------+------+-------+
Share:
12,772

Related videos on Youtube

Jill Clover
Author by

Jill Clover

Updated on June 04, 2022

Comments

  • Jill Clover
    Jill Clover almost 2 years

    I have read other related questions but I do not find the answer.

    I want to create a DataFrame from a case class in Spark 2.3. Scala 2.11.8.

    Code

    package org.XXX
    
    import org.apache.spark.sql.SparkSession
    
    object Test {
    
      def main(args: Array[String]): Unit = {
        val spark = SparkSession
          .builder
          .appName("test")
          .getOrCreate()
    
        case class Employee(Name:String, Age:Int, Designation:String, Salary:Int, ZipCode:Int)
        val EmployeesData = Seq( Employee("Anto",   21, "Software Engineer", 2000, 56798))
        val Employee_DataFrame = EmployeesData.toDF
    
        spark.stop()
      }
    }
    

    Here is what I tried in spark-shell:

    case class Employee(Name:String, Age:Int, Designation:String, Salary:Int, ZipCode:Int)
    val EmployeesData = Seq( Employee("Anto",   21, "Software Engineer", 2000, 56798))
    val Employee_DataFrame = EmployeesData.toDF
    

    Error

    java.lang.VerifyError: class org.apache.spark.sql.hive.HiveExternalCatalog overrides final method alterDatabase.(Lorg/apache/spark/sql/catalyst/catalog/CatalogDatabase;)V
      at java.lang.ClassLoader.defineClass1(Native Method)
      at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
      at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
      at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
      at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
      at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
      at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
      at java.security.AccessController.doPrivileged(Native Method)
      at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
      at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
      at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:53)
      at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
      at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.<init>(HiveSessionStateBuilder.scala:69)
      at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:69)
      at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
      at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
      at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
      at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
      at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
      at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
      at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
      at org.apache.spark.sql.Dataset.<init>(Dataset.scala:172)
      at org.apache.spark.sql.Dataset.<init>(Dataset.scala:178)
      at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65)
      at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:470)
      at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:377)
      at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:228)
    
    • gasparms
      gasparms almost 6 years
      Code is OK, where do you try? spark-shell? IDE? Could you add information about how do you initialize SparkSession. What scala version do you have?
    • Midiparse
      Midiparse almost 6 years
      yes, the error that you are getting seems unrelated to me
  • Mátyás Kuti-Kreszács
    Mátyás Kuti-Kreszács over 4 years
    However the error suggests that you mixed up some dependencies. Check the version of your dependencies and if they are compatible with each other.