How to convert DataFrame to JSON?


Solution 1

val result: DataFrame = sqlContext.read.json(path)
result.write.json("/yourPath")

The write method (which returns a DataFrameWriter) is available on DataFrame objects. Just make sure that your variable is of type DataFrame and not of the deprecated type SchemaRDD. You can either declare the type explicitly (val data: DataFrame) or convert to a DataFrame with toDF().
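For example, a minimal sketch of that conversion, assuming a spark-shell style context where sc and sqlContext are in scope (the News case class, the rows RDD, and the output path are illustrative, not from the question):

import org.apache.spark.sql.DataFrame
import sqlContext.implicits._                  // brings toDF() into scope for RDDs of case classes

case class News(newsId: String, commentId: String)   // hypothetical schema for illustration
val rows = sc.parallelize(Seq(News("n1", "c1"), News("n2", "c2")))

val data: DataFrame = rows.toDF()              // explicit DataFrame type, not SchemaRDD
data.write.json("/yourPath")                   // writes one JSON record per line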

Solution 2

If you have a DataFrame, there is an API to convert it back to an RDD[String] containing the JSON records.

val df = Seq(
  (2012, 8, "Batman", 9.8),
  (2012, 8, "Hero", 8.7),
  (2012, 7, "Robot", 5.5),
  (2011, 7, "Git", 2.0)
).toDF("year", "month", "title", "rating")
df.toJSON.saveAsTextFile("/tmp/jsonRecords")
df.toJSON.take(2).foreach(println)

This is available from Spark 1.4 onward. Call it on the result DataFrame you created.
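For reference, the take(2) call above should print something like the following (field order follows the column order passed to toDF):

{"year":2012,"month":8,"title":"Batman","rating":9.8}
{"year":2012,"month":8,"title":"Hero","rating":8.7}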

The available APIs are listed in the Spark documentation.

Solution 3

sqlContext.read.json(dataFrame.toJSON)
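That one-liner goes DataFrame -> JSON strings -> DataFrame again. A slightly expanded sketch of the same idea (variable names and the path are illustrative), which also keeps the intermediate RDD[String] so it can be saved:

val jsonStrings = dataFrame.toJSON                      // RDD[String] of JSON records (Spark 1.x)
jsonStrings.saveAsTextFile("/tmp/jsonFromDataFrame")    // persist the raw JSON text
val roundTripped = sqlContext.read.json(jsonStrings)    // parse the strings back into a DataFrame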

Solution 4

When you run your Spark job with

--master local --deploy-mode client

then

df.write.json("path/to/file/data.json") works.

If you run on a cluster (for example --master yarn --deploy-mode cluster), a better approach is to write the data to AWS S3 or Azure Blob storage and read it from there:

df.write.json("s3://bucket/path/to/file/data.json")
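A minimal sketch of that pattern, assuming the cluster already has S3A credentials configured (the bucket and path below are placeholders):

// assumes S3A credentials (fs.s3a.access.key / fs.s3a.secret.key) are already set on the cluster
df.write
  .mode("overwrite")                           // replace output from any previous run
  .json("s3a://my-bucket/output/data.json")    // Spark writes a directory of part files at this path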


Author: ashish.garg (Student at IIITB, M.Tech Data Science)

Updated on July 31, 2022

Comments

  • ashish.garg
    ashish.garg almost 2 years

    I have a huge JSON file, a small part from it as follows:

    {
        "socialNews": [{
            "adminTagIds": "",
            "fileIds": "",
            "departmentTagIds": "",
            ........
            ........
            "comments": [{
                "commentId": "",
                "newsId": "",
                "entityId": "",
                ....
                ....
            }]
        }]
        .....
        }
    

    I have applied lateral view explode on socialNews as follows:

    val rdd = sqlContext.jsonFile("file:///home/ashish/test")
    rdd.registerTempTable("social")
    val result = sqlContext.sql("select * from social LATERAL VIEW explode(socialNews) social AS comment")
    

    Now I want to convert this result (a DataFrame) back to JSON and save it to a file, but I am not able to find any Scala API to do the conversion. Is there a standard library for this, or some other way to do it?

  • kerwin
    kerwin over 8 years
    I got a could-not-find-class exception
  • MrChristine
    MrChristine over 8 years
    Which version of Spark and which distro are you using?