How to convert a JSON file to parquet using Apache Spark?


Spark 1.4 and later

You can use Spark SQL to first read the JSON file into a DataFrame, then write the DataFrame out as a Parquet file.

val df = sqlContext.read.json("path/to/json/file")
df.write.parquet("path/to/parquet/file")

or

df.save("path/to/parquet/file", "parquet")

Check here and here for examples and more details.
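If the output directory may already exist, you can set a save mode explicitly so the write does not fail with "path already exists". A minimal spark-shell sketch for Spark 1.4+, with placeholder paths:

```scala
// Spark 1.4+ sketch (inside spark-shell, where sqlContext is predefined).
// The paths are placeholders; adjust them to your files.
val df = sqlContext.read.json("path/to/json/file")

// Overwrite any existing output instead of throwing an error.
df.write.mode("overwrite").parquet("path/to/parquet/file")

// Read the Parquet back to verify the round trip.
val parquetDF = sqlContext.read.parquet("path/to/parquet/file")
parquetDF.printSchema()
```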

Spark 1.3.1

val df = sqlContext.jsonFile("path/to/json/file")
df.saveAsParquetFile("path/to/parquet/file")
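In spark-shell on 1.3.1, `sqlContext` is predefined; in a standalone application you build it from the SparkContext yourself. A sketch under that assumption (app name, master, and paths are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Standalone-app setup for Spark 1.3.1; in spark-shell, sc and sqlContext already exist.
val conf = new SparkConf().setAppName("JsonToParquet").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// 1.3.x API: jsonFile/saveAsParquetFile (replaced by read.json/write.parquet in 1.4+).
val df = sqlContext.jsonFile("path/to/json/file")
df.saveAsParquetFile("path/to/parquet/file")
```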

Issue related to Windows and Spark 1.3.1

Saving a DataFrame as a parquet file on Windows will throw a java.lang.NullPointerException, as described here.

In that case, consider upgrading to a more recent Spark version.
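If upgrading is not an option: this NullPointerException on Windows is commonly caused by Hadoop's missing native binaries (winutils.exe), and pointing HADOOP_HOME at a directory that contains them often works around it. A sketch with placeholder paths, not a guaranteed fix:

```shell
REM Windows workaround sketch: place winutils.exe under %HADOOP_HOME%\bin first.
REM C:\hadoop is a placeholder; adjust to your installation.
set HADOOP_HOME=C:\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%
spark-shell
```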

odbhut.shei.chhele

Updated on January 06, 2020

Comments

  • odbhut.shei.chhele over 4 years: I am new to Apache Spark 1.3.1. How can I convert a JSON file to Parquet?
  • Rami over 8 years: @eddard.stark I have updated my answer to include Spark 1.3.1
  • odbhut.shei.chhele over 8 years: I am getting a NullPointerException when I try saveAsParquetFile
  • Rami over 8 years: Are you trying this in the Spark shell or in some IDE?
  • odbhut.shei.chhele over 8 years: I am using spark-shell
  • odbhut.shei.chhele over 8 years: I am using spark-1.3.1-bin-hadoop2.6
  • Rami over 8 years: I have just tried exactly these two lines of code on spark-1.3.1-bin-hadoop2.6 and it worked. Please check your code: make sure you are not writing to a non-existent directory and that you are correctly reading the file into the DataFrame.
  • odbhut.shei.chhele over 8 years: I am working inside the bin folder. Is that a problem?
  • Rami over 8 years