How to convert a JSON file to parquet using Apache Spark?
Spark 1.4 and later
You can use Spark SQL to first read the JSON file into a DataFrame, then write the DataFrame out as a Parquet file.
val df = sqlContext.read.json("path/to/json/file")
df.write.parquet("path/to/parquet/file")
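or, on Spark 1.x only (this method is deprecated since Spark 1.4 and is not available in Spark 2.0 and later):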
df.save("path/to/parquet/file", "parquet")
Check here and here for examples and more details.
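On Spark 2.0 and later the entry point is SparkSession rather than sqlContext, so the same two steps can be sketched as a small standalone program. This is a minimal sketch, not the answer's original code: the object name JsonToParquet, the helper convert, the local[*] master and the choice of SaveMode.Overwrite are assumptions made for illustration. Note that by default the JSON reader expects one JSON object per line.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical wrapper object; the name is illustrative only.
object JsonToParquet {

  // Read line-delimited JSON into a DataFrame and write it back out as Parquet.
  def convert(spark: SparkSession, jsonPath: String, parquetPath: String): Unit = {
    val df = spark.read.json(jsonPath)
    // Assumption: overwrite any existing output. The default mode fails
    // if the target path already exists.
    df.write.mode(SaveMode.Overwrite).parquet(parquetPath)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: run locally on all cores; drop .master(...) when using spark-submit.
    val spark = SparkSession.builder()
      .appName("JsonToParquet")
      .master("local[*]")
      .getOrCreate()
    try {
      convert(spark, "path/to/json/file", "path/to/parquet/file")
      // Read the result back to confirm the schema was inferred as expected.
      spark.read.parquet("path/to/parquet/file").printSchema()
    } finally {
      spark.stop()
    }
  }
}
```

In spark-shell the session is already created for you, so only the two lines inside convert are needed.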
Spark 1.3.1
val df = sqlContext.jsonFile("path/to/json/file")
df.saveAsParquetFile("path/to/parquet/file")
Issue related to Windows and Spark 1.3.1
Saving a DataFrame as a Parquet file on Windows throws a java.lang.NullPointerException, as described here.
In that case, consider upgrading to a more recent Spark version.
Author by
odbhut.shei.chhele
Updated on January 06, 2020
Comments
-
odbhut.shei.chhele over 4 years
I am new to Apache Spark 1.3.1. How can I convert a JSON file to Parquet?
-
Rami over 8 years
@eddard.stark I have updated my answer to include Spark 1.3.1
-
odbhut.shei.chhele over 8 years
I am getting a NullPointerException when I call saveAsParquetFile
-
Rami over 8 years
Are you trying this in the Spark shell or in an IDE?
-
odbhut.shei.chhele over 8 years
I am using spark-shell
-
odbhut.shei.chhele over 8 years
I am using spark-1.3.1-bin-hadoop2.6
-
Rami over 8 years
I have just tried exactly these two lines of code on spark-1.3.1-bin-hadoop2.6 and they worked. Please check your code: make sure you are not writing to a non-existent directory and that the file is being read into the DataFrame correctly.
-
odbhut.shei.chhele over 8 years
I am working inside the bin folder. Is that a problem?
-
Rami over 8 years
Let us continue this discussion in chat.