How to read a JSON file with a specific format with Spark Scala?


If you want to use read.json, you need one JSON document per line (the JSON Lines format). If your file contains a single valid JSON array of documents, it simply won't be parsed as expected. For example, taking your data, the input file should be formatted like this:

{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}, {"MLrate":"31","Nrout":"0","up":null,"Crate":"2"}, {"MLrate":"30","Nrout":"5","up":null,"Crate":"2"} ,{"MLrate":"34","Nrout":"0","up":null,"Crate":"4"} ,{"MLrate":"33","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"8","up":null,"Crate":"2"} ]}
{"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"0"} ,{"MLrate":"35","Nrout":"1","up":null,"Crate":"5"} ,{"MLrate":"30","Nrout":"6","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"38","Nrout":"8","up":null,"Crate":"1"} ]}

If you use read.json on the structure above, you'll see it is parsed as expected:

scala> sqlContext.read.json("namefile").printSchema
root
 |-- COL: long (nullable = true)
 |-- DATA: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Crate: string (nullable = true)
 |    |    |-- MLrate: string (nullable = true)
 |    |    |-- Nrout: string (nullable = true)
 |    |    |-- up: string (nullable = true)
 |-- IFAM: string (nullable = true)
 |-- KTM: long (nullable = true)
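
If reformatting the file is not an option, note that Spark 2.2+ can also parse a multi-line JSON array directly via the multiLine reader option; a minimal sketch, using the SparkSession entry point (spark) that replaces sqlContext in those versions:

    // Spark 2.2+: treat the whole file as a single JSON value (here, an array),
    // so the original bracketed, multi-line layout parses as-is.
    val df = spark.read
      .option("multiLine", true)
      .json("namefile")
    df.printSchema()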
Comments

  • SparkUser almost 2 years

    I'm trying to read a JSON file that looks like this:

    [ 
    {"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} 
    ,{"MLrate":"31","Nrout":"0","up":null,"Crate":"2"} 
    ,{"MLrate":"30","Nrout":"5","up":null,"Crate":"2"} 
    ,{"MLrate":"34","Nrout":"0","up":null,"Crate":"4"} 
    ,{"MLrate":"33","Nrout":"0","up":null,"Crate":"2"} 
    ,{"MLrate":"30","Nrout":"8","up":null,"Crate":"2"} 
    ]} 
    ,{"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} 
    ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"0"} 
    ,{"MLrate":"35","Nrout":"1","up":null,"Crate":"5"} 
    ,{"MLrate":"30","Nrout":"6","up":null,"Crate":"2"} 
    ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} 
    ,{"MLrate":"38","Nrout":"8","up":null,"Crate":"1"} 
    ]} 
    ,...
    ] 
    

    I've tried the command:

        val df = sqlContext.read.json("namefile") 
        df.show() 
    

    But this does not work: my columns are not recognized...

    • vijay kumar almost 9 years
      Is your single JSON document spread over multiple lines, or on a single line?
    • SparkUser almost 9 years
      A single line, but when I open it in Notepad it is easy to read and spread over several lines.
  • Jyd over 8 years
    @zero323 I want to do something similar, but I just want to load the data from the JSON file into an RDD, without going through Spark SQL, just a simple transformation. How can I achieve this? Thanks! (One way is sketched after this thread.)
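
For the RDD-only approach asked about in that last comment: with the one-document-per-line layout shown in the answer, each line can be parsed independently by any JSON parser. A minimal sketch using scala.util.parsing.json from the Scala standard library of that era (deprecated in later Scala versions):

    import scala.util.parsing.json.JSON

    // Parse each line as its own JSON document; keep only the lines
    // that parse successfully into an object (a Map).
    val rdd = sc.textFile("namefile")
      .flatMap(line => JSON.parseFull(line))
      .collect { case record: Map[String, Any] @unchecked => record }

Each element of rdd is then a plain Map[String, Any] that can be transformed with ordinary RDD operations, without involving Spark SQL.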