Convert Pandas Dataframe to nested JSON


UPDATE:

j = (df.groupby(['ID','Location','Country','Latitude','Longitude'])
       .apply(lambda x: x[['timestamp','tide']].to_dict('records'))
       .reset_index()
       .rename(columns={0:'Tide-Data'})
       .to_json(orient='records'))
     

Result (formatted):

In [103]: print(json.dumps(json.loads(j), indent=2, sort_keys=True))
[
  {
    "Country": "FRA",
    "ID": 1,
    "Latitude": 48.383,
    "Location": "BREST",
    "Longitude": -4.495,
    "Tide-Data": [
      {
        "tide": 6905.0,
        "timestamp": "1807-01-01"
      },
      {
        "tide": 6931.0,
        "timestamp": "1807-02-01"
      },
      {
        "tide": 6896.0,
        "timestamp": "1807-03-01"
      },
      {
        "tide": 6953.0,
        "timestamp": "1807-04-01"
      },
      {
        "tide": 7043.0,
        "timestamp": "1807-05-01"
      }
    ]
  },
  {
    "Country": "DEU",
    "ID": 7,
    "Latitude": 53.867,
    "Location": "CUXHAVEN 2",
    "Longitude": 8.717,
    "Tide-Data": [
      {
        "tide": 7093.0,
        "timestamp": "1843-01-01"
      },
      {
        "tide": 6688.0,
        "timestamp": "1843-02-01"
      },
      {
        "tide": 6493.0,
        "timestamp": "1843-03-01"
      },
      {
        "tide": 6723.0,
        "timestamp": "1843-04-01"
      },
      {
        "tide": 6533.0,
        "timestamp": "1843-05-01"
      }
    ]
  },
  {
    "Country": "DEU",
    "ID": 8,
    "Latitude": 53.899,
    "Location": "WISMAR 2",
    "Longitude": 11.458,
    "Tide-Data": [
      {
        "tide": 6957.0,
        "timestamp": "1848-07-01"
      },
      {
        "tide": 6944.0,
        "timestamp": "1848-08-01"
      },
      {
        "tide": 7084.0,
        "timestamp": "1848-09-01"
      },
      {
        "tide": 6898.0,
        "timestamp": "1848-10-01"
      },
      {
        "tide": 6859.0,
        "timestamp": "1848-11-01"
      }
    ]
  },
  {
    "Country": "NLD",
    "ID": 9,
    "Latitude": 51.918,
    "Location": "MAASSLUIS",
    "Longitude": 4.25,
    "Tide-Data": [
      {
        "tide": 6880.0,
        "timestamp": "1848-02-01"
      },
      {
        "tide": 6700.0,
        "timestamp": "1848-03-01"
      },
      {
        "tide": 6775.0,
        "timestamp": "1848-04-01"
      },
      {
        "tide": 6580.0,
        "timestamp": "1848-05-01"
      },
      {
        "tide": 6685.0,
        "timestamp": "1848-06-01"
      }
    ]
  },
  {
    "Country": "USA",
    "ID": 10,
    "Latitude": 37.807,
    "Location": "SAN FRANCISCO",
    "Longitude": -122.465,
    "Tide-Data": [
      {
        "tide": 6909.0,
        "timestamp": "1854-07-01"
      },
      {
        "tide": 6940.0,
        "timestamp": "1854-08-01"
      },
      {
        "tide": 6961.0,
        "timestamp": "1854-09-01"
      },
      {
        "tide": 6952.0,
        "timestamp": "1854-10-01"
      },
      {
        "tide": 6952.0,
        "timestamp": "1854-11-01"
      }
    ]
  }
]
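To run the updated snippet end to end, a minimal self-contained sketch might look like the following. It uses only four rows copied from the question's sample data, and the names `group_cols` and `nested` are introduced here purely for illustration. Recent pandas releases may emit a deprecation warning from `groupby().apply()`.

```python
import json

import pandas as pd

# A few rows taken from the sample data in the question.
df = pd.DataFrame({
    "ID": [1, 1, 7, 7],
    "Location": ["BREST", "BREST", "CUXHAVEN 2", "CUXHAVEN 2"],
    "Country": ["FRA", "FRA", "DEU", "DEU"],
    "Latitude": [48.383, 48.383, 53.867, 53.867],
    "Longitude": [-4.495, -4.495, 8.717, 8.717],
    "timestamp": ["1807-01-01", "1807-02-01", "1843-01-01", "1843-02-01"],
    "tide": [6905.0, 6931.0, 7093.0, 6688.0],
})

# Columns that describe a station and repeat on every row (illustrative name).
group_cols = ["ID", "Location", "Country", "Latitude", "Longitude"]

# Same chain as in the answer: one list of {timestamp, tide} records per station.
j = (df.groupby(group_cols)
       .apply(lambda x: x[["timestamp", "tide"]].to_dict("records"))
       .reset_index()
       .rename(columns={0: "Tide-Data"})
       .to_json(orient="records"))

# Parse the JSON string back into Python objects to inspect or pretty-print it.
nested = json.loads(j)
print(json.dumps(nested, indent=2, sort_keys=True))
```

Each element of `nested` is one station, with its measurements collected under the "Tide-Data" key.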

OLD answer:

You can do it using groupby(), apply() and to_json() methods:

j = (df.groupby(['ID','Location','Country','Latitude','Longitude'], as_index=False)
       .apply(lambda x: dict(zip(x.timestamp,x.tide)))
       .reset_index()
       .rename(columns={0:'Tide-Data'})
       .to_json(orient='records'))

Output:

In [112]: print(json.dumps(json.loads(j), indent=2, sort_keys=True))
[
  {
    "Country": "FRA",
    "ID": 1,
    "Latitude": 48.383,
    "Location": "BREST",
    "Longitude": -4.495,
    "Tide-Data": {
      "1807-01-01": 6905.0,
      "1807-02-01": 6931.0,
      "1807-03-01": 6896.0,
      "1807-04-01": 6953.0,
      "1807-05-01": 7043.0
    }
  },
  {
    "Country": "DEU",
    "ID": 7,
    "Latitude": 53.867,
    "Location": "CUXHAVEN 2",
    "Longitude": 8.717,
    "Tide-Data": {
      "1843-01-01": 7093.0,
      "1843-02-01": 6688.0,
      "1843-03-01": 6493.0,
      "1843-04-01": 6723.0,
      "1843-05-01": 6533.0
    }
  },
  {
    "Country": "DEU",
    "ID": 8,
    "Latitude": 53.899,
    "Location": "WISMAR 2",
    "Longitude": 11.458,
    "Tide-Data": {
      "1848-07-01": 6957.0,
      "1848-08-01": 6944.0,
      "1848-09-01": 7084.0,
      "1848-10-01": 6898.0,
      "1848-11-01": 6859.0
    }
  },
  {
    "Country": "NLD",
    "ID": 9,
    "Latitude": 51.918,
    "Location": "MAASSLUIS",
    "Longitude": 4.25,
    "Tide-Data": {
      "1848-02-01": 6880.0,
      "1848-03-01": 6700.0,
      "1848-04-01": 6775.0,
      "1848-05-01": 6580.0,
      "1848-06-01": 6685.0
    }
  },
  {
    "Country": "USA",
    "ID": 10,
    "Latitude": 37.807,
    "Location": "SAN FRANCISCO",
    "Longitude": -122.465,
    "Tide-Data": {
      "1854-07-01": 6909.0,
      "1854-08-01": 6940.0,
      "1854-09-01": 6961.0,
      "1854-10-01": 6952.0,
      "1854-11-01": 6952.0
    }
  }
]
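Because the result of `groupby().apply()` depends on the pandas version (see the errors reported in the comments), the same timestamp-to-tide mapping can also be built with a plain loop over the groups. This is a sketch of an alternative, not the method used in the answer. The helper `to_builtin` and the names `group_cols` and `records` are assumptions made for this example.

```python
import json

import pandas as pd

# A few rows taken from the sample data in the question.
df = pd.DataFrame({
    "ID": [1, 1, 7, 7],
    "Location": ["BREST", "BREST", "CUXHAVEN 2", "CUXHAVEN 2"],
    "Country": ["FRA", "FRA", "DEU", "DEU"],
    "Latitude": [48.383, 48.383, 53.867, 53.867],
    "Longitude": [-4.495, -4.495, 8.717, 8.717],
    "timestamp": ["1807-01-01", "1807-02-01", "1843-01-01", "1843-02-01"],
    "tide": [6905.0, 6931.0, 7093.0, 6688.0],
})

group_cols = ["ID", "Location", "Country", "Latitude", "Longitude"]


def to_builtin(value):
    # json.dumps calls this only for objects it cannot serialize itself,
    # such as numpy integers; .item() converts them to plain Python values.
    return value.item()


records = []
# Iterating a groupby yields (key tuple, sub-frame) pairs, sorted by key.
for keys, grp in df.groupby(group_cols):
    rec = dict(zip(group_cols, keys))
    # Map each timestamp to its tide value for this station.
    rec["Tide-Data"] = dict(zip(grp["timestamp"], grp["tide"]))
    records.append(rec)

j = json.dumps(records, default=to_builtin)
```

The loop keeps the nesting logic in ordinary Python, which is slower on large frames but does not rely on how `apply()` wraps its return values.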

PS: if you don't care about indentation, you can write directly to a JSON file:

(df.groupby(['ID','Location','Country','Latitude','Longitude'], as_index=False)
   .apply(lambda x: dict(zip(x.timestamp,x.tide)))
   .reset_index()
   .rename(columns={0:'Tide-Data'})
   .to_json('/path/to/file_name.json', orient='records'))
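If indentation does matter, one option is to parse the JSON string produced by `to_json()` and write it back out with the standard library's `json.dump`. The sketch below assumes the list-of-records chain from the UPDATE section and writes to a temporary directory. `out_path` is a placeholder to replace with a real path.

```python
import json
import os
import tempfile

import pandas as pd

# A few rows taken from the sample data in the question.
df = pd.DataFrame({
    "ID": [1, 1, 7, 7],
    "Location": ["BREST", "BREST", "CUXHAVEN 2", "CUXHAVEN 2"],
    "Country": ["FRA", "FRA", "DEU", "DEU"],
    "Latitude": [48.383, 48.383, 53.867, 53.867],
    "Longitude": [-4.495, -4.495, 8.717, 8.717],
    "timestamp": ["1807-01-01", "1807-02-01", "1843-01-01", "1843-02-01"],
    "tide": [6905.0, 6931.0, 7093.0, 6688.0],
})

j = (df.groupby(["ID", "Location", "Country", "Latitude", "Longitude"])
       .apply(lambda x: x[["timestamp", "tide"]].to_dict("records"))
       .reset_index()
       .rename(columns={0: "Tide-Data"})
       .to_json(orient="records"))

# Placeholder output location for this example; replace with a real path.
out_path = os.path.join(tempfile.mkdtemp(), "file_name.json")

# Re-serialize with indentation so the file is human-readable.
with open(out_path, "w", encoding="utf-8") as fh:
    json.dump(json.loads(j), fh, indent=2, sort_keys=True)
```

The round trip through `json.loads` costs an extra parse, which is usually negligible for files small enough to read by eye.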
Author by

Felix

I am a journalist and wannabe coder. I use web technologies such as HTML, CSS and JavaScript. I recently started to analyze data with Python and Pandas.

Updated on August 16, 2021

Comments

  • Felix
    Felix over 2 years

    I am new to Python and Pandas. I am trying to convert a Pandas DataFrame to nested JSON. The function .to_json() doesn't give me enough flexibility for my aim.

    Here are some data points of the dataframe (in csv, comma separated):

    ,ID,Location,Country,Latitude,Longitude,timestamp,tide  
    0,1,BREST,FRA,48.383,-4.495,1807-01-01,6905.0  
    1,1,BREST,FRA,48.383,-4.495,1807-02-01,6931.0  
    2,1,BREST,FRA,48.383,-4.495,1807-03-01,6896.0  
    3,1,BREST,FRA,48.383,-4.495,1807-04-01,6953.0  
    4,1,BREST,FRA,48.383,-4.495,1807-05-01,7043.0  
    2508,7,CUXHAVEN 2,DEU,53.867,8.717,1843-01-01,7093.0  
    2509,7,CUXHAVEN 2,DEU,53.867,8.717,1843-02-01,6688.0  
    2510,7,CUXHAVEN 2,DEU,53.867,8.717,1843-03-01,6493.0  
    2511,7,CUXHAVEN 2,DEU,53.867,8.717,1843-04-01,6723.0  
    2512,7,CUXHAVEN 2,DEU,53.867,8.717,1843-05-01,6533.0  
    4525,9,MAASSLUIS,NLD,51.918,4.25,1848-02-01,6880.0  
    4526,9,MAASSLUIS,NLD,51.918,4.25,1848-03-01,6700.0  
    4527,9,MAASSLUIS,NLD,51.918,4.25,1848-04-01,6775.0  
    4528,9,MAASSLUIS,NLD,51.918,4.25,1848-05-01,6580.0  
    4529,9,MAASSLUIS,NLD,51.918,4.25,1848-06-01,6685.0  
    6540,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-07-01,6957.0  
    6541,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-08-01,6944.0  
    6542,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-09-01,7084.0  
    6543,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-10-01,6898.0  
    6544,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-11-01,6859.0  
    8538,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-07-01,6909.0  
    8539,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-08-01,6940.0  
    8540,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-09-01,6961.0  
    8541,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-10-01,6952.0  
    8542,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-11-01,6952.0  
    

    There is a lot of repetitive information and I would like to have a JSON like this:

    [
    {
        "ID": 1,
        "Location": "BREST",
        "Latitude": 48.383,
        "Longitude": -4.495,
        "Country": "FRA",
        "Tide-Data": {
            "1807-02-01": 6931,
            "1807-03-01": 6896,
            "1807-04-01": 6953,
            "1807-05-01": 7043
        }
    },
    {
        "ID": 5,
        "Location": "HOLYHEAD",
        "Latitude": 53.31399999999999,
        "Longitude": -4.62,
        "Country": "GBR",
        "Tide-Data": {
            "1807-02-01": 6931,
            "1807-03-01": 6896,
            "1807-04-01": 6953,
            "1807-05-01": 7043
        }
    }
    ]
    

    How could I achieve this?

    EDIT:

    Code to reproduce the dataframe:

    import json
    from pandas import json_normalize  # top-level import available in pandas 1.0+

    # input json
    json_str = '[{"ID":1,"Location":"BREST","Country":"FRA","Latitude":48.383,"Longitude":-4.495,"timestamp":"1807-01-01","tide":6905},{"ID":1,"Location":"BREST","Country":"FRA","Latitude":48.383,"Longitude":-4.495,"timestamp":"1807-02-01","tide":6931},{"ID":1,"Location":"BREST","Country":"DEU","Latitude":48.383,"Longitude":-4.495,"timestamp":"1807-03-01","tide":6896},{"ID":7,"Location":"CUXHAVEN 2","Country":"DEU","Latitude":53.867,"Longitude":-8.717,"timestamp":"1843-01-01","tide":7093},{"ID":7,"Location":"CUXHAVEN 2","Country":"DEU","Latitude":53.867,"Longitude":-8.717,"timestamp":"1843-02-01","tide":6688},{"ID":7,"Location":"CUXHAVEN 2","Country":"DEU","Latitude":53.867,"Longitude":-8.717,"timestamp":"1843-03-01","tide":6493}]'
    
    # load json object
    data_list = json.loads(json_str)
    
    # create dataframe
    df = json_normalize(data_list, None, None)
    
  • MaxU - stop genocide of UA
    MaxU - stop genocide of UA over 7 years
    @Felix, glad i could help :)
  • Felix
    Felix over 7 years
    I just realised that I need the data in this format: "Tide-Data": { "timestamp": "1848-07-01", "tide": "6957.0" }. What do I have to change in your function?
  • MaxU - stop genocide of UA
    MaxU - stop genocide of UA over 7 years
    @Felix, can you update your desired JSON in your question, so i could see how multiple (grouped) entries should look like?
  • Felix
    Felix over 7 years
    I think your update is the right JSON format. I'll let you know tomorrow once I have started plotting the line chart. Thank you very much for the update!
  • Bendy
    Bendy over 6 years
    Hi - great answer :-) One note though: I get a "TypeError: to_dict() takes 1 positional argument but 2 were given" error. When I drop the 'r' though, .to_dict() works. However, I see in the documentation that records (and r) should work
  • MaxU - stop genocide of UA
    MaxU - stop genocide of UA over 6 years
    @Bendy, Series.to_dict() doesn't accept this parameter, but DataFrame.to_dict() does...
  • Bendy
    Bendy over 6 years
    @MaxU - thanks for explaining! I've just been writing a related question actually on my problem in case you can help? stackoverflow.com/questions/46205399/…
  • u009988
    u009988 almost 6 years
    Can I apply the same idea for a 2 level nested json, i.e., there's another level under "Tide-Data" ?
  • MaxU - stop genocide of UA
    MaxU - stop genocide of UA almost 6 years
    @u009988, give it a try.
  • Shankar Panda
    Shankar Panda over 5 years
    @MaxU Thank you for your answer. If I need to create another nested level in the same set, how do I define the group by? Could you please help? This is the example: [{ "name": "Vendor 1", "count": 2000, "categories": [{ "name": "Category 1", "count": 3000, "subCategories": [{ "name": "Sub Category 1", "count": 2000 }]
  • Shankar Panda
    Shankar Panda over 5 years
    @MaxU I would really appreciate your help . I am literally stuck here
  • Shankar Panda
    Shankar Panda over 5 years
    Would that be possible with panda dataframe?
  • Shankar Panda
    Shankar Panda over 5 years
    Sure no worries. Thank you very much
  • Shankar Panda
    Shankar Panda over 5 years
    Is it possible today @MaxU
  • Dance Party
    Dance Party over 3 years
    @MaxU As of newer versions of pandas (e.g. 1.2.1), this no longer works for me. I get this error: ValueError: 1 columns passed, passed data had n columns (where n is 5 in my case). What changed about pandas to make this happen?
  • monkey intern
    monkey intern about 3 years
    @DanceParty did you figure this out? I get the same error for some combination of columns but not others
  • mapsa
    mapsa over 2 years
    To run this example using pandas version 1.3.1: j = (df.groupby(['ID','Location','Country','Latitude','Longitude']).apply(lambda x: x[['timestamp','tide']].to_dict('records')).reset_index().rename(columns={0:'Tide-Data'}).to_json(orient='records'))
  • MaxU - stop genocide of UA
    MaxU - stop genocide of UA over 2 years
    @mapsa, thank you for the hint - i've fixed the code in the answer, so it should work for modern versions of Pandas now)
  • lowercase00
    lowercase00 about 2 years
    @MaxU thanks a lot mate! Just wish Pandas had this built-in. Cheers
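On the `to_dict()` TypeError raised in the comments above: the difference comes down to whether the selection is a DataFrame or a Series. The short sketch below illustrates this with two rows of the question's data. The variable names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": ["1807-01-01", "1807-02-01"],
    "tide": [6905.0, 6931.0],
})

# Double brackets select a DataFrame, whose to_dict() accepts an orient.
frame_records = df[["timestamp", "tide"]].to_dict("records")

# Single brackets select a Series; its to_dict() maps index labels to values
# and has no orient argument.
series_dict = df["tide"].to_dict()

# Passing an orient to Series.to_dict() fails with a TypeError.
try:
    df["tide"].to_dict("records")
    series_accepts_orient = True
except TypeError:
    series_accepts_orient = False
```

So if the lambda passed to `apply()` selects a single column with `x['tide']` instead of `x[['timestamp','tide']]`, the `to_dict('records')` call is made on a Series and fails.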