Pandas json_normalize produces confusing `KeyError` message?
Solution 1
In this case, I think you'd just use this:
In [57]: json_normalize(data[0]['events'])
Out[57]:
  group  schedule.ID schedule.date schedule.location.building  \
0     A          815    2015-08-27                        BDC
1     A          816    2015-08-27                        BDC

   schedule.location.floor
0                        5
1                        5
The meta paths ([['schedule','date']...]) are for specifying data at the same level of nesting as your records, i.e. at the same level as 'events'. It doesn't look like json_normalize handles dicts with nested lists particularly well, so you may need to do some manual reshaping if your actual data is much more complicated.
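To make both points concrete, here is a minimal pure-Python sketch; it is not pandas' actual implementation. pull_field mirrors the key-walking loop visible in the question's traceback, and flatten is a hypothetical helper for the manual reshaping. Both function names are assumptions for illustration, and the variable is called data (as in this answer) rather than json.

```python
# The question's sample data.
data = [{"events": [
    {"schedule": {"date": "2015-08-27",
                  "location": {"building": "BDC", "floor": 5},
                  "ID": 815},
     "group": "A"},
    {"schedule": {"date": "2015-08-27",
                  "location": {"building": "BDC", "floor": 5},
                  "ID": 816},
     "group": "A"},
]}]


def pull_field(obj, path):
    """Follow a list of keys down from obj, one level per key."""
    result = obj
    for key in path:
        result = result[key]
    return result


# A meta path is looked up on the object that holds the record list.
# data[0] only has the key 'events', so the lookup fails on 'schedule'.
missing_key = None
try:
    pull_field(data[0], ["schedule", "date"])
except KeyError as exc:
    missing_key = exc.args[0]


def flatten(record, parent_key="", sep="."):
    """Collapse nested dicts into one dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Manual reshaping: one flat dict per event.
rows = [flatten(event) for event in data[0]["events"]]
```

rows is a list of flat dicts with dotted keys, so it can be handed to pandas.DataFrame(rows) directly.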
Solution 2
I got the KeyError when the structure of the JSON was not consistent, that is, when one of the nested structures was missing from some of the records.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.json.json_normalize.html
In the examples on the pandas documentation page, if you remove the nested tag (counties) from one of the records, you will get a KeyError. To get around this, you may have to either skip the missing tag or pass only the records that have the nested column/tag populated with data.
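A minimal sketch of both workarounds in plain Python, applied before the data reaches json_normalize. The records below are hypothetical and only modeled on the documentation example (a state with a nested counties list); the variable names are assumptions.

```python
# Hypothetical records; the second one lacks the nested "counties" list.
records = [
    {"state": "Florida", "counties": [{"name": "Dade", "population": 12345}]},
    {"state": "Ohio"},
]

# Option 1: keep only the records that actually carry the nested list.
complete = [r for r in records if "counties" in r]

# Option 2: keep every record and give the incomplete ones an empty list.
# The originals are left unmodified because each dict is copied.
padded = [{**r, "counties": r.get("counties", [])} for r in records]
```

Either list can then be passed to json_normalize with 'counties' as the record path. With the second option, a record holding an empty list should simply contribute no rows.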
Solution 3
I had this same problem! This thread helped, especially parachute py's answer.
I found a solution using:
df.dropna(subset=<column(s) with nested data>)
then saving the resulting df as a new JSON file.
Load the new JSON and you'll be able to flatten the nested columns.
There's probably a more efficient way to get around this, but my solution works.
Edit: forgot to mention, I tried using the errors='ignore' arg in json_normalize() and it didn't help.
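A rough sketch of the dropna idea, assuming pandas 1.0 or later (where json_normalize is available as pd.json_normalize). It skips the save-and-reload step by converting the filtered frame back to records in memory, which is a variation on this answer rather than the method it describes. The sample data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical input: the second record has no nested "counties" list.
raw = [
    {"state": "Florida", "counties": [{"name": "Dade", "population": 12345}]},
    {"state": "Ohio"},
]

df = pd.DataFrame(raw)

# Drop the rows whose nested column is missing (NaN after loading).
kept = df.dropna(subset=["counties"])

# Turn the remaining rows back into dicts and flatten the nested list,
# carrying "state" along as metadata for each county row.
flat = pd.json_normalize(
    kept.to_dict("records"),
    record_path="counties",
    meta=["state"],
)
```

As far as the pandas documentation describes it, errors='ignore' only covers keys listed in meta, which may be why it did not help with a missing record path.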
themachinist
Updated on July 25, 2022
Comments
-
themachinist almost 2 years
I'm trying to convert a nested JSON to a Pandas dataframe. I've been using json_normalize with success until I came across a certain JSON. I've made a smaller version of it to recreate the problem.
from pandas.io.json import json_normalize
json = [{"events": [
    {"schedule": {"date": "2015-08-27", "location": {"building": "BDC", "floor": 5}, "ID": 815}, "group": "A"},
    {"schedule": {"date": "2015-08-27", "location": {"building": "BDC", "floor": 5}, "ID": 816}, "group": "A"}
]}]
I then run:
json_normalize(json[0],'events',[['schedule','date'],['schedule','location','building'],['schedule','location','floor']])
Expecting to see something like this:
ID     group  schedule.date  schedule.location.building  schedule.location.floor
'815'  'A'    '2015-08-27'   'BDC'                       5
'816'  'A'    '2015-08-27'   'BDC'                       5
But instead I get this error:
In [2]: json_normalize(json[0],'events',[['schedule','date'],['schedule','location','building'],['schedule','location','floor']])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-2-b588a9e3ef1d> in <module>()
----> 1 json_normalize(json[0],'events',[['schedule','date'],['schedule','location','building'],['schedule','location','floor']])

/Users/logan/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/io/json.pyc in json_normalize(data, record_path, meta, meta_prefix, record_prefix)
    739                 records.extend(recs)
    740
--> 741     _recursive_extract(data, record_path, {}, level=0)
    742
    743     result = DataFrame(records)

/Users/logan/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/io/json.pyc in _recursive_extract(data, path, seen_meta, level)
    734                         meta_val = seen_meta[key]
    735                     else:
--> 736                         meta_val = _pull_field(obj, val[level:])
    737                     meta_vals[key].append(meta_val)
    738

/Users/logan/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/io/json.pyc in _pull_field(js, spec)
    674         if isinstance(spec, list):
    675             for field in spec:
--> 676                 result = result[field]
    677         else:
    678             result = result[spec]

KeyError: 'schedule'
-
devanathan over 7 years
Is there any way to get floor as the column name instead of schedule.location.floor?
-
Arthur Zangiev about 7 years
You can always rename columns with .rename(columns={'schedule.location.floor':'floor'})
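If every dotted column should be shortened, not just one, rename also accepts a function. A small sketch, assuming a frame with the dotted column names from the answer above (the frame is built by hand here so the snippet stands alone):

```python
import pandas as pd

# Stand-in for the flattened frame; only three of its columns are reproduced.
df = pd.DataFrame({
    "group": ["A", "A"],
    "schedule.ID": [815, 816],
    "schedule.location.floor": [5, 5],
})

# Keep only the last dotted segment of each column name.
short = df.rename(columns=lambda name: name.split(".")[-1])
```

This only works cleanly when the last segments are unique; two paths ending in the same key would produce duplicate column names.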