Flatten double nested JSON
Solution 1
This is one way to do it. Should give you some ideas.
df = pd.concat(
[
pd.concat([pd.Series(m) for m in t['members']], axis=1) for t in data['teams']
], keys=[t['teamname'] for t in data['teams']]
)
0 1
1 email [email protected] [email protected]
firstname John Jane
lastname Doe Doe
mobile 916-555-7890
orgname Anon Anon
phone 916-555-1234 916-555-4321
2 email [email protected] [email protected]
firstname Mickey Minny
lastname Moose Moose
mobile 916-555-1111
orgname Moosers Moosers
phone 916-555-0000 916-555-2222
To get a nice table with team name and members as rows, all attributes in columns:
df.index.levels[0].name = 'teamname'
df.columns.name = 'member'
df.T.stack(0).swaplevel(0, 1).sort_index()
To get team name and member as actual columns, just reset the index.
df.index.levels[0].name = 'teamname'
df.columns.name = 'member'
df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()
The whole thing
import json
import pandas as pd
json_text = """{
"teams": [
{
"teamname": "1",
"members": [
{
"firstname": "John",
"lastname": "Doe",
"orgname": "Anon",
"phone": "916-555-1234",
"mobile": "",
"email": "[email protected]"
},
{
"firstname": "Jane",
"lastname": "Doe",
"orgname": "Anon",
"phone": "916-555-4321",
"mobile": "916-555-7890",
"email": "[email protected]"
}
]
},
{
"teamname": "2",
"members": [
{
"firstname": "Mickey",
"lastname": "Moose",
"orgname": "Moosers",
"phone": "916-555-0000",
"mobile": "916-555-1111",
"email": "[email protected]"
},
{
"firstname": "Minny",
"lastname": "Moose",
"orgname": "Moosers",
"phone": "916-555-2222",
"mobile": "",
"email": "[email protected]"
}
]
}
]
}"""
data = json.loads(json_text)
df = pd.concat(
[
pd.concat([pd.Series(m) for m in t['members']], axis=1) for t in data['teams']
], keys=[t['teamname'] for t in data['teams']]
)
df.index.levels[0].name = 'teamname'
df.columns.name = 'member'
df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()
Solution 2
Use pandas.io.json.json_normalize
json_normalize(data,record_path=['teams','members'],meta=[['teams','teamname']])
output:
email firstname lastname mobile orgname phone teams.teamname
0 [email protected] John Doe Anon 916-555-1234 1
1 [email protected] Jane Doe 916-555-7890 Anon 916-555-4321 1
2 [email protected] Mickey Moose 916-555-1111 Moosers 916-555-0000 2
3 [email protected] Minny Moose Moosers 916-555-2222 2
Explanation
from pandas.io.json import json_normalize
import pandas as pd
I've only learned how to use the json_normalize function recently so my explanation might not be right.
Start with what I'm calling 'Layer 0'
json_normalize(data)
output:
teams
0 [{'teamname': '1', 'members': [{'firstname': '...
There is 1 Column and 1 Row. Everything is inside the 'team' column.
Look into what I'm calling 'Layer 1' by using record_path=
json_normalize(data,record_path='teams')
output:
members teamname
0 [{'firstname': 'John', 'lastname': 'Doe', 'org... 1
1 [{'firstname': 'Mickey', 'lastname': 'Moose', ... 2
In Layer 1 we have have flattened 'teamname' but there is more inside 'members'.
Look into Layer 2 with record_path=. The notation is unintuitive at first. I now remember it by ['layer','deeperlayer'] where the result is layer.deeperlayer.
json_normalize(data,record_path=['teams','members'])
output:
email firstname lastname mobile orgname phone
0 [email protected] John Doe Anon 916-555-1234
1 [email protected] Jane Doe 916-555-7890 Anon 916-555-4321
2 [email protected] Mickey Moose 916-555-1111 Moosers 916-555-0000
3 [email protected] Minny Moose Moosers 916-555-2222
Excuse my output, I don't know how to make tables in a response.
Finally we add in Layer 1 columns using meta=
json_normalize(data,record_path=['teams','members'],meta=[['teams','teamname']])
output:
email firstname lastname mobile orgname phone teams.teamname
0 [email protected] John Doe Anon 916-555-1234 1
1 [email protected] Jane Doe 916-555-7890 Anon 916-555-4321 1
2 [email protected] Mickey Moose 916-555-1111 Moosers 916-555-0000 2
3 [email protected] Minny Moose Moosers 916-555-2222 2
Notice how we needed a list of lists for meta=[[]] to reference Layer 1. If there was a column we want from Layer 0 and Layer 1 we could do this:
json_normalize(data,record_path=['layer1','layer2'],meta=['layer0',['layer0','layer1']])
The result of the json_normalize is a pandas dataframe.
spaine
GIS specialist and python programmer at Dept. of Fish & Wildlife.
Updated on July 16, 2022Comments
-
spaine almost 2 years
I am trying to flatten a JSON file that looks like this:
{ "teams": [ { "teamname": "1", "members": [ { "firstname": "John", "lastname": "Doe", "orgname": "Anon", "phone": "916-555-1234", "mobile": "", "email": "[email protected]" }, { "firstname": "Jane", "lastname": "Doe", "orgname": "Anon", "phone": "916-555-4321", "mobile": "916-555-7890", "email": "[email protected]" } ] }, { "teamname": "2", "members": [ { "firstname": "Mickey", "lastname": "Moose", "orgname": "Moosers", "phone": "916-555-0000", "mobile": "916-555-1111", "email": "[email protected]" }, { "firstname": "Minny", "lastname": "Moose", "orgname": "Moosers", "phone": "916-555-2222", "mobile": "", "email": "[email protected]" } ] } ]
}
I wish to export this to an excel table. My current code is this:
from pandas.io.json import json_normalize import json import pandas as pd inputFile = 'E:\\teams.json' outputFile = 'E:\\teams.xlsx' f = open(inputFile) data = json.load(f) f.close() df = pd.DataFrame(data) result1 = json_normalize(data, 'teams' ) print result1
results in this output:
members teamname 0 [{u'firstname': u'John', u'phone': u'916-555-... 1 1 [{u'firstname': u'Mickey', u'phone': u'916-555-... 2
There are 2 members's data nested within each row. I would like to have an output table that displays all 4 members' data plus their associated teamname.
-
spaine almost 8 yearsThank you, but what I am trying to achieve is a table that has names, email, etc as column headings with the team number as an additional column heading. That way the four people are represented by four flattened records.
-
piRSquared almost 8 yearsThis was intended to show you the sort of techniques that can be used to accomplish your goal. To get what you want, you now need to pivot. I'll update my answer to reflect this.
-
spaine almost 8 yearsThe example you've created looks perfect. When I run this the following error gets returned at the df.index.levels[0].name = 'teamname' line: AttributeError: 'Int64Index' object has no attribute 'levels'
-
piRSquared almost 8 yearsThat's my bad. I edited the post. I forgot to assign the
pd.concat
to the dataframedf
-
spaine almost 8 yearsThis is very helpful to me - I'm going to have to do some research to understand exactly how your code works. May I ask how you produced the data in table form that you posted above?
-
piRSquared almost 8 yearsThis solution is very specific to the json structure you provided. The html table is output while using jupyter-notebook. You can access the same html via df.to_html().
-
spaine almost 8 yearsHmm, when I include either or both of the last 2 scripts to turn it into a table, it remains as you show after pd.concat...
-
piRSquared almost 8 yearsOk, I was basing my code on an assumption that your data was the same as mine. I'll update the code again with the exact thing I ran.
-
spaine almost 8 yearsI was using the same data, just loading from a .json file. Even if I copy the entire text of the code into a python editor, printing the result displays what you show after pd.concat...
-
piRSquared almost 8 yearsAre you looking at
df
ordf.T.stack(0).swaplevel(0, 1).sort_index().reset_index()
? The latter doesn't reassign todf
. So if you look atdf
after you've rundf.T.stack(0).swaplevel(0, 1).sort_index().reset_index()
,df
will still look like it did after thepd.concat
-
spaine almost 8 yearsThat was it, I thought df had been reassigned. I assigned df.T.stack(0).swaplevel(0, 1).sort_index().reset_index() as an object and the export to Excel worked perfectly. Thank you very much.
-
Datacrawler almost 5 years@piRSquared Amazing solution. Hopefully, I can make it work when, for example, the phone has many entries e.g.
"phone": [{"home": "1111", "mobile": "2222"},{"home": "3333", "mobile": "44444"}