How to read multiple json files into pandas dataframe?
Solution 1
Change the last line to:
temp = temp.append(data, ignore_index = True)
The reason we have to do this is because the append doesn't happen in place. The append method does not modify the data frame. It just returns a new data frame with the result of the append operation.
Edit:
Since writing this answer I have learned that you should never use DataFrame.append
inside a loop because it leads to quadratic copying (see this answer).
What you should do instead is first create a list of data frames and then use pd.concat
to concatenate them all in a single operation. Like this:
dfs = [] # an empty list to store the data frames
for file in file_list:
data = pd.read_json(file, lines=True) # read data frame from json file
dfs.append(data) # append the data frame to the list
temp = pd.concat(dfs, ignore_index=True) # concatenate all the data frames in the list.
This alternative should be considerably faster.
Solution 2
If you need to flatten the JSON, Juan Estevez’s approach won’t work as is. Here is an alternative :
import pandas as pd
dfs = []
for file in file_list:
with open(file) as f:
json_data = pd.json_normalize(json.loads(f.read()))
dfs.append(json_data)
df = pd.concat(dfs, sort=False) # or sort=True depending on your needs
Or if your JSON are line-delimited (not tested) :
import pandas as pd
dfs = []
for file in file_list:
with open(file) as f:
for line in f.readlines():
json_data = pd.json_normalize(json.loads(line))
dfs.append(json_data)
df = pd.concat(dfs, sort=False) # or sort=True depending on your needs
Solution 3
I combined Juan Estevez's answer with glob. Thanks a lot.
import pandas as pd
import glob
def readFiles(path):
files = glob.glob(path)
dfs = [] # an empty list to store the data frames
for file in files:
data = pd.read_json(file, lines=True) # read data frame from json file
dfs.append(data) # append the data frame to the list
df = pd.concat(dfs, ignore_index=True) # concatenate all the data frames in the list.
return df
onetap
Updated on February 17, 2022Comments
-
onetap about 2 years
I'm having a hard time loading multiple line delimited JSON files into a single pandas dataframe. This is the code I'm using:
import os, json import pandas as pd import numpy as np import glob pd.set_option('display.max_columns', None) temp = pd.DataFrame() path_to_json = '/Users/XXX/Desktop/Facebook Data/*' json_pattern = os.path.join(path_to_json,'*.json') file_list = glob.glob(json_pattern) for file in file_list: data = pd.read_json(file, lines=True) temp.append(data, ignore_index = True)
It looks like all the files are loading when I look through
file_list
, but cannot figure out how to get each file into a dataframe. There are about 50 files with a couple lines in each file. -
Skippy le Grand Gourou over 3 yearsYou probably mean "if the files are valid JSON or not"… ;)
-
BGG16 over 3 yearsThanks @ Skippy le Grand Gourou. I think your code needs 'pd.' in front of 'json_normalize' to run.