Convert a column of json strings into columns of data

10,544

Your column values seem to have an extra number before the actual json string. So you might want strip that out first (skip to Method if that isn't the case)

One way to do that is to apply a function to the column

# constructing the df
df = pd.DataFrame([['0 {"a":"1","b":"2","c":"3"}'],['1 {"a" :"4","b":"5","c":"6"}']], columns=['json'])

# print(df)
                         json
# 0  0 {"a":"1","b":"2","c":"3"}
# 1  1 {"a" :"4","b":"5","c":"6"}

# function to remove the number
import re

def split_num(val):
    p = re.compile("({.*)")
    return p.search(val).group(1)

# applying the function
df['json'] = df['json'].map(lambda x: split_num(x))
print(df)

#                          json
# 0   {"a":"1","b":"2","c":"3"}
# 1  {"a" :"4","b":"5","c":"6"}

Method:

Once the df is in the above format, the below will convert each row entry to a dictionary:

df['json'] = df['json'].map(lambda x: dict(eval(x)))

Then, applying pd.Series to the column will do the job

d = df['json'].apply(pd.Series)
print(d)
#   a  b  c
# 0  1  2  3
# 1  4  5  6
Share:
10,544

Related videos on Youtube

PRATHAMESH
Author by

PRATHAMESH

Updated on June 04, 2022

Comments

  • PRATHAMESH
    PRATHAMESH almost 2 years

    I have a big dataframe of around 30000 rows and a single column containing a json string. Each json string contains a number of variables and its value I want to break this json string down into columns of data

    two rows looks like

    0 {"a":"1","b":"2","c":"3"}
    1 {"a" ;"4","b":"5","c":"6"}
    

    I want to convert this into a dataframe like

    a   b   c
    1   2   3
    4   5   6
    

    Please help