Convert CSV to JSON (in specific format) using Python

python json python-2.7 csv python-3.x

15,921

Since you are using pandas already, I tried to get as much mileage as I could out of dataframe methods. I also ended up wandering fairly far afield from your implementation. I think the key here, though, is don't try to get too clever with your list and/or dictionary comprehensions. You can very easily confuse yourself and everyone who reads your code.

import pandas as pd
from itertools import groupby 
from collections import OrderedDict
import json    

df = pd.read_csv('SampleCsvFile.csv', dtype={
            "zipcode" : str,
            "date" : str,
            "state" : str,
            "val1" : str,
            "val2" : str,
            "val3" : str,
            "val4" : str,
            "val5" : str
        })

results = []

for (zipcode, state), bag in df.groupby(["zipcode", "state"]):
    contents_df = bag.drop(["zipcode", "state"], axis=1)
    subset = [OrderedDict(row) for i,row in contents_df.iterrows()]
    results.append(OrderedDict([("zipcode", zipcode),
                                ("state", state),
                                ("subset", subset)]))

print json.dumps(results[0], indent=4)
#with open('ExpectedJsonFile.json', 'w') as outfile:
#    outfile.write(json.dumps(results[0], indent=4))

The simplest way to have all the json datatypes written as strings, and to retain their original formatting, was to force read_csv to parse them as strings. If, however, you need to do any numerical manipulation on the values before writing out the json, you will have to allow read_csv to parse them numerically and coerce them into the proper string format before converting to json.

15,921

Author by

Steve

Updated on June 05, 2022

Comments

Steve almost 2 years

I would like to convert a csv file into a json file using python 2.7. Down below is the python code I tried but it is not giving me expected result. Also, I would like to know if there is any simplified version than mine. Any help is appreciated.

Here is my csv file (SampleCsvFile.csv):

zipcode,date,state,val1,val2,val3,val4,val5
95110,2015-05-01,CA,50,30.00,5.00,3.00,3
95110,2015-06-01,CA,67,31.00,5.00,3.00,4
95110,2015-07-01,CA,97,32.00,5.00,3.00,6

Here is the expected json file (ExpectedJsonFile.json):

{
        "zipcode": "95110", 
        "state": "CA", 
        "subset": [
            {
                "date": "2015-05-01",
                "val1": "50", 
                "val2": "30.00", 
                "val3": "5.00", 
                "val4": "3.00", 
                "val5": "3"
            }, 
            {
                "date": "2015-06-01", 
                "val1": "67", 
                "val2": "31.00", 
                "val3": "5.00", 
                "val4": "3.00", 
                "val5": "4"
            }, 
            {
                "date": "2015-07-01", 
                "val1": "97", 
                "val2": "32.00", 
                "val3": "5.00", 
                "val4": "3.00", 
                "val5": "6"
            }
        ]

}

Here's the python code I tried:

import pandas as pd
from itertools import groupby 
import json    

df = pd.read_csv('SampleCsvFile.csv')

names = df.columns.values.tolist()
data = df.values

master_list2 = [ (d["zipcode"], d["state"], d) for d in [dict(zip(names, d)) for d in data] ]
intermediate2 = [(k, [x[2] for x in list(v)]) for k,v in groupby(master_list2, lambda t: (t[0],t[1]) )]
nested_json2 = [dict(zip(names,(k[0][0], k[0][1], k[1]))) for k in [(i[0], i[1]) for i in intermediate2]]

#print json.dumps(nested_json2, indent=4)
with open('ExpectedJsonFile.json', 'w') as outfile:
     outfile.write(json.dumps(nested_json2, indent=4))

Steve over 8 years

When I try to execute your code, I get the type error: 95110 is not JSON serializable. Also, the output doesn't have double quotes to numerical values and truncates the decimal portion. Any idea on how to resolve it?
Steve over 8 years

Here's the output of "print result" (without the json.dumps): {'subset': [{'val3': 5.0, 'date': '2015-05-01', 'val2': 30.0, 'val1': 50L, 'val5': 3L, 'val4': 3.0}, {'val3': 5.0, 'date': '2015-06-01', 'val2': 31.0, 'val1': 67L, 'val5': 4L, 'val4': 3.0}, {'val3': 5.0, 'date': '2015-07-01', 'val2': 32.0, 'val1': 97L, 'val5': 6L, 'val4': 3.0}], 'state': 'CA', 'zipcode': 95110}
Aniket Schneider over 8 years

I adjusted the answer to force all the datatypes to be read as strings, which will resolve the quoting and numerical formatting issues. I'm not sure about the type error you're experiencing, as I can't replicate that, but since the datatype of the parsed zipcode is different now there is a chance that will be resolved as well.
Steve over 8 years

New code works like a charm. Thanks a lot Aniket, I really appreciate it.
Steve over 8 years

one question though, is there a way to change/set the order of elements? For example, I would like to see state and zipcode on the top and subset in the bottom. Right now, the order seems to be random.
Aniket Schneider over 8 years

Updated the answer to force ordering. Note that json objects are considered unordered collections, so anything that parses json should not care about the ordering. However, for human readability, you can use the python OrderedDict to set the order you want.
Steve over 8 years

Understood. Thanks again!
Aniket Schneider over 8 years

Don't forget to accept the answer if it has solved your problem! :)
Steve over 8 years

just did. :) Thanks for the remainder.
DUDANF almost 5 years

One question! I'm hoping someone still replies. So you know how you've created a subset? Basically this is the result of my subset: ``` "Serial_numbers": [ { "For Serial": "0018F", "Rev Serial": "0018R" } ] ``` I want this to simply be ``` "Serial Numbers": [ { "0018F -0018R" : {"ADD SOME HARDCODED VALUES"} } ] ``` Anyone know how I can use OrderedDict and bag to do this?