Convert CSV to JSON (in specific format) using Python
Since you are using pandas already, I tried to get as much mileage as I could out of dataframe methods. I also ended up wandering fairly far afield from your implementation. I think the key here, though, is don't try to get too clever with your list and/or dictionary comprehensions. You can very easily confuse yourself and everyone who reads your code.
import pandas as pd
from itertools import groupby
from collections import OrderedDict
import json
df = pd.read_csv('SampleCsvFile.csv', dtype={
"zipcode" : str,
"date" : str,
"state" : str,
"val1" : str,
"val2" : str,
"val3" : str,
"val4" : str,
"val5" : str
})
results = []
for (zipcode, state), bag in df.groupby(["zipcode", "state"]):
contents_df = bag.drop(["zipcode", "state"], axis=1)
subset = [OrderedDict(row) for i,row in contents_df.iterrows()]
results.append(OrderedDict([("zipcode", zipcode),
("state", state),
("subset", subset)]))
print json.dumps(results[0], indent=4)
#with open('ExpectedJsonFile.json', 'w') as outfile:
# outfile.write(json.dumps(results[0], indent=4))
The simplest way to have all the json datatypes written as strings, and to retain their original formatting, was to force read_csv
to parse them as strings. If, however, you need to do any numerical manipulation on the values before writing out the json, you will have to allow read_csv
to parse them numerically and coerce them into the proper string format before converting to json.
Steve
Updated on June 05, 2022Comments
-
Steve almost 2 years
I would like to convert a csv file into a json file using python 2.7. Down below is the python code I tried but it is not giving me expected result. Also, I would like to know if there is any simplified version than mine. Any help is appreciated.
Here is my csv file (SampleCsvFile.csv):
zipcode,date,state,val1,val2,val3,val4,val5 95110,2015-05-01,CA,50,30.00,5.00,3.00,3 95110,2015-06-01,CA,67,31.00,5.00,3.00,4 95110,2015-07-01,CA,97,32.00,5.00,3.00,6
Here is the expected json file (ExpectedJsonFile.json):
{ "zipcode": "95110", "state": "CA", "subset": [ { "date": "2015-05-01", "val1": "50", "val2": "30.00", "val3": "5.00", "val4": "3.00", "val5": "3" }, { "date": "2015-06-01", "val1": "67", "val2": "31.00", "val3": "5.00", "val4": "3.00", "val5": "4" }, { "date": "2015-07-01", "val1": "97", "val2": "32.00", "val3": "5.00", "val4": "3.00", "val5": "6" } ] }
Here's the python code I tried:
import pandas as pd from itertools import groupby import json df = pd.read_csv('SampleCsvFile.csv') names = df.columns.values.tolist() data = df.values master_list2 = [ (d["zipcode"], d["state"], d) for d in [dict(zip(names, d)) for d in data] ] intermediate2 = [(k, [x[2] for x in list(v)]) for k,v in groupby(master_list2, lambda t: (t[0],t[1]) )] nested_json2 = [dict(zip(names,(k[0][0], k[0][1], k[1]))) for k in [(i[0], i[1]) for i in intermediate2]] #print json.dumps(nested_json2, indent=4) with open('ExpectedJsonFile.json', 'w') as outfile: outfile.write(json.dumps(nested_json2, indent=4))
-
Steve over 8 yearsWhen I try to execute your code, I get the type error: 95110 is not JSON serializable. Also, the output doesn't have double quotes to numerical values and truncates the decimal portion. Any idea on how to resolve it?
-
Steve over 8 yearsHere's the output of "print result" (without the json.dumps):
{'subset': [{'val3': 5.0, 'date': '2015-05-01', 'val2': 30.0, 'val1': 50L, 'val5': 3L, 'val4': 3.0}, {'val3': 5.0, 'date': '2015-06-01', 'val2': 31.0, 'val1': 67L, 'val5': 4L, 'val4': 3.0}, {'val3': 5.0, 'date': '2015-07-01', 'val2': 32.0, 'val1': 97L, 'val5': 6L, 'val4': 3.0}], 'state': 'CA', 'zipcode': 95110}
-
Aniket Schneider over 8 yearsI adjusted the answer to force all the datatypes to be read as strings, which will resolve the quoting and numerical formatting issues. I'm not sure about the type error you're experiencing, as I can't replicate that, but since the datatype of the parsed zipcode is different now there is a chance that will be resolved as well.
-
Steve over 8 yearsNew code works like a charm. Thanks a lot Aniket, I really appreciate it.
-
Steve over 8 yearsone question though, is there a way to change/set the order of elements? For example, I would like to see state and zipcode on the top and subset in the bottom. Right now, the order seems to be random.
-
Aniket Schneider over 8 yearsUpdated the answer to force ordering. Note that json objects are considered unordered collections, so anything that parses json should not care about the ordering. However, for human readability, you can use the python OrderedDict to set the order you want.
-
Steve over 8 yearsUnderstood. Thanks again!
-
Aniket Schneider over 8 yearsDon't forget to accept the answer if it has solved your problem! :)
-
Steve over 8 yearsjust did. :) Thanks for the remainder.
-
DUDANF almost 5 yearsOne question! I'm hoping someone still replies. So you know how you've created a subset? Basically this is the result of my subset: ``` "Serial_numbers": [ { "For Serial": "0018F", "Rev Serial": "0018R" } ] ``` I want this to simply be ``` "Serial Numbers": [ { "0018F -0018R" : {"ADD SOME HARDCODED VALUES"} } ] ``` Anyone know how I can use OrderedDict and bag to do this?