Create nested JSON from CSV
The csv
module will handle the CSV reading nicely - including handling line breaks that are within quotes.
import csv
with open('my_csv.csv') as csv_file:
for row in csv.reader(csv_file):
# do work
The csv.reader
object is an iterator - you can iterate through the rows in the CSV by using a for
loop. Each row is a list, so you can get each field as row[0]
, row[1]
, etc. Be aware that this will load the first row (which just contains field names in your case).
As we have field names given to us in the first row, we can use csv.DictReader
so that fields in each row can be accessed as row['id']
, row['name']
, etc. This will also skip the first row for us:
import csv
with open('my_csv.csv') as csv_file:
for row in csv.DictReader(csv_file):
# do work
For the JSON export, use the json
module. json.dumps()
will take Python data structures such as lists and dictionaries and return the appropriate JSON string:
import json
my_data = {'id': 123, 'name': 'Test User', 'emails': ['[email protected]', '[email protected]']}
my_data_json = json.dumps(my_data)
If you want to generate JSON output exactly as you posted, you'd do something like:
output = {'persons': []}
with open('my_csv.csv') as csv_file:
for person in csv.DictReader(csv_file):
output['persons'].append({
'type': 'config.profile',
'id': person['id'],
# ...add other fields (email etc) here...
})
# ...do similar for config.pictures, config.status, etc...
output_json = json.dumps(output)
output_json
will contain the JSON output that you want.
However, I'd suggest you carefully consider the structure of the JSON output that you're after - at the moment, you're defining an outer dictionary that serves no purpose, and you're adding all your 'config
' data directly under 'persons
' - you may want to reconsider this.
Comments
-
cherrun almost 2 years
I already read Create nested JSON from flat csv, but it didn't help in my case.
I have quite a big spreadsheet created with Google Docs consisting of 11 rows and 74 columns (some columns are not occupied).
I created an example on Google Drive. When exported as a
CSV
it looks like this:id,name,email,phone,picture01,picture02,picture03,status 1,Alice,[email protected],2131232,"image01_01 [this is an image]",image01_02,image01_03,single 2,Bob,[email protected],2854839,image02_01,"image02_02 [description to image 2]",,married 3,Frank,[email protected],987987,image03_01,image03_02,,single 4,Shawn,[email protected],,image04_01,,,single
Now I would like to have a
JSON
structure, which looks like this:{ "persons": [ { "type": "config.profile", "id": "1", "email": "[email protected]", "pictureId": "p01", "statusId": "s01" }, { "type": "config.pictures", "id": "p01", "album": [ { "image": "image01_01", "description": "this is an image" }, { "image": "image_01_02", "description": "" }, { "image": "image_01_03", "description": "" } ] }, { "type": "config.status", "id": "s01", "status": "single" }, { "type": "config.profile", "id": "2", "email": "[email protected]", "pictureId": "p02", "statusId": "s02" }, { "type": "config.pictures", "id": "p02", "album": [ { "image": "image02_01", "description": "" }, { "image": "image_02_02", "description": "description to image 2" } ] }, { "type": "config.status", "id": "s02", "status": "married" } ] }
And so on for the other lines.
My theoretical approach would be to go through the
CSV
file per row (here starts the first problem: now every row is equal to one line, but sometimes several, thus I need to count the commas?). Each row is equal to a block ofconfig.profile
, including theid
,email
,pictureId
, andstatusId
(the latter two are being generated depending on the row number).Then for each row a
config.pictures
block is generated with the sameid
as the one inserted in theconfig.profile
block. Thealbum
is an array of as many elements as pictures are given.Lastly each row has a
config.status
block, which, again, has the sameid
as the one given inconfig.profile
, and one entry ofstatus
with the corresponding status.I'm entirely clueless how to create the nested and conditional JSON file.
I just got to the point where I convert the
CSV
to validJSON
, without any nesting and additional info, which are not directly given in theCSV
, like thetype
,pictureId
,statusId
, and so on.Any help is appreciated. If it is easier to program this in another script language (like
ruby
), I would gladly switch to those).Before someone thinks this is a homework or whatnot. It is not. I just want to automate an otherwise very tiresome copy&paste task.