Parsing JSON into INSERT statements with Python

Solution 1

In Python, you can do something like this using sqlite3 and json, both from the standard library.

import json
import sqlite3

# The string representing the json.
# You will probably want to read this string in from
# a file rather than hardcoding it.
s = """[
    {
        "id": 1001, 
        "name": "John", 
        "age" : 30 
    }, 
    {
        "id" : 1002,
        "name" : "Peter",
        "age" : 25
    },
    {
        "id" : 1003,
        "name" : "Kevin",
        "age" : 35,
        "salary" : 5000
    }
]"""

# Read the string representing json
# Into a python list of dicts.
data = json.loads(s)


# Open the file containing the SQL database.
with sqlite3.connect("filename.db") as conn:

    # Create the table if it doesn't exist.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tab(
                id int,
                name varchar(100),
                age int,
                salary int
            );"""
        )

    # Insert each entry from json into the table.
    keys = ["id", "name", "age", "salary"]
    for entry in data:

        # This will make sure that each key will default to None
        # if the key doesn't exist in the json entry.
        values = [entry.get(key, None) for key in keys]

        # Execute the command, letting sqlite3 substitute each '?' with the
        # corresponding item in 'values'. DO NOT build the SQL string by hand:
        # parameter binding lets sqlite3 handle unsafe strings correctly.
        cmd = """INSERT INTO tab VALUES(
                    ?,
                    ?,
                    ?,
                    ?
                );"""
        conn.execute(cmd, values)

    conn.commit()

This will create a file named 'filename.db' in the current directory with the entries inserted.
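
As an aside, the warning in the code comments about not building the SQL string by hand can be illustrated with a name that contains an apostrophe. This is only a minimal sketch against a throwaway in-memory database; the name "O'Brien" and the helper naive_insert_sql are made up for the illustration and are not part of the code above.

```python
import sqlite3


def naive_insert_sql(name):
    # Hypothetical helper: builds the statement by string concatenation,
    # which is exactly what the comment in Solution 1 warns against.
    return "INSERT INTO tab (name) VALUES ('" + name + "');"


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.execute("CREATE TABLE tab(id int, name varchar(100), age int, salary int);")

name = "O'Brien"

# The concatenated statement is malformed: the apostrophe ends the string literal early.
try:
    conn.execute(naive_insert_sql(name))
    naive_failed = False
except sqlite3.OperationalError:
    naive_failed = True

# With a '?' placeholder the value travels separately from the SQL text.
conn.execute("INSERT INTO tab (name) VALUES (?);", (name,))
stored = conn.execute("SELECT name FROM tab;").fetchall()
conn.close()

print(naive_failed, stored)
```

The same concatenation pattern is also what makes SQL injection possible when the value comes from an untrusted source, so the placeholder form is the safer default.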

To test the table:

# Testing the table.
with sqlite3.connect("filename.db") as conn:
    cmd = """SELECT * FROM tab WHERE salary IS NOT NULL;"""
    cur = conn.execute(cmd)
    res = cur.fetchall()
    for r in res:
        print(r)
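
Solution 1 hardcodes the column list and the column types. If the columns have to be discovered from the records themselves, one possible approach is to collect the keys in first-seen order and map Python types to SQL types. The sketch below is only that: the helper names infer_columns and create_table_sql and the SQL_TYPES mapping are assumptions of mine, varchar(100) is borrowed from the question, and sqlite3 is used just to keep the example runnable.

```python
import json
import sqlite3

# Assumed mapping from Python types to SQL column types; adjust for your database.
SQL_TYPES = {int: "int", float: "real", str: "varchar(100)"}


def infer_columns(records):
    """Return {column: sql_type}, keeping the order in which keys first appear."""
    columns = {}
    for record in records:
        for key, value in record.items():
            sql_type = SQL_TYPES.get(type(value))  # None for null or unmapped types
            if columns.get(key) is None:
                columns[key] = sql_type
    # Fall back to text for columns whose type could not be determined.
    return {key: sql_type or "varchar(100)" for key, sql_type in columns.items()}


def create_table_sql(table, columns):
    # Identifiers cannot be bound as parameters, so only use trusted names here.
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols});"


records = json.loads("""[
    {"id": 1001, "name": "John", "age": 30},
    {"id": 1002, "name": "Peter", "age": 25},
    {"id": 1003, "name": "Kevin", "age": 35, "salary": 5000}
]""")

columns = infer_columns(records)
create_sql = create_table_sql("tab", columns)

placeholders = ", ".join(f":{name}" for name in columns)
insert_sql = f"INSERT INTO tab ({', '.join(columns)}) VALUES ({placeholders});"

conn = sqlite3.connect(":memory:")
conn.execute(create_sql)
# Named placeholders need every key present, so missing ones become None (NULL).
rows = [{name: record.get(name) for name in columns} for record in records]
conn.executemany(insert_sql, rows)
result = conn.execute("SELECT * FROM tab ORDER BY id;").fetchall()
conn.close()

print(create_sql)
print(result)
```

Values are still passed through placeholders; only the table and column names are formatted into the SQL text, which is why the key names need to come from a source you trust.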

Solution 2

You could try this:

import json

TABLE_NAME = "tab"

sqlstatement = ""
with open("data.json", "r") as f:
    jsondata = json.load(f)

# The loop variable is called 'record' so it does not shadow the json module.
for record in jsondata:
    keys = []
    values = []
    for key, value in record.items():
        keys.append(key)
        if isinstance(value, str):
            # Quote strings and double any embedded single quotes.
            values.append("'" + value.replace("'", "''") + "'")
        elif value is None:
            # JSON null becomes SQL NULL.
            values.append("NULL")
        else:
            values.append(str(value))
    keylist = "(" + ", ".join(keys) + ")"
    valuelist = "(" + ", ".join(values) + ")"

    sqlstatement += "INSERT INTO " + TABLE_NAME + " " + keylist + " VALUES " + valuelist + "\n"

print(sqlstatement)

However, for this to work you'll need to correct the syntax of your JSON file: wrap the records in square brackets and put the keys in double quotes, like this:

[{  
    "id" : 1001, 
    "name" : "John", 
    "age" : 30 
} , 

{   
    "id" : 1002,
    "name" : "Peter",
    "age" : 25
},

{
    "id" : 1003,
    "name" : "Kevin",
    "age" : 35,
    "salary" : 5000
}]

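Running the script on this file gives output like the following. The column order follows the key order of each parsed record: on Python 3.7+ that is the order in the file (id, name, age), while older versions may reorder the keys as shown here.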

INSERT INTO tab (age, id, name) VALUES (30, 1001, 'John')
INSERT INTO tab (age, id, name) VALUES (25, 1002, 'Peter')
INSERT INTO tab (salary, age, id, name) VALUES (5000, 35, 1003, 'Kevin')

Note that you don't need to specify NULLs. If you leave a column out of the INSERT statement, the database fills it with the column's default value, which is NULL unless the table defines a DEFAULT (a NOT NULL column without a default will reject the insert).
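
The behaviour described in the note can be checked quickly with sqlite3. This is a small sketch, assuming an in-memory database; the table name demo and the bonus column are made up here, purely to show what happens to a column that is left out of the INSERT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'salary' has no default; 'bonus' is a made-up column with an explicit DEFAULT.
conn.execute(
    "CREATE TABLE demo(id int, name varchar(100), salary int, bonus int DEFAULT 0);"
)

# Neither salary nor bonus is listed in the INSERT.
conn.execute("INSERT INTO demo (id, name) VALUES (1001, 'John');")

row = conn.execute("SELECT id, name, salary, bonus FROM demo;").fetchone()
conn.close()

print(row)  # (1001, 'John', None, 0)
```

The omitted salary comes back as None (NULL), while the omitted bonus picks up its declared default.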

Author: Matthew

Updated on June 04, 2022

Comments

  • Matthew, almost 2 years

    I have a file that contains several JSON records. I have to parse this file and load each record into a particular SQL Server table. However, the table might not exist in the database, in which case I also have to create it before loading. So I have to parse the JSON file, work out the fields/columns, and create the table. Then I have to deserialize the JSON records and insert them into that table. The caveat is that some fields are optional, i.e. a field might be absent from one record but present in another. Below is an example file with 3 records:

    { id : 1001, 
      name : "John", 
      age : 30 
    } , 
    
    { id : 1002,
      name : "Peter",
      age : 25
    },
    
    { id : 1003,
      name : "Kevin",
      age : 35,
      salary : 5000
    },
    

    Notice that the field salary appears only in the 3rd record. The result should be:

    CREATE TABLE tab ( id int, name varchar(100), age int, salary int );
    
    INSERT INTO tab (id, name, age, salary) values (1001, 'John', 30, NULL)
    INSERT INTO tab (id, name, age, salary) values (1002, 'Peter', 25, NULL)
    INSERT INTO tab (id, name, age, salary) values (1003, 'Kevin', 35, 5000)
    

    Can anyone please help me with some pointers? I am new to Python. Thanks.

    • Daniel, over 5 years
      Perhaps, SQL databases are not the best choice. SQL needs a fixed schema.
    • Jon Warren, over 5 years
      Just a heads up that your example JSON file is not valid JSON. You would need to enclose the entire file in a set of square brackets and wrap your keys in double quotes.
  • Matthew, over 5 years
    Thanks a ton @Jon Warren . Worked like a charm... that's exactly what I needed. Also learned a lot of new things. Thanks again.
  • enigma6205, over 2 years
    Hi @Jon Warren and Matthew, I was wondering if you know how to solve the same problem as the one above, but for nested JSON (with a more complex tree structure)? I have my question here: stackoverflow.com/questions/68777843/…