Pandas to_sql() to update unique values in DB?

12,474

In pandas, there is no convenient argument in to_sql to append only non-duplicates to a final table. Consider using a staging temp table that pandas always replaces and then run a final append query to migrate temp table records to final table accounting only for unique PK's using the NOT EXISTS clause.

engine = sqlalchemy.create_engine(...)

df.to_sql(name='myTempTable', con=engine, if_exists='replace')

with engine.begin() as cn:
   sql = """INSERT INTO myFinalTable (Col1, Col2, Col3, ...)
            SELECT t.Col1, t.Col2, t.Col3, ...
            FROM myTempTable t
            WHERE NOT EXISTS 
                (SELECT 1 FROM myFinalTable f
                 WHERE t.MatchColumn1 = f.MatchColumn1
                 AND t.MatchColumn2 = f.MatchColumn2)"""

   cn.execute(sql)

This would be an ANSI SQL solution and not restricted to vendor-specific methods like UPSERT and so is compliant in practically all SQL-integrated relational databases.

Share:
12,474
Merv Merzoug
Author by

Merv Merzoug

Updated on June 07, 2022

Comments

  • Merv Merzoug
    Merv Merzoug almost 2 years

    How can I use the df.to_sql(if_exists = 'append') to append ONLY the unique values between the dataframe and the database. In other words, I would like to evaluate the duplicates between the DF and the DB and drop those duplicates before writing to the database.

    Is there a parameter for this?

    I understand that the parameters if_exists = 'append' and if_exists = 'replace'is for the entire table - not the unique entries.

    I am using: 
    sqlalchemy
    
    pandas dataframe with the following datatypes: 
        index: datetime.datetime <-- Primary Key
        float
        float
        float
        float
        integer
        string <---  Primary Key
        string<----  Primary Key
    

    I'm stuck on this so your help is much appreciated. -Thanks