Speed up inserts into SQL Server from pyodbc


UPDATE: pyodbc 4.0.19 added a Cursor#fast_executemany option that can greatly improve performance by avoiding the behaviour described below. See this answer for details.
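
For example, with pyodbc 4.0.19 or later it is just a flag on the cursor (a minimal sketch; the connection string, table, and column names are illustrative):

import pyodbc

cnxn = pyodbc.connect("DSN=mssql;UID=user;PWD=pwd")  # illustrative connection string
crsr = cnxn.cursor()
crsr.fast_executemany = True  # available in pyodbc 4.0.19+

sql = "INSERT INTO #Temp (id, txtcol) VALUES (?, ?)"
params = [(1, 'foo'), (2, 'bar'), (3, 'baz')]
crsr.executemany(sql, params)  # parameters are now sent to the server in bulk
cnxn.commit()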


Your code does follow proper form (aside from the few minor tweaks mentioned in the other answer), but be aware that when pyodbc performs an .executemany it actually submits a separate sp_prepexec call for each individual row. That is, for the code

sql = "INSERT INTO #Temp (id, txtcol) VALUES (?, ?)"
params = [(1, 'foo'), (2, 'bar'), (3, 'baz')]
crsr.executemany(sql, params)

SQL Server actually performs the following (as confirmed by SQL Profiler):

exec sp_prepexec @p1 output,N'@P1 bigint,@P2 nvarchar(3)',N'INSERT INTO #Temp (id, txtcol) VALUES (@P1, @P2)',1,N'foo'
exec sp_prepexec @p1 output,N'@P1 bigint,@P2 nvarchar(3)',N'INSERT INTO #Temp (id, txtcol) VALUES (@P1, @P2)',2,N'bar'
exec sp_prepexec @p1 output,N'@P1 bigint,@P2 nvarchar(3)',N'INSERT INTO #Temp (id, txtcol) VALUES (@P1, @P2)',3,N'baz'

So, for an .executemany "batch" of 10,000 rows you would be

  • performing 10,000 individual inserts,
  • making 10,000 round-trips to the server, and
  • sending the identical SQL command text (INSERT INTO ...) 10,000 times.

It is possible to have pyodbc send an initial sp_prepare and then do an .executemany that calls sp_execute, but the nature of .executemany means you would still make 10,000 sp_prepexec calls, just executing sp_execute instead of INSERT INTO .... That could improve performance if the SQL statement were quite long and complex, but for a short one like the example in your question it probably wouldn't make much difference.
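
For reference, a sketch of that sp_prepare / sp_execute approach might look like the following (the handle plumbing here is my own illustration, not a drop-in recipe, and it reuses the sql/params from the example above):

# prepare the statement once on the server and read back its handle
crsr.execute("""
    DECLARE @handle INT;
    EXEC sp_prepare @handle OUTPUT,
        N'@P1 bigint, @P2 nvarchar(3)',
        N'INSERT INTO #Temp (id, txtcol) VALUES (@P1, @P2)';
    SELECT @handle;
    """)
prep_handle = crsr.fetchval()

# still one round-trip per row, but each one only carries "EXEC sp_execute ..."
crsr.executemany(f"EXEC sp_execute {prep_handle}, ?, ?", params)

# release the prepared statement when finished
crsr.execute(f"EXEC sp_unprepare {prep_handle}")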

One could also get creative and build multi-row "table value constructors" as illustrated in this answer, but note that it is only offered as a "Plan B" when a native bulk insert mechanism is not a feasible solution.
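
As a rough illustration of that "Plan B" (a sketch only; SQL Server caps a single VALUES constructor at 1000 rows and a statement at 2100 parameters, so a real implementation would have to chunk the data):

rows = [(1, 'foo'), (2, 'bar'), (3, 'baz')]
placeholders = ", ".join("(?, ?)" for _ in rows)
flat_params = [value for row in rows for value in row]

# one INSERT statement and one round-trip for the whole chunk
crsr.execute(f"INSERT INTO #Temp (id, txtcol) VALUES {placeholders}", flat_params)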


Comments

  • mikebmassey almost 4 years

    In Python, I have a process to select data from one database (Redshift via psycopg2) and then insert that data into SQL Server (via pyodbc). I chose to do a read / write rather than a read / flat file / load because the row count is around 100,000 per day; it seemed easier to simply connect and insert. However, the insert process is slow, taking several minutes.

    Is there a better way to insert data into SQL Server with pyodbc?

    select_cursor.execute(output_query)
    
    done = False
    rowcount = 0
    
    while not done:
        rows = select_cursor.fetchmany(10000)
    
        insert_list = []
    
        if rows == []:
            done = True
            break
    
        for row in rows:
            rowcount += 1
    
            insert_params = (
                row[0], 
                row[1], 
                row[2]
                )
    
            insert_list.append(insert_params)            
    
        insert_cnxn = pyodbc.connect('''Connection Information''')
    
        insert_cursor = insert_cnxn.cursor()
    
        insert_cursor.executemany("""
            INSERT INTO Destination (AccountNumber, OrderDate, Value)
            VALUES (?, ?, ?)
            """, insert_list)
    
        insert_cursor.commit()
        insert_cursor.close()
        insert_cnxn.close()
    
    select_cursor.close()
    select_cnxn.close()