Why would I get a memory error with fast_executemany on a tiny df?

Solution 1

I was able to reproduce your issue using pyodbc 4.0.23. The MemoryError was related to your use of the ancient

DRIVER={SQL Server}

Further testing using

DRIVER=ODBC Driver 11 for SQL Server

also failed, with

Function sequence error (0) (SQLParamData)

which was related to an existing pyodbc issue on GitHub. I posted my findings here.

That issue is still under investigation. In the meantime you might be able to proceed by

  • using a newer ODBC driver like DRIVER=ODBC Driver 13 for SQL Server, and
  • running pip install pyodbc==4.0.22 to use an earlier version of pyodbc (see the sketch below).
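
For example, a minimal sketch of the driver change (the exact driver name must match one that is actually installed; pyodbc.drivers() will list them, and the SERVER/DATABASE/UID/PWD values are the same placeholders as in the question):

import pyodbc
print(pyodbc.drivers())  # confirm which ODBC drivers are installed and their exact names

conn = "DRIVER={ODBC Driver 13 for SQL Server};SERVER=SERVER_IP;DATABASE=DB_NAME;UID=USER_ID;PWD=PWD"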

Solution 2

I was facing this issue too.

I solved it by changing:

conn =  "DRIVER={SQL Server};SERVER=SERVER_IP;DATABASE=DB_NAME;UID=USER_ID;PWD=PWD"

To this:

conn =  "DRIVER={ODBC Driver 17 for SQL Server};SERVER=SERVER_IP;DATABASE=DB_NAME;UID=USER_ID;PWD=PWD"
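
As a side note, if you're on SQLAlchemy 1.3 or later, the mssql+pyodbc dialect can enable fast_executemany directly in create_engine, so the event listener from the question isn't needed. A minimal sketch, using the same placeholder connection values as above:

from urllib.parse import quote_plus
from sqlalchemy import create_engine

conn = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=SERVER_IP;DATABASE=DB_NAME;UID=USER_ID;PWD=PWD"
engine = create_engine(
    "mssql+pyodbc:///?odbc_connect={}".format(quote_plus(conn)),
    fast_executemany=True,  # SQLAlchemy 1.3+ option for the pyodbc dialect
)

# df.to_sql('fast_executemany_test', engine, if_exists='replace', index=False)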

Comments

  • Martin Bobak almost 2 years

    I was looking for ways to speed up pushing a dataframe to SQL Server and stumbled upon an approach here. This approach blew me away in terms of speed: using normal to_sql took almost 2 hours, while this script pushed a 100k row x 100 column df in 12.54 seconds.

    So after testing the code below with a sample df, I attempted to use a df that had many different datatypes (int, string, float, Boolean). However, I was sad to see a memory error. So I started reducing the size of my df to see what the limitations were. I noticed that if my df had any strings then I wasn't able to load it to SQL Server. I am having trouble isolating the issue further. The script below is taken from the question in the link; however, I added a tiny df with strings. Any suggestions on how to rectify this issue would be great!

    import pandas as pd
    import numpy as np
    import time
    from sqlalchemy import create_engine, event
    from urllib.parse import quote_plus
    import pyodbc
    
    conn =  "DRIVER={SQL Server};SERVER=SERVER_IP;DATABASE=DB_NAME;UID=USER_ID;PWD=PWD"
    quoted = quote_plus(conn)
    new_con = 'mssql+pyodbc:///?odbc_connect={}'.format(quoted)
    engine = create_engine(new_con)
    
    
    @event.listens_for(engine, 'before_cursor_execute')
    def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
        print("FUNC call")
        if executemany:
            cursor.fast_executemany = True
    
    
    table_name = 'fast_executemany_test'
    df1 = pd.DataFrame({'col1':['tyrefdg','ertyreg','efdgfdg'],
                       'col2':['tydfggfdgrefdg','erdfgfdgfdgfdgtyreg','edfgfdgdfgdffdgfdg']
                       })
    
    
    
    s = time.time()
    df1.to_sql(table_name, engine, if_exists = 'replace', chunksize = None)
    print(time.time() - s)
    
  • Martin Bobak about 6 years
    Thanks! pyodbc 4.0.22 didn't work for me, but pyodbc 4.0.19 worked fine. Also, I noticed that the script throws an error when adding datatypes into the to_sql statement, so I just load the data to SQL Server and change the datatypes there, in case anyone is getting that same error.
  • Umar.H almost 5 years
    Thanks @MartinBobak, it's such a strange error! I had to fiddle around with the metadata and it worked, but it was very annoying. I also found the performance boost amazing.