Using sqlalchemy session to execute sql DRASTICALLY slows execution time
Solution 1
sqlalchemy
doesn't set up a query plan, or anything else fancy. It just generates SQL and sends it over a DB-API-2.0 connection. So, if you explicitly call execute
with the same statement that sqlalchemy
generates, it will run in exactly the same way.*
The simplest way to see what queries sqlalchemy
is generating is to pass echo=True
as an extra param on the create_engine
call.
In your case, the query generated by sqlalchemy
was in fact different from your manual query, because it was testing an integer parameter with a string, instead of with an int.
* This isn't 100% guaranteed; you have to make sure that any connection parameters in the DB-API-2.0 connect
function are the same, and that neither you nor sqlalchemy
executed any PRAGMA
statements. But you can test those in much the same way you can test the query itself.
Solution 2
Here is a real test suite to compare MySQL cursor with SQLAlchemy engine and session. Please substitute your connection information and SQL at the bottom, then run it. Let us know what the timings are.
import time
def time_thing(fn, description):
print "Running %s" % description
now = time.time()
try:
ret = fn()
return ret
finally:
spent = time.time() - now
print "Finished %s, took %d seconds" % (description, spent)
def with_mysqldb(sql):
import MySQLdb
conn = MySQLdb.connect(db=DBNAME, user=USERNAME, passwd=PASSWORD, host=HOST)
def go():
cursor = conn.cursor()
cursor.execute(sql)
# if result fetching is the issue:
# cursor.fetchall()
cursor.close()
time_thing(go, "Executing SQL with MySQLdb cursor")
def _sqla_engine_w_test_connection():
from sqlalchemy import create_engine
eng = create_engine(SQLALCHEMY_URL)
def test():
conn = eng.connect()
result = conn.execute("select 1")
assert result.fetchall()[0] == (1, )
time_thing(test, "Making a test connection...")
return eng
def with_sqlalchemy(sql):
eng = _sqla_engine_w_test_connection()
def go():
result = eng.execute(sql)
# if result fetching is the issue:
# result.fetchall()
result.close()
time_thing(go, "Executing SQL with SQLA engine")
def with_sqlalchemy_session(sql):
from sqlalchemy.orm import Session
eng = _sqla_engine_w_test_connection()
def go():
sess = Session(eng)
result = sess.execute(sql)
# if result fetching is the issue:
# result.fetchall()
result.close()
time_thing(go, "Executing SQL SQLA session")
SQLALCHEMY_URL = "mysql://scott:tiger@localhost/test"
DBNAME = "test"
HOST = "localhost"
USERNAME = "scott"
PASSWORD = "tiger"
SQL = "SELECT 1"
with_mysqldb(SQL)
with_sqlalchemy(SQL)
with_sqlalchemy_session(SQL)
Solution 3
Which DBAPI are you using? Maybe try to change it to something else. I'm working with PostgreSQL now and I experience big difference in performance between pypostgresql and psycopg2 (latter one is much faster).
For a list of available DBAPIs for MySQL refer to SQLAchemy documentation: chapter 4.1.5.
Colleen
Been working with Django and Python for about a year now... but still have a lot to learn.
Updated on June 04, 2022Comments
-
Colleen almost 2 years
I have a rather long query (was 7 joins, now is 7 subselects because in raw sql 7 subselects was considerably faster-- I don't even know when the 7 joins would have finished if I'd let it run, but longer than 1 min versus .05-.1 seconds with subselects)
When I run it on the db it takes, as I said, .05-.1 seconds to execute. Simply using
session.execute()
slows it down to over a minute!Is there anything I can do?
Let me know if you need more info-- I kind of suspect this is a general sqlalchemy thing-- like maybe sqlalchemy is setting up a query plan instead of just letting mysql do it? Or...?
EDIT: ran explain on both and they seem identical except that sqlalchemy adds a "using temporary; using filesort" to the
extra
column. Is that what's slowing it down? How do I stop it from doing that?EDIT 2: DEFINITELY sqlalchemy. I tried using a MySQL cursor to execute instead of an SA session and got the same .05 second runtime.
EDIT 3:
The code to create our engine:
engine_ro = create_engine( config.ro_database_url, #string with username, password, db pool_size=config.database_pool_size, #int max_overflow=config.database_max_overflow, #int pool_timeout=config.database_timeout, # int echo=config.database_echo, #False echo_pool=config.database_echo, #same as echo #False listeners=[GoneAway()] if config.database_use_listeners else None)
where
GoneAway()
is a method that executes aSELECT 1
to check the connection.To create the session object:
SessionRO = scoped_session(sessionmaker(bind=engine_ro, autocommit=False))
where
scoped_session
andsessionmaker
are sqlalchemy functions.Then, the code that executes the query:
session = SessionRO() results = session.execute(sql, params)
EDIT 4: In case anyone is wondering, if I comment out the
listeners
bit, it's still slow. As well if I just usesessionmaker
without scoped_session.