Using sqlalchemy session to execute sql DRASTICALLY slows execution time

10,739

Solution 1

sqlalchemy doesn't set up a query plan, or anything else fancy. It just generates SQL and sends it over a DB-API-2.0 connection. So, if you explicitly call execute with the same statement that sqlalchemy generates, it will run in exactly the same way.*

The simplest way to see what queries sqlalchemy is generating is to pass echo=True as an extra param on the create_engine call.

In your case, the query generated by sqlalchemy was in fact different from your manual query, because it was testing an integer parameter with a string, instead of with an int.


* This isn't 100% guaranteed; you have to make sure that any connection parameters in the DB-API-2.0 connect function are the same, and that neither you nor sqlalchemy executed any PRAGMA statements. But you can test those in much the same way you can test the query itself.

Solution 2

Here is a real test suite to compare MySQL cursor with SQLAlchemy engine and session. Please substitute your connection information and SQL at the bottom, then run it. Let us know what the timings are.

import time

def time_thing(fn, description):
    print "Running %s" % description
    now = time.time()
    try:
        ret = fn()
        return ret
    finally:
        spent = time.time() - now
        print "Finished %s, took %d seconds" % (description, spent)

def with_mysqldb(sql):
    import MySQLdb

    conn = MySQLdb.connect(db=DBNAME, user=USERNAME, passwd=PASSWORD, host=HOST)

    def go():
        cursor = conn.cursor()
        cursor.execute(sql)

        # if result fetching is the issue:
        # cursor.fetchall()

        cursor.close()

    time_thing(go, "Executing SQL with MySQLdb cursor")

def _sqla_engine_w_test_connection():
    from sqlalchemy import create_engine
    eng = create_engine(SQLALCHEMY_URL)

    def test():
        conn = eng.connect()
        result = conn.execute("select 1")
        assert result.fetchall()[0] == (1, )

    time_thing(test, "Making a test connection...")

    return eng

def with_sqlalchemy(sql):
    eng = _sqla_engine_w_test_connection()

    def go():
        result = eng.execute(sql)

        # if result fetching is the issue:
        # result.fetchall()

        result.close()
    time_thing(go, "Executing SQL with SQLA engine")

def with_sqlalchemy_session(sql):
    from sqlalchemy.orm import Session

    eng = _sqla_engine_w_test_connection()

    def go():
        sess = Session(eng)

        result = sess.execute(sql)

        # if result fetching is the issue:
        # result.fetchall()

        result.close()

    time_thing(go, "Executing SQL SQLA session")

SQLALCHEMY_URL = "mysql://scott:tiger@localhost/test"
DBNAME = "test"
HOST = "localhost"
USERNAME = "scott"
PASSWORD = "tiger"
SQL = "SELECT 1"

with_mysqldb(SQL)
with_sqlalchemy(SQL)
with_sqlalchemy_session(SQL)

Solution 3

Which DBAPI are you using? Maybe try to change it to something else. I'm working with PostgreSQL now and I experience big difference in performance between pypostgresql and psycopg2 (latter one is much faster).

For a list of available DBAPIs for MySQL refer to SQLAchemy documentation: chapter 4.1.5.

Share:
10,739
Colleen
Author by

Colleen

Been working with Django and Python for about a year now... but still have a lot to learn.

Updated on June 04, 2022

Comments

  • Colleen
    Colleen almost 2 years

    I have a rather long query (was 7 joins, now is 7 subselects because in raw sql 7 subselects was considerably faster-- I don't even know when the 7 joins would have finished if I'd let it run, but longer than 1 min versus .05-.1 seconds with subselects)

    When I run it on the db it takes, as I said, .05-.1 seconds to execute. Simply using session.execute() slows it down to over a minute!

    Is there anything I can do?

    Let me know if you need more info-- I kind of suspect this is a general sqlalchemy thing-- like maybe sqlalchemy is setting up a query plan instead of just letting mysql do it? Or...?

    EDIT: ran explain on both and they seem identical except that sqlalchemy adds a "using temporary; using filesort" to the extra column. Is that what's slowing it down? How do I stop it from doing that?

    EDIT 2: DEFINITELY sqlalchemy. I tried using a MySQL cursor to execute instead of an SA session and got the same .05 second runtime.

    EDIT 3:

    The code to create our engine:

    engine_ro = create_engine(
        config.ro_database_url, #string with username, password, db
        pool_size=config.database_pool_size, #int
        max_overflow=config.database_max_overflow, #int
        pool_timeout=config.database_timeout, # int
        echo=config.database_echo, #False
        echo_pool=config.database_echo, #same as echo #False
        listeners=[GoneAway()] if config.database_use_listeners else None)
    

    where GoneAway() is a method that executes a SELECT 1 to check the connection.

    To create the session object:

    SessionRO = scoped_session(sessionmaker(bind=engine_ro, autocommit=False))
    

    where scoped_session and sessionmaker are sqlalchemy functions.

    Then, the code that executes the query:

    session = SessionRO()
    results = session.execute(sql, params)
    

    EDIT 4: In case anyone is wondering, if I comment out the listeners bit, it's still slow. As well if I just use sessionmaker without scoped_session.