Creating a pandas DataFrame from a database query that uses bind variables

Try pandas.io.sql.read_sql_query, which accepts the bind values through its params argument. This worked for me with pandas version 0.20.1:

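    import pandas as pd
    import pandas.io.sql as psql
    import cx_Oracle as odb

    # _user, _pass and _dbenv are placeholders for your own credentials and DSN
    conn = odb.connect(_user + '/' + _pass + '@' + _dbenv)

    sqlStr = """SELECT * FROM customers
                WHERE id BETWEEN :v1 AND :v2
    """
    # Named bind variables are passed as a dict keyed by the placeholder names
    pars = {"v1": 1234, "v2": 5678}
    df = psql.read_sql_query(sqlStr, conn, params=pars)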
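
For anyone who wants to try the same call without an Oracle instance, here is a minimal sketch that uses an in-memory SQLite database as a stand-in. The table contents are made up for illustration, and the assumption is that the driver accepts named ":name" binds given as a dict (sqlite3 does). Other drivers may expect a different placeholder style.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the Oracle connection (assumption:
# illustration only; the rows below are invented sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1000, "below range"), (1234, "lower bound"),
     (3000, "inside range"), (9999, "above range")],
)
conn.commit()

sql_str = """SELECT * FROM customers
             WHERE id BETWEEN :v1 AND :v2
             ORDER BY id"""
pars = {"v1": 1234, "v2": 5678}

# The bind values are handed to the driver separately from the SQL text.
df = pd.read_sql_query(sql_str, conn, params=pars)
print(df)

conn.close()
```

Only the rows whose id falls between the two bound values end up in the frame, and the column names come from the query result.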
Author by

David Marx

Updated on June 13, 2022

Comments

  • David Marx, almost 2 years ago

    I'm working with an Oracle database. I can do this much:

        import pandas as pd
        import pandas.io.sql as psql
        import cx_Oracle as odb
        conn = odb.connect(_user +'/'+ _pass +'@'+ _dbenv)
    
        sqlStr = "SELECT * FROM customers"
        df = psql.frame_query(sqlStr, conn)
    

    But I don't know how to handle bind variables, like so:

        sqlStr = """SELECT * FROM customers 
                    WHERE id BETWEEN :v1 AND :v2
                 """
    

    I've tried these variations:

       params  = (1234, 5678)
       params2 = {"v1":1234, "v2":5678}
    
       df = psql.frame_query((sqlStr,params), conn)
       df = psql.frame_query((sqlStr,params2), conn)
       df = psql.frame_query(sqlStr,params, conn)
       df = psql.frame_query(sqlStr,params2, conn)
    

    The following works:

       curs = conn.cursor()
       curs.execute(sqlStr, params)
       df = pd.DataFrame(curs.fetchall())
       df.columns = [rec[0] for rec in curs.description]
    

    but this solution is just... inelegant. If I can, I'd like to do this without creating the cursor object. Is there a way to do the whole thing using just pandas?

  • David Marx, about 11 years ago
    I'd strongly advise against forming your SQL this way as it leaves your code vulnerable to SQL injection attacks. Even if your code/database isn't in a position to be vulnerable, you shouldn't get in the practice of forming your SQL this way. Bind variables are the safe way to go.
  • Paul H, about 11 years ago
    @DavidMarx agreed. I shouldn't have assumed the OP was working from a command-line (or just a basic script) like I normally do.
  • David Marx, about 11 years ago
    [FYI: I'm OP] Yeah, this is a self-contained file. I don't foresee any real issues with SQL injection in my current program since the people who will be using it will have direct access to the database anyway, but I would like to know for the future if I can use pandas in the way I described.
  • Paul H, about 11 years ago
    Oops. I see that now. FWIW, I'm able to recreate your experience/frustration using pyodbc on my databases. From my perspective, the path of least resistance is to hack on pandas so that read_frame can take a cursor object. Maybe I'll be able to get a PR together soon.
  • Paul H, about 11 years ago
    Well, your curs.fetchall() technique covers that. I'll stop blabbering now.
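
The cursor-based workaround from the question, which the last comment refers to, can be wrapped in a small helper so the cursor handling lives in one place. This is only a sketch: cursor_to_frame is a hypothetical name, not a pandas function, and SQLite again stands in for Oracle with invented sample rows.

```python
import sqlite3

import pandas as pd


def cursor_to_frame(cursor):
    """Build a DataFrame from an already-executed DB-API cursor.

    Hypothetical helper, not part of pandas.
    """
    # The first element of each cursor.description entry is the column name.
    columns = [col[0] for col in cursor.description]
    return pd.DataFrame(cursor.fetchall(), columns=columns)


# Stand-in database with made-up rows (assumption: illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1234, "in range"), (7000, "out of range")],
)

curs = conn.cursor()
curs.execute(
    "SELECT * FROM customers WHERE id BETWEEN :v1 AND :v2",
    {"v1": 1234, "v2": 5678},
)
df = cursor_to_frame(curs)
print(df)

conn.close()
```

Passing the column names to the DataFrame constructor, rather than assigning df.columns afterwards, also keeps the names in place when the query returns no rows.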