Using multiple cursors in a nested loop in sqlite3 from python-2.7

Solution 1

You could build up a list of rows to insert in the inner loop and then cursor.executemany() outside the loop. This doesn't answer the multiple cursor question but may be a workaround for you.

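curOuter = db.cursor()
rows = []
# Collect every (id, shared connection) pair first; nothing is written while the SELECT is iterated.
for row in curOuter.execute('SELECT * FROM myConnections'):
    id = row[0]
    scList = retrieve_shared_connections(id)
    for sc in scList:
        rows.append((id, sc))
# One bulk INSERT and one commit, after the outer loop has finished.
curOuter.executemany('''INSERT INTO sharedConnections(IdConnectedToMe, IdShared) VALUES (?,?)''', rows)
db.commit()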

Better yet, select only the id column from myConnections:

curOuter.execute('SELECT id FROM myConnections')
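
For anyone who wants to try this approach in isolation, here is a minimal, self-contained sketch. The in-memory database, the TEXT column types and the body of retrieve_shared_connections() are assumptions made for illustration only; in the question that function calls a web service.

```python
import sqlite3

def retrieve_shared_connections(conn_id):
    # Hypothetical stand-in for the web-service call in the question.
    fake_service = {'a': ['b', 'c'], 'b': ['c', 'd']}
    return fake_service.get(conn_id, [])

# Assumed schema: the question does not show the real table definitions.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE myConnections (id TEXT)')
db.execute('CREATE TABLE sharedConnections (IdConnectedToMe TEXT, IdShared TEXT)')
db.executemany('INSERT INTO myConnections (id) VALUES (?)',
               [('a',), ('b',), ('c',), ('d',)])
db.commit()

# Build the full list of rows while reading, without writing anything yet.
rows = []
for (conn_id,) in db.execute('SELECT id FROM myConnections ORDER BY rowid'):
    for sc in retrieve_shared_connections(conn_id):
        rows.append((conn_id, sc))

# Write everything in one batch once the SELECT has been fully consumed.
db.executemany(
    'INSERT INTO sharedConnections (IdConnectedToMe, IdShared) VALUES (?, ?)',
    rows)
db.commit()

result = db.execute(
    'SELECT IdConnectedToMe, IdShared FROM sharedConnections ORDER BY rowid'
).fetchall()
print(result)
```

Because the only commit happens after the read loop ends, the read cursor is never active during a commit, so each pair is inserted exactly once.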

Solution 2

This looks like you are hitting issue 10513, fixed in Python 2.7.13, 3.5.3 and 3.6.0b1.

There was a bug in the way transactions were handled, where all cursor states were reset in certain circumstances. This led to curOuter starting from the beginning again.

The workaround is to upgrade or, until you can upgrade, to avoid using a cursor across transaction commits. By using curOuter.fetchall() you achieved the latter.
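
If the outer table is too large for a single fetchall(), one way to keep cursors from spanning a commit is to read it in chunks. The following is only a sketch under assumptions: it relies on myConnections being an ordinary rowid table, the function name fill_shared_connections and the chunk size are made up for illustration, and retrieve_shared_connections() is a stub for the web-service call.

```python
import sqlite3

def retrieve_shared_connections(conn_id):
    # Hypothetical stand-in for the web-service call described in the question.
    return ['%s-shared-%d' % (conn_id, n) for n in range(2)]

def fill_shared_connections(db, chunk_size=2):
    """Read myConnections chunk by chunk, committing after each chunk."""
    last_rowid = 0
    while True:
        # fetchall() finishes the SELECT, so no read cursor is open at commit time.
        chunk = db.execute(
            'SELECT rowid, id FROM myConnections '
            'WHERE rowid > ? ORDER BY rowid LIMIT ?',
            (last_rowid, chunk_size)).fetchall()
        if not chunk:
            break
        rows = [(conn_id, sc)
                for _, conn_id in chunk
                for sc in retrieve_shared_connections(conn_id)]
        db.executemany(
            'INSERT INTO sharedConnections (IdConnectedToMe, IdShared) '
            'VALUES (?, ?)', rows)
        db.commit()
        # Resume after the last row that has been processed.
        last_rowid = chunk[-1][0]

# Assumed schema, for demonstration only.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE myConnections (id TEXT)')
db.execute('CREATE TABLE sharedConnections (IdConnectedToMe TEXT, IdShared TEXT)')
db.executemany('INSERT INTO myConnections (id) VALUES (?)',
               [(c,) for c in 'abcde'])
db.commit()

fill_shared_connections(db)
total, distinct = db.execute(
    'SELECT COUNT(*), COUNT(DISTINCT IdConnectedToMe || IdShared) '
    'FROM sharedConnections').fetchone()
print('%d rows, %d distinct' % (total, distinct))
```

Memory use is bounded by chunk_size rather than by the size of the outer table, and every commit happens between two complete SELECTs.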

Solution 3

While building an in-memory list seems to be the best solution, I've found that using explicit transactions reduces the number of duplicates returned in the outer query. That would look something like:

with db:
    curOuter = db.cursor()
    for row in curOuter.execute('SELECT * FROM myConnections'):    
        id  = row[0]
        with db:
            curInner = db.cursor()  
            scList = retrieve_shared_connections(id)  
            for sc in scList:  
                curInner.execute('''INSERT INTO sharedConnections(IdConnectedToMe, IdShared) VALUES (?,?)''', (id,sc))

Solution 4

This question is a bit older, I see. But when I stumbled upon it, I wondered whether sqlite3 still has such issues in python-2.7. Let's see:

#!/usr/bin/python
import sqlite3
import argparse
from datetime import datetime

DBFILE = 'nested.sqlite'
MAX_A = 1000
MAX_B = 10000

parser = argparse.ArgumentParser(description='Nested SQLite cursors in Python')
parser.add_argument('step', type=int)
args = parser.parse_args()

connection = sqlite3.connect(DBFILE)
connection.row_factory = sqlite3.Row
t0 = datetime.now()

if args.step == 0:
    # set up test database
    cursor = connection.cursor()
    cursor.execute("""DROP TABLE IF EXISTS A""")
    cursor.execute("""DROP TABLE IF EXISTS B""")
    # intentionally omitting primary keys
    cursor.execute("""CREATE TABLE A ( K INTEGER )""")
    cursor.execute("""CREATE TABLE B ( K INTEGER, L INTEGER )""")
    cursor.executemany("""INSERT INTO A ( K ) VALUES ( ? )""", 
        [ (i,) for i in range(0, MAX_A) ])
    connection.commit()
    for row in cursor.execute("""SELECT COUNT(*) CNT FROM A"""):
        print row['CNT']

elif args.step == 1:
    # do the nested SELECT and INSERT
    read = connection.cursor()
    write = connection.cursor()
    for row in read.execute("""SELECT * FROM A"""):
        bs = [ ( row['K'], i ) for i in range(0, MAX_B) ]
        for b in bs: # with .executemany() it would be twice as fast ;)
            write.execute("""INSERT INTO B ( K, L ) VALUES ( ?, ? )""", b)
    connection.commit()
    for row in connection.cursor().execute("""SELECT COUNT(*) CNT FROM B"""):
        print row['CNT']

elif args.step == 2:
    connection = sqlite3.connect(DBFILE)
    connection.row_factory = sqlite3.Row
    control = connection.cursor()
    ca = cb = 0 # will count along our expectation
    for row in control.execute("""SELECT * FROM B ORDER BY K ASC, L ASC"""):
        assert row['K'] == ca and row['L'] == cb
        cb += 1
        if cb == MAX_B:
            cb = 0
            ca += 1
    assert ca == MAX_A and cb == 0
    for row in connection.cursor().execute("""SELECT COUNT(*) CNT FROM B"""):
        print row['CNT']

print datetime.now() - t0

Output is

$ ./nested.py 0
1000
0:00:04.465695
$ ./nested.py 1
10000000
0:00:27.726074
$ ./nested.py 2
10000000
0:00:19.137563

This test was done using

$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2] on linux2
>>> import sqlite3
>>> sqlite3.version
'2.6.0'
>>> sqlite3.sqlite_version
'3.8.2'

The situation changes when we commit in batches, e.g. by indenting the connection.commit() in step 1 of the above test script so that it runs inside the loop. The behavior is quite strange, because only the second commit resets the read cursor, exactly as shown in the OP. After fiddling with the code above, I assume that the OP did not commit once, as shown in the example code, but committed in batches.

Remark: Drawing the read and write cursors from separate connections to support batched commits, as suggested in an answer to another question, does not work because the commits will run against a foreign lock.

Author by

tjim

Updated on September 06, 2020

Comments

  • tjim
    tjim almost 4 years

    I’ve been having problems using multiple cursors on a single sqlite database within a nested loop. I found a solution that works for me, but it’s limited and I haven’t seen this specific problem documented online. I’m posting this so:

    • A clear problem/solution is available
    • To see if there’s a better solution
    • Perhaps I’ve found a defect in the sqlite3 python module

    My Python app is storing social relationship data in sqlite. The dataset includes a one-to-many relationship between two tables: myConnections and sharedConnections. The former has one row for each connection. The sharedConnections table has 0:N rows, depending on how many connections are shared. To build the structure, I use a nested loop. In the outside loop I visit each row in myConnections. In the inside loop, I populate the sharedConnections table. The code looks like this:

    curOuter = db.cursor()  
    for row in curOuter.execute('SELECT * FROM myConnections'):    
        id  = row[0]  
        curInner = db.cursor()  
        scList = retrieve_shared_connections(id)  
        for sc in scList:  
            curInner.execute('''INSERT INTO sharedConnections(IdConnectedToMe, IdShared) VALUES (?,?)''', (id,sc))  
    db.commit()  
    

    The result is odd. The sharedConnections table gets duplicate entries for the first two records in myConnections. They’re a bit collated. A’s connections, B’s connections, followed by A and then B again. After the initial stutter, the processing is correct! Example:

    myConnections
    -------------
    a   
    b  
    c  
    d  
    
    sharedConnections
    -------------
    a->b  
    a->c  
    b->c  
    b->d  
    a->b  
    a->c  
    b->c  
    b->d  
    

    The solution is imperfect. Instead of using the iterator from the outside loop cursor, I SELECT, then fetchall() and loop through the resulting list. Since my dataset is pretty small, this is OK.

    curOuter = db.cursor()
    curOuter.execute('SELECT * FROM myConnections')
    rows = curOuter.fetchall()
    for row in rows:    
        id  = row[0]
        curInner = db.cursor()
        scList = retrieve_shared_connections(id)
        for sc in scList:
            curInner.execute('''INSERT INTO sharedConnections(IdConnectedToMe, IdShared) VALUES (?,?)''', (id,sc))
    db.commit()
    

    There you have it. Using two cursors against different tables in the same sqlite database within a nested loop doesn’t seem to work. What’s more, it doesn’t fail, it just gives odd results.

    • Is this truly the best solution?
    • Is there a better solution?
    • Is this a defect that should be addressed?
    • Wirsing
      Wirsing over 11 years
      What does retrieve_shared_connections() do? Does it affect the DB in any way?
    • tjim
      tjim over 11 years
      retrieve_shared_connections(id) does not involve the database. It's a function that uses a webservice to return a list of shared connections, given an id. The loop immediately below that call INSERTs each shared connection into the database.
    • Iguananaut
      Iguananaut over 11 years
      I haven't looked too closely at your code yet, but would an INSERT INTO ... SELECT FROM statement work? INSERT statements in SQLite do allow the values to be culled from a SELECT statement.
    • tjim
      tjim over 11 years
      @iguananaut you are correct that SELECT can feed an INSERT statement. Unfortunately, the sharedConnections info isn't in the database, it's in the cloud. The goal is to get the sharedConnections via the webservice and INSERT them into the database.
    • Iguananaut
      Iguananaut over 11 years
      Ah I see what you're saying now. It's been a while since I've used SQLite so maybe someone with more recent experience can comment. I would have thought that because you're selecting/updating from different tables it shouldn't matter. But it appears that the act of doing inserts is confusing the generator method for your outer cursor. So using fetchall() is probably a good bet for now to get around that. However, it looks like you're only using the id column from myConnections so you can save a lot by using SELECT id from myConnections instead of all columns.
    • tjim
      tjim over 11 years
      Thanks @Iguananaut. That code was refactored so many times working through the cursor problem that I lost sight of the SELECT *.
    • Doo Dah
      Doo Dah about 11 years
      Wow. Really? That is really lame. Does anyone know if this is a limitation of sqlite or of the python interface to it? In my case, the outer loop has ~6 million rows. I can't pull it all into memory. I can come up with some sort of workaround. Perhaps an enterprise DB is the way to go (SQL Server, Postgres, MySQL, etc).
  • tjim
    tjim over 11 years
    Thanks @user1451298. A previous comment suggested replacing * with Id. Agreed. My first thought on your restructuring was "6 of one; 1/2 dozen of the other." But, after thinking through your solution it seems more extensible because the rows can be spooled to disk if memory becomes an issue - thanks!
  • Anov
    Anov over 11 years
    oh, didn't see that. Yeah, not sure if building up a cache of rows to insert is any better than pulling all the ids with fetchall()...