Using multiple cursors in a nested loop in sqlite3 from python-2.7
Solution 1
You could build up a list of rows to insert in the inner loop and then call cursor.executemany() outside the loop. This doesn't answer the multiple-cursor question, but it may work around the problem for you.
```python
curOuter = db.cursor()
rows = []
for row in curOuter.execute('SELECT * FROM myConnections'):
    id = row[0]
    scList = retrieve_shared_connections(id)
    for sc in scList:
        rows.append((id, sc))
curOuter.executemany('''INSERT INTO sharedConnections(IdConnectedToMe, IdShared) VALUES (?,?)''', rows)
db.commit()
```
Better yet, select only the id column from myConnections:
```python
curOuter.execute('SELECT id FROM myConnections')
```
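A self-contained sketch of Solution 1 that runs against an in-memory database. The single `id` column in `myConnections` and the stub `retrieve_shared_connections()` are assumptions standing in for the OP's real schema and web-service call; the stub's data mirrors the a/b example in the question.

```python
import sqlite3


def retrieve_shared_connections(conn_id):
    # Hypothetical stub for the OP's web-service call; returns fixed data.
    fake_service = {"a": ["b", "c"], "b": ["c", "d"]}
    return fake_service.get(conn_id, [])


def build_shared_connections(db):
    cur = db.cursor()
    rows = []
    # Read only the id column and collect every (id, shared) pair first ...
    for (conn_id,) in cur.execute("SELECT id FROM myConnections"):
        for sc in retrieve_shared_connections(conn_id):
            rows.append((conn_id, sc))
    # ... then write them with one executemany() after the SELECT is exhausted.
    cur.executemany(
        "INSERT INTO sharedConnections (IdConnectedToMe, IdShared) VALUES (?, ?)",
        rows)
    db.commit()
    return len(rows)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE myConnections (id TEXT)")
db.execute("CREATE TABLE sharedConnections (IdConnectedToMe TEXT, IdShared TEXT)")
db.executemany("INSERT INTO myConnections (id) VALUES (?)", [(c,) for c in "abcd"])
db.commit()
inserted = build_shared_connections(db)
```

Because the INSERTs only start after the SELECT has been read to the end, no cursor is still being iterated when rows are written or committed.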
Solution 2
This looks like you are hitting issue 10513, fixed in Python 2.7.13, 3.5.3 and 3.6.0b1.
There was a bug in the way transactions were handled, where all cursor states were reset in certain circumstances. This led to curOuter starting from the beginning again.
The workaround is to upgrade or, until you can upgrade, to avoid using cursors across transaction commits. By using curOuter.fetchall() you achieved the latter.
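If code has to run on several interpreters, a small guard can flag the ones older than the releases named above. This is a sketch that only encodes the version numbers from this answer; it does not probe the actual cursor behavior, and treating every 3.6 release as fixed is a simplification.

```python
import sys

# Releases named in the answer above as containing the fix for issue 10513.
FIXED_2X = (2, 7, 13)
FIXED_35 = (3, 5, 3)


def older_than_fixed_release(version_info=None):
    """Return True if the interpreter predates the fixed releases cited above.

    Simplification (an assumption): every 3.6+ version counts as fixed,
    including 3.6 pre-releases.
    """
    if version_info is None:
        version_info = sys.version_info
    v = tuple(version_info)[:3]
    if v[0] == 2:
        return v < FIXED_2X
    if v[0] == 3 and v[1] <= 5:
        return v < FIXED_35
    return False  # 3.6 and later


if older_than_fixed_release():
    print("Avoid keeping a cursor open across commit(); e.g. fetchall() first.")
```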
Solution 3
While building an in-memory list seems to be the best solution, I've found that using explicit transactions reduces the number of duplicates returned in the outer query. That would make it something like:
```python
with db:
    curOuter = db.cursor()
    for row in curOuter.execute('SELECT * FROM myConnections'):
        id = row[0]
        with db:
            curInner = db.cursor()
            scList = retrieve_shared_connections(id)
            for sc in scList:
                curInner.execute('''INSERT INTO sharedConnections(IdConnectedToMe, IdShared) VALUES (?,?)''', (id,sc))
```
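For reference, using a `sqlite3` connection as a context manager (`with db:`) wraps the block in a transaction: it commits when the block exits normally and rolls back when an exception escapes, but it does not close the connection. A minimal sketch of those semantics, with a made-up failure standing in for an error from the web service:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sharedConnections (IdConnectedToMe TEXT, IdShared TEXT)")

# Leaving the block normally commits the pending transaction.
with db:
    db.execute("INSERT INTO sharedConnections VALUES (?, ?)", ("a", "b"))

# Leaving the block with an exception rolls that transaction back instead.
try:
    with db:
        db.execute("INSERT INTO sharedConnections VALUES (?, ?)", ("a", "c"))
        raise RuntimeError("simulated failure from the web service")
except RuntimeError:
    pass

# The connection is still usable here; only the first row was kept.
count = db.execute("SELECT COUNT(*) FROM sharedConnections").fetchone()[0]
```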
Solution 4
This question is a bit old, I see. But when I stumbled upon it, I wondered whether sqlite3 still has such issues in Python 2.7. Let's see:
```python
#!/usr/bin/python
import sqlite3
import argparse
from datetime import datetime

DBFILE = 'nested.sqlite'
MAX_A = 1000
MAX_B = 10000

parser = argparse.ArgumentParser(description='Nested SQLite cursors in Python')
parser.add_argument('step', type=int)
args = parser.parse_args()

connection = sqlite3.connect(DBFILE)
connection.row_factory = sqlite3.Row

t0 = datetime.now()

if args.step == 0:
    # set up test database
    cursor = connection.cursor()
    cursor.execute("""DROP TABLE IF EXISTS A""")
    cursor.execute("""DROP TABLE IF EXISTS B""")
    # intentionally omitting primary keys
    cursor.execute("""CREATE TABLE A ( K INTEGER )""")
    cursor.execute("""CREATE TABLE B ( K INTEGER, L INTEGER )""")
    cursor.executemany("""INSERT INTO A ( K ) VALUES ( ? )""",
                       [ (i,) for i in range(0, MAX_A) ])
    connection.commit()
    for row in cursor.execute("""SELECT COUNT(*) CNT FROM A"""):
        print row['CNT']

elif args.step == 1:
    # do the nested SELECT and INSERT
    read = connection.cursor()
    write = connection.cursor()
    for row in read.execute("""SELECT * FROM A"""):
        bs = [ ( row['K'], i ) for i in range(0, MAX_B) ]
        for b in bs: # with .executemany() it would be twice as fast ;)
            write.execute("""INSERT INTO B ( K, L ) VALUES ( ?, ? )""", b)
    connection.commit()  # one commit, after the outer loop has finished
    for row in connection.cursor().execute("""SELECT COUNT(*) CNT FROM B"""):
        print row['CNT']

elif args.step == 2:
    # verify that B holds every (K, L) pair exactly once, in order
    connection = sqlite3.connect(DBFILE)
    connection.row_factory = sqlite3.Row
    control = connection.cursor()
    ca = cb = 0 # will count along our expectation
    for row in control.execute("""SELECT * FROM B ORDER BY K ASC, L ASC"""):
        assert row['K'] == ca and row['L'] == cb
        cb += 1
        if cb == MAX_B:
            cb = 0
            ca += 1
    assert ca == MAX_A and cb == 0
    for row in connection.cursor().execute("""SELECT COUNT(*) CNT FROM B"""):
        print row['CNT']

print datetime.now() - t0
```
Output:
```
$ ./nested.py 0
1000
0:00:04.465695
$ ./nested.py 1
10000000
0:00:27.726074
$ ./nested.py 2
10000000
0:00:19.137563
```
This test was done using:
```
$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2] on linux2
>>> import sqlite3
>>> sqlite3.version
'2.6.0'
>>> sqlite3.sqlite_version
'3.8.2'
```
The situation changes when we commit in batches, e.g. by indenting the `connection.commit()` in step 1 of the above test script into the outer loop. The behavior is quite strange: only the second `commit` of rows written through the `write` cursor resets the `read` cursor, exactly as described by the OP. After fiddling with the code above, I assume that the OP did not do a single `commit` as shown in the example code, but committed in batches.
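One way to commit in batches without keeping the `read` cursor open across a commit is to page through `A` with keyset pagination (`WHERE K > ? ORDER BY K LIMIT ?`) and fetch each page completely before writing. This is a swapped-in technique, not the answer's code; the batch size and the small `fan_out` (standing in for `MAX_B`) are hypothetical, and it assumes `K` values are unique, which holds for the test data but is not enforced by the schema. It also keeps memory bounded, which matters for very large outer tables like the ~6-million-row case mentioned in the comments.

```python
import sqlite3

BATCH = 100  # hypothetical page size


def copy_in_batches(db, fan_out=3):
    """Read keys from A page by page and commit after each page.

    Each page is fully fetched before any INSERT or commit, so no SELECT
    is still being iterated when commit() runs.
    """
    last_k = -1  # assumes unique, non-negative integer keys as in the test data
    total = 0
    while True:
        keys = [r[0] for r in db.execute(
            "SELECT K FROM A WHERE K > ? ORDER BY K LIMIT ?",
            (last_k, BATCH)).fetchall()]
        if not keys:
            break
        db.executemany("INSERT INTO B (K, L) VALUES (?, ?)",
                       [(k, i) for k in keys for i in range(fan_out)])
        db.commit()  # safe: no cursor is mid-iteration at this point
        total += len(keys) * fan_out
        last_k = keys[-1]
    return total


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE A (K INTEGER)")
db.execute("CREATE TABLE B (K INTEGER, L INTEGER)")
db.executemany("INSERT INTO A (K) VALUES (?)", [(i,) for i in range(1000)])
db.commit()
written = copy_in_batches(db)
```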
Remark: Drawing the `read` and `write` cursors from separate connections to support batched commits, as suggested in an answer to another question, does not work either, because the commits will run into the lock held by the other connection.
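A minimal sketch of the lock conflict described in the remark, assuming SQLite's default rollback-journal mode: while one connection still has a SELECT in progress, it holds a shared lock, so another connection can insert but cannot commit. With a short `timeout`, the commit typically fails with `sqlite3.OperationalError` ("database is locked"). The function name and the temporary file are illustrative; in WAL mode readers do not block a writer's commit, so the outcome there would differ.

```python
import os
import sqlite3
import tempfile


def separate_connections_hit_lock(n_rows=10):
    """Illustrative demo: keep a SELECT open on one connection while a second
    connection inserts into another table of the same file and commits."""
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE A (K INTEGER)")
    setup.execute("CREATE TABLE B (K INTEGER, L INTEGER)")
    setup.executemany("INSERT INTO A (K) VALUES (?)", [(i,) for i in range(n_rows)])
    setup.commit()
    setup.close()

    reader = sqlite3.connect(path, timeout=0.1)
    writer = sqlite3.connect(path, timeout=0.1)
    locked = False
    try:
        read = reader.cursor()
        read.execute("SELECT K FROM A")
        first = read.fetchone()  # more rows remain, so the SELECT stays active
        writer.execute("INSERT INTO B (K, L) VALUES (?, ?)", (first[0], 0))
        try:
            writer.commit()  # needs an exclusive lock the reader prevents
        except sqlite3.OperationalError:
            locked = True
            writer.rollback()
    finally:
        reader.close()
        writer.close()
        os.remove(path)
    return locked
```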
tjim
Updated on September 06, 2020
I've been having problems using multiple cursors on a single sqlite database within a nested loop. I found a solution that works for me, but it's limited, and I haven't seen this specific problem documented online. I'm posting this for three reasons:
- to make a clear problem/solution available
- to see if there's a better solution
- because I may have found a defect in the sqlite3 Python module
My Python app is storing social relationship data in sqlite. The dataset includes a one-to-many relationship between two tables: myConnections and sharedConnections. The former has one row for each connection. The sharedConnections table has 0..N rows per connection, depending on how many connections are shared. To build the structure, I use a nested loop. In the outer loop I visit each row in myConnections. In the inner loop, I populate the sharedConnections table. The code looks like this:
```python
curOuter = db.cursor()
for row in curOuter.execute('SELECT * FROM myConnections'):
    id = row[0]
    curInner = db.cursor()
    scList = retrieve_shared_connections(id)
    for sc in scList:
        curInner.execute('''INSERT INTO sharedConnections(IdConnectedToMe, IdShared) VALUES (?,?)''', (id,sc))
db.commit()
```
The result is odd. The sharedConnections table gets duplicate entries for the first two records in myConnections, and they come out collated: A's connections, B's connections, followed by A's and then B's again. After the initial stutter, the processing is correct! Example:
```
myConnections
-------------
a
b
c
d

sharedConnections
-----------------
a->b
a->c
b->c
b->d
a->b
a->c
b->c
b->d
```
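To check whether a table contains duplicates like the ones in the example, a `GROUP BY ... HAVING COUNT(*) > 1` query is enough. This sketch loads the example rows (each pair twice) into an in-memory table; the column types are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sharedConnections (IdConnectedToMe TEXT, IdShared TEXT)")
# The example contents above: every pair for a and b was inserted twice.
pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]
db.executemany("INSERT INTO sharedConnections VALUES (?, ?)", pairs + pairs)

dupes = db.execute("""
    SELECT IdConnectedToMe, IdShared, COUNT(*) AS n
    FROM sharedConnections
    GROUP BY IdConnectedToMe, IdShared
    HAVING COUNT(*) > 1
    ORDER BY IdConnectedToMe, IdShared
""").fetchall()
for me, shared, n in dupes:
    print("%s->%s appears %d times" % (me, shared, n))
```

A `UNIQUE` index on `(IdConnectedToMe, IdShared)` would turn such silent duplicates into errors (or, with `INSERT OR IGNORE`, skip them), though that hides rather than fixes the cursor problem.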
The solution is imperfect. Instead of iterating over the outer loop's cursor, I run the SELECT, call fetchall(), and loop through the resulting list. Since my dataset is pretty small, this is OK.
```python
curOuter = db.cursor()
curOuter.execute('SELECT * FROM myConnections')
rows = curOuter.fetchall()
for row in rows:
    id = row[0]
    curInner = db.cursor()
    scList = retrieve_shared_connections(id)
    for sc in scList:
        curInner.execute('''INSERT INTO sharedConnections(IdConnectedToMe, IdShared) VALUES (?,?)''', (id,sc))
db.commit()
```
There you have it. Using two cursors against different tables in the same sqlite database within a nested loop doesn't seem to work. What's more, it doesn't fail; it just gives odd results.
- Is this truly the best solution?
- Is there a better solution?
- Is this a defect that should be addressed?
- Wirsing (over 11 years ago): What does `retrieve_shared_connections()` do? Does it affect the DB in any way?
- tjim (over 11 years ago): `retrieve_shared_connections(id)` does not involve the database. It's a function that uses a web service to return a list of shared connections, given an id. The loop immediately below that call INSERTs each shared connection into the database.
- Iguananaut (over 11 years ago): I haven't looked too closely at your code yet, but would an `INSERT INTO ... SELECT FROM` statement work? INSERT statements in SQLite do allow the values to be culled from a SELECT statement.
- tjim (over 11 years ago): @iguananaut you are correct that SELECT can feed an INSERT statement. Unfortunately, the sharedConnections info isn't in the database; it's in the cloud. The goal is to get the sharedConnections via the web service and INSERT them into the database.
- Iguananaut (over 11 years ago): Ah, I see what you're saying now. It's been a while since I've used SQLite, so maybe someone with more recent experience can comment. I would have thought that because you're selecting from and updating different tables, it shouldn't matter. But it appears that the inserts are confusing the generator for your outer cursor. So using fetchall() is probably a good bet for now to get around that. However, it looks like you're only using the `id` column from `myConnections`, so you can save a lot by using `SELECT id FROM myConnections` instead of all columns.
- tjim (over 11 years ago): Thanks @Iguananaut. That code was refactored so many times while working through the cursor problem that I lost sight of the SELECT *.
- Doo Dah (about 11 years ago): Wow. Really? That is really lame. Does anyone know if this is a limitation of SQLite or of the Python interface to it? In my case, the outer loop has ~6 million rows. I can't pull it all into memory. I can come up with some sort of workaround. Perhaps an enterprise DB is the way to go (SQL Server, Postgres, MySQL, etc.).
- tjim (over 11 years ago): Thanks @user1451298. A previous comment suggested replacing * with id. Agreed. My first thought on your restructuring was "six of one, half a dozen of the other." But after thinking through your solution, it seems more extensible, because the rows can be spooled to disk if memory becomes an issue. Thanks!
- Anov (over 11 years ago): Oh, I didn't see that. Yeah, not sure if building up a cache of rows to insert is any better than pulling all the ids with fetchall()...
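Regarding the `INSERT INTO ... SELECT` suggestion: since the shared connections come from a web service, one hedged option is to stage the fetched pairs first and then move them with a single statement, so no Python-side cursor is iterated while rows are inserted. The `staged` table is hypothetical; the sketch only illustrates the SQLite syntax.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE myConnections (id TEXT)")
# Hypothetical staging table holding pairs already fetched from the web service.
db.execute("CREATE TABLE staged (IdConnectedToMe TEXT, IdShared TEXT)")
db.execute("CREATE TABLE sharedConnections (IdConnectedToMe TEXT, IdShared TEXT)")
db.executemany("INSERT INTO myConnections (id) VALUES (?)", [("a",), ("b",)])
db.executemany("INSERT INTO staged VALUES (?, ?)",
               [("a", "b"), ("b", "c"), ("z", "a")])

# One statement copies only the pairs whose owner exists in myConnections.
db.execute("""
    INSERT INTO sharedConnections (IdConnectedToMe, IdShared)
    SELECT s.IdConnectedToMe, s.IdShared
    FROM staged AS s
    JOIN myConnections AS m ON m.id = s.IdConnectedToMe
""")
db.commit()
copied = db.execute("SELECT * FROM sharedConnections ORDER BY 1, 2").fetchall()
```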