What is an efficient way of inserting thousands of records into an SQLite table using Django?
Solution 1
You want to check out django.db.transaction.commit_manually:
http://docs.djangoproject.com/en/dev/topics/db/transactions/#django-db-transaction-commit-manually
So it would be something like:
from django.db import transaction

@transaction.commit_manually
def viewfunc(request):
    ...
    for item in items:
        entry = Entry(a1=item.a1, a2=item.a2)
        entry.save()
    transaction.commit()
This commits only once, instead of at each save().
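To see the effect of a single commit outside of Django, here is a minimal sketch using Python's standard sqlite3 module. The table name `entry` and the columns `a1`/`a2` mirror the question's model but are assumptions, and the helper names are hypothetical. It only contrasts the two commit strategies; it is not what Django does internally.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical 'entry' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entry (a1 INTEGER, a2 TEXT)")
    return conn


def insert_commit_each(conn, rows):
    """Insert rows one by one, committing after every insert."""
    for a1, a2 in rows:
        conn.execute("INSERT INTO entry (a1, a2) VALUES (?, ?)", (a1, a2))
        conn.commit()


def insert_single_commit(conn, rows):
    """Insert rows one by one inside one transaction, committing once."""
    with conn:  # commits on success, rolls back if an exception is raised
        for a1, a2 in rows:
            conn.execute("INSERT INTO entry (a1, a2) VALUES (?, ?)", (a1, a2))
```

With a database file on disk, each commit typically forces a sync to storage, so the second variant is usually the faster one for thousands of rows.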
In Django 1.3, context managers were introduced, so you can now use transaction.commit_on_success() in a similar way:
from django.db import transaction

def viewfunc(request):
    ...
    with transaction.commit_on_success():
        for item in items:
            entry = Entry(a1=item.a1, a2=item.a2)
            entry.save()
In Django 1.4, bulk_create was added, allowing you to create lists of your model objects and then commit them all at once.
NOTE: the save() method will not be called when using bulk_create.
>>> Entry.objects.bulk_create([
... Entry(headline="Django 1.0 Released"),
... Entry(headline="Django 1.1 Announced"),
... Entry(headline="Breaking: Django is awesome")
... ])
In Django 1.6, transaction.atomic was introduced, intended to replace the now-legacy functions commit_on_success and commit_manually.
From the Django documentation on atomic:
atomic is usable both as a decorator:
from django.db import transaction

@transaction.atomic
def viewfunc(request):
    # This code executes inside a transaction.
    do_stuff()
and as a context manager:
from django.db import transaction

def viewfunc(request):
    # This code executes in autocommit mode (Django's default).
    do_stuff()

    with transaction.atomic():
        # This code executes inside a transaction.
        do_more_stuff()
Solution 2
Bulk creation is available in Django 1.4:
https://django.readthedocs.io/en/1.4/ref/models/querysets.html#bulk-create
Solution 3
To answer the question specifically with regard to SQLite, as asked: I have just confirmed that bulk_create does provide a tremendous speedup, but there is a limitation with SQLite: "The default is to create all objects in one batch, except for SQLite where the default is such that at maximum 999 variables per query is used."
The quote is from the docs; A-IV provided a link.
What I have to add is that this djangosnippets entry by alpar also seems to be working for me. It's a little wrapper that breaks the big batch that you want to process into smaller batches, managing the 999 variables limit.
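The batching idea can be sketched in a few lines of plain Python. This is not the djangosnippets code referred to above; `batches` and `fields_per_object` are hypothetical names, and the sketch assumes that each object contributes one query variable per field.

```python
from itertools import islice

SQLITE_MAX_VARIABLES = 999  # limit quoted above from the Django docs


def batches(objects, fields_per_object, max_variables=SQLITE_MAX_VARIABLES):
    """Yield lists of objects small enough to stay under the variable limit."""
    if fields_per_object < 1:
        raise ValueError("fields_per_object must be at least 1")
    # Largest number of objects whose fields fit into one query.
    size = max(1, max_variables // fields_per_object)
    iterator = iter(objects)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Assuming a list `entries` of unsaved Entry objects with two fields each, each chunk from `batches(entries, fields_per_object=2)` could then be passed to `Entry.objects.bulk_create(chunk)`.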
Solution 4
Have a look at this. It's meant for use out-of-the-box with MySQL only, but there are pointers on what to do for other databases.
Solution 5
You might be better off bulk-loading the items - prepare a file and use a bulk load tool. This will be vastly more efficient than 8000 individual inserts.
Admin
Updated on July 05, 2022
Comments
-
Admin almost 2 years
I have to insert 8000+ records into a SQLite database using Django's ORM. This operation needs to be run as a cronjob about once per minute.
At the moment I'm using a for loop to iterate through all the items and then insert them one by one.
Example:
for item in items:
    entry = Entry(a1=item.a1, a2=item.a2)
    entry.save()
What is an efficient way of doing this?
Edit: A little comparison between the two insertion methods.
Without commit_manually decorator (11245 records):
[nox@noxdevel marinetraffic]$ time python manage.py insrec
real 1m50.288s
user 0m6.710s
sys 0m23.445s
Using commit_manually decorator (11245 records):
[nox@noxdevel marinetraffic]$ time python manage.py insrec
real 0m18.464s
user 0m5.433s
sys 0m10.163s
Note: The test script also does some other operations besides inserting into the database (downloads a ZIP file, extracts an XML file from the ZIP archive, parses the XML file) so the time needed for execution does not necessarily represent the time needed to insert the records.
-
Glenn Maynard almost 15 years
This will instantiate them all as models, and run thousands of individual inserts. I've always had to drop to SQL and do manual batch inserts for this type of volume; Django isn't built for it. But yes, you definitely want a single transaction if you're doing it this way.
-
Admin almost 14 years
Hi, could you please elaborate on the same in terms of .NET? It would be a great help, as I am facing the same situation.
-
user2471801 almost 14 years
I don't have .NET experience, but speaking from a general database perspective, turning off AUTOCOMMIT and wrapping the INSERT statements between BEGIN/END TRANSACTION statements will be faster than using AUTOCOMMIT and running the INSERTs alone. Note that these commands and how they are used can change based on the DB you're using. If you want a .NET or .NET Framework specific answer, go ahead and start a new question.
-
Weholt over 13 years
Another thing: if you decide to use plain SQL and the SQL you're inserting has the same fields each time, try using cursor.executemany(SQL, [list of entries to insert]). Much faster than running an insert per entry.
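As a rough illustration of the executemany suggestion, here is a minimal sketch with the standard sqlite3 module rather than Django's cursor. The `entry` table, its `a1`/`a2` columns, and the `bulk_insert` helper are assumptions made for the example.

```python
import sqlite3


def bulk_insert(conn, rows):
    """Insert all rows with one executemany call inside one transaction."""
    with conn:  # one commit for the whole batch, rollback on error
        conn.executemany("INSERT INTO entry (a1, a2) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (a1 INTEGER, a2 TEXT)")
bulk_insert(conn, [(i, "item %d" % i) for i in range(8000)])
```

The statement is prepared once and reused for every parameter tuple, which avoids both the per-row Python overhead of building model instances and the per-row commit.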
-
Ben Regenspan about 12 years
Now that Django 1.4 is out, using docs.djangoproject.com/en/dev/ref/models/querysets/… makes a lot more sense. The other fast alternative is to manually create a batch SQL insert. The tip here (committing in one transaction) will not be nearly as fast as sending in one insert.
-
Ezekiel Kruglick over 8 years
As of 1.9, bulk_create is working great. Note that you'll need to break up creation into batches with no more than 999 total added properties for SQLite.
-
Dejell over 7 years
Does it work the same way that save works, e.g. does it save or update each object?
-
Marc Laugharn over 5 years
transaction.commit_manually was removed in 1.8: docs.djangoproject.com/en/dev/internals/deprecation
-
Mohit Mishra about 5 years
This code runs properly; I have checked it on my side. If any error occurs on your side, please check the code again, make sure you understand what it means, and then try it again. If anybody knows a better and easier way of inserting multiple records into the database at one time, please share it with us. Thank you.
-
Joey about 5 years
Please include clarification, further explanation etc. directly into your answer instead of using comments. Comments should be used for asking for more information or for suggesting improvements.
-
Gabriel about 5 years
Code-only answers like yours are discouraged.
-
Owen over 4 years
Worth noting that transaction.atomic will not make the code run any faster. Otherwise, excellent summary, thanks.
-
AlxVallejo almost 4 years
@MarcLaugharn Well then wtf!
-
user2471801 almost 4 years
Wow, this answer is 11 years old... Maybe it's about time to remove the 1.X references...
-
mic over 3 years
The Django snippet should be unnecessary with Django ≥1.5, right? Since Django 1.5, there is a batch_size parameter that you can use: "The batch_size parameter controls how many objects are created in single query. The default is to create all objects in one batch, except for SQLite where the default is such that at maximum 999 variables per query is used. The batch_size parameter was added in version 1.5."