What is an efficient way of inserting thousands of records into an SQLite table using Django?

Solution 1

You want to check out django.db.transaction.commit_manually.

http://docs.djangoproject.com/en/dev/topics/db/transactions/#django-db-transaction-commit-manually

So it would be something like:

from django.db import transaction

@transaction.commit_manually
def viewfunc(request):
    ...
    for item in items:
        entry = Entry(a1=item.a1, a2=item.a2)
        entry.save()  # no commit happens here under commit_manually
    transaction.commit()  # a single commit for all the inserts

This will commit only once, instead of at each save().

In Django 1.3, context manager support was added, so you can now use transaction.commit_on_success() in a similar way:

from django.db import transaction

def viewfunc(request):
    ...
    # Commits once if the block succeeds; rolls back on exception.
    with transaction.commit_on_success():
        for item in items:
            entry = Entry(a1=item.a1, a2=item.a2)
            entry.save()

In Django 1.4, bulk_create was added, letting you build a list of your model objects and then insert them all at once.

NOTE: the model's save() method will not be called when using bulk_create, and the pre_save and post_save signals will not be sent.

>>> Entry.objects.bulk_create([
...     Entry(headline="Django 1.0 Released"),
...     Entry(headline="Django 1.1 Announced"),
...     Entry(headline="Breaking: Django is awesome")
... ])

In Django 1.6, transaction.atomic was introduced, replacing the now-legacy commit_on_success and commit_manually (both were removed in Django 1.8).

From the Django documentation on atomic:

atomic is usable both as a decorator:

from django.db import transaction

@transaction.atomic
def viewfunc(request):
    # This code executes inside a transaction.
    do_stuff()

and as a context manager:

from django.db import transaction

def viewfunc(request):
    # This code executes in autocommit mode (Django's default).
    do_stuff()

    with transaction.atomic():
        # This code executes inside a transaction.
        do_more_stuff()
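
Putting this together for the original question: a minimal sketch for Django 1.6+, assuming the Entry model and items iterable from the question (the myapp import path is hypothetical). bulk_create batches the INSERTs, and transaction.atomic keeps the work in a single transaction:

from django.db import transaction

from myapp.models import Entry  # hypothetical app path

def insert_items(items):
    # Build the instances in memory first; nothing touches the
    # database until bulk_create() runs.
    entries = [Entry(a1=item.a1, a2=item.a2) for item in items]

    # One transaction, and as few INSERT statements as the backend allows.
    with transaction.atomic():
        Entry.objects.bulk_create(entries)

As noted in the comments below, atomic by itself does not make the code run faster; the win comes from bulk_create issuing far fewer queries.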

Solution 2

Bulk creation is available in Django 1.4:

https://django.readthedocs.io/en/1.4/ref/models/querysets.html#bulk-create

Solution 3

To answer the question specifically with regard to SQLite, as asked: I have just confirmed that bulk_create does provide a tremendous speedup, but there is a limitation with SQLite: "The default is to create all objects in one batch, except for SQLite where the default is such that at maximum 999 variables per query is used."

The quote is from the docs; A-IV provided a link above.

What I can add is that this djangosnippets entry by alpar also seems to work for me. It's a little wrapper that breaks the big batch you want to process into smaller batches, staying under the 999-variable limit; a sketch of the same idea follows.
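
That snippet is not reproduced here, but a minimal sketch of the same idea, using a hypothetical chunked_bulk_create helper; the rows per batch are derived from the 999-variable cap divided by the number of fields per row:

def chunked_bulk_create(model, objs, fields_per_row):
    # Hypothetical helper: SQLite allows at most 999 bound variables
    # per query, so cap the number of rows per INSERT accordingly.
    # objs is a list of unsaved model instances.
    rows_per_batch = 999 // fields_per_row
    for start in range(0, len(objs), rows_per_batch):
        model.objects.bulk_create(objs[start:start + rows_per_batch])

On Django 1.5 and later, the batch_size parameter of bulk_create makes such a wrapper unnecessary (see the last comment below).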

Solution 4

Have a look at this. It's meant for out-of-the-box use with MySQL only, but there are pointers on what to do for other databases.

Solution 5

You might be better off bulk-loading the items - prepare a file and use a bulk load tool. This will be vastly more efficient than 8000 individual inserts.
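
A minimal sketch of that approach for SQLite, assuming the items iterable from the question and a hypothetical app_entry table whose columns exactly match the CSV (.import fills every column, so a plain Django table with an autoincrement id would need a staging table or explicit id values):

import csv
import subprocess

def bulk_load(items):
    # Write the rows to a CSV file first; no header row, since .import
    # into an existing table treats every row as data.
    with open("/tmp/entries.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for item in items:
            writer.writerow([item.a1, item.a2])

    # Let the sqlite3 shell bulk-load the file; each argument after
    # the database path is executed as a command, in order.
    subprocess.run(
        ["sqlite3", "db.sqlite3", ".mode csv",
         ".import /tmp/entries.csv app_entry"],
        check=True,
    )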


Comments

  • Admin
    Admin almost 2 years

    I have to insert 8000+ records into a SQLite database using Django's ORM. This operation needs to be run as a cronjob about once per minute.
    At the moment I'm using a for loop to iterate through all the items and then insert them one by one.
    Example:

    for item in items:
        entry = Entry(a1=item.a1, a2=item.a2)
        entry.save()
    

    What is an efficient way of doing this?

    Edit: A little comparison between the two insertion methods.

    Without commit_manually decorator (11245 records):

    [nox@noxdevel marinetraffic]$ time python manage.py insrec
    
    real    1m50.288s
    user    0m6.710s
    sys     0m23.445s
    

    Using commit_manually decorator (11245 records):

    [nox@noxdevel marinetraffic]$ time python manage.py insrec                
    
    real    0m18.464s
    user    0m5.433s
    sys     0m10.163s
    

    Note: The test script also does some other operations besides inserting into the database (downloads a ZIP file, extracts an XML file from the ZIP archive, parses the XML file) so the time needed for execution does not necessarily represent the time needed to insert the records.

  • Glenn Maynard
    Glenn Maynard almost 15 years
    This will instantiate them all as models, and run thousands of individual inserts. I've always had to drop to SQL and do manual batch inserts for this type of volume; Django isn't built for it. But yes, you definitely want a single transaction if you're doing it this way.
  • Admin
    Admin almost 14 years
    Hi, could you please elaborate on the same in terms of .NET? It would be a great help, as I am facing the same situation.
  • user2471801
    user2471801 almost 14 years
    I don't have .NET experience, but speaking from a general database perspective, turning off AUTOCOMMIT and encapsulating INSERT statements between BEGIN/END TRANSACTION statements will be faster than using AUTOCOMMIT and running INSERTs alone. Note that these commands and how they are used can change based on the DB you're using. If you want a .NET or .NET Framework specific answer, go ahead and start a new question.
  • Weholt
    Weholt over 13 years
    Another thing: if you decide to use plain SQL and the SQL you're inserting has the same fields each time, try using cursor.executemany(SQL, [list of entries to insert]). Much faster than running an insert per entry.
  • Ben Regenspan
    Ben Regenspan about 12 years
    Now that Django 1.4 is out, using docs.djangoproject.com/en/dev/ref/models/querysets/… makes a lot more sense. The other fast alternative is to manually create a batch SQL insert. The tip here (committing in one transaction) will not be nearly as fast as sending in one insert.
  • Ezekiel Kruglick
    Ezekiel Kruglick over 8 years
    As of 1.9 bulk_create is working great. Note that you'll need to break up creation into batches with no more than 999 total added properties for SQLite.
  • Dejell
    Dejell over 7 years
    does it work the same way that save works? e.g. save or update each object?
  • Marc Laugharn
    Marc Laugharn over 5 years
    transaction.commit_manually was removed in 1.8 docs.djangoproject.com/en/dev/internals/deprecation
  • Mohit Mishra
    Mohit Mishra about 5 years
    This code runs properly; I've checked it on my end. If an error occurs on your side, please check the code again, make sure you understand what it does, and then try again. And if anybody knows a better and easier way of inserting multiple records into the database at one time, please share it with us. Thank you.
  • Joey
    Joey about 5 years
    Please include clarification, further explanation etc. directly into your answer instead of using comments. Comments should be used for asking for more information or for suggesting improvements.
  • Gabriel
    Gabriel about 5 years
    Code-only answers like yours are discouraged.
  • Owen
    Owen over 4 years
    Worth noting that transaction.atomic will not make the code run any faster. Otherwise, excellent summary, thanks.
  • AlxVallejo
    AlxVallejo almost 4 years
    @MarcLaugharn Well then wtf!
  • user2471801
    user2471801 almost 4 years
    wow, this answer is 11 years old... Maybe it's about time to remove the 1.X references...
  • mic
    mic over 3 years
    The Django snippet should be unnecessary with Django ≥1.5, right? Since Django 1.5, there is a batch_size parameter that you can use: "The batch_size parameter controls how many objects are created in single query. The default is to create all objects in one batch, except for SQLite where the default is such that at maximum 999 variables per query is used. The batch_size parameter was added in version 1.5." (a sketch of this follows below)
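
For reference, the batch_size approach described in the comment above looks like this on Django 1.5+ (a minimal sketch, assuming the Entry model and items iterable from the question; Django already picks a safe default for SQLite, but you can also set it explicitly):

from myapp.models import Entry  # hypothetical app path

entries = [Entry(a1=item.a1, a2=item.a2) for item in items]

# Django splits the list into INSERTs of at most batch_size rows,
# keeping each query under SQLite's 999-variable limit.
Entry.objects.bulk_create(entries, batch_size=300)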