How do I force Django to ignore any caches and reload data?


Solution 1

Having had this problem and found two definitive solutions for it I thought it worth posting another answer.

This is a problem with MySQL's default transaction mode. Django opens a transaction when it first touches the database, which means that by default you won't see changes committed by other connections until that transaction ends.

You can demonstrate it like this.

Run a Django shell in terminal 1

>>> MyModel.objects.get(id=1).my_field
u'old'

And another in terminal 2

>>> MyModel.objects.get(id=1).my_field
u'old'
>>> a = MyModel.objects.get(id=1)
>>> a.my_field = "NEW"
>>> a.save()
>>> MyModel.objects.get(id=1).my_field
u'NEW'
>>> 

Back to terminal 1 to demonstrate the problem - we still read the old value from the database.

>>> MyModel.objects.get(id=1).my_field
u'old'

Now, in terminal 1, demonstrate the solution

>>> from django.db import transaction
>>> 
>>> @transaction.commit_manually
... def flush_transaction():
...     transaction.commit()
... 
>>> MyModel.objects.get(id=1).my_field
u'old'
>>> flush_transaction()
>>> MyModel.objects.get(id=1).my_field
u'NEW'
>>> 

The new data is now read.

Here is that code in an easy-to-paste block with a docstring

from django.db import transaction

@transaction.commit_manually
def flush_transaction():
    """
    Flush the current transaction so we don't read stale data

    Use in long running processes to make sure fresh data is read from
    the database.  This is a problem with MySQL and the default
    transaction mode.  You can fix it by setting
    "transaction-isolation = READ-COMMITTED" in my.cnf or by calling
    this function at the appropriate moment.
    """
    transaction.commit()

The alternative solution is to change the default transaction mode in MySQL's my.cnf

transaction-isolation = READ-COMMITTED

Note that READ-COMMITTED is a relatively new feature for MySQL and has some consequences for binary logging and replication. You could also set it in the Django connection preamble if you wanted.
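For reference, here is a sketch of the connection-preamble approach using the init_command option in the DATABASES setting (Django 1.2+ settings layout; the database name and credentials are placeholders):

```python
# settings.py -- a sketch; NAME, USER, and PASSWORD are placeholders
# for your own values.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "secret",
        "OPTIONS": {
            # Runs once on each new connection, before any queries,
            # so every session uses READ COMMITTED isolation.
            "init_command": "SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED",
        },
    },
}
```

On Django 1.1 and earlier the equivalent lived in a top-level DATABASE_OPTIONS setting, as noted in the comments on this question.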

Update 3 years later

Now that Django 1.6 has turned on autocommit in MySQL this is no longer a problem. The example above now works fine without the flush_transaction() code whether your MySQL is in REPEATABLE-READ (the default) or READ-COMMITTED transaction isolation mode.

What was happening in previous versions of Django, which ran in non-autocommit mode, was that the first select statement opened a transaction. Since MySQL's default mode is REPEATABLE-READ, no updates to the database would be seen by subsequent select statements - hence the need for the flush_transaction() code above, which stops the transaction and starts a new one.

There are still reasons why you might want to use READ-COMMITTED transaction isolation, though. If you were to put terminal 1 in a transaction and wanted to see the writes from terminal 2, you would need READ-COMMITTED.

The flush_transaction() code now produces a deprecation warning in Django 1.6, so I recommend you remove it.

Solution 2

We've struggled a fair bit with forcing Django to refresh the "cache" - which it turns out wasn't really a cache at all but an artifact of transactions. This might not apply to your example, but certainly in Django views there is, by default, an implicit transaction, which MySQL then isolates from any changes that happen in other processes after you start.

We used the @transaction.commit_manually decorator and called transaction.commit() just before every occasion where up-to-date info was needed.

As I say, this definitely applies to views; I'm not sure whether it applies to Django code not run inside a view.

Detailed info here:

http://devblog.resolversystems.com/?p=439

Solution 3

I'm not sure I'd recommend it...but you can just kill the cache yourself:

>>> qs = MyModel.objects.all()
>>> qs.count()
1
>>> MyModel().save()
>>> qs.count()  # cached!
1
>>> qs._result_cache = None
>>> qs.count()
2

And here's a better technique that doesn't rely on fiddling with the innards of the QuerySet. Remember that the caching happens within a QuerySet, but refreshing the data simply requires the underlying Query to be re-executed. A QuerySet is really just a high-level API wrapping a Query object, plus a container (with caching!) for Query results. Thus, given a queryset, here is a general-purpose way of forcing a refresh:

>>> MyModel().save()
>>> qs = MyModel.objects.all()
>>> qs.count()
1
>>> MyModel().save()
>>> qs.count()  # cached!
1
>>> from django.db.models import QuerySet
>>> qs = QuerySet(model=MyModel, query=qs.query)
>>> qs.count()  # refreshed!
2
>>> party_time()

Pretty easy! You can of course implement this as a helper function and use it as needed.

Solution 4

It seems that count() uses the cache after the first call. This is the Django source for QuerySet.count:

def count(self):
    """
    Performs a SELECT COUNT() and returns the number of records as an
    integer.

    If the QuerySet is already fully cached this simply returns the length
    of the cached results set to avoid multiple SELECT COUNT(*) calls.
    """
    if self._result_cache is not None and not self._iter:
        return len(self._result_cache)

    return self.query.get_count(using=self.db)

update() does seem to be doing quite a bit of extra work beyond what you need, but I can't think of any better way to do this, short of writing your own SQL for the count. If performance is not super important, I would just do what you're doing and call update() before count().

QuerySet.update:

def update(self, **kwargs):
    """
    Updates all elements in the current QuerySet, setting all the given
    fields to the appropriate values.
    """
    assert self.query.can_filter(), \
            "Cannot update a query once a slice has been taken."
    self._for_write = True
    query = self.query.clone(sql.UpdateQuery)
    query.add_update_values(kwargs)
    if not transaction.is_managed(using=self.db):
        transaction.enter_transaction_management(using=self.db)
        forced_managed = True
    else:
        forced_managed = False
    try:
        rows = query.get_compiler(self.db).execute_sql(None)
        if forced_managed:
            transaction.commit(using=self.db)
        else:
            transaction.commit_unless_managed(using=self.db)
    finally:
        if forced_managed:
            transaction.leave_transaction_management(using=self.db)
    self._result_cache = None
    return rows
update.alters_data = True

Solution 5

If you call .all() on a queryset, it returns a clone with an empty result cache, which forces a reread from the DB. Try qs.all().count() on a queryset you are reusing instead of calling qs.count() again.


Updated on July 05, 2022

Comments

  • scippy (almost 2 years ago)

    I'm using the Django database models from a process that's not called from an HTTP request. The process is supposed to poll for new data every few seconds and do some processing on it. I have a loop that sleeps for a few seconds and then gets all unhandled data from the database.

    What I'm seeing is that after the first fetch, the process never sees any new data. I ran a few tests and it looks like Django is caching results, even though I'm building new QuerySets every time. To verify this, I did this from a Python shell:

    >>> MyModel.objects.count()
    885
    # (Here I added some more data from another process.)
    >>> MyModel.objects.count()
    885
    >>> MyModel.objects.update()
    0
    >>> MyModel.objects.count()
    1025
    

    As you can see, adding new data doesn't change the result count. However, calling the manager's update() method seems to fix the problem.

    I can't find any documentation on that update() method and have no idea what other bad things it might do.

    My question is, why am I seeing this caching behavior, which contradicts what Django docs say? And how do I prevent it from happening?

  • scippy (almost 14 years ago)
    That looks like an awesome and easy solution, but at least on my Django version, it doesn't work. Calling MyModel.objects._clone() results in an "AttributeError: 'Manager' object has no attribute '_clone'". I can do MyModel.objects.all()._clone(), but that works just as before -- it doesn't change until I call update(). I'm using Django 1.2.1.
  • Travis Swicegood (almost 14 years ago)
    My bad - it should be MyModel.objects.all()._clone(). In thinking about it, you could get away with doing a MyModel.objects.all().count() without the _clone(). That creates a new version of the base object and should get you a new version w/o the cached value. That is, unless Django's doing something devious there and carrying the state with the clone.
  • Carl Meyer (over 12 years ago)
    Most of this answer is relatively accurate, but the first sentence is highly misleading. This problem has nothing to do with any "cache" in Django or "forcing django to refresh", it's entirely about the database-level transaction isolation.
  • Carl Meyer (over 12 years ago)
    This answer is wrong. Calling any method (like count()) on a manager implicitly clones a new queryset, there is no implicit caching behavior due to Manager identity, and no need to insert an extraneous call to _clone() or all(). This entire train of thought is a red herring, the OP's real problem is database-level transaction isolation, it has nothing to do with querysets or Django-level caching at all.
  • snozzwangler (over 12 years ago)
    DATABASE_OPTIONS = { "init_command": "SET storage_engine=INNODB, SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED", }
  • Travis Swicegood (over 11 years ago)
    Been a long time since I've toyed with this particular issue, but I imagine at the point this was created count() had some sort of caching in it, otherwise, Carl is correct and this answer is way off.
  • ryanbraganza (over 11 years ago)
    As of django 1.2, the syntax of the settings has changed. Add "OPTIONS" to your DATABASES setting (probably the 'default' one) 'OPTIONS': { "init_command": "SET storage_engine=INNODB, SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED", }
  • Thomas Gak-Deluen (about 9 years ago)
    Well, I'm on Django 1.8 and the problem still persists, whether I use MySQL or SQLite
  • Jared (over 7 years ago)
    I tried this on Django 1.10 and it doesn't work with sqlite or pg.
  • A. K. Tolentino (about 6 years ago)
    This is actually cleaner.
  • Alexey Ruzin (about 6 years ago)
    all() creates a new QuerySet object whose _result_cache is empty.
  • sparrowt (about 2 years ago)
    To force count to hit the db without doing an update see this answer stackoverflow.com/a/38063511
  • sparrowt (about 2 years ago)
    There's another simpler method in this answer stackoverflow.com/a/38063511