What's the best way to count results in GQL?

24,914

Solution 1

+1 to Jehiah's response.

Official and blessed method on getting object counters on GAE is to build sharded counter. Despite heavily sounding name, this is pretty straightforward.

Solution 2

You have to flip your thinking when working with a scalable datastore like GAE to do your calculations up front. In this case that means you need to keep counters for each baz and increment them whenever you add a new bar, instead of counting at the time of display.

class CategoryCounter(db.Model):
    category = db.StringProperty()
    count = db.IntegerProperty(default=0)

then when creating a Bar object, increment the counter

def createNewBar(category_name):
  bar = Bar(...,baz=category_name)

  counter = CategoryCounter.filter('category =',category_name).get()
  if not counter:
    counter = CategoryCounter(category=category_name)
  else:
    counter.count += 1
  bar.put()
  counter.put()

db.run_in_transaction(createNewBar,'asdf')

now you have an easy way to get the count for any specific category

CategoryCounter.filter('category =',category_name).get().count

Solution 3

Count functions in all databases are slow (eg, O(n)) - the GAE datastore just makes that more obvious. As Jehiah suggests, you need to store the computed count in an entity and refer to that if you want scalability.

This isn't unique to App Engine - other databases just hide it better, up until the point where you're trying to count tens of thousands of records with each request, and your page render time starts to increase exponentially...

Solution 4

According to the GqlQuery.count() documentation, you can set the limit to be some number greater than 1000:

from models import Troll
troll_count = Troll.all(keys_only=True).count(limit=31337)

Sharded counters are the right way to keep track of numbers like this, as folks have said, but if you figure this out late in the game (like me) then you'll need to initialize the counters from an actual count of objects. But this is a great way to burn through your free quota of Datastore Small Operations (50,000 I think). Every time you run the code, it will use up as many ops as there are model objects.

Share:
24,914
barneytron
Author by

barneytron

I'm a C# ASP.NET developer who dreams of making a living doing Ruby and Python instead. Nevertheless, I am rather grateful that I am making a decent living working with em dollar technology.

Updated on August 01, 2022

Comments

  • barneytron
    barneytron almost 2 years

    I figure one way to do a count is like this:

    foo = db.GqlQuery("SELECT * FROM bar WHERE baz = 'baz')
    my_count = foo.count()
    

    What I don't like is my count will be limited to 1000 max and my query will probably be slow. Anyone out there with a workaround? I have one in mind, but it doesn't feel clean. If only GQL had a real COUNT Function...

  • barneytron
    barneytron over 15 years
    I've thought about that. Resource hog is right, but it would probably work.
  • barneytron
    barneytron over 15 years
    I see that there are 2 .put() calls on 2 different entities in your trans. If I remember, I got an error telling me I could not update two root entities in the same trans when I tried something similar before. Maybe put both (bar & counter) in the same group? I'll try again when I have more time.
  • barneytron
    barneytron over 15 years
    I confirmed the error I got doing the two .put() calls in a transaction: "Cannot operate on different entity groups in a transaction" I still like the idea of using two entities though.
  • Luke
    Luke about 15 years
    Losing the ability to track counters on a per-user basis makes it a lot harder to weed out spammers though. How can you tackle this issue when using sharded counters?
  • amir ntm
    amir ntm over 13 years
    Shouldn't the default counter value be 1?
  • me22
    me22 over 13 years
    It's not always slow -- there are some that maintain the sharded counter for you, expecting queries like SELECT COUNT(*) FROM MyTable. (IIRC, MyISAM in MySQL does that.) And your page render times would only increate linearly, not exponentially. If it were really exponential, you'd see it after hundreds of records, and not have to wait for tens of thousands.
  • me22
    me22 over 13 years
    Offset doesn't work past 1000 limit because doing that means you're using a en.wikipedia.org/wiki/Schlemiel_the_painter%27s_Algorithm . And while using .filter('foo >', bar) is a good way to page through all the results, there's no need to add a pointless field to your record, since they're ordered on key already. Just grab the last key you looked at, and start from (after) there.
  • Praxiteles
    Praxiteles about 6 years
    How do you access those properties? We tried query_result.count but it was empty
  • Jonathan B.
    Jonathan B. about 6 years
    @Praxiteles, what library or language are you running your code from? I'm using the .NET / C# library. I'll update my answer with the code I'm using.