Simple Subquery with OuterRef

27,809

Solution 1

One of the problems with your example is that you cannot use queryset.count() as a subquery, because .count() tries to evaluate the queryset and return the count.

So one may think that the right approach would be to use Count() instead. Maybe something like this:

Post.objects.annotate(
    count=Count(Tag.objects.filter(post=OuterRef('pk')))
)

This won't work for two reasons:

  1. The Tag queryset selects all Tag fields, while Count can only count on one field. Thus: Tag.objects.filter(post=OuterRef('pk')).only('pk') is needed (to select counting on tag.pk).

  2. Count itself is not a Subquery class, Count is an Aggregate. So the expression generated by Count is not recognized as a Subquery (OuterRef requires subquery), we can fix that by using Subquery.

Applying fixes for 1) and 2) would produce:

Post.objects.annotate(
    count=Count(Subquery(Tag.objects.filter(post=OuterRef('pk')).only('pk')))
)

However if you inspect the query being produced:

SELECT 
    "tests_post"."id",
    "tests_post"."title",
    COUNT((SELECT U0."id" 
            FROM "tests_tag" U0 
            INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id") 
            WHERE U1."post_id" = ("tests_post"."id"))
    ) AS "count" 
FROM "tests_post" 
GROUP BY 
    "tests_post"."id",
    "tests_post"."title"

you will notice a GROUP BY clause. This is because COUNT is an aggregate function. Right now it does not affect the result, but in some other cases it may. That's why the docs suggest a different approach, where the aggregation is moved into the subquery via a specific combination of values + annotate + values :

Post.objects.annotate(
    count=Subquery(
        Tag.objects
            .filter(post=OuterRef('pk'))
            # The first .values call defines our GROUP BY clause
            # Its important to have a filtration on every field defined here
            # Otherwise you will have more than one group per row!!!
            # This will lead to subqueries to return more than one row!
            # But they are not allowed to do that!
            # In our example we group only by post
            # and we filter by post via OuterRef
            .values('post')
            # Here we say: count how many rows we have per group 
            .annotate(count=Count('pk'))
            # Here we say: return only the count
            .values('count')
    )
)

Finally this will produce:

SELECT 
    "tests_post"."id",
    "tests_post"."title",
    (SELECT COUNT(U0."id") AS "count" 
            FROM "tests_tag" U0 
            INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id") 
            WHERE U1."post_id" = ("tests_post"."id") 
            GROUP BY U1."post_id"
    ) AS "count" 
FROM "tests_post"

Solution 2

The django-sql-utils package makes this kind of subquery aggregation simple. Just pip install django-sql-utils and then:

from sql_util.utils import SubqueryCount
posts = Post.objects.annotate(
            tag_count=SubqueryCount('tag'))

The API for SubqueryCount is the same as Count, but it generates a subselect in the SQL instead of joining to the related table.

Share:
27,809
mjuk
Author by

mjuk

Updated on July 09, 2022

Comments

  • mjuk
    mjuk almost 2 years

    I am trying to make a very simple Subquery that uses OuterRef (not for practical purposes, but just to get it working), but I keep running into the same error.

    posts/models.py code

    from django.db import models
    
    class Tag(models.Model):
        name = models.CharField(max_length=120)
        def __str__(self):
            return self.name
    
    class Post(models.Model):
        title = models.CharField(max_length=120)
        tags = models.ManyToManyField(Tag)
        def __str__(self):
            return self.title
    

    manage.py shell code

    >>> from django.db.models import OuterRef, Subquery
    >>> from posts.models import Tag, Post
    >>> tag1 = Tag.objects.create(name='tag1')
    >>> post1 = Post.objects.create(title='post1')
    >>> post1.tags.add(tag1)
    >>> Tag.objects.filter(post=post1.pk)
    <QuerySet [<Tag: tag1>]>
    >>> tags_list = Tag.objects.filter(post=OuterRef('pk'))
    >>> Post.objects.annotate(count=Subquery(tags_list.count()))
    

    The last two lines should give me number of tags for each Post object. And here I keep getting the same error:

    ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
    
  • mjuk
    mjuk about 7 years
    Thanks, that worked! However, when I add pk__in=[1,2] to the Tag filter, I get django.core.exceptions.FieldError: Expression contains mixed types. You must set output_field.
  • Todor
    Todor about 7 years
    You can try to print the queryset.query and execute it in your RDBMS directly to see what you get in return. I guess that for some rows Count may return NULL instead of 0. You can try to confirm that by temporary exclude rows w/o count, i.e. .filter(count__gte=1). However, Subquery accepts a second argument, which is output_field you can try set it to: output_field=fields.IntegerField()
  • sajid
    sajid about 6 years
    @Todor Does not work for me. I get django.db.utils.ProgrammingError: more than one row returned by a subquery used as an expression
  • Todor
    Todor about 6 years
    I guess you have some filter missing, so you got more than one group per row which returns more than one count. You must have a filter on every field in your first .values() clause in the subquery.
  • Admin
    Admin over 3 years
    having a default ordering set on Tag will make the GROUP BY include also the fields defined in the ordering in addition the ones in values(), which generates a Subquery returns more than 1 row error. fix by adding ).order_by('post').values('post')