Is Django prefetch_related supposed to work with GenericRelation?

18,089

Solution 1

prefetch_related_objects to the rescue.

Starting from Django 1.10 (note: it is also present in earlier versions, but was not part of the public API), we can use prefetch_related_objects to divide and conquer our problem.

prefetch_related is an operation in which Django fetches related data after the queryset has been evaluated (it runs a second query after the main one). To work, it expects the items in the queryset to be homogeneous (all of the same type). The main reason the reverse generic relation does not work right now is that we have objects of different content types, and the code is not yet smart enough to split the flow by content type.

With prefetch_related_objects we can run each prefetch only on a subset of the queryset in which all the items are homogeneous. Here is an example:

from django.db import models
from django.db.models.query import prefetch_related_objects
from django.core.paginator import Paginator
from django.contrib.contenttypes.models import ContentType
from tags.models import TaggedItem, Book, Movie


# Order the queryset so that pagination is stable
tagged_items = TaggedItem.objects.order_by('pk')
paginator = Paginator(tagged_items, 25)
page = paginator.get_page(1)

# prefetch books with their author
# do this only for items where
# tagged_item.content_object is a Book
book_ct = ContentType.objects.get_for_model(Book)
tags_with_books = [item for item in page.object_list if item.content_type_id == book_ct.id]
prefetch_related_objects(tags_with_books, "content_object__author")

# prefetch movies with their director
# do this only for items where
# tagged_item.content_object is a Movie
movie_ct = ContentType.objects.get_for_model(Movie)
tags_with_movies = [item for item in page.object_list if item.content_type_id == movie_ct.id]
prefetch_related_objects(tags_with_movies, "content_object__director")

# Apart from the Paginator's COUNT query and the ContentType
# lookups (cached after the first call), this makes 5 queries:
# 1 for page items
# 1 for books
# 1 for book authors
# 1 for movies
# 1 for movie directors
# Iterating over the items won't make any further queries
for item in page.object_list:
    # do something with item.content_object
    # and item.content_object.author/director
    print(
        item,
        item.content_object,
        getattr(item.content_object, 'author', None),
        getattr(item.content_object, 'director', None)
    )
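The two nearly identical blocks above differ only in the content type and the lookup, so they can be folded into a single loop. The sketch below is framework-free so that it runs on its own: `Item` is a hypothetical stand-in for a `TaggedItem` row, and `prefetch` is any callable shaped like `prefetch(instances, lookup)`. In a Django project you would presumably pass `prefetch_related_objects` as `prefetch` and build `lookups` from ContentType ids, e.g. `{book_ct.id: "content_object__author", movie_ct.id: "content_object__director"}`; that wiring is an assumption and is not exercised here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Item:
    """Hypothetical stand-in for a TaggedItem row."""
    tag: str
    content_type_id: int
    object_id: int


def prefetch_per_content_type(
    items: Iterable[Item],
    lookups: Dict[int, str],
    prefetch: Callable[[List[Item], str], None],
) -> Dict[int, List[Item]]:
    """Split items into homogeneous buckets and prefetch each bucket once."""
    buckets: Dict[int, List[Item]] = defaultdict(list)
    for item in items:
        buckets[item.content_type_id].append(item)

    for ct_id, bucket in buckets.items():
        lookup = lookups.get(ct_id)
        if lookup is not None:  # no lookup configured: leave this type alone
            prefetch(bucket, lookup)
    return dict(buckets)


# Usage with a recording callable in place of prefetch_related_objects.
calls = []
items = [Item("roman", 1, 10), Item("action movie", 2, 20), Item("roman", 1, 11)]
buckets = prefetch_per_content_type(
    items,
    {1: "content_object__author", 2: "content_object__director"},
    lambda bucket, lookup: calls.append(([i.object_id for i in bucket], lookup)),
)
print(calls)
# [([10, 11], 'content_object__author'), ([20], 'content_object__director')]
```

One call per content type keeps every prefetch homogeneous, which is the property the explanation above says prefetch_related needs.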

Solution 2

If you want to retrieve Book instances and prefetch the related tags, use Book.objects.prefetch_related('tags'). There is no need to use the reverse relation here.

You can also have a look at the related tests in the Django source code.

Also the Django documentation states that prefetch_related() is supposed to work with GenericForeignKey and GenericRelation:

prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related. It also supports prefetching of GenericRelation and GenericForeignKey.
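To make "does the 'joining' in Python" concrete, here is a simplified, framework-free model of a many-to-one prefetch: collect the key values from the rows already loaded, fetch all related objects in one batch, index them by primary key, and attach them. The names `prefetch_fk` and `fetch_many` and the `SimpleNamespace` rows are illustrative assumptions; Django's real implementation is considerably more involved.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, List


def prefetch_fk(
    rows: List[SimpleNamespace],
    fk_attr: str,
    cache_attr: str,
    fetch_many: Callable[[set], Iterable[SimpleNamespace]],
) -> None:
    """Attach related objects to rows using a single batched fetch."""
    keys = {getattr(row, fk_attr) for row in rows if getattr(row, fk_attr) is not None}
    # One "query" for all keys, instead of one per row.
    by_pk = {obj.pk: obj for obj in fetch_many(keys)}
    # The "join" happens here, in Python.
    for row in rows:
        setattr(row, cache_attr, by_pk.get(getattr(row, fk_attr)))


# A simulated Author table plus a log of the "queries" issued.
authors = {1: SimpleNamespace(pk=1, name="E L James")}
queries = []


def fetch_authors(keys):
    queries.append(sorted(keys))
    return [authors[k] for k in keys if k in authors]


books = [SimpleNamespace(pk=pk, author_id=1) for pk in (1, 2, 3)]
prefetch_fk(books, "author_id", "author", fetch_authors)
print(len(queries), [b.author.name for b in books])
# 1 ['E L James', 'E L James', 'E L James']
```

For a GenericForeignKey the same idea is applied once per content type, which is what the fk_dict grouping in the get_prefetch_queryset override further down does.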

UPDATE: To prefetch the content_object for a TaggedItem you can use TaggedItem.objects.all().prefetch_related('content_object'). If you want to limit the result to tagged Book objects only, you could additionally filter on the ContentType (I'm not sure whether prefetch_related works with the related_query_name). If you also want to get the Author together with the Book, you need select_related() rather than prefetch_related(), since this is a ForeignKey relationship; you can combine the two in a custom Prefetch() query:

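from django.contrib.contenttypes.models import ContentType
from django.db.models import Prefetch
from tags.models import Book, TaggedItem

book_ct = ContentType.objects.get_for_model(Book)
# Note: as Solution 3 and the comments below point out, this raises
# "ValueError: Custom queryset can't be used for this lookup."
# because GenericForeignKey does not accept a custom queryset.
TaggedItem.objects.filter(content_type=book_ct).prefetch_related(
    Prefetch(
        'content_object',
        queryset=Book.objects.all().select_related('author')
    )
)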

Solution 3

Building on Bernhard's answer, which ends with a code snippet that in practice throws the following error:

ValueError: Custom queryset can't be used for this lookup.

I've overridden GenericForeignKey to allow this behavior. I don't yet know how bulletproof this implementation is, but it does what I need, so I'm posting it here in the hope that it helps others. Look for the START CHANGES and END CHANGES markers to see my changes to the original Django code.

from collections import defaultdict

from django.contrib.contenttypes.fields import GenericForeignKey as BaseGenericForeignKey

class CustomGenericForeignKey(BaseGenericForeignKey):
    def get_prefetch_queryset(self, instances, queryset=None):
        """
        Enable passing queryset to get_prefetch_queryset when using GenericForeignKeys but only works when a single
        content type is being queried
        """
        # START CHANGES
        # if queryset is not None:
        #     raise ValueError("Custom queryset can't be used for this lookup.")
        # END CHANGES

        # For efficiency, group the instances by content type and then do one
        # query per model
        fk_dict = defaultdict(set)
        # We need one instance for each group in order to get the right db:
        instance_dict = {}
        ct_attname = self.model._meta.get_field(self.ct_field).get_attname()
        for instance in instances:
            # We avoid looking for values if either ct_id or fkey value is None
            ct_id = getattr(instance, ct_attname)
            if ct_id is not None:
                fk_val = getattr(instance, self.fk_field)
                if fk_val is not None:
                    fk_dict[ct_id].add(fk_val)
                    instance_dict[ct_id] = instance

        ret_val = []
        for ct_id, fkeys in fk_dict.items():
            instance = instance_dict[ct_id]
            # START CHANGES
            if queryset is not None:
                assert len(fk_dict) == 1  # only a single content type is allowed, else undefined behavior
                ret_val.extend(queryset.filter(pk__in=fkeys))
            else:
                ct = self.get_content_type(id=ct_id, using=instance._state.db)
                ret_val.extend(ct.get_all_objects_for_this_type(pk__in=fkeys))
            # END CHANGES

        # For doing the join in Python, we have to match both the FK val and the
        # content type, so we use a callable that returns a (fk, class) pair.
        def gfk_key(obj):
            ct_id = getattr(obj, ct_attname)
            if ct_id is None:
                return None
            else:
                model = self.get_content_type(id=ct_id,
                                              using=obj._state.db).model_class()
                return (model._meta.pk.get_prep_value(getattr(obj, self.fk_field)),
                        model)

        return (
            ret_val,
            lambda obj: (obj.pk, obj.__class__),
            gfk_key,
            True,
            self.name,
            True,
        )
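The gfk_key function and the `lambda obj: (obj.pk, obj.__class__)` returned above are what keep the Python-side join safe across models: a Book and a Movie can share a primary key, so matching on the key alone would mix them up. Below is a framework-free sketch of that matching step; the `Book`, `Movie`, and `Tagged` dataclasses and the `join_generic` helper are hypothetical stand-ins, not Django code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Book:
    pk: int
    name: str


@dataclass
class Movie:
    pk: int
    name: str


@dataclass
class Tagged:
    tag: str
    model: type  # stands in for the content type
    object_id: int


def join_generic(tagged: List[Tagged], fetched: List[object]) -> Dict[int, object]:
    """Match each tagged row to its object using a (pk, class) key."""
    # Same shape as the second returned value: (obj.pk, obj.__class__).
    index: Dict[Tuple[int, type], object] = {(o.pk, type(o)): o for o in fetched}
    # Same shape as gfk_key: (object_id, model) for each tagged row.
    return {i: index.get((t.object_id, t.model)) for i, t in enumerate(tagged)}


fetched = [Book(1, "Fifty Shades of Grey"), Movie(1, "Guardians of the Galaxy")]
tagged = [Tagged("roman", Book, 1), Tagged("action movie", Movie, 1)]
matches = join_generic(tagged, fetched)
print([m.name for m in matches.values()])
# ['Fifty Shades of Grey', 'Guardians of the Galaxy']

# Keying on pk alone collapses both objects into a single entry.
print(len({o.pk: o for o in fetched}))
# 1
```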
Author: Todor


Updated on June 11, 2022

Comments

  • Todor
    Todor almost 2 years

    UPDATE: An open ticket about this issue: 24272

    What's it all about?

    Django has a GenericRelation class, which adds a “reverse” generic relationship to enable an additional API.

    It turns out we can use this reverse-generic-relation for filtering or ordering, but we can't use it inside prefetch_related.

    I was wondering whether this is a bug, whether it's simply not supposed to work, or whether it's something that could be implemented in the future.

    Let me show you with some examples what I mean.

    Let's say we have two main models: Movies and Books.

    • Movies have a Director
    • Books have an Author

    And we want to assign tags to our Movies and Books, but instead of using MovieTag and BookTag models, we want to use a single TaggedItem class with a GFK to Movie or Book.

    Here is the model structure:

    from django.db import models
    from django.contrib.contenttypes.fields import GenericForeignKey, GenericRelation
    from django.contrib.contenttypes.models import ContentType
    
    
    class TaggedItem(models.Model):
        tag = models.SlugField()
        content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
        object_id = models.PositiveIntegerField()
        content_object = GenericForeignKey('content_type', 'object_id')
    
        def __unicode__(self):
            return self.tag
    
    
    class Director(models.Model):
        name = models.CharField(max_length=100)
    
        def __unicode__(self):
            return self.name
    
    
    class Movie(models.Model):
        name = models.CharField(max_length=100)
        director = models.ForeignKey(Director, on_delete=models.CASCADE)
        tags = GenericRelation(TaggedItem, related_query_name='movies')
    
        def __unicode__(self):
            return self.name
    
    
    class Author(models.Model):
        name = models.CharField(max_length=100)
    
        def __unicode__(self):
            return self.name
    
    
    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Author, on_delete=models.CASCADE)
        tags = GenericRelation(TaggedItem, related_query_name='books')
    
        def __unicode__(self):
            return self.name
    

    And some initial data:

    >>> from tags.models import Book, Movie, Author, Director, TaggedItem
    >>> a = Author.objects.create(name='E L James')
    >>> b1 = Book.objects.create(name='Fifty Shades of Grey', author=a)
    >>> b2 = Book.objects.create(name='Fifty Shades Darker', author=a)
    >>> b3 = Book.objects.create(name='Fifty Shades Freed', author=a)
    >>> d = Director.objects.create(name='James Gunn')
    >>> m1 = Movie.objects.create(name='Guardians of the Galaxy', director=d)
    >>> t1 = TaggedItem.objects.create(content_object=b1, tag='roman')
    >>> t2 = TaggedItem.objects.create(content_object=b2, tag='roman')
    >>> t3 = TaggedItem.objects.create(content_object=b3, tag='roman')
    >>> t4 = TaggedItem.objects.create(content_object=m1, tag='action movie')
    

    As the docs show, we can do things like this:

    >>> b1.tags.all()
    [<TaggedItem: roman>]
    >>> m1.tags.all()
    [<TaggedItem: action movie>]
    >>> TaggedItem.objects.filter(books__author__name='E L James')
    [<TaggedItem: roman>, <TaggedItem: roman>, <TaggedItem: roman>]
    >>> TaggedItem.objects.filter(movies__director__name='James Gunn')
    [<TaggedItem: action movie>]
    >>> Book.objects.all().prefetch_related('tags')
    [<Book: Fifty Shades of Grey>, <Book: Fifty Shades Darker>, <Book: Fifty Shades Freed>]
    >>> Book.objects.filter(tags__tag='roman')
    [<Book: Fifty Shades of Grey>, <Book: Fifty Shades Darker>, <Book: Fifty Shades Freed>]
    

    But, if we try to prefetch some related data of TaggedItem via this reverse generic relation, we are going to get an AttributeError.

    >>> TaggedItem.objects.all().prefetch_related('books')
    Traceback (most recent call last):
      ...
    AttributeError: 'Book' object has no attribute 'object_id'
    

    Some of you may ask why I don't just use content_object instead of books here. The reason is that this only works when we want to:

    1) prefetch only one level deep from querysets containing different types of content_object.

    >>> TaggedItem.objects.all().prefetch_related('content_object')
    [<TaggedItem: roman>, <TaggedItem: roman>, <TaggedItem: roman>, <TaggedItem: action movie>]
    

    2) prefetch many levels but from querysets containing only one type of content_object.

    >>> TaggedItem.objects.filter(books__author__name='E L James').prefetch_related('content_object__author')
    [<TaggedItem: roman>, <TaggedItem: roman>, <TaggedItem: roman>]
    

    But if we want both 1) and 2) (to prefetch many levels from a queryset containing different types of content_object), we can't use content_object.

    >>> TaggedItem.objects.all().prefetch_related('content_object__author')
    Traceback (most recent call last):
      ...
    AttributeError: 'Movie' object has no attribute 'author_id'
    

    Django thinks that all content_objects are Books, and thus they have an Author.
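A simplified model of why this happens, based on a reading of prefetch_related_objects rather than on its actual code: for each level of a lookup, the prefetcher is chosen by inspecting the first object of the current list, and that choice is then applied to every object in the list. The hypothetical `naive_next_level` helper below reproduces that behavior with plain classes: the first content_object is a Book, so the author lookup is used for all of them, and reading author_id from the Movie fails with the same kind of AttributeError.

```python
from typing import List


class Book:
    def __init__(self, pk: int, author_id: int):
        self.pk = pk
        self.author_id = author_id


class Movie:
    def __init__(self, pk: int, director_id: int):
        self.pk = pk
        self.director_id = director_id


def naive_next_level(objs: List[object], through_attr: str) -> List[int]:
    """Decide how to prefetch from objs[0], then apply it to all objs."""
    fk_attname = f"{through_attr}_id"
    if not hasattr(objs[0], fk_attname):
        # Roughly the "Cannot find ..." case: the first object lacks the field.
        raise AttributeError(f"Cannot find {through_attr!r} on {type(objs[0]).__name__}")
    # Every object is assumed to look like the first one.
    return [getattr(obj, fk_attname) for obj in objs]


content_objects = [Book(1, author_id=7), Movie(1, director_id=9)]
try:
    naive_next_level(content_objects, "author")
except AttributeError as exc:
    print(exc)  # 'Movie' object has no attribute 'author_id'

# A homogeneous list works fine.
print(naive_next_level([Book(1, 7), Book(2, 8)], "author"))  # [7, 8]
```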

    Now imagine a situation where we want to prefetch not only the books with their author, but also the movies with their director. Here are a few attempts.

    The silly way:

    >>> TaggedItem.objects.all().prefetch_related(
    ...     'content_object__author',
    ...     'content_object__director',
    ... )
    Traceback (most recent call last):
      ...
    AttributeError: 'Movie' object has no attribute 'author_id'
    

    Maybe with custom Prefetch object?

    >>> TaggedItem.objects.all().prefetch_related(
    ...     Prefetch('content_object', queryset=Book.objects.all().select_related('author')),
    ...     Prefetch('content_object', queryset=Movie.objects.all().select_related('director')),
    ... )
    Traceback (most recent call last):
      ...
    ValueError: Custom queryset can't be used for this lookup.
    

    Some solutions to this problem are shown here. But they involve a lot of massaging of the data, which I want to avoid. I really like the API provided by the reverse generic relations; it would be very nice to be able to do prefetches like this:

    >>> TaggedItem.objects.all().prefetch_related(
    ...     'books__author',
    ...     'movies__director',
    ... )
    Traceback (most recent call last):
      ...
    AttributeError: 'Book' object has no attribute 'object_id'
    

    Or like this:

    >>> TaggedItem.objects.all().prefetch_related(
    ...     Prefetch('books', queryset=Book.objects.all().select_related('author')),
    ...     Prefetch('movies', queryset=Movie.objects.all().select_related('director')),
    ... )
    Traceback (most recent call last):
      ...
    AttributeError: 'Book' object has no attribute 'object_id'
    

    But as you can see, we always get that AttributeError. I'm using Django 1.7.3 and Python 2.7.6, and I'm curious why Django throws that error. Why is Django looking for an object_id on the Book model? And why do I think this may be a bug? Usually, when we ask prefetch_related to resolve something it can't, we see:

    >>> TaggedItem.objects.all().prefetch_related('some_field')
    Traceback (most recent call last):
      ...
    AttributeError: Cannot find 'some_field' on TaggedItem object, 'some_field' is an invalid parameter to prefetch_related()
    

    But here, it is different. Django actually tries to resolve the relation... and fails. Is this a bug which should be reported? I have never reported anything to Django so that's why I'm asking here first. I'm unable to trace the error and decide for myself if this is a bug, or a feature which could be implemented.

    • Bernhard Vallant
      Bernhard Vallant over 9 years
      OK, looking at the Django source, I would say that this is not a bug but simply not supported. If you want to get the authors along with the books, you need select_related(), as this is a ForeignKey relation. To use that together with prefetch_related, you would need a custom queryset, which Django currently does not support for generic relations.
    • Todor
      Todor over 9 years
      OK, thank you. I opened a ticket about this. I hope one day this feature will make it into the ORM :)
  • Todor
    Todor over 9 years
    Thank you for your response @Bernhard. The problem is that what I really need is to prefetch books with their author from a TaggedItem queryset that contains more than just books. This means I need something like this: TaggedItem.objects.all().prefetch_related('books__author'). But that throws a strange error: AttributeError: 'Book' object has no attribute 'object_id'. May I ask for your opinion on it? Does it look to you like a bug that should be reported?
  • Todor
    Todor over 9 years
    Some workarounds for this problem are presented here. But that's a lot of custom massaging of the data, which I want to avoid. I like the API provided by the reverse generic relations, and I believe it is the right way to achieve such functionality.
  • Bernhard Vallant
    Bernhard Vallant over 9 years
    @Todor I've updated my answer. I can't try it out right now, but I hope it points you in the right direction. You may want to try it with the related_query_name as well as with filtering on the content_type.
  • Todor
    Todor over 9 years
    Thank you again. Actually, that's my struggle: is related_query_name supposed to work with prefetch_related? I've tested it with filtering on content_type, but the problem is that the ORM looks for an object_id attribute on the Book model instead of on the TaggedItem model. I guess I'm going to open a ticket soon.
  • Todor
    Todor over 9 years
    The query is: TaggedItem.objects.filter(content_type=book_ct).prefetch_related('books'). I know I can use your example, or just use .prefetch_related('content_object__author'). But in my real use case this can't be done because I can't filter on content_type: I have to work with TaggedItems whose content types are Book, Movie, or something else.
  • Todor
    Todor over 9 years
    I will edit my question again to show why I can't do the prefetches with content_type, because I feel it's not very clear right now.
  • eugene
    eugene about 9 years
    Is Bernhard's last updated code supposed to work, or is it just an attempt to solve the problem? I tried it on a generic foreign key, and it throws an error. Looking at the Django source code (contrib.contenttypes.fields.get_prefetch_queryset), you are not allowed to supply a queryset for a generic foreign key prefetch.
  • andilabs
    andilabs over 8 years
    @eugene exactly: Custom queryset can't be used for this lookup.
  • vishes_shell
    vishes_shell about 5 years
    Have you tried running your code? It raises the same ValueError: Custom queryset can't be used for this lookup. You also have syntax errors due to a missing parenthesis.
  • Todor
    Todor about 5 years
    To be honest, I don't remember; however, I did some testing now and updated the answer with a working example. Unfortunately, custom querysets in the Prefetch object seem not to work with GenericForeignKey, so we cannot do select_related on the Book/Movie querysets to fetch the author/director.
  • Nepo Znat
    Nepo Znat about 2 years
    Note: You can use ContentType.objects.get_for_models(Book, Movie) to further reduce the queries.