What's the difference between select_related and prefetch_related in Django ORM?
Solution 1
Your understanding is mostly correct. You use select_related
when the object that you're going to be selecting is a single object, so OneToOneField
or a ForeignKey
. You use prefetch_related
when you're going to get a "set" of things, so ManyToManyField
s as you stated or reverse ForeignKey
s. Just to clarify what I mean by "reverse ForeignKey
s" here's an example:
class ModelA(models.Model):
pass
class ModelB(models.Model):
a = ForeignKey(ModelA)
ModelB.objects.select_related('a').all() # Forward ForeignKey relationship
ModelA.objects.prefetch_related('modelb_set').all() # Reverse ForeignKey relationship
The difference is that select_related
does an SQL join and therefore gets the results back as part of the table from the SQL server. prefetch_related
on the other hand executes another query and therefore reduces the redundant columns in the original object (ModelA
in the above example). You may use prefetch_related
for anything that you can use select_related
for.
The tradeoffs are that prefetch_related
has to create and send a list of IDs to select back to the server, this can take a while. I'm not sure if there's a nice way of doing this in a transaction, but my understanding is that Django always just sends a list and says SELECT ... WHERE pk IN (...,...,...) basically. In this case if the prefetched data is sparse (let's say U.S. State objects linked to people's addresses) this can be very good, however if it's closer to one-to-one, this can waste a lot of communications. If in doubt, try both and see which performs better.
Everything discussed above is basically about the communications with the database. On the Python side however prefetch_related
has the extra benefit that a single object is used to represent each object in the database. With select_related
duplicate objects will be created in Python for each "parent" object. Since objects in Python have a decent bit of memory overhead this can also be a consideration.
Solution 2
Both methods achieve the same purpose, to forego unnecessary db queries. But they use different approaches for efficiency.
The only reason to use either of these methods is when a single large query is preferable to many small queries. Django uses the large query to create models in memory preemptively rather than performing on demand queries against the database.
select_related
performs a join with each lookup, but extends the select to include the columns of all joined tables. However this approach has a caveat.
Joins have the potential to multiply the number of rows in a query. When you perform a join over a foreign key or one-to-one field, the number of rows won't increase. However, many-to-many joins do not have this guarantee. So, Django restricts select_related
to relations that won't unexpectedly result in a massive join.
The "join in python" for prefetch_related
is a little more alarming then it should be. It creates a separate query for each table to be joined. It filters each of these table with a WHERE IN clause, like:
SELECT "credential"."id",
"credential"."uuid",
"credential"."identity_id"
FROM "credential"
WHERE "credential"."identity_id" IN
(84706, 48746, 871441, 84713, 76492, 84621, 51472);
Rather than performing a single join with potentially too many rows, each table is split into a separate query.
Solution 3
Gone through the already posted answers. Just thought it would be better if I add an answer with actual example.
Let' say you have 3 Django models which are related.
class M1(models.Model):
name = models.CharField(max_length=10)
class M2(models.Model):
name = models.CharField(max_length=10)
select_relation = models.ForeignKey(M1, on_delete=models.CASCADE)
prefetch_relation = models.ManyToManyField(to='M3')
class M3(models.Model):
name = models.CharField(max_length=10)
Here you can query M2
model and its relative M1
objects using select_relation
field and M3
objects using prefetch_relation
field.
However as we've mentioned M1
's relation from M2
is a ForeignKey
, it just returns only 1 record for any M2
object. Same thing applies for OneToOneField
as well.
But M3
's relation from M2
is a ManyToManyField
which might return any number of M1
objects.
Consider a case where you have 2 M2
objects m21
, m22
who have same 5 associated M3
objects with IDs 1,2,3,4,5
. When you fetch associated M3
objects for each of those M2
objects, if you use select related, this is how it's going to work.
Steps:
- Find
m21
object. - Query all the
M3
objects related tom21
object whose IDs are1,2,3,4,5
. - Repeat same thing for
m22
object and all otherM2
objects.
As we have same 1,2,3,4,5
IDs for both m21
, m22
objects, if we use select_related option, it's going to query the DB twice for the same IDs which were already fetched.
Instead if you use prefetch_related, when you try to get M2
objects, it will make a note of all the IDs that your objects returned (Note: only the IDs) while querying M2
table and as last step, Django is going to make a query to M3
table with the set of all IDs that your M2
objects have returned. and join them to M2
objects using Python instead of database.
This way you're querying all the M3
objects only once which improves performance as python joins are cheaper than database joins.
NeoWang
Coder and Enterpreneur. Have worked on Web backend (Python/Django/Tornado), FrontEnd(jQuery/Backbone/Bootstrap/Angular), iOS/Android, embedded developement, also interested in Machine Learning/NLP stuff. Have cofounded a startup on a site-builder SaaS product, failed due to lack of marketing experience. Now working at an Internet start-up. SOreadytohelp
Updated on July 20, 2022Comments
-
NeoWang almost 2 years
In Django doc,
select_related()
"follows" foreign-key relationships, selecting additional related-object data when it executes its query.prefetch_related()
does a separate lookup for each relationship, and does the "joining" in Python.What does it mean by "doing the joining in python"? Can someone illustrate with an example?
My understanding is that for foreign key relationship, use
select_related
; and for M2M relationship, useprefetch_related
. Is this correct? -
elad silver over 7 yearswhat is faster though?
-
bhinesley about 7 years
select_related
is one query whileprefetch_related
is two, so the former is faster. Butselect_related
won't help you forManyToManyField
's -
CrazyCasta about 7 years@eladsilver Sorry for the slow reply. It actually depends.
select_related
uses a JOIN in the SQL whereasprefetch_related
run the query on the first model, collects all the IDs it needs to prefetch and then runs a query with an IN clause in the WHERE with all the IDs that it needs. If you have say 3-5 models using the same foreign key,select_related
will almost certainly be better. If you have 100s or 1000s of models using the same foreign key,prefetch_related
could actually be better. In between you'll have to test and see what happens. -
princ_sameer almost 7 yearsI would dispute your comment about prefetch related "generally doesn't make much sense". That's true for FK fields marked unique, but anywhere where multiple rows have the same FK value (author, user, category, city etc etc) prefetch cuts down bandwidth between Django and the DB but not duplicating rows. It also generally uses less memory on the DB. Either of these is often more important than the overhead of a single extra query. Given this is the top answer on a reasonably popular question I think that should be noted in the answer.
-
CrazyCasta over 6 years@GordonWrigley Yeah, it's been a while since I wrote that, so I went back and clarified a bit. I'm not sure I agree with the "uses less memory on the DB" bit, but yes to everything. And it can sure use less memory on the Python side.
-
CrazyCasta over 5 years@chrisFrisina I have no idea what that edit was supposed to be. It looks like you're trying to turn my comments into quotes. If you can explain your change I'd be happy to put it back.
-
chris Frisina over 5 yearsI was visually clarifying via blocks. The markup names these
quotes
as you are referring to them, however, it appears this question/answer is being used as a reference given its high traffic count, so I was attempting to make it easier to read visually from a concatenated paragraph as it was originally written. Whichever way you want to do it is fine -
smartworld-dm over 4 years@CrazyCasta what is
modelb_set
here? -
CrazyCasta over 4 yearsGiven an instance of
ModelA
,modelb_set
on that instance ofModelA
is an iterable ofModelB
instances that have as valuea
the instance ofModelA
that you are looking atmodelb_set
for. It's saying "give me all objects ofModelB
that have as valuea
this thing that I'm usingmodelb_set
on". -
Tushar Mittal about 2 yearsBest answer uptil now, have been searching through many