How to get two random records with Django

Solution 1

If you use random ordering in the ORM, I'm pretty sure it will give you two distinct random results, since both rows come from a single query:

MyModel.objects.order_by('?')[:2] # 2 random results.

Solution 2

The order_by('?')[:2] solution suggested by other answers is actually an extraordinarily bad thing to do for tables that have large numbers of rows. It results in an ORDER BY RAND() SQL query. As an example, here's how MySQL handles that (the situation is not much different for other databases). Imagine your table has one billion rows:

  1. To accomplish ORDER BY RAND(), it needs a RAND() column to sort on.
  2. To do that, it needs a new table (the existing table has no such column).
  3. To do that, MySQL creates a new, temporary table with the new column and copies the existing ONE BILLION ROWS OF DATA into it.
  4. As it does so, it does as you asked and runs RAND() for every row to fill in that value. Yes, you've instructed MySQL to GENERATE ONE BILLION RANDOM NUMBERS. That takes a while. :)
  5. A few hours/days later, when it's done, it now has to sort it. Yes, you've instructed MySQL to SORT THIS ONE-BILLION-ROW, WORST-CASE-ORDERED TABLE (worst-case because the sort key is random).
  6. A few days/weeks later, when that's done, it faithfully grabs the two measly rows you actually needed and returns them for you. Nice job. ;)

Note: just for a little extra gravy, be aware that MySQL will initially try to create that temp table in RAM. When that's exhausted, it puts everything on hold to copy the whole thing to disk, so you get that extra knife-twist of an I/O bottleneck for nearly the entire process.

Doubters should look at the generated query to confirm that it's ORDER BY RAND() then Google for "order by rand()" (with the quotes).
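To see the shape of such a query without a Django project, here is a small sketch using Python's built-in sqlite3 module. It is only an illustration: SQLite spells the function RANDOM() rather than RAND(), the table name my_model and its columns are made up, and the query plan is printed rather than quoted because its wording differs between SQLite versions.

```python
import sqlite3

# Hypothetical stand-in table; the names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_model (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO my_model (name) VALUES (?)",
    [(f"row {i}",) for i in range(1000)],
)

# SQLite's counterpart of the ORDER BY RAND() query.
query = "SELECT id, name FROM my_model ORDER BY RANDOM() LIMIT 2"

# Ask the database how it intends to run the query.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step[-1])  # the last column holds the human-readable detail

rows = conn.execute(query).fetchall()
print(rows)  # two different rows, in random order
```

In a Django shell, printing the queryset's query attribute, for example print(MyModel.objects.order_by('?')[:2].query), shows the SQL that the ORM would send to your own database.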

A much better solution is to trade that one really expensive query for three cheap ones (limit/offset instead of ORDER BY RAND()):

import random

# Requires at least two rows in the table.
last = MyModel.objects.count() - 1

index1 = random.randint(0, last)
# Here's one simple way to keep an even distribution for
# index2 while still guaranteeing it won't match index1.
index2 = random.randint(0, last - 1)
if index2 == index1:
    index2 = last

# Indexing a queryset generates a "LIMIT 1 OFFSET indexN" query,
# so each returns a single record with no extraneous data.
# The explicit ordering keeps both offsets referring to the same row order.
MyObj1 = MyModel.objects.order_by('pk')[index1]
MyObj2 = MyModel.objects.order_by('pk')[index2]
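The index-picking step can be pulled out into a small function and checked on its own, without a database. This is a minimal sketch; the function name two_distinct_indices and the explicit guard for tables with fewer than two rows are additions for illustration, not part of the answer above.

```python
import random


def two_distinct_indices(count):
    """Return two different offsets in range(count), each pair equally likely."""
    if count < 2:
        raise ValueError("need at least two rows to pick two distinct ones")
    last = count - 1
    index1 = random.randint(0, last)
    # Draw from one fewer slot, then remap a collision to the one slot
    # (last) that the second draw could not otherwise reach.
    index2 = random.randint(0, last - 1)
    if index2 == index1:
        index2 = last
    return index1, index2
```

The remapping works because the second draw covers every index except last: if it lands on index1, swapping in last fills the gap, and if index1 already is last, no collision can happen.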

Solution 3

For future readers.

Get the list of ids of all records:

my_ids = MyModel.objects.values_list('id', flat=True)
my_ids = list(my_ids)

Then pick n random ids from all of the above ids:

import random

n = 2
rand_ids = random.sample(my_ids, n)

And get records for these ids:

random_records = MyModel.objects.filter(id__in=rand_ids)
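The same three steps can be tried outside Django with the built-in sqlite3 module. This is a sketch under assumptions: the table my_model and its columns are invented for the example, and plain SQL stands in for the ORM calls.

```python
import random
import sqlite3

# Hypothetical stand-in table; the names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_model (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO my_model (name) VALUES (?)",
    [(f"row {i}",) for i in range(50)],
)

# Step 1: load every id (this is the part that grows with table size).
all_ids = [row[0] for row in conn.execute("SELECT id FROM my_model")]

# Step 2: sample without replacement, so the ids cannot repeat.
n = 2
rand_ids = random.sample(all_ids, n)

# Step 3: fetch only the chosen rows.
placeholders = ", ".join("?" for _ in rand_ids)
random_records = conn.execute(
    f"SELECT id, name FROM my_model WHERE id IN ({placeholders})",
    rand_ids,
).fetchall()
```

Two things to keep in mind: random.sample raises ValueError when n is larger than the number of ids, and the rows come back in whatever order the database chooses, not in the order the ids were sampled.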

Solution 4

Object.objects.order_by('?')[:2]

This returns two records in random order. You can add

distinct()

if your dataset contains records with identical values.

Solution 5

To sample n random values from a sequence, you can use the random library:

random.sample(range(0, last), 2)

This picks 2 distinct values from 0 to last - 1, so to cover every row, last here must be the row count rather than the index of the final row.
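As a small sketch of how this generalizes to n offsets, the helper below (the name random_offsets is hypothetical) takes the row count directly, which avoids the off-by-one question.

```python
import random


def random_offsets(count, n=2):
    """Pick n distinct offsets from 0 .. count - 1."""
    # range(count) already stops at count - 1, so pass the row count
    # itself rather than the last index.
    return random.sample(range(count), n)
```

Each returned offset could then be used to index a queryset, for instance with count taken from MyModel.objects.count().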

Author: Matt McCormick

Updated on June 15, 2022

Comments

  • Matt McCormick (about 2 years)

    How do I get two distinct random records using Django? I've seen questions about how to get one but I need to get two random records and they must differ.