Rails 3: Get Random Record


Solution 1

Thing.first(:order => "RANDOM()") # For MySQL use :order => "RAND()" - thanks, @DanSingerman
# Rails 3
Thing.order("RANDOM()").first

or

Thing.first(:offset => rand(Thing.count))
# Rails 3
Thing.offset(rand(Thing.count)).first

Actually, in Rails 3 all of these examples will work. But ordering by RANDOM() is quite slow for big tables, even though it is more SQL-style.
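
If you settle on the offset approach, it can be wrapped in a small class method so callers don't have to repeat the count/offset dance. A minimal sketch (the random method name is my own, not something Rails provides):

class Thing < ActiveRecord::Base
  # Picks one random record via OFFSET; returns nil when the table is empty.
  def self.random
    total = count
    return nil if total.zero?
    offset(rand(total)).first
  end
end

# Thing.random  # => a random Thing, or nil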

UPD. You can use the following trick on an indexed column (PostgreSQL syntax):

select * 
from my_table 
where id >= trunc(
  random() * (select max(id) from my_table) + 1
) 
order by id 
limit 1;
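
If you'd rather issue that trick through ActiveRecord than in raw psql, something along these lines should work (a sketch using find_by_sql, assuming a Thing model backed by a things table with an integer id):

random_thing = Thing.find_by_sql(<<-SQL).first
  SELECT *
  FROM things
  WHERE id >= trunc(random() * (SELECT max(id) FROM things) + 1)
  ORDER BY id
  LIMIT 1
SQL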

Solution 2

I am working on a project (Rails 3.0.15, Ruby 1.9.3-p125-perf) where the DB is on localhost and the users table has a bit more than 100K records.

Using

order by RAND()

is quite slow

User.order("RAND(id)").first

becomes

SELECT users.* FROM users ORDER BY RAND(id) LIMIT 1

and takes from 8 to 12 seconds to respond!!

Rails log:

User Load (11030.8ms) SELECT users.* FROM users ORDER BY RAND() LIMIT 1

From MySQL's EXPLAIN:

+----+-------------+-------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+--------+---------------------------------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL | NULL    | NULL | 110165 | Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+--------+---------------------------------+

You can see that no index is used (possible_keys = NULL), a temporary table is created and an extra pass is required to fetch the desired value (extra = Using temporary; Using filesort).

On the other hand, by splitting the query in two parts and using Ruby, we have a reasonable improvement in response time.

users = User.scoped.select(:id);nil
User.find( users.first( Random.rand( users.length )).last )

(the trailing ;nil is just to suppress output in the console)

Rails log:

User Load (25.2ms) SELECT id FROM users
User Load (0.2ms) SELECT users.* FROM users WHERE users.id = 106854 LIMIT 1

and MySQL's EXPLAIN proves why:

+----+-------------+-------+-------+---------------+--------------------------+---------+------+--------+-------------+
| id | select_type | table | type  | possible_keys | key                      | key_len | ref  | rows   | Extra       |
+----+-------------+-------+-------+---------------+--------------------------+---------+------+--------+-------------+
|  1 | SIMPLE      | users | index | NULL          | index_users_on_user_type | 2       | NULL | 110165 | Using index |
+----+-------------+-------+-------+---------------+--------------------------+---------+------+--------+-------------+

+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | users | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+

We can now use only indexes and the primary key, and do the job about 500 times faster!

UPDATE:

As pointed out by icantbecool in the comments, the above solution has a flaw if there are deleted records in the table.

A workaround for that can be

users_count = User.count
User.scoped.limit(1).offset(rand(users_count)).first

which translates to two queries

SELECT COUNT(*) FROM `users`
SELECT `users`.* FROM `users` LIMIT 1 OFFSET 148794

and runs in about 500ms.
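
An alternative mentioned in the comments is to pull only the ids, sample one in Ruby, and then fetch that row; it also copes with gaps left by deleted records, at the cost of loading every id into memory. A sketch (pluck needs Rails 3.2+; on 3.0 you can use select(:id).map(&:id) instead):

ids = User.pluck(:id)                                 # loads just the id column into an array
random_user = ids.any? ? User.find(ids.sample) : nil  # nil if the table is empty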

Solution 3

If using Postgres

User.limit(5).order("RANDOM()")

If using MySQL

User.limit(5).order("RAND()")

In both instances you're selecting 5 records randomly from the users table. Here is the actual SQL query as displayed in the console.

SELECT * FROM users ORDER BY RANDOM() LIMIT 5
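
If the same code has to run against both databases, one option is to pick the function from the adapter name. A sketch (the random_order method is my own, not part of Rails):

class User < ActiveRecord::Base
  # Uses RAND() on MySQL and RANDOM() everywhere else (PostgreSQL, SQLite).
  def self.random_order
    order(connection.adapter_name =~ /mysql/i ? "RAND()" : "RANDOM()")
  end
end

# User.random_order.limit(5)  # five random users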

Solution 4

I made a Rails 3 gem for doing this that performs better on large tables and allows you to chain relations and scopes:

https://github.com/spilliton/randumb

(edit): The default behavior of my gem basically uses the same approach as above now, but you have the option to use the old way if you want :)

Solution 5

Many of the answers posted actually won't perform well on rather large tables (1+ million rows). Random ordering quickly takes a few seconds, and doing a count on the table also takes quite a long time.

A solution that works well for me in this situation is to use RANDOM() with a where condition:

Thing.where('RANDOM() >= 0.9').take

On a table with over a million rows, this query generally takes less than 2ms.
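
Keep in mind that take returns nil when the random filter happens to match no rows (more likely on small tables), so you may want a fallback. A minimal sketch:

# Fall back to a plain take if the RANDOM() filter matched nothing.
thing = Thing.where('RANDOM() >= 0.9').take || Thing.take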

Comments

  • Andrew
    Andrew almost 4 years

    So, I've found several examples for finding a random record in Rails 2 -- the preferred method seems to be:

    Thing.find :first, :offset => rand(Thing.count)
    

    Being something of a newbie I'm not sure how this could be constructed using the new find syntax in Rails 3.

    So, what's the "Rails 3 Way" to find a random record?

  • DanSingerman
    DanSingerman about 13 years
    Your first example won't work in MySQL though - the syntax for MySQL is Thing.first(:order => "RAND()") (a danger of writing SQL rather than using the ActiveRecord abstractions)
  • fl00r
    fl00r about 13 years
@DanSingerman, yes, it is DB-specific: RAND() or RANDOM(). Thanks
  • Victor S
    Victor S over 12 years
And this won't create issues if there are missing items from the index? (If something in the middle of the stack gets deleted, will there be a chance it will be requested?)
  • SooDesuNe
    SooDesuNe over 12 years
    @VictorS, no it won't #offset just goes to the next available record. I tested it with Ruby 1.9.2 and Rails 3.1
  • DaveStephens
    DaveStephens over 11 years
    This is a very bad query to use if you have a large amount of records, as the DB will select ALL records, then Rails will pick five records from that - massively wasteful.
  • Frans
    Frans about 11 years
    sample isn't in ActiveRecord, sample is in Array. api.rubyonrails.org/classes/Array.html#method-i-sample
  • Andrew
    Andrew about 11 years
    Yeah, this is just getting all your records and using ruby array methods on them. The drawback there is of course that it means loading all your records into memory, then randomly reordering them, then grabbing the second item in the reordered array. That could definitely be a memory hog if you were dealing with a large dataset. Minor aside, why not grab the first element? (ie. shuffle[0])
  • Kevin Walsh
    Kevin Walsh almost 11 years
    This is an expensive way to get a random record, especially from a large table. Rails will load an object for every record from your table into memory. If you need proof, run 'rails console', try 'SomeModelFromYourApp.find(:all).sample(5)' and look at the SQL produced.
  • Arcolye
    Arcolye over 10 years
    See my answer, which turns this expensive answer into a streamlined beauty for getting multiple random records.
  • Maximiliano Guzman
    Maximiliano Guzman over 10 years
    no, the Model.pluck(:id).sample(3) part is not cheap. It will read the id field for every element in the table.
  • Arcolye
    Arcolye over 10 years
    Is there a faster database-agnostic way?
  • turing_machine
    turing_machine about 10 years
    adding ".id" after "last" to your second example will avoid a "couldn't find Model without ID" error. E.g. User.find( users.first( Random.rand( users.length )).last.id )
  • Bruno
    Bruno almost 10 years
    Couldn't find all Users with 'id': (first, {:offset=>1}) (found 0 results, but was looking for 2)
  • Tim Kretschmer
    Tim Kretschmer almost 10 years
if there aren't any users and you wanna get 2, then you get errors. Makes sense.
  • JohnMerlino
    JohnMerlino almost 10 years
when you do rand(Thing.count) isn't there a possibility it will return 0 and try to retrieve a record with id of 0, which may not exist? Or is offset accounting for that?
  • fl00r
    fl00r over 9 years
@JohnMerlino, yes, 0 is the offset, not an id. Offset 0 means the first item according to the order.
  • Justin Tanner
    Justin Tanner over 9 years
Warning! In MySQL RAND(id) will NOT give you a different random order every query. Use RAND() if you want a different order each query.
  • Daniel Richter
    Daniel Richter about 9 years
    The second approach will not work with postgres, but you can use "RANDOM()" instead...
  • icantbecool
    icantbecool almost 9 years
    The User.find( users.first( Random.rand( users.length )).last.id ) will not work if there was a record deleted. [1,2,4,5,] and it potentially could pick the id of 3, but there wouldn't be an active record relation.
  • icantbecool
    icantbecool almost 9 years
Also, users = User.scoped.select(:id);nil is now deprecated. Use this instead: users = User.where(nil).select(:id)
  • BenMorganIO
    BenMorganIO over 8 years
    Actually, #count will do a call to the DB for a COUNT. If the record is already loaded, this might be a bad idea. A refactor would be to use #size instead since it'll decide if #count should be used, or, if the record is already loaded, to use #length.
  • Dan Kohn
    Dan Kohn over 8 years
    Switched from count to size based on your feedback. More info at: dev.mensfeld.pl/2014/09/…
  • SWoo
    SWoo over 8 years
I believe using Random.rand( users.length ) as the parameter to first is a bug. Random.rand can return 0. When 0 is used as a parameter to first, the limit is set to zero and this returns no records. What one should use instead is 1 + Random.rand(users.length), assuming users.length > 0.
  • Andrew K
    Andrew K about 7 years
    This is really slow on a table with millions of rows
  • fl00r
    fl00r about 7 years
@AndrewK yes, and this is mentioned in the post. Using offset will be faster, but still, offset is also not a very efficient operation.
  • Marcelo Austria
    Marcelo Austria almost 7 years
    must be shuffle[0]
  • Piotr Galas
    Piotr Galas over 6 years
Another advantage of your solution is the use of the take function, which issues a LIMIT 1 query but returns a single element instead of an array, so we do not need to invoke first.
  • gorn
    gorn almost 6 years
It seems to me that records at the beginning of the table have a higher probability of being selected this way, which might not be what you want to achieve.
  • gorn
    gorn almost 6 years
This always takes subsequent records, which needs to be at least documented (as it might not be what the user wants).