Optimal LINQ query to get a random sub collection - Shuffle
Solution 1
Another option is to use OrderBy and to sort on a GUID value, which you can do so using:
var result = sequence.OrderBy(elem => Guid.NewGuid());
I did some empirical tests to convince myself that the above actually generates a random distribution (which it appears to do). You can see my results at Techniques for Randomly Reordering an Array.
Solution 2
Further to mquander's answer and Dan Blanchard's comment, here's a LINQ-friendly extension method that performs a Fisher-Yates-Durstenfeld shuffle:
// take n random items from yourCollection
var randomItems = yourCollection.Shuffle().Take(n);
// ...
public static class EnumerableExtensions
{
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
{
return source.Shuffle(new Random());
}
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
if (source == null) throw new ArgumentNullException("source");
if (rng == null) throw new ArgumentNullException("rng");
return source.ShuffleIterator(rng);
}
private static IEnumerable<T> ShuffleIterator<T>(
this IEnumerable<T> source, Random rng)
{
var buffer = source.ToList();
for (int i = 0; i < buffer.Count; i++)
{
int j = rng.Next(i, buffer.Count);
yield return buffer[j];
buffer[j] = buffer[i];
}
}
}
Solution 3
This has some issues with "random bias" and I am sure it's not optimal, this is another possibility:
var r = new Random();
l.OrderBy(x => r.NextDouble()).Take(n);
Solution 4
Shuffle the collection into a random order and take the first n
items from the result.
Related videos on Youtube
![Jobi Joy](https://i.stack.imgur.com/x3Eee.jpg?s=256&g=1)
Jobi Joy
VP of Engineering www.identitymine.com We build experiences on all platforms #Windows #Xbox #Windows10 #HTML #WPF #XAML #WinUI #xamarin (iOS and Android)
Updated on December 20, 2020Comments
-
Jobi Joy over 3 years
Please suggest an easiest way to get a random shuffled collection of count 'n' from a collection having 'N' items. where n <= N
-
Dan Blanchard over 14 yearsNote that you don't have to completely shuffle the collection when n < N. You simply have to iterate through Durstenfeld's algorithm n times, and take the last n items of the (partially) shuffled result.
-
John Melville about 13 yearsThis solution violates the contract of orderby, specifically that a given object has a consistent key throughout the sort. If it does work, it does so through chance alone, and could break in future versions of the framework. For more details, see blogs.msdn.com/b/ericlippert/archive/2011/01/31/…
-
Htbaa over 12 yearsIs it me or does this method screw up the list index? Whenever is use elementAt() after shuffling the result I get is totally unexpected...
-
LukeH over 12 years@Htbaa: The method returns a lazily evaluated sequence. If you do
seq.Shuffle().ElementAt(n)
multiple times then you're re-shuffling each time so it's likely that you'll get a different item at positionn
. If you want to shuffle once then you need to store the sequence in a concrete collection of some kind: for example,var list = seq.Shuffle().ToList()
. Then you can dolist.ElementAt(n)
as often as you like -- or justlist[n]
-- and you'll always get the same item back. -
Htbaa over 12 yearsAh thanks! Will use that. Am still quite new to C# and LINQ, so I was a bit baffled by the result I was getting :-).
-
gilly3 almost 12 yearsWhat is the benefit of having
.ShuffleIterator()
instead of putting that code in.Shuffle()
? And since you call.ToList()
, won't you have to iterate the entire collection rather than just the amount requested by.Take(n)
? I have a hunch that the first question may answer my second question. -
gilly3 almost 12 yearsThis provides unreliable results. I get a different element each time I call
myList.ElementAt(0)
. I get a different string each time I callstring.Join(",", myList)
. -
LukeH almost 12 years@gilly3: That's probably because these
Shuffle
methods return a lazyIEnumerable<T>
which will be re-evaluated each time you use it in astring.Join
orElementAt
call. That's just the nature ofIEnumerable<T>
; if you want the behaviour of a concrete collection then you should actually create a concrete collection by callingToArray
orToList
on the result of theShuffle
call. -
jocull over 11 yearsIs using
.Shuffle()
thread safe? I don't see why it shouldn't be but I am getting strange results... multiple threads seem to pull the same shuffle results. -
MutantNinjaCodeMonkey over 11 yearsGood catch. However, if this were committed to an actual collection -- such as adding .ToList() or .ToArray() to the end --, then there's no chance of the collection getting processed/iterating Linq code more than one time to perform the sort. I imagine that would survive future upgrades as well since it would act as a predictable snapshot.
-
Weeble over 11 years@gilly3 Sorry to bring this up a year later, but the point of ShuffleIterator is to let Shuffle perform argument validation up-front, rather than deferring it until MoveNext is first called on the enumerator (which may be never, or in code that is not expecting it). See msmvps.com/blogs/jon_skeet/archive/2008/03/02/…
-
Weeble over 11 years@jocull If you use the version that doesn't take an instance of Random, it will create one for you. Instances of Random created close in time have a risk of using the same seed, and thus giving the same sequence of random numbers. If you pass the same instance of Random to it on different threads, you will get unpredictable results because instances of Random are not thread-safe. You should create your Random instances with different seeds, and not share them between threads.
-
FDS almost 11 yearsI just stumbled on this page and think it's brilliant, but I don't understand the comment. what could possibly go wrong in using this method to shuffle a collection (I'm not asking that sarcastically, I'm genuinely curious about the possibilities)?
-
Klara_P over 10 yearsIf you look at it from the point of view of those crazy functional programmers, this use of OrderBy doesn't violate the contract -- it's just that the consistent unique key is a function of all the seeds used to generate the GUID, so it's very hard to spot its consistency. For a certain machine, point in time, etc, you will always get back the same GUID. I know it stretches the idea of the contract a bit, but theoretically it's sound, I think. Maybe some FP people can comment further.
-
Klara_P over 10 years@Josh, there's no practical problems; it's just that, theoretically, it could be considered an abuse of OrderBy because you can't get repeatable results (well, not without a heck of a lot of work). The thing is, you don't want repeatable results in this case.
-
seebiscuit over 10 years@DanBlanchard I agree with gilly3 that it seems like the collection is being reshuffled, rather than the n results requested.
-
bradlis7 about 9 yearsYou are wasting memory by creating
newList
. Maybe consider usingyield return
inside theforeach
statement. -
Eric Lippert about 9 yearsThe problem here is not that the key is not consistent. John Melville has misread my article; that article notes that making a comparison -- that is, this item is bigger, smaller or equal to another -- that is inconsistent violates the contract of the Sort method. This answer is wrong for a completely different reason: Guids are only guaranteed to be unique; their randomness is an implementation detail you should not rely upon.
-
Eric Lippert about 9 yearsIn particular, it is legal for guids to be generated sequentially from an initial random element; this still makes good guarantees of uniqueness. Just because the guid generator today does not actually generate sequential guids ever is an implementation detail subject to change, and if it does change, then suddenly your "shuffle" is "shuffling" things into sorted order every time. Use guids to generate uniqueness, never randomness. Use a class designed to generate randomness for randomness.
-
Enigmativity over 6 yearsThis doesn't have any random bias other than any insignificant bias from
Random
itself. It's also very efficient. -
Enigmativity over 6 yearsThe line
Enumerable.Range(1, max).OrderBy(n => n * n * (new Random()).Next())
is awful - it's not random at all. callingnew Random()).Next()
like that will just return the same number in most cases and then * n
makes the number biased - and possibly could cause overflow exceptions. -
Enigmativity over 6 years
p.GetHashCode().ToString() + Guid.NewGuid().ToString()).GetHashCode()
isn't random. It might look random, but it's not. You should use a RNG for randomness - data scientists have designed them specifically to be as random as possible.