Optimal LINQ query to get a random sub collection - Shuffle

19,005

Solution 1

Another option is to use OrderBy and to sort on a GUID value, which you can do so using:

var result = sequence.OrderBy(elem => Guid.NewGuid());

I did some empirical tests to convince myself that the above actually generates a random distribution (which it appears to do). You can see my results at Techniques for Randomly Reordering an Array.

Solution 2

Further to mquander's answer and Dan Blanchard's comment, here's a LINQ-friendly extension method that performs a Fisher-Yates-Durstenfeld shuffle:

// take n random items from yourCollection
var randomItems = yourCollection.Shuffle().Take(n);

// ...

public static class EnumerableExtensions
{
    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
    {
        return source.Shuffle(new Random());
    }

    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (rng == null) throw new ArgumentNullException("rng");

        return source.ShuffleIterator(rng);
    }

    private static IEnumerable<T> ShuffleIterator<T>(
        this IEnumerable<T> source, Random rng)
    {
        var buffer = source.ToList();
        for (int i = 0; i < buffer.Count; i++)
        {
            int j = rng.Next(i, buffer.Count);
            yield return buffer[j];

            buffer[j] = buffer[i];
        }
    }
}

Solution 3

This has some issues with "random bias" and I am sure it's not optimal, this is another possibility:

var r = new Random();
l.OrderBy(x => r.NextDouble()).Take(n);

Solution 4

Shuffle the collection into a random order and take the first n items from the result.

Share:
19,005

Related videos on Youtube

Jobi Joy
Author by

Jobi Joy

VP of Engineering www.identitymine.com We build experiences on all platforms #Windows #Xbox #Windows10 #HTML #WPF #XAML #WinUI #xamarin (iOS and Android)

Updated on December 20, 2020

Comments

  • Jobi Joy
    Jobi Joy over 3 years

    Please suggest an easiest way to get a random shuffled collection of count 'n' from a collection having 'N' items. where n <= N

  • Dan Blanchard
    Dan Blanchard over 14 years
    Note that you don't have to completely shuffle the collection when n < N. You simply have to iterate through Durstenfeld's algorithm n times, and take the last n items of the (partially) shuffled result.
  • John Melville
    John Melville about 13 years
    This solution violates the contract of orderby, specifically that a given object has a consistent key throughout the sort. If it does work, it does so through chance alone, and could break in future versions of the framework. For more details, see blogs.msdn.com/b/ericlippert/archive/2011/01/31/…
  • Htbaa
    Htbaa over 12 years
    Is it me or does this method screw up the list index? Whenever is use elementAt() after shuffling the result I get is totally unexpected...
  • LukeH
    LukeH over 12 years
    @Htbaa: The method returns a lazily evaluated sequence. If you do seq.Shuffle().ElementAt(n) multiple times then you're re-shuffling each time so it's likely that you'll get a different item at position n. If you want to shuffle once then you need to store the sequence in a concrete collection of some kind: for example, var list = seq.Shuffle().ToList(). Then you can do list.ElementAt(n) as often as you like -- or just list[n] -- and you'll always get the same item back.
  • Htbaa
    Htbaa over 12 years
    Ah thanks! Will use that. Am still quite new to C# and LINQ, so I was a bit baffled by the result I was getting :-).
  • gilly3
    gilly3 almost 12 years
    What is the benefit of having .ShuffleIterator() instead of putting that code in .Shuffle()? And since you call .ToList(), won't you have to iterate the entire collection rather than just the amount requested by .Take(n)? I have a hunch that the first question may answer my second question.
  • gilly3
    gilly3 almost 12 years
    This provides unreliable results. I get a different element each time I call myList.ElementAt(0). I get a different string each time I call string.Join(",", myList).
  • LukeH
    LukeH almost 12 years
    @gilly3: That's probably because these Shuffle methods return a lazy IEnumerable<T> which will be re-evaluated each time you use it in a string.Join or ElementAt call. That's just the nature of IEnumerable<T>; if you want the behaviour of a concrete collection then you should actually create a concrete collection by calling ToArray or ToList on the result of the Shuffle call.
  • jocull
    jocull over 11 years
    Is using .Shuffle() thread safe? I don't see why it shouldn't be but I am getting strange results... multiple threads seem to pull the same shuffle results.
  • MutantNinjaCodeMonkey
    MutantNinjaCodeMonkey over 11 years
    Good catch. However, if this were committed to an actual collection -- such as adding .ToList() or .ToArray() to the end --, then there's no chance of the collection getting processed/iterating Linq code more than one time to perform the sort. I imagine that would survive future upgrades as well since it would act as a predictable snapshot.
  • Weeble
    Weeble over 11 years
    @gilly3 Sorry to bring this up a year later, but the point of ShuffleIterator is to let Shuffle perform argument validation up-front, rather than deferring it until MoveNext is first called on the enumerator (which may be never, or in code that is not expecting it). See msmvps.com/blogs/jon_skeet/archive/2008/03/02/…
  • Weeble
    Weeble over 11 years
    @jocull If you use the version that doesn't take an instance of Random, it will create one for you. Instances of Random created close in time have a risk of using the same seed, and thus giving the same sequence of random numbers. If you pass the same instance of Random to it on different threads, you will get unpredictable results because instances of Random are not thread-safe. You should create your Random instances with different seeds, and not share them between threads.
  • FDS
    FDS almost 11 years
    I just stumbled on this page and think it's brilliant, but I don't understand the comment. what could possibly go wrong in using this method to shuffle a collection (I'm not asking that sarcastically, I'm genuinely curious about the possibilities)?
  • Klara_P
    Klara_P over 10 years
    If you look at it from the point of view of those crazy functional programmers, this use of OrderBy doesn't violate the contract -- it's just that the consistent unique key is a function of all the seeds used to generate the GUID, so it's very hard to spot its consistency. For a certain machine, point in time, etc, you will always get back the same GUID. I know it stretches the idea of the contract a bit, but theoretically it's sound, I think. Maybe some FP people can comment further.
  • Klara_P
    Klara_P over 10 years
    @Josh, there's no practical problems; it's just that, theoretically, it could be considered an abuse of OrderBy because you can't get repeatable results (well, not without a heck of a lot of work). The thing is, you don't want repeatable results in this case.
  • seebiscuit
    seebiscuit over 10 years
    @DanBlanchard I agree with gilly3 that it seems like the collection is being reshuffled, rather than the n results requested.
  • bradlis7
    bradlis7 about 9 years
    You are wasting memory by creating newList. Maybe consider using yield return inside the foreach statement.
  • Eric Lippert
    Eric Lippert about 9 years
    The problem here is not that the key is not consistent. John Melville has misread my article; that article notes that making a comparison -- that is, this item is bigger, smaller or equal to another -- that is inconsistent violates the contract of the Sort method. This answer is wrong for a completely different reason: Guids are only guaranteed to be unique; their randomness is an implementation detail you should not rely upon.
  • Eric Lippert
    Eric Lippert about 9 years
    In particular, it is legal for guids to be generated sequentially from an initial random element; this still makes good guarantees of uniqueness. Just because the guid generator today does not actually generate sequential guids ever is an implementation detail subject to change, and if it does change, then suddenly your "shuffle" is "shuffling" things into sorted order every time. Use guids to generate uniqueness, never randomness. Use a class designed to generate randomness for randomness.
  • Enigmativity
    Enigmativity over 6 years
    This doesn't have any random bias other than any insignificant bias from Random itself. It's also very efficient.
  • Enigmativity
    Enigmativity over 6 years
    The line Enumerable.Range(1, max).OrderBy(n => n * n * (new Random()).Next()) is awful - it's not random at all. calling new Random()).Next() like that will just return the same number in most cases and the n * n makes the number biased - and possibly could cause overflow exceptions.
  • Enigmativity
    Enigmativity over 6 years
    p.GetHashCode().ToString() + Guid.NewGuid().ToString()).GetHashCode() isn't random. It might look random, but it's not. You should use a RNG for randomness - data scientists have designed them specifically to be as random as possible.