Pick multiple random elements from a list in Java

Solution 1

Try this:

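import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public static List<String> pickNRandom(List<String> lst, int n) {
    List<String> copy = new ArrayList<>(lst);
    Collections.shuffle(copy);
    // Clamp n so that asking for more elements than exist returns the whole list.
    return copy.subList(0, Math.min(n, copy.size()));
}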

This assumes there are no repeated elements in the input list. It shuffles a copy so that the original list is left undisturbed. Use it like this:

List<String> randomPicks = pickNRandom(teamList, 3);

Solution 2

Create a set of ints and, in a loop, add random numbers between 0 and the list's length minus one until the set reaches the desired number of random elements. Then go through the set and pick the list elements at the indices it contains. This approach keeps your original list intact.
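
A minimal sketch of the index-set idea described above. The class name IndexSetPicker, the generic signature, and the Random parameter are assumptions made for illustration, not part of the original answer:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class IndexSetPicker {
    // Hypothetical helper: picks n distinct elements by drawing random indices.
    static <T> List<T> pickNRandom(List<T> list, int n, Random random) {
        if (n < 0 || n > list.size()) {
            throw new IllegalArgumentException("n must be between 0 and list.size()");
        }
        // A Set rejects duplicate indices, so keep drawing until it holds n of them.
        Set<Integer> indices = new LinkedHashSet<>();
        while (indices.size() < n) {
            indices.add(random.nextInt(list.size()));
        }
        // Read the chosen elements; the source list is never modified.
        List<T> result = new ArrayList<>(n);
        for (int index : indices) {
            result.add(list.get(index));
        }
        return result;
    }
}
```

It could be called as IndexSetPicker.pickNRandom(teamList, 3, new Random()). Note that list.get(index) is a linear-time call on a LinkedList, so for long linked lists it may be worth copying into an ArrayList first.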

Solution 3

The shuffle approach is the most idiomatic: after shuffling, the first K elements are exactly what you need.

If K is much smaller than the length of the list, you can do less work. Iterate through the list, swapping the current element with a randomly chosen element at or after its position (possibly itself). After the K-th element, stop and return the first K elements: that prefix is already uniformly shuffled, so you don't need to touch the rest of the list.

(You'd want an ArrayList here, since the swaps need fast random access.)
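
The partial shuffle above could look roughly like this. It is only a sketch: the class name PartialShuffle and the Random parameter are made up for the example, and it copies the input into an ArrayList first (an assumption, so the caller's list stays untouched):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class PartialShuffle {
    // Hypothetical helper: runs only the first k steps of a Fisher-Yates shuffle.
    static <T> List<T> pickNRandom(List<T> source, int k, Random random) {
        if (k < 0 || k > source.size()) {
            throw new IllegalArgumentException("k must be between 0 and source.size()");
        }
        // Work on an ArrayList copy: swaps need random access, and the source stays intact.
        List<T> copy = new ArrayList<>(source);
        for (int i = 0; i < k; i++) {
            // Choose a position from i (inclusive) to the end of the list.
            int j = i + random.nextInt(copy.size() - i);
            Collections.swap(copy, i, j);
        }
        // The first k positions now hold a uniformly random selection.
        return new ArrayList<>(copy.subList(0, k));
    }
}
```

The copy still costs time proportional to the list length. If reordering the original list is acceptable, the same loop can run on it directly and only K swaps are needed.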

Solution 4

Here is a way of doing it using Java streams, without having to copy the original list or shuffle it:

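import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public static List<String> pickRandom(List<String> list, int n) {
    if (n > list.size()) {
        throw new IllegalArgumentException("not enough elements");
    }
    Random random = new Random();
    // Generate random indices, drop repeats, and stop once n distinct ones are found.
    return IntStream
            .generate(() -> random.nextInt(list.size()))
            .distinct()
            .limit(n)
            .mapToObj(list::get)
            .collect(Collectors.toList());
}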

Note: This can become inefficient when n is close to the list size, especially for huge lists, because most random draws then hit indices that were already picked and are thrown away.
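
To put a rough number on that, here is the standard expected-draws calculation for collecting n distinct indices out of N by uniform sampling with replacement. The symbols N (list size), n (elements wanted), D (number of draws) and H_m (the m-th harmonic number) are introduced here for illustration:

```latex
% When i distinct indices are already held, a draw is new with probability (N - i)/N,
% so it takes N/(N - i) draws on average to find the next new one.
\mathbb{E}[D] \;=\; \sum_{i=0}^{n-1} \frac{N}{N-i} \;=\; N\,\bigl(H_N - H_{N-n}\bigr)

% Two reference points:
%   n = N/2 : \mathbb{E}[D] \approx N \ln 2   (still linear in N)
%   n = N   : \mathbb{E}[D] = N H_N \approx N \ln N
```

So the approach stays cheap while n is a modest fraction of N and degrades as n approaches N.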

Solution 5

You can also use reservoir sampling.

It has the advantage that you do not need to know the size of the source list in advance (e.g. if you are given an Iterable instead of a List). It is also efficient even when the source list is not random-access, like the LinkedList in your example.
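
A minimal sketch of reservoir sampling (the simple variant usually called Algorithm R), assuming the source is any Iterable. The class name ReservoirSampler and the Random parameter are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class ReservoirSampler {
    // Hypothetical helper: picks k elements in one pass without knowing the size up front.
    static <T> List<T> sample(Iterable<T> source, int k, Random random) {
        if (k < 0) {
            throw new IllegalArgumentException("k must not be negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0; // number of items read so far (assumes fewer than Integer.MAX_VALUE items)
        for (T item : source) {
            seen++;
            if (reservoir.size() < k) {
                // The first k items fill the reservoir directly.
                reservoir.add(item);
            } else {
                // Keep the new item with probability k / seen, replacing a random slot.
                int r = random.nextInt(seen);
                if (r < k) {
                    reservoir.set(r, item);
                }
            }
        }
        return reservoir;
    }
}
```

If the source has fewer than k items, this sketch simply returns all of them. The selection is uniform, but the order of the returned elements is not itself a random shuffle, so shuffle the result if the order matters.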

Author: Ren

B.S. Computer Science. 10+ years of experience.

Updated on December 12, 2020

Comments

  • Ren
    Ren over 3 years

    So say I have

    List<String> teamList = new LinkedList<String>();
    teamList.add("team1");
    teamList.add("team2");
    teamList.add("team3");
    teamList.add("team4");
    teamList.add("team5");
    teamList.add("team6");
    

    Is there a simple way of picking, say, 3 out of the 6 elements in this list in a randomized way without picking the same element twice (or more times)?

  • corsiKa
    corsiKa over 12 years
    A really good idea assuming it's okay to change the order. Otherwise you can make yourself a local copy (a bit more expensive, but it works). I personally would use remove(teamList.size() - 1) so that if the implementation changes to a different list it has the highest chance of being efficient, but remove(0) works too :)
  • Ren
    Ren over 12 years
    Thanks, that is helpful, but I am trying to have the original list intact. Just pick the element but not to erase it so that I can have it for later use. I guess a workaround could be to create a second list with all the elements of the original list in it and then do what you suggest to do. Thank you anyway :)
  • Ren
    Ren over 12 years
    Lol, we just said partly the same thing. Thanks glowcoder.
  • alf
    alf over 12 years
    @Yokhen actually you can get a list from 0 to N-1, where N=teamList.size(), shuffle it, and use first K numbers as indices for the original list.
  • Ren
    Ren over 12 years
    That actually is a pretty good idea! Thank you very much!
  • waxwing
    waxwing over 12 years
    Good approach unless you want to pick 999 elements out of a 1000 element list :)
  • Ren
    Ren over 12 years
    That's a very good idea :) but keeping track of an alternate list of indexes can get a little tricky. But thank you anyway :)
  • Sergey Kalinichenko
    Sergey Kalinichenko over 12 years
    @waxwing Yeah, that would take a while :-) In general, if R, the desired number of random items, is greater than half list's length, it's cheaper to pick N-R elements to remove from the list, and then take all but the ones in the randomly-generated set.
  • Ren
    Ren over 12 years
    That sounds like a good alternative, but it will require a bit more work from me. Thanks anyway :)
  • waxwing
    waxwing over 12 years
    This is equivalent to alf's solution, right? Except he puts the elements at the start.
  • Ren
    Ren over 12 years
    Thanks for the answer. However, could you explain it in more detail? I am very unfamiliar with the method you just explained. Thank you.
  • Saurabh
    Saurabh over 12 years
    @Yokhen, the idea is that if (for example) you are choosing 3 items, and you have just considered the 40th item in the input list, at that point you have chosen 3 of 40 items so the last item has a 3/40 chance of being in the output array. If you look at the pseudocode in the Wikipedia article, you will see that is what the last operation (r ← random (0 .. i); if (r < k) then a[r] ← S[i]) does.
  • pawegio
    pawegio over 9 years
    usable protection from IndexOutOfBoundsException: return n > copy.size() ? copy.subList(0, copy.size()) : copy.subList(0, n);
  • Eyal
    Eyal over 9 years
    Shuffling the entire list when you only need 3 elements is very wasteful for large lists.
  • Hu Cao
    Hu Cao about 7 years
    This should be the best solution.
  • Tarun Chauhan
    Tarun Chauhan over 5 years
    Well, I'm late to the party, but I can't help asking: how does this solution prevent duplicate numbers from being added to the set? I might get duplicate numbers in the set of ints and pick the same value twice from the main list.
  • Sergey Kalinichenko
    Sergey Kalinichenko over 5 years
    @TarunChauhan The Set data structure eliminates duplicates. In other words, if you need five elements out of 20, your loop may need to perform the addition, say, seven or eight times before the set reaches the desired size of 5.
  • Tarun Chauhan
    Tarun Chauhan over 5 years
    Alright, so it is just a loop that continues until you get the desired number of random numbers. But I think this solution can be used only when the desired number of random numbers is small. Say I have a range of 0-80 and I want 52 unique random numbers out of it; then I think this solution may end up looping many more times. Correct me if I'm wrong, but for short lengths this solution seems best 😊👌🏻
  • ljh131
    ljh131 almost 5 years
    What if nextInt returns the same numbers, so it can't generate n distinct numbers?
  • Helder Pereira
    Helder Pereira almost 5 years
    limit is after distinct so it will discard duplicate numbers and keep generating until there are n distinct numbers.
  • Arunav Sanyal
    Arunav Sanyal over 3 years
    Shuffle breaks if the passed in List is immutable.
  • Óscar López
    Óscar López over 3 years
    @ArunavSanyal Several methods in Collections will break. If you take a look at the documentation, it doesn't mention that immutability is supported. The Java API was not built with immutability in mind; you can be certain that any method that modifies a collection expects it to be mutable.
  • Arunav Sanyal
    Arunav Sanyal over 3 years
    @ÓscarLópez Yes, I am aware that several methods in Collections will break. I have been scarred enough times in production that I treat all collections as immutable unless explicitly required/alluded to. Code is significantly easier to reason about and write, despite being less efficient. Please note, this is not a criticism of your answer; it's more of a warning that there are very real scenarios in which it will cause problems.