Is Collections.shuffle() really random enough? Practical examples seem to deny this statement

20,093

Solution 1

If you're showing 20 images out of 1000 the probability of seeing any one of that 20 repeated in the next iteration is approximately 0.34 so you shouldn't be surprised to see images repeating.

The chances of seeing a specific image is still one in a thousand, but if you're looking for twenty images the chances are much higher.

We can calculate the probability of none of the previous 20 images repeating as:

 980   979         961
———— × ——— × ... × ——— ≈ 0.66
1000   999         981

And so the probability of seeing a repeat is one minus this, or approximately 0.34.

And the probability of seeing an image repeated in either of the next two iterations is:

1 - (0.66 × 0.66) ≈ 0.56

In other words, it's more likely than not that you'll see a repeated image over the two following cycles. (And this isn't including images repeated from the second cycle in the third which will only make it more likely.)

For what it's worth, here's some Java code to do the above calculation:

float result = 1.0f;
int totalImages = 1000;
int displayedImages = 20;

for (int i = 0; i < displayedImages; i++) {
  result = result * (totalImages - displayedImages - i) / (totalImages - i);
}

System.out.println(result);

Solution 2

Its human nature to see patterns which are not there. Many people see patterns in the planets and stars as guiding their life.

In the first 1000 digits of PI there are six nines in a row. Does that mean the digits of PI are not random? no. The pattern doesn't occur again any more than your might expect.

Having said that, Random is not completely random and it will repeat after 2^48 calls. (it uses a 48-bit seed) This means its not possible to produce every possible long or double using it. If you want more randomness you can use SecureRandom with shuffle instead.

It sounds like what you want is something like this

List<String> imagePaths = new ArrayList<>();

// called repeatedly
if (imagePaths.size() <= 500) {
    imagePaths = get1000Images();
    Collections.shuffle(imagePaths);
}

for (String path: imagePaths.subList(0, 20)) {
  ... do something with the path ...
}

imagePaths = imagePaths.subList(20, imagePaths.size());

This will ensure that you don't see the same image in the last 500 calls.

Solution 3

Your intuition is correct for a specific image [you are not likely to see a specific image over and over again], but not for a general image [you are likely to see some image repeating]. This is one of these places in probability that our automatic intuition is wrong...

This reminds me the birthday paradox, which contradicts the intuition, and says - for a group of 23 people, the likelihood of 2 of them having the same birthday is 0.5, much more then the intuition expects!

Solution 4

I did a 52 card shuffle four different times and marked every time each iteration repeated the exact same card in the exact same slot, which gave me approximately 14 out of 208 cards, which was approximately 93.3% random.

Share:
20,093
basZero
Author by

basZero

WORK: Web App Developer, Java Enthusiast since 1995 HOME: I'm in love with photography

Updated on July 09, 2022

Comments

  • basZero
    basZero almost 2 years

    I have 1000 unique objects in a java.util.List, each referring to an image, each image in the 1000-list is unique and now I'd like to shuffle them, so that I can use the first 20 objects and present them to the website-user. The user can then click a button saying "Shuffle", and I retrieve the 1000 images again from scratch and calling again shuffle(). However, it seems that out of 1000 image objects, I very often see the same image again and again between the 20-image-selections.

    Something seems to be wrong, any better suggestion, advices?

    My code is very simple:

    List<String> imagePaths = get1000Images();
    Collections.shuffle(imagePaths);
    
    int i = 0;
    for (String path: imagePaths) {
      ... do something with the path ...
      i++;
      if (i >= 20) break;
    }
    

    I know that Collections.shuffle() is well distributed: see for instance http://blog.ryanrampersad.com/2012/03/03/more-on-shuffling-an-array-correctly/

    However, I just have the feeling that the probability of seeing the same image over and over again in a set of 20 images out of 1000 should be much less...

    Inputs highly appreciated.

  • basZero
    basZero about 12 years
    I can guarantee that all images in the list are different. they come directly from a lucene index where the path is the 'primary key' in the lucene index
  • Graham Borland
    Graham Borland about 12 years
    If your code really is the way you have it, where you're just iterating over the list and not modifying the list after the initial shuffle, then the only way you can get duplicates in your chosen 20 images is if there are duplicates in the list to begin with. Collections.shuffle() does not insert copies, it just reorders the existing items.
  • aioobe
    aioobe about 12 years
    He is seeing the same image among the selected 20 over and over on several subsequent shuffles.
  • basZero
    basZero about 12 years
    Great, and I like your suggestion to choose 20 random entries instead of shuffling 10 times... Just a sidenote: picking 20 random elements can also end up in picking the same twice. So this should be changed a bit, but your code example is a good start!
  • AlexR
    AlexR about 12 years
    @basZero, my code sample takes into consideration taking same elements twice: I used Set to store results and iterated until the set size is 20.
  • David Webb
    David Webb about 12 years
    @ChristofferHammarström - Fixed.
  • basZero
    basZero about 12 years
    True, sorry, thought you'd have used List
  • basZero
    basZero about 12 years
    But r.nextInt() should be replaced by r.nextInt() % list.size(), no?
  • Voo
    Voo about 12 years
    @basZero And there's a really simple (and much more efficient) solution to avoid that problem; see here. Yep that's right, if we were to do this correctly we'd just reimplement the shuffle algorithm that's already in use. To prove randomness you never believe your intuition - that'll basically always be wrong. There are statistical tests for that (Chi-Square, Kolmogorov-Smirnov,..). Also never do nextInt() % size if you want a uniform distribution, that'll obviously only work in rare cases.
  • basZero
    basZero about 12 years
    @Voo So you are saying: reimplement... But then before investing 1 hour or more for the final algorithm, I'd rather use 10 times calling Collections.shuffle()...
  • Christoffer Hammarström
    Christoffer Hammarström about 12 years
    Or maybe it's supposed to be 1 - 0.67 = 0.33?
  • basZero
    basZero about 12 years
    I thought about that too, actually this morning :) Thanks for taking your time for the code example...
  • David Webb
    David Webb about 12 years
    The code above returns 0.6649897 which I'm rounding to 0.66. I'm not sure the exact values matter too much, the point is that you can expect to see one of the previous 20 images repeat about one in every three times.
  • Voo
    Voo about 12 years
    @basZero Calling shuffle several times doesn't make any sense from a statistical pov (if you believe otherwise, I'm always game for some chi-square tests, note that "doesn't look random" is uninteresting). But the point is: What the shuffle algorithm does is basically taking random elements out of the list. Hence the edited solution is basically a not really efficient shuffle algorithm with some bugs in it.
  • basZero
    basZero about 12 years
    @Voo thanks Voo, I didn't look at the shuffl() implementation of JavaSE6, so if it works the same as the example above for randomly picked elements, it's useless. But the comment is interesting: Calling 10x shuffle() is better than once, but 100x does not give added randomness... What's your point on that statement?
  • Voo
    Voo about 12 years
    @basZero Where do you get the idea that I say that calling shuffle 10x is better than once? Just think about it for a second: Shuffle uniformly distributes all elements of the list, without regard for their earlier position. Hence calling it several times in a row means all but the last call are useless
  • basZero
    basZero about 12 years
    @Voo I didn't say that you said that. See this post by AlexR
  • Ascalonian
    Ascalonian over 9 years
    To use the SecureRandom, you can do: Collections.shuffle(imagePaths, new SecureRandom());