Is it correct to use JavaScript Array.sort() method for shuffling?

55,682

Solution 1

It's never been my favourite way of shuffling, partly because it is implementation-specific as you say. In particular, I seem to remember that the standard library sorting from either Java or .NET (not sure which) can often detect if you end up with an inconsistent comparison between some elements (e.g. you first claim A < B and B < C, but then C < A).

It also ends up as a more complex (in terms of execution time) shuffle than you really need.

I prefer the shuffle algorithm which effectively partitions the collection into "shuffled" (at the start of the collection, initially empty) and "unshuffled" (the rest of the collection). At each step of the algorithm, pick a random unshuffled element (which could be the first one) and swap it with the first unshuffled element - then treat it as shuffled (i.e. mentally move the partition to include it).

This is O(n) and only requires n-1 calls to the random number generator, which is nice. It also produces a genuine shuffle - any element has a 1/n chance of ending up in each space, regardless of its original position (assuming a reasonable RNG). The sorted version approximates to an even distribution (assuming that the random number generator doesn't pick the same value twice, which is highly unlikely if it's returning random doubles) but I find it easier to reason about the shuffle version :)

This approach is called a Fisher-Yates shuffle.

I would regard it as a best practice to code up this shuffle once and reuse it everywhere you need to shuffle items. Then you don't need to worry about sort implementations in terms of reliability or complexity. It's only a few lines of code (which I won't attempt in JavaScript!)

The Wikipedia article on shuffling (and in particular the shuffle algorithms section) talks about sorting a random projection - it's worth reading the section on poor implementations of shuffling in general, so you know what to avoid.

Solution 2

After Jon has already covered the theory, here's an implementation:

function shuffle(array) {
    var tmp, current, top = array.length;

    if(top) while(--top) {
        current = Math.floor(Math.random() * (top + 1));
        tmp = array[current];
        array[current] = array[top];
        array[top] = tmp;
    }

    return array;
}

The algorithm is O(n), whereas sorting should be O(n log n). Depending on the overhead of executing JS code compared to the native sort() function, this might lead to a noticable difference in performance which should increase with array sizes.


In the comments to bobobobo's answer, I stated that the algorithm in question might not produce evenly distributed probabilities (depending on the implementation of sort()).

My argument goes along these lines: A sorting algorithm requires a certain number c of comparisons, eg c = n(n-1)/2 for Bubblesort. Our random comparison function makes the outcome of each comparison equally likely, ie there are 2^c equally probable results. Now, each result has to correspond to one of the n! permutations of the array's entries, which makes an even distribution impossible in the general case. (This is a simplification, as the actual number of comparisons neeeded depends on the input array, but the assertion should still hold.)

As Jon pointed out, this alone is no reason to prefer Fisher-Yates over using sort(), as the random number generator will also map a finite number of pseudo-random values to the n! permutations. But the results of Fisher-Yates should still be better:

Math.random() produces a pseudo-random number in the range [0;1[. As JS uses double-precision floating point values, this corresponds to 2^x possible values where 52 ≤ x ≤ 63 (I'm too lazy to find the actual number). A probability distribution generated using Math.random() will stop behaving well if the number of atomic events is of the same order of magnitude.

When using Fisher-Yates, the relevant parameter is the size of the array, which should never approach 2^52 due to practical limitations.

When sorting with a random comparision function, the function basically only cares if the return value is positive or negative, so this will never be a problem. But there is a similar one: Because the comparison function is well-behaved, the 2^c possible results are, as stated, equally probable. If c ~ n log n then 2^c ~ n^(a·n) where a = const, which makes it at least possible that 2^c is of same magnitude as (or even less than) n! and thus leading to an uneven distribution, even if the sorting algorithm where to map onto the permutaions evenly. If this has any practical impact is beyond me.

The real problem is that the sorting algorithms are not guaranteed to map onto the permutations evenly. It's easy to see that Mergesort does as it's symmetric, but reasoning about something like Bubblesort or, more importantly, Quicksort or Heapsort, is not.


The bottom line: As long as sort() uses Mergesort, you should be reasonably safe except in corner cases (at least I'm hoping that 2^c ≤ n! is a corner case), if not, all bets are off.

Solution 3

I did some measurements of how random the results of this random sort are...

My technique was to take a small array [1,2,3,4] and create all (4! = 24) permutations of it. Then I would apply the shuffling function to the array a large number of times and count how many times each permutation is generated. A good shuffling algoritm would distribute the results quite evenly over all the permutations, while a bad one would not create that uniform result.

Using the code below I tested in Firefox, Opera, Chrome, IE6/7/8.

Surprisingly for me, the random sort and the real shuffle both created equally uniform distributions. So it seems that (as many have suggested) the main browsers are using merge sort. This of course doesn't mean, that there can't be a browser out there, that does differently, but I would say it means, that this random-sort-method is reliable enough to use in practice.

EDIT: This test didn't really measured correctly the randomness or lack thereof. See the other answer I posted.

But on the performance side the shuffle function given by Cristoph was a clear winner. Even for small four-element arrays the real shuffle performed about twice as fast as random-sort!

// The shuffle function posted by Cristoph.
var shuffle = function(array) {
    var tmp, current, top = array.length;

    if(top) while(--top) {
        current = Math.floor(Math.random() * (top + 1));
        tmp = array[current];
        array[current] = array[top];
        array[top] = tmp;
    }

    return array;
};

// the random sort function
var rnd = function() {
  return Math.round(Math.random())-0.5;
};
var randSort = function(A) {
  return A.sort(rnd);
};

var permutations = function(A) {
  if (A.length == 1) {
    return [A];
  }
  else {
    var perms = [];
    for (var i=0; i<A.length; i++) {
      var x = A.slice(i, i+1);
      var xs = A.slice(0, i).concat(A.slice(i+1));
      var subperms = permutations(xs);
      for (var j=0; j<subperms.length; j++) {
        perms.push(x.concat(subperms[j]));
      }
    }
    return perms;
  }
};

var test = function(A, iterations, func) {
  // init permutations
  var stats = {};
  var perms = permutations(A);
  for (var i in perms){
    stats[""+perms[i]] = 0;
  }

  // shuffle many times and gather stats
  var start=new Date();
  for (var i=0; i<iterations; i++) {
    var shuffled = func(A);
    stats[""+shuffled]++;
  }
  var end=new Date();

  // format result
  var arr=[];
  for (var i in stats) {
    arr.push(i+" "+stats[i]);
  }
  return arr.join("\n")+"\n\nTime taken: " + ((end - start)/1000) + " seconds.";
};

alert("random sort: " + test([1,2,3,4], 100000, randSort));
alert("shuffle: " + test([1,2,3,4], 100000, shuffle));

Solution 4

Interestingly, Microsoft used the same technique in their pick-random-browser-page.

They used a slightly different comparison function:

function RandomSort(a,b) {
    return (0.5 - Math.random());
}

Looks almost the same to me, but it turned out to be not so random...

So I made some testruns again with the same methodology used in the linked article, and indeed - turned out that the random-sorting-method produced flawed results. New test code here:

function shuffle(arr) {
  arr.sort(function(a,b) {
    return (0.5 - Math.random());
  });
}

function shuffle2(arr) {
  arr.sort(function(a,b) {
    return (Math.round(Math.random())-0.5);
  });
}

function shuffle3(array) {
  var tmp, current, top = array.length;

  if(top) while(--top) {
    current = Math.floor(Math.random() * (top + 1));
    tmp = array[current];
    array[current] = array[top];
    array[top] = tmp;
  }

  return array;
}

var counts = [
  [0,0,0,0,0],
  [0,0,0,0,0],
  [0,0,0,0,0],
  [0,0,0,0,0],
  [0,0,0,0,0]
];

var arr;
for (var i=0; i<100000; i++) {
  arr = [0,1,2,3,4];
  shuffle3(arr);
  arr.forEach(function(x, i){ counts[x][i]++;});
}

alert(counts.map(function(a){return a.join(", ");}).join("\n"));

Solution 5

I have placed a simple test page on my website showing the bias of your current browser versus other popular browsers using different methods to shuffle. It shows the terrible bias of just using Math.random()-0.5, another 'random' shuffle that isn't biased, and the Fisher-Yates method mentioned above.

You can see that on some browsers there is as high as a 50% chance that certain elements will not change place at all during the 'shuffle'!

Note: you can make the implementation of the Fisher-Yates shuffle by @Christoph slightly faster for Safari by changing the code to:

function shuffle(array) {
  for (var tmp, cur, top=array.length; top--;){
    cur = (Math.random() * (top + 1)) << 0;
    tmp = array[cur]; array[cur] = array[top]; array[top] = tmp;
  }
  return array;
}

Test results: http://jsperf.com/optimized-fisher-yates

Share:
55,682
Rene Saarsoo
Author by

Rene Saarsoo

From day-to-day I work on JavaScript documentation tool JSDuck. Sometimes I write to triin.net or hang around at Estonian Stack Overflow clone pinu.ee. At other days I can be stuck in Windows or Mac. Usually though I'm lucky to be on the land of Linux. Or you could check out my Github page.

Updated on September 29, 2021

Comments

  • Rene Saarsoo
    Rene Saarsoo over 2 years

    I was helping somebody out with his JavaScript code and my eyes were caught by a section that looked like that:

    function randOrd(){
      return (Math.round(Math.random())-0.5);
    }
    coords.sort(randOrd);
    alert(coords);
    

    My first though was: hey, this can't possibly work! But then I did some experimenting and found that it indeed at least seems to provide nicely randomized results.

    Then I did some web search and almost at the top found an article from which this code was most ceartanly copied. Looked like a pretty respectable site and author...

    But my gut feeling tells me, that this must be wrong. Especially as the sorting algorithm is not specified by ECMA standard. I think different sorting algoritms will result in different non-uniform shuffles. Some sorting algorithms may probably even loop infinitely...

    But what do you think?

    And as another question... how would I now go and measure how random the results of this shuffling technique are?

    update: I did some measurements and posted the results below as one of the answers.

  • Christoph
    Christoph almost 15 years
    there is something wrong with this approach: depending on the sorting algorithm in use by the JS implementation, the probabilities won't be equally distributed!
  • bobobobo
    bobobobo almost 15 years
    Is that something we practically worry about?
  • Christoph
    Christoph almost 15 years
    @bobobobo: depending on the application, yes, sometimes we do; also, a correctly working shuffle() only has to be written once, so it's not really an issue: just put the snippet in your code vault and unearth it whenever you need it
  • Christoph
    Christoph almost 15 years
    if my reasoning is correct, the sorted version does not produce a 'genuine' shuffle!
  • Jon Skeet
    Jon Skeet almost 15 years
    @Christoph: Thinking about it, even Fisher-Yates will only give a "perfect" distribution if rand(x) is guaranteed to be exactly even over its range. Given that there are usually 2^x possible states for the RNG for some x, I don't think it'll be exactly even for rand(3).
  • Christoph
    Christoph almost 15 years
    @Jon: but Fisher-Yates will create 2^x states for each array index, ie there'll be 2^(xn) states total, which should be quite a bit larger that 2^c - see my edited answer for details
  • Rene Saarsoo
    Rene Saarsoo almost 15 years
    Thanks for the implementation. It's blazingly fast! Especially compared to that slow crap that I wrote by myself in the meantime.
  • Jon Skeet
    Jon Skeet almost 15 years
    @Christoph: I may not have explained myself properly. Suppose you have just 3 elements. You pick the first element randomly, out of all 3. To get a completely uniform distribution, you'd have to be able to choose a random number in the range [0,3) totally uniformly - and if the PRNG has 2^n possible states, you can't do that - one or two of the possibilities will have a slightly higher probability of occurring.
  • Christoph
    Christoph almost 15 years
    @Jon: you're right and my argument is flawed; I actually posted a comment detailing this, than deleted it again and sent the one above; I believe I should do some more thinking before answering again...
  • Rene Saarsoo
    Rene Saarsoo almost 15 years
    I think I'll accept this answer, as it has provided me with a lot of references to good resources. Although other answers were really valuable too, especially the one by Christoph.
  • greenkarmic
    greenkarmic over 12 years
    If you're using the underscore.js library, here's how to extend it with the above Fisher-Yates shuffle method: github.com/ryantenney/underscore/commit/…
  • Hello World
    Hello World almost 11 years
    Thank you very much for this, the combination of yours and Johns answer helped me fix a problem that me and a colleague spent almost 4 hours combined on! We originally had a similar method to the OP but found that the randomization was very flaky, so we took your method and changed it slightly to work with a little bit of jquery to jumble up a list of images (for a slider) to get some awesome randomization.
  • Kir Kanos
    Kir Kanos over 10 years
    Your implementation has a high risk to let a significant number of elements untouched. They will just be shifted in the whole array by the amount of inferior elements having been pushed on top. There is a pattern drawn in that shuffling that makes it unreliable.
  • ic3b3rg
    ic3b3rg over 10 years
    @KirKanos, I'm not sure I understand your comment. The solution I propose is O(n). It's definitely going to "touch" every element. Here's a fiddle to demonstrate.
  • JLRishe
    JLRishe almost 10 years
    I think you're going way too easy on the algorithm in the question above. It's a horrible hack and nobody should ever use it. It's inefficient and produces heavily biased results (or if you're really unlucky, it can crash your program). Will It Shuffle? provides a good visualization of the bias inherent in broken shuffling algorithms (and also prevents Fisher-Yates and an algorithm that uses sort but actually works).
  • Daniel Martin
    Daniel Martin almost 9 years
    Thing is, you're almost always pickier about distribution than you think you are, and for "small code", there's always arr = arr.map(function(n){return [Math.random(),n]}).sort().map(function(n){return n[1]});, which has the advantage of being not too terribly much longer and actually properly distributed. There are also very compressed Knuth/F-Y shuffle variants.
  • Giacomo1968
    Giacomo1968 over 8 years
    @DanielMartin That one-liner should be an answer. Also, to avoid parsing errors, two semicolons need to be added so it looks like this: arr = arr.map(function(n){return [Math.random(),n];}).sort().map(function(n){return n[1];});.
  • Alexander Mills
    Alexander Mills about 8 years
    I don't see why it has to be 0.5 - Math.random(), why not just Math.random()?
  • LarsH
    LarsH about 7 years
    @AlexanderMills: The comparator function passed to sort() is supposed to return a number greater than, less than, or equal to zero depending on the comparison of a and b. (developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/…‌​)
  • Alexander Mills
    Alexander Mills about 7 years
    @LarsH yeah that makes sense