How to pick an item by its probability?

64,663

Solution 1

So with each item store a number that marks its relative probability, for example if you have 3 items one should be twice as likely to be selected as either of the other two then your list will have:

 [{A,1},{B,1},{C,2}]

Then sum the numbers of the list (i.e. 4 in our case). Now generate a random number and choose that index. int index = rand.nextInt(4); return the number such that the index is in the correct range.

Java code:

class Item {
    int relativeProb;
    String name;

    //Getters Setters and Constructor
}

...

class RandomSelector {
    List<Item> items = new List();
    Random rand = new Random();
    int totalSum = 0;

    RandomSelector() {
        for(Item item : items) {
            totalSum = totalSum + item.relativeProb;
        }
    }

    public Item getRandom() {

        int index = rand.nextInt(totalSum);
        int sum = 0;
        int i=0;
        while(sum < index ) {
             sum = sum + items.get(i++).relativeProb;
        }
        return items.get(Math.max(0,i-1));
    }
}

Solution 2

  1. Generate a uniformly distributed random number.
  2. Iterate through your list until the cumulative probability of the visited elements is greater than the random number

Sample code:

double p = Math.random();
double cumulativeProbability = 0.0;
for (Item item : items) {
    cumulativeProbability += item.probability();
    if (p <= cumulativeProbability) {
        return item;
    }
}

Solution 3

pretend that we have the following list

Item A 25%
Item B 15%
Item C 35%
Item D 5%
Item E 20%

Lets pretend that all the probabilities are integers, and assign each item a "range" that calculated as follows.

Start - Sum of probability of all items before
End - Start + own probability

The new numbers are as follows

Item A 0 to 25
Item B 26 to 40
Item C 41 to 75
Item D 76 to 80
Item E 81 to 100

Now pick a random number from 0 to 100. Lets say that you pick 32. 32 falls in Item B's range.

mj

Solution 4

You can try the Roulette Wheel Selection.

First, add all the probabilities, then scale all the probabilities in the scale of 1, by dividing each one by the sum. Suppose the scaled probabilities are A(0.4), B(0.3), C(0.25) and D(0.05). Then you can generate a random floating-point number in the range [0, 1). Now you can decide like this:

random number between 0.00 and 0.40 -> pick A
              between 0.40 and 0.70 -> pick B
              between 0.70 and 0.95 -> pick C
              between 0.95 and 1.00 -> pick D

You can also do it with random integers - say you generate a random integer between 0 to 99 (inclusive), then you can make decision like the above.

Solution 5

Algorithm described in Ushman's, Brent's and @kaushaya's answers are implemented in Apache commons-math library.

Take a look at EnumeratedDistribution class (groovy code follows):

def probabilities = [
   new Pair<String, Double>("one", 25),
   new Pair<String, Double>("two", 30),
   new Pair<String, Double>("three", 45)]
def distribution = new EnumeratedDistribution<String>(probabilities)
println distribution.sample() // here you get one of your values

Note that sum of probabilities doesn't need to be equal to 1 or 100 - it will be normalized automatically.

Share:
64,663
Ruzanna
Author by

Ruzanna

Updated on July 05, 2022

Comments

  • Ruzanna
    Ruzanna almost 2 years

    I have a list of items. Each of these items has its own probability.

    Can anyone suggest an algorithm to pick an item based on its probability?

  • Michael McGowan
    Michael McGowan about 12 years
    (+1) It bothers me that this algorithm almost always seems to be described in terms of GAs (your link on Wikipedia and see here also). The weighted roulette wheel algorithm has all kinds of uses that have nothing to do with GAs (such as this very question).
  • Sufian Latif
    Sufian Latif about 12 years
    Yeah, that's weird. I also learned it's name while studying GAs, but I used the technique much before that for some other reason.
  • Ruzanna
    Ruzanna about 12 years
    thanks Usman. but I wonder should I take i-th item or (i-1)th item ? I mean items.get(i-1) instead of items.get(i)
  • rrs
    rrs about 10 years
    I don't think you need to worry about the case where the item probability is zero. You should either have already exited the loop or you would continue over as the cumulative probability wouldn't have changed.
  • stan
    stan almost 10 years
    This faster than @Brent's answer for selection, but it would take up way too much memory if say, the ranges were from 0 to a million.
  • HybrisHelp
    HybrisHelp over 9 years
    Here p is any random number so how can we say that most probability item get selected first .. eg:[{A,10},{B,20}] so how can you we say that suppose in first iteration p=2 so 2<=10 is true and first item {A,10} is getting selected first even though second item have more probabilty
  • HybrisHelp
    HybrisHelp over 9 years
    Here p is any random number so how can we say that most probability item get selected first .. eg:[{A,10},{B,20}] so how can you we say that suppose in first iteration p=2 so 2<=10 is true and first item {A,10} is getting selected first even though second item have more probabilty
  • Usman Ismail
    Usman Ismail over 9 years
    p is a random number from 0 to "Sum of all weights" so in your example there is a 10/30 = 1/3 chance of p being an number from 0-9 and there is a 20/30 = 2/3 change of p being a number from 10-29. Hence the chance of getting B is still twice as likely as getting A
  • WVrock
    WVrock over 9 years
    Probabilities doesn't have to be percentages. We can just pick a random number between 0 to sum of the numbers.
  • TehQuila
    TehQuila over 9 years
    Hey great answer but you have to be carefull! The function nextInt(...) returns a value from 0 (inclusive) to totalSum (exclusive). (See docs.oracle.com/javase/7/docs/api/java/util/…). So if 0 is picked (index = 0), you will get an IndexOutOfBoundsException, because you try to access items.get(-1), since you skip the while loop entirely and i = 0. Therefore you should return items.get(Math.max(0, i-1)).
  • Jxek
    Jxek about 9 years
    Based on some experimentation this only worked for me if I replaced int index = rand.nextInt(totalSum) with int index = rand.nextInt(totalSum) + 1 and then of course took out the redundant Math.max. Without this +1 I was getting twice as many of the "base" (and first) element in the list than expected, i.e. the first element in my list has a relative likelihood 1 by default, and everything else is with respect to that.
  • WVrock
    WVrock almost 9 years
    Why is this down voted? I would appreciate an explanation.
  • aioobe
    aioobe over 8 years
    @U2Answer, This algorithm requires the probabilities to be normalized (divided by the sum of all probabilities). By doing that you ensure that Σp_i=1 and then a random number between 0 and 1 will do the trick.
  • matthaeus
    matthaeus about 8 years
    The order of the items in the list doesn't matter, because the value of r is a uniformly distributed random number, which means that the probability r be a certain value is equal to all other values r may be. Thus the items in the list are not "favored" and it doesn't matter where they are in the list.
  • Mark Peschel
    Mark Peschel almost 7 years
    Your method does not map to probabilities as easily as it first seems. Suppose we want two choices, A and B. A has 2/5 = 0 to 40, B has 3/5 = 0 to 60. If we simulate this with your method, 1/3 of the time B will select between 41 and 60, guaranteeing success. Then the other 2/3 of the time, B will be 0 to 40 and A will be 0 to 40, giving them even odds, so that 1/2 of 2/3 of the time, B will win. That's 1/3 + 1/3 = 2/3, which is different from the expected 3/5 that B wins.
  • WVrock
    WVrock almost 7 years
    @MauveRanger It was never intended to exactly replicate the possibilities but the math seems to be misleading. I will edit and add another method of doing it that will hopefully be correct with the math.
  • Md. Abu Nafee Ibna Zahid
    Md. Abu Nafee Ibna Zahid over 6 years
    Who is @kaushaya? May be s/he has changed name. So it is better to hyperlink the respective answers.
  • Phil Brock
    Phil Brock almost 4 years
    That's a horrible case of inheritance for implementation, but a totally valid point. If you're going to do a lot of look ups and have a lot of entries then TreeMap looks like a fun approach. (You'd probably have to profile it to decide if it actually offered any runtime benefit - I'm not arguing the O log(n) vs O(n) win here but there's a mental overhead in figuring out how TreeMap works and it might not be worth it.) I like it and hate it simultaneously :)
  • Marc Dzaebel
    Marc Dzaebel almost 4 years
    Yes, TreeMap has drawbacks e.g. for iteration and construction. However, if you have a lot of entries, picking should be clearly faster than linear approaches.
  • Marc Dzaebel
    Marc Dzaebel almost 4 years
    I roughly tested the performance of TreeMap picking. It's about 5 times faster as the accepted solution if I pick in the middle of the items. For 100 items TreeMap is 5 times faster, for 10000 items TreeMap is 17 times faster.
  • DonDaniel
    DonDaniel over 3 years
    should Item A be 1 to 25 instead of 0 to 25?
  • Admin
    Admin over 3 years
    I believe it should be 1 to 25 as well, 0 to 25 would mean that Item A has a 26% probability.
  • Huy
    Huy about 2 years
    What will you pick if the random number is exact 0.40?
  • Sufian Latif
    Sufian Latif about 2 years
    @Huy To resolve cases like this, the interval should be closed to the left and open to the right, i.e. if 0.00 <= p < 0.40: pick A, else if 0.40 <= p < 0.70: pick B, ....
  • Marcus
    Marcus about 2 years
    This should have a higher placement than it does, since it's important to be aware of the existince of an O(1) solution.