Random number with Probabilities

Solution 1

Yours is a pretty good way already and works well with any range.

Just thinking: another possibility is to get rid of the fractions by multiplying with a constant multiplier, and then build an array with the size of this multiplier. Multiplying by 10 you get

P(1) = 2
P(2) = 3
P(3) = 5

Then you create an array with the inverse values -- '1' goes into elements 1 and 2, '2' into 3 to 5, '3' into 6 to 10:

P = (1,1, 2,2,2, 3,3,3,3,3);

and then you can pick a random element from this array instead.


(Add.) Using the probabilities from the example in kiruwka's comment:

int[] numsToGenerate           = new int[]    { 1,   2,    3,   4,    5   };
double[] discreteProbabilities = new double[] { 0.1, 0.25, 0.3, 0.25, 0.1 };

the smallest multiplier that leads to all-integers is 20, which gives you

2, 5, 6, 5, 2

and so the length of numsToGenerate would be 20, with the following values:

1 1
2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4
5 5

The distribution is exactly the same: the chance of '1', for example, is now 2 out of 20 -- still 0.1.

This is based on your original probabilities all adding up to 1. If they do not, multiply the total by this same factor (which is then going to be your array length as well).
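As a rough sketch of the expanded-array idea, the class below builds the table from integer counts (each probability times the common multiplier) and picks a uniformly random element. The class name `ExpandedArraySampler` and its methods are hypothetical, chosen only for illustration.

```java
import java.util.Random;

// Hypothetical helper illustrating the expanded-array approach; all names are illustrative.
final class ExpandedArraySampler {
    private final int[] table;
    private final Random random;

    // counts[i] is the integer weight of values[i], i.e. its probability times the multiplier.
    ExpandedArraySampler(int[] values, int[] counts, Random random) {
        if (values.length != counts.length) {
            throw new IllegalArgumentException("values and counts must have the same length");
        }
        int total = 0;
        for (int c : counts) {
            if (c < 0) {
                throw new IllegalArgumentException("counts must be non-negative");
            }
            total += c;
        }
        if (total == 0) {
            throw new IllegalArgumentException("at least one count must be positive");
        }
        table = new int[total];
        int pos = 0;
        // Write each value into the table as many times as its count says.
        for (int i = 0; i < values.length; i++) {
            for (int k = 0; k < counts[i]; k++) {
                table[pos++] = values[i];
            }
        }
        this.random = random;
    }

    // Every slot is equally likely, so a value with count c is returned with probability c / total.
    int next() {
        return table[random.nextInt(table.length)];
    }

    // Copy of the expanded table, exposed for inspection.
    int[] table() {
        return table.clone();
    }

    public static void main(String[] args) {
        ExpandedArraySampler sampler = new ExpandedArraySampler(
                new int[] { 1, 2, 3, 4, 5 }, new int[] { 2, 5, 6, 5, 2 }, new Random());
        System.out.println(sampler.next());
    }
}
```

The table costs memory proportional to the multiplier, so this only pays off when the probabilities reduce to small integers.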

Solution 2

Some time ago I wrote a helper class to solve this issue. The source code should show the concept clearly enough:

import java.util.HashMap;
import java.util.Map;

public class DistributedRandomNumberGenerator {

    private Map<Integer, Double> distribution;
    private double distSum;

    public DistributedRandomNumberGenerator() {
        distribution = new HashMap<>();
    }

    public void addNumber(int value, double distribution) {
        if (this.distribution.get(value) != null) {
            distSum -= this.distribution.get(value);
        }
        this.distribution.put(value, distribution);
        distSum += distribution;
    }

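    public int getDistributedRandomNumber() {
        // Scale the uniform sample to [0, distSum) so the weights need not sum to 1.
        double rand = Math.random() * distSum;
        double tempDist = 0;
        for (Map.Entry<Integer, Double> entry : distribution.entrySet()) {
            tempDist += entry.getValue();
            if (rand <= tempDist) {
                return entry.getKey();
            }
        }
        // Reached only if no numbers were added, or if rounding leaves rand
        // marginally above the accumulated total.
        return 0;
    }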

}

The usage of the class is as follows:

DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.3d); // Adds the numerical value 1 with a probability of 0.3 (30%)
// [...] Add more values

int random = drng.getDistributedRandomNumber(); // Generate a random number

Test driver to verify functionality:

    public static void main(String[] args) {
        DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
        drng.addNumber(1, 0.2d);
        drng.addNumber(2, 0.3d);
        drng.addNumber(3, 0.5d);

        int testCount = 1000000;

        HashMap<Integer, Double> test = new HashMap<>();

        for (int i = 0; i < testCount; i++) {
            int random = drng.getDistributedRandomNumber();
            test.put(random, (test.get(random) == null) ? (1d / testCount) : test.get(random) + 1d / testCount);
        }

        System.out.println(test.toString());
    }

Sample output for this test driver:

{1=0.20019100000017953, 2=0.2999349999988933, 3=0.4998739999935438}
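The test driver adds 1d / testCount to a running total on every draw. An equivalent check, probably easier to read, counts the occurrences of each number and divides by the number of trials once at the end. The sketch below does that for any sampler; `FrequencyCheck` and `observedFrequencies` are hypothetical names, not part of the class above.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntSupplier;

// Hypothetical test helper; names are illustrative.
final class FrequencyCheck {

    // Draws `trials` samples and returns the observed relative frequency of each value.
    static Map<Integer, Double> observedFrequencies(IntSupplier sampler, int trials) {
        if (trials <= 0) {
            throw new IllegalArgumentException("trials must be positive");
        }
        // First pass: plain integer counts per value.
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int i = 0; i < trials; i++) {
            counts.merge(sampler.getAsInt(), 1, Integer::sum);
        }
        // Second pass: divide each count by the number of trials.
        Map<Integer, Double> frequencies = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            frequencies.put(e.getKey(), e.getValue() / (double) trials);
        }
        return frequencies;
    }
}
```

With the generator above this could be called as FrequencyCheck.observedFrequencies(drng::getDistributedRandomNumber, 1000000). Counting integers first also avoids accumulating a small rounding error on every draw.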

Solution 3

You already wrote the implementation in your question. ;)

final int ran = myRandom.nextInt(100); // uniform in 0..99
if (ran >= 50) { return 3; }           // 50..99: 50 values
else if (ran >= 20) { return 2; }      // 20..49: 30 values
else { return 1; }                     // 0..19: 20 values

You can speed this up for more complex implementations by pre-calculating the results in a lookup table like this:

t[0] = 1; t[1] = 1; // ... one for each possible result
return t[ran];

But this should only be used if this is a performance bottleneck and called several hundred times per second.

Solution 4

If you have a performance issue, then instead of searching through all n values, which costs O(n), you could perform a binary search, which costs O(log n):

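Random r = new Random();
// Cumulative sums of P(1) = 0.2, P(2) = 0.3, P(3) = 0.5; the last entry must reach 1.0
double[] weights = new double[] { 0.2, 0.2 + 0.3, 0.2 + 0.3 + 0.5 };
// end of init
double random = r.nextDouble();
// next perform the binary search in weights array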

On average you only need about log2(weights.length) comparisons, which pays off when the weights array has many elements.
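One possible way to fill in the binary search step is `java.util.Arrays.binarySearch` over the cumulative sums. The sketch below assumes strictly positive weights, so the cumulative array is strictly increasing; `CumulativeSampler`, `pick` and `next` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sampler; names are illustrative.
final class CumulativeSampler {
    private final int[] values;
    private final double[] cumulative; // cumulative[i] = weights[0] + ... + weights[i]
    private final Random random;

    CumulativeSampler(int[] values, double[] weights, Random random) {
        if (values.length == 0 || values.length != weights.length) {
            throw new IllegalArgumentException("values and weights must be non-empty and equally long");
        }
        this.values = values.clone();
        this.cumulative = new double[weights.length];
        double sum = 0;
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] <= 0) {
                throw new IllegalArgumentException("weights must be positive");
            }
            sum += weights[i];
            cumulative[i] = sum;
        }
        this.random = random;
    }

    // Returns the value whose interval [cumulative[i - 1], cumulative[i]) contains u.
    int pick(double u) {
        int idx = Arrays.binarySearch(cumulative, u);
        // An exact hit on cumulative[i] is the lower bound of interval i + 1;
        // otherwise binarySearch returns -(insertionPoint) - 1.
        idx = (idx >= 0) ? idx + 1 : -idx - 1;
        // Clamp in case rounding pushes u up to the total.
        return values[Math.min(idx, values.length - 1)];
    }

    // Scales the uniform sample by the total, so the weights need not sum to 1.
    int next() {
        double total = cumulative[cumulative.length - 1];
        return pick(random.nextDouble() * total);
    }

    public static void main(String[] args) {
        CumulativeSampler sampler = new CumulativeSampler(
                new int[] { 1, 2, 3 }, new double[] { 0.2, 0.3, 0.5 }, new Random());
        System.out.println(sampler.next());
    }
}
```

Setup is O(n) and each draw is O(log n), with no restriction on the weights other than being positive.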

Solution 5

Your approach is fine for the specific numbers you picked, although you could reduce storage by using an array of 10 instead of an array of 100. However, this approach doesn't generalize well to large numbers of outcomes or outcomes with probabilities such as 1/e or 1/PI.

A potentially better solution is to use an alias table. The alias method takes O(n) work to set up the table for n outcomes, but then is constant time to generate regardless of how many outcomes there are.
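For illustration, here is a minimal sketch of an alias table built with Vose's variant of the method. It is one possible implementation, not a reference one; the class `AliasSampler` and its method names are hypothetical. Weights may be arbitrary non-negative doubles (1/e, 1/PI, ...) and are normalised internally.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;

// Hypothetical alias-table sampler; names are illustrative.
final class AliasSampler {
    private final int[] values;
    private final double[] prob; // chance of keeping column i instead of its alias
    private final int[] alias;
    private final Random random;

    AliasSampler(int[] values, double[] weights, Random random) {
        int n = values.length;
        if (n == 0 || n != weights.length) {
            throw new IllegalArgumentException("values and weights must be non-empty and equally long");
        }
        double total = 0;
        for (double w : weights) {
            if (w < 0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            total += w;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        this.values = values.clone();
        this.prob = new double[n];
        this.alias = new int[n];
        this.random = random;

        // Scale the weights so that the average column height is 1.
        double[] scaled = new double[n];
        Deque<Integer> small = new ArrayDeque<>();
        Deque<Integer> large = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            scaled[i] = weights[i] * n / total;
            (scaled[i] < 1.0 ? small : large).push(i);
        }
        // Top up each under-full column with mass taken from an over-full one.
        while (!small.isEmpty() && !large.isEmpty()) {
            int s = small.pop();
            int l = large.pop();
            prob[s] = scaled[s];
            alias[s] = l;
            scaled[l] -= 1.0 - scaled[s];
            (scaled[l] < 1.0 ? small : large).push(l);
        }
        // Whatever is left is a full column, up to rounding error.
        while (!large.isEmpty()) {
            prob[large.pop()] = 1.0;
        }
        while (!small.isEmpty()) {
            prob[small.pop()] = 1.0;
        }
    }

    // Constant time per draw: pick a column, then keep it or take its alias.
    int next() {
        int column = random.nextInt(prob.length);
        boolean keep = random.nextDouble() < prob[column];
        return values[keep ? column : alias[column]];
    }

    // Probability of each outcome implied by the table, for checking the construction.
    double[] impliedProbabilities() {
        int n = prob.length;
        double[] p = new double[n];
        for (int i = 0; i < n; i++) {
            p[i] += prob[i] / n;
            p[alias[i]] += (1.0 - prob[i]) / n;
        }
        return p;
    }
}
```

Each draw uses two random numbers and one comparison regardless of n, which is where the constant generation time comes from.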

Author: marc wellman

Updated on June 06, 2020

Comments

  • marc wellman
    marc wellman almost 4 years

    I am wondering what would be the best way (e.g. in Java) to generate random numbers within a particular range where each number has a certain probability to occur or not?

    e.g.

    Generate random integers from within [1;3] with the following probabilities:

    P(1) = 0.2
    P(2) = 0.3
    P(3) = 0.5


    Right now I am considering the approach to generate a random integer within [0;100] and do the following:

    If it is within [0;20] --> I got my random number 1.
    If it is within [21;50] --> I got my random number 2.
    If it is within [51;100] --> I got my random number 3.

    What would you say?

  • marc wellman
    marc wellman over 10 years
    Thank you very much for your answer on that problem - your help is pretty much appreciated.
  • marc wellman
    marc wellman over 10 years
    Thank you very much :) You helped me a lot.
  • marc wellman
    marc wellman over 10 years
    Your answer helped me a lot. Thank you very much.
  • xeruf
    xeruf over 6 years
    I like that! If you want to use it on a large scale, though, the HashMap should use Float instead of Double to reduce unnecessary overhead.
  • user366312
    user366312 about 4 years
    Could you kindly explain the for-loop in the main()? I don't understand what it is doing. Also, why are you not checking the distSum to be 1 before doing the calculation?
  • user366312
    user366312 about 4 years
    What are you doing with this: if (this.distribution.get(value) != null) { distSum -= this.distribution.get(value); } ?
  • trylimits
    trylimits about 4 years
    @user366312 If addNumber(int value, ...) is called multiple times with the same value this line ensures that the sum distSum holds the correct value.
  • noobie
    noobie about 2 years
    Why is test.put(random, (test.get(random) == null) ? (1d / testCount) : test.get(random) + 1d / testCount); needed? What does 1d / testCount achieve? Also, could you explain the logic behind this code, and what it is called if I want to search for it (inverse cumulative distribution, etc.)? I couldn't work out how it does its job.
  • trylimits
    trylimits about 2 years
    @noobie The term (1d / testCount) is used for calculating the average of the test driver. A different, but probably more understandable way of doing this, would be to count each random number and divide it by testcount. I don't know if this algorithm has a dedicated name. I implemented this class to use it as Roulette Wheel Selection - probably that's the name you are looking for.