Generate random number with given probability matlab

36,223

Solution 1

The simple solution is to generate a number with a uniform distribution (using rand), and manipulate it a bit:

r = rand;
prob = [0.5, 0.1, 0.4];
x = sum(r >= cumsum([0, prob]));

or in a one-liner:

x = sum(rand >= cumsum([0, 0.5, 0.1, 0.4]));

Explanation

Here r is a uniformly distributed random number between 0 and 1. To generate an integer number between 1 and 3, the trick is to divide the [0, 1] range into 3 segments, where the length of each segment is proportional to its corresponding probability. In your case, you would have:

  • Segment [0, 0.5), corresponding to number 1.
  • Segment [0.5, 0.6), corresponding to number 2.
  • Segment [0.6, 1], corresponding to number 3.

The probability of r falling within any of the segments is proportional to the probabilities you want for each number. sum(r >= cumsum([0, prob])) is just a fancy way of mapping an integer number to one of the segments.

Extension

If you're interested in creating a vector/matrix of random numbers, you can use a loop or arrayfun:

r = rand(3); % # Any size you want
x = arrayfun(@(z)sum(z >= cumsum([0, prob])), r);

Of course, there's also a vectorized solution, I'm just too lazy to write it.

Solution 2

The answers so far are correct, but slow for large inputs: O(m*n) where n is the number of values and m is the number of random samples. Here is a O(m*log(n)) version that takes advantage of monotonicity of the cumsum result and the binary search used in histc:

% assume n = numel(prob) is large and sum(prob) == 1
r = rand(m,1);
[~,x] = histc(r,cumsum([0,prob]));

Solution 3

>> c = cumsum([0.5, 0.1, 0.4]);
>> r = rand(1e5, 1);
>> x = arrayfun(@(x) find(x <= c, 1, 'first'), r);
>> h = hist(x, 1:3)

h =

       49953       10047       40000

x distributed as desired.

Solution 4

using randsample function from Statistics and Machine Learning Toolbox, you can generate random numbers with specified probability mass function (pmf):

pmf = [0.5, 0.1, 0.4];
population = 1:3;
sample_size = 1;

random_number = randsample(population,sample_size,true,pmf);

I think this is the easiest method.

Solution 5

A slightly more general solution would be:

r=rand;
prob=[.5,.1,.4];
prob=cumsum(prob);
value=[1,2,3];    %values corresponding to the probabilities
ind=find(r<=prob,1,'first');
x=value(ind)
Share:
36,223

Related videos on Youtube

Eamonn McEvoy
Author by

Eamonn McEvoy

Updated on December 31, 2021

Comments

  • Eamonn McEvoy
    Eamonn McEvoy over 2 years

    I want to generate a random number with a given probability but I'm not sure how to:

    I need a number between 1 and 3

    num = ceil(rand*3);
    

    but I need different values to have different probabilities of generating eg.

    0.5 chance of 1
    0.1 chance of 2
    0.4 chance of 3
    

    I'm sure this is straightforward but I can't think of how to do it.

  • Eamonn McEvoy
    Eamonn McEvoy over 11 years
    Thanks for your help, is there something I can do when the probabilities are [0,0,1] in this case I need the answer to be 3, but keep getting 1 ?
  • Eamonn McEvoy
    Eamonn McEvoy over 11 years
    PS: You forgot to add the 0 on this line -> x = sum(rand >= cumsum([0.5, 0.1, 0.4));
  • Eitan T
    Eitan T over 11 years
    @EamonnMcEvoy Thanks, fixed.
  • Serg
    Serg over 11 years
    @EitanT, I don't think that sum() is faster than find(..., 'first'). Also, there is no need in adding zero. Please test it. In general case I would only add: assert(c(end) == 1);
  • Eitan T
    Eitan T over 11 years
    Now that I think about it, my comment is out of place. My apologies.
  • Alec Jacobson
    Alec Jacobson over 10 years
    Small note, depending on how histc is implemented this could be O(n+m*log(n)). Though I'd hope since the first output is not used, this isn't the case.
  • Alec Jacobson
    Alec Jacobson about 10 years
    There's an even better O(n + m) solution using the Alias Method. I've implemented it in the sample_discrete.m function.
  • Oleg
    Oleg over 9 years
    Vectorized solution: sum(bsxfun(@ge, r, cumsum([0, prob]),2) where r is a column vector and prob a row vector.
  • Eitan T
    Eitan T over 9 years
    @OlegKomarov Thanks for the vectorized solution ;)
  • Eitan T
    Eitan T over 9 years
    @yashar Thanks, fixed.
  • intrepid_em
    intrepid_em over 5 years
    The link is broken, fyi.