Distributed probability random number generator

22,711

Solution 1

The general approach is to feed uniformly distributed random numbers from 0..1 interval into the inverse of the cumulative distribution function of your desired distribution.

Thus in your case, just draw a random number x from 0..1 (for example with Random.NextDouble()) and based on its value return

  • 1 if 0 <= x < 150/208,
  • 2 if 150/208 <= x < 190/208,
  • 3 if 190/208 <= x < 205/208 and
  • 4 otherwise.

Solution 2

Do this only once:

  • Write a function that calculates a cdf array given a pdf array. In your example pdf array is [150,40,15,3], cdf array will be [150,190,205,208].

Do this every time:

  • Get a random number in [0,1) , multiply with 208, truncate up (or down: I leave it to you to think about the corner cases) You'll have an integer in 1..208. Name it r.
  • Perform a binary search on cdf array for r. Return the index of the cell that contains r.

The running time will be proportional to log of the size of the given pdf array. Which is good. However, if your array size will always be so small (4 in your example) then performing a linear search is easier and also will perform better.

Solution 3

There are many ways to generate a random integer with a custom distribution (also known as a discrete distribution). The choice depends on many things, including the number of integers to choose from, the shape of the distribution, and whether the distribution will change over time.

One of the simplest ways to choose an integer with a custom weight function f(x) is the rejection sampling method. The following assumes that the highest possible value of f is max. The time complexity for rejection sampling is constant on average, but depends greatly on the shape of the distribution and has a worst case of running forever. To choose an integer in [1, k] using rejection sampling:

  1. Choose a uniform random integer i in [1, k].
  2. With probability f(i)/max, return i. Otherwise, go to step 1.

Other algorithms have an average sampling time that doesn't depend so greatly on the distribution (usually either constant or logarithmic), but often require you to precalculate the weights in a setup step and store them in a data structure. Some of them are also economical in terms of the number of random bits they use on average. These algorithms include the alias method, the Fast Loaded Dice Roller, the Knuth–Yao algorithm, the MVN data structure, and more. See my section "Weighted Choice With Replacement" for a survey.


The following C# code implements Michael Vose's version of the alias method, as described in this article; see also this question. I have written this code for your convenience and provide it here.

public class LoadedDie {
    // Initializes a new loaded die.  Probs
    // is an array of numbers indicating the relative
    // probability of each choice relative to all the
    // others.  For example, if probs is [3,4,2], then
    // the chances are 3/9, 4/9, and 2/9, since the probabilities
    // add up to 9.
    public LoadedDie(int probs){
        this.prob=new List<long>();
        this.alias=new List<int>();
        this.total=0;
        this.n=probs;
        this.even=true;
    }
    
    Random random=new Random();
    
    List<long> prob;
    List<int> alias;
    long total;
    int n;
    bool even;

    public LoadedDie(IEnumerable<int> probs){
        // Raise an error if nil
        if(probs==null)throw new ArgumentNullException("probs");
        this.prob=new List<long>();
        this.alias=new List<int>();
        this.total=0;
        this.even=false;
        var small=new List<int>();
        var large=new List<int>();
        var tmpprobs=new List<long>();
        foreach(var p in probs){
            tmpprobs.Add(p);
        }
        this.n=tmpprobs.Count;
        // Get the max and min choice and calculate total
        long mx=-1, mn=-1;
        foreach(var p in tmpprobs){
            if(p<0)throw new ArgumentException("probs contains a negative probability.");
            mx=(mx<0 || p>mx) ? P : mx;
            mn=(mn<0 || p<mn) ? P : mn;
            this.total+=p;
        }
        // We use a shortcut if all probabilities are equal
        if(mx==mn){
            this.even=true;
            return;
        }
        // Clone the probabilities and scale them by
        // the number of probabilities
        for(var i=0;i<tmpprobs.Count;i++){
            tmpprobs[i]*=this.n;
            this.alias.Add(0);
            this.prob.Add(0);
        }
        // Use Michael Vose's alias method
        for(var i=0;i<tmpprobs.Count;i++){
            if(tmpprobs[i]<this.total)
                small.Add(i); // Smaller than probability sum
            else
                large.Add(i); // Probability sum or greater
        }
        // Calculate probabilities and aliases
        while(small.Count>0 && large.Count>0){
            var l=small[small.Count-1];small.RemoveAt(small.Count-1);
            var g=large[large.Count-1];large.RemoveAt(large.Count-1);
            this.prob[l]=tmpprobs[l];
            this.alias[l]=g;
            var newprob=(tmpprobs[g]+tmpprobs[l])-this.total;
            tmpprobs[g]=newprob;
            if(newprob<this.total)
                small.Add(g);
            else
                large.Add(g);
        }
        foreach(var g in large)
            this.prob[g]=this.total;
        foreach(var l in small)
            this.prob[l]=this.total;
    }
    
    // Returns the number of choices.
    public int Count {
        get {
            return this.n;
        }
    }
    // Chooses a choice at random, ranging from 0 to the number of choices
    // minus 1.
    public int NextValue(){
        var i=random.Next(this.n);
        return (this.even || random.Next((int)this.total)<this.prob[i]) ? I : this.alias[i];
    }
}

Example:

 var loadedDie=new LoadedDie(new int[]{150,40,15,3}); // list of probabilities for each number:
                                                      // 0 is 150, 1 is 40, and so on
 int number=loadedDie.nextValue(); // return a number from 0-3 according to given probabilities;
                                   // the number can be an index to another array, if needed

I place this code in the public domain.

Solution 4

I know this is an old post, but I also searched for such a generator and was not satisfied with the solutions I found. So I wrote my own and want to share it to the world.

Just call "Add(...)" some times before you call "NextItem(...)"

/// <summary> A class that will return one of the given items with a specified possibility. </summary>
/// <typeparam name="T"> The type to return. </typeparam>
/// <example> If the generator has only one item, it will always return that item. 
/// If there are two items with possibilities of 0.4 and 0.6 (you could also use 4 and 6 or 2 and 3) 
/// it will return the first item 4 times out of ten, the second item 6 times out of ten. </example>
public class RandomNumberGenerator<T>
{
    private List<Tuple<double, T>> _items = new List<Tuple<double, T>>();
    private Random _random = new Random();

    /// <summary>
    /// All items possibilities sum.
    /// </summary>
    private double _totalPossibility = 0;

    /// <summary>
    /// Adds a new item to return.
    /// </summary>
    /// <param name="possibility"> The possibility to return this item. Is relative to the other possibilites passed in. </param>
    /// <param name="item"> The item to return. </param>
    public void Add(double possibility, T item)
    {
        _items.Add(new Tuple<double, T>(possibility, item));
        _totalPossibility += possibility;
    }

    /// <summary>
    /// Returns a random item from the list with the specified relative possibility.
    /// </summary>
    /// <exception cref="InvalidOperationException"> If there are no items to return from. </exception>
    public T NextItem()
    {
        var rand = _random.NextDouble() * _totalPossibility;
        double value = 0;
        foreach (var item in _items)
        {
            value += item.Item1;
            if (rand <= value)
                return item.Item2;
        }
        return _items.Last().Item2; // Should never happen
    }
}
Share:
22,711

Related videos on Youtube

Mark Conway
Author by

Mark Conway

I love windsurfing, snowboarding, surfing, C#, HTML5, CSS3, Javascript, AngularJS Powershell, and SQL Server

Updated on March 26, 2021

Comments

  • Mark Conway
    Mark Conway over 3 years

    I want to generate a number based on a distributed probability. For example, just say there are the following occurences of each numbers:

    Number| Count           
    1    |  150                
    2    |  40          
    3    |  15          
    4    |  3  
    
    with a total of (150+40+15+3) = 208     
    then the probability of a 1 is 150/208= 0.72    
    and the probability of a 2 is 40/208 = 0.192    
    

    How do I make a random number generator that returns be numbers based on this probability distribution?

    I'm happy for this to be based on a static, hardcoded set for now but I eventually want it to derive the probability distribution from a database query.

    I've seen similar examples like this one but they are not very generic. Any suggestions?

  • Adam Zalcman
    Adam Zalcman about 12 years
    If the distribution had indeed a very large number of values, hashtable would be far more efficient than binary search.
  • Ali Ferhat
    Ali Ferhat about 12 years
    @Zalcman Yes, it is possible. However the size of the hashtable equals to the sum of entries in the pdf array, which may be arbitrarily larger than the size of the pdf array. Consider the case when pdf array has a million entries, and the average entry is about 100. Depends on the circumstences, but I'd probably prefer a binary search ( about 20 comparisons per lookup ) to having a hashtable of 100 million entries.
  • Mark Conway
    Mark Conway almost 10 years
    Fantastic! Nice lean solution :) Thanks.
  • Carl
    Carl over 8 years
    Thanks for posting this. It's been a key part of a project I've been working on and I appreciate you placing it in the public domain.
  • Admin
    Admin over 8 years
    Works perfectly. And tweaking the code a little bit allows you to create a seeded Random class.
  • smatthews1999
    smatthews1999 over 6 years
    I am confused as to what that IF statement would look like. Could you please show what this would look like in code (C#, JS, etc)?