selection based on percentage weighting

24,956

Solution 1

Here is a complete solution in C#:

public class ProportionValue<T>
{
    public double Proportion { get; set; }
    public T Value { get; set; }
}

public static class ProportionValue
{
    public static ProportionValue<T> Create<T>(double proportion, T value)
    {
        return new ProportionValue<T> { Proportion = proportion, Value = value };
    }

    static Random random = new Random();
    public static T ChooseByRandom<T>(
        this IEnumerable<ProportionValue<T>> collection)
    {
        var rnd = random.NextDouble();
        foreach (var item in collection)
        {
            if (rnd < item.Proportion)
                return item.Value;
            rnd -= item.Proportion;
        }
        throw new InvalidOperationException(
            "The proportions in the collection do not add up to 1.");
    }
}

Usage:

var list = new[] {
    ProportionValue.Create(0.7, "a"),
    ProportionValue.Create(0.2, "b"),
    ProportionValue.Create(0.1, "c")
};

// Outputs "a" with probability 0.7, etc.
Console.WriteLine(list.ChooseByRandom());

Solution 2

For Python:

>>> import random
>>> dst = 70, 20, 10
>>> vls = 'a', 'b', 'c'
>>> picks = [v for v, d in zip(vls, dst) for _ in range(d)]
>>> for _ in range(12): print random.choice(picks),
... 
a c c b a a a a a a a a
>>> for _ in range(12): print random.choice(picks),
... 
a c a c a b b b a a a a
>>> for _ in range(12): print random.choice(picks),
... 
a a a a c c a c a a c a
>>> 

General idea: make a list where each item is repeated a number of times proportional to the probability it should have; use random.choice to pick one at random (uniformly), this will match your required probability distribution. Can be a bit wasteful of memory if your probabilities are expressed in peculiar ways (e.g., 70, 20, 10 makes a 100-items list where 7, 2, 1 would make a list of just 10 items with exactly the same behavior), but you could divide all the counts in the probabilities list by their greatest common factor if you think that's likely to be a big deal in your specific application scenario.

Apart from memory consumption issues, this should be the fastest solution -- just one random number generation per required output result, and the fastest possible lookup from that random number, no comparisons &c. If your likely probabilities are very weird (e.g., floating point numbers that need to be matched to many, many significant digits), other approaches may be preferable;-).

Solution 3

Knuth references Walker's method of aliases. Searching on this, I find http://code.activestate.com/recipes/576564-walkers-alias-method-for-random-objects-with-diffe/ and http://prxq.wordpress.com/2006/04/17/the-alias-method/. This gives the exact probabilities required in constant time per number generated with linear time for setup (curiously, n log n time for setup if you use exactly the method Knuth describes, which does a preparatory sort you can avoid).

Solution 4

Take the list of and find the cumulative total of the weights: 70, 70+20, 70+20+10. Pick a random number greater than or equal to zero and less than the total. Iterate over the items and return the first value for which the cumulative sum of the weights is greater than this random number:

def select( values ):
    variate = random.random() * sum( values.values() )
    cumulative = 0.0
    for item, weight in values.items():
        cumulative += weight
        if variate < cumulative:
            return item
    return item # Shouldn't get here, but just in case of rounding...

print select( { "a": 70, "b": 20, "c": 10 } )

This solution, as implemented, should also be able to handle fractional weights and weights that add up to any number so long as they're all non-negative.

Solution 5

  1. Let T = the sum of all item weights
  2. Let R = a random number between 0 and T
  3. Iterate the item list subtracting each item weight from R and return the item that causes the result to become <= 0.
Share:
24,956
Corey Goldberg
Author by

Corey Goldberg

"Outside of a dog, a book is a man's best friend. Inside of a dog, it's too dark to read."

Updated on May 28, 2020

Comments

  • Corey Goldberg
    Corey Goldberg almost 4 years

    I have a set of values, and an associated percentage for each:

    a: 70% chance
    b: 20% chance
    c: 10% chance

    I want to select a value (a, b, c) based on the percentage chance given.

    how do I approach this?


    my attempt so far looks like this:

    r = random.random()
    if r <= .7:
        return a
    elif r <= .9:
        return b
    else: 
        return c
    

    I'm stuck coming up with an algorithm to handle this. How should I approach this so it can handle larger sets of values without just chaining together if-else flows.


    (any explanation or answers in pseudo-code are fine. a python or C# implementation would be especially helpful)