Python: Selecting numbers with associated probabilities

17,453

Solution 1

This might be what you're looking for. Extension to a solution in Generate random numbers with a given (numerical) distribution. Removes the selected item from the distribution, updates the probabilities and returns selected item, updated distribution. Not proven to work, but should give a good impression of the idea.

def random_distr(l):
    assert l # don't accept empty lists
    r = random.uniform(0, 1)
    s = 0
    for i in xrange(len(l)):
        item, prob = l[i]
        s += prob
        if s >= r:
            l.pop(i) # remove the item from the distribution
            break
    else: # Might occur because of floating point inaccuracies
        l.pop()
    # update probabilities based on new domain
    d = 1 - prob 
    for i in xrange(len(l)):
        l[i][1] /= d
    return item, l

dist = [[1, 0.5], [2, 0.25], [3, 0.05], [4, 0.01], [5, 0.09], [6, 0.1]]
while dist:
    val, dist = random_distr(dist)
    print val

Solution 2

I'm going to assume the probabilities all add up to 1. If they don't, you're going to have to scale them accordingly so that they do.

First generate a uniform random variable [0, 1] using random.random(). Then pass through the list, summing the probabilities. The first time the sum exceeds the random number, return the associated number. This way, if the uniform random variable generated falls within the range (0.5, 0.75] in your example, 2 will be returned, thus giving it the required 0.25 probability of being returned.

import random
import sys
def pick_random(prob_list):
  r, s = random.random(), 0
  for num in prob_list:
    s += num[1]
    if s >= r:
      return num[0]
  print >> sys.stderr, "Error: shouldn't get here"

Here's a test showing it works:

import collections
count = collections.defaultdict(int)
for i in xrange(10000):
  count[pick_random(prob_list)] += 1
for n in count:
  print n, count[n] / 10000.0

which outputs:

1 0.498
2 0.25
3 0.0515
4 0.0099
5 0.0899
6 0.1007

EDIT: Just saw the edit in the question. If you want to select two distinct numbers, you can repeat the above until your second number chosen is distinct. But this will be terribly slow if one number has a very high (e.g. 0.99999999) probability associated with it. In this case, you could remove the first number from the list and rescale the probabilities so that they sum to 1 before selecting the second number.

Solution 3

Here's something that appears to work and meet all your specifications (and subjectively it seems pretty fast). Note that your constraint that the second number not be the same as the first throws the probabilities off for selecting it. That issue is effectively ignored by the code below and it just enforces the restriction (in other words the probability of what the second number is won't be that given for each number in the prob_list).

import random

prob_list = [[1, 0.5], [2, 0.25], [3, 0.05], [4, 0.01], [5, 0.09], [6, 0.1]]

# create a list with the running total of the probabilities
acc = 0.0
acc_list = [acc]
for t in prob_list:
    acc += t[1]
    acc_list.append(acc)

TOLERANCE = .000001
def approx_eq(v1, v2):
    return abs(v1-v2) <= TOLERANCE

def within(low, value, high):
    """ Determine if low >= value <= high (approximately) """
    return (value > low or approx_eq(low, value)) and \
           (value < high or approx_eq(high, value))

def get_selection():
    """ Find which weighted interval a random selection falls in """
    interval = -1
    rand = random.random()
    for i in range(len(acc_list)-1):
        if within(acc_list[i], rand, acc_list[i+1]):
            interval = i
            break
    if interval == -1:
        raise AssertionError('no interval for {:.6}'.format(rand))
    return interval

def get_two_different_nums():
    sel1 = get_selection()
    sel2 = sel1
    while sel2 == sel1:
        sel2 = get_selection()
    return prob_list[sel1][0], prob_list[sel2][0]

Solution 4

Maybe the problem is just related to the data structure. It would be easier if you had a dictionary instead of a list of lists:

prob_list = { 1:0.5, 2:0.25, 3:0.05, 4:0.01, 5:0.09, 6:0.1}

This way, you can obtain the weight corresponding to the number:

import random
number = weight = -1
while not( number in prob_list ):
    number = random.randint( 0, length( prob_list ) )
    weight = prob_list[ number ]

print( number, weight )
Share:
17,453
Harpal
Author by

Harpal

https://www.drdatascience.co.uk/

Updated on June 14, 2022

Comments

  • Harpal
    Harpal almost 2 years

    Possible Duplicates:
    Random weighted choice
    Generate random numbers with a given (numerical) distribution

    I have a list of list which contains a series on numbers and there associated probabilities.

    prob_list = [[1, 0.5], [2, 0.25], [3, 0.05], [4, 0.01], [5, 0.09], [6, 0.1]]
    

    for example in prob_list[0] the number 1 has a probability of 0.5 associated with it. So you would expect 1 to show up 50% of the time.

    How do I add weight to the numbers when I select them?

    NOTE: the amount of numbers in the list can vary from 6 - 100


    EDIT

    In the list I have 6 numbers with their associated probabilities. I want to select two numbers based on their probability.

    No number can be selected twice. If "2" is selected it can not be selected again.

    • khachik
      khachik over 13 years
    • robert
      robert over 13 years
      Are you trying to generate random numbers? Calculate expected value?
    • SubniC
      SubniC over 13 years
      Hi, i don't understand the question... what would you like to do with the numbers?
    • Madison Caldwell
      Madison Caldwell over 13 years
      Again, please clarify what exactly you are asking and I would be glad to help.
    • Harpal
      Harpal over 13 years
      I want to select the numbers in this list from 1- 6 based on their probability.
    • Lucas Moeskops
      Lucas Moeskops over 13 years
    • Matt Ellen
      Matt Ellen over 13 years
      @Harpal, so from your given list, you always want to select 1 and 2 (highest two probabilities)?
    • Jimmy Kane
      Jimmy Kane over 10 years