np.random.choice: probabilities do not sum to 1
Solution 1
This is a known issue with NumPy. np.random.choice checks that the probabilities sum to 1 within a fixed tolerance (see the NumPy source).
The solution is to normalize the probabilities by dividing them by their sum, provided the sum is already close enough to 1.
Example:
>>> p = [1.42836755e-01, 1.42836735e-01, 1.42836735e-01, 1.42836735e-01,
...      4.76122449e-05, 1.42836735e-01, 4.76122449e-05, 1.42836735e-01,
...      1.42836735e-01, 4.79122449e-05]
>>> sum(p)
1.0000003017347 # over tolerance limit
>>> np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)
Traceback (most recent call last):
  File "<pyshell#23>", line 1, in <module>
    np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)
  File "mtrand.pyx", line 1417, in mtrand.RandomState.choice (numpy\random\mtrand\mtrand.c:15985)
ValueError: probabilities do not sum to 1
With normalization:
>>> p = np.array(p)
>>> p /= p.sum() # normalize
>>> np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)
array([8, 4, 1, 6])
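The "close enough to 1" condition can be made explicit. The following is a minimal sketch, not NumPy's own check: the helper name `normalize_probabilities` and the tolerance `1e-6` are assumptions chosen for illustration. It rescales `p` only when the sum is already near 1 and raises otherwise, so a genuinely wrong distribution is not silently rescaled.

```python
import numpy as np


def normalize_probabilities(p, tol=1e-6):
    """Rescale p to sum to 1, but only if its sum is within tol of 1.

    Hypothetical helper; tol is an illustrative value, not NumPy's tolerance.
    """
    p = np.asarray(p, dtype=np.float64)
    total = p.sum()
    if not np.isclose(total, 1.0, rtol=0.0, atol=tol):
        raise ValueError(f"probabilities sum to {total!r}, too far from 1")
    return p / total


# Same values as in the example above (sum is about 1.0000003).
p = [1.42836755e-01, 1.42836735e-01, 1.42836735e-01, 1.42836735e-01,
     4.76122449e-05, 1.42836735e-01, 4.76122449e-05, 1.42836735e-01,
     1.42836735e-01, 4.79122449e-05]

p_norm = normalize_probabilities(p)
sample = np.random.choice(np.arange(1, 11), 4, p=p_norm, replace=False)
print(sample)
```

Raising on a large deviation is a design choice: dividing by the sum always "works", but it would also hide a bug that produced, say, probabilities summing to 0.9.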
Solution 2
Convert the probabilities to float64, then normalize:
p = np.asarray(p).astype('float64')
p = p / np.sum(p)
np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)
This was inspired by another post: How can I avoid value errors when using numpy.random.multinomial?
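One situation where the cast may matter is when the probabilities were computed in single precision. That is an assumption here, since the answer does not state the cause. The sketch below uses hypothetical weights (`w32`) to compare how far the sum is from 1 before and after casting to float64 and renormalizing.

```python
import numpy as np

# Hypothetical weights stored in single precision.
w32 = np.array([1, 2, 3, 4, 5, 6, 7], dtype=np.float32)
p32 = w32 / w32.sum()

# Cast first, then normalize, so the division itself runs in float64.
p64 = p32.astype(np.float64)
p64 = p64 / p64.sum()

# Deviation of each sum from 1, measured in float64.
dev32 = abs(float(p32.astype(np.float64).sum()) - 1.0)
dev64 = abs(float(p64.sum()) - 1.0)
print(p32.dtype, dev32)
print(p64.dtype, dev64)

sample = np.random.choice(np.arange(1, 8), 4, p=p64, replace=False)
print(sample)
```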
Solution 3
One way to see the difference is to print p with more digits:
numpy.set_printoptions(precision=15)
print(p)
This will perhaps show you that your 4.17187500e-05 is actually 4.17187500005e-05. See the manual for numpy.set_printoptions.
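As a small sketch of that check, the snippet below prints the same array at two precisions. The two-element array is hypothetical, built from the value mentioned above. It uses the `np.printoptions` context manager (available in reasonably recent NumPy releases) instead of `set_printoptions`, so the global print settings are restored afterwards.

```python
import numpy as np

# Hypothetical array whose first entry carries digits hidden at low precision.
p = np.array([4.17187500005e-05, 2.499375e-01])

with np.printoptions(precision=8):
    short = str(p)   # digits beyond the 8th decimal are rounded away
with np.printoptions(precision=15):
    full = str(p)    # the extra digits become visible

print(short)
print(full)
```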
Solution 4
ValueError: probabilities do not sum to 1
This is a known NumPy issue. The error comes from floating-point round-off: sometimes the probabilities sum to something like 0.9999999999997 or 1.0000000000003 instead of exactly 1, and that can break np.random.choice().
There is a workaround: np.random.multinomial(). This method is more lenient about the probabilities, which do not need to sum to exactly 1.0. From its documentation:
pvals : sequence of floats, length p Probabilities of each of the p different outcomes. These should sum to 1 (however, the last element is always assumed to account for the remaining probability, as long as sum(pvals[:-1]) <= 1).
For example, suppose I have some choices and the normalized_weights associated with them.
np.random.multinomial() draws 20 times according to normalized_weights and returns how many times each choice was picked.
choices = [......]
weights = np.array([......])
normalized_weights = weights / np.sum(weights)
number_of_choices = 20
resample_counts = np.random.multinomial(number_of_choices,
                                        normalized_weights)
chosen = []
resample_index = 0
for resample_count in resample_counts:
    for _ in range(resample_count):
        chosen.append(choices[resample_index])
    resample_index += 1
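A runnable variant of the same idea is sketched below. The `choices` and `weights` values are hypothetical stand-ins for the elided lists, and `np.repeat` is swapped in for the nested loops to expand the counts into the chosen items.

```python
import numpy as np

# Hypothetical inputs standing in for the elided choices and weights.
choices = np.array(["a", "b", "c", "d"])
weights = np.array([3.0, 1.0, 4.0, 2.0])
normalized_weights = weights / weights.sum()

number_of_choices = 20
# One count per choice; the counts add up to number_of_choices.
resample_counts = np.random.multinomial(number_of_choices, normalized_weights)

# np.repeat repeats choices[i] resample_counts[i] times.
chosen = np.repeat(choices, resample_counts)
print(resample_counts)
print(chosen)
```

Note that multinomial sampling draws with replacement, so the same choice can appear several times. It is therefore not a drop-in substitute for np.random.choice(..., replace=False).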
pd shah
Updated on June 07, 2022
Comments
-
pd shah about 2 years
How can I use np.random.choice here? There is a p that is calculated by some operation, like:
p=[ 1.42836755e-01, 1.42836735e-01 , 1.42836735e-01, 1.42836735e-01 , 4.76122449e-05, 1.42836735e-01 , 4.76122449e-05 , 1.42836735e-01, 1.42836735e-01, 4.76122449e-05]
Usually the sum of p is not exactly equal to 1:
>>> sum(p)
1.0000000017347
I want to make a random choice with probabilities p:
>>> np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)
array([4, 3, 2, 9])
This works here, but in the program it raises an error:
Traceback (most recent call last):
  indexs=np.random.choice(range(len(population)), population_number, p=p, replace=False)
  File "mtrand.pyx", line 1141, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:17808)
ValueError: probabilities do not sum to 1
If I print p in the program:
[ 4.17187500e-05 2.49937500e-01 4.16562500e-05 4.16562500e-05 2.49937500e-01 4.16562500e-05 4.16562500e-05 4.16562500e-05 2.49937500e-01 2.49937500e-01]
But it works in the Python shell with this p:
>>> p=[ 4.17187500e-05 , 2.49937500e-01 ,4.16562500e-05 , 4.16562500e-05, 2.49937500e-01 , 4.16562500e-05 , 4.16562500e-05 , 4.16562500e-05, 2.49937500e-01 ,2.49937500e-01]
>>> np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)
array([ 9, 10, 2, 5])
UPDATE: I have tested it with precision=15:
np.set_printoptions(precision=15)
print(p)
[ 2.499375625000002e-01 2.499375000000000e-01 2.499375000000000e-01 4.165625000000000e-05 4.165625000000000e-05 4.165625000000000e-05 4.165625000000000e-05 4.165625000000000e-05 2.499375000000000e-01 4.165625000000000e-05]
Testing:
>>> p=np.array([ 2.499375625000002e-01 ,2.499375000000000e-01 ,2.499375000000000e-01, 4.165625000000000e-05 ,4.165625000000000e-05, 4.165625000000000e-05, 4.165625000000000e-05 , 4.165625000000000e-05 , 2.499375000000000e-01, 4.165625000000000e-05])
>>> np.sum(p)
1.0000000000000002
How can I fix this so that I can use np.random.choice?
-
pd shah (over 6 years ago): Thanks. I added more detail to the post. How can I fix this problem?
-
pd shah (over 6 years ago): Thanks, but it does not work. ValueError: probabilities do not sum to 1. What should I do?
-
user2314737 (over 6 years ago): @pdshah have you tried normalizing the probabilities by p /= p.sum()?
-
pd shah (over 6 years ago): yes:
>>> p=np.array([0.1999600079984003, 0.1999600079984003, 0.1999600079984003, 3.9992001599680064e-05, 0.1999600079984003, 3.9992001599680064e-05, 3.9992001599680064e-05, 0.1999600079984003, 3.9992001599680064e-05, 3.9992001599680064e-05])
>>> np.sum(p)
0.99999999999999978
>>> p /= p.sum()
>>> np.sum(p)
1.0000000000000002
-
user2314737 (over 6 years ago): @pdshah ok the sum is still not exactly one, but does np.random.choice work?
-
Soid (over 3 years ago): It won't always add up
-
Michael Tamillow (almost 3 years ago): First thing I thought to do as well, but it did not work
-
Fırat Kıyak (over 2 years ago): This may not work due to round-off errors accumulated due to division. See my answer at stackoverflow.com/a/71400320/6087087 for a definitive solution.