numpy 1.9.0: ValueError: probabilities do not sum to 1
I think 1.7e-6 is a large enough relative error to be worth complaining about. You can renormalize easily enough, though, if you're confident the error is negligible:
>>> probs = np.array(probs)
>>> probs /= probs.sum()
>>> probs.sum()
1.0
>>> samples = np.random.choice(arr, size=1000, replace=True, p=probs)
>>> samples[:5]
array([ 1.37635054, 1.1287515 , 1.7229892 , 19.8967587 , 2.07953181])
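If you want the "confident the error is negligible" part to be explicit rather than implied, one option is to wrap the renormalization in a small guard. This is only a sketch: `renormalize` and its `max_dev` threshold are hypothetical names chosen for illustration, not part of numpy, and the input values are made up.

```python
import numpy as np

def renormalize(probs, max_dev=1e-4):
    """Hypothetical helper: rescale probs so they sum to 1, but only if
    the original sum is already within max_dev of 1."""
    p = np.asarray(probs, dtype=np.float64)
    total = p.sum()
    if np.any(p < 0) or abs(total - 1.0) > max_dev:
        raise ValueError("probabilities are too far from a valid distribution")
    return p / total

# Made-up probabilities whose sum is slightly above 1.
raw = [0.2, 0.3, 0.50000173]
p = renormalize(raw)

arr = np.linspace(1., 100., len(p))
samples = np.random.choice(arr, size=1000, replace=True, p=p)
```

The threshold is a judgment call; pick whatever deviation you are actually willing to treat as rounding noise.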
Gabriel
Updated on June 11, 2022
Comments
-
Gabriel almost 2 years
I have a large code that at one point samples values from an array according to the probabilities taken from a probability density function (PDF).
To do this I use numpy.random.choice, which worked just fine up to and including numpy 1.8.0. Here's a MWE (the file pdf_probs.txt can be downloaded here):
import simplejson
import numpy as np

# Read probabilities from file.
f = open('pdf_probs.txt', 'r')
probs = simplejson.load(f)
f.close()

print sum(probs)  # <-- Not *exactly* 1. but very close: 1.00000173042

# Define array.
arr = np.linspace(1., 100., len(probs))

# Get samples using the probabilities in probs.
samples = np.random.choice(arr, size=1000, replace=True, p=probs)
The thing is that after testing it with numpy 1.9.0 the above code fails with the error:
Traceback (most recent call last):
  File "numpy_180_vs_190_np_random_choice.py", line 13, in <module>
    samples = np.random.choice(arr, size=1000, replace=True, p=probs)
  File "mtrand.pyx", line 1083, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:10106)
ValueError: probabilities do not sum to 1
The PDF probabilities will not sum to exactly 1, given the small deviations that appear when using very small floats.
From what I can gather, the previous version of numpy (1.8.0) apparently had a larger tolerance than the new 1.9.0 version, but I could be wrong.
Why does this work with numpy 1.8.0 but not with 1.9.0? How can I make my code work with the new 1.9.0 version?
-
Gabriel over 9 years
Thanks @DSM, that's a very simple solution that I didn't think of. Do you have any idea what changed from 1.8.0 to 1.9.0 to make the code no longer work?
-
jrubins almost 9 years
This isn't working for me; my probabilities are large integers. When I go through the step probs /= probs.sum(), it just creates an array of 0's, so my sum() is zero.
-
wflynny almost 9 years
@jrubins That's a result of integer division. If you do probs /= probs.sum().astype(float), you should be fine.
-
ldmtwo almost 5 years
Just a note for anyone still having trouble. Similar to the above, set the data type (dtype) of the source array to np.float64, not 32-bit float and obviously not int. With 32-bit floats, you can have an error of 1e-7 when you normalize (divide by the sum). This is a large enough error to cause numpy to raise the exception.
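To tie the last few comments together, here is a minimal sketch for integer weights. An in-place `/=` cannot change the dtype of an integer array, so the sketch converts the array itself to np.float64 first and only then divides. The `counts` values are made up for illustration.

```python
import numpy as np

# Made-up large integer weights, as in the integer-division comment above.
counts = np.array([250000, 500000, 1250000])

# Convert the array itself to float64 before dividing in place.
probs = counts.astype(np.float64)
probs /= probs.sum()

# Draw indices 0..len(probs)-1 according to the normalized weights.
samples = np.random.choice(len(probs), size=10, p=probs)
```

Writing `probs = counts / counts.sum()` instead also yields a float64 array on Python 3, since `/` there is true division.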
-
E. Körner about 3 years
For those with nested arrays, like np.array([[0.4, 0.5], [0.3, 0.7]]), axes have to be used to broadcast and compute it correctly: probs /= probs.sum(axis=1).astype(float)[:, np.newaxis]
I just wanted to add this, as I had to search and test some more to get it to work in my code.
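An alternative sketch of the same row-wise normalization uses keepdims=True, so the row sums keep a trailing axis and broadcast without the explicit np.newaxis indexing. The `weights` and `values` arrays are made up for illustration, and since np.random.choice takes a 1-D `p`, the sketch samples one row at a time.

```python
import numpy as np

# Made-up unnormalized weights, one distribution per row.
weights = np.array([[4, 5], [3, 7]])

# keepdims=True gives row sums of shape (2, 1), which broadcast across columns.
probs = weights / weights.sum(axis=1, keepdims=True)
row_sums = probs.sum(axis=1)

# Sample three values per row, each row using its own probabilities.
values = np.array([10.0, 20.0])
samples = np.array([np.random.choice(values, size=3, p=row) for row in probs])
```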