numpy 1.9.0: ValueError: probabilities do not sum to 1

python arrays numpy random floating-accuracy

12,379

I think 1.7e-6 is a large enough relative error to be worth complaining about. You can renormalize easily enough, though, if you're confident the error is negligible:

>>> probs = np.array(probs)
>>> probs /= probs.sum()
>>> probs.sum()
1.0
>>> samples = np.random.choice(arr, size=1000, replace=True, p=probs)
>>> samples[:5]
array([  1.37635054,   1.1287515 ,   1.7229892 ,  19.8967587 ,   2.07953181])
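
If you want the renormalization to fail loudly when the sum is further off than you expect, a guarded version is one option. This is only a sketch: `renormalize` and `max_abs_error` are hypothetical names, not part of numpy, and the default threshold and the sample values are assumptions chosen for illustration.

```python
import numpy as np

def renormalize(probs, max_abs_error=1e-5):
    """Return probs scaled to sum to 1, refusing if the sum is too far off.

    Hypothetical helper for illustration; not a numpy function.
    """
    p = np.asarray(probs, dtype=np.float64)
    if p.ndim != 1 or p.size == 0:
        raise ValueError("probs must be a non-empty 1-D sequence")
    if not np.all(np.isfinite(p)) or np.any(p < 0):
        raise ValueError("probs must be finite and non-negative")
    total = p.sum()
    if abs(total - 1.0) > max_abs_error:
        raise ValueError("sum of probs is %r, too far from 1" % total)
    return p / total

# Illustrative values only; their sum is off from 1 by about 2e-6.
probs = [0.2, 0.3, 0.500002]
p = renormalize(probs)
arr = np.linspace(1., 100., len(p))
samples = np.random.choice(arr, size=1000, replace=True, p=p)
```

The threshold makes the "confident the error is negligible" judgment explicit, so a genuinely broken PDF is not silently rescaled.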


Author by

Gabriel

Updated on June 11, 2022

Comments

Gabriel almost 2 years
I have a large code that at one point samples values from an array according to the probabilities taken from a probability density function (PDF).

To do this I use numpy.random.choice, which worked just fine up to and including numpy 1.8.0. Here's a MWE (the file pdf_probs.txt can be downloaded here):
```
import simplejson
import numpy as np

# Read probabilities from file.
with open('pdf_probs.txt', 'r') as f:
    probs = simplejson.load(f)

print(sum(probs))  # <-- Not *exactly* 1. but very close: 1.00000173042
# Define array.
arr = np.linspace(1., 100., len(probs))

# Get samples using the probabilities in probs.
samples = np.random.choice(arr, size=1000, replace=True, p=probs)
```
The thing is that after testing it with numpy 1.9.0 the above code fails with the error:
```
Traceback (most recent call last):
  File "numpy_180_vs_190_np_random_choice.py", line 13, in <module>
    samples = np.random.choice(arr, size=1000, replace=True, p=probs)
  File "mtrand.pyx", line 1083, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:10106)
ValueError: probabilities do not sum to 1
```
The PDF probabilities will not sum to exactly 1.0, given the small rounding deviations that appear when working with very small floats.
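
As a rough illustration of the kind of deviation meant here, the sketch below uses only the standard library. The list of ten 0.1 values is an assumption chosen for the demo, not the data in pdf_probs.txt, and the deviation it produces is far smaller than the 1.7e-6 seen with the real file.

```python
import math

# Ten copies of 0.1 should sum to 1, but plain left-to-right addition
# accumulates binary rounding error.
values = [0.1] * 10
naive = sum(values)
accurate = math.fsum(values)  # error-compensated summation from the stdlib

print(naive == 1.0)      # False
print(accurate == 1.0)   # True
print(abs(naive - 1.0))  # on the order of 1e-16
```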

From what I can gather the previous version of numpy (1.8.0) apparently had a larger tolerance than the new 1.9.0 version, but I could be wrong.

Why does this work with numpy 1.8.0 but not with 1.9.0? How can I make my code work with the new 1.9.0 version?
Gabriel over 9 years

Thanks @DSM, that's a very simple solution that I didn't think of. Do you have any idea what changed from 1.8.0 to 1.9.0 to make the code no longer work?
jrubins almost 9 years

This isn't working for me; my probabilities are large integers. When I go through the step probs /= probs.sum(), it just creates an array of 0's, so my sum() is zero.
wflynny almost 9 years

@jrubins That's a result of integer division. If you do probs /= probs.sum().astype(float), you should be fine.
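
A minimal sketch of the integer case, assuming the weights live in an integer numpy array (the values below are hypothetical). It converts the array itself to float before dividing, because an in-place division writes its result back into the array's own dtype; with an integer array that can truncate or raise a casting error, depending on the numpy version.

```python
import numpy as np

# Hypothetical integer weights standing in for "large integer" probabilities.
weights = np.array([300, 500, 200])

# Convert the array itself to float first, then normalize in place.
probs = weights.astype(np.float64)
probs /= probs.sum()

print(probs.dtype, probs.sum())
```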
ldmtwo almost 5 years

Just a note for anyone still having trouble. Similar to above, set the data type (dtype) of the source array to np.float64, not 32-bit float and obviously not int. With 32-bit floats, you can have an error of 1e-7 when you normalize (divide by the sum). This is a large enough error to cause numpy to raise the exception.
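
To see the difference for yourself, something like the following sketch compares the two dtypes. The random weights and the fixed seed are assumptions for the demo; the exact magnitudes depend on the data, and whether a given error trips the check depends on the numpy version.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed so the sketch is repeatable
raw = rng.rand(10000)           # hypothetical unnormalized weights

p32 = raw.astype(np.float32)
p32 /= p32.sum()                # normalized in single precision

p64 = raw.astype(np.float64)
p64 /= p64.sum()                # normalized in double precision

# Measure both sums in double precision so only the stored values differ.
err32 = abs(p32.astype(np.float64).sum() - 1.0)
err64 = abs(p64.sum() - 1.0)
print(err32, err64)
```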
E. Körner about 3 years

For those with nested arrays, like np.array([[0.4, 0.5], [0.3, 0.7]]), axes have to be used to broadcast and compute it correctly: probs /= probs.sum(axis=1).astype(float)[:, np.newaxis]. I just wanted to add this, as I had to search and test some more to get it to work in my code.
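
An equivalent way to write the row-wise normalization, sketched here with keepdims=True instead of np.newaxis. The array `arr` and the sample size are hypothetical, and since np.random.choice takes a 1-D p, the sketch samples one row at a time.

```python
import numpy as np

# The nested example from the comment above; the first row sums to 0.9,
# the second to 1.0.
probs = np.array([[0.4, 0.5], [0.3, 0.7]], dtype=np.float64)

# keepdims=True keeps the row sums as a column of shape (2, 1), so the
# division broadcasts across each row.
probs /= probs.sum(axis=1, keepdims=True)

# Sample separately for each row of probabilities.
arr = np.array([10., 20.])  # hypothetical values to sample from
samples = [np.random.choice(arr, size=5, p=row) for row in probs]
```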