numpy 1.9.0: ValueError: probabilities do not sum to 1

I think 1.7e-6 is a large enough relative error to be worth complaining about. You can renormalize easily enough, though, if you're confident the error is negligible:

>>> probs = np.array(probs)
>>> probs /= probs.sum()
>>> probs.sum()
1.0
>>> samples = np.random.choice(arr, size=1000, replace=True, p=probs)
>>> samples[:5]
array([  1.37635054,   1.1287515 ,   1.7229892 ,  19.8967587 ,   2.07953181])
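
The renormalization above can be wrapped in a small guard that only rescales when the sum is already close to 1. This is a minimal sketch, not numpy's own check: the helper name `normalized_probs`, the `max_deviation` threshold of 1e-4, and the sample values in `raw` are all assumptions made up for illustration.

```python
import numpy as np


def normalized_probs(p, max_deviation=1e-4):
    """Return p rescaled to sum to 1, refusing if the sum is far from 1.

    Hypothetical helper for this sketch; the threshold is an arbitrary choice.
    """
    p = np.asarray(p, dtype=np.float64)
    total = p.sum()
    # A large deviation usually points to a bug upstream, not rounding error.
    if abs(total - 1.0) > max_deviation:
        raise ValueError("probabilities sum to %r, too far from 1" % total)
    return p / total


# Made-up probabilities whose sum is slightly above 1.
raw = np.array([0.2, 0.3, 0.5000017])
probs = normalized_probs(raw)

arr = np.linspace(1., 100., len(probs))
samples = np.random.choice(arr, size=1000, replace=True, p=probs)
```

Keeping the check separate from the rescaling means a genuinely broken PDF still raises an error instead of being silently "fixed".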
Author by

Gabriel

Updated on June 11, 2022

Comments

  • Gabriel
    Gabriel almost 2 years

    I have a large program that at one point samples values from an array according to probabilities taken from a probability density function (PDF).

    To do this I use numpy.random.choice, which worked fine up to and including numpy 1.8.0. Here's an MWE (the file pdf_probs.txt can be downloaded here):

    import simplejson
    import numpy as np
    
    # Read probabilities from file.
    f = open('pdf_probs.txt', 'r')
    probs = simplejson.load(f)
    f.close()
    
    print(sum(probs))  # <-- Not *exactly* 1. but very close: 1.00000173042
    # Define array.
    arr = np.linspace(1., 100., len(probs))
    
    # Get samples using the probabilities in probs.
    samples = np.random.choice(arr, size=1000, replace=True, p=probs)
    

    With numpy 1.9.0, however, the above code fails with the error:

    Traceback (most recent call last):
      File "numpy_180_vs_190_np_random_choice.py", line 13, in <module>
        samples = np.random.choice(arr, size=1000, replace=True, p=probs)
      File "mtrand.pyx", line 1083, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:10106)
    ValueError: probabilities do not sum to 1
    

    The PDF probabilities will not sum to exactly 1.0, given the small rounding errors that accumulate when adding very small floats.

    From what I can gather the previous version of numpy (1.8.0) apparently had a larger tolerance than the new 1.9.0 version, but I could be wrong.

    Why does this work with numpy 1.8.0 but not with 1.9.0? How can I make my code work with the new 1.9.0 version?

  • Gabriel
    Gabriel over 9 years
    Thanks @DSM, that's a very simple solution that I didn't think of. Do you have any idea what changed from 1.8.0 to 1.9.0 to make the code no longer work?
  • jrubins
    jrubins almost 9 years
    This isn't working for me; my probabilities are large integers. When I run the step probs /= probs.sum(), it just creates an array of zeros, so my sum() is zero.
  • wflynny
    wflynny almost 9 years
    @jrubins That's a result of integer division. If you do probs = probs / probs.sum().astype(float), you should be fine.
  • ldmtwo
    ldmtwo almost 5 years
    Just a note for anyone still having trouble. Similar to the above, set the data type (dtype) of the source array to np.float64, not 32-bit float and obviously not int. With 32-bit float, you can have an error of 1e-7 when you normalize (divide by the sum). This is a large enough error to cause numpy to raise the exception.
  • E. Körner
    E. Körner about 3 years
    For those with nested arrays, like np.array([[0.4, 0.5], [0.3, 0.7]]), axes have to be used to broadcast and compute it correctly: probs /= probs.sum(axis=1).astype(float)[:, np.newaxis]. I just wanted to add this, as I had to search and test some more to get it to work in my code.
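
The dtype and axis points raised in the comments can be combined in one place. The following is a minimal sketch under stated assumptions: `normalize_rows` is a hypothetical helper name, not a numpy function, and the weights are made-up values. It casts to float64 before dividing (an integer array cannot store fractional results) and uses `keepdims=True` so the same code handles both 1-D and nested inputs.

```python
import numpy as np


def normalize_rows(weights):
    """Scale the last axis of `weights` so each row sums to 1.

    Hypothetical helper for this sketch, not part of numpy.
    """
    # Cast to float64 first: integers cannot hold fractions, and the comments
    # report that float32 can leave too large an error in the sum.
    w = np.asarray(weights, dtype=np.float64)
    # keepdims=True keeps the summed axis so the division broadcasts per row.
    return w / w.sum(axis=-1, keepdims=True)


counts = np.array([120, 30, 50])       # integer weights, 1-D
probs = normalize_rows(counts)

nested = np.array([[4, 5], [3, 7]])    # one distribution per row, 2-D
row_probs = normalize_rows(nested)

sample = np.random.choice(len(counts), size=10, p=probs)
```

Returning a new float64 array, rather than dividing in place, avoids depending on the dtype of whatever array was passed in.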