Ignoring -Inf values in arrays using numpy/scipy in Python

Solution 1

Use masked arrays:

>>> import numpy
>>> a = numpy.array([2, 0, 1.5, -3])
>>> b = numpy.ma.log(a)
>>> b
masked_array(data = [0.69314718056 -- 0.405465108108 --],
             mask = [False  True False  True],
       fill_value = 1e+20)

>>> b.sum()
1.0986122886681096
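
Since the question asks about summing over the columns of an NxM array, here is a minimal sketch of the same masked-array idea applied along an axis. The 3x2 input is made up for illustration; only numpy.ma.log and the axis argument of sum() are used.

```python
import numpy as np

# Hypothetical 3x2 input: zeros and negatives have no finite real logarithm.
data = np.array([[1.0, 0.0],
                 [np.e, -3.0],
                 [np.e ** 2, np.e]])

logged = np.ma.log(data)        # entries <= 0 come back masked
col_sums = logged.sum(axis=0)   # masked entries are skipped in each column
print(col_sums)
```

If every entry in a column is masked, the corresponding element of the result is itself masked, so you may want to call .filled() with a value of your choice before using it further.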

Solution 2

The easiest way to do this is to use numpy.ma.masked_invalid():

import numpy
a = numpy.log(numpy.arange(15))  # log(0) is -inf
a.sum()
# -inf
numpy.ma.masked_invalid(a).sum()
# 25.19122118273868
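
A short sketch of how masked_invalid() might be used on a 2-D array, assuming you also want to know how many entries survived in each column. The input values are hypothetical, and np.errstate is used only to silence the warnings that np.log raises for zero and negative inputs.

```python
import numpy as np

# Hypothetical 3x2 input containing a zero and a negative value.
values = np.array([[1.0, 4.0],
                   [0.0, -2.0],
                   [np.e, 1.0]])

with np.errstate(divide="ignore", invalid="ignore"):
    logged = np.log(values)           # log(0) -> -inf, log(-2) -> nan

masked = np.ma.masked_invalid(logged)  # masks nan as well as +/-inf
col_sums = masked.sum(axis=0)          # column sums over the valid entries
n_valid = masked.count(axis=0)         # number of unmasked entries per column
print(col_sums, n_valid)
```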

Solution 3

An alternative to using masked arrays:

import numpy as np
myarray = np.array([2, 0, 1.5, -3])
mylogarray = np.log(myarray) # The log of negative numbers is nan, 0 is -inf
summed = mylogarray[np.isfinite(mylogarray)].sum() # isfinite will exclude inf and nan
print(f'Sum of logged array is: {summed}')
# Sum of logged array is: 1.0986122886681096
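
One caveat: boolean indexing flattens a 2-D array, so the snippet above gives a single total rather than per-column sums. A possible workaround, sketched here with made-up data, is to replace the non-finite entries with 0 using np.where so the shape is preserved.

```python
import numpy as np

# Hypothetical 3x2 input; column 0 holds the values from the example above.
myarray = np.array([[2.0, 0.0],
                    [1.5, -3.0],
                    [1.0, 4.0]])

with np.errstate(divide="ignore", invalid="ignore"):
    mylogarray = np.log(myarray)

# Swap nan and +/-inf for 0 so they contribute nothing to the sum.
finite_only = np.where(np.isfinite(mylogarray), mylogarray, 0.0)
col_sums = finite_only.sum(axis=0)
print(col_sums)
```

This is fully vectorized, but note that a column with no finite entries sums to 0 here, which may or may not be what you want.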

Solution 4

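You can index your matrix and filter out the infinite entries (written for Python 3; np.Inf was removed in NumPy 2.0, so np.inf is used instead):

import numpy as np
matrix = np.array([[1., 2., 3., np.inf], [4., 5., 6., np.inf], [7., 8., 9., np.inf]])
print(matrix[:, 1])                                       # second column
print(sum(filter(lambda x: x != np.inf, matrix[:, 1])))   # sum of its finite entries
print(matrix[1, :])                                       # second row
print(sum(filter(lambda x: x != np.inf, matrix[1, :])))   # sum of its finite entries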

Solution 5

Use a filter():

>>> array
array([  1.,   2.,   3., -Inf])
>>> sum(filter(lambda x: x != float('-inf'), array))
6.0
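
For comparison, a vectorized version of the same filter might look like the sketch below. It assumes a 1-D array like the one above and only drops -inf, leaving +inf and nan in place.

```python
import numpy as np

array = np.array([1.0, 2.0, 3.0, -np.inf])

# Boolean indexing keeps only the entries that are not -inf.
total = array[array != -np.inf].sum()

# The same selection written with np.isneginf.
total_alt = array[~np.isneginf(array)].sum()
print(total, total_alt)
```

This avoids the Python-level loop that filter() performs over the array elements.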
Author by

Admin

Updated on October 05, 2020

Comments

  • Admin
    Admin over 3 years

    I have an NxM array in numpy that I would like to take the log of, and ignore entries that were negative prior to taking the log. When I take the log of negative entries, it returns -Inf, so I will have a matrix with some -Inf values as a result. I then want to sum over the columns of this matrix, but ignoring the -Inf values -- how can I do this?

    For example,

    mylogarray = log(myarray)
    # take sum, but ignore -Inf?
    sum(mylogarray, 0)
    

    I know there's nansum and I need the equivalent, something like infsum.

    Thanks.

  • Admin
    Admin over 13 years
    Is this considered a vectorized operation? Is there a more efficient way? I need to do this many times in my code and wanted a vectorized approach
  • moinudin
    moinudin over 13 years
    Are you asking if this is done in-place with iterators? No. Is there a more efficient way? AFAIK, you'd have to loop through the array as there's no filter function that returns an iterator, unless you write one.
  • Admin
    Admin over 13 years
    I don't think the filter code works for NxM arrays. It seems to only work for 1xM vectors.
  • Admin
    Admin over 13 years
    can you please expand on this? I don't understand the example. How did you initialize the masked array above?
  • Joe Kington
    Joe Kington over 13 years
    @user248237 - The numpy.ma.log, etc., functions will automatically create a masked array in which anything that results in an inf or nan is masked. This is a bit less explicit, though, so you can instead do a = np.ma.masked_where(a == np.inf, a) and then just do b = np.log(a) (or any other function). Alternatively, you can avoid masked arrays and just do np.log(a[a != np.inf]).sum(). (You can index by boolean arrays; it's much cleaner and faster than the filter-based answers.)
  • Joe Kington
    Joe Kington over 13 years
    The "numpythonic" way to do filter(lambda x: x != float('-inf'), array) is just array[array != -np.inf]. Using list comprehensions, filter, etc., is much slower on numpy arrays than it is on lists. Because of that, numpy arrays have a number of facilities to avoid explicitly looping through and operating on each element.
  • Philipp
    Philipp over 13 years
    @user248237 I didn't initialize the masked array explicitly. a is just a normal, non-masked array. ma.log masks all values where the (real) logarithm is undefined. Then the resulting masked array b is treated roughly as if the masked entries weren't there.
  • kilojoules
    kilojoules about 8 years
    I got AttributeError: 'SingleBlockManager' object has no attribute 'log'
  • Qin Heyang
    Qin Heyang almost 3 years
    If the sum still comes out as inf even with the mask, try changing the dtype to np.float64.
  • igorkf
    igorkf almost 3 years
    This helped me to filter out np.inf from mean calculations...thanks!