"isnotnan" functionality in numpy, can this be more pythonic?


Solution 1

a = a[~np.isnan(a)]
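
As a minimal sketch of what this one-liner does, assuming the three-element array from the question: np.isnan builds a boolean mask, ~ flips it element-wise (the same result as np.invert on a boolean array), and boolean indexing keeps the entries where the mask is True. The names mask and filtered are only illustrative.

```python
import numpy as np

a = np.array([np.nan, 1, 2])

mask = ~np.isnan(a)   # element-wise NOT of the NaN mask
filtered = a[mask]    # boolean indexing keeps entries where mask is True

print(mask)
print(filtered)
```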

Solution 2

You are currently testing for anything that is not NaN, and mtrw has the right way to do this. If you want to test for finite numbers (neither NaN nor INF), you don't need an inversion and can use:

np.isfinite(a)

This is more pythonic and native and is an easy read. In my experience, when you want to avoid NaN you often want to avoid INF as well.

Just thought I'd toss that out there for folks.
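
To make the difference concrete, here is a small sketch comparing the two masks. The sample array b is made up for illustration and is not from the question.

```python
import numpy as np

# Illustrative data: one NaN, both infinities, and two ordinary values.
b = np.array([np.nan, np.inf, -np.inf, 1.0, 2.0])

not_nan = b[~np.isnan(b)]    # drops NaN only; +inf and -inf remain
finite = b[np.isfinite(b)]   # drops NaN and both infinities

print(not_nan)
print(finite)
```

Which one is appropriate depends on whether infinities are meaningful in your data.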

Solution 3

I'm not sure whether this is more or less pythonic...

a = [i for i in a if not np.isnan(i)]
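
One caveat with any loop-based filter, sketched below using the array from the question: an identity test against np.nan does not remove NaN from array elements, because iterating an ndarray yields fresh np.float64 objects rather than the np.nan object itself. A value test with np.isnan does work, though the result is a Python list rather than an ndarray. The names by_identity and by_isnan are only illustrative.

```python
import numpy as np

a = np.array([np.nan, 1, 2])

# Identity test: each element is a new np.float64, never the np.nan object,
# so nothing is filtered out.
by_identity = [i for i in a if i is not np.nan]

# Value test: NaN is removed, but the result is a plain list.
by_isnan = [i for i in a if not np.isnan(i)]

print(len(by_identity))
print(by_isnan)
```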

Solution 4

To get array([ 1., 2.]) from an array arr = np.array([np.nan, 1, 2]), you can do:

arr[~np.isnan(arr)]

OR

arr[arr == arr]

(This works because np.nan == np.nan is False.)
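
A short sketch of the self-comparison trick, with an extra np.inf added to the array purely for illustration: NaN is the only value that compares unequal to itself, so arr == arr gives the same mask as ~np.isnan(arr), and infinities are kept.

```python
import numpy as np

# The trailing np.inf is an illustrative addition to the original example.
arr = np.array([np.nan, 1, 2, np.inf])

self_equal = arr == arr   # False only where arr is NaN
kept = arr[self_equal]    # NaN removed; inf is still present

print(self_equal)
print(kept)
```

The np.isnan form states the intent more directly, which may matter to readers unfamiliar with NaN comparison semantics.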

AnalyticsBuilder
Author by

AnalyticsBuilder

Building predictive analytics systems, sinking my teeth into applied statistics.

Updated on October 25, 2020

Comments

  • AnalyticsBuilder
    AnalyticsBuilder over 3 years

    I need a function that returns non-NaN values from an array. Currently I am doing it this way:

    >>> a = np.array([np.nan, 1, 2])
    >>> a
    array([ NaN,   1.,   2.])
    
    >>> np.invert(np.isnan(a))
    array([False,  True,  True], dtype=bool)
    
    >>> a[np.invert(np.isnan(a))]
    array([ 1.,  2.])
    

    Python: 2.6.4 numpy: 1.3.0

    Please share if you know a better way. Thank you.

  • Charlie Haley
    Charlie Haley over 8 years
    Note: If you want to use isnotnan for filtering pandas, this is the way to go.
  • Josh D.
    Josh D. about 6 years
    @CharlieHaley wouldn't pd.notnull() be a much better option for pandas?
  • Ezekiel Kruglick
    Ezekiel Kruglick about 6 years
    @JoshD. I checked the code and pd.notnull() is for testing objects instead of numeric values, returning negative if an object in an object array is not an instance of an object. It will be slower than np.isfinite() but is able to handle arbitrary object arrays (e.g. arrays of lists). Neat find, and a good idea if your array might include arbitrary objects. I think if you can be confident your array is generally numeric except for NaN and INF then np.isfinite would be faster, so depends on use case. Thanks for bringing that up, I don't think it was around when the answer was posted.
  • Josh D.
    Josh D. about 6 years
    @EzekielKruglick if the data is already in pandas, not only is pandas actually faster, but it is more functional as well, given that it includes an index you can use to more easily join on: gist.github.com/jaypeedevlin/fdfb88f6fd1031a819f1d46cb36384da
  • Josh D.
    Josh D. about 6 years
    I think leave it in the comments - the original question is not about pandas.
  • Philip Kahn
    Philip Kahn about 6 years
    @JoshD. that's incorrect, Numpy is faster. I commented on your Gist: gist.github.com/jaypeedevlin/… . Basically, you did it wrong -- you're performing the operation on the Pandas object, rather than doing it on the ndarray. Performing the operation on the ndarray is about 25x faster.
  • Josh D.
    Josh D. about 6 years
    @philipKahn Hmm, looks like I did make an error. I was imagining that numpy would cast to an ndarray before it did the operations, so that .values was unnecessary - live and learn!
  • roganjosh
    roganjosh over 3 years
    It's not appropriate for numpy arrays. Not only do you now get a list back (and thus fundamentally change the nature of the object returned), but this runs in a Python loop and will be orders of magnitude slower than a numpy method. I do not recommend this at all.