Python: sorting an array with NaNs


Solution 1

Not sure if it can be done with numpy.sort, but you can use numpy.argsort for sure:

>>> arr
array([[ 105.,    4.],
       [  53.,  520.],
       [ 745.,  902.],
       [  19.,   nan],
       [ 184.,   nan],
       [  22.,   10.],
       [ 104.,   26.]])
>>> arr[np.argsort(arr[:,1])]
array([[ 105.,    4.],
       [  22.,   10.],
       [ 104.,   26.],
       [  53.,  520.],
       [ 745.,  902.],
       [  19.,   nan],
       [ 184.,   nan]])
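
For anyone who wants to reproduce this without the interactive session, here is a minimal self-contained sketch using the same data. The variable names `by_second`, `valid` and `valid_sorted` are made up for illustration. The second half is an optional variant that drops the NaN rows entirely instead of pushing them to the end, in case that is what "leaving the NaN values out" is meant to be.

```python
import numpy as np

# Same data as in the session above
arr = np.array([[105., 4.],
                [53., 520.],
                [745., 902.],
                [19., np.nan],
                [184., np.nan],
                [22., 10.],
                [104., 26.]])

# Sort rows by the second column; argsort places NaNs last
by_second = arr[np.argsort(arr[:, 1])]

# Optional variant: drop rows whose second column is NaN, then sort
valid = arr[~np.isnan(arr[:, 1])]
valid_sorted = valid[np.argsort(valid[:, 1])]

print(by_second)
print(valid_sorted)
```

The boolean mask creates a filtered copy, so the original `arr` is left untouched.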

Solution 2

You can create a masked array:

a = np.loadtxt('test.txt')

mask = np.isnan(a)
ma = np.ma.masked_array(a, mask=mask)

And then sort a using the masked array:

a[np.argsort(ma[:, 1])]
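
As a rough sketch of the same idea without needing test.txt, the example below builds a small array inline (the data and the names `order_last` and `order_first` are assumptions for illustration). It also shows the `endwith` argument of the masked array's `argsort` method, which controls whether masked entries go last or first.

```python
import numpy as np

# Small stand-in for the contents of test.txt (made-up data)
a = np.array([[105., 4.],
              [53., 520.],
              [19., np.nan],
              [22., 10.]])

# Mask the NaN entries, as in the answer above
ma = np.ma.masked_array(a, mask=np.isnan(a))

# endwith=True (the default) puts masked entries at the end
order_last = ma[:, 1].argsort(endwith=True)

# endwith=False puts masked entries at the start instead
order_first = ma[:, 1].argsort(endwith=False)

print(a[order_last])
print(a[order_first])
```

Indexing the plain array `a` with the resulting order keeps the NaNs visible in the output rather than showing them as masked.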

Solution 3

If you're using an older version of numpy and don't want to upgrade (or if you want code that supports older versions of numpy) you can do:

import numpy as np

def nan_argsort(a):
    temp = a.copy()
    temp[np.isnan(a)] = np.inf
    return temp.argsort()

# Avoid naming the result `sorted`, which would shadow the built-in
sorted_a = a[nan_argsort(a[:, 1])]

In newer versions of numpy (1.4.0 and later, according to the numpy.sort documentation), sort and argsort already place NaNs at the end. If you need to use Python's built-in sort for some reason, you can write your own comparison function as described in the other answers.
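
To check that the workaround and the built-in behaviour agree on your installation, something like the sketch below may help. The sample column and the names `builtin_order` and `manual_order` are assumptions for illustration, and `kind="stable"` is only used here to make the comparison deterministic.

```python
import numpy as np

def nan_argsort(a):
    # Replace NaNs with +inf in a copy so they sort last
    temp = a.copy()
    temp[np.isnan(a)] = np.inf
    return temp.argsort(kind="stable")

# Made-up column with two NaNs
col = np.array([4.0, 520.0, 902.0, np.nan, np.nan, 10.0, 26.0])

builtin_order = np.argsort(col, kind="stable")  # NaNs sorted to the end
manual_order = nan_argsort(col)

print(col[builtin_order])
print(col[manual_order])
```

One caveat with the `np.inf` replacement: if the data contains genuine infinite values, they become indistinguishable from the NaNs in the sort order.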

Solution 4

You can use a comparison function:

from math import isnan

def cmpnan(x, y):
    # Python 2 only: cmp() and sorted(cmp=...) were removed in Python 3
    if isnan(x[1]):
        return 1  # x is "larger"
    elif isnan(y[1]):
        return -1  # x is "smaller"
    else:
        return cmp(x[1], y[1])  # compare numbers

sorted(data, cmp=cmpnan)

see http://docs.python.org/2.7/library/functions.html#sorted
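
Since Python 3 has neither the `cmp` built-in nor the `cmp` argument to `sorted`, a key function is one possible replacement. This is only a sketch: `nan_last_key` and the sample `data` are made up for illustration, and the idea is close to the lambda suggested in the comments.

```python
import math

# Made-up rows; the second element may be NaN
data = [(105.0, 4.0), (53.0, 520.0), (19.0, float("nan")), (22.0, 10.0)]

def nan_last_key(row):
    """Sort key that orders by the second element and puts NaNs last."""
    value = row[1]
    if math.isnan(value):
        return (1, 0.0)   # group 1 sorts after every non-NaN row
    return (0, value)     # group 0: ordered by the actual value

result = sorted(data, key=nan_last_key)
print(result)
```

Returning a tuple avoids comparing against NaN at all, so the ordering stays well defined.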


Author: user3207120

Updated on September 14, 2022

Comments

  • user3207120
    user3207120 over 1 year

    Note: I'm using Python and numpy arrays.

    I have many arrays which all have two columns and many rows. There are some NaN values in the second column; the first column only has numbers.

    I would like to sort each array in increasing order according to the second column, leaving the NaN values out. It's a big dataset so I would rather not have to convert the NaN values into zeros or something.

    I'd like it to sort like so:

    105.  4.
    22.   10.
    104.  26.
    ...
    ...
    ...
    53.   520.
    745.  902.
    184.  nan
    19.   nan
    

    First I tried using fix_invalid which converts the NaNs into 1x10^20:

    #data.txt has one of the arrays with 2 columns and a bunch of rows.
    Data_0_30 = array(genfromtxt(fname='data.txt'))
    
    g = open("iblah.txt", "a") #saves to file
    
    def Sorted_i_M_W(mass):
        masked = ma.fix_invalid(mass)
        print  >> g, array(sorted(masked, key=itemgetter(1)))
    
    Sorted_i_M_W(Data_0_30)
    
    g.close()
    

    Or I replaced the function with something like this:

    def Sorted_i_M_W(mass):
        sortedmass = sorted( mass, key=itemgetter(1))
        print  >> g, array(sortedmass)
    

    For each attempt I got something like:

    ...
    [  4.46800000e+03   1.61472200e+11]
    [  3.72700000e+03   1.74166300e+11]
    [  4.91800000e+03   1.75502300e+11]
    [  6.43500000e+03              nan]
    [  3.95520000e+04   8.38907500e+09]
    [  3.63750000e+04   1.27625700e+10]
    [  2.08810000e+04   1.28578500e+10]
    ...
    

    The sorting restarts at the location of each NaN value.

    (With fix_invalid, the NaN in the excerpt above appears as 1.00000000e+20.) But I'd like the sorting to ignore the NaN values completely.

    What's the easiest way to sort this array the way I want?

    • BlackVegetable
      BlackVegetable over 10 years
      Have you tried using a filter() call to remove elements with nan before sorting the remainder of the list?
    • freude
      freude over 10 years
      In the latest version of numpy, the function sort can deal with nans in the way you are seeking. Here is the link docs.scipy.org/doc/numpy/reference/generated/numpy.sort.html
    • Bakuriu
      Bakuriu over 10 years
      Why not something like: key=lambda x: x[1] if x[1] == x[1] else float('+inf') to put the NaNs at the end? I believe this won't be that much slower in the end. However, if you can, just use numpy functions, which will be much faster.
  • hihell
    hihell almost 7 years
    Another problem worth noting is that np.argsort cannot sort an object array that contains np.nan. If an array has dtype == object, the np.nan values will not be placed correctly (and there is no warning).
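
One possible workaround for the object-dtype case mentioned in the last comment, assuming every element can be converted to a float, is to sort on a float copy and use the resulting order on the original array. The sample data and the names `obj_col`, `float_col` and `sorted_obj` are made up for illustration.

```python
import numpy as np

# Made-up object-dtype column containing NaNs
obj_col = np.array([4.0, np.nan, 10.0, 902.0, np.nan, 26.0], dtype=object)

# Convert to a float array so NaN-aware sorting applies
float_col = obj_col.astype(float)

# Order computed on the float copy, applied to the original array
order = np.argsort(float_col)
sorted_obj = obj_col[order]

print(sorted_obj)
```

This will raise an error if the object array holds values that cannot be converted to float, so it only fits data that is numeric apart from its dtype.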