Python: sorting an array with NaNs
Solution 1
Not sure if it can be done with numpy.sort, but you can use numpy.argsort for sure:
>>> arr
array([[ 105.,    4.],
       [  53.,  520.],
       [ 745.,  902.],
       [  19.,   nan],
       [ 184.,   nan],
       [  22.,   10.],
       [ 104.,   26.]])
>>> arr[np.argsort(arr[:,1])]
array([[ 105.,    4.],
       [  22.,   10.],
       [ 104.,   26.],
       [  53.,  520.],
       [ 745.,  902.],
       [  19.,   nan],
       [ 184.,   nan]])
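To illustrate why argsort is the convenient tool here, the sketch below contrasts it with np.sort on a small made-up array (the values are illustrative, not the asker's data). As far as I can tell, np.sort along axis 0 sorts each column independently, so rows do not stay together, while indexing with the argsort of the second column moves whole rows.

```python
import numpy as np

# Illustrative data: two columns, one NaN in the second column.
arr = np.array([[105., 4.],
                [53., 520.],
                [19., np.nan],
                [22., 10.]])

# np.sort along axis 0 sorts every column on its own,
# so the original row pairings are lost.
columns_sorted = np.sort(arr, axis=0)

# argsort of the second column gives a row order;
# indexing with it keeps each row intact.
rows_sorted = arr[np.argsort(arr[:, 1])]

print(columns_sorted)
print(rows_sorted)
```

Here rows_sorted keeps 105. paired with 4. and ends with the NaN row, which is what the question asks for.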
Solution 2
You can create a masked array:
a = np.loadtxt('test.txt')
mask = np.isnan(a)
ma = np.ma.masked_array(a, mask=mask)
And then sort a using the masked array:
a[np.argsort(ma[:, 1])]
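A self-contained version of the same idea might look like the sketch below. The array literal is a made-up stand-in for the contents of 'test.txt', and passing endwith=True explicitly (it is the default of the masked array's argsort method) is only there to make the intent visible.

```python
import numpy as np

# Illustrative data standing in for np.loadtxt('test.txt').
a = np.array([[105., 4.],
              [19., np.nan],
              [22., 10.],
              [184., np.nan],
              [53., 520.]])

# Mask every NaN entry of the array.
ma = np.ma.masked_array(a, mask=np.isnan(a))

# Argsort the masked second column; masked entries are placed last.
order = ma[:, 1].argsort(endwith=True)

sorted_a = a[order]
print(sorted_a)
```

The rows with a valid second column come out in increasing order, followed by the rows whose second column is NaN.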
Solution 3
If you're using an older version of numpy and don't want to upgrade (or if you want code that supports older versions of numpy) you can do:
import numpy as np

def nan_argsort(a):
    temp = a.copy()
    temp[np.isnan(a)] = np.inf  # treat NaN as +inf so it sorts last
    return temp.argsort()

sorted_a = a[nan_argsort(a[:, 1])]  # named sorted_a to avoid shadowing the built-in sorted()
In newer versions of numpy (the numpy.sort documentation says 1.4.0 and later), sort/argsort already place NaNs at the end. If you need to use Python's built-in sort for some reason, you can write your own comparison function as described in the other answers.
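A quick way to check this on your own installation is a sketch like the following, using a small made-up 1-D array:

```python
import numpy as np

# Illustrative values with two NaNs mixed in.
values = np.array([520., np.nan, 4., 26., np.nan, 10.])

# On a reasonably recent numpy, the NaNs end up after all the numbers.
sorted_values = np.sort(values)
order = np.argsort(values)

print(sorted_values)
print(values[order])
```

If the NaNs show up at the end of both printed arrays, the workaround function above is not needed.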
Solution 4
You can use a comparison function (Python 2 only, since cmp and the cmp= argument of sorted were removed in Python 3):
from math import isnan

def cmpnan(x, y):
    if isnan(x[1]):
        return 1  # x is "larger"
    elif isnan(y[1]):
        return -1  # x is "smaller"
    else:
        return cmp(x[1], y[1])  # compare numbers

sorted(data, cmp=cmpnan)
See http://docs.python.org/2.7/library/functions.html#sorted
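On Python 3, roughly the same approach should work through functools.cmp_to_key. The sketch below is an adaptation, not the original answer's code: the sample data is made up, and the extra branch for two NaNs is my own addition so that the comparison stays consistent.

```python
from functools import cmp_to_key
from math import isnan

def cmpnan(x, y):
    """Compare two rows by their second item, treating NaN as largest."""
    x_nan, y_nan = isnan(x[1]), isnan(y[1])
    if x_nan and y_nan:
        return 0   # two NaN rows compare as equal
    if x_nan:
        return 1   # x is "larger"
    if y_nan:
        return -1  # x is "smaller"
    # Python 3 has no cmp(); this expression gives -1, 0 or 1.
    return (x[1] > y[1]) - (x[1] < y[1])

# Illustrative rows of (first column, second column).
data = [(53.0, 520.0), (19.0, float("nan")), (105.0, 4.0), (22.0, 10.0)]
result = sorted(data, key=cmp_to_key(cmpnan))
print(result)
```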
user3207120
Updated on September 14, 2022
Comments
-
user3207120 over 1 year
Note: I'm using Python and numpy arrays.
I have many arrays which all have two columns and many rows. There are some NaN values in the second column; the first column only has numbers.
I would like to sort each array in increasing order according to the second column, leaving the NaN values out. It's a big dataset so I would rather not have to convert the NaN values into zeros or something.
I'd like it to sort like so:
105.  4.
22.   10.
104.  26.
...
...
...
53.   520.
745.  902.
184.  nan
19.   nan
First I tried using fix_invalid, which converts the NaNs into 1x10^20:
#data.txt has one of the arrays with 2 columns and a bunch of rows.
Data_0_30 = array(genfromtxt(fname='data.txt'))
g = open("iblah.txt", "a") #saves to file
def Sorted_i_M_W(mass):
    masked = ma.fix_invalid(mass)
    print >> g, array(sorted(masked, key=itemgetter(1)))
Sorted_i_M_W(Data_0_30)
g.close()
Or I replaced the function with something like this:
def Sorted_i_M_W(mass):
    sortedmass = sorted(mass, key=itemgetter(1))
    print >> g, array(sortedmass)
For each attempt I got something like:
...
[ 4.46800000e+03 1.61472200e+11]
[ 3.72700000e+03 1.74166300e+11]
[ 4.91800000e+03 1.75502300e+11]
[ 6.43500000e+03 nan]
[ 3.95520000e+04 8.38907500e+09]
[ 3.63750000e+04 1.27625700e+10]
[ 2.08810000e+04 1.28578500e+10]
...
At the location of the NaN value, the sorting starts over. (With fix_invalid, the NaN in the above excerpt shows up as a 1.00000000e+20 value.) But I'd like the sorting to ignore the NaN value completely.
What's the easiest way to sort this array the way I want?
-
BlackVegetable over 10 years
Have you tried using a filter() call to remove elements with nan before sorting the remainder of the list?
-
freude over 10 years
In the latest version of numpy, the function sort can deal with NaNs in the way you are seeking. Here is the link: docs.scipy.org/doc/numpy/reference/generated/numpy.sort.html
-
Bakuriu over 10 years
Why not something like key=lambda x: x[1] if x[1] == x[1] else float('+inf') to put the NaNs at the end? I believe this won't be that much slower in the end. However, if you can, just use numpy functions, which will be much faster.
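For what it's worth, that key function can be tried on a small list like this (the rows are made up for illustration):

```python
# Illustrative rows of (first column, second column).
data = [(53.0, 520.0), (19.0, float("nan")), (105.0, 4.0), (22.0, 10.0)]

# NaN is the only float that is not equal to itself, so x[1] == x[1]
# is False exactly for NaN rows; those rows get +inf as their sort key.
result = sorted(data, key=lambda x: x[1] if x[1] == x[1] else float("+inf"))
print(result)
```

One thing to keep in mind with this trick: rows whose second column really is +inf would get the same key as the NaN rows.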
-
-
hihell almost 7 years
Another problem worth noting is that np.argsort cannot sort an object array with np.nan in it. If an array has dtype == object, the np.nan will not be placed correctly (and there is no warning).
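One possible workaround, assuming every element of the object array can be converted to a float, is to cast the column to a float dtype before calling argsort. The array below is a made-up example:

```python
import numpy as np

# Hypothetical object-dtype column containing a NaN.
obj_col = np.array([520.0, np.nan, 4.0, 10.0], dtype=object)

# Convert to a float array first so that NaN ordering is well defined,
# then reuse the resulting order on the original object array.
float_col = obj_col.astype(float)
order = np.argsort(float_col)

print(obj_col[order])
```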