Is there a way to get the index of the median in python in one command?
Solution 1
A quick approximation (exact when len(data) is odd):
numpy.argsort(data)[len(data)//2]
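A small sketch of why this is only an approximation. The helper name `argsort_median_index` and the sample arrays are illustrative assumptions, not part of the original answer. For odd lengths the selected element is the median itself; for even lengths it is the upper of the two middle elements rather than their mean.

```python
import numpy as np

def argsort_median_index(data):
    """Index of the middle element in sorted order (upper middle for even lengths)."""
    data = np.asarray(data)
    return int(np.argsort(data)[len(data) // 2])

odd = np.array([3, 9, 5, 1, 15])
even = np.array([3, 9, 5, 1, 15, 12])

print(argsort_median_index(odd), np.median(odd))    # 2 5.0
print(argsort_median_index(even), np.median(even))  # 1 7.0
```

In the even case the returned index points at 9, while `np.median` reports 7.0, a value that is not in the array.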
Solution 2
This is an old question, but I found a nice way to do it:
import random
import numpy as np
# some random list with 20 elements
a = [random.random() for _ in range(20)]
# find the median index of a
# (NumPy 1.22+ renames the interpolation keyword to method='nearest')
medIdx = a.index(np.percentile(a, 50, interpolation='nearest'))
The neat trick here is percentile's built-in option for nearest interpolation, which returns a "real" median value taken from the list, so it is safe to search for it afterwards.
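The same idea can be sketched for NumPy arrays. The helper name `median_index_nearest` is hypothetical, and the `try`/`except` fallback is an assumption about which NumPy version is installed: releases from 1.22 on accept `method='nearest'`, while older ones only know `interpolation='nearest'`.

```python
import numpy as np

def median_index_nearest(values):
    """Index of the element that percentile(..., 'nearest') picks as the median."""
    arr = np.asarray(values, dtype=float)
    try:
        # NumPy >= 1.22 spelling of the keyword
        nearest = np.percentile(arr, 50, method="nearest")
    except TypeError:
        # Older NumPy releases only accept the previous keyword name
        nearest = np.percentile(arr, 50, interpolation="nearest")
    # 'nearest' returns an actual element, so an exact comparison is safe
    return int(np.flatnonzero(arr == nearest)[0])

print(median_index_nearest([3, 9, 5, 1, 15]))  # 2
```

For even-length input the result points at one of the two middle elements; which one depends on how NumPy rounds the half-way position.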
Solution 3
In general, this is an ill-posed question because an array does not necessarily contain its own median for numpy's definition of the median. For example:
>>> np.median([1, 2])
1.5
But when the length of the array is odd, the median will generally be in the array, so asking for its index does make sense:
>>> np.median([1, 2, 3])
2.0
For odd-length arrays, an efficient way to determine the index of the median value is by using the np.argpartition function. For example:
import numpy as np
def argmedian(x):
    return np.argpartition(x, len(x) // 2)[len(x) // 2]
# Works for odd-length arrays, where the median is in the array:
x = np.random.rand(101)
print("median in array:", np.median(x) in x)
# median in array: True
print(x[argmedian(x)], np.median(x))
# 0.5819150016674371 0.5819150016674371
# Doesn't work for even-length arrays, where the median is not in the array:
x = np.random.rand(100)
print("median in array:", np.median(x) in x)
# median in array: False
print(x[argmedian(x)], np.median(x))
# 0.6116799104572843 0.6047559243909065
This is quite a bit faster than the accepted sort-based solution as the size of the array grows:
x = np.random.rand(1000)
%timeit np.argsort(x)[len(x)//2]
# 10000 loops, best of 3: 25.4 µs per loop
%timeit np.argpartition(x, len(x) // 2)[len(x) // 2]
# 100000 loops, best of 3: 6.03 µs per loop
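For even-length arrays, one option raised in the comments is a flag that selects the lower or the upper of the two middle elements. A minimal sketch under that assumption follows; the `side` parameter is hypothetical and not part of NumPy, and NaN values are not handled.

```python
import numpy as np

def argmedian(x, side="lower"):
    """Index of the median element of a 1-D array.

    For even lengths, `side` (a hypothetical flag) picks the lower or the
    upper of the two middle elements. For odd lengths both give the same index.
    """
    x = np.asarray(x)
    if x.ndim != 1 or x.size == 0:
        raise ValueError("expected a non-empty 1-D array")
    if side not in ("lower", "upper"):
        raise ValueError("side must be 'lower' or 'upper'")
    k = (x.size - 1) // 2 if side == "lower" else x.size // 2
    # argpartition places the index of the k-th smallest element at position k
    return int(np.argpartition(x, k)[k])

x = np.array([3, 9, 5, 1, 15, 12])
print(argmedian(x, "lower"), argmedian(x, "upper"))  # 2 1
```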
Solution 4
You can pair each element with its index (zip), sort, and return the middle element or the two middle elements; however, sorting is O(n log n). The following method is O(n) in terms of time complexity.
import numpy as np
def arg_median(a):
    if len(a) % 2 == 1:
        # Odd length: the median is an element of the array.
        return np.where(a == np.median(a))[0][0]
    else:
        # Even length: locate the two middle elements.
        l, r = len(a) // 2 - 1, len(a) // 2
        left = np.partition(a, l)[l]
        right = np.partition(a, r)[r]
        return [np.where(a == left)[0][0], np.where(a == right)[0][0]]
print(arg_median(np.array([3, 9, 5, 1, 15])))
# sorted: 1 3 5 9 15, median=5, index=2
print(arg_median(np.array([3, 9, 5, 1, 15, 12])))
# sorted: 1 3 5 9 12 15, middle values=5,9, indices=2,1
Output:
2
[2, 1]
The idea is that if there is only one median (the array has an odd length), the function returns the index of the median. If two elements have to be averaged (the array has an even length), it returns the indices of these two elements in a list.
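The function above partitions the array twice and then looks the values up with np.where, which returns the same index twice when the two middle values are equal. A possible variant, sketched here with the hypothetical name `arg_median_indices`, asks np.argpartition for both middle positions in a single call and reads the indices off directly.

```python
import numpy as np

def arg_median_indices(a):
    """Indices of the median element (odd length) or of the two middle elements (even length)."""
    a = np.asarray(a)
    n = a.size
    if a.ndim != 1 or n == 0:
        raise ValueError("expected a non-empty 1-D array")
    lo, hi = (n - 1) // 2, n // 2
    if lo == hi:
        # Odd length: a single middle position
        return [int(np.argpartition(a, lo)[lo])]
    # Even length: argpartition accepts a sequence of positions to fix
    part = np.argpartition(a, [lo, hi])
    return [int(part[lo]), int(part[hi])]

print(arg_median_indices(np.array([3, 9, 5, 1, 15])))      # [2]
print(arg_median_indices(np.array([3, 9, 5, 1, 15, 12])))  # [2, 1]
```

Unlike the original, this sketch always returns a list, so callers do not need to distinguish the odd and even cases by return type.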
Solution 5
The accepted answer, numpy.argsort(data)[len(data)//2], cannot handle arrays with NaNs.
For a 2-D array, to get the column index of the median in each row (along axis=1):
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3.3, 4],
                   'b': [80, 23, np.nan, 88],
                   'c': [75, 45, 76, 67],
                   'd': [5, 4, 6, 7]})
data = df.to_numpy()
# data
# array([[ 1. , 80. , 75. ,  5. ],
#        [ 2. , 23. , 45. ,  4. ],
#        [ 3.3,  nan, 76. ,  6. ],
#        [ 4. , 88. , 67. ,  7. ]])

# median of each row, ignoring NaNs
amedian = np.nanmedian(data, axis=1)
# absolute distance of every cell from its row median
aabs = np.abs(data.T - amedian).T
idx = np.nanargmin(aabs, axis=1)
idx
# array([2, 1, 3, 2])

# The accepted answer, for comparison. Note that the third index is 2; the
# corresponding cell value is 76, which is not the median of the row
# [3.3, nan, 76., 6.].
idx = np.argsort(data)[:, len(data[0]) // 2]
idx
# array([2, 1, 2, 2])
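This is a 4×4 array, so each row has an even number of entries. Ignoring the NaN, the third row has only three values, and its median is 6 (column index 3), not 76 (column index 2). argsort places NaN last, which shifts the middle position of that row.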
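The NaN-aware steps above can be wrapped into a small helper. This is a sketch with a hypothetical function name, `nan_argmedian_rows`; it picks the element closest to each row's NaN-ignoring median, and np.nanargmin raises a ValueError if a row contains only NaNs.

```python
import numpy as np

def nan_argmedian_rows(data):
    """Per-row column index of the element closest to the NaN-ignoring median."""
    data = np.asarray(data, dtype=float)
    # keepdims=True keeps a column vector so it broadcasts against each row
    med = np.nanmedian(data, axis=1, keepdims=True)
    dist = np.abs(data - med)
    # NaN distances are skipped; ties resolve to the first (leftmost) column
    return np.nanargmin(dist, axis=1)

data = np.array([[1.0, 80.0, 75.0, 5.0],
                 [2.0, 23.0, 45.0, 4.0],
                 [3.3, np.nan, 76.0, 6.0],
                 [4.0, 88.0, 67.0, 7.0]])
print(nan_argmedian_rows(data))  # [2 1 3 2]
```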
Itay Lieder
Updated on March 09, 2021
Comments
- Itay Lieder (about 3 years ago): Is there something like numpy.argmin(x), but for median?
- Itay Lieder (over 8 years ago): The title should have been "in Python" instead of "in numpy". I couldn't find it using Google.
- Itay Lieder (over 8 years ago): I can do np.argmin(np.abs(np.median(x) - x)), but was wondering if there is already a command.
- MSeifert (over 8 years ago): As far as I know there is no single command that does this. Am I right that you want something like np.argmedian(array) without any nested calculations?
- Itay Lieder (over 8 years ago): Yes, I thought something might have existed. Guess not.
- Warren Weckesser (over 8 years ago): np.median([1, 2]) returns 1.5. How should np.argmedian([1, 2]) be defined in this case?
- Lukas (almost 8 years ago): I wonder why the question gets downvoted. Writing an np.argmin which works with any axis=... and any keepdims=... is not trivial.
- Moot (over 6 years ago): @WarrenWeckesser: I think this could be dealt with using a flag that would set whether you choose the lower or the upper of the two options for that case (with a default value). It seems like a natural addition to numpy, IMHO.
- leermeester (over 6 years ago): The accepted answer numpy.argsort(data)[len(data)//2] is 3-4 times faster, but this one is still elegant :)
- Ray (almost 6 years ago): In general this has complexity n log(n) because it involves sorting. Finding the median only takes linear time when you use Quickselect. Theoretically it would have lower complexity to compute the median first and then search for it, like @Hagay's answer.
- gizzmole (over 4 years ago): For numpy.array use np.argwhere(a == np.percentile(a, 50, interpolation='nearest'))
- Mad Physicist (over 3 years ago): The basis of percentile is partition, which does this out of the box.
- Mad Physicist (over 3 years ago): This is by far the best answer.