Numpy argsort - what is it doing?


Solution 1

According to the documentation, argsort:

Returns the indices that would sort an array.

So for x = [1.48, 1.41, 0.0, 0.1], the result [2, 3, 1, 0] reads as follows:

  • 2 is the index of 0.0.
  • 3 is the index of 0.1.
  • 1 is the index of 1.41.
  • 0 is the index of 1.48.
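A quick way to confirm this reading (for a 1-D array like the one in the question): indexing x with its argsort yields the values of x in ascending order. A minimal sketch:

```python
import numpy as np

x = np.array([1.48, 1.41, 0.0, 0.1])
idx = x.argsort()          # indices that would sort x

print(idx)                 # [2 3 1 0]
print(x[idx])              # the values of x in ascending order

# Position i of idx names which element of x belongs at position i
# of the sorted result.
for position, original_index in enumerate(idx):
    print(position, original_index, x[original_index])
```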

Solution 2

[2, 3, 1, 0] indicates that the smallest element is at index 2, the next smallest at index 3, then index 1, then index 0.

There are a number of ways to get the result you are looking for (the rank of each element):

import numpy as np
import scipy.stats as stats

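def using_indexed_assignment(x):
    "https://stackoverflow.com/a/5284703/190597 (Sven Marnach)"
    result = np.empty(len(x), dtype=int)
    temp = x.argsort()
    # x[temp[i]] has rank i, so scatter 0..n-1 into those positions.
    result[temp] = np.arange(len(x))
    return result

def using_rankdata(x):
    # rankdata's ranks start at 1; subtract 1 so they start at 0 like the others.
    return stats.rankdata(x)-1

def using_argsort_twice(x):
    "https://stackoverflow.com/a/6266510/190597 (k.rooijers)"
    # argsort of the sorting permutation is its inverse, i.e. the ranks.
    return np.argsort(np.argsort(x))

def using_digitize(x):
    # Each value's position among the sorted unique values is its rank
    # (this matches the other methods only when there are no ties).
    unique_vals, index = np.unique(x, return_inverse=True)
    return np.digitize(x, bins=unique_vals) - 1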

For example,

In [72]: x = np.array([1.48,1.41,0.0,0.1])

In [73]: using_indexed_assignment(x)
Out[73]: array([3, 2, 0, 1])

This checks that they all produce the same result:

x = np.random.random(10**5)
expected = using_indexed_assignment(x)
for func in (using_argsort_twice, using_digitize, using_rankdata):
    assert np.allclose(expected, func(x))

These IPython %timeit benchmarks suggest that, for large arrays, using_indexed_assignment is the fastest:

In [50]: x = np.random.random(10**5)
In [66]: %timeit using_indexed_assignment(x)
100 loops, best of 3: 9.32 ms per loop

In [70]: %timeit using_rankdata(x)
100 loops, best of 3: 10.6 ms per loop

In [56]: %timeit using_argsort_twice(x)
100 loops, best of 3: 16.2 ms per loop

In [59]: %timeit using_digitize(x)
10 loops, best of 3: 27 ms per loop

For small arrays, using_argsort_twice may be faster:

In [78]: x = np.random.random(10**2)

In [81]: %timeit using_argsort_twice(x)
100000 loops, best of 3: 3.45 µs per loop

In [79]: %timeit using_indexed_assignment(x)
100000 loops, best of 3: 4.78 µs per loop

In [80]: %timeit using_rankdata(x)
100000 loops, best of 3: 19 µs per loop

In [82]: %timeit using_digitize(x)
10000 loops, best of 3: 26.2 µs per loop
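The %timeit magic only exists in IPython. To repeat a comparison like this in a plain Python script, here is a rough sketch using the standard library's timeit module. The two functions are copied from above (comments added); the array sizes, repeat counts, and random seed are arbitrary choices, and absolute numbers will vary by machine and NumPy version.

```python
import timeit

import numpy as np


def using_indexed_assignment(x):
    # Scatter 0..n-1 into the positions given by argsort.
    result = np.empty(len(x), dtype=int)
    result[x.argsort()] = np.arange(len(x))
    return result


def using_argsort_twice(x):
    # argsort of the sorting permutation is its inverse, i.e. the ranks.
    return np.argsort(np.argsort(x))


rng = np.random.default_rng(0)
timings = {}
for size in (10**2, 10**5):
    x = rng.random(size)
    for func in (using_indexed_assignment, using_argsort_twice):
        # Best of 3 repeats, each running the function `number` times;
        # record the time per call in seconds.
        number = 100 if size <= 10**2 else 10
        best = min(timeit.repeat(lambda: func(x), repeat=3, number=number))
        timings[(func.__name__, size)] = best / number
        print(f"{func.__name__:>26} n={size:>6}: {best / number * 1e6:8.1f} µs")
```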

Note also that stats.rankdata gives you more control, through its method parameter, over how elements of equal value are ranked.
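For example, with a repeated value, the method parameter of scipy.stats.rankdata decides how the tie is ranked. A small sketch (the data array is made up for illustration), also comparing against argsort applied twice with a stable sort, which breaks ties by original position:

```python
import numpy as np
from scipy import stats

data = np.array([10, 20, 20, 30])

# Each `method` resolves the tie between the two 20s differently.
# rankdata's ranks start at 1, hence the "- 1" used in using_rankdata.
for method in ("average", "min", "max", "dense", "ordinal"):
    print(method, stats.rankdata(data, method=method))

# A stable argsort applied twice keeps tied elements in their original
# order, which matches method="ordinal" shifted to start at 0.
ordinal_from_argsort = np.argsort(np.argsort(data, kind="stable"), kind="stable")
print(ordinal_from_argsort)  # [0 1 2 3]
```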

Solution 3

As the documentation says, argsort:

Returns the indices that would sort an array.

That means the first element of the argsort result is the index of the element that comes first in sorted order, the second element is the index of the element that comes second, and so on.

What you seem to want is the rank order of the values, which is what is provided by scipy.stats.rankdata. Note that you need to think about what should happen if there are ties in the ranks.

Solution 4

numpy.argsort(a, axis=-1, kind='quicksort', order=None)

Returns the indices that would sort an array

Perform an indirect sort along the given axis using the algorithm specified by the kind keyword. It returns an array of indices of the same shape as a that index data along the given axis in sorted order.
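To make the axis parameter concrete, here is a small 2-D sketch (the array is arbitrary). np.take_along_axis (available since NumPy 1.15) applies the returned indices row by row; plain fancy indexing like a[idx] would index along axis 0 instead.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [0, 5, 4]])

# Along the last axis (the default, axis=-1): sort each row independently.
row_idx = np.argsort(a, axis=1)
print(row_idx)      # [[1 2 0]
                    #  [0 2 1]]

# Along axis 0: sort each column independently.
col_idx = np.argsort(a, axis=0)
print(col_idx)      # [[1 0 0]
                    #  [0 1 1]]

# take_along_axis pairs each row of indices with its own row of `a`,
# giving the same result as np.sort(a, axis=1).
sorted_rows = np.take_along_axis(a, row_idx, axis=1)
print(sorted_rows)  # [[1 2 3]
                    #  [0 4 5]]
```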

Consider an example in Python with a list of values:

listExample = [0, 2, 2456, 2000, 5000, 0, 1]

Now apply the argsort function:

import numpy as np
np.argsort(listExample).tolist()

The output will be

[0, 5, 6, 1, 3, 2, 4]

This is a list of indices into listExample. If you map these indices to their respective values, you get:

[0, 0, 1, 2, 2000, 2456, 5000]

(I find this function useful in many places, e.g. when you want to sort a list/array without using list.sort(), i.e. without changing the order of the actual values in the list.)
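As a sketch of that use, the indices from argsort can also reorder a second, parallel list by the values of the first, leaving both originals untouched. The labels list is hypothetical, and kind="stable" is used so the two equal 0 values keep their original order.

```python
import numpy as np

listExample = [0, 2, 2456, 2000, 5000, 0, 1]
labels = ["a", "b", "c", "d", "e", "f", "g"]   # hypothetical parallel data

# Stable sort: equal values keep their original relative order.
order = np.argsort(listExample, kind="stable")

sorted_values = [listExample[i] for i in order]
labels_by_value = [labels[i] for i in order]

print(sorted_values)     # [0, 0, 1, 2, 2000, 2456, 5000]
print(labels_by_value)   # ['a', 'f', 'g', 'b', 'd', 'c', 'e']
print(listExample)       # unchanged: [0, 2, 2456, 2000, 5000, 0, 1]
```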

For more details, refer to this link: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.argsort.html

Solution 5

input:
import numpy as np
x = np.array([1.48,1.41,0.0,0.1])
x.argsort().argsort()

output:
array([3, 2, 0, 1])
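Why calling argsort twice gives the rank: the first call returns a permutation, and argsort of a permutation is its inverse permutation. A small check of that claim, building the inverse directly by scatter assignment as a cross-check:

```python
import numpy as np

x = np.array([1.48, 1.41, 0.0, 0.1])

perm = x.argsort()                 # sorting permutation: [2 3 1 0]

# Build the inverse permutation directly: inv[perm[i]] = i.
inv = np.empty_like(perm)
inv[perm] = np.arange(len(perm))

print(inv)                         # [3 2 0 1], the rank of each element of x
print(perm.argsort())              # [3 2 0 1], the same permutation

# Composing the two in either order gives the identity.
print(perm[inv], inv[perm])        # [0 1 2 3] [0 1 2 3]
```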


Author: user1276273

Updated on November 05, 2021

Comments

  • user1276273
    user1276273 over 2 years

    Why is numpy giving this result:

    import numpy
    x = numpy.array([1.48,1.41,0.0,0.1])
    print(x.argsort())
    
    >[2 3 1 0]
    

    when I'd expect it to do this:

    [3 2 0 1]

    Clearly my understanding of the function is lacking.

    • zwol
      zwol over 10 years
      Why did you think [3 2 0 1] would have been the correct answer?
    • user1276273
      user1276273 over 10 years
      I just had an inverted understanding of the output, i.e. if you take the first element of x, it should be in position 3 of a sorted array, and so on.
    • adrienlucca.net
      adrienlucca.net over 7 years
      Your way of thinking totally makes sense; I had exactly the same question.
    • Lahiru Karunaratne
      Lahiru Karunaratne about 6 years
      [3 2 0 1] - this is ranking the values, you are not getting the actual indices.
    • lincr
      lincr almost 4 years
      Just remember that the output indicates locations in the original array, while you are thinking of locations in the sorted array. That means output[0] is the index of the smallest element in the original input array, and output[-1] is the index of the largest element.
    • SmallChess
      SmallChess over 2 years
      You were trying to rank it not sort it.
  • Phani
    Phani over 9 years
    Can you add some explanation on why applying argsort() twice gives us the rank?
  • unutbu
    unutbu over 9 years
    @Phani: argsort returns the indices of the sorted array. The index of the sorted indices is the rank. This is what the second call to argsort returns.
  • Alex C
    Alex C almost 8 years
    The first argsort returns a permutation (which, if applied to the data, would sort it). When argsort is applied to this (or any) permutation, it returns the inverse permutation (composing the two permutations in either order gives the identity). The second permutation, if applied to a sorted data array, would produce the unsorted data array, i.e. it is the rank.
  • Belter
    Belter about 7 years
    With a = x.argsort(), x[a] gives array([0.  , 0.1 , 1.41, 1.48]).
  • peacetype
    peacetype about 6 years
    While this code snippet may be the solution, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.
  • Jose A
    Jose A almost 6 years
    Mind blown. I finally understood it! It returns an array whose content is the indices of the original array in a sorted order.
  • Nathan
    Nathan about 5 years
    x[x.argsort()] is not necessarily the same as np.sort(x). In fact, it's not necessarily even the same shape. Try this with a 2D array. This only happens to work with 1D arrays.
  • Multihunter
    Multihunter about 5 years
    I feel like that's unnecessarily pedantic. The question is about 1D arrays. This is intended as a way to understand what the difference was, rather than literal code to use. Additionally, when you have a 2D array it's not even clear what kind of sorting you want. Do you want a global sort? If not, which axis should be sorted? Regardless, I've added a disclaimer.