Python Matrix sorting via one column

Solution 1

You could use np.lexsort:

numpy.lexsort(keys, axis=-1)

Perform an indirect sort using a sequence of keys.

Given multiple sorting keys, which can be interpreted as columns in a spreadsheet, lexsort returns an array of integer indices that describes the sort order by multiple columns.


In [13]: data = np.matrix(np.arange(10)[::-1].reshape(-1,2))

In [14]: data
Out[14]: 
matrix([[9, 8],
        [7, 6],
        [5, 4],
        [3, 2],
        [1, 0]])

In [15]: temp = data.view(np.ndarray)

In [16]: np.lexsort((temp[:, 1], ))
Out[16]: array([4, 3, 2, 1, 0])

In [17]: temp[np.lexsort((temp[:, 1], ))]
Out[17]: 
array([[1, 0],
       [3, 2],
       [5, 4],
       [7, 6],
       [9, 8]])

Note that if you pass more than one key to np.lexsort, the last key is the primary sort key, the next-to-last key is the secondary key, and so on.
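To make the key ordering concrete, here is a minimal sketch with two keys. The array arr and its values are made up for illustration; column 0 contains ties so that the tie-breaking key has a visible effect.

```python
import numpy as np

# Hypothetical sample data: column 0 has ties, column 1 breaks them.
arr = np.array([[2, 9],
                [1, 5],
                [2, 3],
                [1, 7]])

# Keys are listed from least to most significant:
# column 1 is the tie-breaker, column 0 (listed last) is the primary key.
order = np.lexsort((arr[:, 1], arr[:, 0]))
sorted_arr = arr[order]

print(order)       # row indices in sorted order
print(sorted_arr)  # rows sorted by column 0, then by column 1
```

Swapping the two keys in the tuple would sort by column 1 first and use column 0 only to break ties.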


Using np.lexsort as I show above requires the use of a temporary array because np.lexsort does not work on numpy matrices. Since temp = data.view(np.ndarray) creates a view, rather than a copy of data, it does not require much extra memory. However,

temp[np.lexsort((temp[:, 1], ))]

is a new array, which does require more memory.
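One way to check the view-versus-copy claim is np.shares_memory. The sketch below rebuilds the same data as above; the name result is introduced here only for illustration.

```python
import numpy as np

data = np.matrix(np.arange(10)[::-1].reshape(-1, 2))

# Reinterpret the matrix as a plain ndarray; no data is copied.
temp = data.view(np.ndarray)

# Indexing with an integer index array allocates a new array.
result = temp[np.lexsort((temp[:, 1],))]

print(np.shares_memory(temp, data))    # the view shares data's buffer
print(np.shares_memory(result, data))  # the sorted result does not
```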

There is also a way to sort by columns in place. The idea is to view the array as a structured array with two fields. Every ndarray has a sort method, but only structured arrays let you name fields as sort keys through the order argument:

In [65]: data.dtype
Out[65]: dtype('int32')

In [66]: temp2 = data.ravel().view('int32, int32')

In [67]: temp2.sort(order = ['f1', 'f0'])

Notice that since temp2 is a view of data, it does not require allocating new memory and copying the array. Also, sorting temp2 modifies data at the same time:

In [69]: data
Out[69]: 
matrix([[1, 0],
        [3, 2],
        [5, 4],
        [7, 6],
        [9, 8]])

Solution 2

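You had the right idea, just off by a few characters. Index column 1 (the second column) rather than column 0, and go through data.A so that the column is a 1-D array instead of an n x 1 matrix: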

>>> import numpy as np
>>> data = np.matrix([[9, 8],
...                   [7, 6],
...                   [5, 4],
...                   [3, 2],
...                   [1, 0]])
>>> data[np.argsort(data.A[:, 1])]
matrix([[1, 0],
        [3, 2],
        [5, 4],
        [7, 6],
        [9, 8]])
Author by

tripkane

Updated on November 13, 2020

Comments

  • tripkane, over 3 years ago

    I have an n x 2 matrix of integers. The first column is a series 0, 1, -1, 2, -2; however, these values appear in the order in which they were compiled from their constituent matrices. The second column is a list of indices from another list.

    I would like to sort the matrix via this second column. This would be equivalent to selecting two columns of data in Excel, and sorting via Column B (where the data is in columns A and B). Keep in mind, the adjacent data in the first column of each row should be kept with its respective second column counterpart. I have looked at solutions using the following:

    data[np.argsort(data[:, 0])]
    

    But this does not seem to work. The matrix in question looks like this:

    matrix([[1, 1],
            [1, 3],
            [1, 7],
            ..., 
            [2, 1021],
            [2, 1040],
            [2, 1052]])