15

I'm trying to get the indices to sort a multidimensional array by the last axis, e.g.

>>> a = np.array([[3,1,2],[8,9,2]])

And I'd like indices i such that,

>>> a[i]
array([[1, 2, 3],
       [2, 8, 9]])

Based on the documentation of numpy.argsort I thought it should do this, but I'm getting the error:

>>> a[np.argsort(a)]
IndexError: index 2 is out of bounds for axis 0 with size 2

Edit: I need to rearrange other arrays of the same shape (e.g. an array b such that a.shape == b.shape) in the same way... so that

>>> b = np.array([[0,5,4],[3,9,1]])
>>> b[i]
array([[5,4,0],
       [9,3,1]])
DilithiumMatrix
  • 17,795
  • 22
  • 77
  • 119

4 Answers4

14

Solution:

>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
       [2, 8, 9]])

You got it right, though I wouldn't describe it as cheating the indexing.

Maybe this will help make it clearer:

In [544]: i=np.argsort(a,axis=1)

In [545]: i
Out[545]: 
array([[1, 2, 0],
       [2, 0, 1]])

i is the order that we want, for each row. That is:

In [546]: a[0, i[0,:]]
Out[546]: array([1, 2, 3])

In [547]: a[1, i[1,:]]
Out[547]: array([2, 8, 9])

To do both indexing steps at once, we have to use a 'column' index for the 1st dimension.

In [548]: a[[[0],[1]],i]
Out[548]: 
array([[1, 2, 3],
       [2, 8, 9]])

Another array that could be paired with i is:

In [560]: j=np.array([[0,0,0],[1,1,1]])

In [561]: j
Out[561]: 
array([[0, 0, 0],
       [1, 1, 1]])

In [562]: a[j,i]
Out[562]: 
array([[1, 2, 3],
       [2, 8, 9]])

If i identifies the column for each element, then j specifies the row for each element. The [[0],[1]] column array works just as well because it can be broadcasted against i.

I think of

np.array([[0],
          [1]])

as 'short hand' for j. Together they define the source row and column of each element of the new array. They work together, not sequentially.

The full mapping from a to the new array is:

[a[0,1]  a[0,2]  a[0,0]
 a[1,2]  a[1,0]  a[1,1]]

def foo(a):
    i = np.argsort(a, axis=1)
    return (np.arange(a.shape[0])[:,None], i)

In [61]: foo(a)
Out[61]: 
(array([[0],
        [1]]), array([[1, 2, 0],
        [2, 0, 1]], dtype=int32))
In [62]: a[foo(a)]
Out[62]: 
array([[1, 2, 3],
       [2, 8, 9]])
hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • Thanks @hpaulj, really helpful explanations! If you have a sec, could you explain the 'column index[ing] for the 1st dimension'? That's just converting the array to a (2,1,3) right... why does that facilitate the `i` slicing? – DilithiumMatrix Oct 15 '15 at 13:58
  • 2
    I expanded my explanation. – hpaulj Oct 15 '15 at 15:15
  • is there a simpler way of doing this? i thought the argsort should have considered what it's going to be used for after sorting the arrays?..... – MartianMartian Apr 04 '17 at 16:55
  • Other than hiding details in a function I don't know what kind of simplification you are hoping for. `argsort` can be used in a variety of ways, not just this context. Think of it as a building block, which you combine with a few other blocks to do the job. – hpaulj Apr 04 '17 at 19:30
  • http://stackoverflow.com/questions/43220729/access-elements-from-matrix-in-numpy is another example of using an index array along with an `arange` array to select elements from rows. Same logic, only the index array wasn't generated by `argsort`. – hpaulj Apr 05 '17 at 03:50
6

The above answers are now a bit outdated, since new functionality was added in numpy 1.15 to make it simpler; take_along_axis (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.take_along_axis.html) allows you to do:

>>> a = np.array([[3,1,2],[8,9,2]])
>>> np.take_along_axis(a, a.argsort(axis=-1), axis=-1)
array([[1 2 3]
       [2 8 9]])
powerss
  • 108
  • 1
  • 4
4

I found the answer here, with someone having the same problem. They key is just cheating the indexing to work properly...

>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
       [2, 8, 9]])
DilithiumMatrix
  • 17,795
  • 22
  • 77
  • 119
  • oops I guess `np.sort(dists, axis=1)` is what I was looking for – endolith Dec 18 '16 at 17:13
  • 2
    @endolith totally. For my case I specifically needed the indices to sort another array in the same order. But I agree that the `argsort` documentation could use some more improvement too ;) – DilithiumMatrix Dec 18 '16 at 17:32
  • actually I found a way to do it without any sorting at all, just using amin and amax along axes – endolith Dec 18 '16 at 19:39
0

You can also use linear indexing, which might be better with performance, like so -

M,N = a.shape
out = b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]

So, a.argsort(1)+(np.arange(M)[:,None]*N) basically are the linear indices that are used to map b to get the desired sorted output for b. The same linear indices could also be used on a for getting the sorted output for a.

Sample run -

In [23]: a = np.array([[3,1,2],[8,9,2]])

In [24]: b = np.array([[0,5,4],[3,9,1]])

In [25]: M,N = a.shape

In [26]: b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
Out[26]: 
array([[5, 4, 0],
       [1, 3, 9]])

Rumtime tests -

In [27]: a = np.random.rand(1000,1000)

In [28]: b = np.random.rand(1000,1000)

In [29]: M,N = a.shape

In [30]: %timeit b[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
10 loops, best of 3: 133 ms per loop

In [31]: %timeit b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
10 loops, best of 3: 96.7 ms per loop
Divakar
  • 218,885
  • 19
  • 262
  • 358