I would like to sort 2D numpy arrays according to a previously processed reference array. My idea was to store the numpy.argsort output of my reference array and use it to sort the other ones:

In [13]: # my reference array
    ...: ref_arr = np.random.randint(10, 30, 12).reshape(3, 4)
Out[14]:
array([[12, 22, 12, 13],
       [28, 26, 21, 23],
       [24, 14, 16, 25]])

# desired output:
array([[12, 14, 12, 13],
       [24, 22, 16, 23],
       [28, 26, 21, 25]])

What I tried:

In [15]: # store the sorting matrix
    ...: sm = np.argsort(ref_arr, axis=0)
Out[16]:
array([[0, 2, 0, 0],
       [2, 0, 2, 1],
       [1, 1, 1, 2]])

But unfortunately the final step only works with one-dimensional arrays:

In [17]: ref_arr[sm]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-17-48b785178465> in <module>()
----> 1 ref_arr[sm]

IndexError: index 3 is out of bounds for axis 0 with size 3
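A side note on the traceback above, as a minimal sketch rather than a definitive diagnosis: an "index 3" can only come from indices computed along the rows (the default axis of np.argsort), since column-wise indices for a 3-row array never exceed 2. With the axis=0 indices, ref_arr[sm] should not raise at all; a single integer array indexes axis 0 only, so every entry picks a whole row. The names sm_cols, sm_rows, picked and raised are only for illustration.

```python
import numpy as np

ref_arr = np.array([[12, 22, 12, 13],
                    [28, 26, 21, 23],
                    [24, 14, 16, 25]])

# Column-wise indices (axis=0): every value is a valid row index (0..2).
sm_cols = np.argsort(ref_arr, axis=0)

# A single integer array indexes axis 0 only, so each entry selects a
# whole row of length 4 and the result gains an extra dimension.
picked = ref_arr[sm_cols]
print(picked.shape)

# Row-wise indices (default axis=-1) are column positions 0..3; used as
# row indices, the value 3 is out of bounds for an array with 3 rows.
sm_rows = np.argsort(ref_arr)
try:
    ref_arr[sm_rows]
    raised = False
except IndexError as exc:
    raised = True
    print(exc)
```

Either way, plain ref_arr[sm] is not the per-column lookup that is wanted here; that needs a second index for the columns.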

I found this GitHub issue that was created for this very problem but, unfortunately, it was resolved with the remark that what I tried works for 1D arrays only.

A comment on that issue gives an example that is similar to my problem. The snippet does not solve it, because it sorts the array by row rather than by column, but it hints at the direction I have to take:

a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]

Unfortunately, I don't understand the example well enough to adapt it to my use case. Could someone explain how the advanced indexing works here? That might enable me to solve the problem on my own, but of course I wouldn't mind a turnkey solution either. ;)

Thank you.

Just in case: I am using Python 3.6.1 and numpy 1.12.1 on OS X.

wedi
  • So, `a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]` works for you, but you need an explanation on how it works? – Divakar Apr 04 '17 at 22:17
  • No, it does not work for me as it sorts the array by row. I would like to sort by column. But I hope to be able to find the correct solution once I understand how that example works. – wedi Apr 04 '17 at 22:21
  • Your `argsort` call isn't sorting along the axis you say you want. – user2357112 Apr 04 '17 at 22:26
  • @Divakar: I clarified the question. – wedi Apr 04 '17 at 22:31
  • @user2357112: With the argsort alone one cannot sort a multi-dimensional array. – wedi Apr 04 '17 at 22:31
  • @wedi: No, I mean your `argsort` is sorting along the rows when you say you want the other direction. – user2357112 Apr 04 '17 at 22:33
  • It would be nice to comment when downvoting, so I have a chance to improve my question... – wedi Apr 04 '17 at 22:36
  • @user2357112: True! I copied a wrong line! Thanks. – wedi Apr 04 '17 at 22:42
  • http://stackoverflow.com/a/33141247/901925 attempts to illustrate and explain the `a[np.arange(np.shape...np.argsort(a)]` – hpaulj Apr 04 '17 at 23:35
  • There's a [numpy issue](https://github.com/numpy/numpy/issues/8708) about making this kind of thing easier – Eric Apr 04 '17 at 23:41

2 Answers

As of May 2018, it can be done using np.take_along_axis:

np.take_along_axis(ref_arr, sm, axis=0)
Out[25]: 
array([[10, 16, 15, 10],
       [13, 23, 24, 12],
       [28, 26, 28, 28]])
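The output above comes from a different random draw than the array shown in the question. Here is a minimal sketch on the question's ref_arr, which also reuses the stored indices on a second array, since that was the original goal. The array named other is a made-up example, and np.take_along_axis requires NumPy 1.15 or newer.

```python
import numpy as np

ref_arr = np.array([[12, 22, 12, 13],
                    [28, 26, 21, 23],
                    [24, 14, 16, 25]])

# Hypothetical second array that should follow the ordering of ref_arr.
other = np.arange(12).reshape(3, 4) * 10

# Store the column-wise sorting indices once ...
sm = np.argsort(ref_arr, axis=0)

# ... and apply them to any array of the same shape.
sorted_ref = np.take_along_axis(ref_arr, sm, axis=0)
sorted_other = np.take_along_axis(other, sm, axis=0)

print(sorted_ref)
# [[12 14 12 13]
#  [24 22 16 23]
#  [28 26 21 25]]
print(sorted_other)
# [[  0  90  20  30]
#  [ 80  10 100  70]
#  [ 40  50  60 110]]
```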
vozman

Basically, two steps are needed:

1] Get the argsort indices along each column with axis=0 -

sidx = ref_arr.argsort(axis=0)

2] Use advanced indexing: sidx selects the rows, i.e. it indexes into the first dimension, and a range array indexes into the second dimension, so that each column of sidx is applied to the matching column of ref_arr -

out = ref_arr[sidx, np.arange(sidx.shape[1])]
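To see what that indexing expression does, it may help to spell it out. A minimal sketch, using the array from the question: the two index arrays are broadcast against each other, and each output element is looked up at the resulting (row, column) pair. The names cols and expected are only for illustration.

```python
import numpy as np

ref_arr = np.array([[12, 22, 12, 13],
                    [28, 26, 21, 23],
                    [24, 14, 16, 25]])

sidx = ref_arr.argsort(axis=0)      # shape (3, 4): row index per element
cols = np.arange(sidx.shape[1])     # shape (4,):   column index 0..3

# sidx (3, 4) and cols (4,) broadcast to a common shape (3, 4), so that
# out[i, j] == ref_arr[sidx[i, j], cols[j]].
out = ref_arr[sidx, cols]

# The same lookup written with explicit loops.
expected = np.empty_like(ref_arr)
for i in range(sidx.shape[0]):
    for j in range(sidx.shape[1]):
        expected[i, j] = ref_arr[sidx[i, j], j]

print(np.array_equal(out, expected))  # True
```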

Sample run -

In [185]: ref_arr
Out[185]: 
array([[12, 22, 12, 13],
       [28, 26, 21, 23],
       [24, 14, 16, 25]])

In [186]: sidx = ref_arr.argsort(axis=0)

In [187]: sidx
Out[187]: 
array([[0, 2, 0, 0],
       [2, 0, 2, 1],
       [1, 1, 1, 2]])

In [188]: ref_arr[sidx, np.arange(sidx.shape[1])]
Out[188]: 
array([[12, 14, 12, 13],
       [24, 22, 16, 23],
       [28, 26, 21, 25]])
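For comparison, here is a sketch of the row-wise snippet quoted in the question, which works the same way with the roles of the two index arrays swapped: the range array now indexes the rows and needs a trailing axis so that it broadcasts against the (3, 4) argsort result. The names rows, order and by_row are only for illustration.

```python
import numpy as np

a = np.array([[12, 22, 12, 13],
              [28, 26, 21, 23],
              [24, 14, 16, 25]])

rows = np.arange(a.shape[0])[:, np.newaxis]  # shape (3, 1): row index 0..2
order = np.argsort(a, axis=1)                # shape (3, 4): column index per element

# rows (3, 1) and order (3, 4) broadcast to (3, 4), so that
# by_row[i, j] == a[i, order[i, j]], i.e. each row is sorted on its own.
by_row = a[rows, order]

print(by_row)
# [[12 12 13 22]
#  [21 23 26 28]
#  [14 16 24 25]]
```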
Divakar
  • Thank you! I got my solution and looking at it I understood the other example as well. :) – wedi Apr 04 '17 at 22:40