Boolean indexing on multidimensional array

Question

I'm very new to Python and NumPy. In fact, I'm just learning.

I'm reading this tutorial, and I got stuck on these lines:

>>> x = np.arange(30).reshape(2,3,5)
>>> x
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]],
       [[15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]])
>>> b = np.array([[True, True, False], [False, True, True]])
>>> x[b]
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])

I can't understand how the result of x[b] was obtained.

I also tried to guess the result of x[[False, False, False, True]].

Please explain; I'm a complete newbie.

Perhaps you should spend some more time learning core Python before diving into NumPy. But anyway, `b.shape` is (2, 3), so each `True` item in `b` selects one row from `x`. — PM 2Ring, Oct 20 '17 at 08:58
The simplest way to picture it is to imagine the data array reshaped into a 2D array of shape (6, 5), and then use the flattened mask array to select rows from it. — Divakar, Oct 20 '17 at 09:06
@PM2Ring please could you illustrate a bit more. I'm lost :( — kabrice, Oct 20 '17 at 09:13
@cᴏʟᴅsᴘᴇᴇᴅ Think we need a better dup target that discusses boolean arrays for indexing. — Divakar, Oct 20 '17 at 09:21
Lol'd at the title `Boolean indexing on multidimensionnal array [NOT DUPLICATE] .......[duplicate]` — cs95, Oct 20 '17 at 09:21
@Divakar Second one no good? The only other alternative would be re-pasting the numpy docs, unless you want to write an answer for this _overly_ broad question. You can reopen if you feel so. — cs95, Oct 20 '17 at 09:22
@cᴏʟᴅsᴘᴇᴇᴅ Couldn't find a closer one. Reopening. — Divakar, Oct 20 '17 at 09:50
`x` is 3d. `b` is selecting on the first 2 dimensions; look at `np.where(b)`. Try `x[[0,0,1,1],[0,1,1,2],:]` (I think). It would be nice if the duplicates illustrated boolean masking of a 3d array with a 2d mask. It's not a trivial example of advanced indexing. — hpaulj, Oct 20 '17 at 09:51
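Both comment suggestions can be checked directly. A minimal sketch, assuming the same `x` and `b` as in the question: Divakar's view reshapes `x` to shape (6, 5) and applies the flattened mask, while hpaulj's view uses the explicit index pairs returned by `np.where(b)`.

```python
import numpy as np

x = np.arange(30).reshape(2, 3, 5)
b = np.array([[True, True, False], [False, True, True]])

# Divakar's view: merge the two masked axes (2 * 3 = 6 rows of length 5),
# then select rows with the flattened mask [T, T, F, F, T, T].
rows = x.reshape(6, 5)[b.ravel()]

# hpaulj's view: the (row, col) positions of the True entries in b.
i, j = np.where(b)  # i = [0, 0, 1, 1], j = [0, 1, 1, 2]
pairs = x[[0, 0, 1, 1], [0, 1, 1, 2], :]

print(np.array_equal(x[b], rows))     # True
print(np.array_equal(x[b], pairs))    # True
print(np.array_equal(x[b], x[i, j]))  # True
```

All three expressions pick the same four length-5 rows, which is why `x[b]` has shape (4, 5).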

score 3 · Answer 1 · edited Mar 28 '22 at 03:24

Under the hood, it computes the subscripted indices (the indices along each dimension) of the True elements, for the dimensions covered by the mask, starting from the axis at which the mask is placed, while selecting all elements along the un-indexed axes.
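One way to emulate this rule is sketched below as a hypothetical helper, `index_with_mask_at` (not a NumPy function). It builds an index tuple of full slices for the axes before the mask, followed by the arrays from `np.nonzero(mask)`; axes after the mask are left un-indexed and therefore fully selected. It assumes the mask's shape matches the axes it covers.

```python
import numpy as np

def index_with_mask_at(data, mask, axis=0):
    """Hypothetical helper: emulate placing `mask` at position `axis` via np.nonzero."""
    # Full slices for the axes that come before the mask.
    leading = (slice(None),) * axis
    # One integer index array per mask dimension (positions of True entries).
    subscripts = np.nonzero(mask)
    # Axes after the mask are not indexed, so all their elements are kept.
    return data[leading + subscripts]

x = np.arange(60).reshape(2, 2, 3, 5)
b2 = np.arange(15).reshape(3, 5) % 2 == 0  # illustrative 3x5 mask on the last two axes

print(np.array_equal(x[:, :, b2], index_with_mask_at(x, b2, axis=2)))  # True
print(index_with_mask_at(x, b2, axis=2).shape)  # (2, 2, 8)
```

The mask here is only an illustration; the cases below use the question's 2D mask at axis 0 and axis 1.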

Case 1: `3D` data and `2D` mask

For example, b has two dimensions, so it maps onto two dimensions of x, and with x[b] it maps starting from the first axis onward.

The subscripted indices are computed with np.where/np.nonzero:

r,c = np.nonzero(b)

Thus, x[b] translates to x[r, c, :], or simply x[r, c]. It then uses advanced indexing to select elements along each axis, using the index pairs formed from r and c.
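To make Case 1 concrete, here is a short sketch with the question's `x` and `b`. It prints the subscripted indices and shows that each `(r[k], c[k])` pair picks one full length-5 row of `x`.

```python
import numpy as np

x = np.arange(30).reshape(2, 3, 5)
b = np.array([[True, True, False], [False, True, True]])

r, c = np.nonzero(b)
print(r)  # [0 0 1 1]
print(c)  # [0 1 1 2]

# Each (i, j) pair selects the whole last axis: x[i, j, :]
for i, j in zip(r, c):
    print(f"x[{i}, {j}] =", x[i, j])

print(np.array_equal(x[b], x[r, c]))  # True
```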

Case 2: `4D` data and `2D` mask

Now, let's increase the dimensionality of the data array to 4D, keep the same 2D mask, but index starting from the second axis onward, i.e. x[:, b].

Let's say we have

x = np.arange(60).reshape(2,2,3,5)

Get the subscripted indices and then use advanced-indexing:

r,c = np.nonzero(b)

So, x[:, b] should be the same as x[:, r, c]:

In [148]: x = np.arange(60).reshape(2, 2, 3, 5)

In [149]: b = np.array([[True, True, False], [False, True, True]])

In [150]: r,c = np.nonzero(b)

In [151]: np.allclose(x[:, b], x[:, r, c])
Out[151]: True
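The shape of the result follows from the same rule. A minimal sketch: the leading `:` keeps axis 0, the two mask axes collapse into a single axis whose length is the number of True entries (`b.sum()`, here 4), and the untouched last axis is kept. The second check applies the Case 1 rule to each 3D slice along axis 0.

```python
import numpy as np

x = np.arange(60).reshape(2, 2, 3, 5)
b = np.array([[True, True, False], [False, True, True]])

out = x[:, b]
# Axis 0 kept, axes 1-2 replaced by one axis of length b.sum(), axis 3 kept.
expected_shape = x.shape[:1] + (int(b.sum()),) + x.shape[3:]
print(out.shape, expected_shape)  # (2, 4, 5) (2, 4, 5)

# Equivalently: apply the 3D rule from Case 1 to each slice along axis 0.
print(np.array_equal(out, np.stack([x[i][b] for i in range(x.shape[0])])))  # True
```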

Case 3: `4D` data and `3D` mask

To go deeper, let's consider a 3D mask array with a 4D data array and use the same reasoning for verification:

In [144]: x = np.arange(60).reshape(2, 2, 3, 5)
     ...: b = np.random.rand(2, 3, 5) > 0.5

In [146]: r, c, p = np.nonzero(b)

In [147]: np.allclose(x[:, b], x[:, r, c, p])
Out[147]: True
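Because the mask in Case 3 is random, the session above differs between runs. A reproducible sketch, assuming NumPy's `default_rng` generator with an arbitrary seed (any seed works): the result is 2D, with axis 0 kept and the three mask axes collapsed into one axis of length `b.sum()`.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
x = np.arange(60).reshape(2, 2, 3, 5)
b = rng.random((2, 3, 5)) > 0.5  # 3D mask covering axes 1, 2 and 3 of x

r, c, p = np.nonzero(b)
out = x[:, b]

print(np.array_equal(out, x[:, r, c, p]))  # True
print(out.shape == (2, int(b.sum())))      # True
```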

As for the edit, x[[False, False, False, True]]: you are indexing only along the first axis with a boolean array of length 4, whereas the first axis of x has length 2, hence the indexing error.
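A small sketch of that failure, and of a boolean list that does work along the first axis (it must have length `x.shape[0]`, which is 2). Recent NumPy versions raise `IndexError` for the length mismatch; the exact message may vary between versions.

```python
import numpy as np

x = np.arange(30).reshape(2, 3, 5)

try:
    x[[False, False, False, True]]  # length 4, but axis 0 has length 2
except IndexError as err:
    print("IndexError:", err)

# A length-2 mask selects whole 3x5 blocks along axis 0.
second_block = x[[False, True]]
print(second_block.shape)                     # (1, 3, 5)
print(np.array_equal(second_block[0], x[1]))  # True
```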

Downvoted because I think this answer is way too advanced for a 'newbie'. — Stanko, Oct 20 '17 at 10:13
@Stanko Well you can't answer it in its entirety without giving out the details. Lame excuse really for a downvote. — Divakar, Oct 20 '17 at 10:15

score 2 · Accepted Answer · answered Oct 20 '17 at 10:00


The first part of x has 3 arrays (rows) in 1 array:

[
 [ 0,  1,  2,  3,  4],
 [ 5,  6,  7,  8,  9],
 [10, 11, 12, 13, 14]
]

With the first row of your mask, b = np.array([[True, True, False], ...]), you say that you want to keep the first 2 rows (the first 2 True values) and that you don't want the last row (the last False value).

The second part works the same way; it also has 3 arrays in 1 array:

[
 [15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24],
 [25, 26, 27, 28, 29]
]

And the second row of your mask, b = np.array([..., [False, True, True]]), says not to keep the first row (because the first value is False) but to keep the last two rows (the last two values are True).
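This block-by-block reading can be written out directly. A minimal sketch: apply each row of `b` to the matching 3x5 block of `x`, then concatenate the kept rows in order; the result reproduces `x[b]`.

```python
import numpy as np

x = np.arange(30).reshape(2, 3, 5)
b = np.array([[True, True, False], [False, True, True]])

# First block: keep rows 0 and 1. Second block: keep rows 1 and 2.
kept_first = x[0][b[0]]   # rows [0..4] and [5..9]
kept_second = x[1][b[1]]  # rows [20..24] and [25..29]

result = np.concatenate([kept_first, kept_second])
print(np.array_equal(result, x[b]))  # True
```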

answered by Stanko

You haven't discussed how the dimensions of `mask` map onto different axes of the data array, or what happens with higher-dimensional data. For example, if I have `x[:,b]`, how would you do it and still keep the explanation generic? At some point, you would need to start using better terms than just "row" or "two parts" if you are to actually answer it. Hence, downvoted (good enough reason, I think). – Divakar Oct 20 '17 at 10:41
A good reason indeed. I didn't explain in detail, but apparently the poster appreciated an easier explanation. Maybe now the poster has a better understanding of the issue thanks to my answer and will be better prepared for details like those in your answer. – Stanko Oct 20 '17 at 13:48
