Vector operations with numpy

Question

I have three numpy arrays:

X: a 3073 x 49000 matrix
W: a 10 x 3073 matrix
y: a 49000 x 1 vector

y contains values between 0 and 9, each value represents a row in W.

I would like to add the first column of X to the row in W given by the first element in y. I.e. if the first element in y is 3, add the first column of X to the fourth row of W. And then add the second column of X to the row in W given by the second element in y and so on, until all columns of X has been aded to the row in W specified by y, which means a total of 49000 added rows.

W[y] += X.T does not work for me, because this will not add more than one vector to a row in W.

Please note: I'm only looking for vectorized solutions. I.e. no for-loops.

EDIT: To clarify I'll add an example with small matrix sizes adapted from Salvador Dali's example below.

In [1]: import numpy as np

In [2]: a, b, c = 3, 4, 5

In [3]: np.random.seed(0)

In [4]: X = np.random.randint(10, size=(b,c))

In [5]: W = np.random.randint(10, size=(a,b))

In [6]: y = np.random.randint(a, size=(c,1))

In [7]: X
Out[7]: 
array([[5, 0, 3, 3, 7],
       [9, 3, 5, 2, 4],
       [7, 6, 8, 8, 1],
       [6, 7, 7, 8, 1]])

In [8]: W
Out[8]: 
array([[5, 9, 8, 9],
       [4, 3, 0, 3],
       [5, 0, 2, 3]])

In [9]: y
Out[9]: 
array([[0],
       [1],
       [1],
       [2],
       [0]])

In [10]: W[y.ravel()] + X.T
Out[10]: 
array([[10, 18, 15, 15],
       [ 4,  6,  6, 10],
       [ 7,  8,  8, 10],
       [ 8,  2, 10, 11],
       [12, 13,  9, 10]])

In [11]: W[y.ravel()] = W[y.ravel()] + X.T

In [12]: W
Out[12]: 
array([[12, 13,  9, 10],
       [ 7,  8,  8, 10],
       [ 8,  2, 10, 11]])

The problem is to get BOTH column 0 and column 4 in X added to row 0 in W, as well as both column 1 and 2 in X added to row 1 in W.

The desired outcome is thus:

W = [[17, 22, 16, 16],
     [ 7, 11, 14, 17],
     [ 8,  2, 10, 11]]

Looks very similar to [`Vectorize addition into array indexed by another array`](http://stackoverflow.com/questions/32142631/vectorize-addition-into-array-indexed-by-another-array). — Divakar, Aug 29 '15 at 06:15
Is 'no loop' a speed issue or a programming challenge issue? — hpaulj, Aug 29 '15 at 16:45
It's a programming challenge issue, motivated by speed. I.e. as far as I'm concerned it's a programming challenge, but the reason I can't use loops is to practice writing more performant code. — Skeppet, Sep 03 '15 at 06:03

score 3 · Answer 1 · edited May 23 '17 at 12:31

3

Vectorized approaches

Approach #1

Based on this answer, here's a vectorized solution using np.bincount -

N = y.max()+1
id = y.ravel() + np.arange(X.shape[0])[:,None]*N
W[:N] += np.bincount(id.ravel(), weights=X.ravel()).reshape(-1,N).T

Approach #2

You can make good usage of boolean indexing and np.einsum to get the job done in a concise vectorized manner -

N = y.max()+1
W[:N] += np.einsum('ijk,lk->il',(np.arange(N)[:,None,None] == y.ravel()),X)

Loopy approaches

Approach #3

Since you are selecting and adding up a huge number of columns from X per unique y, it might be better in terms of performance to run a loop with complexity equal to the number of such unique y's, which seems to be at max equal to the number of rows in W and that in your case is just 10. Thus, the loop has just 10 iterations, not bad! Here's the implementation to fulfill those aspirations -

for k in range(W.shape[0]):
    W[k] += X[:,(y==k).ravel()].sum(1)

Approach #4

You can bring in np.einsum to do the columnwise summations and have the final output like so -

for k in range(W.shape[0]):
    W[k] += np.einsum('ij->i',X[:,(y==k).ravel()])

edited May 23 '17 at 12:31

Community

1
1

answered Aug 29 '15 at 07:38

Divakar

218,885
19
262
358

Thanks for solution. However, it is still slower than for-loops... Especially noticeable on the matrix sizes mentioned in the original question. – Skeppet Aug 29 '15 at 09:49
@Skeppet Not very surprised by that. Look [`here`](http://stackoverflow.com/a/32144453/3293881) into modified for-loop version, `for_loop_v2` for slightly better one with performance. – Divakar Aug 29 '15 at 09:51
@Skeppet Added in such a loop with low complexity. – Divakar Aug 29 '15 at 10:02
http://stackoverflow.com/a/28205888/901925 notes that if `y` has gaps, the `bincount` solution raises an error. – hpaulj Aug 29 '15 at 19:14
@hpaulj Thanks for your edits and pointing out the error case! Also, my edits should take care of the out of bounds/error cases. – Divakar Aug 30 '15 at 07:01

score 3 · Answer 2 · edited May 23 '17 at 12:23

First the straight forward loop solution as reference:

In [65]: for i,j in enumerate(y):
    W[j]+=X[:,i]
   ....:     

In [66]: W
Out[66]: 
array([[17, 22, 16, 16],
       [ 7, 11, 14, 17],
       [ 8,  2, 10, 11]])

An add.at solution:

In [67]: W=W1.copy()
In [68]: np.add.at(W,(y.ravel()),X.T)
In [69]: W
Out[69]: 
array([[17, 22, 16, 16],
       [ 7, 11, 14, 17],
       [ 8,  2, 10, 11]])

add.at does an unbuffered calculation, getting around the buffering that prevents W[y.ravel()] += X.T from working. It is still iterative, but the loop has been moved to compiled code. It isn't true vectorization because the order of application matters. The addition for one row of X.T depends on the results from the previous rows.

https://stackoverflow.com/a/20811014/901925 is the answer I gave a couple of years ago to a similar question (for 1d arrays).

But when dealing with your large arrays:

X: a 3073 x 49000 matrix
W: a 10 x 3073 matrix
y: a 49000 x 1 vector

this can run into speed issues. Note that W[y.ravel()] is the same size as X.T (why did you pick these sizes that require transpose?). And it's a copy, not a view. So there's already a time penalty.

bincount has been suggested in previous questions, and I think it is faster. Making for loop with index arrays faster (both bincount and add.at solutions)

Iterating over the small 3073 dimension could also have speed advantages. Or better yet on the size 10 dimension as Divakar demonstrates.

For the small test case, a,b,c=3,4,5, the add.at solution is fastest, with Divakar's bincount and einseum next. For a larger a,b,c=10,1000,20000, add.at gets very slow, with bincount being the fastest.

Related SO answers

https://stackoverflow.com/a/28205888/901925 (notes that bincount requires complete coverage for y).

https://stackoverflow.com/a/30041823/901925 (where Divakar again shows that bincount rules!)

https://github.com/numpy/numpy/issues/5922 - `ufunc.at perfomance >10x too slow #5922`. This relative speed of .`at` and `bincount` is something the developers have discussed. It's attributed, in part, to the generality of one versus the specialized purpose of the other. — hpaulj, Aug 31 '15 at 06:49
Wow, that was a very thorough post. Thank you very much! Regarding the matrix sizes, they were not my choice. I'm just doing exercises for a university course I'm following along. — Skeppet, Sep 03 '15 at 06:11
OK, a class. That's makes sense. I've seen at least one earlier SO question with 3073 and 49000. — hpaulj, Sep 03 '15 at 07:11

score 1 · Answer 3 · answered Aug 29 '15 at 06:08

This will achieve what you want: X + W[y.ravel()].T

To see that this really does the work, here is a reproducible example:

import numpy as np
np.random.seed(0)
a, b, c = 3, 5, 4  # you can use your 3073, 49000, 10 later

X = np.random.rand(a, b)
W = np.random.rand(c, a)
y = np.random.randint(c, size=(b, 1))

Now your matrices are:

[[ 0.0871293   0.0202184   0.83261985]
 [ 0.77815675  0.87001215  0.97861834]
 [ 0.79915856  0.46147936  0.78052918]
 [ 0.11827443  0.63992102  0.14335329]]

[[3]
 [0]
 [3]
 [2]
 [0]]

[[ 0.5488135   0.71518937  0.60276338  0.54488318  0.4236548 ]
 [ 0.64589411  0.43758721  0.891773    0.96366276  0.38344152]
 [ 0.79172504  0.52889492  0.56804456  0.92559664  0.07103606]]

And W[y.ravel()] gives you " W given by the first element in y". By transposing it, you will get a matrix ready to be added to X:

[[ 0.11827443  0.0871293   0.11827443  0.79915856  0.0871293 ]
 [ 0.63992102  0.0202184   0.63992102  0.46147936  0.0202184 ]
 [ 0.14335329  0.83261985  0.14335329  0.78052918  0.83261985]]

Thank you for a clear answer! I'm sorry my explanation wasn't clear enough. Edited the question with an example. However, the problem of adding more than one column from `X` still remains with your solution. — Skeppet, Aug 29 '15 at 07:11
No, as I said, the problem remains with your suggested solution. If the same number appears more than once in `y` (as it is sure to do for 49000 rows), your suggested solution will only add the column from `X` indicated by the last occurrence of that number. I'll update the question again for you to see the difference. — Skeppet, Aug 29 '15 at 07:26
As you can see it works for row two in `W`, since that row only occurs once in `y`, but for row 0 and 1 it doesn't. — Skeppet, Aug 29 '15 at 07:29

score 0 · Answer 4 · answered Aug 29 '15 at 06:10

0

While I can't say that this is very pythonic, it is a solution (I think):

for column in range(x.shape[1]):
    w[y[column]] = x[:,column].T

answered Aug 29 '15 at 06:10

rhozzy

332
1
9

Sorry, no for loops. Need a vectorized solution. Thanks for the effort though! – Skeppet Aug 29 '15 at 07:15

Vector operations with numpy

4 Answers4

Vectorized approaches

Loopy approaches

Linked