`np.concatenate` a numpy array with a sparse matrix

Question

A dataset contains numerical and categorical variables, and I split them into two parts:

cont_data = data[cont_variables].values
disc_data = data[disc_variables].values

Then I used sklearn.preprocessing.OneHotEncoder to encode the categorical data and tried to merge the encoded categorical data with the numerical data:

np.concatenate((cont_data, disc_data_coded), axis=1)

But the following error occurs:

ValueError: all the input arrays must have same number of dimensions

I checked that the shapes are compatible (both are 2-D with the same number of rows):

print(cont_data.shape)        # (24000, 35)
print(disc_data_coded.shape)  # (24000, 26)

Finally, I found that cont_data is a numpy array while

>>> disc_data_coded
<24000x26 sparse matrix of type '<class 'numpy.float64'>'
with 312000 stored elements in Compressed Sparse Row format>

When I set the `sparse` parameter of OneHotEncoder to False (newer scikit-learn releases call this parameter `sparse_output`), everything works. But how can I merge a numpy array with a sparse matrix directly, without setting sparse=False?
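
A quick way to see the mismatch the question runs into is to check the types rather than the shapes. The sketch below uses small made-up stand-ins for `cont_data` and `disc_data_coded` (not the question's real data) and SciPy's `sparse.issparse` to show that a sparse matrix reports a 2-D `.shape` but is not an `ndarray`.

```python
import numpy as np
from scipy import sparse

# Small illustrative stand-ins for the question's arrays.
cont_data = np.arange(6, dtype=float).reshape(3, 2)   # dense ndarray
disc_data_coded = sparse.csr_matrix(np.eye(3))        # CSR matrix, like OneHotEncoder's default output

# Both report a 2-D shape, but only one of them is an ndarray.
print(cont_data.shape, disc_data_coded.shape)                                   # (3, 2) (3, 3)
print(type(cont_data).__name__, isinstance(cont_data, np.ndarray))              # ndarray True
print(type(disc_data_coded).__name__, isinstance(disc_data_coded, np.ndarray))  # csr_matrix False
print(sparse.issparse(disc_data_coded))                                         # True
```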

score 12 · Accepted Answer · answered Mar 22 '18 at 04:28

Sparse matrices are not subclasses of numpy arrays, so numpy functions often don't work on them. Use the sparse functions instead, such as sparse.vstack and sparse.hstack. But then all inputs have to be sparse.
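
The question joins the blocks column-wise (`axis=1`), so `sparse.hstack` is the relevant function there. A minimal sketch, assuming small made-up arrays in place of the real data: the dense block is converted explicitly with `sparse.csr_matrix` so every input is sparse, and `format="csr"` asks for CSR output instead of relying on the default format.

```python
import numpy as np
from scipy import sparse

# Illustrative data: 3 rows, 2 numeric columns and 3 one-hot columns.
cont_data = np.array([[1.5, 2.0],
                      [0.5, 4.0],
                      [3.0, 1.0]])
disc_data_coded = sparse.csr_matrix(np.array([[1, 0, 0],
                                              [0, 0, 1],
                                              [0, 1, 0]], dtype=float))

# Make the dense block sparse, then stack the blocks side by side:
# the sparse counterpart of np.concatenate(..., axis=1).
merged = sparse.hstack((sparse.csr_matrix(cont_data), disc_data_coded), format="csr")

print(merged.shape)  # (3, 5)
print(merged.toarray())
```

The result stays sparse, which matters if it is passed on to an estimator that accepts sparse input.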

Or make the sparse matrix dense first, with .toarray(), and use np.concatenate.
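
For the dense route, the fix to the question's call is only the `.toarray()`. A sketch with the same kind of made-up stand-in data:

```python
import numpy as np
from scipy import sparse

# Illustrative data in place of the question's arrays.
cont_data = np.array([[1.5, 2.0],
                      [0.5, 4.0],
                      [3.0, 1.0]])
disc_data_coded = sparse.csr_matrix(np.array([[1, 0, 0],
                                              [0, 0, 1],
                                              [0, 1, 0]], dtype=float))

# .toarray() returns a dense ndarray, so np.concatenate now sees two 2-D arrays.
merged = np.concatenate((cont_data, disc_data_coded.toarray()), axis=1)

print(type(merged).__name__, merged.shape)  # ndarray (3, 5)
```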

Do you want the result to be sparse or dense?
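
One way to decide is to compare memory. The question's encoded matrix is 24000x26 with 312000 stored elements, so exactly half the entries are stored, and CSR saves relatively little over a dense float64 array. The sketch below does that arithmetic, assuming float64 data and int32 indices (SciPy typically uses int32 indices when they fit), and measures the same thing on a small matrix. `dense_nbytes` and `csr_nbytes` are hypothetical helper names, not SciPy functions.

```python
import numpy as np
from scipy import sparse

def dense_nbytes(m):
    """Bytes a dense copy of m would need: rows * cols * itemsize."""
    rows, cols = m.shape
    return rows * cols * m.dtype.itemsize

def csr_nbytes(m):
    """Bytes held by a CSR matrix's three internal arrays (data, indices, indptr)."""
    m = sparse.csr_matrix(m)
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes

# Arithmetic for the question's sizes (assumes float64 data, int32 indices).
rows, cols, nnz = 24000, 26, 312000
dense_bytes = rows * cols * 8                   # 4,992,000 bytes
csr_bytes = nnz * 8 + nnz * 4 + (rows + 1) * 4  # data + indices + indptr = 3,840,004 bytes
print(nnz / (rows * cols))                      # 0.5: half of all entries are stored
print(dense_bytes, csr_bytes)

# The same comparison measured on a small real matrix.
small = sparse.csr_matrix(np.eye(4))
print(dense_nbytes(small), csr_nbytes(small))
```

Appending the 35 numeric columns, which are presumably mostly non-zero, would make the merged matrix denser still, which tilts this particular case toward the dense result.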

In [32]: sparse.vstack((sparse.csr_matrix(np.arange(10)), sparse.csr_matrix(np.ones((3,10)))))
Out[32]: 
<4x10 sparse matrix of type '<class 'numpy.float64'>'
    with 39 stored elements in Compressed Sparse Row format>
In [33]: np.concatenate((sparse.csr_matrix(np.arange(10)).toarray(), np.ones((3,10))))
Out[33]: 
array([[0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])

Just to supplement: `sparse` here is the `scipy.sparse` submodule. To run the code above, first add `from scipy import sparse` (and `import numpy as np`). — htredleaf, Oct 13 '22 at 01:58
