A dataset contains numerical and categorical variables, and I split them into two parts:
cont_data = data[cont_variables].values
disc_data = data[disc_variables].values
Then I used sklearn.preprocessing.OneHotEncoder to encode the categorical data, and tried to merge the encoded categorical data with the numerical data:
np.concatenate((cont_data, disc_data_coded), axis=1)
But the following error occurs:
ValueError: all the input arrays must have same number of dimensions
I checked that both arrays have the same number of dimensions:
print(cont_data.shape) # (24000, 35)
print(disc_data_coded.shape) # (24000, 26)
Finally, I found that cont_data is a NumPy array, while disc_data_coded is a sparse matrix:
>>> disc_data_coded
<24000x26 sparse matrix of type '<class 'numpy.float64'>'
with 312000 stored elements in Compressed Sparse Row format>
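Because .shape works the same way on both objects, it cannot reveal this difference; checking the type, or calling scipy.sparse.issparse, can. A minimal sketch, using small made-up stand-ins for cont_data and disc_data_coded rather than the real 24000-row data:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for cont_data (dense) and disc_data_coded (sparse)
cont_data = np.random.rand(4, 3)
disc_data_coded = sparse.csr_matrix(np.eye(4, 2))

for name, arr in [("cont_data", cont_data), ("disc_data_coded", disc_data_coded)]:
    # Both report a 2-D shape; only the type and issparse() tell them apart
    print(name, type(arr).__name__, arr.shape, sparse.issparse(arr))
```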
When I set the sparse parameter of OneHotEncoder to False, everything works.
But my question is: how can I merge a NumPy array with a sparse matrix directly, without setting sparse=False?
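One possible approach, sketched assuming SciPy is available (OneHotEncoder's sparse output is already a SciPy sparse matrix), is scipy.sparse.hstack, which stacks blocks column-wise (the sparse counterpart of np.concatenate with axis=1) and keeps the result sparse. Here the dense block is explicitly wrapped with sparse.csr_matrix so that every input is sparse; the small arrays are hypothetical stand-ins for cont_data and disc_data_coded:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins: 4 samples, 3 numerical columns and 2 one-hot columns
cont_data = np.arange(12, dtype=np.float64).reshape(4, 3)
disc_data_coded = sparse.csr_matrix(
    np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=np.float64)
)

# Convert the dense block to CSR, then stack column-wise; format="csr"
# requests a CSR result, which many scikit-learn estimators accept
merged = sparse.hstack([sparse.csr_matrix(cont_data), disc_data_coded], format="csr")
print(merged.shape)  # (4, 5)

# If a dense array is really needed, densify only the encoded part;
# this gives the same result as sparse=False and stores every zero
merged_dense = np.hstack([cont_data, disc_data_coded.toarray()])
```

Keeping the merged matrix sparse avoids materializing the mostly-zero one-hot columns. Note that in scikit-learn 1.2 and later the OneHotEncoder parameter is named sparse_output rather than sparse.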