Suppose that A is a (scipy) sparse matrix with tf-idf values and B is a (numpy) array with some additional features of my data.
Each row of A and B corresponds to the same observation.
I want to concatenate these matrices/arrays so that I can pass the result to a sklearn ML model for training, since I do not think that I can pass them separately.
According to this answer (https://stackoverflow.com/a/49420566/9024698), there are two ways to concatenate these arrays:
- Convert the sparse array (A) to a dense array and then concatenate
- Convert the fully dense array (B) to a sparse matrix
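For concreteness, here is a minimal sketch of what I understand the second option to look like. The toy matrices and their values are made up for illustration (my real A is far larger), and I am assuming that scipy.sparse.hstack is the intended way to do the column-wise concatenation:

```python
import numpy as np
from scipy import sparse

# Toy stand-ins (hypothetical values): A plays the role of the tf-idf
# matrix, B the role of the additional dense features.
A = sparse.csr_matrix(np.array([[0.0, 0.5, 0.0],
                                [0.7, 0.0, 0.0],
                                [0.0, 0.0, 0.3]]))
B = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Second option: wrap B in a sparse container, then stack column-wise
# so that each row still describes the same observation.
X = sparse.hstack([A, sparse.csr_matrix(B)], format="csr")

print(X.shape)  # (3, 5)
```

The result X stays sparse, so A is never converted to a dense array.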
However, the first option is basically impossible in my case because A is too big.
Therefore, I am thinking of converting my fully dense array (B) to a sparse array.
However, my question is: do I lose any information by doing this (i.e. by converting a fully dense array to a sparse one)?
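One check I can think of is a round trip on a small example: convert a dense array to sparse, convert it back, and compare. This is only a sketch on made-up values, assuming a CSR matrix as the sparse format, and it checks a single toy case rather than proving anything in general:

```python
import numpy as np
from scipy import sparse

# Hypothetical dense feature array, including zeros and a negative value.
B = np.array([[1.5, 0.0, -2.0],
              [0.0, 0.0, 4.25]])

# Dense -> sparse -> dense round trip.
B_sparse = sparse.csr_matrix(B)
B_back = B_sparse.toarray()

# Compare values and dtype before and after the round trip.
same_values = np.array_equal(B, B_back)
same_dtype = B.dtype == B_back.dtype
print(same_values, same_dtype)
```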
This post (How to combine TFIDF features with other features) is related to my post but it does not explicitly give an answer to my question.