Let's suppose that A is a (scipy) sparse matrix with tf-idf values and B is a (numpy) array with some additional features of my data.

Each row of A and the corresponding row of B describe the same observation.

I want to concatenate these matrices/arrays because I need to pass them to a sklearn ML model for training, and I do not think that I can pass them separately.

According to this answer (https://stackoverflow.com/a/49420566/9024698), there are two ways to concatenate these arrays:

  1. Convert the sparse array (A) to a dense array and then concatenate
  2. Convert the fully dense array (B) to a sparse matrix

However, (1) is basically impossible in my case because A is too big to hold in memory as a dense array.

Therefore, I am thinking of converting my fully dense array (B) to a sparse matrix.
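
For concreteness, option 2 might look like the sketch below. The small `A` and `B` here are made-up stand-ins for the real data, and CSR is just one possible choice of sparse format, not something the linked answer prescribes.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins: A plays the role of the tf-idf matrix,
# B the role of the dense array of additional features.
A = sparse.random(5, 8, density=0.25, format="csr", random_state=0)
B = np.arange(10, dtype=float).reshape(5, 2)

# Convert the dense array to a sparse matrix; the values are copied unchanged.
B_sparse = sparse.csr_matrix(B)

# Stack column-wise so each row still describes one observation.
X = sparse.hstack([A, B_sparse], format="csr")

print(X.shape)  # (5, 10)
```

The stacked `X` stays sparse, so A never has to be converted to a dense array.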

My question is: do I lose any information by doing this (i.e. by converting a fully dense array to a sparse one)?

This post (How to combine TFIDF features with other features) is related to my post but it does not explicitly give an answer to my question.

Outcast
  • Nope, sparse storage is not lossy. You can verify that yourself by creating a sparse matrix from your dense array, converting back (using `.A` or `.todense()` attribute) and comparing to the original array. – Paul Panzer Aug 05 '19 at 17:17
  • @PaulPanzer, ok so you mean that in the case of Adense -> Asparse -> Adense_again then Adense and Adense_again are absolutely the same? – Outcast Aug 05 '19 at 17:19
  • Yes, exactly. You can even directly compare `Adense==Asparse` and you will get a (dense) array filled with `True`s. – Paul Panzer Aug 05 '19 at 17:25
  • @PaulPanzer, ok sounds pretty good, thank you. Although I am not sure if this sparse representation makes any (considerable) difference to my ML model. – Outcast Aug 05 '19 at 17:30

1 Answer

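No, you don't lose any information. Sparse and dense are two different representations of the same data in this case: a sparse format stores only the nonzero entries together with their positions, so every value can be recovered exactly. See https://machinelearningmastery.com/sparse-matrices-for-machine-learning/ for more details.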
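
A quick way to convince yourself is a round trip from dense to sparse and back. This is a minimal sketch with a made-up array `B`; the CSR format is an assumption, and any other scipy sparse format should behave the same way for this check.

```python
import numpy as np
from scipy import sparse

# Made-up dense array standing in for the additional features.
B = np.array([[0.0, 1.5, -2.0],
              [3.25, 0.0, 0.0]])

# Dense -> sparse -> dense again.
B_sparse = sparse.csr_matrix(B)
B_back = B_sparse.toarray()

# Every element, including the zeros, is recovered exactly.
print(np.array_equal(B, B_back))  # True
```

One thing to keep in mind: a fully dense array stored in CSR format takes more memory than the plain NumPy array, because the column indices are stored alongside the values. The values themselves are unchanged.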

cookiemonster