I have an R data frame whose columns are logical variables. I need to compute some kind of dot product between all possible pairs of columns.

This arises from text corpus analysis, where the data frame indicates which terms (rows) are present in which documents (columns). There are common, fast solutions for computing distances between all possible pairs of columns, using daisy from the cluster package or cosine from the lsa package.

However, I need a dot product between all pairs of columns instead: the goal is to count, for each pair, how many words are present in both of the documents being compared.

www
Marc G.
  • Hi, Take a bit of time and read the tag excerpt before tagging. [tag:dataframes] is for pandas, whereas you need [tag:data.frame] here. Be careful the next time. See this meta post. [Warn \[r\] users from adding \[dataframes\] tag instead of \[data.frame\] tag](http://meta.stackoverflow.com/q/318933) – Bhargav Rao Mar 14 '16 at 14:47

1 Answer

Let's use this example:

df <- data.frame(x1 = c(TRUE, TRUE, FALSE), x2 = c(FALSE, FALSE, FALSE), x3 = c(TRUE, FALSE, TRUE))

I would turn the data.frame into a matrix then compute the crossproduct:

crossprod(data.matrix(df))
#    x1 x2 x3
# x1  2  0  1
# x2  0  0  0
# x3  1  0  2
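
To see why this gives the co-occurrence counts: data.matrix turns each logical column into 0/1 values, so entry (i, j) of the cross product is the number of rows where both column i and column j are TRUE. Here is a minimal sketch that checks this against a brute-force count on the same example; the names m, counts and check are only illustrative.

```r
df <- data.frame(x1 = c(TRUE, TRUE, FALSE),
                 x2 = c(FALSE, FALSE, FALSE),
                 x3 = c(TRUE, FALSE, TRUE))

m <- data.matrix(df)     # logical columns become 0/1 integers
counts <- crossprod(m)   # equivalent to t(m) %*% m

# Brute-force check: for every pair of columns, count the rows
# where both are TRUE
check <- sapply(df, function(a) sapply(df, function(b) sum(a & b)))

# Stops with an error if the two results disagree
stopifnot(all(counts == check))
```

The diagonal holds the number of terms present in each document, and the matrix is symmetric, so only one triangle is needed if each pair should be reported once.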
flodel