If I have a factor variable, say x = factor(c(1, 2, 3)), I can use the model.matrix function to generate a dummy matrix:

model.matrix(~x + 0)

and I will get a matrix like:

  x1 x2 x3
1  1  0  0
2  0  1  0
3  0  0  1

My question is: if I already have a large dummy matrix, how can I melt it back into a single (factor) column?

In other words, is there an inverse function of model.matrix?

zx8754
Bayes
  • Assuming your data is named df, you can try `apply(df, 1, which.max)`. – Mamoun Benghezal Mar 31 '15 at 12:33
  • @MamounBenghezal that won't work for any other vector than `c(1,2,3)` – David Arenburg Mar 31 '15 at 13:27
  • @DavidArenburg, why is that? Here is a proof of good faith: `set.seed(1); x <- as.factor(sample(5, 10, replace = T)); mat <- model.matrix(~x-1); par <- as.factor(apply(mat, 1, which.max)); identical(par, x) # TRUE`. This seems to work quite well to me. – Mamoun Benghezal Mar 31 '15 at 13:37
  • @MamounBenghezal you are trying too hard, try setting `x <- factor(c(55,3))` or `x <- factor(c(1,1,3))` and then run your code. – David Arenburg Mar 31 '15 at 13:47
  • @DavidArenburg, OK, I understand what you are saying, but this is a label-switching issue, since it is easy to get back the original levels by using `levels(par) <- levels(x)`. – Mamoun Benghezal Mar 31 '15 at 14:05
  • @MamounBenghezal the only problem with that methodology is that the OP will need to have `x` too. If they already have the `x`, they can skip that whole process and just use it. It seems to me that OP is looking for a solution when they have *only* the model matrix and they are trying to find the `x`. Otherwise there is no sense in that question whatsoever. – David Arenburg Mar 31 '15 at 19:19
  • Anyway, it seems like the best solution is `factor(sub("x", "", colnames(modmat)[max.col(modmat)], fixed = TRUE))`. The only problem with it is that you have to know the name of the vector that was passed into `model.matrix` (in this case it was `x`). – David Arenburg Mar 31 '15 at 19:31
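
The `max.col` idea from the last comment can be tried on a small round trip. This is only a sketch under a few assumptions: the matrix was built from a single factor without an intercept (`~ x + 0`), the variable-name prefix `x` is known, and the names `modmat`, `idx`, `lev` and `recovered` are made up for the illustration.

```r
# Factor -> dummy matrix -> factor, using levels that are not 1, 2, 3
x <- factor(c(55, 3, 3, 55, 7))
modmat <- model.matrix(~ x + 0)   # one indicator column per level of x

# For each row, the index of the column that holds the 1
idx <- max.col(modmat, ties.method = "first")

# Strip the known prefix "x" from the column names (assumption: prefix is "x")
lev <- sub("^x", "", colnames(modmat))

# Passing levels explicitly keeps the column order as the level order
recovered <- factor(lev[idx], levels = lev)

# Check that values and levels survive the round trip
stopifnot(identical(as.character(recovered), as.character(x)))
stopifnot(identical(levels(recovered), levels(x)))
```

Anchoring the pattern with `^x` removes the prefix only at the start of the name, so a level that itself contains an "x" is left alone.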

1 Answer

apply is suitable for this.

I will use the cars data from the caret package, which stores the car type as 0/1 indicator columns instead of a single factor. Let's convert these 5 columns (convertible, coupe, hatchback, sedan, wagon) into a single factor variable, Type.

library(caret)
data(cars)
head(cars[,-c(1:13)])

  convertible coupe hatchback sedan wagon
1           0     0         0     1     0
2           0     1         0     0     0
3           1     0         0     0     0
4           1     0         0     0     0
5           1     0         0     0     0
6           1     0         0     0     0


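# keep only the five indicator columns (columns 14 to 18)
df = cars[, -c(1:13)]
# for each row, take the name of the column that holds the 1
cars$Type = as.factor(apply(df, 1, function(foo) names(df)[which.max(foo)]))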

head(cars[,-c(1:13)])

  convertible coupe hatchback sedan wagon        Type
1           0     0         0     1     0       sedan
2           0     1         0     0     0       coupe
3           1     0         0     0     0 convertible
4           1     0         0     0     0 convertible
5           1     0         0     0     0 convertible
6           1     0         0     0     0 convertible
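
One caveat with `which.max`: for a row that is all zeros it still returns 1, so such a row would silently be labelled with the first column. A minimal sketch of a guarded version follows, assuming every row should contain exactly one 1. `dummies_to_factor` and `toy` are hypothetical names introduced here, not part of caret or base R.

```r
# Hypothetical helper: collapse 0/1 indicator columns into one factor
dummies_to_factor <- function(d) {
  d <- as.matrix(d)
  # Fail early unless the input is strictly 0/1 with exactly one 1 per row
  stopifnot(all(d %in% c(0, 1)), all(rowSums(d) == 1))
  # Column name of the 1 in each row; level order follows column order
  factor(colnames(d)[max.col(d, ties.method = "first")], levels = colnames(d))
}

# Small stand-in for the indicator columns of the cars data
toy <- data.frame(convertible = c(0, 0, 1),
                  coupe       = c(0, 1, 0),
                  sedan       = c(1, 0, 0))
type <- dummies_to_factor(toy)
```

With the cars data, the same call would be `cars$Type <- dummies_to_factor(cars[, 14:18])`, provided the indicator columns pass the check.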
Özgür