4

model.matrix returns fewer levels if lower order terms are included with interaction terms. If two-factor variables have na and nb levels, respectively. In a complete model.matrix with interaction terms,

model.matrix(~ A + B + A:B), shouldn't I have (na-1) + (nb-1) + (na*nb-1)?

In the following example, both a and b have three levels each. Together, they have nine levels.

data(mtcars)
a <- as.factor(mtcars$gear)
b <- as.factor(mtcars$cyl)
table (a,b)

   b
a    4  6  8
  3  1  2 12
  4  8  4  0
  5  2  1  2

For model matrix with only interaction term, It has all nine levels.

mod.I <- model.matrix(~ a:b)
colnames(mod.I)
[1] "(Intercept)" "a3:b4"       "a4:b4"       "a5:b4"       "a3:b6"      
[6] "a4:b6"       "a5:b6"       "a3:b8"       "a4:b8"       "a5:b8"  

However, for model.matrix with only one lower order term, it drops levels from other variables too. In this case, b doesn't have term for b = 4.

mod.a <- model.matrix(~ a + a:b)
colnames(mod.a)
[1] "(Intercept)" "a4"          "a5"          "a3:b6"       "a4:b6"      
[6] "a5:b6"       "a3:b8"       "a4:b8"       "a5:b8" 

This is equivalent to complete model.matrix.

mod.ab <- model.matrix(~ a + b + a:b)
colnames(mod.ab)
[1] "(Intercept)" "a4"          "a5"          "b6"          "b8"         
[6] "a4:b6"       "a5:b6"       "a4:b8"       "a5:b8"

I read it has to do with contrast, but, wouldn't contrast operate independently on interaction term? Also, If I want to know the coefficient of a4:b4 with resect to a3:b4, how would I do it?

Shubham Gupta
  • 650
  • 6
  • 18
  • 1
    I think this should be helpful - https://stats.stackexchange.com/questions/147083/what-is-the-baseline-level-in-a-factor-by-factor-interaction – thelatemail Aug 30 '19 at 05:52
  • Thanks, that was helpful. For `mod.a`, can the coefficient of `a4:b4` be calculated as `a4 + Intercept`, assuming Intercept has coefficient of `a3:b4`? How does one assign a significance level to this new coefficient if only one of the `a4` or `Intercept` is significant? – Shubham Gupta Aug 30 '19 at 07:35

1 Answers1

1

You can look at the output of model.matrix to see exactly what it has done in any particular situation but in any case to calculate a full set of coefficients use dummy.coef like this or optionally use the use.na=TRUE argument. See ?dummy.coef

fm <- lm(mpg ~ a + a:b, mtcars)
dummy.coef(fm)

giving:

Full coefficients are 

(Intercept):       21.5                                                                
a:                    3       4       5                                                
                  0.000   5.425   6.700                                                
a:b:                3:4     4:4     5:4     3:6     4:6     5:6     3:8     4:8     5:8
                  0.000   0.000   0.000  -1.750  -7.175  -8.500  -6.450   0.000 -12.800
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341