model.matrix returns fewer columns than I expect when lower-order terms are included together with an interaction term. Suppose two factor variables, A and B, have na and nb levels, respectively. In a complete model matrix with the interaction term, model.matrix(~ A + B + A:B), shouldn't I have (na-1) + (nb-1) + (na*nb-1) columns?
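To make the count concrete, here is a minimal sketch with a balanced 3 x 3 design. The factors A and B are built with gl() purely for illustration and are not my actual data.

```r
# Hypothetical balanced design: every A x B combination occurs exactly once
A <- gl(3, 1, 9)   # levels cycle 1,2,3,1,2,3,...
B <- gl(3, 3, 9)   # levels run 1,1,1,2,2,2,3,3,3
na <- nlevels(A)
nb <- nlevels(B)

# Number of non-intercept columns I expected
expected <- (na - 1) + (nb - 1) + (na * nb - 1)

# Number of non-intercept columns model.matrix actually returns
actual <- ncol(model.matrix(~ A + B + A:B)) - 1

c(expected = expected, actual = actual)
```

In this sketch the actual count matches (na-1) + (nb-1) + (na-1)*(nb-1) rather than my formula.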
In the following example, a and b have three levels each, so together they define nine combinations.
data(mtcars)
a <- as.factor(mtcars$gear)
b <- as.factor(mtcars$cyl)
table(a, b)
   b
a    4  6  8
  3  1  2 12
  4  8  4  0
  5  2  1  2
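As a quick sanity check on the level counts, the sketch below recomputes them from the same two factors. Note that one of the nine combinations (a = 4, b = 8) has no observations.

```r
a <- as.factor(mtcars$gear)
b <- as.factor(mtcars$cyl)

n.a  <- nlevels(a)                   # levels of a
n.b  <- nlevels(b)                   # levels of b
n.ab <- nlevels(interaction(a, b))   # all a x b combinations, empty ones included
n.empty <- sum(table(a, b) == 0)     # combinations with no observations

c(a = n.a, b = n.b, ab = n.ab, empty = n.empty)
```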
For a model matrix with only the interaction term, all nine combinations appear.
mod.I <- model.matrix(~ a:b)
colnames(mod.I)
[1] "(Intercept)" "a3:b4" "a4:b4" "a5:b4" "a3:b6"
[6] "a4:b6" "a5:b6" "a3:b8" "a4:b8" "a5:b8"
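While checking this matrix I noticed one detail that may or may not matter for the question: the a4:b8 column is entirely zero, because that combination never occurs in mtcars. A small sketch to confirm:

```r
a <- as.factor(mtcars$gear)
b <- as.factor(mtcars$cyl)
mod.I <- model.matrix(~ a:b)

n.col <- ncol(mod.I)                    # intercept plus the interaction columns
zero.col <- colSums(mod.I)[["a4:b8"]]   # number of rows with a = 4 and b = 8

c(columns = n.col, a4b8 = zero.col)
```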
However, when only one lower-order term is included, model.matrix also drops a level of the other factor from the interaction. In this case, there are no interaction columns for b = 4.
mod.a <- model.matrix(~ a + a:b)
colnames(mod.a)
[1] "(Intercept)" "a4" "a5" "a3:b6" "a4:b6"
[6] "a5:b6" "a3:b8" "a4:b8" "a5:b8"
This gives the same number of columns as the complete model matrix:
mod.ab <- model.matrix(~ a + b + a:b)
colnames(mod.ab)
[1] "(Intercept)" "a4" "a5" "b6" "b8"
[6] "a4:b6" "a5:b6" "a4:b8" "a5:b8"
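To check whether ~ a + a:b and ~ a + b + a:b really describe the same model, one option is to compare column spaces through matrix ranks. This is only a sketch of how I would test it, not a claim about how model.matrix works internally.

```r
a <- as.factor(mtcars$gear)
b <- as.factor(mtcars$cyl)
mod.a  <- model.matrix(~ a + a:b)
mod.ab <- model.matrix(~ a + b + a:b)

# Rank of each matrix, and of the two placed side by side
r.a    <- qr(mod.a)$rank
r.ab   <- qr(mod.ab)$rank
r.both <- qr(cbind(mod.a, mod.ab))$rank

# If all three ranks agree, the two matrices span the same column space
c(mod.a = r.a, mod.ab = r.ab, both = r.both)
```

If the ranks agree, the two parameterizations give the same fitted values and differ only in how the coefficients are labeled.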
I read that this has to do with contrasts, but wouldn't the contrasts operate independently on the interaction term? Also, if I want to know the coefficient of a4:b4 with respect to a3:b4, how would I do it?
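One approach I have considered for the second part, without being sure it is the right one, is to drop the intercept so that each a:b cell gets its own coefficient, and then take the difference. The response (mpg) is an assumption chosen only for illustration.

```r
a <- as.factor(mtcars$gear)
b <- as.factor(mtcars$cyl)

# Assumption: mpg as the response; ~ 0 + a:b removes the intercept
fit <- lm(mtcars$mpg ~ 0 + a:b)
cf <- coef(fit)   # one coefficient per a:b cell (NA for the empty a4:b8 cell)

# Difference between the a4:b4 and a3:b4 cells
d <- cf[["a4:b4"]] - cf[["a3:b4"]]

# The same quantity computed directly from the observed cell means
m <- tapply(mtcars$mpg, list(a, b), mean)
c(from_model = d, from_means = m["4", "4"] - m["3", "4"])
```

In this parameterization each coefficient is just a cell mean, so the difference is a comparison of a = 4 against a = 3 within b = 4.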