In glmnet() I have to pass the raw X matrix and the response vector Y (unlike lm(), where you can specify a model formula). model.matrix() correctly removes incomplete observations from the X matrix, but its output doesn't include the response. So I end up with something like this:

mydf
glmnet(y = mydf$response, x = model.matrix(myformula, mydf)[,-1], ...)

When model.matrix() removes observations, the y and x dimensions no longer match. Is there a function to align the y data with x?

Robert Kubrick

1 Answer

Try using model.frame and model.response.

> d <- data.frame(y=rnorm(3), x=c(1,NA,2), z=c(NA, NA, 1))
> d
           y  x  z
1 -0.6257260  1 NA
2 -0.4979723 NA NA
3 -1.2233772  2  1
> form <- y~x
> mf <- model.frame(form, data=d)
> model.response(mf)
        1         3
-0.625726 -1.223377
> model.matrix(form, mf)
  (Intercept) x
1           1 1
3           1 2
attr(,"assign")
[1] 0 1

I'm not familiar with glmnet; it might be the case that mf is sufficient, just passing y=mf[,1] and x=mf[,-1] (the response is the first column of the model frame, not the first row).
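Putting the pieces together for the glmnet case, here is a rough sketch. The data frame, column names, and formula are made up for illustration, and the glmnet call is only attempted if the package is installed. The idea is to let model.frame() drop the incomplete rows once, then derive both y and x from that same frame so they stay aligned.

```r
# Hypothetical toy data: 20 rows, with a few missing predictor values.
set.seed(1)
mydf <- data.frame(response = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
mydf$x1[c(2, 7)] <- NA
mydf$x2[5] <- NA
myformula <- response ~ x1 + x2

# model.frame() applies the default na.action (na.omit) a single time,
# so the response and the predictors are filtered together.
mf <- model.frame(myformula, data = mydf)

y <- model.response(mf)
# Build the design matrix from the already-filtered frame and
# drop the intercept column, as in the question.
x <- model.matrix(myformula, mf)[, -1, drop = FALSE]

# Sanity check: same number of observations, same row labels.
stopifnot(nrow(x) == length(y), all(rownames(x) == names(y)))

# Fit only if glmnet is available (assumption: default gaussian family).
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(x = x, y = y)
}
```

Going through model.matrix() rather than indexing mf directly also gives a numeric matrix with any factor columns expanded into dummy variables, which is what glmnet expects for x.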

Josh