Take the following example:

fit <- lm(Sepal.Length ~ log(Sepal.Width), data = iris)

I would like a copy of iris that only includes the variables that were involved in making fit. I think model.matrix() and model.frame() don't quite do it because of the log; they will include log(Sepal.Width) but not Sepal.Width. Basically, I want a minimal version of iris that only includes the variables that were used in making fit. How can I do that? This is of course just an example, and I would like a more general solution (say you had a number of variables used in making a fit, many of them passed through transformations that are not necessarily invertible).
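
For concreteness, here is a quick check of what those two functions return for this fit. It is only a sketch using base R's stats functions; the names mf_names and mm_names are introduced here for illustration.

```r
fit <- lm(Sepal.Length ~ log(Sepal.Width), data = iris)

# Column names of the model frame: the transformed term appears,
# not the raw Sepal.Width column
mf_names <- names(model.frame(fit))
mf_names

# Column names of the design matrix: intercept plus the transformed term
mm_names <- colnames(model.matrix(fit))
mm_names
```

Neither result contains a column named Sepal.Width, which is the gap described above.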

cgmil

1 Answer

I think what you want is get_all_vars():

get_all_vars(fit, data = iris)

Output:

#    Sepal.Length Sepal.Width
#1            5.1         3.5
#2            4.9         3.0
#3            4.7         3.2
#4            4.6         3.1
#5            5.0         3.6
#6            5.4         3.9
#7            4.6         3.4
# ...

This returns the untransformed variables (i.e., Sepal.Width instead of log(Sepal.Width)), as seen here:

all.equal(iris$Sepal.Width, 
          get_all_vars(fit, data = iris)$Sepal.Width)

#[1] TRUE
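
For the more general case in the question (several variables, some passed through non-invertible transformations), the same call should carry over. Below is a minimal sketch; fit2 and its formula are hypothetical and not from the question. It also shows a by-name alternative that subsets iris with all.vars() on the model formula, assuming every symbol in the formula is a column of the data.

```r
# Hypothetical model with several transformed terms (illustration only)
fit2 <- lm(log(Sepal.Length) ~ sqrt(Sepal.Width) + I(Petal.Length^2),
           data = iris)

# Raw (untransformed) variables referenced by the formula
vars_used <- get_all_vars(fit2, data = iris)
names(vars_used)  # expected to list Sepal.Length, Sepal.Width, Petal.Length

# Alternative: pull the variable names from the formula and subset by name
same_cols <- iris[all.vars(formula(fit2))]
identical(names(vars_used), names(same_cols))
```

The all.vars() route returns every symbol in the formula, so it only works as a column subset when none of those symbols refers to an object outside the data frame (or to the `.` shorthand).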
jpsmith