As indicated in the previous reply by @LAP you can use the terms()
from these models. However, I would recommend to look at the attr(..., "factors")
and attr(..., "dataClasses")
instead of going to the $model
which requires that the entire model.frame()
is stored in the model. This may or may not be the case. Specifically, when re-fitting multiple models you might want to be able to not store the model frame each time.
So one idea would be to proceed in the following steps:
- Check whether
attr(..., "factors")
has not exactly one column, the you can return FALSE
.
- If there is exactly one factor, you can check the corresponding
attr(..., "dataClasses")
if it is "factor"
/"ordered"
and then return TRUE
, otherwise FALSE
.
R code:
one_factor <- function(object) {
f <- attr(terms(object), "factors")
if(length(f) == 0L || NCOL(f) != 1L) return(FALSE)
d <- attr(terms(object), "dataClasses")
if(d[colnames(f)] %in% c("ordered", "factor")) {
return(TRUE)
} else {
return(FALSE)
}
}
This appears to work well for single-part formula
-based objects.
Dummy data with numeric/factor/ordered trt
:
d1 <- d2 <- d3 <- data.frame(y = log(1:9), x = 1:9, trt = rep(1:3, each = 3))
d2$trt <- factor(d2$trt)
d3$trt <- ordered(d3$trt)
Various formula specifications:
f <- list(
y ~ 1,
y ~ x,
y ~ trt,
y ~ trt + x,
y ~ trt + offset(x),
y ~ trt + x + offset(x),
y ~ trt + offset(as.numeric(trt)),
y ~ factor(trt),
y ~ factor(trt) + offset(x),
y ~ factor(x > as.numeric(trt)),
y ~ interaction(x, trt),
y ~ 0 + trt
)
Expected results for d1
, d2
, and d3
, respectively:
ok1 <- c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)
ok2 <- c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)
ok3 <- ok2
Checks for lm
without storing the model frame:
lm1 <- lapply(f, lm, data = d1, model = FALSE)
identical(sapply(lm1, one_factor), ok1)
## [1] TRUE
lm2 <- lapply(f, lm, data = d2, model = FALSE)
identical(sapply(lm2, one_factor), ok2)
## [1] TRUE
lm3 <- lapply(f, lm, data = d3, model = FALSE)
identical(sapply(lm3, one_factor), ok3)
## [1] TRUE
Checks for survreg
(Gaussian) and coxph
. (The latter throws a lot of warnings about non-convergence which is not surprising given the dummy data structure. The checks still work as intended.)
library("survival")
d1$y <- d2$y <- d3$y <- Surv(d1$y + 0.5)
sr1 <- lapply(f, survreg, data = d1)
identical(sapply(sr1, one_factor), ok1)
## [1] TRUE
sr2 <- lapply(f, survreg, data = d2)
identical(sapply(sr2, one_factor), ok2)
## [1] TRUE
sr3 <- lapply(f, survreg, data = d3)
identical(sapply(sr3, one_factor), ok3)
## [1] TRUE
cph1 <- lapply(f, coxph, data = d1)
identical(sapply(cph1, one_factor), ok1)
## [1] TRUE
cph2 <- lapply(f, coxph, data = d2)
identical(sapply(cph2, one_factor), ok2)
## [1] TRUE
cph3 <- lapply(f, coxph, data = d3)
identical(sapply(cph3, one_factor), ok3)
## [1] TRUE
Note: If you have multi-part Formula
-based objects this function might fail and the underlying tests would need to be adapted. Examples for the latter could include count regression models (zeroinfl
, hurdle
), multinomial logit (mlogit
), instrumental variables (ivreg
), heteroscedastic models (vglm
, betareg
, crch
) etc. These might have formulas like y ~ trt | 1
or y ~ trt | trt
or y ~ trt | x
which may or may not still be feasible in your framework.