11

I'm looking for a Tidyverse / broom solution that can solve this puzzle:

Let's say I have different DVs and a specific set of IVS and I want to perform a regression that considers every DV and this specific set of IVs. I know I can use something like for i in or apply family, but I really want to run that using tidyverse.

The following code works as an example

ds <- data.frame(income = rnorm(100, mean=1000,sd=200),
                 happiness = rnorm(100, mean = 6, sd=1),
                 health = rnorm(100, mean=20, sd = 3),
                 sex = c(0,1),
                 faculty = c(0,1,2,3))

mod1 <- lm(income ~ sex + faculty, ds)
mod2 <- lm(happiness ~ sex + faculty, ds)
mod3 <- lm(health ~ sex + faculty, ds)
summary(mod1)
summary(mod2)
summary(mod3)

Income, happiness, and health are DVs. Sex and Faculty are IVs and they will be used for all regressions.

That was the closest I found

Let me know If I need to clarify my question. Thanks.

Calum You
  • 14,687
  • 4
  • 23
  • 42
Luis
  • 1,388
  • 10
  • 30

3 Answers3

11

As you have different dependent variables but the same independent, you can form a matrix of these and pass to lm.

mod = lm(cbind(income, happiness, health) ~ sex + faculty, ds)

And I think broom::tidy works

library(broom)
tidy(mod)

#    response        term      estimate  std.error  statistic      p.value
# 1    income (Intercept) 1019.35703873 31.0922529 32.7849205 2.779199e-54
# 2    income         sex  -54.40337314 40.1399258 -1.3553431 1.784559e-01
# 3    income     faculty   19.74808081 17.9511206  1.1001030 2.740100e-01
# 4 happiness (Intercept)    5.97334562  0.1675340 35.6545278 1.505026e-57
# 5 happiness         sex    0.05345555  0.2162855  0.2471528 8.053124e-01
# 6 happiness     faculty   -0.02525431  0.0967258 -0.2610918 7.945753e-01
# 7    health (Intercept)   19.76489553  0.5412676 36.5159396 1.741411e-58
# 8    health         sex    0.32399380  0.6987735  0.4636607 6.439296e-01
# 9    health     faculty    0.10808545  0.3125010  0.3458723 7.301877e-01
user20650
  • 24,654
  • 5
  • 56
  • 91
  • Nice option. I forgot about this – akrun Aug 01 '18 at 21:15
  • 1
    Thanks Akrun ... this may be my first tidyverse tagged Q ;| – user20650 Aug 01 '18 at 21:16
  • 3
    You could make it `tidyverse` with `%>%` :=) – akrun Aug 01 '18 at 21:16
  • Wow, amazing solution. @user20650 My real puzzle has different variable names and I want to know if something like this could work: j <- ds %>% select(income, happiness, health) mod = lm(cbind(noquote(paste0(variable.names(j), collapse=","))) ~ sex + faculty, ds) – Luis Aug 01 '18 at 22:23
  • Luis ; sorry - you're asking the wrong person ;)) I don't use tidyverse syntax, perhaps@akrun can advise. – user20650 Aug 01 '18 at 22:33
  • @user20650, you did an amazing job replying to my question! Thanks much! (I could run what I was looking for: j <- ds %>% select(income, happiness, health) %>% names() fit <- lm(eval(parse(text=paste("cbind(", paste(j, collapse=","), ")"))) ~ sex + faculty, ds) fit) – Luis Aug 01 '18 at 22:45
  • you're welcome : i suppose the non-tidy way i 'd do it is using `vars <- c("income", "happiness", "health") ; lm(as.matrix(ds[, vars]) ~ sex + faculty, ds)` and then just change `vars` vector - but I read these days that quotes are bad – user20650 Aug 01 '18 at 22:49
7

Another method is to gather the dependent variables and use a grouped data frame to fit the models with do. This is the method explained in the broom and dplyr vignette.

library(tidyverse)
library(broom)
ds <- data.frame(
  income = rnorm(100, mean = 1000, sd = 200),
  happiness = rnorm(100, mean = 6, sd = 1),
  health = rnorm(100, mean = 20, sd = 3),
  sex = c(0, 1),
  faculty = c(0, 1, 2, 3)
)
ds %>%
  gather(dv_name, dv_value, income:health) %>%
  group_by(dv_name) %>%
  do(tidy(lm(dv_value ~ sex + faculty, data = .)))
#> # A tibble: 9 x 6
#> # Groups:   dv_name [3]
#>   dv_name   term         estimate std.error statistic  p.value
#>   <chr>     <chr>           <dbl>     <dbl>     <dbl>    <dbl>
#> 1 happiness (Intercept)     6.25      0.191    32.7   3.14e-54
#> 2 happiness sex             0.163     0.246     0.663 5.09e- 1
#> 3 happiness faculty        -0.172     0.110    -1.56  1.23e- 1
#> 4 health    (Intercept)    20.1       0.524    38.4   1.95e-60
#> 5 health    sex             0.616     0.677     0.909 3.65e- 1
#> 6 health    faculty        -0.653     0.303    -2.16  3.36e- 2
#> 7 income    (Intercept)  1085.       32.8      33.0   1.43e-54
#> 8 income    sex           -12.9      42.4      -0.304 7.62e- 1
#> 9 income    faculty       -25.1      19.0      -1.32  1.89e- 1

Created on 2018-08-01 by the reprex package (v0.2.0).

Calum You
  • 14,687
  • 4
  • 23
  • 42
5

We can loop through the column names that are dependent variables, use paste to create the formula to be passed into lm and get the summary statistics with tidy (from broom)

library(tidyverse)
library(broom)
map(names(ds)[1:3], ~ 
            lm(formula(paste0(.x, "~", 
                paste(names(ds)[4:5], collapse=" + "))), data = ds) %>%
               tidy)

If we want it in a single data.frame with a column identifier for dependent variable,

map_df(set_names(names(ds)[1:3]), ~ 
      lm(formula(paste0(.x, "~", 
        paste(names(ds)[4:5], collapse=" + "))), data = ds) %>%
   tidy, .id = "Dep_Variable")
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thanks much, @akrun. You are always replying to my questions. Would you mind if I ask you how do you know so much about tidyverse and R? Are you a data scientist? What books/material you recommend? (I always learn by trial and error as you can see =)) ) Thanks much! – Luis Aug 01 '18 at 22:59
  • 1
    @Luis Glad to answer your question. I learned/learning R mostly through SO and R mailing list. Your method of learning is good because getting error makes you think about the reason and it makes a deeper impression than going through the correct method. There are lots of resources online to learn R. May be you have gone through these [links](https://stackoverflow.com/questions/4556524/whats-the-way-to-learn-r). Also, you can check for courses from udemy, udacity, datacamp etc. There is no particular method that fits for all persons – akrun Aug 02 '18 at 06:03
  • 1
    Thanks again, @akrun, your replies are always very useful and informative. Best! – Luis Aug 02 '18 at 15:10