
I can't figure out what group_by_(), the underscore version of group_by(), is for.

From the group_by help:

by_cyl <- group_by(mtcars, cyl)  
summarise(by_cyl, mean(disp), mean(hp))  

yields the expected:

Source: local data frame [3 x 3]  
    cyl mean(disp)  mean(hp)
1   4   105.1364  82.63636
2   6   183.3143 122.28571
3   8   353.1000 209.21429

but this:

by_cyl <- group_by_(mtcars, cyl)  

yields an error:

"Error in as.lazy_dots(list(...)) : object 'cyl' not found"  

So my question is: what does the underscore version do? And under what circumstances would I want to use it rather than the "regular" one?

Thanks

hackR
  • You could define `cyl` in another variable and pass it with `group_by_`. `someVar <- 'cyl'; by_cyl <- group_by_(mtcars, someVar)` – akrun Feb 23 '15 at 04:49
  • Reading the [Non-standard evaluation](http://cran.r-project.org/web/packages/dplyr/vignettes/nse.html) `dplyr` package vignette would be a good place to start. – hrbrmstr Feb 23 '15 at 04:53
  • `by_cyl <- group_by_(mtcars, "cyl")` will work (as commented by @akrun) – Ben Bolker Feb 23 '15 at 04:58
  • @akrun. I think I understand now. To rephrase your answer: rather than having the `group_by()` variable hard-coded, it can be calculated on-the-fly, thereby allowing the programmer to use any of the data.frame's columns at runtime, if desired. Thanks! – hackR Feb 23 '15 at 05:50

1 Answer


The dplyr Non-Standard Evaluation vignette helps here: http://cran.r-project.org/web/packages/dplyr/vignettes/nse.html

Note: the above link is now out of date, but the same information can be found in the package's GitHub repository: https://github.com/tidyverse/dplyr/blob/34423af89703b0772d59edcd0f3485295b629ab0/vignettes/nse.Rmd

Dplyr uses non-standard evaluation (NSE) in all of the most important single table verbs: filter(), mutate(), summarise(), arrange(), select() and group_by(). NSE is important not only to save you typing, but for database backends, is what makes it possible to translate your R code to SQL. However, while NSE is great for interactive use it’s hard to program with. This vignette describes how you can opt out of NSE in dplyr, and instead rely only on SE (along with a little quoting).

...

Every function in dplyr that uses NSE also has a version that uses SE. There’s a consistent naming scheme: the SE is the NSE name with _ on the end. For example, the SE version of summarise() is summarise_(), the SE version of arrange() is arrange_(). These functions work very similarly to their NSE cousins, but the inputs must be “quoted”.
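As a minimal sketch of what "quoted" inputs look like for the function in the question: this assumes a dplyr version that still provides group_by_() (the underscore verbs are deprecated from dplyr 0.7.0 onward and may emit a warning), and the variable name `group_col` is made up for illustration.

```r
library(dplyr)

# Column name held as a string, e.g. chosen at runtime
# (group_col is a hypothetical name used only for this example).
group_col <- "cyl"

# SE version: the input is quoted, so a string holding the column name works.
by_cyl_string <- group_by_(mtcars, group_col)

# A one-sided formula is another way to quote the column.
by_cyl_formula <- group_by_(mtcars, ~cyl)

# NSE version for comparison: the column name is typed bare.
by_cyl_nse <- group_by(mtcars, cyl)

# All three objects are grouped by cyl, so each summary has one row per cyl value.
summarise(by_cyl_string, mean_disp = mean(disp), mean_hp = mean(hp))
```

The error in the question comes from passing a bare `cyl` to the SE version: it evaluates its arguments normally, and no object called `cyl` exists outside the data frame. The SE form is mainly useful inside functions or scripts where the grouping column is only known at runtime.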

r.bot
  • that link is broken – val Nov 07 '17 at 11:55
  • As a python user, reading all these answers for coding in R confuses me quite a lot. Why do R programmers on StackOverflow 1. Not explain with clear examples (which is seen a lot in python answers). 2. Give links to documentation instead of explaining code in simple terms. I've seen this quite a lot. Can anyone correct me on why I see this trend? I do see some R users trying to follow good methods, but they cover only 1-2% of the major Questions and answers that I have come across on StackOverflow. thanks. – nikpod Dec 18 '17 at 14:40