
I'm trying to use dplyr to calculate grouped correlations, but something is clearly wrong since the code below works only in the console:

require(dplyr)
set.seed(123)
xx = data.frame(group = rep(1:4, 100), a = rnorm(400) , b = rnorm(400))
gp = group_by(xx, group)
summarize(gp, cor(a, b))

  group   cor(a, b)
1     1 -0.02073084
2     2  0.12803353
3     3  0.06236264
4     4 -0.06181904

If I use the same code in RStudio, I get:

   cor(a, b)
1 0.02739193

What's happening?

Fernando
  • I can't replicate this. Can you expand on what you mean by the distinction between "in the console" versus "in RStudio"? As a precaution, I would try again both ways in a fresh session. – joran Jul 29 '14 at 20:24
  • @beginneR Thanks, it works. Can you turn your comment into an answer? – Fernando Jul 29 '14 at 20:38

1 Answer


What you are seeing is caused by having both plyr and dplyr loaded at the same time. Both packages export a summarize function, so whichever package was attached last masks the other unless you specify explicitly which one you want to use. For the example data, this means:

require(dplyr)
set.seed(123)
xx = data.frame(group = rep(1:4, 100), a = rnorm(400) , b = rnorm(400))

Using dplyr as intended:

gp = group_by(xx, group)
dplyr::summarize(gp, cor(a, b))
#Source: local data frame [4 x 2]
#
#  group   cor(a, b)
#1     1 -0.02073084
#2     2  0.12803353
#3     3  0.06236264
#4     4 -0.06181904

Or using plyr:

gp = group_by(xx, group)
plyr::summarize(gp, cor(a, b))
#   cor(a, b)
#1 0.02739193

So either avoid loading both packages or specify the package by using package::function.
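
To check which summarize a session will actually pick up, something along these lines may help. It is only a sketch: it assumes dplyr is installed, uses base R's find() and environmentName() to inspect the search path, and the column name r is purely illustrative.

```r
library(dplyr)

set.seed(123)
xx <- data.frame(group = rep(1:4, 100), a = rnorm(400), b = rnorm(400))

# List every attached package that defines `summarize`, in search-path
# order; the first entry is the one a bare summarize() call will use.
find("summarize")

# Name of the package that owns the summarize() currently in effect.
environmentName(environment(summarize))

# Fully qualified calls do not depend on the order in which packages
# were attached, so this gives one correlation per group either way.
res <- dplyr::summarize(dplyr::group_by(xx, group), r = cor(a, b))
res
```

If find("summarize") lists plyr ahead of dplyr, a bare summarize() call will use the plyr version, which ignores the grouping and returns a single row.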

talat
  • How would you produce a correlation matrix for each group, so if set.seed(123) xx = data.frame(group = rep(1:4, 100), a = rnorm(400) , b = rnorm(400), c=rnorm(400) – spindoctor Apr 28 '15 at 20:40
  • @spindoctor, better to ask that in a separate question – talat Apr 28 '15 at 20:45