Let's take some dummy data, which stands in for the result I get after using group_by and summarize from dplyr:

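Name <- rep(c("Pepsi", "Cola"), 3)
Category <- c("A", "A", "A", "B", "B", "B")
Value <- 1:6
# cbind() coerces every column to character, so Value must be converted back to numeric
aha <- as.data.frame(cbind(Name, Category, Value))
aha$Value <- as.numeric(as.character(aha$Value))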

Our data frame looks like this

   Name Category Value
1 Pepsi        A     1
2  Cola        A     2
3 Pepsi        A     3
4  Cola        B     4
5 Pepsi        B     5
6  Cola        B     6

I want to add a new column containing Value / sum(Value), where the sum is taken within each Category.

E.g. for the first row it is 1/6 = 0.17, because the sum of Value for category A is 6.
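
To make the target concrete, here is a small sketch that writes the expected column out by hand for the six rows above. The name `expected` is only illustrative and is not part of the original data.

```r
# Group sums: A = 1 + 2 + 3 = 6, B = 4 + 5 + 6 = 15
expected <- c(c(1, 2, 3) / 6, c(4, 5, 6) / 15)
round(expected, 2)
# [1] 0.17 0.33 0.50 0.27 0.33 0.40
```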

I found out how to do this with plyr, but plyr does not get along with dplyr.

Help me out please

Tomas H
Some useful reading material: [*"R Grouping functions: sapply vs. lapply vs. apply. vs. tapply vs. by vs. aggregate"*](http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-tapply-vs-by-vs-aggrega) or: [*"Calculate amount of occurences for column per category"*](http://stackoverflow.com/a/36791310/2204410) – Jaap Apr 23 '16 at 12:06

4 Answers

Two alternatives without using extra packages:

# option 1
transform(aha, new = ave(Value, Category, FUN = function(x) x/sum(x)))
# option 2
aha$new <- ave(aha$Value, aha$Category, FUN = function(x) x/sum(x))
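
As a quick sanity check on either option, here is a sketch that rebuilds the question's data so it runs on its own (the name `check` is only illustrative). Since each value is divided by its group total, the new column should sum to 1 within every category.

```r
# Rebuild the example data from the question
aha <- data.frame(
  Name = rep(c("Pepsi", "Cola"), 3),
  Category = c("A", "A", "A", "B", "B", "B"),
  Value = as.numeric(1:6)
)

# ave() applies FUN within each Category and returns a vector
# aligned with the original row order
aha$new <- ave(aha$Value, aha$Category, FUN = function(x) x / sum(x))

# Shares within each category should add up to 1
check <- tapply(aha$new, aha$Category, sum)
check
```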
h3rm4n

You can do this with dplyr:

library(dplyr)
aha %>% group_by(Category) %>% mutate(new = Value / sum(Value))

#Source: local data frame [6 x 4]
#Groups: Category [2]

#    Name Category Value       new
#  (fctr)   (fctr) (dbl)     (dbl)
#1  Pepsi        A     1 0.1666667
#2   Cola        A     2 0.3333333
#3  Pepsi        A     3 0.5000000
#4   Cola        B     4 0.2666667
#5  Pepsi        B     5 0.3333333
#6   Cola        B     6 0.4000000
Colonel Beauvel

With data.table

library(data.table)
setDT(aha)[, new := Value/sum(Value) , by = Category]
akrun

One more in base R (assumes the rows are already sorted by Category):

aha$new <- unlist(tapply(aha$Value, aha$Category,function(x) x/sum(x)))
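
For data that is not sorted by category, `tapply()` would return the shares grouped by category rather than in row order. A sketch of an order-preserving base R variant using `split()` and `unsplit()` follows, shown on a hypothetical interleaved data frame `df` that is not part of the question:

```r
# Hypothetical data with categories interleaved (not from the question)
df <- data.frame(
  Category = c("A", "B", "A", "B"),
  Value = c(1, 4, 3, 6)
)

# split() groups the values, lapply() rescales each group,
# unsplit() puts the results back in the original row order
parts <- split(df$Value, df$Category)
df$new <- unsplit(lapply(parts, function(x) x / sum(x)), df$Category)
df
```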
G. Cocca