2

I want to compute a rolling weighted mean per group for a data.table like this:

DT <- data.table(group = rep(c(1,2), each = 5), value = 1:10, weight = 11:20)
   group value weight
 1:     1     1     11
 2:     1     2     12
 3:     1     3     13
 4:     1     4     14
 5:     1     5     15
 6:     2     6     16
 7:     2     7     17
 8:     2     8     18
 9:     2     9     19
10:     2    10     20

I found a working solution with the runner package in this question Rolling over function with 2 vector arguments:

my_weighted_mean <- function(data) {
  weighted.mean(data[, 1], w = data[, 2])
}

DT[, weighted_mean := runner::runner(x = .SD, f = my_weighted_mean , k = 3, na_pad = TRUE), .SDcols = c("value", "weight"), by = list(group)]

But the code is quite slow.

I guess it should work with frollapply but the following doesn't, because I don't understand how to use frollapply with a two column function:

 DT[, weighted_mean := frollapply(value, FUN = weighted.mean, n = 3, w = weights), by = list(group)]

Looking for a better performance (and a solution without runner)

fc9.30
  • 2,293
  • 20
  • 19

2 Answers2

6

"frollapply with a two column function": instead of rolling on the values, roll on the indices, and the internal function can use as many columns as desired.

 DT[, weighted_mean := frollapply(seq_len(.N),
                                  FUN = function(ind) weighted.mean(value[ind], weight[ind]),
                                  n = 3),
    by = .(group)]
#     group value weight weighted_mean
#     <num> <int>  <int>         <num>
#  1:     1     1     11            NA
#  2:     1     2     12            NA
#  3:     1     3     13      2.055556
#  4:     1     4     14      3.051282
#  5:     1     5     15      4.047619
#  6:     2     6     16            NA
#  7:     2     7     17            NA
#  8:     2     8     18      7.039216
#  9:     2     9     19      8.037037
# 10:     2    10     20      9.035088
r2evans
  • 141,215
  • 6
  • 77
  • 149
1

Here is another option:

k <- 3L
DT[, v := frollsum(value * weight, k) / frollsum(weight, k)][
    rowid(group) %in% seq(k-1L), v:= NA_real_]

output:

    group value weight        v
 1:     1     1     11       NA
 2:     1     2     12       NA
 3:     1     3     13 2.055556
 4:     1     4     14 3.051282
 5:     1     5     15 4.047619
 6:     2     6     16       NA
 7:     2     7     17       NA
 8:     2     8     18 7.039216
 9:     2     9     19 8.037037
10:     2    10     20 9.035088

Might be faster if you have a lot of groups:

set.seed(0L)
ng <- 1e5
nr <- 1e6
DT <- data.table(group=sample(ng, nr, TRUE), value=rnorm(nr), weight=rnorm(nr))
setkey(DT, group)
DT2 <- copy(DT)
k <- 3L

microbenchmark::microbenchmark(times=3L,
    m0 = DT[, weighted_mean := frollapply(seq_len(.N),
              FUN = function(ind) weighted.mean(value[ind], weight[ind]),
              n = k),
        by = .(group)],
    m1 = DT2[, v := frollsum(value * weight, k) / frollsum(weight, k)][
        rowid(group) %in% seq(k-1L), v:= NA_real_]
)
all.equal(DT$weighted_mean, DT2$v)
#[1] TRUE

timings:

Unit: milliseconds
 expr       min        lq       mean    median         uq      max neval
   m0 5670.6707 5725.5539 5805.01047 5780.4370 5872.18035 5963.924     3
   m1   49.2789   54.5392   59.12413   59.7995   64.04675   68.294     3
chinsoon12
  • 25,005
  • 4
  • 25
  • 35