1

I have a dataset similar to:

Name, Day, Score, Diff
Jain, 1, 8, 0
Jain, 2, 6, -2
Jain, 3, 8, 2
Jain, 4, 12, 4
Jain, 5, 13, 1
Jain, 6, 6, -7
Matt, 1,4, 0
Matt, 2, 10, 6
Matt, 3, 11, 1
Matt, 4, 12, 1
Matt, 5, 5, -7
Matt, 6, 6, 1

I want to add a new column which will record "Off" when a score difference drops 3 points, until there's a gain of +3 points, which will then record "On" until there's a drop.

Example:

Name, Day, Score, Diff, OnOff
Jain, 1, 8, 0, "Off"
Jain, 2, 6, -2, "Off"
Jain, 3, 8, 2, "Off"
Jain, 4, 12, 4, "On"
Jain, 5, 13, 1, "On"
Jain, 6, 6, -7, "Off"
Matt, 1,4, 0, "Off"
Matt, 2, 10, 6, "On"
Matt, 3, 11, 1, "On"
Matt, 4, 12, 1, "On"
Matt, 5, 5, -7, "Off"
Matt, 6, 6, 1, "Off"

Can't seem to figure out how to code this one. I've attempted with the following:

df$OnOff <- ifelse(df$Diff >= 3, "On", ifelse(df$Diff <= -3, "Off", ""))
df$OnOff <- ifelse(df$OnOff == "", lag(df$OnOff), df$OnOff)
Fredrik Widerberg
  • 3,068
  • 10
  • 30
  • 42
Jay
  • 13
  • 2
  • 1
    Based on the first 3 rows, why it is `Off`, there is no absolute drop to 3 points – akrun Jul 06 '18 at 18:27
  • I'd like it to assume it's "Off" until there's a +3 gain somewhere to trigger it to "On" – Jay Jul 06 '18 at 18:32

3 Answers3

2

Here's another tidyverse solution using fill:

library(tidyverse)
df %>%
  mutate(
    OnOff = case_when(
      1:n() == 1 ~ 'Off',
      Diff < -2 ~  "Off",
      Diff >2 ~ "On",
      TRUE ~ NA_character_)
  ) %>%
  fill(OnOff)

doing it by name:

df %>%
  group_by(Name) %>%
  mutate(
    OnOff = case_when(
      1:n() == 1 ~ 'Off',
      Diff < -2 ~  "Off",
      Diff >2 ~ "On",
      TRUE ~ NA_character_)
  ) %>%
  fill(OnOff)
Melissa Key
  • 4,476
  • 12
  • 21
  • 1
    I think you don't need the temporary `row` column, you can just use `1:n() == 1 ~ 'Off'` directly. Saves the trouble of cleaning it up afterwards. – Gregor Thomas Jul 06 '18 at 18:41
2

Put in the changes, then use zoo::na.locf (or similar) to fill in the blanks. Calling your data dd:

dd$OnOff = NA
dd$OnOff[1] = "off"
dd$OnOff[dd$Diff >= 3] = "on"
dd$OnOff[dd$Diff <= -3] = "off"
dd$OnOff = zoo::na.locf(dd$OnOff)
dd
#    Name Day Score Diff OnOff
#  1: Jain   1     8    0   off
#  2: Jain   2     6   -2   off
#  3: Jain   3     8    2   off
#  4: Jain   4    12    4    on
#  5: Jain   5    13    1    on
#  6: Jain   6     6   -7   off
#  7: Matt   1     4    0   off
#  8: Matt   2    10    6    on
#  9: Matt   3    11    1    on
# 10: Matt   4    12    1    on
# 11: Matt   5     5   -7   off
# 12: Matt   6     6    1   off

You don't mention grouping in the question, but you can use dplyr or data.table to do the locf by Name if needed.

To do things by name, you'll need to set the first row of each name to the default 'off'. See Melissa's solution for a dplyr method. With data.table it looks like this:

setdt(dd)
dd[, OnOff := c('off', rep(NA, .N - 1)), by = Name]
dd[Diff >= 3, OnOff := "on"]
dd[Diff <= -3, OnOff := "off"]
dd[, OnOff := zoo::na.locf(OnOff), by = Name]

Using this data:

dd = data.table::fread("Name, Day, Score, Diff
Jain, 1, 8, 0
Jain, 2, 6, -2
Jain, 3, 8, 2
Jain, 4, 12, 4
Jain, 5, 13, 1
Jain, 6, 6, -7
Matt, 1,4, 0
Matt, 2, 10, 6
Matt, 3, 11, 1
Matt, 4, 12, 1
Matt, 5, 5, -7
Matt, 6, 6, 1")
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • How would I add in the by name for this? And thank you this worked perfectly! :) – Jay Jul 06 '18 at 18:44
  • Edited to include the "by name" code for `data.table`. See Melissa's answer for `dplyr` equivalent. – Gregor Thomas Jul 06 '18 at 18:53
  • 1
    For a zoo-less data.table option, the last line can change to `dd[, OnOff := first(OnOff), by=rleid(Name, cumsum(!is.na(OnOff)))]`, I think. – Frank Jul 06 '18 at 18:57
  • 1
    @Frank interesting and clever, but much less clear and maybe slower? Guess it depends how much you hate extra dependencies. – Gregor Thomas Jul 06 '18 at 19:11
  • Heh, yeah, I do this sort of thing (or even a rolling join `dd[, OnOff := dd[!is.na(OnOff), .(Name, Day, OnOff)][.SD, on=.(Name, Day), roll = TRUE, x.OnOff]]` https://stackoverflow.com/a/26181795), but don't know that I'd recommend it to everyone. – Frank Jul 06 '18 at 19:21
1

One can write a simple function that traverse on Diff to compare value in order to switch between On and Off as:

#Function to decide On/Off logic
getOnOff <- function(x){
  lstVal <- "Off"
  value <- rep(NA,length(x))
  for(i in seq_along(x)){
    if(x[i] >= 3){
      lstVal = "On"
    }else if(x[i] <= -3){
      lstVal = "Off"
    }
    value[i] <- lstVal
  }
  value
}

#Now use the function with `dplyr` to after grouping on Name

library(dplyr)

df %>% group_by(Name) %>%
  mutate(OnOff = getOnOff(Diff))


# # A tibble: 12 x 5
# # Groups: Name [2]
# Name    Day Score  Diff OnOff
# <chr> <int> <int> <int> <chr>
# 1 Jain      1     8     0 Off  
# 2 Jain      2     6    -2 Off  
# 3 Jain      3     8     2 Off  
# 4 Jain      4    12     4 On   
# 5 Jain      5    13     1 On   
# 6 Jain      6     6    -7 Off  
# 7 Matt      1     4     0 Off  
# 8 Matt      2    10     6 On   
# 9 Matt      3    11     1 On   
# 10 Matt      4    12     1 On   
# 11 Matt      5     5    -7 Off  
# 12 Matt      6     6     1 Off  

Option#2: Probably OP has not meant to switch on absolute count of different condition but if that is needed then one can try using cumsum with dplyr. The occurrence of Diff >= 3 means count goes up and Diff <= -3 means count goes down. The cumulative sum of these will give relative count on which On/Off can be decided.

library(dplyr)

df %>% mutate(OnOff = ifelse(cumsum(Diff >= 3) - (cumsum(Diff<= -3))>0, "On","Off"))

#    Name Day Score Diff OnOff
# 1  Jain   1     8    0   Off
# 2  Jain   2     6   -2   Off
# 3  Jain   3     8    2   Off
# 4  Jain   4    12    4    On
# 5  Jain   5    13    1    On
# 6  Jain   6     6   -7   Off
# 7  Matt   1     4    0   Off
# 8  Matt   2    10    6    On
# 9  Matt   3    11    1    On
# 10 Matt   4    12    1    On
# 11 Matt   5     5   -7   Off
# 12 Matt   6     6    1   Off
# 

Data:

df <- read.table(text="
Name, Day, Score, Diff
Jain, 1, 8, 0
Jain, 2, 6, -2
Jain, 3, 8, 2
Jain, 4, 12, 4
Jain, 5, 13, 1
Jain, 6, 6, -7
Matt, 1,4, 0
Matt, 2, 10, 6
Matt, 3, 11, 1
Matt, 4, 12, 1
Matt, 5, 5, -7
Matt, 6, 6, 1",
header = TRUE, stringsAsFactors = FALSE, sep = ",")
MKR
  • 19,739
  • 4
  • 23
  • 33
  • 2
    I guess the condition is wrong. If I meet one of the criteria a couple times in a row, the other criterion won't stop the sequence. Eg, `data.frame(Diff = c(3, 3, -3, 3)) %>% mutate(OnOff = ifelse(cumsum(Diff >= 3) - (cumsum(Diff<= -3))>0, "On","Off"))` should have Off on row 3 – Frank Jul 06 '18 at 18:36
  • 1
    @Frank I agree with comment. Let me try to look for a correction. – MKR Jul 06 '18 at 18:38
  • @Gregor Sorry, I didn't notice your advice. I agree `fill` based solutions could be appropriate choice but I have added an `custom function` based solution as an option. – MKR Jul 06 '18 at 21:02