I have an issue with cumsum on a vector that starts with an NA. How can I replace the leading NA with 0, or just skip the first element?

Here is my code:

Symbol <- c("AA", "AA", "AA", "AA", "AA", "AA", "AA","AA","AA","AA", "AA", "AA", "AA", "AA", "AA", "AA","AA","AA","AA") 
days <- c(3, 10, 29,13,14,29,19,1,4,3, 10, 29,13,14,29,19,1,4,7) 
month <- c(1, 1, 5,7,1,2,5,7,9,1, 1, 5,7,1,2,5,7,9,12)
years <- c(2014,2014,2015,2015,2016,2016,2016,2016,2016,2014,2014,2015,2015,2016,2016,2016,2016,2016,2016) 
price <- c(10,20,15,14,16,17,9,14,12,14,15,18,5,10,6,18,12,8, 14)
shares <- c(100,50,-30,400,-200,-100,-100,-150,-120,-100,-50,-100,70,-70,190,250,50,120,150)


library(dplyr)

df <- data.frame(Symbol, days, month, years, price, shares)
df %>%
  mutate(
    value = shares * price,
    cum_shares = cumsum(shares),
    # lag() returns NA for the first row, so every column built on it starts with NA
    cum_value = ifelse(lag(cum_shares != 0),
                       ifelse(lag(cum_shares) > 0 & cum_shares < 0 | lag(cum_shares) > 0 & cum_shares < 0,
                              cum_shares * price,
                              cumsum(value)),
                       value),
    Av_price = ifelse(is.infinite(cum_value / cum_shares),
                      0,
                      cum_value / cum_shares),

    Profit = ifelse(cum_shares >= 0 & shares < 0,
                    ifelse(lag(cum_shares) > 0 & cum_shares < 0, #YES
                           (lag(cum_shares) * Av_price) - (lag(cum_shares) * lag(Av_price)), #yes
                           (lag(Av_price) * shares) - value), #no
                    ifelse(cum_shares <= 0 & shares > 0, #NOP
                           ifelse(lag(cum_shares) > 0 & cum_shares < 0, #yes
                                  (lag(cum_shares) * Av_price) - (lag(cum_shares) * lag(Av_price)),
                                  lag(Av_price) * shares - value),
                           ifelse(cum_shares <= 0 & shares < 0, #no
                                  ifelse(lag(cum_shares) > 0 & cum_shares < 0,
                                         (-lag(cum_shares) * lag(Av_price)) + (lag(cum_shares) * price),
                                         0),
                                  ifelse(cum_shares > 0 & shares > 0,
                                         ifelse(lag(cum_shares) < 0 & shares > 0,
                                                (-lag(cum_shares) * lag(Av_price)) + (lag(cum_shares) * price),
                                                0),
                                         0)))),

    # cumsum() propagates NA, so one NA in Profit makes all later values NA
    cum_profit = cumsum(Profit)
  )
Gregor Thomas
Ben2pop

2 Answers

dplyr's lag() has a default argument that sets the value used to fill the leading positions, which are otherwise NA. For example, using the built-in data frame BOD:

library(dplyr)
BOD %>% mutate(Lag = lag(Time, default = 0), Cum = cumsum(Lag))

giving:

  Time demand Lag Cum
1    1    8.3   0   0
2    2   10.3   1   1
3    3   19.0   2   3
4    4   16.0   3   6
5    5   15.6   4  10
6    7   19.8   5  15

Or, if you want Lag to show an NA but the cumulative sum to start at 0:

BOD %>% mutate(Lag = lag(Time), Cum = cumsum(lag(Time, default = 0)))

giving:

  Time demand Lag Cum
1    1    8.3  NA   0
2    2   10.3   1   1
3    3   19.0   2   3
4    4   16.0   3   6
5    5   15.6   4  10
6    7   19.8   5  15
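
As a rough sketch of how this might carry over to the question's setup: the small `trades` data frame and the column names `prev_na` and `prev_zero` below are made up for illustration and are not from the original post. It only shows that a leading NA from lag() makes the whole cumulative sum NA, while `default = 0` avoids that.

```r
library(dplyr)

# Hypothetical three-row stand-in for the question's data
trades <- data.frame(shares = c(100, 50, -30))

out <- trades %>%
  mutate(
    cum_shares = cumsum(shares),                       # 100 150 120
    prev_na    = dplyr::lag(cum_shares),               # NA 100 150
    prev_zero  = dplyr::lag(cum_shares, default = 0),  # 0 100 150
    cum_na     = cumsum(prev_na),                      # NA NA NA
    cum_zero   = cumsum(prev_zero)                     # 0 100 250
  )

print(out)
```

Whether 0 is the right fill value depends on what the lagged column means; for a running position, "no shares held before the first trade" seems a reasonable assumption.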
G. Grothendieck
For a vector x:

# Skip the first element:
x[-1]

# Replace the first element with 0
c(0, x[-1])

# Remove all missing values
na.omit(x)   # adds attributes, which can be annoying
x[!is.na(x)]

Any of these can be wrapped in cumsum().
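
To make the differences concrete, here is a small base R sketch with an example vector `x` chosen for illustration. The last line, using `replace()`, is an extra variant not listed above: it zeroes every NA while keeping the vector's length.

```r
x <- c(NA, 5, 10, -3)

all_na   <- cumsum(x)                         # NA NA NA NA: the leading NA propagates
skipped  <- cumsum(x[-1])                     # 5 15 12: one element shorter than x
zeroed   <- cumsum(c(0, x[-1]))               # 0 5 15 12: same length as x
dropped  <- cumsum(x[!is.na(x)])              # 5 15 12: drops every NA, not just the first
replaced <- cumsum(replace(x, is.na(x), 0))   # 0 5 15 12: zeroes every NA, keeps length
```

Inside mutate(), the result has to match the number of rows, so the length-preserving variants are the ones that fit there.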

Gregor Thomas