0

Probably it's a simple question and I dunno whats the term I have to search but I haven't find anything yet.

I have a data.table (I'm used to it) or data.frame like this:

a = rep(1, 8)
b = rep(c("20","30", "180", "10"), each = 1)
df = data.table(a,b)

df

  a   b
1 1  20
2 1  30
3 1 180
4 1  10
5 1  20
6 1  30
7 1 180
8 1  10

I want to obtain a variable that identify the group of observations that appear between the first and last observation with a value of b higher or equally to 180. Like this:

  a   b  c
1 1  20  1
2 1  30  1
3 1 180  2
4 1  10  2
5 1  20  2
6 1  30  2
7 1 180  3
8 1  10  3

Any help wil be very helpful! If there's an existing question like this anywhere I will appreciate if you tell me were to look.

Thanks in advance :)

zx8754
  • 52,746
  • 12
  • 114
  • 209

1 Answers1

2

We can use cumsum after changing the class of the column.

library(data.table)

df <- type.convert(df)
setDT(df)

df[, c := cumsum(b >= 180) + 1]


#   a   b c
#1: 1  20 1
#2: 1  30 1
#3: 1 180 2
#4: 1  10 2
#5: 1  20 2
#6: 1  30 2
#7: 1 180 3
#8: 1  10 3
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • 1
    b column is character. – zx8754 Mar 17 '20 at 12:43
  • Fixed... @AllanCameron and zx8754. – Ronak Shah Mar 17 '20 at 12:45
  • Wow! So fast and so helpful Just if anyone has the same problem as me. If you have NA's in your variable you have to avoid NA's or convert NA's to 0's like in the [link][1] Thank you all! :D [1]: https://stackoverflow.com/questions/25576358/calculate-cumsum-while-ignoring-na-values – SaveTheDream Mar 17 '20 at 16:16