Group variable that identifies observations between two values

Question

Probably it's a simple question and I dunno whats the term I have to search but I haven't find anything yet.

I have a data.table (I'm used to it) or data.frame like this:

a = rep(1, 8)
b = rep(c("20","30", "180", "10"), each = 1)
df = data.table(a,b)

df

  a   b
1 1  20
2 1  30
3 1 180
4 1  10
5 1  20
6 1  30
7 1 180
8 1  10

I want to obtain a variable that identify the group of observations that appear between the first and last observation with a value of b higher or equally to 180. Like this:

Any help wil be very helpful! If there's an existing question like this anywhere I will appreciate if you tell me were to look.

Thanks in advance :)

Something like `cumsum(df$b == 180) + 1` – zx8754 Mar 17 '20 at 12:40 — zx8754, Mar 17 '20 at 12:40

Ronak Shah · Accepted Answer · 2020-03-17T12:44:59.707

2

We can use cumsum after changing the class of the column.

library(data.table)

df <- type.convert(df)
setDT(df)

df[, c := cumsum(b >= 180) + 1]


#   a   b c
#1: 1  20 1
#2: 1  30 1
#3: 1 180 2
#4: 1  10 2
#5: 1  20 2
#6: 1  30 2
#7: 1 180 3
#8: 1  10 3

edited Mar 17 '20 at 12:44

answered Mar 17 '20 at 12:40

Ronak Shah

377,200
20
156
213

1

b column is character. – zx8754 Mar 17 '20 at 12:43
Fixed... @AllanCameron and zx8754. – Ronak Shah Mar 17 '20 at 12:45
Wow! So fast and so helpful Just if anyone has the same problem as me. If you have NA's in your variable you have to avoid NA's or convert NA's to 0's like in the [link][1] Thank you all! :D [1]: https://stackoverflow.com/questions/25576358/calculate-cumsum-while-ignoring-na-values – SaveTheDream Mar 17 '20 at 16:16

Group variable that identifies observations between two values

1 Answers1