2

I want to classify the rows of a data frame based on a threshold applied to a numeric reference column. If the reference value is not over the threshold, the result is 0, which I want to store in a new column. If the reference value is over the threshold, the new column gets the value 1 for all consecutive rows over the threshold, until a new 0 result comes up. The next run of reference values over the threshold gets the value 2, and so on.

With the threshold set to 2 (that is, values > 2 count as over the threshold), an example of what I would like to obtain is:

row  reference  result
  1          2       0
  2          1       0
  3          4       1
  4          3       1
  5          1       0
  6          6       2
  7          8       2
  8          4       2
  9          1       0
 10          3       3
 11          6       3

row <- c(1:11)
reference <- c(2,1,4,3,1,6,8,4,1,3,6)
result <- c(0,0,1,1,0,2,2,2,0,3,3)
table <- cbind(row, reference, result)

Thank you!

marc_s
MRR
  • Do you have a data frame or a matrix? Your example is a matrix. Also, note that `table` is a function in base R. You should avoid naming objects the same as existing functions. – Sotos Dec 28 '22 at 15:02
  • Thanks for your answer. Yes, you are right, I forgot to add a capital T, but in any case, this was just an example to illustrate what I need to do with a huge data frame I have. – MRR Dec 29 '22 at 09:00

2 Answers

2

We can use run-length encoding (rle) for this.
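
As a quick illustration of what `rle()` and `inverse.rle()` do here, the sketch below applies them to the sample `reference` values from the question. The comparison is written as `reference > 2`, so `TRUE` means "over the threshold"; the names `over` and `runs` are only illustrative.

```r
# Sample reference values from the question
reference <- c(2, 1, 4, 3, 1, 6, 8, 4, 1, 3, 6)

# TRUE where the value is over the threshold of 2
over <- reference > 2

# rle() collapses consecutive equal values into runs
runs <- rle(over)
runs$lengths
# [1] 2 2 1 3 1 2
runs$values
# [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE

# inverse.rle() expands the runs back into the original vector
identical(inverse.rle(runs), over)
# [1] TRUE
```

Because each run has exactly one entry in `runs$values`, relabelling that entry relabels every row of the run at once.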

The code below assumes a data.frame:

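# TRUE marks rows over the threshold; rle() groups consecutive equal values into runs
r <- rle(quux$reference > 2)
# number the TRUE runs 1, 2, 3, ... (also correct if the first row is over the threshold); FALSE runs get 0
r$values <- ifelse(r$values, cumsum(r$values), 0)
# expand the per-run labels back to one value per row
quux$result2 <- inverse.rle(r)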
quux
#    row reference result result2
# 1    1         2      0       0
# 2    2         1      0       0
# 3    3         4      1       1
# 4    4         3      1       1
# 5    5         1      0       0
# 6    6         6      2       2
# 7    7         8      2       2
# 8    8         4      2       2
# 9    9         1      0       0
# 10  10         3      3       3
# 11  11         6      3       3

Data

quux <- structure(list(row = 1:11, reference = c(2, 1, 4, 3, 1, 6, 8, 4, 1, 3, 6), result = c(0, 0, 1, 1, 0, 2, 2, 2, 0, 3, 3)), row.names = c(NA, -11L), class = "data.frame")
r2evans
0

As noted in the comments by @Sotos, I would consider an alternative name for your object, since `table` is already a function in base R.

Since it wasn't clear whether you have a data.frame or a matrix, let's assume a data.frame df built from your data:

df <- as.data.frame(table)

And have a threshold of 2:

threshold <- 2

You can adapt this solution by @flodel:

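# TRUE where reference exceeds the threshold (taken from df, not the global environment)
x <- df$reference > threshold
# cumsum counts FALSE -> TRUE transitions; rows not over the threshold are set to 0
df$new_result <- ifelse(x, cumsum(c(x[1], diff(x) == 1)), 0)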
df

Here, diff(x) returns a vector in which a value of 1 marks a transition from FALSE to TRUE (0 to 1), that is, a row where reference goes from not exceeding the threshold to exceeding it. In the sample data this happens at rows 3, 6, and 10, and cumsum increments the result at each of these rows. x[1] is prepended because diff returns one element fewer than its input; it also starts the count at 1 if the very first row already exceeds the threshold.

The ifelse then applies these incremental values only to rows where reference exceeds threshold; all other rows are set to 0.

Output

   row reference result new_result
1    1         2      0          0
2    2         1      0          0
3    3         4      1          1
4    4         3      1          1
5    5         1      0          0
6    6         6      2          2
7    7         8      2          2
8    8         4      2          2
9    9         1      0          0
10  10         3      3          3
11  11         6      3          3
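
To make the diff/cumsum step easier to follow, here is a minimal walkthrough of the intermediate vectors on the sample data. The names `d`, `starts`, and `run_id` are only illustrative and are not part of the solution above.

```r
reference <- c(2, 1, 4, 3, 1, 6, 8, 4, 1, 3, 6)
threshold <- 2

# TRUE where reference exceeds the threshold
x <- reference > threshold

# diff() treats the logicals as 0/1:
#  1 marks FALSE -> TRUE, -1 marks TRUE -> FALSE
d <- diff(x)
d
# [1]  0  1  0 -1  1  0  0 -1  1  0

# Prepend x[1] so the vector lines up with the rows again
starts <- c(x[1], d == 1)
which(starts)
# [1]  3  6 10

# Running count of run starts, then zero out rows not over the threshold
run_id <- cumsum(starts)
ifelse(x, run_id, 0)
# [1] 0 0 1 1 0 2 2 2 0 3 3
```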
Ben