2

How to conditionally compare data rows by rows and output different results to other columns?

Please see below dataset,

The den of Row 1 is 1, start comparing the Weight of each row with the Weight of Row 1, compare the Volume of each row with the Volume of Row 1.

Check first if the Weight of certain Row is higher than the Weight of Row 1, the higher column of Row 1 will become 1, else check if the Volume of certain Row is lower than the Volume of Row 1 by 1.0, the lower column of Row 1 will become 1.

Until either of these conditions is met, keep comparing the next row and the next row....If either of the condition is met with Row 2, move on to Row 3, if either of the conditions is met with Row 3, move on to Row 4, row by row.....so on.

When one of the condition is met(one of the higher or lower column of Row 1 == 1), move on to the next Row where den==1 which is Row 3 in this case. And then Row 6.

The howhigh column is to record the difference between a row's Weight with Row 1's Weight when Row 1's higher == 1. The between column is to record the Row difference where the condition was met(eg: in Expected Outcome , Row 1's between is 5 because the condition was met with Row 6, so 6 - 1 = 5, the between of Row 3 is 3 because condition was met with Row 6 so 6 - 3 = 3)

Then the dataset will become something like the Expected Outcome

Take Row 14 from Expected Outcome for example, the higher==1 because the Weight from Row 18 is higher. The howhigh is 0.0649 because the Weight difference of Row 14 and 18 is 0.0649, the between is 4 because 18-14=4

how do I achieve this the vectorized way to increase calculation speed? Thanks in advance.

dataset

   Weight Volume den higher lower between howhigh
1  5.1626 5.1594   1      0     0       0       0
2  5.1615 5.1559   0      0     0       0       0
3  5.1600 5.1574   1      0     0       0       0
4  5.1593 5.1582   0      0     0       0       0
5  5.1592 5.1572   0      0     0       0       0
6  5.1635 5.1580   1      0     0       0       0
7  5.1608 5.1580   0      0     0       0       0
8  5.1602 4.0565   0      0     0       0       0
9  5.1582 5.1554   0      0     0       0       0
10 5.1563 5.1547   0      0     0       0       0
11 5.1578 5.1550   1      0     0       0       0
12 5.1589 5.1560   0      0     0       0       0
13 5.1578 3.1553   0      0     0       0       0
14 5.1591 5.1554   1      0     0       0       0
15 5.1585 5.1563   0      0     0       0       0
16 5.1572 5.1557   0      0     0       0       0
17 5.1565 5.1520   0      0     0       0       0
18 5.2240 5.1518   0      0     0       0       0
19 5.1540 5.1505   1      0     0       0       0
20 5.1539 5.1488   0      0     0       0       0
21 5.1520 5.1408   0      0     0       0       0
22 5.1450 5.1420   0      0     0       0       0
23 5.1455 5.1420   0      0     0       0       0
24 5.1461 5.1435   0      0     0       0       0
25 5.1470 5.1437   0      0     0       0       0
26 5.1449 5.1378   0      0     0       0       0
27 5.1423 5.1385   0      0     0       0       0
28 6.1429 5.1401   0      0     0       0       0
29 5.1425 5.1399   0      0     0       0       0
30 5.1433 5.1403   1      0     0       0       0

Expected Outcome

   Weight Volume den higher lower between howhigh
1  5.1626 5.1594   1      1     0       5  0.0009
2  5.1615 5.1559   0      0     0       0       0
3  5.1600 5.1574   1      1     0       3  0.0035    
4  5.1593 5.1582   0      0     0       0       0
5  5.1592 5.1572   0      0     0       0       0
6  5.1635 5.1580   1      0     1       2       0
7  5.1608 5.1580   0      0     0       0       0
8  5.1602 4.0565   0      0     0       0       0
9  5.1582 5.1554   0      0     0       0       0
10 5.1563 5.1547   0      0     0       0       0
11 5.1578 5.1550   1      0     1       2       0
12 5.1589 5.1560   0      0     0       0       0
13 5.1578 3.1553   0      0     0       0       0
14 5.1591 5.1554   1      1     0       4  0.0649
15 5.1585 5.1563   0      0     0       0       0
16 5.1572 5.1557   0      0     0       0       0
17 5.1565 5.1520   0      0     0       0       0
18 5.2240 5.1518   0      0     0       0       0
19 5.1540 5.1505   1      1     0       9  0.9889
20 5.1539 5.1488   0      0     0       0       0
21 5.1520 5.1408   0      0     0       0       0
22 5.1450 5.1420   0      0     0       0       0
23 5.1455 5.1420   0      0     0       0       0
24 5.1461 5.1435   0      0     0       0       0
25 5.1470 5.1437   0      0     0       0       0
26 5.1449 5.1378   0      0     0       0       0
27 5.1423 5.1385   0      0     0       0       0
28 6.1429 5.1401   0      0     0       0       0
29 5.1425 5.1399   0      0     0       0       0
30 5.1433 5.1403   1      0     0       0       0
Intern Ne
  • 65
  • 7

1 Answers1

2

I took a stab at this. Let me know how the speed is as it's not 100% a vectorized solution. It took me a while to understand you only wanted to look at rows below den and that, if volume was lower, you did not mean lower by exactly 1.0 but by equal to or less than 1.0.

# Your data
dat <- structure(list(Weight = c(5.1626, 5.1615, 5.16, 5.1593, 5.1592, 5.1635, 5.1608, 5.1602, 5.1582, 5.1563, 5.1578, 5.1589, 5.1578, 5.1591, 5.1585, 5.1572, 5.1565, 5.224, 5.154, 5.1539, 5.152, 5.145, 5.1455, 5.1461, 5.147, 5.1449, 5.1423, 6.1429, 5.1425, 5.1433), Volume = c(5.1594, 5.1559, 5.1574, 5.1582, 5.1572, 5.158, 5.158, 4.0565, 5.1554, 5.1547, 5.155, 5.156, 3.1553, 5.1554, 5.1563, 5.1557, 5.152, 5.1518, 5.1505, 5.1488, 5.1408, 5.142, 5.142, 5.1435, 5.1437, 5.1378, 5.1385, 5.1401, 5.1399, 5.1403), den = c(1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), higher = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), lower = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), between = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), howhigh = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("Weight", "Volume", "den", "higher", "lower", "between", "howhigh"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30"))

I add a rownumber to the data.frame for more ease of access in apply, then I take only rows with den == 1 to create a new variable for looping over.

dat$rownum <- 1:nrow(dat)
newd <- dat[dat$den == 1,]
#   Weight Volume den higher lower between howhigh rownum
#1  5.1626 5.1594   1      1     0       0       0      1
#3  5.1600 5.1574   1      1     0       0       0      3
#6  5.1635 5.1580   1      1     0       0       0      6
#11 5.1578 5.1550   1      1     0       0       0     11
#14 5.1591 5.1554   1      1     0       0       0     14
#19 5.1540 5.1505   1      1     0       0       0     19
#30 5.1433 5.1403   1      1     0       0       0     30

The Function:

out <- t(apply(newd, 1, function(d){
  rownum <- d["rownum"]
  a <- which(dat$Weight > d["Weight"])
  a <- a[a > rownum][1]
  b <- which((dat$Volume - d["Volume"]) <= -1.0)
  b <- b[b > rownum][1]
  pick <- ifelse(!is.na(b), ifelse(a < b, "a", "b"), "a")
  if( pick == "a"){
    d["higher"] <- 1
    d["howhigh"] <- dat$Weight[a] - d["Weight"]
    d["between"] <- a - rownum
  } else {
    d["lower"] <- 1
    d["between"] <- b - rownum
  }
  d[is.na(d)] <- 0
  d
}))
out
#   Weight Volume den higher lower between howhigh rownum
#1  5.1626 5.1594   1      1     0       5  0.0009      1
#3  5.1600 5.1574   1      1     0       3  0.0035      3
#6  5.1635 5.1580   1      0     1       2  0.0000      6
#11 5.1578 5.1550   1      1     0       1  0.0011     11
#14 5.1591 5.1554   1      1     0       4  0.0649     14
#19 5.1540 5.1505   1      1     0       9  0.9889     19
#30 5.1433 5.1403   1      1     0       0  0.0000     30

dat[dat$den == 1,] <- out # replace old rows with new ones
dat[,-8] # remove the rownum column
Evan Friedland
  • 3,062
  • 1
  • 11
  • 25
  • Thanks for the attempt. Speed wise, it's faster than my loops. The out come is different though. If `higher` is already `1`, skip calculating `lower`. If `lower` is 1, skip calculating `higher`. There can only be one `1` in column `higher` & `lower`. In your outcome, row 6 , the `higher` column should be a 0 since the `lower` is 1 and the condition is met before `higher`'s condition. – Intern Ne Jun 23 '17 at 22:41
  • That's odd... when I run my code it works correctly and #6 has only 1 filled (as it should) but I seem to have pasted results that have them both filled somehow... Have you tried running the code or are you just looking at my (incorrect) output here on the thread? – Evan Friedland Jun 23 '17 at 22:47
  • The way my code is written it should not be possible for there to be both a higher and lower equal to 1. I must have made a typo somewhere... – Evan Friedland Jun 23 '17 at 22:49
  • I see, I was looking at the incorrect output on the thread, the outcome in the console is correct. Cheers. – Intern Ne Jun 23 '17 at 23:17
  • Happy to have helped – Evan Friedland Jun 23 '17 at 23:58
  • Are you able to help with another question of mine? Cheers. [How to convert dataframe into sparse matrix](https://stackoverflow.com/questions/44731154/how-to-convert-dataframe-into-sparse-matrix) – Intern Ne Jun 23 '17 at 23:59