I am working on a NoSQL data, which I need to pivot in R.
Sample data:
structure(list(timestamp = structure(c(1595709882, 1595709882,
1595709931, 1595709931, 1595710021, 1595710023, 1595710023, 1595710027,
1595710157, 1595710157, 1595710277, 1595710277, 1595710337, 1595710337,
1595710397, 1595710397, 1595710457, 1595710457, 1595710517, 1595710517
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), value = c("3000",
"160", "160", "3000", "6000", "6000", "160", "6000", "6000",
"160", "160", "6000", "6000", "160", "6000", "160", "6000", "160",
"6000", "160"), variable = c("ENGINE_RPM", "VEHICLE_SPEED", "VEHICLE_SPEED",
"ENGINE_RPM", "ENGINE_RPM", "ENGINE_RPM", "VEHICLE_SPEED", "ENGINE_RPM",
"ENGINE_RPM", "VEHICLE_SPEED", "VEHICLE_SPEED", "ENGINE_RPM",
"ENGINE_RPM", "VEHICLE_SPEED", "ENGINE_RPM", "VEHICLE_SPEED",
"ENGINE_RPM", "VEHICLE_SPEED", "ENGINE_RPM", "VEHICLE_SPEED")), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
timestamp value variable
7/25/2020 20:44:42 3000 ENGINE_RPM
7/25/2020 20:44:42 160 VEHICLE_SPEED
7/25/2020 20:45:31 160 VEHICLE_SPEED
7/25/2020 20:45:31 3000 ENGINE_RPM
7/25/2020 20:47:01 6000 ENGINE_RPM
7/25/2020 20:47:03 6000 ENGINE_RPM
7/25/2020 20:47:03 160 VEHICLE_SPEED
7/25/2020 20:47:07 6000 ENGINE_RPM
7/25/2020 20:49:17 6000 ENGINE_RPM
7/25/2020 20:49:17 160 VEHICLE_SPEED
7/25/2020 20:51:17 160 VEHICLE_SPEED
7/25/2020 20:51:17 6000 ENGINE_RPM
7/25/2020 20:52:17 6000 ENGINE_RPM
7/25/2020 20:52:17 160 VEHICLE_SPEED
7/25/2020 20:53:17 6000 ENGINE_RPM
7/25/2020 20:53:17 160 VEHICLE_SPEED
7/25/2020 20:54:17 6000 ENGINE_RPM
7/25/2020 20:54:17 160 VEHICLE_SPEED
7/25/2020 20:55:17 6000 ENGINE_RPM
7/25/2020 20:55:17 160 VEHICLE_SPEED
If we look at the sample data, certain timestamps have both RPM and SPEED, whereas few timestamps have just one of them.
I need those rows that have 2 timestamps as they have both the vehicle speed and RPM that I can later pivot to see at a particular time, what was the vehicle's speed and its engine RPM.
The output I am looking at is:
timestamp ENGINE_RPM VEHICLE_SPEED
7/25/2020 20:44:42 3000 160
7/25/2020 20:45:31 3000 160
7/25/2020 20:47:03 6000 160
7/25/2020 20:49:17 6000 160
7/25/2020 20:51:17 6000 160
7/25/2020 20:52:17 6000 160
7/25/2020 20:53:17 6000 160
7/25/2020 20:54:17 6000 160
7/25/2020 20:55:17 6000 160
The query I used is:
data %>% group_by(timestamp, variable, value) %>%
mutate(row = row_number()) %>% filter(n() == 2) %>%
pivot_wider(names_from = variable, values_from = value) %>% select(-row)
The output I am getting is:
# A tibble: 8 x 3
# Groups: timestamp [4]
timestamp VEHICLE_SPEED ENGINE_RPM
<dttm> <chr> <chr>
1 2020-08-05 16:09:02 5 NA
2 2020-08-05 16:09:02 5 NA
3 2020-08-06 18:32:33 15 NA
4 2020-08-06 18:32:33 15 NA
5 2020-08-06 18:32:52 25 NA
6 2020-08-06 18:32:52 25 NA
7 2020-08-07 12:03:53 NA 1500
8 2020-08-07 12:03:53 NA 1500
>
Could someone let me know how to get the desired output.