I was trying to answer a question on Stack Overflow (Mapping multiple IDs using R) when I got stuck on how to finish it. Namely, how can I test whether a time point falls between a pair of before and after time points?

The user from that post did not provide a reproducible example, but here is what I came up with. I want to test each time point in hidenic_file$hidenic_time against the before and after times in the data frame emtek_file, and return the emtek_ids whose time frame contains the time of each hidenic_id. The poster didn't mention it, but it seems possible for multiple emtek_ids to be returned for each hidenic_id.

library(zoo)
date_string <- paste("2001", sample(12, 10, replace = TRUE), sample(28, 10), sep = "-")
time_string <- c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26",
                 "23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26")

entry_emtek <- strptime(paste(date_string, time_string), "%Y-%m-%d %H:%M:%S")
entry_emtek <- entry_emtek[order(entry_emtek)]
exit_emtek <- entry_emtek + 3600 * 24
emtek_file <- data.frame(emtek_id = 1:10, entry_emtek, exit_emtek)

hidenic_id <- 110380:110479
date_string <- paste("2001", sample(12, 100, replace = TRUE), sample(28, 100, replace = TRUE), sep = "-")
time_string <- rep(c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26",
                 "23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26"),10)
hidenic_time <- strptime(paste(date_string, time_string), "%Y-%m-%d %H:%M:%S")
hidenic_time <- hidenic_time[order(hidenic_time)]
hidenic_file <- data.frame(hidenic_id, hidenic_time)

##Here is where I fail to write concise and working code to find what I want. 
combined_file <- list() 
for(i in seq(hidenic_file[,1])) {
  for(j in seq(emtek_file[,1])) {
    if(length(zoo(1, emtek_file[j,2:3]) + zoo(1,hidenic_file[i,2])) == 0) {next}
    if(length(zoo(1, emtek_file[j,2:3]) + zoo(1,hidenic_file[i,2])) == 1) {combined_file[[i]] <- c(combined_file[[i]],emtek_file[j,1])}
  }
  names(combined_file)[i] <- hidenic_file[i,1]
}
cylondude
  • You forgot `library(zoo)`, and when I try to run your loop I get an error. It would be easier for us if you added the expected result: what should combined_file contain? – agstudy Jun 20 '13 at 23:06
  • Whoops. It is now edited with library(zoo). I mentioned that the loop is not functional, but it was my best attempt to solve my problem. Can you rephrase the last sentence please? – cylondude Jun 20 '13 at 23:12
  • I understood "Here is where I fail to write concise" to mean that it works but is not efficient :) By my last sentence I meant: what is the expected result? – agstudy Jun 20 '13 at 23:13
  • With my example, I would expect to get a list where each element is a separate hidenic id with matching emtek id's in a character vector. I didn't yet add the names of each element of the list. I will edit to add that in before the loop. – cylondude Jun 20 '13 at 23:21

1 Answer

I am not sure I understand everything you want to do, since you don't provide the expected result. Here is a solution using the IRanges package. It may not be simple to understand at first reading, but it is extremely useful for finding overlaps between continuous intervals.

library(IRanges)
## create the time intervals (one per emtek row)
subject <- IRanges(as.numeric(emtek_file$entry_emtek),
        as.numeric(emtek_file$exit_emtek))
## create the time points as intervals (start = end here)
query <- IRanges(as.numeric(hidenic_file$hidenic_time),
        as.numeric(hidenic_file$hidenic_time))
## find the overlaps once, then extract the matching rows (time points and intervals)
hits <- findOverlaps(query,subject)
emt.ids <- subjectHits(hits)
hid.ids <- queryHits(hits)
cbind(hidenic_file[hid.ids,],emtek_file[emt.ids,])

 hidenic_id        hidenic_time emtek_id         entry_emtek          exit_emtek
8      110387 2001-03-13 22:29:56        3 2001-03-13 22:29:56 2001-03-14 22:29:56
9      110388 2001-03-14 01:03:30        3 2001-03-13 22:29:56 2001-03-14 22:29:56
41     110420 2001-06-09 16:56:26        7 2001-06-09 16:56:26 2001-06-10 16:56:26
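
If the goal is the list described in the comments (one element per hidenic_id holding the matching emtek_ids), the same test can be sketched in base R without IRanges. This is only a minimal sketch: the toy data below are made up for illustration (they are not the random data from the question), the name `matches` is hypothetical, and both interval endpoints are assumed to count as inside the interval.

```r
# Toy data, made up for illustration (not the random data from the question)
emtek_file <- data.frame(
  emtek_id    = 1:3,
  entry_emtek = as.POSIXct(c("2001-03-13 22:29:56",
                             "2001-03-14 12:00:00",
                             "2001-06-09 16:56:26"), tz = "UTC")
)
emtek_file$exit_emtek <- emtek_file$entry_emtek + 24 * 3600

hidenic_file <- data.frame(
  hidenic_id   = c(110387, 110388, 110420, 110421),
  hidenic_time = as.POSIXct(c("2001-03-13 23:00:00",
                              "2001-03-14 13:00:00",
                              "2001-06-10 01:00:00",
                              "2001-12-01 00:00:00"), tz = "UTC")
)

# For each hidenic time, keep the emtek_ids whose [entry, exit] interval
# contains it (endpoints included, an assumption of this sketch)
matches <- lapply(seq_len(nrow(hidenic_file)), function(i) {
  t <- hidenic_file$hidenic_time[i]
  emtek_file$emtek_id[emtek_file$entry_emtek <= t & t <= emtek_file$exit_emtek]
})
names(matches) <- hidenic_file$hidenic_id
```

A hidenic_id with no matching interval keeps an empty vector in the list rather than being dropped, and one that falls into several intervals gets all of their emtek_ids. This compares every time point against every interval, so for large files the IRanges approach is likely the better fit.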

PS: to install the package (biocLite has since been replaced by BiocManager):

  install.packages("BiocManager")
  BiocManager::install("IRanges")
agstudy
  • I guess I could have been more specific about how I wanted the data. I first try to just get a correct result and then reshape it into what I want later. Thanks for introducing me to IRanges! – cylondude Jun 21 '13 at 16:52
  • @cylondude You are welcome. The shape of the result is not important; what matters is the expected result itself: which ids in hidenic_file and which ids in emtek_file you expect to get. – agstudy Jun 21 '13 at 16:58