
I need to load a lot of files into R. Each file has a date embedded in its name, for example: SD07_TWK_20190822_003004

I would like to select the files that are loaded based on these dates.

I load the files into R like this:

filenames = list.files(path=path, pattern="\\.txt$")  # match the .txt extension literally
colnamesfull = c("time","v","a","t1","t2","t3","t4","t5","t6","t7","t8")

# Read each file into a variable named after the file
for(i in filenames){
  filepath = file.path(path, i)
  assign(i, read.table(filepath,
                       skip=20,
                       col.names=colnamesfull,
                       sep=","))
}

To filter on dates, I assume I need to add a date range in the 'pattern' within the list.files function. However, I can't get that to work.

Say I have the following dates:

date_start = "20190822"
date_end = "20190823"

How would I add a filter for these dates into the code above?

Example of code and files:

#path = "C:/path"
filenames = list.files(path=path, pattern=".txt")
names = substr(filenames,10,17) 
date_start = "20190822"
date_end = "20190822"

for(i in filenames){
  filepath = file.path(path, paste(i, sep=""))
  if( (date_start <= substr(filepath, 10, 17))  &  
      (substr(filepath, 10, 17) <= date_end  )){
    assign(i, read.table(filepath,
                         skip= 20, 
                         col.names= colnamesfull,
                         sep=","))}}

Some files:

> dput(SD07_TWK_20190822_003004.txt[1:10,])
structure(list(time = c(2, 3.9, 5.8, 7.8, 9.7, 11.7, 13.6, 15.5, 
17.5, 19.4), v = c(14.82, 14.804, 14.82, 14.82, 14.804, 14.82, 
14.812, 14.804, 14.8, 14.808), a = c(1.5, 1.476, 1.5, 1.491, 
1.452, 1.476, 1.478, 1.44, 1.454, 1.438), t1 = c(14.61, 14.61, 
14.61, 14.61, 14.61, 14.61, 14.61, 14.62, 14.62, 14.63), t2 = c(14.63, 
14.62, 14.62, 14.62, 14.62, 14.62, 14.62, 14.63, 14.63, 14.64
), t3 = c(14.63, 14.63, 14.63, 14.63, 14.63, 14.63, 14.63, 14.63, 
14.64, 14.65), t4 = c(14.65, 14.65, 14.65, 14.65, 14.64, 14.64, 
14.65, 14.65, 14.66, 14.67), t5 = c(14.65, 14.65, 14.65, 14.65, 
14.65, 14.65, 14.66, 14.66, 14.67, 14.69), t6 = c(14.63, 14.63, 
14.63, 14.63, 14.63, 14.63, 14.63, 14.64, 14.65, 14.66), t7 = c(14.64, 
14.64, 14.64, 14.64, 14.64, 14.64, 14.64, 14.64, 14.65, 14.66
), t8 = c(14.6, 14.6, 14.6, 14.6, 14.6, 14.6, 14.61, 14.61, 14.62, 
14.63)), row.names = c(NA, 10L), class = "data.frame")
> dput(SD07_TWK_20190823_225940.txt[1:10,])
structure(list(time = c(2, 3.9, 5.8, 7.8, 9.7, 11.7, 13.6, 15.6, 
17.5, 19.5), v = c(14.436, 14.428, 14.436, 14.428, 14.432, 14.424, 
14.428, 14.42, 14.424, 14.42), a = c(1.494, 1.507, 1.499, 1.494, 
1.49, 1.51, 1.495, 1.51, 1.511, 1.516), t1 = c(14.63, 14.63, 
14.63, 14.63, 14.63, 14.63, 14.63, 14.63, 14.64, 14.65), t2 = c(14.61, 
14.61, 14.61, 14.61, 14.61, 14.61, 14.61, 14.61, 14.61, 14.62
), t3 = c(14.64, 14.64, 14.64, 14.64, 14.64, 14.63, 14.64, 14.64, 
14.64, 14.65), t4 = c(14.64, 14.64, 14.64, 14.64, 14.64, 14.64, 
14.64, 14.65, 14.65, 14.66), t5 = c(14.67, 14.68, 14.67, 14.67, 
14.67, 14.68, 14.68, 14.68, 14.69, 14.7), t6 = c(14.67, 14.67, 
14.67, 14.67, 14.67, 14.67, 14.67, 14.67, 14.68, 14.69), t7 = c(14.67, 
14.67, 14.67, 14.67, 14.67, 14.67, 14.67, 14.67, 14.68, 14.69
), t8 = c(14.64, 14.64, 14.64, 14.64, 14.64, 14.64, 14.64, 14.64, 
14.65, 14.66)), row.names = c(NA, 10L), class = "data.frame")

1 Answer

Starting with a filename following the pattern of "SD07_TWK_20190822_003004" you can extract the date as characters 10 through 17:

> substr("SD07_TWK_20190822_003004", 10, 17)
[1] "20190822"

This is a character string, not a date. But because the file name stores the date as year, month, day with a fixed width (YYYYMMDD), the strings sort in chronological order, so you can compare them as if they were dates:

example <- substr("SD07_TWK_20190822_003004", 10, 17)
date_start = "20190822"
date_end = "20190823"

example >= date_start
example < date_end

To check whether a date lies between date_start and date_end (inclusive), combine both comparisons:

> date_start <= example & example <= date_end
[1] TRUE

Let's try it with a date a month earlier and a date a year later to see if we get the desired FALSE:

> example = "20190722"
> date_start <= example & example <= date_end
[1] FALSE
> example = "20200822"
> date_start <= example & example <= date_end
[1] FALSE
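
The same comparison is vectorised, so it can also be applied to all file names at once before anything is loaded. The following is a minimal sketch; the four file names are made up for illustration and only follow the naming pattern from the question.

```r
# Hypothetical file names following the pattern from the question
filenames <- c("SD07_TWK_20190821_120000.txt",
               "SD07_TWK_20190822_003004.txt",
               "SD07_TWK_20190823_225940.txt",
               "SD07_TWK_20190824_010101.txt")

date_start <- "20190822"
date_end   <- "20190823"

# Characters 10 to 17 of each name hold the date as YYYYMMDD
file_dates <- substr(filenames, 10, 17)

# One TRUE/FALSE per file; & works element-wise on the whole vector
in_range <- date_start <= file_dates & file_dates <= date_end

# Keep only the names whose date falls inside the range
selected <- filenames[in_range]
print(selected)
```

Looping over `selected` instead of `filenames` would then read only the files in the range.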

Great, now all you have to do is wrap the assign in your loop in an if statement, something along the lines of

# basename() drops the directory, so characters 10 to 17 are the date
file_date = substr(basename(filepath), 10, 17)
if( (date_start <= file_date)  &  
    (file_date <= date_end) ){
        assign(....)
}
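
One caveat when the condition is evaluated on a full path: the directory prefix shifts the position of the date, so characters 10 to 17 no longer contain it. A small sketch of the effect, assuming a hypothetical directory "C:/path" (nothing is read from disk):

```r
path <- "C:/path"                       # hypothetical directory
i <- "SD07_TWK_20190822_003004.txt"     # file name as returned by list.files()
filepath <- file.path(path, i)

# Taken from the full path, the substring is offset by the directory prefix
from_path <- substr(filepath, 10, 17)
print(from_path)   # "D07_TWK_"

# basename() strips the directory, so the positions match the file name again
from_name <- substr(basename(filepath), 10, 17)
print(from_name)   # "20190822"
```

Extracting the date from `i` directly, or from `basename(filepath)`, avoids depending on the length of the path.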

I cannot close without advising you to reconsider whether assign is a good choice here. You should probably store those data.frames in a list instead, but that is not the topic of this question.
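
As a rough illustration of the list idea, the sketch below writes three small demo files to a temporary directory and reads only those inside the date range into a named list. The file contents, the 20 placeholder header lines and the name `data_list` are assumptions made for the example, not the real data.

```r
# Create a throwaway directory with three demo files (made-up contents)
path <- tempfile("sd07_demo_")
dir.create(path)
colnamesfull <- c("time","v","a","t1","t2","t3","t4","t5","t6","t7","t8")
demo_rows <- c("2,14.82,1.5,14.61,14.63,14.63,14.65,14.65,14.63,14.64,14.6",
               "3.9,14.804,1.476,14.61,14.62,14.63,14.65,14.65,14.63,14.64,14.6")
for (name in c("SD07_TWK_20190821_120000.txt",
               "SD07_TWK_20190822_003004.txt",
               "SD07_TWK_20190823_225940.txt")) {
  # 20 placeholder header lines, matching skip = 20 below
  writeLines(c(rep("header", 20), demo_rows), file.path(path, name))
}

date_start <- "20190822"
date_end   <- "20190823"
filenames  <- list.files(path = path, pattern = "\\.txt$")

# Collect the data frames in a list instead of creating one variable per file
data_list <- list()
for (i in filenames) {
  file_date <- substr(i, 10, 17)   # date taken from the bare file name
  if (date_start <= file_date && file_date <= date_end) {
    data_list[[i]] <- read.table(file.path(path, i),
                                 skip = 20,
                                 col.names = colnamesfull,
                                 sep = ",")
  }
}
print(names(data_list))
```

Each data frame can then be reached by file name, for example `data_list[["SD07_TWK_20190822_003004.txt"]]`.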

Bernhard
  • Hey Bernhard, thank you so much for the help! It works all the way up until the end; in the if statement I get the following error: ```Error in if ((date_start <= substr(filenames, 10, 17)) & (substr(filenames, : the condition has length > 1``` – Djingleberg Aug 18 '22 at 09:17
  • Also, you are totally right about storing the data frames in a list. That is a step I do afterwards, and I would like to skip this loading step if possible. If you would like to help with that as well, that would be great, but I will create another question :) – Djingleberg Aug 18 '22 at 09:18
  • For the error message: it is unnecessarily hard to debug without a reproducible example, i.e., code and sample data to reproduce the error. You might get by just switching the `&` to a `&&`, but obviously you will want to explore what those parentheses evaluate to and understand where the problem comes from. If that does not work, you will have to go through the process of building a reproducible example. – Bernhard Aug 18 '22 at 09:48
  • As to building lists in a `for` loop there are probably lots of answers to be found by searching this site. One good option is to build it incrementally using the `append` command as I have demonstrated in this answer: https://stackoverflow.com/a/73394451/6503141 Please consider upvoting there if it spares you the trouble of formulating a new question. – Bernhard Aug 18 '22 at 09:51
  • Hey Bernhard, thanks again for helping me out. I will look at your suggestion for building the list in the for loop later. I am still puzzling over the if statement. It works now, but still pulls in the files from outside of the range as well. I have tried to add a reproducible example in the main question, if you would still like to try it. With the given example the code runs, but gives the following warning (more than 50 times): ``` 50: In date_start <= names && names <= date_end : 'length(x) = 96 > 1' in coercion to 'logical(1)' ``` Coincidentally, 96 is the total number of files. – Djingleberg Aug 18 '22 at 10:16
  • My proposal was to put an `if` statement within the loop, conditional on `filepath`. `filepath` has a different value each time the loop runs. Your code conditions on `names`, a vector of date strings that was determined outside the loop. As `names` never changes from one loop run to the next, the condition never changes either. It will take in either all or none of the filepaths. – Bernhard Aug 18 '22 at 13:27
  • Ah, I understand that now. Thanks for explaining. I changed this in the main question again. It runs smoothly now but doesn't really do anything. Maybe you see something wrong with it that I don't. – Djingleberg Aug 18 '22 at 14:26
  • Again, usually a reproducible example would be best, but even without it I do not think that setting `date_start` and `date_end` to identical values is generally a good idea, unless there are filenames for that one specific day, which I could check in a reproducible example. Also, adding a path in front of the filename could shift the date from characters 10 to 17 to higher positions. – Bernhard Aug 18 '22 at 14:55
  • Okay, I have added two files from two different dates. All of the code is there. What else do you need for a reproducible example? It should still work if you want the files from one day right? I don't see how that would mess up the loop. – Djingleberg Aug 19 '22 at 11:09
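
The difference discussed in the comments, between conditioning on the whole `names` vector and conditioning on the current file, can be illustrated with a small sketch. The three file names and the helper variables `all_dates`, `vector_cond` and `kept` are assumptions made for this example.

```r
# Hypothetical file names following the pattern from the question
filenames <- c("SD07_TWK_20190821_120000.txt",
               "SD07_TWK_20190822_003004.txt",
               "SD07_TWK_20190823_225940.txt")
date_start <- "20190822"
date_end   <- "20190822"   # same day as date_start: selects that single day

# Computed once outside the loop: one logical value per file
all_dates   <- substr(filenames, 10, 17)
vector_cond <- date_start <= all_dates & all_dates <= date_end
print(length(vector_cond))   # 3, whereas if() expects a single TRUE or FALSE

# Computed inside the loop: one logical value for the current file only
kept <- character(0)
for (f in filenames) {
  d <- substr(f, 10, 17)
  if (date_start <= d && d <= date_end) {
    kept <- c(kept, f)
  }
}
print(kept)
```

Because both comparisons use `<=`, identical start and end dates still match files from that one day, provided the date is extracted from the bare file name.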