Subset data frame in R and name each subset in the loop

Question

I'm trying to subset a data frame using a list of unique values in a column and name each subset based on that unique value in R.

I've been able to successfully subset the data frame, but I'm not sure how to name each subset based on the values they were subset with. I've linked a test data set and my code that subsets the data.

I just need to figure out how to name each subset as part of the process.

UPDATE

I found a similar question that looked to rename subsets based on column names, but it was working through a sequence of years instead of non-sequential IDs. I'm not quite sure how to adapt it.

How to write a loop to subset data and rename the subsample

List <- unique(test$ID)

for (i in 1:length(List)) {
  assign(paste0("test",i), subset(test, ID == List[[i]]))
}

Column | ID   | Value  
1      | a    | 5  
2      | a    | 6  
3      | b    | 4  
4      | b    | 1  
5      | c    | 9  
6      | c    | 5  
7      | c    | 7  
8      | d    | 1  
9      | e    | 1  
10     | d    | 5  
11     | d    | 6  
12     | f    | 7  
13     | g    | 8  
14     | g    | 9  
15     | g    | 1  
16     | g    | 12  
17     | h    | 6

Test Data

structure(list(ID = c("a", "a", "b", "b", "c", "c", "c", "d", 
"e", "d", "d", "f", "g", "g", "g", "g", "h"), Value = c(5, 6, 
4, 1, 9, 5, 7, 1, 1, 5, 6, 7, 8, 9, 1, 12, 6)), row.names = c(NA, 
-17L), spec = structure(list(cols = list(ID = structure(list(), class = c("collector_character", 
"collector")), Value = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), delim = ","), class = "col_spec"), problems = <pointer: 0x000001b7992ef930>, class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

Do you simply want `split(test, test$ID)` and store each individual dataframe as objects? — benson23, May 26 '23 at 05:59
Or `split(test, ~ID)`, `split()` will get you a named list which in most cases makes further processing a lot easier, compared to having individual dataframe objects. — margusl, May 26 '23 at 06:22
Apart from the `split` solutions, you might also try grouping the initial dataframe by `ID` and proceed with whatever calculations you planned to apply on the subsets. Package {dplyr} is convenient for such: https://dplyr.tidyverse.org/reference/group_by.html , https://dplyr.tidyverse.org/index.html — I_O, May 26 '23 at 08:31
The current subsetting works fine for separating and grouping. The issue is with some consistent form of naming. I end up with "test", "test1", "test2", "test3" etc. and I'd rather not have to look through each one, find out which ID it's associated with, and rename them. It's easy enough with this small set, but with hundreds of IDs that's a lot of wasted time and effort. — Corey, May 30 '23 at 01:18
For @benson23 and margusl, split might be an option, but I'm not sure how to reintroduce the output into the overall analysis. After the main dataframe is split I need to run an additional function which requires the name of each split section, something like this: analysisx(DataframeName, variable1, variable2, variable3). — Corey, Jun 05 '23 at 01:04
For @I_O I don't think something like group_by will work due to the following function needing dataframe names. — Corey, Jun 05 '23 at 01:04

score 0 · Accepted Answer · answered Jun 07 '23 at 05:22

It seems that @benson23 was on the right track. I first used `split()`, which returns a list with one data frame per ID, named after that ID:

subset_data <- split(test, test$ID)

Then I ran the following loop to assign each subset to its own object, named after its ID:

for (i in seq_along(subset_data)) {
  # names(subset_data)[i] is the ID, so each object is called a, b, c, ...
  assign(names(subset_data)[i], subset_data[[i]])
}
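
As the comments suggest, it is often easier to keep the subsets in the named list and apply the analysis to each element, rather than creating one object per ID. The following is only a sketch under assumptions: `summarise_subset()` is a hypothetical stand-in for the real analysis function, and the test data is rebuilt as a plain data frame rather than the tibble from the question. The last part uses `list2env()`, which does the same job as the `assign()` loop; it writes into a separate environment here to keep the workspace clean.

```r
# Rebuild the question's test data as a plain data frame
test <- data.frame(
  ID = c("a", "a", "b", "b", "c", "c", "c", "d", "e", "d", "d",
         "f", "g", "g", "g", "g", "h"),
  Value = c(5, 6, 4, 1, 9, 5, 7, 1, 1, 5, 6, 7, 8, 9, 1, 12, 6)
)

# split() returns a list whose element names are the unique IDs
subset_data <- split(test, test$ID)
names(subset_data)

# Hypothetical stand-in for the real analysis function (an assumption)
summarise_subset <- function(df) {
  data.frame(n = nrow(df), total = sum(df$Value))
}

# Apply the function to every subset; the results keep the ID names
results <- lapply(subset_data, summarise_subset)
results[["g"]]

# Optional: create one object per ID without an explicit loop.
# A fresh environment is used here; pass .GlobalEnv to mimic the assign() loop.
subset_env <- new.env()
list2env(subset_data, envir = subset_env)
ls(subset_env)
```

Individual subsets remain available by name with `subset_data[["g"]]` or `subset_data$g`. One thing to watch when creating objects from IDs: with this data, one object would be called `c`, the same name as the base function `c()`. Function calls to `c()` still work, but the overlap can be confusing, which is another reason to prefer the list.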
