I'm trying to subset a data frame using a list of unique values in a column and name each subset based on that unique value in R.

I've been able to subset the data frame successfully, but I'm not sure how to name each subset after the value it was subset with. I've included a test data set and the code that subsets the data below.

I just need to figure out how to name each subset as part of the process.

**UPDATE**

I found a similar question that looked to rename subsets based on column names, but it was working through a sequence of years instead of non-sequential IDs. I'm not quite sure how to adapt it.

How to write a loop to subset data and rename the subsample

List <- unique(test$ID)

for (i in 1:length(List)) {
  assign(paste0("test",i), subset(test, ID == List[[i]]))
}
Column | ID   | Value  
1      | a    | 5  
2      | a    | 6  
3      | b    | 4  
4      | b    | 1  
5      | c    | 9  
6      | c    | 5  
7      | c    | 7  
8      | d    | 1  
9      | e    | 1  
10     | d    | 5  
11     | d    | 6  
12     | f    | 7  
13     | g    | 8  
14     | g    | 9  
15     | g    | 1  
16     | g    | 12  
17     | h    | 6

Test Data

# dput() output with the readr column spec and the non-portable
# `problems` pointer removed so that it can be pasted and run
test <- structure(list(ID = c("a", "a", "b", "b", "c", "c", "c", "d", 
"e", "d", "d", "f", "g", "g", "g", "g", "h"), Value = c(5, 6, 
4, 1, 9, 5, 7, 1, 1, 5, 6, 7, 8, 9, 1, 12, 6)), row.names = c(NA, 
-17L), class = c("tbl_df", "tbl", "data.frame"))
Corey
  • Do you simply want `split(test, test$ID)` and store each individual dataframe as objects? – benson23 May 26 '23 at 05:59
  • Or `split(test, ~ID)`, `split()` will get you a named list which in most cases makes further processing a lot easier, compared to having individual dataframe objects. – margusl May 26 '23 at 06:22
  • Apart from the `split` solutions, you might also try grouping the initial dataframe by `ID` and proceed with whatever calculations you planned to apply on the subsets. Package {dplyr} is convenient for such: https://dplyr.tidyverse.org/reference/group_by.html , https://dplyr.tidyverse.org/index.html – I_O May 26 '23 at 08:31
  • The current subsetting works fine for separating and grouping. The issue is with some consistent form of naming. I end up with "test", "test1", "test2", "test3" etc. and I'd rather not have to look through each one, find out which ID it's associated with, and rename them. It's easy enough with this small set, but with hundreds of IDs that's a lot of wasted time and effort. – Corey May 30 '23 at 01:18
  • For @benson23 and margusl, split might be an option, but I'm not sure how to reintroduce the output into the overall analysis. After the main dataframe is split I need to run an additional function which requires the name of each split section, something like this: analysisx(DataframeName, variable1, variable2, variable3). – Corey Jun 05 '23 at 01:04
  • For @I_O I don't think something like group_by will work due to the following function needing dataframe names. – Corey Jun 05 '23 at 01:04

1 Answer

It seems that @benson23 was on the right track. I first used `split()`, as shown:

subset_data <- split(test, test$ID)

Then I ran the following loop to assign each subset to an object named after its ID:

for (i in seq_along(subset_data)) {
  # Create an object named after the ID (e.g. `a`, `b`) that holds that subset
  assign(names(subset_data)[i], subset_data[[i]])
}
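
As the comments suggest, the named list returned by `split()` can often be used directly, so creating one object per ID may not be necessary. The following is a minimal sketch, not a definitive solution: `summarise_subset()` is a hypothetical placeholder for the real analysis function (`analysisx` in the comments), and the small `test` data frame is a stand-in for the question's data.

```r
# Small stand-in for the question's `test` data (same columns, fewer rows)
test <- data.frame(
  ID = c("a", "a", "b", "c", "c"),
  Value = c(5, 6, 4, 9, 5)
)

# split() returns a list whose element names are the unique IDs
subset_data <- split(test, test$ID)

# Hypothetical placeholder for the real analysis function (an assumption)
summarise_subset <- function(df, name) {
  data.frame(ID = name, n = nrow(df), mean_value = mean(df$Value))
}

# Map() passes each subset together with its name, so the ID is always known
results <- Map(summarise_subset, subset_data, names(subset_data))

# If separate objects are still needed, list2env() creates one per list name
list2env(subset_data, envir = environment())
```

Note that IDs such as `c` or `t` share their names with base R functions, so a prefix (for example `paste0("id_", names(subset_data))`) may be a safer choice for object names.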
Corey