I have multiple lists of data frames, stored at different levels of the hierarchy inside another list. I want to "flatten" that list so that only the lowest level of the hierarchy remains. I can't use unlist() or purrr::flatten(), because these also unravel the data frames themselves.

Is there a simple, generic way to remove the hierarchical structure and create a list where only two levels remain (a list of lists of data frames)?
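To illustrate the problem with unlist(), here is a minimal sketch on hypothetical toy data (the names df_toy, nested, one_level and two_levels are made up for this illustration and are not part of my real data). Because a data frame is itself a list of columns, removing one level too many splits it apart:

```r
# Hypothetical toy data, only to illustrate the problem
df_toy <- data.frame(x = 1:2, y = 3:4)
nested <- list(a = list(df_toy))

# A data frame is itself a list (of columns)
is.list(df_toy)

# Removing one level of nesting keeps the data frame intact
one_level <- unlist(nested, recursive = FALSE)
is.data.frame(one_level$a)

# Removing a second level splits the data frame into its columns
two_levels <- unlist(one_level, recursive = FALSE)
names(two_levels)
```

In this toy case the second call returns a plain list of the two columns instead of the data frame.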


Code example:
Generate data structure:
library(dplyr)

n <- 12
df <- lapply(1:3, function(x) {
    x <- lapply(sample.int(4,n, replace = TRUE), function(y) {
        ceiling(y*runif(100))}
    ) %>% as.data.frame()
    names(x) <- letters[1:n]
    return(x)
})

my_list <- lst()
for (n in 1:3) {
    my_list$a[[n]] <- df[[n]][,1:3]
}
for (n in 1:3) {
    my_list$b$c[[n]] <- df[[n]][,4:6]
}
for (n in 1:3) {
    my_list$a$b$d$e[[n]] <- df[[n]][,7:9]
}

my_list %>% str()
Working code for what I want:
lst(
    a = my_list$a[1:3],
    b = my_list$a$b$d$e,
    c = my_list$b$c
    
) %>% str()

Outputs:
Multilevel hierarchical structure:
List of 2
 $ a:List of 4
  ..$  :'data.frame':   100 obs. of  3 variables:
  .. ..$ a: num [1:100] 2 1 1 1 1 1 2 2 2 1 ...
  .. ..$ b: num [1:100] 1 1 1 2 2 1 2 2 2 2 ...
  .. ..$ c: num [1:100] 2 1 1 2 1 1 1 2 1 2 ...
  ..$  :'data.frame':   100 obs. of  3 variables:
  .. ..$ a: num [1:100] 2 2 1 1 2 1 3 3 1 3 ...
  .. ..$ b: num [1:100] 1 1 3 2 3 1 3 3 3 3 ...
  .. ..$ c: num [1:100] 1 2 2 1 3 2 4 3 3 1 ...
  ..$  :'data.frame':   100 obs. of  3 variables:
  .. ..$ a: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ b: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ c: num [1:100] 2 2 1 1 1 1 1 1 1 2 ...
  ..$ b:List of 1
  .. ..$ d:List of 1
  .. .. ..$ e:List of 3
  .. .. .. ..$ :'data.frame':   100 obs. of  3 variables:
  .. .. .. .. ..$ g: num [1:100] 3 3 1 3 1 1 1 3 1 2 ...
  .. .. .. .. ..$ h: num [1:100] 1 1 2 1 1 1 1 2 1 1 ...
  .. .. .. .. ..$ i: num [1:100] 1 1 2 2 2 1 1 2 2 1 ...
  .. .. .. ..$ :'data.frame':   100 obs. of  3 variables:
  .. .. .. .. ..$ g: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. .. ..$ h: num [1:100] 2 4 4 4 3 3 3 2 4 4 ...
  .. .. .. .. ..$ i: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ :'data.frame':   100 obs. of  3 variables:
  .. .. .. .. ..$ g: num [1:100] 2 1 3 2 3 1 1 2 1 2 ...
  .. .. .. .. ..$ h: num [1:100] 1 2 1 2 1 1 1 1 1 2 ...
  .. .. .. .. ..$ i: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
 $ b:List of 1
  ..$ c:List of 3
  .. ..$ :'data.frame': 100 obs. of  3 variables:
  .. .. ..$ d: num [1:100] 2 2 2 1 1 1 2 1 1 1 ...
  .. .. ..$ e: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..$ f: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ :'data.frame': 100 obs. of  3 variables:
  .. .. ..$ d: num [1:100] 1 2 2 2 1 2 2 2 1 1 ...
  .. .. ..$ e: num [1:100] 1 2 2 1 2 1 1 1 2 2 ...
  .. .. ..$ f: num [1:100] 2 2 1 1 1 2 2 1 1 1 ...
  .. ..$ :'data.frame': 100 obs. of  3 variables:
  .. .. ..$ d: num [1:100] 2 3 3 1 3 4 4 4 1 3 ...
  .. .. ..$ e: num [1:100] 1 2 2 1 1 1 3 2 3 3 ...
  .. .. ..$ f: num [1:100] 3 3 3 3 1 2 2 2 3 1 ...
The desired output, a two-level list structure:
List of 3
 $ a:List of 3
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ a: num [1:100] 2 1 1 1 1 1 2 2 2 1 ...
  .. ..$ b: num [1:100] 1 1 1 2 2 1 2 2 2 2 ...
  .. ..$ c: num [1:100] 2 1 1 2 1 1 1 2 1 2 ...
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ a: num [1:100] 2 2 1 1 2 1 3 3 1 3 ...
  .. ..$ b: num [1:100] 1 1 3 2 3 1 3 3 3 3 ...
  .. ..$ c: num [1:100] 1 2 2 1 3 2 4 3 3 1 ...
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ a: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ b: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ c: num [1:100] 2 2 1 1 1 1 1 1 1 2 ...
 $ b:List of 3
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ g: num [1:100] 3 3 1 3 1 1 1 3 1 2 ...
  .. ..$ h: num [1:100] 1 1 2 1 1 1 1 2 1 1 ...
  .. ..$ i: num [1:100] 1 1 2 2 2 1 1 2 2 1 ...
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ g: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ h: num [1:100] 2 4 4 4 3 3 3 2 4 4 ...
  .. ..$ i: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ g: num [1:100] 2 1 3 2 3 1 1 2 1 2 ...
  .. ..$ h: num [1:100] 1 2 1 2 1 1 1 1 1 2 ...
  .. ..$ i: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
 $ c:List of 3
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ d: num [1:100] 2 2 2 1 1 1 2 1 1 1 ...
  .. ..$ e: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ f: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ d: num [1:100] 1 2 2 2 1 2 2 2 1 1 ...
  .. ..$ e: num [1:100] 1 2 2 1 2 1 1 1 2 2 ...
  .. ..$ f: num [1:100] 2 2 1 1 1 2 2 1 1 1 ...
  ..$ :'data.frame':    100 obs. of  3 variables:
  .. ..$ d: num [1:100] 2 3 3 1 3 4 4 4 1 3 ...
  .. ..$ e: num [1:100] 1 2 2 1 1 1 3 2 3 3 ...
  .. ..$ f: num [1:100] 3 3 3 3 1 2 2 2 3 1 ...
Pål Bjartan
  • Will the list only consist of data frames at different levels, or will it contain other information that we'll need to avoid when extracting? – harre Jun 16 '22 at 10:32
  • No, my particular problem has only data frames. However, I am looking for a generic way to create lists from the last lists in each branch. Data frames are also useful for this purpose, because they demonstrate an issue with `unlist()` and `flatten()`: since data frames are themselves lists, these functions treat them as such. – Pål Bjartan Jun 16 '22 at 11:08
  • Understood. BUT: because a1, a2, a3 and b are on the same level list-wise, they can never be flattened into the solution you want by list logic alone. The most elegant solution applying list logic is, in my opinion, this one: https://stackoverflow.com/questions/19734412/flatten-nested-list-into-1-deep-list, using `if (class(l) == 'list') lapply(l, renquote)` from the comments to distinguish between lists and data frames. If you still want what you want, the only possible solution is to flatten it completely and split by names (as @rawr suggests). – harre Jun 16 '22 at 12:19

2 Answers

1

One option would be to flatten the list into a single list of data frames and then split it by name into a list of lists of data frames:

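flatten <- function(x) {
  # repeat until no element is a plain list any more
  # (data frames have class 'data.frame', so inherits(df, 'list') is FALSE)
  while (any(vapply(x, inherits, logical(1L), 'list'))) {
    # wrap non-list elements so that unlist() below does not unravel them
    x <- lapply(x, function(xx) if (inherits(xx, 'list')) xx else list(xx))
    # remove exactly one level of nesting
    x <- unlist(x, recursive = FALSE)
  }
  x
}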

fl <- flatten(my_list)
str(split(fl, gsub('\\d+$', '', names(fl))))
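To see why stripping trailing digits recovers the groups, it may help to look at the names that unlist(recursive = FALSE) builds along the way. The following is only a sketch on hypothetical toy data; toy, flatten_keep_df, fl_toy and groups are names introduced for this illustration, and flatten_keep_df repeats the function above so that the snippet runs on its own.

```r
# Hypothetical toy list, much smaller than my_list from the question
toy <- list(
  a = list(data.frame(x = 1), data.frame(x = 2)),
  b = list(c = list(data.frame(y = 1), data.frame(y = 2)))
)

# Same logic as flatten() above, repeated so the snippet is self-contained
flatten_keep_df <- function(x) {
  while (any(vapply(x, inherits, logical(1L), "list"))) {
    x <- lapply(x, function(xx) if (inherits(xx, "list")) xx else list(xx))
    x <- unlist(x, recursive = FALSE)
  }
  x
}

fl_toy <- flatten_keep_df(toy)

# unlist() joins the names along each path with "." and numbers unnamed
# siblings, so each data frame ends up with a name like "<path><index>"
names(fl_toy)

# Removing the trailing index leaves the path, which serves as the group key
groups <- split(fl_toy, gsub("\\d+$", "", names(fl_toy)))
names(groups)
```

One caveat under this approach: if a list name in the original structure itself ends in digits, gsub('\\d+$', '', ...) would strip those digits as well, so differently named branches could end up merged.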
rawr
0

I don't know how to achieve this using standard flattening functions but designing an algorithm that can do it is pretty straightforward. You just go through the structure of nested lists and keep only those that have no other list as a child.

find_last_lists <- function(lst, parent.names=NULL) {
  
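  # base case: 'lst' has no items that are lists; since is.list() is TRUE for
  # data frames, this is reached at the data frame itself, so the name of the
  # list containing it is the second-to-last entry of 'parent.names'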
  if (!any(sapply(lst, is.list))) {
    
    setNames(list(lst), 
             parent.names[[length(parent.names)-1]])
    
  # otherwise go through all items recursively 
  } else {
    
    df.list <- NULL
    for (i in seq_along(lst)) {
      
      df.list <- c(df.list, 
                   find_last_lists(lst[[i]], 
                                   c(parent.names, list(names(lst)[i]))))
    }
    
    df.list
  }
}

It is basically a depth-first traversal of a tree for which I used a recursive function (a non-recursive solution would be possible as well). parent.names stores the sequence of names of parent list items.

fl <- find_last_lists(my_list)
# List of 9
# $ a:'data.frame': 100 obs. of  3 variables:
#   ..$ a: num [1:100] 3 2 2 2 1 3 1 3 1 2 ...
#   ..$ b: num [1:100] 3 3 1 2 2 1 3 3 2 2 ...
#   ..$ c: num [1:100] 2 1 1 2 4 1 2 3 3 3 ...
# $ a:'data.frame': 100 obs. of  3 variables:
#   ..$ a: num [1:100] 1 1 1 1 1 1 1 2 2 2 ...
#   ..$ b: num [1:100] 2 4 4 2 1 1 2 3 3 4 ...
#   ..$ c: num [1:100] 1 1 3 3 2 1 2 3 1 3 ...
# $ a:'data.frame': 100 obs. of  3 variables:
#   ..$ a: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
#   ..$ b: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
#   ..$ c: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
# $ e:'data.frame': 100 obs. of  3 variables:
#   ..$ g: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
#   ..$ h: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
#   ..$ i: num [1:100] 1 2 2 1 1 1 1 1 2 2 ...
# $ e:'data.frame': 100 obs. of  3 variables:
#   ..$ g: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
#   ..$ h: num [1:100] 1 2 2 1 1 2 2 2 1 1 ...
#   ..$ i: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
# $ e:'data.frame': 100 obs. of  3 variables:
#   ..$ g: num [1:100] 1 1 2 1 2 2 3 3 3 2 ...
#   ..$ h: num [1:100] 2 1 1 1 2 2 1 2 2 2 ...
#   ..$ i: num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
# $ c:'data.frame': 100 obs. of  3 variables:
#   ..$ d: num [1:100] 2 1 3 3 3 4 4 4 3 3 ...
#   ..$ e: num [1:100] 2 3 3 3 3 3 3 2 3 3 ...
#   ..$ f: num [1:100] 4 1 1 1 2 1 2 4 4 3 ...
# $ c:'data.frame': 100 obs. of  3 variables:
#   ..$ d: num [1:100] 4 1 1 3 4 4 4 4 4 2 ...
#   ..$ e: num [1:100] 4 3 4 2 4 4 2 4 2 4 ...
#   ..$ f: num [1:100] 3 1 2 2 2 1 3 3 2 3 ...
# $ c:'data.frame': 100 obs. of  3 variables:
#   ..$ d: num [1:100] 1 1 4 3 3 1 1 2 2 1 ...
#   ..$ e: num [1:100] 2 1 1 3 3 1 1 1 1 3 ...
#   ..$ f: num [1:100] 1 3 2 2 4 4 1 3 3 2 ...

The result is a list of data frames which can be further grouped and reordered into your desired format as follows:

fl <- tapply(fl, names(fl), unname)
fl <- fl[order(names(fl))]
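
The non-recursive variant mentioned above could be sketched with an explicit stack. This is only an illustration, not a tested drop-in replacement: find_last_lists_iter and the toy data are hypothetical names, and, unlike the recursive function, the sketch assumes that leaves are recognised explicitly with is.data.frame().

```r
# Hypothetical helper: iterative depth-first traversal that collects every
# data frame, named after the list that directly contains it
find_last_lists_iter <- function(lst) {
  # each stack entry holds a node, its own name and the name of its parent
  stack <- list(list(node = lst, name = NA_character_, parent = NA_character_))
  out <- list()
  while (length(stack) > 0) {
    top <- stack[[length(stack)]]      # pop the last entry
    stack[[length(stack)]] <- NULL
    if (is.data.frame(top$node)) {
      # leaf: keep the data frame under the name of its containing list
      out <- c(out, setNames(list(top$node), top$parent))
    } else if (is.list(top$node)) {
      nms <- names(top$node)
      if (is.null(nms)) nms <- rep("", length(top$node))
      # push children in reverse so they are popped in their original order
      for (i in rev(seq_along(top$node))) {
        stack[[length(stack) + 1]] <- list(node = top$node[[i]],
                                           name = nms[i],
                                           parent = top$name)
      }
    }
  }
  out
}

# Hypothetical toy data mirroring the shape of my_list
toy <- list(
  a = list(data.frame(x = 1), data.frame(x = 2),
           b = list(d = list(data.frame(z = 1)))),
  b = list(c = list(data.frame(y = 1)))
)

res <- find_last_lists_iter(toy)
names(res)

# group the data frames by the name of their containing list
grouped <- lapply(split(res, names(res)), unname)
names(grouped)
```

A data frame sitting directly at the top level has no containing list name in this sketch, so it would need separate handling.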
Robert Hacken