7

I have a huge list (700 elements), each element being a vector of length = 16,000. I am looking for an efficient way of converting the list to a dataframe, in the following fashion (this is just a mock example):

lst <- list(a = c(1,2,3), b = c(4,5,6), c = c(7,8,9))

The end result I am looking for is:

 #  [,1] [,2] [,3]
 #a    1    2    3
 #b    4    5    6
 #c    7    8    9

This is what I have tried, but it isn't working as I wish:

library(data.table)
result = rbindlist(Map(as.data.frame, lst))

Any suggestion? Please keep in mind that my real example has huge dimensions, and I would need a rather efficient way of doing this operation.

Thank you very much!

Mayou
  • you really want them stacked like that and have 16k columns but only 700 rows? – eddi Sep 11 '13 at 18:00
  • Would it be better from some standpoint to have 700 columns and 16,000 rows instead? – Mayou Sep 11 '13 at 18:01
  • most likely yes (e.g. here you could just do `as.data.frame(lst)` or `as.data.table(lst)`), but it depends of course on what you're going to do next (ime it would be extremely unusual to want that many columns) – eddi Sep 11 '13 at 18:04
  • Well, it is convenient for me to set the dataframe that way, but I guess I could do the opposite. Thanks for the tip. – Mayou Sep 11 '13 at 18:05
  • There's a *big* difference between row-ordered and column-ordered. My hoop-jumping below assumed that I wasn't allowed to change that part of the question ... oh well. – Ben Bolker Sep 11 '13 at 19:12
  • @Mariam definitely better to have more rows than columns, otherwise you spend a lot of time accessing the list elements. See [**this question**](http://stackoverflow.com/q/16219708/1478381) of mine. – Simon O'Hanlon Sep 11 '13 at 21:35
  • You can also use `ldply(k)` from `plyr` package – Metrics Sep 12 '13 at 00:08
  • @Metrics: I haven't checked, but I would guess that `ldply` is not fast ... – Ben Bolker Sep 12 '13 at 13:57

3 Answers

18

Try this. We assume the components of L all have the same length, n, and that no row names are needed:

L <- list(a = 1:4, b = 4:1) # test input

n <- length(L[[1]])
DF <- structure(L, row.names = c(NA, -n), class = "data.frame")
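
As a hedged illustration of the snippet above, here is the same idea applied to the list from the question. The length check is an addition for safety and is not part of the original answer. Note that this gives one column per list element (the column-oriented layout discussed in the comments), not the row-oriented layout shown in the question.

```r
lst <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
n <- length(lst[[1]])

# Assumption: every component has the same length. structure() does no
# validation of its own, so fail early if that does not hold.
stopifnot(all(lengths(lst) == n))

# Attach the data frame attributes directly instead of calling as.data.frame()
DF <- structure(lst, row.names = c(NA, -n), class = "data.frame")

dim(DF)    # rows = length of each vector, columns = number of list elements
names(DF)  # column names come from the list names
```

Because nothing is validated, a list with unequal component lengths would yield a malformed data frame, which is why the check is worth keeping.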
G. Grothendieck
6

I think

lst <- list(a = c(1,2,3), b = c(4,5,6), c = c(7,8,9))
do.call(rbind,lst)

works, although it returns a matrix rather than a data frame. I don't know if there's a sneakier, more dangerous, corner-cutting way to do it that's more efficient.

You could also try

m <- matrix(unlist(lst),byrow=TRUE,ncol=length(lst[[1]]))
rownames(m) <- names(lst)
as.data.frame(m)

... maybe it's faster?
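
One way to find out is to time both approaches on simulated data. The following is only a sketch: the data are random, the row names are made up, and the vector length is reduced from the question's 16,000 so that it runs quickly. It also passes `use.names = FALSE` to `unlist()`, which is a change from the code above; without it, `unlist()` builds a name for every single value of a named list, and that can dominate the run time.

```r
set.seed(1)
n_rows <- 700    # number of list elements, as in the question
n_cols <- 1600   # the question has 16000; reduced here to keep the sketch quick

# Simulated input: a named list of equal-length numeric vectors
big <- replicate(n_rows, rnorm(n_cols), simplify = FALSE)
names(big) <- paste0("r", seq_len(n_rows))  # hypothetical names

# Approach 1: bind the vectors as rows
t1 <- system.time(m1 <- do.call(rbind, big))

# Approach 2: flatten, then fill a matrix row by row
t2 <- system.time({
  m2 <- matrix(unlist(big, use.names = FALSE), byrow = TRUE, ncol = n_cols)
  rownames(m2) <- names(big)
})

all.equal(m1, m2)  # both approaches should build the same matrix
c(rbind_elapsed = t1[["elapsed"]], matrix_elapsed = t2[["elapsed"]])
```

Timings depend on the machine and on the R version, so treat the numbers as indicative only.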

You may not be able to do very much about speeding up the `as.data.frame` step. Looking at `as.data.frame.matrix` to see what could be stripped to make it as bare-bones as possible, it seems that the crux is probably that the columns have to be copied into their own individual list elements:

for (i in ic) value[[i]] <- as.vector(x[, i])

You could try stripping down as.data.frame.matrix to see if you can speed it up, but I'm guessing that this operation is the bottleneck. In order to get around it you have to find some faster way of mapping your data from a list of rows into a list of columns (perhaps an Rcpp solution??).
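
A stripped-down conversion along those lines might look like the sketch below. It is an assumption-laden illustration, not a tested replacement for `as.data.frame.matrix`: it skips all the checks the real function performs, the column names `V1`, `V2`, ... are made up, and whether it is actually faster at scale has not been measured here.

```r
lst <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
m <- do.call(rbind, lst)   # rows are a, b, c

# Copy each matrix column into its own list element (the step suspected
# of being the bottleneck); as.vector() drops the row names from each column
cols <- lapply(seq_len(ncol(m)), function(j) as.vector(m[, j]))
names(cols) <- paste0("V", seq_along(cols))  # hypothetical column names

# Attach the data frame attributes directly, bypassing validation
df <- structure(cols, row.names = rownames(m), class = "data.frame")

dim(df)
df["b", "V2"]  # second value of the original vector b
```

Since the column-copying loop is still present, this mainly removes overhead around it rather than the copying itself.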

The other thing to consider is whether you really need a data frame -- if your data are of a homogeneous type, you could just keep the results as a matrix. Matrix operations on big data are a lot faster anyway ...
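
As a small sketch of what staying with a matrix looks like, assuming all vectors are numeric as in the question: rows can still be addressed by the list names, and row-wise summaries use the vectorised matrix functions.

```r
lst <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# One row per list element; the list names become the row names
m <- do.call(rbind, lst)

is.matrix(m)  # a matrix, not a data frame
m["b", ]      # extract one row by name
rowMeans(m)   # vectorised summary across each row
```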

Ben Bolker
  • It does work, but it is very slow in the case of large lists (which is my situation here) – Mayou Sep 11 '13 at 17:42
  • The `matrix()` takes a few minutes to complete, but I guess I could work with that in the meantime. Converting the matrix to a dataframe completely freezes the R GUI though. Thanks! – Mayou Sep 11 '13 at 17:52
3

How about just `t(as.data.frame(List))`?

> A = 1:16000
> List = list()
> for(i in 1:700) List[[i]] = A
> system.time(t(as.data.frame(List)))
   user  system elapsed 
   0.25    0.00    0.25 
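
Two details are worth checking on a small case, sketched below with made-up names (`row1`, `row2`, ...): `t()` applied to a data frame returns a matrix, and naming the list elements first gives the result predictable row names instead of auto-generated ones.

```r
# Small stand-in for the 700 x 16000 case
List <- replicate(3, 1:5, simplify = FALSE)

# Hypothetical names; without them as.data.frame() has to invent column names
names(List) <- paste0("row", seq_along(List))

m <- t(as.data.frame(List))

is.matrix(m)  # t() on a data frame returns a matrix
dim(m)        # one row per list element
rownames(m)   # taken from the list names
```

If a data frame is needed in the end, the matrix still has to go through `as.data.frame()`, which is the step reported as slow elsewhere in this thread.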
Señor O
  • I get a very odd result when I do that. Although my initial list didn't have names for its elements, the dataframe now has some odd column names when I do `as.data.frame(List)` – Mayou Sep 11 '13 at 18:07
  • That's because I didn't give my sample data any names. – Señor O Sep 11 '13 at 18:13