Wide to long format in R list: 3D to 2D array with 3rd dimension as ID

Question

I have imported a *.mat data set of ECG data and it turns out to be an array nested in a list with (1:19, 1:2000, 1:45) dimensions.

I'd like to convert this array into a data.table in long format where each of the 1:45 are 'ids'. I like the look of reshape2 and tidyr but I don't see an easy way of doing it when a 'list' is involved. Any thoughts?

ADDED: E.g. as the following picture:

EDIT: Added dput from ECGa

    dput(ECGa[1:4,1:4,1:4])
structure(c(0.266687798848186, 0.243782451327742, 0.256932437720159, 
0.298861598151174, 0.198233672667731, 0.0917952258522064, 0.0911852809187542, 
0.0896079263551856, 0.236398290801764, 0.0864552727199747, 0.0745517747485495, 
0.141094205953345, 0.134887167694073, 0.0747942533151883, 0.0955856952160322, 
0.0351423350784724, 0.0280172116375036, 0.0137183766752048, 0.00632054977574689, 
0.0140727955279187, 0.0690137281047283, 0.078048395374513, 0.103558903741209, 
0.0440585188615387, 0.156352265056089, 0.112594108595364, 0.162727838219577, 
0.171253189308951, 0.10110879614821, 0.0815894300030362, 0.11782535820017, 
0.0422632188213653, 0.0555849641766514, 0.0677027788598739, 0.0459698146330784, 
0.0388415858274208, 0.0843241755529416, 0.0607574029475139, 0.0572549162201976, 
0.0507991887467287, 0.0505785290171543, 0.064132492222132, 0.0527843866043094, 
0.0354988312446934, 0.104654374350645, 0.0881949907935882, 0.0429712078085868, 
0.0576943626267035, 0.0382280461459995, 0.124883693856915, 0.0481763535955804, 
0.0397818749456581, 0.0782161984603273, 0.155594086108477, 0.121039425233015, 
0.0563997196467123, 0.0513952066155024, 0.209997229543773, 0.0745673273804948, 
0.0647872565452434, 0.0801540099609934, 0.147046389860838, 0.162708859129276, 
0.0766361733056703), .Dim = c(4L, 4L, 4L), .Dimnames = list(NULL, 
    c("P7", "P4", "Cz", "Pz"), NULL))

I've tried ECGa <- as.data.frame(ECGa), which gives the right dimensions but renames all the columns (e.g. the first become P7.1, P7.2 ... P7.45). I want a new column called ID that is 1 for the first patient, 2 for the second, and so on up to 45 for the forty-fifth.

NEW ADDITION: I've found that abind does part of the job I want. But imagine I had 1000 arrays: can I automate it? E.g.

 abind(ECGa[,,1],ECGa[,,2],ECGa[,,3],ECGa[,,4],ECGa[,,5],along=1)
> dim(abind(ECGa[,,1],ECGa[,,2],ECGa[,,3],ECGa[,,4],ECGa[,,5],along=1))
[1] 10000    19

Please add a small [reproducible example](https://stackoverflow.com/q/5963269/1217536) for people to work with. — gung - Reinstate Monica, Jun 02 '17 at 16:36
Can't you just access the elements of the list? ```vec1 = my_list[[1]]``` , etc? — rsmith54, Jun 02 '17 at 16:40
A `list` doesn't have dimensions. Dimensions apply to `data.frame`, `matrix`, and likely what you have, an `array` (a `matrix` is a special-case `array`). Either you don't have a list or you have an array nested in a list. Bottom line, if all you need to do is rearrange the dimensions, the package [`abind`](https://cran.r-project.org/web/packages/abind/index.html) should help. If that isn't what you need, please do as @gung suggested and provide a small reproducible example using a portion of your data, perhaps with `dput(x[1:10,1:10,1:10])`. — r2evans, Jun 02 '17 at 16:57
@r2evans Thanks for this. I've managed to narrow the problem down to wanting to concatenate the values in the 3rd dimension into a 2D array, adding an identifier for each sheet/layer (is that the term?) as per the picture above. Does that make sense? — HCAI, Jun 03 '17 at 18:11
@BenBolker But I don't want to permute the dimensions, I want to collapse the whole thing into a 90000 x 19 data.frame, where 90000 is 2000 rows x 45 patients, and then add an ID for each patient. — HCAI, Jun 03 '17 at 18:52
the reason to permute would be to get the elements in the correct order so that when you collapse to a vector/redimension it to the desired matrix dimensions you get the right answer. — Ben Bolker, Jun 03 '17 at 21:01
but now that I look more carefully, I see it's not necessary. — Ben Bolker, Jun 03 '17 at 21:08
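
To illustrate the point made in the comments about a list having no dimensions: a minimal sketch of checking what an import returned and pulling the array out. The list `mat` and its element name `ECG` are made up here to stand in for an imported *.mat file; the real names will differ.

```r
# Hypothetical stand-in for an imported .mat file: a list holding one 3D array
mat <- list(ECG = array(seq_len(2 * 3 * 4), dim = c(2, 3, 4)))

is.list(mat)          # the top-level object is a list
dim(mat)              # NULL, because a list has no dimensions
names(mat)            # shows which element holds the data

ECGa <- mat[["ECG"]]  # extract the array itself
is.array(ECGa)
dim(ECGa)             # the dimensions belong to the array, not the list
```

Once the array is extracted, the reshaping can be done on `ECGa` directly and the list no longer matters.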

score 2 · Answer 1 · answered Jun 03 '17 at 21:08

Something like

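dims <- dim(dd)
## dd is samples x channels x ids (the channel names sit on dimension 2);
## move the id dimension next to the samples so each channel stays a column
dd2 <- matrix(aperm(dd, c(1,3,2)), nrow=prod(dims[c(1,3)]), ncol=dims[2])
dd3 <- data.frame(ID=rep(seq_len(dims[3]),each=dims[1]),
                  dd2)
## label the columns: ID first, then the channel names
names(dd3) <- c("ID",dimnames(dd)[[2]])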

should work, I think.
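
A reshape like this is easy to get subtly wrong, because matrix() fills column by column. One way to sanity-check the element order is to compare against stacking the slices one at a time. This is only a sketch on a made-up toy array `dd`, assumed to be laid out samples x channels x ids like the dput in the question:

```r
# Toy array: 3 samples x 2 channels x 4 ids
dd <- array(seq_len(3 * 2 * 4), dim = c(3, 2, 4),
            dimnames = list(NULL, c("P7", "P4"), NULL))
dims <- dim(dd)

# Reference result: stack dd[, , 1], dd[, , 2], ... on top of each other
stacked <- do.call(rbind, lapply(seq_len(dims[3]), function(i) dd[, , i]))

# Reshape with the id dimension moved next to the samples first
reshaped <- matrix(aperm(dd, c(1, 3, 2)), ncol = dims[2])

# Reshape without permuting, for comparison
naive <- matrix(dd, ncol = dims[2])

all(stacked == reshaped)  # TRUE: same values in the same positions
all(stacked == naive)     # FALSE: channels and ids get mixed up
```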

Ben Bolker

Thank you for taking the time to look at this. Unfortunately this code complains about a length of row.names so I'm going to go with r2evans' answer as it works "as is". I'm still finding R's syntax difficult, especially with nested arguments. FORTRAN77 is how I think. Have you picked up your knowledge on the go or do you recommend a particular set of training materials? Cheers, – HCAI Jun 04 '17 at 11:51

score 1 · Accepted Answer · answered Jun 03 '17 at 20:35

I think you can do without abind, perhaps as simple as:

Reduce(rbind, sapply(1:dim(df)[3], function(i) {
  x <- data.frame(df[,,i])
  x$id <- i
  x
}, simplify = FALSE))
#            P7         P4         Cz         Pz id
# 1  0.26668780 0.19823367 0.23639829 0.13488717  1
# 2  0.24378245 0.09179523 0.08645527 0.07479425  1
# 3  0.25693244 0.09118528 0.07455177 0.09558570  1
# 4  0.29886160 0.08960793 0.14109421 0.03514234  1
# 5  0.02801721 0.06901373 0.15635227 0.10110880  2
# 6  0.01371838 0.07804840 0.11259411 0.08158943  2
# 7  0.00632055 0.10355890 0.16272784 0.11782536  2
# 8  0.01407280 0.04405852 0.17125319 0.04226322  2
# 9  0.05558496 0.08432418 0.05057853 0.10465437  3
# 10 0.06770278 0.06075740 0.06413249 0.08819499  3
# 11 0.04596981 0.05725492 0.05278439 0.04297121  3
# 12 0.03884159 0.05079919 0.03549883 0.05769436  3
# 13 0.03822805 0.07821620 0.05139521 0.08015401  4
# 14 0.12488369 0.15559409 0.20999723 0.14704639  4
# 15 0.04817635 0.12103943 0.07456733 0.16270886  4
# 16 0.03978187 0.05639972 0.06478726 0.07663617  4
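
With many slices, calling rbind repeatedly inside Reduce can get slow, since each step copies everything accumulated so far. A common alternative is to build the list of pieces once and bind them in a single call. A sketch on a toy array (the name `arr` and its sizes are made up for illustration):

```r
# Toy array: 4 samples x 3 channels x 5 ids
arr <- array(rnorm(4 * 3 * 5), dim = c(4, 3, 5),
             dimnames = list(NULL, c("P7", "P4", "Cz"), NULL))

# One data.frame per slice, each tagged with its id
slices <- lapply(seq_len(dim(arr)[3]), function(i) {
  x <- as.data.frame(arr[, , i])
  x$id <- i
  x
})

# Bind all pieces in one call instead of pairwise
long <- do.call(rbind, slices)

dim(long)       # 20 rows (4 samples x 5 ids), 4 columns (3 channels + id)
table(long$id)  # 4 rows per id
```

If a data.table is wanted at the end, data.table::rbindlist(slices) should be usable in place of the do.call(rbind, ...) step, assuming that package is installed.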

If by chance your third dimension actually has names (faked with your data using dimnames(df)[[3]] <- paste("id", 1:dim(df)[3], sep = "")), then you can do:

head(
  Reduce(rbind, sapply(dimnames(df)[[3]], function(nm) {
    x <- data.frame(df[,,nm])
    x$id <- nm
    x
  }, simplify = FALSE))
)
#            P7         P4         Cz         Pz  id
# 1  0.26668780 0.19823367 0.23639829 0.13488717 id1
# 2  0.24378245 0.09179523 0.08645527 0.07479425 id1
# 3  0.25693244 0.09118528 0.07455177 0.09558570 id1
# 4  0.29886160 0.08960793 0.14109421 0.03514234 id1
# 5  0.02801721 0.06901373 0.15635227 0.10110880 id2
# 6  0.01371838 0.07804840 0.11259411 0.08158943 id2

Thank you very much for taking the time to help me. Your answer works 'out of the box', so I'd like to accept it. How did you know how to do this though? I mean conceptually. Did you have to look something up or did you just "know"? I'd really like to become more proficient at data manipulation. — HCAI, Jun 04 '17 at 11:44
Knowing how to do this was internal (no lookup required), though I almost always test it before answering to make sure I didn't "forget a comma" or something similar. I have worked with R's `matrix`, `array`, and `data.frame` classes enough to just "know" the basic manipulation, and then some experience with the concepts of map/reduce processing (from lisp, for me) finished it. I don't think those apply as well in Fortran, so it just takes repetition. Browsing SO helps, though my CS training helped a lot more. — r2evans, Jun 04 '17 at 13:14
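
For readers coming from a loop-based language, the map/reduce idea mentioned in the last comment can be sketched in base R on simple values. This is an illustration only, with made-up variable names: "map" applies a function to each element, and "reduce" folds the results together pairwise.

```r
# Map: apply a function to each element, giving a list of results
squares <- Map(function(i) i^2, 1:4)

# Reduce: fold the list with a binary function, here ((1 + 4) + 9) + 16
total <- Reduce(`+`, squares)
total    # 30

# accumulate = TRUE keeps the intermediate results of the fold
running <- Reduce(`+`, squares, accumulate = TRUE)
unlist(running)    # 1 5 14 30
```

Stacking array slices follows the same pattern: the map step turns each slice into a data.frame with an id column, and the reduce step uses rbind as the binary function.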
