
I'm wondering if there is an easy way to restructure some data I have. I currently have a data frame that looks like this...

Year    Cat   Number
2001    A     15
2001    B     2
2002    A     4
2002    B     12

But what I ultimately want is to have it in this shape...

Year    Cat    Number    Cat    Number
2001    A      15        B      2
2002    A      4         B      12

Is there a simple way to do this?

Thanks in advance

:)

CodeLearner
  • Different format, but how about `reshape(df, idvar = "Year", timevar = "Cat", direction = "wide")` – Rich Scriven Oct 19 '14 at 16:58
  • Not sure there is any point in having a column that is all "A" and another column that is all "B". How about instead: `xtabs(Number ~., DF)` – G. Grothendieck Oct 19 '14 at 17:01
  • Well, "anything is possible" :-), but just out of pure curiosity, why do you want to do this? – shekeine Oct 19 '14 at 17:07
  • I want a column of only A and a column of only B because I then want to add other columns of A% and B%. So there only are two categories, A and B. I'm a complete newbie at R so really just trying to solve the problem with the limited knowledge I have. – CodeLearner Oct 19 '14 at 17:10

1 Answer


One way is to use dcast/melt from reshape2. In the code below, I first create a sequence of numbers (the indx column) within each Year using transform and ave. The transformed dataset is then melted, keeping Year and indx as the id variables, and the long-format result is reshaped to wide format with dcast. Because melt stacks Cat and Number into a single value column, the Number values come back as character. If you don't need the numeric suffix (_1, _2) in the column names, you can remove it with gsub.

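library(reshape2)

# Add a within-Year row index so each Cat/Number pair gets its own column
df_indexed <- transform(df, indx = ave(seq_along(Year), Year, FUN = seq_along))

# Melt to long format, then cast to wide with one column per variable/index pair
long <- melt(df_indexed, id.vars = c("Year", "indx"))
res <- dcast(long, Year ~ variable + indx, value.var = "value")

# Optional: drop the "_1", "_2" index suffix from the column names
colnames(res) <- gsub("_.*", "", colnames(res))
res
#   Year Cat Cat Number Number
# 1 2001   A   B     15      2
# 2 2002   A   B      4     12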
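To make the index step concrete, here is a minimal base R sketch of what transform and ave produce before any reshaping. The name df_indexed is only an illustrative choice, and the data frame is rebuilt inline so the snippet runs on its own.

```r
# Same data as in the question, rebuilt here so the snippet is self-contained
df <- data.frame(Year = c(2001L, 2001L, 2002L, 2002L),
                 Cat = c("A", "B", "A", "B"),
                 Number = c(15L, 2L, 4L, 12L),
                 stringsAsFactors = FALSE)

# ave() applies seq_along() within each Year group, so the counter
# restarts at 1 whenever the Year changes
df_indexed <- transform(df, indx = ave(seq_along(Year), Year, FUN = seq_along))
df_indexed
#   Year Cat Number indx
# 1 2001   A     15    1
# 2 2001   B      2    2
# 3 2002   A      4    1
# 4 2002   B     12    2
```

The indx column is what later distinguishes the first Cat/Number pair of a year from the second one when the data are cast to wide format.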

Or use dplyr/tidyr. The idea is similar to the above. After grouping by the Year column, generate an indx column with mutate, reshape to long format with gather, unite the two columns into a single column VarIndx, and then reshape back to wide format with spread. In the last step, mutate_each converts the columns whose names start with Number to numeric.

library(dplyr)
library(tidyr)

res1 <- df %>%
    group_by(Year) %>%
    mutate(indx = row_number()) %>%
    gather("Var", "Val", Cat:Number) %>%
    unite(VarIndx, Var, indx) %>%
    spread(VarIndx, Val) %>%
    mutate_each(funs(as.numeric), starts_with("Number"))

res1
# Source: local data frame [2 x 5]
#
#   Year Cat_1 Cat_2 Number_1 Number_2
# 1 2001     A     B       15        2
# 2 2002     A     B        4       12
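In more recent tidyr releases, gather and spread are superseded by pivot_longer and pivot_wider. The following is a sketch of the same reshaping under the assumption that tidyr 1.0.0 or later is installed; the name res2 is illustrative and not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt here so the snippet is self-contained
df <- data.frame(Year = c(2001L, 2001L, 2002L, 2002L),
                 Cat = c("A", "B", "A", "B"),
                 Number = c(15L, 2L, 4L, 12L),
                 stringsAsFactors = FALSE)

res2 <- df %>%
    group_by(Year) %>%
    mutate(indx = row_number()) %>%   # within-Year counter, as before
    ungroup() %>%
    # Spread both Cat and Number at once; names become Cat_1, Cat_2, ...
    pivot_wider(names_from = indx, values_from = c(Cat, Number))

res2
```

Because Cat and Number are never stacked into one column here, Number keeps its integer type and no conversion step is needed at the end.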
        

Or you can create an index variable .id using getanID from splitstackshape (suggested in the comments by @Ananda Mahto, the author of splitstackshape) and then use reshape from base R.

  library(splitstackshape)
  reshape(getanID(df, "Year"), direction="wide", idvar="Year", timevar=".id")
  #   Year Cat.1 Number.1 Cat.2 Number.2
  #1: 2001     A       15     B        2
  #2: 2002     A        4     B       12
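If installing splitstackshape is not an option, a rough base R equivalent is to build the .id column by hand with ave and pass it to reshape. This is a sketch of the same idea, not what getanID does internally; the name wide is illustrative.

```r
# Same data as in the question, rebuilt here so the snippet is self-contained
df <- data.frame(Year = c(2001L, 2001L, 2002L, 2002L),
                 Cat = c("A", "B", "A", "B"),
                 Number = c(15L, 2L, 4L, 12L),
                 stringsAsFactors = FALSE)

# Hand-made within-Year counter standing in for the .id column
df$.id <- ave(seq_along(df$Year), df$Year, FUN = seq_along)

# One row per Year; Cat and Number are repeated once per .id value
wide <- reshape(df, direction = "wide", idvar = "Year", timevar = ".id")
names(wide)
# "Year" "Cat.1" "Number.1" "Cat.2" "Number.2"
```

Unlike the dcast version, this keeps each Cat next to its Number and leaves Number as an integer column.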

data

df <-   structure(list(Year = c(2001L, 2001L, 2002L, 2002L), Cat = c("A", 
"B", "A", "B"), Number = c(15L, 2L, 4L, 12L)), .Names = c("Year", 
 "Cat", "Number"), class = "data.frame", row.names = c(NA, -4L
 ))
akrun
  • `reshape(getanID(df, "Year"), direction = "wide", idvar = "Year", timevar = ".id")` (where `getanID` is from "splitstackshape") :-) – A5C1D2H2I1M1N2O1R2T1 Oct 20 '14 at 02:20
  • @Ananda Mahto Thanks for the comment. It looks more compact now. – akrun Oct 20 '14 at 03:34
  • @akrun I have been going through the CRAN manual of splitstackshape. When I ran the sample code in the manual, I did not see any output, although the code ran without errors. The same happens with this answer: if I just run `getanID(df, "Year")`, there is no output. So I guess I need to use getanID inside another function. Is that right? I have a similar issue with `listCol_l` and `cSplit_f`. The code runs, but there is no output. Please let me know if you have any insight. – jazzurro Oct 30 '14 at 02:42
  • @jazzurro I guess the version matters. Yesterday, I had the same aha moment when I tried to replicate http://stackoverflow.com/questions/26637864/splitting-a-column-delimiter-r/26638028#26638028 Later I found that it was only available in the new version. So, please check whether you have the latest version. I installed the github version today and it runs fine in both cases. – akrun Oct 30 '14 at 02:48
  • @akrun I have splitstackshape_1.4.2 which I got from CRAN. I will go to github and get that version now. Hmm no luck. I still see the same behaviour. When `getanID` is wrapped in another function like this example, I have no problem. – jazzurro Oct 30 '14 at 02:51