How to Create a Column of Ranks While Grouping in R

Question

I am using R and want to create a column holding a sequence number (rank) within each group, grouping by two factors (hhid and perid).

For example, I have this data set:

hhid perid
1000 1
1000 1
1000 1
1000 2
1000 2
2000 1
2000 1
2000 1
2000 1
2000 2
2000 2

I want to add a column called "actno" like this:

hhid perid actno
1000 1     1
1000 1     2
1000 1     3
1000 2     1
1000 2     2
2000 1     1
2000 1     2
2000 1     3
2000 1     4
2000 2     1
2000 2     2

score 4 · Answer 1 · answered Sep 12 '12 at 00:24

If you have many groups or large data, data.table is the way to go for time and memory efficiency.

# assuming your data is in a data.frame called DF
library(data.table)
DT <- data.table(DF)

# add the within-group counter by reference, named actno as in the question
DT[, actno := seq_len(.N), by = list(hhid, perid)]

Note that .N gives the number of rows in each group defined by the by argument (see ?data.table for more details), so seq_len(.N) numbers the rows 1 to .N within each group.
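To make the role of .N concrete, here is a minimal sketch on the sample data from the question. The data is retyped by hand and the name group_sizes is only illustrative; it is not part of the original answer.

```r
library(data.table)

# Sample data copied from the question
DT <- data.table(
  hhid  = c(1000, 1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000, 2000, 2000),
  perid = c(1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2)
)

# .N is the number of rows in each (hhid, perid) group
group_sizes <- DT[, .N, by = .(hhid, perid)]
print(group_sizes)

# seq_len(.N) therefore counts 1, 2, ..., .N within each group;
# := adds the column to DT by reference, without copying the table
DT[, actno := seq_len(.N), by = .(hhid, perid)]
print(DT)
```

Here .(hhid, perid) is data.table's shorthand for list(hhid, perid), so the grouping is the same as in the answer.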

answered Sep 12 '12 at 00:24

mnel

Is there a quick way to deal with ties in data.table? – gannawag Oct 05 '16 at 16:19
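On the question of ties: one option that may fit is data.table's frank(), which takes a ties.method argument and can be applied per group. The sketch below is only an illustration. The score column and its values are invented, since the question's data has no value to rank on.

```r
library(data.table)

# Hypothetical data: 'score' is an invented column that contains a tie
DT <- data.table(
  hhid  = c(1000, 1000, 1000, 2000, 2000),
  perid = c(1, 1, 1, 1, 1),
  score = c(10, 10, 20, 5, 7)
)

# "min": tied values share the lowest rank and the next rank is skipped
DT[, rank_min := frank(score, ties.method = "min"), by = .(hhid, perid)]

# "dense": tied values share a rank and no rank is skipped afterwards
DT[, rank_dense := frank(score, ties.method = "dense"), by = .(hhid, perid)]

print(DT)
```

Which ties.method is appropriate depends on what the rank is used for; ties.method = "first" would reproduce a plain running counter in row order.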

score 3 · Accepted Answer · answered Sep 11 '12 at 21:43

No need for plyr. Just use ave and seq:

> dat$actno <- with( dat, ave(hhid, hhid, perid, FUN=seq))
> dat
   hhid perid actno
1  1000     1     1
2  1000     1     2
3  1000     1     3
4  1000     2     1
5  1000     2     2
6  2000     1     1
7  2000     1     2
8  2000     1     3
9  2000     1     4
10 2000     2     1
11 2000     2     2

The first argument to ave could be either column here, because only the group sizes matter, not the values. Alternatively, you could use the slightly less elegant but perhaps clearer:

dat$actno <- with(dat, ave(hhid, hhid, perid, FUN = function(x) seq(length(x))))
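One caveat worth knowing: seq() called on a single number n is interpreted as 1:n, so FUN=seq behaves differently on a group with only one row. Every group in the question's data has at least two rows, so it does not arise there. A variant that sidesteps the issue, sketched here on the same data, is to pass seq_along instead:

```r
# Sample data copied from the question
dat <- data.frame(
  hhid  = c(1000, 1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000, 2000, 2000),
  perid = c(1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2)
)

# seq() on a single number n means 1:n, not 1
length(seq(2000))        # 2000
length(seq_along(2000))  # 1

# seq_along() always returns exactly one index per element of its argument
dat$actno <- with(dat, ave(hhid, hhid, perid, FUN = seq_along))
print(dat)
```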

score 2 · Answer 3 · answered Sep 11 '12 at 21:14

If your data is called urdat, then without plyr you can do:

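# sort so that the rows of each (hhid, perid) group are contiguous
df <- urdat[order(urdat$hhid, urdat$perid), ]
# take runs of the combined key, so equal perid values in adjacent households are not merged
df$actno <- sequence(rle(paste(df$hhid, df$perid))$lengths)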
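For readers unfamiliar with rle() and sequence(), here is a small sketch of what each step does on the question's data, which is already sorted. The names runs, merged, key and split_runs are illustrative only.

```r
# Sample data copied from the question, already sorted by hhid then perid
urdat <- data.frame(
  hhid  = c(1000, 1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000, 2000, 2000),
  perid = c(1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2)
)

# rle() finds runs of consecutive equal values and reports their lengths
runs <- rle(urdat$perid)$lengths   # 3 2 4 2

# sequence() expands each length n into 1, 2, ..., n and concatenates them
actno <- sequence(runs)            # 1 2 3 1 2 1 2 3 4 1 2

# Caveat: rle() sees only the vector it is given. If household 1000 had no
# period 2 rows, its period 1 rows would merge with household 2000's run:
merged <- rle(c(1, 1, 1, 1, 1, 1, 1, 2, 2))$lengths   # 7 2

# Building the runs on a combined hhid/perid key keeps the groups separate
key <- paste(c(1000, 1000, 1000, 2000, 2000, 2000, 2000, 2000, 2000),
             c(1, 1, 1, 1, 1, 1, 1, 2, 2))
split_runs <- rle(key)$lengths     # 3 4 2
```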

answered Sep 11 '12 at 21:14

user1317221_G

score 1 · Answer 4 · answered Sep 11 '12 at 20:34

The plyr package can do this nicely:

library(plyr)
dat <- structure(list(hhid = c(1000L, 1000L, 1000L, 1000L, 1000L, 2000L, 
2000L, 2000L, 2000L, 2000L, 2000L), perid = c(1L, 1L, 1L, 2L, 
2L, 1L, 1L, 1L, 1L, 2L, 2L)), .Names = c("hhid", "perid"), class = "data.frame", row.names = c(NA, 
-11L))

ddply(dat, .(hhid, perid), transform, actno=seq_along(perid))

   hhid perid actno
1  1000     1     1
2  1000     1     2
3  1000     1     3
4  1000     2     1
5  1000     2     2
6  2000     1     1
7  2000     1     2
8  2000     1     3
9  2000     1     4
10 2000     2     1
11 2000     2     2

answered Sep 11 '12 at 20:34

Justin

Thank you very much, Justin... It works with my data set, but because of a huge number of groups, it took long time and my computer significantly slowed down after running your code. Do you have any suggestions? – POTENZA Sep 11 '12 at 21:43
@user1663986 `plyr` is a nice way to explore data so long as it is small. Either of the other answers, particularly DWin's will be very fast and work well on large data. – Justin Sep 11 '12 at 21:48
@user1663986 And how did you get on with mnel's answer? – Matt Dowle Oct 07 '12 at 08:32

score -4 · Answer 5 · answered Oct 06 '12 at 21:04

Pseudocode:

For each unique value of `hhid` `h`
    For each unique value of `perid` `p`
        counter = 0;
        For each row of table where `hhid==h && perid==p`
            counter++;
            Assign counter to `actno` of this row

Should be trivial to implement, especially with a data frame.
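For completeness, a literal R translation of the pseudocode might look like the sketch below, using the question's sample data. It is meant only to show the logic; an explicit row-by-row loop like this is likely to be much slower than the vectorised answers on large data.

```r
# Sample data copied from the question
dat <- data.frame(
  hhid  = c(1000, 1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000, 2000, 2000),
  perid = c(1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2)
)

dat$actno <- NA_integer_
for (h in unique(dat$hhid)) {
  for (p in unique(dat$perid)) {
    counter <- 0L
    # rows belonging to this (hhid, perid) combination, in their original order
    for (i in which(dat$hhid == h & dat$perid == p)) {
      counter <- counter + 1L
      dat$actno[i] <- counter
    }
  }
}
print(dat)
```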
