How to rank rows by two columns at once in R?

Question

Here is the code to rank based on column v2:

x <- data.frame(v1 = c(2,1,1,2), v2 = c(1,1,3,2))
x$rank1 <- rank(x$v2, ties.method='first')

But I really want to rank by v2 first and then by v1, since there are ties in v2. How can I do that without using RPostgreSQL?

score 3 · Accepted Answer · answered Dec 02 '14 at 22:38

How about:

within(x, rank2 <- order(order(v2, v1)))

#   v1 v2 rank1 rank2
# 1  2  1     1     2
# 2  1  1     2     1
# 3  1  3     4     4
# 4  2  2     3     3

answered Dec 02 '14 at 22:38

Matthew Plourde

First, `ties.method` is not needed, `order` won't have ties. Second, it fails with this data: `x <- data.frame(v1 = c(2,3,1,2,1), v2 = c(1,1,3,2,1))`, so it's just wrong. – user May 12 '17 at 05:07
`order(order(x)) = rank(x)` barring ties, but in general `rank(order(x))` does not. Try a few examples if you are in doubt. I have edited your answer to use the correct version. https://stackoverflow.com/a/61647053/3371472 – eric_kernfeld Jul 24 '20 at 18:56
how to handle ties? i.e., if two rows have the same v1 and v2. – Sophia Aug 27 '21 at 14:28
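
To see the difference the comments above describe, here is a small sketch using the data from the first comment. The names `o`, `wrong`, `right` and `check` are made up for illustration. Since `order()` returns a permutation of 1..n, ranking it just hands the permutation back, while applying `order()` again inverts it and gives each row's position in the sort:

```r
# Data from the comment above, where rank(order(...)) breaks down
x <- data.frame(v1 = c(2, 3, 1, 2, 1), v2 = c(1, 1, 3, 2, 1))

o <- order(x$v2, x$v1)   # row indices in sorted order (a permutation of 1..n)
wrong <- rank(o)         # ranking a permutation returns the same values
right <- order(o)        # inverse permutation: each row's position in the sort

# Independent check: sort the rows, then look up where each original row landed
sorted_rows <- rownames(x)[o]
check <- match(rownames(x), sorted_rows)

cbind(x, wrong, right, check)
```

The `right` and `check` columns should agree with each other, and `wrong` should not.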

score 3 · Answer 2 · answered Dec 02 '14 at 23:21

order works, but for manipulating data frames, also check out the plyr and dplyr packages.

library(dplyr)
arranged_x <- arrange(x, v2, v1)

answered Dec 02 '14 at 23:21

mmuurr

Since you are using `dplyr`, you can also add a call to `mutate` to add the rank number, as in `arranged_x <- arrange(x, v2, v1) %>% mutate(rank = 1:n())`. – steveb Aug 21 '19 at 20:31
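
If dplyr is not installed, roughly the same "sort, then number the rows" idea can be sketched in base R. This is only an illustration of the approach in this answer, and note that it reorders the rows, unlike the `order(order(...))` approach, which keeps them in place:

```r
x <- data.frame(v1 = c(2, 1, 1, 2), v2 = c(1, 1, 3, 2))

# Sort the rows by v2, then v1 (base R counterpart of arrange(x, v2, v1))
arranged_x <- x[order(x$v2, x$v1), ]

# Number the sorted rows 1..n (counterpart of mutate(rank = 1:n()))
arranged_x$rank <- seq_len(nrow(arranged_x))

arranged_x
```

The original row names are kept, so you can still tell where each row came from.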

user · Answer 3 · 2017-05-12T08:39:56.190

0

Here we create the sequence 1..n and then put it back into the original row order by matching the row names of `x` against those of the sorted data frame:

x$rank <- seq.int(nrow(x))[match(rownames(x),rownames(x[order(x$v2,x$v1),]))]

Or:

x$rank <- (1:nrow(x))[order(order(x$v2,x$v1))]

Or even:

x$rank <- rank(order(order(x$v2,x$v1)))
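
A quick sanity check that the three variants agree, using the five-row data from the comments on the accepted answer (the `r_*` names are just for illustration):

```r
x <- data.frame(v1 = c(2, 3, 1, 2, 1), v2 = c(1, 1, 3, 2, 1))

r_match <- seq.int(nrow(x))[match(rownames(x), rownames(x[order(x$v2, x$v1), ]))]
r_order <- (1:nrow(x))[order(order(x$v2, x$v1))]
r_rank  <- rank(order(order(x$v2, x$v1)))

# All three should hold the same values (r_rank is numeric, the others integer)
all(r_match == r_order, r_order == r_rank)
```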

edited May 12 '17 at 08:39

answered May 12 '17 at 07:45

user

Ferroao · Answer 4 · 2019-08-24T15:34:27.433

Try this:

x <- data.frame(v1 = c(2,1,1,2), v2 = c(1,1,3,2))

# order() returns the row indices of x arranged in the desired sort order
orderlist <- order(x$v2, x$v1)

# So each row's rank is the position of its index within orderlist

x$rank <- match(seq_len(nrow(x)), orderlist)
x

# For a little bit more general case
# With one tie

x <- data.frame(v1 = c(2,1,1,2,2), v2 = c(1,1,3,2,2))

x$rankv2<-rank(x$v2)
x$rankv1<-rank(x$v1)

orderlist<- order(x$rankv2, x$rankv1)  
orderlist

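# This rank would not be appropriate: the tied rows get different ranks
x$rank <- match(seq_len(nrow(x)), orderlist)

# There are ties: find rows that repeat an earlier (rankv2, rankv1) pair.
# Pass both columns together; the second argument of duplicated() is
# `incomparables`, not another column to compare.
which(duplicated(x[, c("rankv2", "rankv1")]))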

# Example for only one tie

# Flag every row that shares its (rankv2, rankv1) pair with another row
keys <- x[, c("rankv2", "rankv1")]
tied <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

# Give the tied rows the mean of their ranks
x$rank[tied] <- mean(x$rank[tied])
x
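
For more than one group of ties, one possible generalization is sketched below. `rank_rows()` is a hypothetical helper written for this example, not a function from any package. It assumes the key columns can be pasted into a single grouping string without collisions, and it uses `ave()` to collapse the positions of rows with identical keys:

```r
# rank_rows() is a hypothetical helper, not part of base R or any package
rank_rows <- function(df, cols, ties = mean) {
  keys <- unname(as.list(df[cols]))
  pos <- order(do.call(order, keys))   # position of each row in the sort
  # Rows with identical keys form one group; `ties` collapses their positions
  group <- do.call(paste, c(keys, sep = "\t"))
  ave(pos, group, FUN = ties)
}

x <- data.frame(v1 = c(2, 1, 1, 2, 2), v2 = c(1, 1, 3, 2, 2))
x$rank_avg <- rank_rows(x, c("v2", "v1"))              # tied rows share the mean
x$rank_min <- rank_rows(x, c("v2", "v1"), ties = min)  # tied rows share the minimum
x
```

Passing `ties = min` gives the "competition" style of ranking, while the default `mean` matches the averaging done above.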
