Dense Rank by Multiple Columns in R

Question

How can I get a dense rank of multiple columns in a dataframe? For example,

# I have:
df <- data.frame(x = c(1,1,1,1,2,2,2,3,3,3), 
                 y = c(1,2,3,4,2,2,2,1,2,3))
# I want:
res <- data.frame(x = c(1,1,1,1,2,2,2,3,3,3), 
                  y = c(1,2,3,4,2,2,2,1,2,3),
                  r = c(1,2,3,4,5,5,5,6,7,8))
res
   x y z
1  1 1 1
2  1 2 2
3  1 3 3
4  1 4 4
5  2 2 5
6  2 2 5
7  2 2 5
8  3 1 6
9  3 2 7
10 3 3 8

My hack approach works for this particular dataset:

df %>%
  arrange(x,y) %>%
  mutate(r = if_else(y - lag(y,default=0) == 0, 0, 1)) %>%
  mutate(r = cumsum(r))

But there must be a more general solution, maybe using functions like dense_rank() or row_number(). But I'm struggling with this.

dplyr solutions are ideal.

Do you want to rank the rows? Or do you want a dense rank in the sense that there are no gaps? — Brandon Bertelsen, Oct 04 '18 at 22:32
You should post your solution as an answer and then check mark it a few days later. Someone may come by asking in the same way as you., — Brandon Bertelsen, Oct 04 '18 at 22:44

score 13 · Accepted Answer · answered Oct 04 '18 at 22:46

13

Right after posting, I think I found a solution here. In my case, it would be:

mutate(df, r = dense_rank(interaction(x,y,lex.order=T)))

But if you have a better solution, please share.

answered Oct 04 '18 at 22:46

lowndrul

3,715
7
36
54

arvi1000 · Answer 2 · 2018-10-04T23:25:09.263

6

data.table

data.table has you covered with frank().

library(data.table)    
frank(df, x,y, ties.method = 'min')

[1]  1  2  3  4  5  5  5  8  9 10

You can df$r <- frank(df, x,y, ties.method = 'min') to add as a new column.

tidyr/dplyr

Another option (though clunkier) is to use tidyr::unite to collapse your columns to one plus dplyr::dense_rank.

library(tidyverse)

df %>%

  # add a single column with all the info
  unite(xy, x, y) %>%
  cbind(df) %>%

  # dense rank on that
  mutate(r = dense_rank(xy)) %>%

  # now drop the helper col
  select(-xy)

edited Oct 04 '18 at 23:25

answered Oct 04 '18 at 23:14

arvi1000

9,393
2
42
52

Love the `data.table` solution. Time for me to get on board with that package. The `tidyverse` solution isn't quite general enough for me. E.g., if we used `df2 <- data.frame(x = c(1,1,1,1,20,20,20,3,3,3), y = c(1,2,3,4,20,20,20,1,2,3))` in place of `df`, we'd get the same rank `r` from above. But we we should get is `r2 <- c(1,2,3,4,5,6,7,8,8,8)`. So the `tidyverse` solution won't generally work for numbers with varying digits. But I didn't specify in my question the generality I was looking for. – lowndrul Oct 05 '18 at 19:23
1

Also, for your `tidyverse` solution you can use `unite(xy, x, y, drop=F)` and eliminate the `cbind(df)` row. Saves a line of code :) – lowndrul Oct 05 '18 at 19:25

score 0 · Answer 3 · answered Mar 04 '22 at 06:40

You can use cur_group_id:

library(dplyr)

df %>%
  group_by(x, y) %>% 
  mutate(r = cur_group_id())

# x     y     r
# <dbl> <dbl> <int>
# 1     1     1     1
# 2     1     2     2
# 3     1     3     3
# 4     1     4     4
# 5     2     2     5
# 6     2     2     5
# 7     2     2     5
# 8     3     1     6
# 9     3     2     7
# 10    3     3     8

Dense Rank by Multiple Columns in R

3 Answers3

data.table

tidyr/dplyr