3

I'm comparing eight algorithms (the solver column) on a set of instances. Each instance is run once for each algorithm and each level of a parameter D (which goes from 1 to 10). The resulting data frame looks like this:

         instance  D    z             solver
0   1000_ep0.0075  1  994         threatened
1   1000_ep0.0075  1  993               desc
2   1000_ep0.0075  1  994             degree
3   1000_ep0.0075  1  993    threatened_desc
4   1000_ep0.0075  1  993  threatened_degree
5   1000_ep0.0075  1  994         desc_later
6   1000_ep0.0075  1  994       degree_later
7   1000_ep0.0075  1  993         dyn_degree
8   1000_ep0.0075  2  986         threatened
9   1000_ep0.0075  2  987               desc
10  1000_ep0.0075  2  988             degree
11  1000_ep0.0075  2  987    threatened_desc
12  1000_ep0.0075  2  986  threatened_degree
13  1000_ep0.0075  2  987         desc_later
14  1000_ep0.0075  2  988       degree_later
15  1000_ep0.0075  2  987         dyn_degree
....

The z column holds the value found by the algorithm (the smaller, the better).

I would like to add a column to the dataframe holding the rank of each algorithm according to the value of z within each combination <instance, D>. For the example above, it would be something like this:

         instance  D    z             solver  z_rank
0   1000_ep0.0075  1  994         threatened       2
1   1000_ep0.0075  1  993               desc       1
2   1000_ep0.0075  1  994             degree       2
3   1000_ep0.0075  1  993    threatened_desc       1
4   1000_ep0.0075  1  993  threatened_degree       1
5   1000_ep0.0075  1  994         desc_later       2
6   1000_ep0.0075  1  994       degree_later       2
7   1000_ep0.0075  1  993         dyn_degree       1
8   1000_ep0.0075  2  986         threatened       1
9   1000_ep0.0075  2  987               desc       2
10  1000_ep0.0075  2  988             degree       3
11  1000_ep0.0075  2  987    threatened_desc       2
12  1000_ep0.0075  2  986  threatened_degree       1
13  1000_ep0.0075  2  987         desc_later       2
14  1000_ep0.0075  2  988       degree_later       3
15  1000_ep0.0075  2  987         dyn_degree       2
...

Using python-pandas, this is what I could get so far:

df.loc[:, 'z_rank'] = df_rg.groupby(['instance', 'D'])['z'].rank()
df.head(16)
         instance  D    z             solver  z_rank
0   1000_ep0.0075  1  994         threatened    47.5
1   1000_ep0.0075  1  993               desc    16.5
2   1000_ep0.0075  1  994             degree    47.5
3   1000_ep0.0075  1  993    threatened_desc    16.5
4   1000_ep0.0075  1  993  threatened_degree    16.5
5   1000_ep0.0075  1  994         desc_later    47.5
6   1000_ep0.0075  1  994       degree_later    47.5
7   1000_ep0.0075  1  993         dyn_degree    16.5
8   1000_ep0.0075  2  986         threatened     7.0
9   1000_ep0.0075  2  987               desc    18.5
10  1000_ep0.0075  2  988             degree    44.5
11  1000_ep0.0075  2  987    threatened_desc    18.5
12  1000_ep0.0075  2  986  threatened_degree     7.0
13  1000_ep0.0075  2  987         desc_later    18.5
14  1000_ep0.0075  2  988       degree_later    44.5
15  1000_ep0.0075  2  987         dyn_degree    18.5

Which is clearly not what I want.

Could somebody help me with that?

Natanael Ramos

2 Answers

9

You need method='dense' in SeriesGroupBy.rank(). With it, equal z values share a rank, and the rank increases by 1 from one group of tied values to the next:

df['z_rank'] = df.groupby(['instance', 'D'])['z'].rank(method='dense').astype(int)
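For reference, here is a minimal sketch that rebuilds the 16 rows shown in the question and applies the line above. The `z_rank_avg` column is only an illustrative addition (not in the original post) to show what the default `method='average'` does with ties.

```python
import pandas as pd

solvers = ["threatened", "desc", "degree", "threatened_desc",
           "threatened_degree", "desc_later", "degree_later", "dyn_degree"]

# The 16 sample rows from the question: one instance, D = 1 and D = 2.
df = pd.DataFrame({
    "instance": ["1000_ep0.0075"] * 16,
    "D": [1] * 8 + [2] * 8,
    "z": [994, 993, 994, 993, 993, 994, 994, 993,
          986, 987, 988, 987, 986, 987, 988, 987],
    "solver": solvers * 2,
})

grouped = df.groupby(["instance", "D"])["z"]

# Dense ranking: ties share a rank, and the next distinct value gets rank + 1.
dense = grouped.rank(method="dense").astype(int)

# Default ranking (method="average"): ties get the mean of the positions
# they occupy, which is why fractional ranks such as 2.5 appear.
average = grouped.rank()

df["z_rank"] = dense
df["z_rank_avg"] = average  # illustrative column, not part of the question
print(df)
```

On this sample the dense ranks match the expected output in the question, since every <instance, D> group is ranked on its own and starts at 1.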


Nickil Maveli
  • Nice! Thanks! Is there any way to not increase the rank between groups? – Natanael Ramos Jan 16 '17 at 17:26
  • So, how would the output be in such a case? – Nickil Maveli Jan 16 '17 at 17:28
  • Because when I apply the dense method to the entire df, I get the following: http://pastebin.com/raw/9me5tnTa. In the first group, the smallest rank is 3, where it should be 1. I suppose that's because of the rank increasing between groups. – Natanael Ramos Jan 16 '17 at 17:31
  • No, there could be other (1000_ep0.0075, 1) rows further down, below the part shown in the screenshot, whose `z` values are below 993. You can run `df.sort_values(['instance', 'D'], inplace=True)` as a prior step to get a clear picture of this. What `method='dense'` basically does is assign consecutive integer ranks to the elements of each group in ascending order of value (smallest value = 1). – Nickil Maveli Jan 16 '17 at 17:40
  • I think I get it. Thanks! – Natanael Ramos Jan 16 '17 at 17:45
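The point made in the comments can be reproduced with a small sketch. The data below is hypothetical (instance name and z values are made up): the group (a, 1) has two more rows further down the frame with smaller z, so the first rows displayed start at rank 3 even though the group as a whole starts at 1. Sorting by z as well as by instance and D is an addition here, to make each group read 1, 2, 3, ... from top to bottom.

```python
import pandas as pd

# Hypothetical frame: rows of group ("a", 1) are split across the top
# and the bottom of the frame.
df = pd.DataFrame({
    "instance": ["a"] * 6,
    "D": [1, 1, 2, 2, 1, 1],
    "z": [993, 994, 986, 987, 991, 992],
})

df["z_rank"] = (
    df.groupby(["instance", "D"])["z"].rank(method="dense").astype(int)
)

# Looking only at the first rows suggests the ranking starts at 3 ...
print(df.head(2))

# ... but after sorting, every group visibly starts at rank 1.
df_sorted = df.sort_values(["instance", "D", "z"])
print(df_sorted)
```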
0

I tried it with the following code. I get 1 for all on the FrSeg column.

Merge_Data['FrSeg'] = Merge_Data.groupby(['CustomerKey'])['Frequency'].rank(method='dense').astype(int)

I wonder how to split it into 3 groups. The Frequency column holds values from 1 to 68.
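One possible explanation, assuming each CustomerKey appears in only one row: every group then contains a single value, so its rank within the group is always 1. A hedged sketch of one way to get three segments is to bin the whole Frequency column with `pd.qcut` instead of ranking per customer. The customer keys below are made up, and binning into equal-sized quantiles is an assumption about what "3 groups" means.

```python
import pandas as pd

# Hypothetical data: 68 customers, one row each, Frequency from 1 to 68.
Merge_Data = pd.DataFrame({
    "CustomerKey": range(1001, 1069),
    "Frequency": range(1, 69),
})

# Ranking inside single-row groups gives 1 everywhere.
per_customer = (
    Merge_Data.groupby(["CustomerKey"])["Frequency"]
    .rank(method="dense")
    .astype(int)
)

# Binning the whole column into 3 quantile-based segments labelled 1..3.
Merge_Data["FrSeg"] = pd.qcut(
    Merge_Data["Frequency"], q=3, labels=[1, 2, 3]
).astype(int)

print(Merge_Data["FrSeg"].value_counts().sort_index())
```

If the real Frequency column has many repeated values, the quantile edges may coincide; in that case `pd.cut` with explicit bin edges may be a better fit.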

Kalamarico
L.G.