How to keep observations within duplicates over a maximum value condition?

Question

This is the head of my dataset.

   SM_id                                SM_nom  ...  MRS_CO_VER  Territoire
0    101               Montréal : Centre-ville  ...    V2017-08     3598479
1    102  Montréal : Centre-ville périphérique  ...    V2017-08    14048443
2    103                  Montréal : Sud-Ouest  ...    V2017-08    15130563
3    103                  Montréal : Sud-Ouest  ...    V2017-08         197
4    104        Montréal : Notre-Dame-de-Grâce  ...    V2017-08    10828311

There are some duplicates in the SM_id variable. Within each unique SM_id, I would like to keep only the observation that has the maximum value of the Territoire variable.

I have tried this:

MRC_to_SM = MRC_to_SM[MRC_to_SM.Territoire == MRC_to_SM.Territoire.max(level='SM_id')]

And I get this error:

level name SM_id is not the name of the index

How should I proceed?

Thanks,

Welcome to SO. This question is an exact duplicate of the question marked above. Do try out the approaches listed there. – Vaishali, Nov 26 '19 at 03:23

score 1 · Answer 1 · answered Nov 26 '19 at 03:25

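You were using the wrong function. The level argument of max refers to a level of the index, and SM_id is a column rather than part of the index, hence the error. Grouping by SM_id and calling idxmax is what you are looking for: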

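# index label of the row holding the largest Territoire within each SM_id
idx = df.groupby('SM_id')['Territoire'].idxmax()
# keep only those rows (relies on the index labels being unique)
df = df[df.index.isin(idx)]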
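
For anyone who wants to try this on a small frame, here is a minimal sketch. It assumes a default unique integer index and uses only the SM_id and Territoire values visible in the question's head output; the names kept_idxmax and kept_transform are just illustrative. The second option is an alternative that stays closer to the original attempt by comparing each row to its group's maximum; unlike idxmax, it would keep every tied row if a group had several rows sharing the maximum.

```python
import pandas as pd

# Small sample built from the values shown in the question (other columns omitted)
MRC_to_SM = pd.DataFrame({
    "SM_id": [101, 102, 103, 103, 104],
    "Territoire": [3598479, 14048443, 15130563, 197, 10828311],
})

# Option 1: idxmax returns, for each SM_id, the index label of the row
# with the largest Territoire; .loc then selects exactly those rows
idx = MRC_to_SM.groupby("SM_id")["Territoire"].idxmax()
kept_idxmax = MRC_to_SM.loc[idx]

# Option 2: broadcast each group's maximum back onto its rows with
# transform, then keep the rows whose Territoire equals that maximum
group_max = MRC_to_SM.groupby("SM_id")["Territoire"].transform("max")
kept_transform = MRC_to_SM[MRC_to_SM["Territoire"] == group_max]

print(kept_idxmax)
```

On this sample both options drop the row with SM_id 103 and Territoire 197 and keep the other four rows.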

Code Different


It works. Thanks – Samuel Forget Lord Nov 26 '19 at 15:16
