
This is the head of my dataset.

   SM_id                                SM_nom  ...  MRS_CO_VER  Territoire
0    101               Montréal : Centre-ville  ...    V2017-08     3598479
1    102  Montréal : Centre-ville périphérique  ...    V2017-08    14048443
2    103                  Montréal : Sud-Ouest  ...    V2017-08    15130563
3    103                  Montréal : Sud-Ouest  ...    V2017-08         197
4    104        Montréal : Notre-Dame-de-Grâce  ...    V2017-08    10828311

There are some duplicates in the SM_id variable. Within each unique SM_id, I would like to keep only the observation with the maximum value of the Territoire variable.

I have tried this:

MRC_to_SM = MRC_to_SM[MRC_to_SM.Territoire == MRC_to_SM.Territoire.max(level='SM_id')]

And I get this error:

level name SM_id is not the name of the index
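The attempt fails because the level= argument of Series.max refers to a level of the index, while SM_id is an ordinary column here (the index is the default 0, 1, 2, ...). The level= keyword on reductions was also deprecated and is gone in pandas 2.0. A minimal sketch of the same comparison idea that does work uses groupby(...).transform('max'), which broadcasts each group's maximum back onto every row. It rebuilds only the visible columns of the head above; the elided "..." columns are left out.

```python
import pandas as pd

# Rows copied from the head shown in the question (elided "..." columns omitted)
MRC_to_SM = pd.DataFrame({
    "SM_id": [101, 102, 103, 103, 104],
    "SM_nom": [
        "Montréal : Centre-ville",
        "Montréal : Centre-ville périphérique",
        "Montréal : Sud-Ouest",
        "Montréal : Sud-Ouest",
        "Montréal : Notre-Dame-de-Grâce",
    ],
    "MRS_CO_VER": ["V2017-08"] * 5,
    "Territoire": [3598479, 14048443, 15130563, 197, 10828311],
})

# transform("max") returns a Series aligned with the original rows, holding
# each row's group maximum, so the == comparison is done row by row.
group_max = MRC_to_SM.groupby("SM_id")["Territoire"].transform("max")
result = MRC_to_SM[MRC_to_SM["Territoire"] == group_max]
print(result)
```

Unlike idxmax, this keeps every row that ties for the maximum within a group.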

How should I proceed?

Thanks,

  • Welcome to SO. This question is an exact duplicate of the question marked above; do try out the approaches listed there. – Vaishali Nov 26 '19 at 03:23
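The linked duplicate is not included on this page, so its approaches can't be reproduced here. One commonly used alternative (not necessarily one of those listed there) is a sketch like the following: sort by Territoire in descending order, keep the first row per SM_id with drop_duplicates, then restore the original row order. The frame rebuilds only the visible columns of the question's head.

```python
import pandas as pd

# Rows copied from the head shown in the question (elided "..." columns omitted)
MRC_to_SM = pd.DataFrame({
    "SM_id": [101, 102, 103, 103, 104],
    "SM_nom": [
        "Montréal : Centre-ville",
        "Montréal : Centre-ville périphérique",
        "Montréal : Sud-Ouest",
        "Montréal : Sud-Ouest",
        "Montréal : Notre-Dame-de-Grâce",
    ],
    "MRS_CO_VER": ["V2017-08"] * 5,
    "Territoire": [3598479, 14048443, 15130563, 197, 10828311],
})

result = (
    MRC_to_SM.sort_values("Territoire", ascending=False)  # largest Territoire first
    .drop_duplicates(subset="SM_id", keep="first")         # first row seen per SM_id wins
    .sort_index()                                          # back to the original row order
)
print(result)
```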



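You were using the wrong function. idxmax is what you were looking for: after grouping by SM_id, it returns the index label of the row with the largest Territoire in each group, and keeping only those labels drops the other duplicates: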

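# index label of the row with the largest Territoire in each SM_id group
idx = MRC_to_SM.groupby('SM_id')['Territoire'].idxmax()
# keep only those rows (assumes the index labels are unique)
MRC_to_SM = MRC_to_SM[MRC_to_SM.index.isin(idx)]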
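A small self-contained sketch of the idxmax approach, run on the visible columns of the question's head (the elided "..." columns are omitted). It prints the intermediate idx Series so you can see that it maps each SM_id to a row label, and it shows that selecting those labels with .loc gives the same rows as the isin filter. Both rely on the index labels being unique; if they are not, call reset_index() first.

```python
import pandas as pd

# Rows copied from the head shown in the question (elided "..." columns omitted)
MRC_to_SM = pd.DataFrame({
    "SM_id": [101, 102, 103, 103, 104],
    "SM_nom": [
        "Montréal : Centre-ville",
        "Montréal : Centre-ville périphérique",
        "Montréal : Sud-Ouest",
        "Montréal : Sud-Ouest",
        "Montréal : Notre-Dame-de-Grâce",
    ],
    "MRS_CO_VER": ["V2017-08"] * 5,
    "Territoire": [3598479, 14048443, 15130563, 197, 10828311],
})

# Series indexed by SM_id whose values are row labels of each group's max Territoire
idx = MRC_to_SM.groupby("SM_id")["Territoire"].idxmax()
print(idx)

# Filter used in the answer
via_isin = MRC_to_SM[MRC_to_SM.index.isin(idx)]
# Equivalent label-based selection with .loc (uses the values of idx)
via_loc = MRC_to_SM.loc[idx.to_numpy()]
print(via_loc)
```

If several rows tie for the maximum, idxmax keeps only the first of them.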
Code Different