Subset a pandas dataframe that has an index that contains duplicates

Question

For the data frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
}).set_index('key')

That looks like this:

        value
key     
1.0     one
2.0     two
3.0     three
4.0     four
5.0     five
NaN     six
NaN     seven

I would like to subset it to:

    value
key     
1   one
1   one
6   NaN

This produces a warning:

df.loc[[1,1,6],]

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
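The result of that call depends on the pandas version, as the comment below points out. A minimal sketch, assuming pandas 1.0 or later, where as far as I know the deprecation was completed: a missing label in a list passed to .loc now raises KeyError instead of warning. Repeating labels that do exist still works, even on this non-unique index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven'],
}).set_index('key')

# Repeating labels that all exist is fine with .loc, even on a non-unique index.
present = df.loc[[1, 1]]

# Label 6 is not in the index; recent pandas versions raise KeyError here.
try:
    df.loc[[1, 1, 6]]
    missing_raised = False
except KeyError:
    missing_raised = True

print(present)
print('KeyError raised:', missing_raised)
```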

This produces an error:

df.reindex([1, 1, 6])

ValueError: cannot reindex from a duplicate axis
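To see why reindex refuses, here is a small sketch that inspects the index of the same frame. The two NaN labels count as duplicates of each other, so the index is not unique, and reindex needs a unique source index. The wording of the error differs between pandas versions, so the sketch only catches ValueError rather than matching the message.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven'],
}).set_index('key')

# NaN labels are treated as equal, so the second NaN is flagged as a duplicate.
print(df.index.is_unique)
print(df.index.duplicated().tolist())

# reindex refuses to work from a non-unique source index.
try:
    df.reindex([1, 1, 6])
    reindex_failed = False
except ValueError as exc:
    reindex_failed = True
    print('ValueError:', exc)
```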

How can I do this while referencing a missing index label, and without using apply?

I had answered, but it actually depends on your pandas version. What version are you using? - rafaelc, Aug 23 '18 at 00:05

rafaelc · Accepted Answer · 2018-08-23T00:09:34.880

The problem is that your index contains duplicated values (the two NaNs). You should drop those before reindexing: because they are duplicates, it is ambiguous which row should be used for that label in the new index.

df.loc[df.index.dropna()].reindex([1, 1, 6])

    value
key 
1   one
1   one
6   NaN
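A self-contained version of the line above, for checking it end to end. Once the NaN labels are gone the remaining index is unique, and repeated labels in the reindex target are then allowed. The boolean-mask spelling with Index.notna is an equivalent alternative I have added, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven'],
}).set_index('key')

# Keep only the non-NaN labels (now unique), then reindex to the wanted labels.
result = df.loc[df.index.dropna()].reindex([1, 1, 6])

# Same filter written as a boolean mask on the index.
alt = df[df.index.notna()].reindex([1, 1, 6])

print(result)
```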

For a more general solution that handles any repeated label, not just NaN, filter with Index.duplicated:

df.loc[~df.index.duplicated(keep=False)].reindex([1, 1, 6])
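The keep argument of Index.duplicated controls which copies are flagged. A small comparison sketch: keep=False flags every copy of a repeated label, so both NaN rows are dropped, while keep='first' (the default) flags only the later copies, so one NaN row survives. Either way the filtered index is unique, so reindex succeeds; the keep='first' variant is my addition for comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven'],
}).set_index('key')

# keep=False: every occurrence of a repeated label is marked (both NaN rows).
all_dupes = df.index.duplicated(keep=False)
strict = df.loc[~all_dupes].reindex([1, 1, 6])

# keep='first': only later occurrences are marked, so the first NaN row stays.
first_kept = df.loc[~df.index.duplicated(keep='first')]
loose = first_kept.reindex([1, 1, 6])

print(strict)
print(len(df.loc[~all_dupes]), len(first_kept))
```

For the target [1, 1, 6] both variants give the same result, because the NaN label is never requested.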

If you want to keep the duplicated index labels and still use reindex, it will fail, because reindex requires the original index to be unique. This has actually been asked before a couple of times.
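If the duplicated labels must be kept, one alternative that avoids both reindex and apply is a left merge on the key column. This is a swapped-in technique, not the one from the answer above, and df_dup below is a hypothetical frame in which label 1.0 appears twice. Every match for a wanted label is returned, and a missing label yields a row of NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where label 1.0 is duplicated.
df_dup = pd.DataFrame({
    'key': [1, 1, 2, np.nan],
    'value': ['one', 'uno', 'two', 'six'],
}).set_index('key')

# Labels to look up; floats match the float64 dtype of the key column.
wanted = pd.DataFrame({'key': [1.0, 6.0]})

# how='left' keeps the order of the wanted labels, returns all matching rows
# for 1.0, and fills the absent label 6.0 with NaN.
result = (
    wanted.merge(df_dup.reset_index(), on='key', how='left')
    .set_index('key')
)
print(result)
```

Note that merge matches NaN keys to each other, so requesting NaN as a label would return every NaN row.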

