0

I have a dataset of a series with missing values that I want to replace by the index. The second column contains the same numbers than the first column, but in a different order.

here's an example:

>>> df
ind    u    v   d
0      5    7   151
1      7    20  151
2      8    40  151
3      20   5   151

this should turn out to:

>>>df
ind    u    v   d
0      1    2   151
1      2    4   151
2      3    5   151
3      4    1   151

i reindexed the values in row 'u' by creating a new column:

>>>df['new_index'] = range(1, len(numbers) + 1)

but how do I now replace values of the second column referring to the indexes?

Thanks for any advice!

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Ernie
  • 59
  • 5
  • "The second column contains the same numbers than the first column, but in a different order." Doesn't seem so in your example. – Ami Tavory Aug 09 '16 at 15:28
  • @ Ami: True, it contains same numbers and might also contain some more/less numbers. Numbers might also occur multiple times in a collumn. – Ernie Aug 09 '16 at 15:36
  • "it contains same numbers and might also contain some more/less numbers" That sort of renders "contains the same numbers than the first column" meaningless. – Ami Tavory Aug 09 '16 at 15:46

1 Answers1

1

You can use Series.rank, but first need create Series with unstack and last create DataFrame with unstack again:

df[['u','v']] = df[['u','v']].unstack().rank(method='dense').astype(int).unstack(0)
print (df)
     u  v    d
ind           
0    1  2  151
1    2  4  151
2    3  5  151
3    4  1  151

If use only DataFrame.rank, output in v is different:

df[['u','v']] = df[['u','v']].rank(method='dense').astype(int)
print (df)
     u  v    d
ind           
0    1  2  151
1    2  3  151
2    3  4  151
3    4  1  151
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252