1

I'm trying to do an excel equivalent of vLookup in Pandas.

I tried various iterations found here : vlookup in Pandas using join but it didn't seem to work.

I have two dataframe, with a common field 'itemID'. I wish to ascribe the item characteristics found in the 1st dataframe, to the 2nd dataframe.

I show an example code below of what I would like to achieve. Would anyone be able to kindly help?

# test code to ask other people

res1_ = [['SABR', 'Cat1', '2y10y', 'A001'], ['SABR', 'Cat1', '5y30y', 'A002'], ['Vasicek', 'Cat1', '2y10y', 'A003'], ['LMM', 'Cat1', '2y10y', 'A004']]
df1 = pd.DataFrame(res1_, columns = ['Model', 'Type', 'Pair', 'itemID'])

res2_ = [['A001', 'Vega'], ['A003', 'Delta'], ['A001', 'Gamma'], ['A002', 'Vega'], ['A002', 'Delta'], ['A006', 'Delta']]
df2 = pd.DataFrame(res2_, columns = ['itemID', 'Metric'])

display(df1)
display(df2)
print('this is not what I want')
display(df2.merge(df1, on = 'itemID', how = 'outer'))

print('this is what I would like to get')
res3_ = [['A001', 'Vega', 'SABR', 'Cat1', '2y10y'], ['A003', 'Delta', 'Vasicek', 'Cat1', '2y10y'], ['A001', 'Gamma', 'SABR', 'Cat1', '2y10y'],\
         ['A002', 'Vega', 'SABR', 'Cat1', '5y30y'], ['A002', 'Delta', 'SABR', 'Cat1', '5y30y'], ['A006', 'Delta']]
pd.DataFrame(res3_, columns = ['itemID', 'Metric', 'Model', 'Type', 'Pair'])
Kiann
  • 531
  • 1
  • 6
  • 20

1 Answers1

0

From the documentation :

  • left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
  • outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.

Therefore your merge operation whould be written this way:

 df2.merge(df1,on=['itemID'],how='left')

Output

|    | itemID   | Metric   | Model   | Type   | Pair   |
|---:|:---------|:---------|:--------|:-------|:-------|
|  0 | A001     | Vega     | SABR    | Cat1   | 2y10y  |
|  1 | A003     | Delta    | Vasicek | Cat1   | 2y10y  |
|  2 | A001     | Gamma    | SABR    | Cat1   | 2y10y  |
|  3 | A002     | Vega     | SABR    | Cat1   | 5y30y  |
|  4 | A002     | Delta    | SABR    | Cat1   | 5y30y  |
|  5 | A006     | Delta    | nan     | nan    | nan    |
Sebastien D
  • 4,369
  • 4
  • 18
  • 46
  • thanks @sebastien D, I realize my question was slightly incomplete. I have (unfortunately), multiple row of the same data (res1_ has perhaps a few rows that are duplicate)... and hence the code generates multiple lines as well... is there a way to resolve that? – Kiann Jul 20 '20 at 12:38