Vlookup in 2 specified columns

Question

I have a hc dataframe containing 2 columns called "Previous Cost Center" & "Current Cost Center" and would like to add two new columns called "Previous/Current Grouping" which should be filled with the grouping of the cost centers based on a mapping table with 2 columns named "CostCenter" & Grouping.

In excel I would just apply vlookup to fill the fields based on the cost center in each respective column and map it by the mapping table.

I've been playing around with the merge function but I couldn't get it to work.

How can I merge dataframe based on specified columns into a specified column?

HC List:

    ID Cost Center Previous Cost Center Curent
0    1                    A                NaN
1    2                    B                  C
2    3                  NaN                NaN
3    4                    D                  D
4    5                    F                  F
5    6                  NaN                  A
6    7                    A                  B
7    8                  NaN                  B
8    9                    A                  C
9   10                    B                  D
10  11                    D                  F
11  12                    C                NaN

Mapping:

Cost Center Grouping

0           A     XERX
1           B       ID
2           C     BLOC
3           D     KLRO
4           E      PPP
5           F      SDF
6           G        D

Output:

    ID Cost Center Previous Grouping Previous Cost Center Current Grouping Current
0    1                    A              XERX                 NaN              NaN
1    2                    B                ID                   C             BLOC
2    3                  NaN               NaN                 NaN              NaN
3    4                    D              KLRO                   D             KLRO
4    5                    F               SDF                   F              SDF
5    6                  NaN               NaN                   A             XERX
6    7                    A              XERX                   B               ID
7    8                  NaN               NaN                   B               ID
8    9                    A              XERX                   C             BLOC
9   10                    B                ID                   D             KLRO
10  11                    D              KLRO                   F              SDF
11  12                    C              BLOC                 NaN              NaN

This is the same question as this: http://stackoverflow.com/questions/25493625/vlookup-in-pandas-using-join?rq=1 — EdChum, Apr 30 '16 at 14:16
I agree to EdChum comment. The solution provided there is the same for this problem also. — pmaniyan, Apr 30 '16 at 14:25
I just tried this but there are now 2408 rows when there should only be 1358. It seems like some entries got doubled. — alpenmilch411, Apr 30 '16 at 15:24
I resolved this issue by using drop_duplicates() but is there a better way to prevent this in the first place? — alpenmilch411, Apr 30 '16 at 15:37
You have a data consistency problem in that case, are you sure the rhs entries are unique? Also you can't map `NaN` to any consistent value which may be your problem, when `join`ing or `map` do it on a subset like `df.loc[df['Previous Cost Center'].notnull(), 'Cost Center Current'] = df.loc[df['Previous Cost Center'].notnull(), 'Previous Cost Center'].map(df1.set_index('Cost Center')['Grouping'])` — EdChum, Apr 30 '16 at 21:26
You were right. I had duplcates in my mapping table, which caused the multiple entries. Thank you for the good explanation. — alpenmilch411, May 01 '16 at 04:14

score 0 · Answer 1 · answered Apr 30 '16 at 14:24

Assuming HC and grouping are DataFrames

Solution

grouping = grouping.set_index('Cost Center')
HC['Grouping Current'] = HC['Cost Center Current'].apply(lambda x: grouping.ix[x])
HC['Grouping Previous'] = HC['Cost Center Previous'].apply(lambda x: grouping.ix[x])

Vlookup in 2 specified columns

1 Answers1

Solution