-1

I have a hc dataframe containing 2 columns called "Previous Cost Center" & "Current Cost Center" and would like to add two new columns called "Previous/Current Grouping" which should be filled with the grouping of the cost centers based on a mapping table with 2 columns named "CostCenter" & Grouping.

In excel I would just apply vlookup to fill the fields based on the cost center in each respective column and map it by the mapping table.

I've been playing around with the merge function but I couldn't get it to work.

How can I merge dataframe based on specified columns into a specified column?

HC List:

    ID Cost Center Previous Cost Center Curent
0    1                    A                NaN
1    2                    B                  C
2    3                  NaN                NaN
3    4                    D                  D
4    5                    F                  F
5    6                  NaN                  A
6    7                    A                  B
7    8                  NaN                  B
8    9                    A                  C
9   10                    B                  D
10  11                    D                  F
11  12                    C                NaN

Mapping:

Cost Center Grouping

0           A     XERX
1           B       ID
2           C     BLOC
3           D     KLRO
4           E      PPP
5           F      SDF
6           G        D

Output:

    ID Cost Center Previous Grouping Previous Cost Center Current Grouping Current
0    1                    A              XERX                 NaN              NaN
1    2                    B                ID                   C             BLOC
2    3                  NaN               NaN                 NaN              NaN
3    4                    D              KLRO                   D             KLRO
4    5                    F               SDF                   F              SDF
5    6                  NaN               NaN                   A             XERX
6    7                    A              XERX                   B               ID
7    8                  NaN               NaN                   B               ID
8    9                    A              XERX                   C             BLOC
9   10                    B                ID                   D             KLRO
10  11                    D              KLRO                   F              SDF
11  12                    C              BLOC                 NaN              NaN
alpenmilch411
  • 483
  • 1
  • 5
  • 18
  • 1
    Please share some sample data, and also desired output. – pmaniyan Apr 30 '16 at 13:18
  • Sample data added. – alpenmilch411 Apr 30 '16 at 14:13
  • 2
    This is the same question as this: http://stackoverflow.com/questions/25493625/vlookup-in-pandas-using-join?rq=1 – EdChum Apr 30 '16 at 14:16
  • 1
    I agree to EdChum comment. The solution provided there is the same for this problem also. – pmaniyan Apr 30 '16 at 14:25
  • I just tried this but there are now 2408 rows when there should only be 1358. It seems like some entries got doubled. – alpenmilch411 Apr 30 '16 at 15:24
  • I resolved this issue by using drop_duplicates() but is there a better way to prevent this in the first place? – alpenmilch411 Apr 30 '16 at 15:37
  • You have a data consistency problem in that case, are you sure the rhs entries are unique? Also you can't map `NaN` to any consistent value which may be your problem, when `join`ing or `map` do it on a subset like `df.loc[df['Previous Cost Center'].notnull(), 'Cost Center Current'] = df.loc[df['Previous Cost Center'].notnull(), 'Previous Cost Center'].map(df1.set_index('Cost Center')['Grouping'])` – EdChum Apr 30 '16 at 21:26
  • You were right. I had duplcates in my mapping table, which caused the multiple entries. Thank you for the good explanation. – alpenmilch411 May 01 '16 at 04:14

1 Answers1

0

Assuming HC and grouping are DataFrames

Solution

grouping = grouping.set_index('Cost Center')
HC['Grouping Current'] = HC['Cost Center Current'].apply(lambda x: grouping.ix[x])
HC['Grouping Previous'] = HC['Cost Center Previous'].apply(lambda x: grouping.ix[x])
piRSquared
  • 285,575
  • 57
  • 475
  • 624