0

Description: I have two data frames foreign (having options in English and its Urdu translation) and native,having German version of options, Lengths of both DFs is unequal the native is much bigger than foreign and carries complete list of options. I want to merge them two using idx column so that native df searches for each entry in its idx in the foreign; if idx is matching than its adds two columns namely: option_eng and option_urdu otherwise fill it with NAN

I have two dataframs
foreign:
idx                    option_eng                    option_urdu
1.1.01-001   This is english of option1             urdu translation of option1
1.1.01-001   This is english of option2             urdu translation of option2
1.1.01-001   This is english of option3             urdu translation of option3 
1.1.01-002   This is english of option1             urdu translation of option1
1.1.01-002   This is english of option2             urdu translation of option2
1.1.01-002   This is english of option3              Urdu translation of option3
1.1.01-003   This is english of option1              urdu translation of option1
1.1.01-003   This is english of option2              urdu translation of option2
1.1.01-004   This is english of option1              urdu translation of option1
1.1.01-004   This is english of option2               urdu translation of option2

Native:>> a unequal to df1
  idx    sort_code  iscorrect   count                option_de
1.1.01-001  1          1          1              German tranlation option3
1.1.01-001  2          1          2              German tranlation option2
1.1.01-001  3          0          3              German tranlation option1
1.1.01-002  1          1          4              German tranlation option3
1.1.01-002  2          1          5             German tranlation option2
1.1.01-002  3          1          6             German tranlation option1
1.1.01-003  1          1          7             German tranlation option3
1.1.01-003  2          1          8             German tranlation option2
1.1.01-003  3          0          9             German tranlation option1
1.1.01-012  1          1          10             German tranlation option3
1.1.01-012  2          1          11             German tranlation option2
1.1.01-012  3          1          12             German tranlation option1
1.1.01-101  1          1          13             German tranlation option3
1.1.01-101  2          1          14             German tranlation option2

My desired output should look like

idx               option_de                       option_eng           option_urdu
1.1.01-001  German tranlation option3   This is english of option3  This is urdu of option3
1.1.01-001  German tranlation option2   This is english of option2  This is urdu of option2
1.1.01-001  German tranlation option1   This is english of option1  This is urdu of option1
1.1.01-002  German tranlation option3   This is english of option3  This is urdu of option3
1.1.01-002  German tranlation option2   This is english of option2  This is urdu of option2
1.1.01-002  German tranlation option1   This is english of option1  This is urdu of option1
1.1.01-003  German tranlation option3   This is english of option3  This is urdu of option3
1.1.01-003  German tranlation option2   This is english of option2  This is urdu of option2
1.1.01-003  German tranlation option1   This is english of option1  This is urdu of option1
1.1.01-012  German tranlation option3     na                                  na
1.1.01-012  German tranlation option2     na                                  na
1.1.01-012  German tranlation option1     na                                  na
1.1.01-101  German tranlation option3     na                                  na
1.1.01-101  German tranlation option2     na                                  na


Note Please note that the df2 the German option are sorted inversely like option 3 ,2,1 and in df1 the option_en and option_urdu are in opposite order I want to match them.

What I have tried?

I have tried different options.native.merge(foreign,how='left',left_on='idx',right_on='idx')

Merged always gives me duplicated rows of idxs of both df (like every idx 3 times) I have also tries pd.combine_first but its changes the orders of options.

I have tried it with native['key']=native.groupby(["code"]).cumcount()

source code to reproduce it

Foreigndic={'idx': {0: '1.1.01-001', 1: '1.1.01-001', 2: '1.1.01-001', 3: '1.1.01-002', 4: '1.1.01-002', 5: '1.1.01-002', 6: '1.1.01-003', 7: '1.1.01-003', 8: '1.1.01-004', 9: '1.1.01-004'}, 'option_eng': {0: ' Stopping as a precaution at every crossroads', 1: ' Not insisting on your rights', 2: " Allowing for other people's mistakes", 3: ' Inattentiveness', 4: ' Driving too close behind the vehicle in front', 5: ' Unexpectedly heavy braking', 6: '  respond as soon as possible to probable changes in the road traffic situation', 7: '  attempt to identify the intentions of other road users, as soon as possible', 8: ' - of its braking characteristics', 9: ' - of its manoeuvrability'}, 'option_urdu': {0: 'احتیاط کے طور پر ہر دوراہے پر رکنا', 1: 'اپنے حقوق پر اصرار نہیں کرنا', 2: 'دوسرے لوگوں کی غلطیوں کی اجازت دینا', 3: 'غفلت', 4: 'اگلی گاڑی سے مناسب فاصلہ نہ رکھنا', 5: 'غیر متوقع طور پر اگلی گاڑی کا اچانک بریک لگا دینا', 6: 'سڑک پر موجود ٹریفک کی صورتحال میں ممکنہ تبدیلیوں کا جلد سے جلد جواب دینا', 7: 'جتنی جلدی ممکن ہو سڑک کے دوسرے صارفین کے ارادوں کی نشاندہی کرنے کی کوشش کریں', 8: ' اس کی بریک خصوصیات', 9: 'اس کے سسٹم کی جانکاری'}}
Nativedict={'code': {0: 1, 1: 2, 2: 3, 3: 1, 4: 2, 5: 3, 6: 1, 7: 2, 8: 3, 9: 1, 10: 2, 11: 3, 12: 1, 13: 2, 14: 3}, 'ans_id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 12, 12: 13, 13: 14, 14: 15}, 'iscorrect': {0: 1, 1: 1, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 0, 9: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1}, 'ans_id_txt': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 12, 12: 13, 13: 14, 14: 15}, 'text': {0: 'Nicht auf dem eigenen Recht bestehen', 1: 'Mit Fehlern anderer rechnen', 2: 'Vorsorglich an jeder Kreuzung anhalten', 3: 'Unerwartet starkes Bremsen', 4: 'Unaufmerksamkeit', 5: 'Zu dichtes Auffahren', 6: 'Ich reagiere möglichst frühzeitig auf wahrscheinliche Veränderungen der Verkehrssituation', 7: 'Ich versuche, die Absichten anderer Verkehrsteilnehmer möglichst frühzeitig zu erkennen', 8: 'Ich beschränke die Verkehrsbeobachtung möglichst auf das direkt vor mir fahrende Fahrzeug', 9: 'Ich weiß möglicherweise zu wenig über die Besonderheiten in der Anordnung und der Funktion der Bedienelemente', 10: 'Ich weiß möglicherweise zu wenig über die Besonderheiten im Bremsverhalten', 11: 'Ich weiß möglicherweise zu wenig über die Besonderheiten im Lenkverhalten', 12: 'Der Vorausfahrende bremst unerwartet', 13: 'Der Vorausfahrende betätigt den Blinker vor dem Abbiegen zu spät', 14: 'Der Vorausfahrende hält unerwartet an, um nach dem Weg zu fragen'}, 'ques_id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3, 9: 4, 10: 4, 11: 4, 12: 5, 13: 5, 14: 5}, 'idx': {0: '1.1.01-001', 1: '1.1.01-001', 2: '1.1.01-001', 3: '1.1.01-002', 4: '1.1.01-002', 5: '1.1.01-002', 6: '1.1.01-003', 7: '1.1.01-003', 8: '1.1.01-003', 9: '1.1.01-004', 10: '1.1.01-004', 11: '1.1.01-004', 12: '1.1.01-101', 13: '1.1.01-101', 14: '1.1.01-101'}}

Faisal
  • 151
  • 3
  • 10
  • 1
    if you want to merge without duplicates you'll need to eliminate dulicates in one of the dataframes OR merge on more keys to create a unique row per key. – Umar.H Apr 30 '21 at 13:40
  • The problem is, that it is unclear which foreign // native row pair is valid. In your expected output it seems like there is only 1 allowed valid pair possible. But from you provided data it is imposible to determine, how this pairing works. – Andreas Apr 30 '21 at 13:44
  • I think this is a dupe of [Pandas Merge 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) imo. – Umar.H Apr 30 '21 at 13:49

2 Answers2

1

Assuming that you don't need to sort the tables, pd.concat() with axis=1 as an argument will do the trick.

pd.concat([
    native[['idx', 'option_de']], 
    foreign[['option_eng', 'option_urdu']]
], axis=1)

That worked for me

native['key']=native.groupby(["code"]).cumcount()
native= native.sort_values(['key','code'],ascending=[True,False]).drop('key',1).reset_index()

native=native.rename(columns={'idx.1':'idx'})
mrg=pd.concat([native[['idx', 'text','iscorrect']],foreign[['option_eng', 'option_urdu']]], axis=1)
Faisal
  • 151
  • 3
  • 10
Gusti Adli
  • 1,225
  • 4
  • 13
  • In this way the order of matching index cannot be preserved. It is just blink addition of two extra columns, but I want to match based on index that is common in both dfs – Faisal Apr 30 '21 at 15:35
1

Try creating a unique index for each idx group by adding an order element. Then merge together on the idx and the order column.

import pandas as pd

foreign = pd.DataFrame({'idx': {0: '1.1.01-001', 1: '1.1.01-001',
                                2: '1.1.01-001', 3: '1.1.01-002',
                                4: '1.1.01-002', 5: '1.1.01-002',
                                6: '1.1.01-003', 7: '1.1.01-003',
                                8: '1.1.01-004', 9: '1.1.01-004'},
                        'option_eng': {
                            0: ' Stopping as a precaution at every crossroads',
                            1: ' Not insisting on your rights',
                            2: " Allowing for other people's mistakes",
                            3: ' Inattentiveness',
                            4: ' Driving too close behind the vehicle in front',
                            5: ' Unexpectedly heavy braking',
                            6: '  respond as soon as possible to probable changes in the road traffic situation',
                            7: '  attempt to identify the intentions of other road users, as soon as possible',
                            8: ' - of its braking characteristics',
                            9: ' - of its manoeuvrability'
                        },
                        'option_urdu': {
                            0: 'احتیاط کے طور پر ہر دوراہے پر رکنا',
                            1: 'اپنے حقوق پر اصرار نہیں کرنا',
                            2: 'دوسرے لوگوں کی غلطیوں کی اجازت دینا',
                            3: 'غفلت',
                            4: 'اگلی گاڑی سے مناسب فاصلہ نہ رکھنا',
                            5: 'غیر متوقع طور پر اگلی گاڑی کا اچانک بریک لگا دینا',
                            6: 'سڑک پر موجود ٹریفک کی صورتحال میں ممکنہ تبدیلیوں کا جلد سے جلد جواب دینا',
                            7: 'جتنی جلدی ممکن ہو سڑک کے دوسرے صارفین کے ارادوں کی نشاندہی کرنے کی کوشش کریں',
                            8: ' اس کی بریک خصوصیات',
                            9: 'اس کے سسٹم کی جانکاری'}
                        })

native = pd.DataFrame(
    {'code': {0: 1, 1: 2, 2: 3, 3: 1, 4: 2, 5: 3, 6: 1, 7: 2,
              8: 3, 9: 1, 10: 2, 11: 3, 12: 1, 13: 2, 14: 3},
     'ans_id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8,
                8: 9, 9: 10, 10: 11, 11: 12, 12: 13, 13: 14, 14: 15},
     'iscorrect': {0: 1, 1: 1, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1,
                   8: 0, 9: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1},
     'ans_id_txt': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8,
                    8: 9, 9: 10, 10: 11, 11: 12, 12: 13, 13: 14,
                    14: 15},
     'text': {0: 'Nicht auf dem eigenen Recht bestehen',
              1: 'Mit Fehlern anderer rechnen',
              2: 'Vorsorglich an jeder Kreuzung anhalten',
              3: 'Unerwartet starkes Bremsen',
              4: 'Unaufmerksamkeit',
              5: 'Zu dichtes Auffahren',
              6: 'Ich reagiere möglichst frühzeitig auf '
                 'wahrscheinliche Veränderungen der Verkehrssituation',
              7: 'Ich versuche, die Absichten anderer '
                 'Verkehrsteilnehmer möglichst frühzeitig zu erkennen',
              8: 'Ich beschränke die Verkehrsbeobachtung '
                 'möglichst auf das direkt vor mir fahrende Fahrzeug',
              9: 'Ich weiß möglicherweise zu wenig über die Besonderheiten '
                 'in der Anordnung und der Funktion der Bedienelemente',
              10: 'Ich weiß möglicherweise zu wenig über '
                  'die Besonderheiten im Bremsverhalten',
              11: 'Ich weiß möglicherweise zu wenig über '
                  'die Besonderheiten im Lenkverhalten',
              12: 'Der Vorausfahrende bremst unerwartet',
              13: 'Der Vorausfahrende betätigt den Blinker vor dem Abbiegen zu spät',
              14: 'Der Vorausfahrende hält unerwartet an, um nach dem Weg zu fragen'},
     'ques_id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3,
                 8: 3, 9: 4, 10: 4, 11: 4, 12: 5, 13: 5, 14: 5},
     'idx': {0: '1.1.01-001', 1: '1.1.01-001', 2: '1.1.01-001',
             3: '1.1.01-002', 4: '1.1.01-002', 5: '1.1.01-002',
             6: '1.1.01-003', 7: '1.1.01-003', 8: '1.1.01-003',
             9: '1.1.01-004', 10: '1.1.01-004', 11: '1.1.01-004',
             12: '1.1.01-101', 13: '1.1.01-101', 14: '1.1.01-101'}}
)

# Grab Columns and add key for group order (level_1)
foreign = foreign[['idx', 'option_eng', 'option_urdu']].groupby('idx') \
    .apply(lambda x: x.reset_index(drop=True)) \
    .drop(columns=['idx']) \
    .reset_index()
native = native[['idx', 'text']].groupby('idx') \
    .apply(lambda x: x.reset_index(drop=True)) \
    .drop(columns=['idx']) \
    .reset_index()

# Merge Together on idx and level_1
merged = native.merge(foreign, on=['idx', 'level_1'], how='outer') \
    .drop(columns='level_1') \
    .fillna('na')

print(merged.to_string(index=False))
idx text option_eng option_urdu
1.1.01-001 Nicht auf dem eigenen Recht bestehen Stopping as a precaution at every crossroads احتیاط کے طور پر ہر دوراہے پر رکنا
1.1.01-001 Mit Fehlern anderer rechnen Not insisting on your rights اپنے حقوق پر اصرار نہیں کرنا
1.1.01-001 Vorsorglich an jeder Kreuzung anhalten Allowing for other people's mistakes دوسرے لوگوں کی غلطیوں کی اجازت دینا
1.1.01-002 Unerwartet starkes Bremsen Inattentiveness غفلت
1.1.01-002 Unaufmerksamkeit Driving too close behind the vehicle in front اگلی گاڑی سے مناسب فاصلہ نہ رکھنا
1.1.01-002 Zu dichtes Auffahren Unexpectedly heavy braking غیر متوقع طور پر اگلی گاڑی کا اچانک بریک لگا دینا
1.1.01-003 Ich reagiere möglichst frühzeitig auf wahrscheinliche Veränderungen der Verkehrssituation respond as soon as possible to probable changes in the road traffic situation سڑک پر موجود ٹریفک کی صورتحال میں ممکنہ تبدیلیوں کا جلد سے جلد جواب دینا
1.1.01-003 Ich versuche, die Absichten anderer Verkehrsteilnehmer möglichst frühzeitig zu erkennen attempt to identify the intentions of other road users, as soon as possible جتنی جلدی ممکن ہو سڑک کے دوسرے صارفین کے ارادوں کی نشاندہی کرنے کی کوشش کریں
1.1.01-003 Ich beschränke die Verkehrsbeobachtung möglichst auf das direkt vor mir fahrende Fahrzeug na na
1.1.01-004 Ich weiß möglicherweise zu wenig über die Besonderheiten in der Anordnung und der Funktion der Bedienelemente - of its braking characteristics اس کی بریک خصوصیات
1.1.01-004 Ich weiß möglicherweise zu wenig über die Besonderheiten im Bremsverhalten - of its manoeuvrability اس کے سسٹم کی جانکاری
1.1.01-004 Ich weiß möglicherweise zu wenig über die Besonderheiten im Lenkverhalten na na
1.1.01-101 Der Vorausfahrende bremst unerwartet na na
1.1.01-101 Der Vorausfahrende betätigt den Blinker vor dem Abbiegen zu spät na na
1.1.01-101 Der Vorausfahrende hält unerwartet an, um nach dem Weg zu fragen na na

Aside: Your provided data does not appear to align correctly. Foreign: 1.1.01-002,0,Inattentiveness is the first row of the 1.1.01-002 group, but that aligns with Native: 1.1.01-002,0,Unerwartet starkes Bremsen which is incorrect. It should align with the second row 1.1.01-002,1,Unaufmerksamkeit. But assuming you have translations in the correct order in both tables this should work.

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
  • Yes Actually that is the problem. We can change the order with `code` that is `sort_code` 1,2,3, but I need to inverse it to match it with `eng` and `Urdu translation` I have use this method to rearrange . `native['key']=native.groupby(["code"]).cumcount() #print(native.head(100)) #native= native.sort_values(['key','code'],ascending=[True,False]).drop('key',1)#.reset_index()` the problem is when I merge it automatically change the order – Faisal Apr 30 '21 at 14:20
  • 1
    The alignment is still not consistent. For group `1.1.01-001` the mapping from `native` to `foreign` is 0 -> 1, 1 -> 0, 2 -> 2. That seems like you have some kind of rotation in `foreign`. But `1.1.01-002` the mapping is 0 -> 0, 1-> 2, 2 -> 1. Which is a different kind of rotation in `foreign`. – Henry Ecker Apr 30 '21 at 14:36