Edit: clearified the question
I want to aggregate a pd.DataFrame called df by "Identifier" and sum the "Costs" columns. For the category columns I want to apply an aggregation function that can be spoken out loud like "aggregate and take the most frequent value (the mode) of the column, but if the mode is a blank than take the second most frequent column". In other words, I want to have the mode of the category (after aggregation) but the mode must not be a blank.
Result should be the pd.DataFrame new_df.
df
Identifier Cost Cost2 Category1 Category2 Category3
0 A 10 10 one aaa
1 A 20 10 blue aaa
2 B 10 20 two bbb
3 B 10 30 green bbb
4 B 30 40 bbb
5 C 20 50 three red ccc
--- aggregation process --->
new_df
Identifier Cost Cost2 Category1 Category2 Category3
0 A 30 20 one blue aaa
1 B 50 90 two green bbb
2 C 20 50 three red ccc
Code to reproduce the example:
import pandas as pd
data_df = {
'Identifier': ['A', 'A', 'B', 'B', 'B', 'C'],
'Cost': [10, 20, 10, 10, 30, 20],
'Cost2':[10,10,20,30,40,50],
'Category1' : ['one', '', 'two', '', '', 'three'],
'Category2' : ['', 'blue', '', 'green', '', 'red'],
'Category3' : ['aaa', 'aaa', 'bbb', 'bbb', 'bbb', 'ccc']
}
df = pd.DataFrame(data_df)
data_new_df = {
'Identifier': ['A', 'B', 'C'],
'Cost': [30, 50, 20],
'Cost2' : [20,90,50],
'Category1' : ['one', 'two', 'three'],
'Category2' : ['blue', 'green', 'red'],
'Category3' : ['aaa', 'bbb', 'ccc']
}
new_df = pd.DataFrame(data_new_df)