Take for example this dataframe:
   a  b  c
0  1  2  3
1  1  2  3
2  2  1  1
3  2  0  0
I have the average of column 'c' for each label in column 'a', computed as follows:
average_c = df.groupby(['a'])['c'].mean()
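For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame and the per-label averages. The values are copied from the table above; the only assumption is that pandas is imported as pd.

```python
import pandas as pd

# Rebuild the example frame from the table in the question.
df = pd.DataFrame({
    "a": [1, 1, 2, 2],
    "b": [2, 2, 1, 0],
    "c": [3, 3, 1, 0],
})

# Mean of 'c' per label in 'a'; the result is a Series indexed by 'a'.
average_c = df.groupby(["a"])["c"].mean()
print(average_c)
```

Note that average_c has one row per label, not one row per row of df, which is why it cannot be subtracted from df['c'] directly.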
I want to add a new column 'd' that holds, for each row, the difference between the value in column 'c' and the average of 'c' for that row's label in 'a'.
I.e.:
   a  b  c    d
0  1  2  3  0.0
1  1  2  3  0.0
2  2  1  1  0.5
3  2  0  0 -0.5
I can construct arrays and then add the column by iterating over the rows, but my intuition tells me there is a more idiomatic way to do this.
So far I have duplicated the column with
df['d'] = df['c']
and I assume I need to add some operation here, such as - average_c['a'], but I'm a bit lost at this point.
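One possible approach, offered as a sketch rather than the only way to do it: the group means need to be aligned row by row with df before subtracting. Two common ways to get that alignment are groupby(...).transform('mean'), which returns a Series with the same index as df, and Series.map, which looks up each value of 'a' in average_c. The name d_via_map is just an illustrative variable, not something from the question.

```python
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({
    "a": [1, 1, 2, 2],
    "b": [2, 2, 1, 0],
    "c": [3, 3, 1, 0],
})

# Option 1: transform broadcasts each group's mean back onto its rows,
# so the subtraction is element-wise on two equally indexed Series.
df["d"] = df["c"] - df.groupby("a")["c"].transform("mean")

# Option 2: reuse the precomputed per-label means and map them onto 'a'.
average_c = df.groupby("a")["c"].mean()
d_via_map = df["c"] - df["a"].map(average_c)

print(df)
```

Both options give the same column here. transform avoids the intermediate average_c, while map is handy if average_c has already been computed for other purposes.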