I have a DataFrame with several columns, and I want to add a new column whose value is assigned to every row of a group, where groups are identified by the column "id". The condition is: if "freq_cent" > 0.5, the new column should take the value of the 'Sentiment' column for that id. In other words, regardless of whether freq_cent is above or below 0.5 for an individual row, every row in the group should contain the sentiment value whose freq_cent exceeds 0.5 in that group.
To create the sample DataFrame:
import pandas as pd

data = {'id': ['205', '205', '204', '204', '204'],
        'Sentiment': ['Positive', 'Positive', 'Neutral', 'Positive', 'Positive']}
df = pd.DataFrame(data)
df['freq'] = df.groupby('Sentiment')['id'].transform(pd.Series.nunique)
# To get the correct frequencies, only the unique values of sent_freq should be added up.
df['freq_sum'] = df.groupby('id')['freq'].transform(pd.Series.count)
df['freq_cent'] = (df['freq']/df['freq_sum'])
df
Target Output:
    id Sentiment  freq  freq_sum  freq_cent new_column
0  205  Positive     2         2   1.000000   Positive
1  205  Positive     2         2   1.000000   Positive
2  204   Neutral     1         3   0.333333   Positive
3  204  Positive     2         3   0.666667   Positive
4  204  Positive     2         3   0.666667   Positive
I tried the following code:
df.groupby('id').apply(lambda x: x['new_col'] == x['Sentiment'] if (x['freq_cent'] > 0.5))
But it doesn't work. I would be grateful for any suggestions. Thanks!
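For reference, here is a minimal sketch of one way the target output might be produced. It is not necessarily the only or best approach. It assumes each id has at most one sentiment with freq_cent > 0.5: if several qualified, the first in row order would be used, and a group with none would get NaN. The sample DataFrame is rebuilt so the snippet runs on its own, and the name qualifying is purely illustrative.

```python
import pandas as pd

# Rebuild the sample data from the question.
data = {'id': ['205', '205', '204', '204', '204'],
        'Sentiment': ['Positive', 'Positive', 'Neutral', 'Positive', 'Positive']}
df = pd.DataFrame(data)
df['freq'] = df.groupby('Sentiment')['id'].transform('nunique')
df['freq_sum'] = df.groupby('id')['freq'].transform('count')
df['freq_cent'] = df['freq'] / df['freq_sum']

# Keep Sentiment only on rows that satisfy the condition; other rows become NaN.
qualifying = df['Sentiment'].where(df['freq_cent'] > 0.5)

# 'first' skips missing values, so every row of an id receives
# the first qualifying sentiment found in that group.
df['new_column'] = qualifying.groupby(df['id']).transform('first')

print(df)
```

The idea is to mask first and broadcast second: where() blanks out the rows that fail the condition, and transform() returns a result aligned to the original index, so it can be assigned directly as a column without groupby.apply.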