I have a DataFrame with several columns, and I want to add a new column whose value is assigned to every row of a group, where groups are identified by the column "id". The condition is: if "freq_cent" > 0.5, the new column should take the value from the 'Sentiment' column for that id. In other words, regardless of whether freq_cent is above or below 0.5 for an individual row, every row in the group should receive the Sentiment value from the row(s) in that group where freq_cent exceeds 0.5.

To create the sample DataFrame:

import pandas as pd

data = {'id': ['205', '205', '204', '204', '204'],
        'Sentiment': ['Positive', 'Positive', 'Neutral', 'Positive', 'Positive']}
df = pd.DataFrame(data)

df['freq'] = df.groupby('Sentiment')['id'].transform(pd.Series.nunique)
# Note: to get the correct frequencies, only the unique values of sent_freq should be added up.
df['freq_sum'] = df.groupby('id')['freq'].transform(pd.Series.count)
df['freq_cent'] = (df['freq']/df['freq_sum'])
df

Target Output:

    id  Sentiment   freq    freq_sum    freq_cent new_column
0   205 Positive    2       2           1.000000  Positive
1   205 Positive    2       2           1.000000  Positive
2   204 Neutral     1       3           0.333333  Positive
3   204 Positive    2       3           0.666667  Positive
4   204 Positive    2       3           0.666667  Positive

I tried the following code:

df.groupby('id').apply(lambda x: x['new_col'] == x['Sentiment'] if (x['freq_cent'] > 0.5))

But it doesn't work. I would be grateful for any suggestions. Thanks!

Abir
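
One possible approach, offered as a minimal sketch rather than a definitive answer: mask 'Sentiment' so that only rows with freq_cent > 0.5 keep their value, then broadcast the first non-null value of each "id" group to all of its rows. It assumes each id has at most one distinct qualifying Sentiment (if several qualify, the first one encountered is used), and the name `qualifying` is introduced here only for illustration.

```python
import pandas as pd

# Rebuild the sample DataFrame from the question.
data = {'id': ['205', '205', '204', '204', '204'],
        'Sentiment': ['Positive', 'Positive', 'Neutral', 'Positive', 'Positive']}
df = pd.DataFrame(data)
df['freq'] = df.groupby('Sentiment')['id'].transform('nunique')
df['freq_sum'] = df.groupby('id')['freq'].transform('count')
df['freq_cent'] = df['freq'] / df['freq_sum']

# Keep Sentiment only where the condition holds; all other rows become missing.
qualifying = df['Sentiment'].where(df['freq_cent'] > 0.5)

# 'first' picks the first non-null value per id, and transform
# broadcasts that value back to every row of the group.
df['new_column'] = qualifying.groupby(df['id']).transform('first')

print(df)
```

With this sketch, a group in which no row exceeds 0.5 ends up with a missing value in new_column, so a fallback would have to be chosen separately if one is needed.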