
I have a DataFrame like this:

    df = pd.DataFrame(columns=['count', 'color'])

For each row that has count > 0, I want to assign 'red' to color if

    np.random.binomial(1,prob)==1

I know how to do it with a for loop. I also know that, without this condition, I could assign the red color without a for loop, like this:

    df.loc[df['count']>0, ['color']]='red'

Is it possible to have both the filter on count and the condition on prob without the for loop?

David
  • Using a single trial in the binomial distribution doesn't make sense. You should just use a boolean for counts>0. – aerijman Dec 31 '20 at 18:00
  • My intention is to assign color to red with a given prob only to rows with a count > 0. In the for loop, I run a trial for each of the rows with count > 0, and it seems to work. – Federico Mancini Dec 31 '20 at 18:17

2 Answers

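    # size=len(x) draws one trial per row; without it a single draw is shared by all rows
    df.loc[lambda x: (x['count'] > 0) & (np.random.binomial(1, prob, size=len(x)) == 1), 'color'] = 'red'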
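
The same idea can be written out step by step as a minimal, self-contained sketch. The sample data, the value of `prob`, and the seeded generator `rng` are assumptions made for illustration and are not part of the question. `rng.random(len(df)) < prob` is used here as an equivalent of one `binomial(1, prob)` trial per row.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # seeded only to make the sketch repeatable (assumption)
prob = 0.3                       # hypothetical probability of turning a row red

# hypothetical sample data standing in for the asker's DataFrame
df = pd.DataFrame({'count': [0, 3, 5, 0, 2, 7], 'color': 'blue'})

positive = df['count'] > 0             # filter on count
success = rng.random(len(df)) < prob   # one independent Bernoulli(prob) draw per row
df.loc[positive & success, 'color'] = 'red'
```

Because the two conditions are combined with `&`, a row turns red only when it both passes the count filter and wins its own draw; all other rows keep their existing color.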

pandas.DataFrame.loc

Pandas: Conditionally replace values based on other columns values

manju-dev

You can do this:

    rows = np.where(d['counts'] > 0)[0]             # positions of rows with counts > 0
    draws = np.random.binomial(1, 0.2, len(rows))   # one Bernoulli(0.2) trial per selected row

    # assign (positions equal labels here because d has the default RangeIndex)
    d.loc[rows, 'color'] = ['red' if i == 1 else 'blue' for i in draws]
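
Positions returned by `np.where` match the labels that `.loc` expects only when the DataFrame has the default integer index. A hedged variant that uses a boolean mask instead, and so should also work with other indexes, might look like the sketch below. The function name `color_positive_rows`, its parameters, and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

def color_positive_rows(d, prob, rng=None):
    """Set 'color' to 'red' with probability `prob` for rows where counts > 0."""
    rng = np.random.default_rng() if rng is None else rng
    mask = (d['counts'] > 0).to_numpy()              # boolean mask, independent of index labels
    draws = rng.binomial(1, prob, size=mask.sum())   # one trial per selected row only
    # selected rows that lose their trial are set to 'blue', as in the snippet above
    d.loc[mask, 'color'] = np.where(draws == 1, 'red', 'blue')
    return d

# usage with a non-default index (hypothetical data)
d = pd.DataFrame({'counts': [0, 4, 2, 0, 9], 'color': 'blue'}, index=list('abcde'))
color_positive_rows(d, prob=0.2, rng=np.random.default_rng(1))
```

Rows with `counts` of 0 are never touched, so they keep whatever color they already had.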

I am assuming data like that generated below:

    import numpy as np
    import pandas as pd

    a = np.random.randint(0, 10, 100)   # 100 random counts in [0, 9]
    b = ['blue'] * 100                  # default color for every row
    # build from a dict so 'counts' keeps its integer dtype
    d = pd.DataFrame({'counts': a, 'color': b})
aerijman