I have a dataframe with 3 columns. Each row holds the probabilities that the feature T takes the value 1, 2 or 3 for that row:
import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.DataFrame({"T1": [0.8, 0.5, 0.01], "T2": [0.1, 0.2, 0.89], "T3": [0.1, 0.3, 0.1]})
For row 0, T is 1 with an 80% chance, 2 with 10% and 3 with 10%.
I want to simulate the value of T for each row and turn the columns T1, T2 and T3 into binary features. I have a solution, but it loops over the rows of the dataframe and is really slow (my real dataframe has over 1 million rows):
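possib = df.columns
for i in range(df.shape[0]):
    # Probabilities for row i, in column order
    probas = df.iloc[i][possib].tolist()
    # Draw one column label according to those probabilities
    choix_transp = np.random.choice(possib, 1, p=probas)[0]
    for pos in possib:
        # Single .loc write; chained df.iloc[i][pos] = ... may assign to a copy
        df.loc[df.index[i], pos] = 1 if pos == choix_transp else 0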
Is there a way to vectorize this code?
Thank you!
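One possible direction, as a rough sketch rather than a definitive answer: draw a single uniform number per row and compare it against the row-wise cumulative probabilities (inverse-CDF sampling), which avoids the Python loop entirely. This assumes every row sums to 1. The names `rng`, `cum`, `u`, `choice`, `one_hot` and `result` are introduced here for illustration only, and the sketch uses `np.random.default_rng` instead of `np.random.seed`, so the draws will not match the loop version for the same seed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"T1": [0.8, 0.5, 0.01],
                   "T2": [0.1, 0.2, 0.89],
                   "T3": [0.1, 0.3, 0.1]})

# Assumption: a Generator replaces the legacy np.random.seed call
rng = np.random.default_rng(42)

probs = df.to_numpy()
# Cumulative probabilities along each row, e.g. [0.8, 0.9, 1.0] for row 0
cum = probs.cumsum(axis=1)
# One uniform draw in [0, 1) per row, shaped (n_rows, 1) to broadcast against cum
u = rng.random((len(df), 1))
# Count the cumulative thresholds at or below the draw: this is the chosen column index
choice = (u >= cum).sum(axis=1)
# Guard against rounding where the last cumulative value falls slightly below 1
choice = np.minimum(choice, probs.shape[1] - 1)

# Build the binary (one-hot) columns in a single indexed assignment
one_hot = np.zeros(probs.shape, dtype=int)
one_hot[np.arange(len(df)), choice] = 1
result = pd.DataFrame(one_hot, columns=df.columns, index=df.index)
```

Each row of `result` then contains exactly one 1, placed in column k with the probability given in that row of `df`. The whole computation works on NumPy arrays, so it should scale to a million rows far better than the row loop, although I have not benchmarked it.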