Databricks Koalas Column Assignment Based on Another Column Value Lambda Function

Question

Given a Koalas DataFrame:

import databricks.koalas as ks

df = ks.DataFrame({"high_risk": [0, 1, 0, 1, 1],
                   "medium_risk": [1, 0, 0, 0, 0]})

Running a lambda function to get a new column based on the existing column values:

df = df.assign(risk=lambda x: "High" if x.high_risk else ("Medium" if x.medium_risk else "Low"))
df
Out[72]: 
   high_risk  medium_risk  risk
0          0            1  High
4          1            0  High
1          1            0  High
2          0            0  High
3          1            0  High

Expected output:

       high_risk  medium_risk  risk
    0          0            1  Medium
    4          1            0  High
    1          1            0  High
    2          0            0  Low
    3          1            0  High

Why does this assign "High" to every row? The intent is to operate on each row; is it looking at the whole column in the comparison?

Is it mandatory to use `assign`? It seems complicated to use it the way you want for now. I can think of a workaround, but I am not sure about the computational cost. — Ben.T, Oct 11 '19 at 14:34
Not mandatory, however, my understanding is that koalas does not support: df["risk"] = df[] for column assignment. — ratchet, Oct 11 '19 at 15:53
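
On the question of why every row ends up as "High": a callable passed to assign is called once with the whole DataFrame, so x.high_risk is an entire column, not one row's value. The sketch below uses plain pandas as a stand-in for Koalas (an assumption, made so that it runs without Spark). pandas raises an error when a column is used in `if`; the output in the question suggests Koalas treated the column object as true instead, but that is an inference from the output and was not checked against the Koalas source.

```python
import pandas as pd

# pandas stand-in for the Koalas frame from the question (assumption).
df = pd.DataFrame({"high_risk": [0, 1, 0, 1, 1],
                   "medium_risk": [1, 0, 0, 0, 0]})

# The callable is invoked once with the whole frame, not once per row,
# so every row receives the same scalar result.
seen = df.assign(arg_type=lambda x: type(x).__name__)

# x.high_risk is therefore a whole column; pandas refuses to use it in `if`.
try:
    df.assign(risk=lambda x: "High" if x.high_risk else "Low")
    raised = False
except ValueError:
    raised = True
```

Either way, the conditional expression is evaluated once for the frame and its single result is broadcast to all rows.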

score 1 · Accepted Answer · answered Oct 11 '19 at 15:57

Using assign on a Koalas DataFrame does not seem easy to me, but for your case I would multiply the column 'high_risk' by 2, add the column 'medium_risk', and finally map the result: 2 becomes 'high' (because that column was multiplied by 2), 1 becomes 'medium', and 0 becomes 'low':

df = df.assign(risk= df.high_risk.mul(2).add(df.medium_risk)
                       .map({0:'low', 1:'medium', 2:'high'}))
df
   high_risk  medium_risk    risk
0          0            1  medium
1          1            0    high
2          0            0     low
3          1            0    high
4          1            0    high

Note: this would fail if a row has 1 in both the high and medium risk columns (the sum would be 3, which is not a key in the mapping).
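
One possible way around that limitation, assuming high risk should win when both flags are set, is to add the combined code 3 to the mapping. This is a minimal sketch in plain pandas as a stand-in for Koalas (an assumption, so it runs without Spark); the last row of the sample data is changed here to have both flags set, for illustration only.

```python
import pandas as pd

# Hypothetical data: the last row has both flags set.
df = pd.DataFrame({"high_risk": [0, 1, 0, 1, 1],
                   "medium_risk": [1, 0, 0, 0, 1]})

# Encode the two flags as 2*high + medium: 0, 1, 2, or 3 (both set).
code = df.high_risk.mul(2).add(df.medium_risk)

# Map 3 to 'high' as well, so that high_risk takes priority.
df = df.assign(risk=code.map({0: "low", 1: "medium", 2: "high", 3: "high"}))
```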

score 0 · Answer 2 · answered Mar 14 '23 at 05:30

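def function1(ss: ks.Series):
    # Called once per row (axis=1); high_risk takes priority over medium_risk.
    if ss.high_risk == 1:
        return "High"
    elif ss.medium_risk == 1:
        return "Medium"
    else:
        return "Low"

# Build the new column row by row, then attach it to the frame as "risk".
col1 = df.apply(function1, axis=1)
df.join(col1.rename("risk"))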

Out:

       high_risk  medium_risk  risk
    0          0            1  Medium
    4          1            0  High
    1          1            0  High
    2          0            0  Low
    3          1            0  High
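
The same priority order (high first, then medium, else low) can also be written column-wise, without a per-row function. The sketch below uses plain pandas as a stand-in (an assumption, so it runs without Spark). Koalas mirrors Series.map and Series.where, but this was not run against Koalas.

```python
import pandas as pd

# pandas stand-in for the Koalas frame from the question (assumption).
df = pd.DataFrame({"high_risk": [0, 1, 0, 1, 1],
                   "medium_risk": [1, 0, 0, 0, 0]})

# Start from the lower-priority flag: 1 -> "Medium", 0 -> "Low".
base = df.medium_risk.map({1: "Medium", 0: "Low"})

# Keep that value where high_risk is 0; otherwise override it with "High".
result = df.assign(risk=base.where(df.high_risk == 0, "High"))
```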
