
I'm looking at the Iris dataset, where I have calculated SHAP values for my X_test dataset. I have provided the first five rows of each array as an example:

    [
        array([[-0.02994951, -0.00631915, -0.11904487, -0.13368648],
               [-0.00344951,  0.06718085,  0.24445513,  0.40281352],
               [-0.02701866, -0.00925   , -0.084     , -0.16873134],
               [-0.02994951, -0.00631915, -0.11904487, -0.13368648],
               [-0.03526866, -0.001     , -0.11904487, -0.13368648]]), 
    
        array([[ 0.02296024,  0.0191085 ,  0.27049242,  0.31693884],
               [ 0.02209713, -0.0431662 , -0.12745271, -0.20947822],
               [-0.0270254 , -0.0025275 , -0.10235476, -0.22609234],
               [ 0.03241468,  0.04532274,  0.25367799,  0.29808459],
               [ 0.04827892, -0.00105323,  0.13303134,  0.36174298]]), 
        
        array([[ 0.00698927, -0.01278935, -0.15144755, -0.18325236],
               [-0.01864762, -0.02401465, -0.11700243, -0.1933353 ],
               [ 0.05404406,  0.0117775 ,  0.18635476,  0.39482368],
               [-0.00246517, -0.03900359, -0.13463313, -0.16439811],
               [-0.01301026,  0.00205323, -0.01398647, -0.2280565 ]])
    ]

I have the following expected values:

EV = [0.289 0.358 0.353]

As an example, for the first row in array 0, I add the expected value for class 0 to the sum of row 0's SHAP values, and then I can see whether the contributions for a given sample add up to 0 or 1.

sv_0_row = sv_0.iloc[0, :]
print(sv_0_row.sum())                                 # -0.28900000000000003
print(sv_0_row.sum() + explainer.expected_value[0])   # 0.0

This results in 0, which I think makes sense. But with binary classification, SHAP values come as two arrays whose values mirror each other. For example, the SHAP values for one sample in some arbitrary dataset might be:

shap_values[0] = [0.013, 0.423, 0.245, -0.0123] 

and

shap_values[1] = [-0.013, -0.423, -0.245, 0.0123] 
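
(A runnable sketch of this mirroring, using a RandomForestClassifier restricted to two of the Iris classes; note that with older SHAP versions, as used throughout this post, shap_values returns a list of two per-class arrays:)

import numpy as np
from shap import TreeExplainer
from shap.datasets import iris
from sklearn.ensemble import RandomForestClassifier

X, y = iris()
mask = y < 2                                     # keep only classes 0 and 1
clf = RandomForestClassifier(random_state=0).fit(X[mask], y[mask])
sv_bin = TreeExplainer(clf).shap_values(X[mask])
# with the list-of-arrays return, the two class arrays mirror each other
print(np.allclose(np.array(sv_bin[0]), -np.array(sv_bin[1])))  # True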

But how does this concept work with multi-class? In the three arrays provided for Iris I don't see this reflection, so how can I interpret the SHAP values output in multi-class situations?

1 Answer

Let's try a reproducible example:

import numpy as np
from lightgbm import LGBMClassifier
from shap.datasets import iris
from shap import Explainer, Explanation
from shap import waterfall_plot
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(*iris(), random_state=42)
model = LGBMClassifier().fit(X_train, y_train)
explainer = Explainer(model)
sv = np.array(explainer.shap_values(X_test))  # <-- SHAP values
ev = np.array(explainer.expected_value)       # <-- base values

What you get here are the base values per class:

ev

array([-3.5596808 , -1.08989253, -3.20879218])

on top of which you'll add the SHAP values. Note the shapes:

ev.shape, sv.shape

((3,), (3, 38, 4))

where:

  • 3 corresponds to the number of classes
  • 38 to the number of samples
  • 4 to the number of features for which you want SHAP values.

Then, you might be interested in how to explain a particular prediction:

idx = 0
model.predict(X_test.iloc[[idx]], raw_score=True)

array([[-8.10103813,  1.50946338, -3.6985032 ]]) 
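
These raw scores are logits; as a quick aside (using softmax from scipy.special, an import not shown in the answer above), applying softmax to them should recover predict_proba:

from scipy.special import softmax

raw = model.predict(X_test.iloc[[idx]], raw_score=True)
# softmax maps the per-class raw scores (logits) to probabilities
print(softmax(raw, axis=1))
print(model.predict_proba(X_test.iloc[[idx]]))  # should match the softmax output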

You'll get the same predictions if you add the SHAP values for the data point of interest to the base values:

ev + sv[:, idx, :].sum(-1) 

array([-8.10103813,  1.50946338, -3.6985032 ])
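
The same additivity holds for all samples at once; a minimal vectorized check, assuming the setup above:

raw_all = model.predict(X_test, raw_score=True)      # shape (38, 3)
# per-class base value + per-class SHAP sums reconstruct every raw score
print(np.allclose(raw_all, ev + sv.sum(axis=-1).T))  # True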

Or visually:

cls = 0  # class of interest
waterfall_plot(Explanation(sv[cls][idx], ev[cls], feature_names=X_train.columns))

[waterfall plot of the class-0 SHAP values for this sample]

where cls = 0 selects the class of interest.
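
To explain the class the model actually predicts for this sample instead, you could pick cls from the raw scores; a small sketch (the argmax selection is an assumption, not part of the original answer):

# index of the class with the highest raw score, i.e. the predicted class
cls = int(np.argmax(model.predict(X_test.iloc[[idx]], raw_score=True)))
waterfall_plot(Explanation(sv[cls][idx], ev[cls], feature_names=X_train.columns))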

PS

Most probably you'll get slightly different results due to the randomness built into the LGBM classifier.

  • Thanks, a few questions: what is this raw_score parameter? I can't really find it in sklearn; I just tried searching for it. `model.predict(X_test.iloc[[idx]], raw_score=True)` – Plewis May 10 '22 at 19:14
  • Boosted trees are built in raw space, or logits, as they are called for ANNs. Raw scores are then converted to probability space via a sigmoid for binary classification or softmax for multiclass. You may try importing softmax from scipy.special and see that the softmax of the raw scores is indeed equal to predict_proba. – Sergey Bushmanov May 10 '22 at 19:17
  • Aha, that is pretty cool. My follow-up question is that I still don't quite understand how to interpret the three different arrays of SHAP values. I understand the concept of adding a given point's sv to the ev, but the same point will have different SHAP values depending on whether sv[0], sv[1], or sv[2] is used. I suppose for >= 3 classes it does not work the same way as with a binary clf, where sv[0] and sv[1] reflect each other? – Plewis May 10 '22 at 19:23
  • You add up the SHAP values per class to that class's base value. – Sergey Bushmanov May 10 '22 at 19:26
  • Right, that makes sense, but what about this interpretation: if I have a data point where the predicted class is 1, then I take the EV for class 1, add the sum of sv[1], and get 1. Is it then possible to state that when many classes exist, the SHAP values for a point become smaller and smaller in all the other sv arrays? So, given point A, predicted as class 2, the SHAP values should be fairly high in sv[2] at point A's index, but if we have 4 classes, the remaining SHAP values for classes 1, 3, and 4 will all get lower the more classes are added? – Plewis May 10 '22 at 20:19
  • SHAP values are meant to add up to the actual predictions. Whether you have 2, 3, or any other number of classes doesn't matter. They are what you call "reflected" for binary classification for a good reason: what moves the prediction towards class 1 moves the prediction towards class 0 by exactly the same amount. If you consider multiclass from a similar perspective, calculated as OVR (one-vs-rest), then yes: (1) the contribution for one class and for all the others should have opposite signs, and (2) the contributions for all the others should be scaled reciprocally. – Sergey Bushmanov May 11 '22 at 03:25