
I'm learning about XAI and I have a question about the derivative of the network. Assume I have a CNN model that gives 4 outputs representing 4 classes, and I have one target layer (L) from which I want to extract information when I pass an image through the model. When I take the derivative of one output with respect to L, I get a gradient matrix with the same shape as the feature map. What does that matrix represent? For example: the feature map at L has shape [256, 40, 40], and so does the gradient matrix.

model(I) ---> [p1, p2, p3, p4]
p4.backward()
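
In PyTorch you can capture that gradient matrix explicitly with a forward hook. Here is a minimal sketch, assuming a ResNet-18 backbone with layer4 standing in for the target layer L (both are placeholders for whatever model and layer you actually use):

    import torch
    import torchvision.models as models

    model = models.resnet18(num_classes=4).eval()
    target_layer = model.layer4          # stand-in for the target layer L

    features = {}

    def save_activation(module, inputs, output):
        output.retain_grad()             # keep the gradient of this non-leaf tensor
        features["L"] = output

    handle = target_layer.register_forward_hook(save_activation)

    I = torch.randn(1, 3, 224, 224)      # stand-in for a real input image
    p = model(I)                         # shape [1, 4] -> [p1, p2, p3, p4]
    p[0, 3].backward()                   # derivative of p4

    grad = features["L"].grad           # same shape as the feature map at L
    handle.remove()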

1 Answer


The gradient with respect to the feature map tells you how (and where) changes to the feature map have the biggest impact on the output, i.e., the prediction. For example, if you have an image of class car, i.e., the image shows a car under a blue sky, you would expect the features extracted from the blue sky to have little impact (small gradients), while the area showing the car shows large gradients, since changing those activations would change the output with the least effort.
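
Concretely, you can collapse the [256, 40, 40] gradient matrix into a single 40x40 map by aggregating the gradient magnitude over the channel dimension. A quick sketch (the random tensor here is just a placeholder for the gradient you actually computed):

    import torch

    grad = torch.randn(256, 40, 40)      # placeholder for the real gradient at L
    saliency = grad.abs().mean(dim=0)    # [40, 40]; large values mark influential regions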

XAI methods such as Grad-CAM yield so-called attribution maps that show which areas (of an input) are responsible for an output. Gradients are sometimes multiplied with activations to obtain a "better" relevance score. It is even possible for networks to learn from such gradients of feature maps, though this is a non-standard procedure (see "Reflective-Net: Learning from Explanations" by Schneider et al.).
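
For illustration, a Grad-CAM-style combination of gradients and activations looks roughly like this (a sketch, assuming acts and grads were captured at layer L as above; both tensors are placeholders):

    import torch
    import torch.nn.functional as F

    acts = torch.randn(1, 256, 40, 40)   # feature map A_k at L (placeholder)
    grads = torch.randn(1, 256, 40, 40)  # gradient of the target score w.r.t. A_k (placeholder)

    weights = grads.mean(dim=(2, 3), keepdim=True)   # alpha_k: channel-wise gradient average
    cam = F.relu((weights * acts).sum(dim=1))        # weighted sum of activations, then ReLU
    cam = cam / (cam.max() + 1e-8)                   # normalize to [0, 1] for overlaying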

J.T.