Introduction and comparison to CAM, Grad-CAM, Guided back propagation and Guided Grad-CAM
Size: 7.56 MB
Language: en
Added: Sep 19, 2021
Slides: 24 pages
Introduction to Grad-CAM
Advisor: Henry Horng-Shing Lu
Student: Jane Hsing-Chuan Hsieh
Date: 2021-08-10
Research context
Concern: Model Transparency & Interpretability
- Despite unprecedented breakthroughs of CNNs in a variety of computer vision tasks, their lack of decomposability into individually intuitive components makes them hard to interpret.
Purpose: Visualizing CNNs
- Visualize CNN predictions by highlighting 'important' pixels (i.e., pixels whose intensity changes have the most impact on the prediction score).
Help Users Build Trust in AI
- We must build 'transparent' models that have the ability to explain why they predict what they predict.
Research context
What makes a good visual explanation?
- Class-discriminative: localizes the category in the image
  - Class Activation Mapping (CAM)
  - Gradient-weighted Class Activation Mapping (Grad-CAM)
- High-resolution: captures fine-grained detail (pixel-space gradient visualizations), but not class-discriminative
  - Guided Backpropagation
  - Deconvolution
- Both: Guided Grad-CAM
Outline
Brief Introduction to CNN Visualizing Tools
- CAM
- Grad-CAM
- Guided Backpropagation
- Guided Grad-CAM
1. Introduction
- CAM
- Grad-CAM
- Guided Backpropagation
- Guided Grad-CAM
Preface
- Convolutional layers of CNNs actually behave as object detectors (i.e., they localize objects), even though no supervision on the location of the object is provided.
- In other words, convolutional layers naturally retain spatial information.
- E.g., for action classification, a CNN is able to localize the discriminative regions as the objects that the humans are interacting with, rather than the humans themselves.
Preface
- However, this ability (spatial information / object detection) is lost in fully-connected layers.
- The higher the convolutional layer, the higher the level of semantics extracted.
- So we expect the last convolutional layer to offer the best combination of high-level semantics and detailed spatial information.
CAM
- For a particular category (c), a Class Activation Map (CAM) indicates the discriminative image regions used by the CNN to identify that category.
Characteristics
- Replaces fully-connected layers with a global average pooling (GAP) layer to minimize the number of parameters while maintaining high performance.
- GAP acts as a structural regularizer, preventing overfitting during training.
CAM
CNN Architecture
- For each feature map (f_k) at the last convolutional layer, GAP outputs its spatial average: F_k = Σ_{x,y} f_k(x, y)
- For a given class (c), the input for the output layer: S_c = Σ_k w_k^c F_k (w_k^c: importance of F_k for class c)
- Output score for class c: P_c = exp(S_c) / Σ_{c'} exp(S_{c'}) (e.g., softmax)
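As a minimal sketch of the GAP-then-output-layer computation above (plain Python, with hypothetical toy feature maps and weights, not the paper's code):

```python
import math

def gap(feature_map):
    """Global average pooling: spatial average F_k of one 2-D feature map."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

def class_scores(feature_maps, weights):
    """S_c = sum_k w_k^c * F_k for each class c, then softmax probabilities."""
    F = [gap(fm) for fm in feature_maps]                         # F_k
    S = [sum(w_k * F_k for w_k, F_k in zip(w_c, F)) for w_c in weights]
    exp_S = [math.exp(s) for s in S]                             # softmax
    total = sum(exp_S)
    return S, [e / total for e in exp_S]

# Toy example: two 2x2 feature maps, two classes.
fmaps = [[[1.0, 3.0], [5.0, 7.0]],   # F_0 = 4.0
         [[0.0, 2.0], [4.0, 6.0]]]   # F_1 = 3.0
w = [[0.5, -0.5],                    # weights w_k^c for class 0
     [-0.5, 0.5]]                    # weights w_k^c for class 1
scores, probs = class_scores(fmaps, w)
print(scores)  # [0.5, -0.5]
```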
CAM
CAM Procedure
- The weights (w_1^c, w_2^c, …, w_n^c) of the output layer indicate the importance of the image regions (f_k) to a specific class (c).
- Compute CAM: M_c(x, y) = Σ_k w_k^c f_k(x, y)
- Note: if the shape (H, W) of the CAM (M_c) is different from that of the input images, up-sampling is needed to equalize the shapes.
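The weighted sum M_c above can be sketched in plain Python (toy feature maps and weights are illustrative assumptions):

```python
def cam(feature_maps, class_weights):
    """M_c(x, y) = sum_k w_k^c * f_k(x, y): class-weighted sum of feature maps."""
    H, W = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(w * fm[i][j] for w, fm in zip(class_weights, feature_maps))
             for j in range(W)] for i in range(H)]

# Two 2x2 feature maps; weights w_k^c for one class c.
fmaps = [[[1.0, 3.0], [5.0, 7.0]],
         [[0.0, 2.0], [4.0, 6.0]]]
heatmap = cam(fmaps, [1.0, 0.5])
print(heatmap)  # [[1.0, 4.0], [7.0, 10.0]]
```

High values in the resulting map mark the image regions most important for class c; in practice the map is then up-sampled to the input resolution.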
CAM: Properties
- CAM trades off model complexity and performance (via global average pooling (GAP)) for more transparency.
Shortcoming
- To apply CAM, any CNN-based network must change its architecture: GAP is a must before the output layer.
- i.e., architectural changes, and hence re-training, are needed.
Grad-CAM
- Gradient-weighted Class Activation Mapping (Grad-CAM) generalizes CAM to a wide variety of CNN-based architectures, i.e., without requiring architectural changes or re-training.
Characteristics
- Without a GAP layer, we need another way to define the weights: Grad-CAM uses the gradients of any target concept (y^c) (e.g., 'dog' in a classification network) flowing into the final convolutional layer, and derives summary statistics from them to represent the weights (importance).
Source: Selvaraju, Ramprasaath R., et al. "Grad-CAM: Visual explanations from deep networks via gradient-based localization." Proceedings of the IEEE International Conference on Computer Vision. 2017.
Grad-CAM Procedure
- For a given class c, compute the gradient of its score y^c (before the softmax) w.r.t. the feature map activations A^k of a convolutional layer, i.e., ∂y^c/∂A^k
- Define the importance weight of feature map A^k via GAP: α_k^c = (1/Z) Σ_i Σ_j ∂y^c/∂A^k_{ij} (influence of A^k on y^c)
Grad-CAM Procedure
- Compute Grad-CAM: L^c_{Grad-CAM} = ReLU(Σ_k α_k^c A^k)
- ReLU is applied because we are only interested in the features (neurons) that have a positive influence on the class of interest, i.e., pixels whose intensity should be increased in order to increase y^c
- Note: if the shape (u, v) of L^c_{Grad-CAM} is different from that of the input images, up-sampling is needed to equalize the shapes.
Grad-CAM: Properties
- Grad-CAM generates visual explanations for a wide variety of CNN-based networks without requiring architectural changes or re-training.
Grad-CAM: Properties
- Grad-CAM can help identify biases in a dataset.
- Models trained on biased datasets may not generalize to real-world scenarios, or worse, may perpetuate biases and stereotypes (w.r.t. gender, race, age, etc.).
- E.g., for a "doctor" vs. "nurse" binary classification task:
  - A biased model had learned to look at the person's face and hairstyle to distinguish nurses from doctors, thus learning a gender stereotype.
  - An unbiased model made the right prediction by looking at the white coat and the stethoscope.
Grad-CAM: Properties
Shortcoming
- The localization map (heatmap) generated by Grad-CAM (and CAM) is coarse (low-resolution): it does not make sufficiently clear why the network predicts a particular instance (e.g., "tiger cat").
- Guided Backpropagation is another approach that provides a high-resolution map, i.e., fine-grained detail, or a pixel-space gradient visualization.
Guided Backpropagation: Another Approach
- Guided Backpropagation visualizes gradients of the network's prediction (i.e., the output neuron) w.r.t. the input image.
- This determines which pixels need to be changed the least to affect the prediction the most (i.e., pixels with higher absolute gradients).
- Negative gradients are suppressed through ReLU when backpropagating, because we are only interested in the pixels that increase the activation of the output neuron, rather than suppress it.
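The suppression rule above can be sketched for a single ReLU unit in plain Python (a simplified illustration on a flat list of values, not a full backprop implementation):

```python
def guided_relu_backward(forward_input, grad_output):
    """Guided backprop through one ReLU: a gradient passes only where the
    forward input was positive (the usual ReLU rule) AND the incoming
    gradient is positive (the extra 'guidance' signal from above)."""
    return [g if (x > 0 and g > 0) else 0.0
            for x, g in zip(forward_input, grad_output)]

# Toy values: forward-pass inputs to the ReLU, and gradients flowing back.
x = [1.0, -2.0, 3.0, 0.5]
g = [0.7, 0.9, -0.1, 0.2]
print(guided_relu_backward(x, g))  # [0.7, 0.0, 0.0, 0.2]
```

The second and third entries are zeroed for different reasons: the second because the unit was inactive in the forward pass, the third because the incoming gradient was negative.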
Guided Backpropagation: Properties
- Guided Backpropagation is high-resolution since it derives gradients directly w.r.t. the input image, instead of w.r.t. the last convolutional layer (as Grad-CAM does).
Shortcoming
- Not class-discriminative.
- Guided Grad-CAM combines Guided Backpropagation and Grad-CAM, and thus becomes class-discriminative.
Guided Grad-CAM
Characteristics
- Guided Grad-CAM is both high-resolution and class-discriminative.
Procedure
- Fuse Guided Backpropagation with Grad-CAM (point-wise multiplication of the two maps at input resolution) to create Guided Grad-CAM visualizations.
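The fusion step can be sketched in plain Python: up-sample the coarse Grad-CAM map to the input resolution, then multiply element-wise with the guided-backpropagation map. The maps below are hypothetical toy values, and nearest-neighbor up-sampling stands in for the interpolation used in practice:

```python
def upsample_nearest(mat, H, W):
    """Nearest-neighbor up-sampling of a small 2-D map to H x W."""
    h, w = len(mat), len(mat[0])
    return [[mat[i * h // H][j * w // W] for j in range(W)] for i in range(H)]

def guided_grad_cam(guided_bp, grad_cam_map):
    """Point-wise product of the up-sampled Grad-CAM map and the
    guided-backpropagation map, both at input resolution."""
    H, W = len(guided_bp), len(guided_bp[0])
    up = upsample_nearest(grad_cam_map, H, W)
    return [[guided_bp[i][j] * up[i][j] for j in range(W)] for i in range(H)]

gbp = [[0.2, 0.0, 0.5, 0.1],           # 4x4 guided-backprop map (toy)
       [0.0, 0.3, 0.0, 0.4],
       [0.6, 0.0, 0.2, 0.0],
       [0.0, 0.1, 0.0, 0.3]]
L = [[0.0, 1.0], [2.0, 0.0]]           # coarse 2x2 Grad-CAM map (toy)
print(guided_grad_cam(gbp, L))
```

The product keeps the fine detail of guided backprop only where Grad-CAM assigns class relevance, which is what makes the result both high-resolution and class-discriminative.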
Guided Grad-CAM: Properties
- Guided Grad-CAM also helps untrained users successfully discern a 'stronger' network from a 'weaker' one, even when both make identical predictions.
- (Figure: two models, A vs. B, with the same prediction accuracies; stronger network vs. weaker network.)
Thank you for your attention
Guided Backpropagation
Guided Backpropagation: Properties
- Guided backpropagation adds an additional guidance signal from the higher layers to usual backpropagation.
- This prevents the backward flow of negative gradients, which correspond to the neurons that decrease the activation of the higher-layer unit we aim to visualize.