240527_Thanh_LabSeminar[Transitivity Recovering Decompositions: Interpretable and Robust Fine-Grained Relationships].pptx
About This Presentation
Transitivity Recovering Decompositions: Interpretable and Robust Fine-Grained Relationships
Size: 6.8 MB
Language: en
Added: Jun 03, 2024
Slides: 33 pages
Slide Content
Transitivity Recovering Decompositions: Interpretable and Robust Fine-Grained Relationships
Tien-Bach-Thanh Do, Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea
E-mail: osfa19730@catholic.ac.kr
2024/05/27
Paper: Abhra Chaudhuri et al., NeurIPS 2023
Introduction
Problem: The paper addresses abstract relational representations in fine-grained representation learning. These representations, while effective, are not easily interpretable.
Background: Recent advances in fine-grained representation learning have achieved state-of-the-art results by leveraging local-to-global (emergent) relationships. However, the relational representations these methods use are abstract and hard to interpret.
Question: Can the abstraction of relational representations be deconstructed and expressed as interpretable graphs over image views?
Approach: The authors theoretically show that abstract relational representations are a way of recovering transitive relationships among local views. Based on this, they design Transitivity Recovering Decompositions (TRD), a graph-space search algorithm that identifies interpretable equivalents of abstract emergent relationships at both the instance and class levels, without any post-hoc computation.
Answer: Yes; the results show that relational representations can be deconstructed and expressed as interpretable graphs over image views.
Conclusion: TRD is a robust and interpretable method for deconstructing the abstraction of relational representations in fine-grained representation learning.
Related Works: Fine-Grained Visual Categorization
Prior work relies on localized image features and on relationships between images and between network layers.
These methods leverage CNN feature-map activations or identify discriminative sequences of parts via boosting, kernel pooling, and channel masking.
The proposed method goes beyond the abstractions inherent in such approaches, presenting a fully transparent pipeline that expresses all computations in terms of graph representation learning (GRL) at both the instance and class level.
Related Works: Relation Modeling in Deep Learning
Relationships between entities are an important source of semantic information, as demonstrated in GRL.
Existing works leverage relational information for representation learning in two main ways: local views are combined into the global view through a transformer, with the most informative combination distilled into a summary embedding, or the underlying relations are modeled with a GNN whose outputs are then aggregated.
Both approaches produce vector-valued outputs that cannot be decoded in a straightforward way to reveal the underlying emergent relationships between views. This lack of transparency is what the authors call "abstract".
Related Works: Explainability for Graph Neural Networks
GNNExplainer was the first framework for explaining GNN predictions, identifying the maximally informative subgraphs and feature subsets that influence a prediction; later work explored candidate subgraphs via Monte Carlo tree search, ranking them by their Shapley values.
These methods provide either local or global explanations, so PGExplainer presented a parameterized approach that gives generalized, class-level explanations for GNNs.
[48] proposed a GNN explanation approach from a causal perspective, leading to better generalization and faster inference.
[3] improved explanation robustness by modeling the decision regions induced by similar graphs while ensuring counterfactuality.
ViG represents images as a graph of their views, but its computation pipeline is abstract and lacks explainability.
Transitivity Recovering Decompositions: Setup
Consider an image x ∈ X with a categorical label y ∈ Y from an FGVC task.
Let g = c_g(x) and L = {l_1, l_2, …, l_k} = c_l(x) be the global view and the set of local views of x, and V = {g} ∪ L, where c_g and c_l are cropping functions applied to x.
f is a semantically consistent, relation-agnostic encoder. The representations of the global and local views obtained from f are z_g = f(g) and Z_L = {f(l) : l ∈ L} = {z_l1, z_l2, …, z_lk}, jointly denoted Z_V = {z_g, z_l1, …, z_lk}.
E is a function that encodes the relationship r ∈ R^n between the global view g and the set of local views L.
G is a set of graphs {(V, E_1, F_V1, F_E1), …}, where each graph's node set is the view set V and its edge set E_i ∈ P(V × V), with P denoting the power set; |G| = |P(V × V)| means G is the set of all possible graph topologies with V as the node set.
(A minimal sketch of this view-extraction setup follows below.)
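A minimal sketch of the setup above, with a ResNet-50 standing in for the relation-agnostic encoder f (consistent with the implementation details later in the deck). The crop coordinates, view count k, and the use of the whole image as the global view are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torchvision.transforms.functional as TF
from torchvision.models import resnet50

def crop_views(x, k=4, local_size=28, out_size=224):
    """c_g and c_l: return the global view g and k random local views L."""
    # The whole image stands in for the global crop (an assumption);
    # the paper derives g from thresholded feature-map activations instead.
    g = TF.resize(x, [out_size, out_size])
    _, H, W = x.shape
    local_views = []
    for _ in range(k):
        top = torch.randint(0, H - local_size + 1, (1,)).item()
        left = torch.randint(0, W - local_size + 1, (1,)).item()
        l = TF.resized_crop(x, top, left, local_size, local_size,
                            [out_size, out_size])
        local_views.append(l)
    return g, local_views

# f: a relation-agnostic encoder; each view is embedded independently,
# so no relational information enters at this stage.
f = resnet50(weights=None)
f.fc = torch.nn.Identity()   # use the pooled features as embeddings
f.eval()

x = torch.rand(3, 256, 256)  # a dummy image
g, L = crop_views(x)
with torch.no_grad():
    z_g = f(g.unsqueeze(0))                          # global embedding
    Z_L = torch.cat([f(l.unsqueeze(0)) for l in L])  # local embeddings
Z_V = torch.cat([z_g, Z_L])  # node features for the view graph
```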
Transitivity Recovering Decompositions: Decomposing Relational Representations
For a relational representation r that minimizes the information gap I(x; y | Z_V), there exists a metric space (M, d) that defines a Semantic Relevance Graph (and a Proxy/Concept Graph): proxies can themselves be represented by graphs.
The graph G* underlying r is the member of G with maximal label information.
Node embeddings in G* capture the relation-agnostic node embeddings; E is able to further reduce uncertainty about the label through the joint observation of view pairs (the edges of G*).
Transitivity Recovering Decompositions: Transitivity Recovery
(Emergence) The degree to which a set of local views (v_1, …, v_k) contributes towards forming the global view, quantified as I(v_1, …, v_k; g).
(Transitivity) Local views v_1, v_2, v_3 are transitively related iff, whenever any two view pairs have a high contribution towards emergence, the third pair also has a high contribution, where (i, j, k) ∈ {1, 2, 3} and τ is some threshold for emergence (a toy check of this condition is sketched below).
A function φ is transitivity recovering if the transitivity among v_1, v_2, v_3 can be quantified from its outputs; by being transitivity recovering, φ reduces the uncertainty I(v_i, v_j; g | φ(v_i), φ(v_j)).
Reducing the uncertainty about transitivity among the set of views is equivalent to reducing the uncertainty about the emergent relationships.
Transitivity is the key property in graph space that filters out degenerate subsets of L, identifying view triplets in which all elements contribute towards emergence.
Since relational information must be leveraged to learn a sufficient representation, a classifier operating in graph space must leverage transitivity to match instances and proxies.
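As a toy illustration, the transitivity condition can be checked over all view triplets given a matrix of pairwise emergence scores. The scores and threshold below are made up, and estimating I(v_i, v_j; g) itself is a separate problem outside this sketch.

```python
import itertools

def is_transitive(scores, tau):
    """Check: for every view triplet, if two pairwise contributions to
    emergence exceed the threshold tau, the third must exceed it too."""
    n = len(scores)
    for i, j, k in itertools.combinations(range(n), 3):
        pairs = [scores[i][j], scores[j][k], scores[i][k]]
        # Exactly two high pairs and one low pair violates transitivity.
        if sum(p >= tau for p in pairs) == 2:
            return False
    return True

# Symmetric toy emergence scores for three local views.
scores = [[0.0, 0.9, 0.8],
          [0.9, 0.0, 0.85],
          [0.8, 0.85, 0.0]]
print(is_transitive(scores, tau=0.5))   # True: all three pairs are high
```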
Transitivity Recovering Decompositions: Generating Interpretable Relational Representations
The core idea is to ensure relational transparency by expressing all computations leading to classification, all the way up to the class proxy, on a graph of image views, decomposing the proxy into a concept graph.
The sufficiency of the instance and proxy graphs is ensured through transitivity recovery via Hausdorff Edit Distance minimization.
Transitivity Recovering Decompositions: Generating Interpretable Relational Representations (Complementarity Graph)
Obtain the relation-agnostic representations Z_V from the input image x by encoding each of its views v ∈ V as f(v).
Obtain the complementarity graph G_c ∈ G with node features F_V = Z_V, where each edge's value is the mutual information I(z_i; z_j) between the view embeddings it connects.
Instantiate the edge features F_E as learnable n-dimensional vectors scaled by 1 / I(z_i; z_j).
Local views with low mutual information are complementary to each other, while those with high MI carry a lot of redundancy.
This inductive bias strengthens the connections among complementary pairs and suppresses the flow of redundant information during message passing, reducing the search space in G for finding G* (a sketch of the weighting follows below).
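A minimal sketch of the complementarity weighting. The cosine-based similarity is only a stand-in for the paper's mutual-information estimate (an assumption), but it shows the inverse-MI inductive bias: redundant pairs are down-weighted, complementary pairs strengthened.

```python
import torch
import torch.nn.functional as F

def complementarity_weights(Z_V, eps=1e-6):
    """Z_V: (num_views, dim) relation-agnostic embeddings.
    Returns dense edge weights w_ij proportional to 1 / I(z_i; z_j)."""
    z = F.normalize(Z_V, dim=-1)
    sim = (z @ z.T).clamp(min=0)   # proxy for I(z_i; z_j) -- an assumption
    w = 1.0 / (sim + eps)          # low MI (complementary) => strong edge
    w.fill_diagonal_(0)            # no self-loops
    return w

Z_V = torch.randn(5, 128)          # 1 global + 4 local view embeddings
W = complementarity_weights(Z_V)   # (5, 5) complementarity edge weights
```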
Transitivity Recovering Decompositions: Generating Interpretable Relational Representations (Semantic Relevance Graph)
Compute G_s by propagating G_c through a GAT; the node embeddings obtained from the GAT correspond to φ(z) (see the sketch below).
φ is made transitivity recovering by minimizing the Learnable Hausdorff Edit Distance between the instance and proxy graphs.
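A hedged sketch of the GAT propagation using PyTorch Geometric, following the 8-layer / 4-head / GraphNorm configuration from the implementation details later in the deck; the embedding width and the fully connected view-graph topology are assumptions.

```python
import torch
from torch_geometric.nn import GATConv, GraphNorm

class SemanticRelevanceGNN(torch.nn.Module):
    def __init__(self, dim=128, heads=4, layers=8):
        super().__init__()
        self.convs = torch.nn.ModuleList()
        self.norms = torch.nn.ModuleList()
        for _ in range(layers):
            # concat=False averages the heads, keeping width at `dim`.
            self.convs.append(GATConv(dim, dim, heads=heads, concat=False))
            self.norms.append(GraphNorm(dim))

    def forward(self, x, edge_index):
        for conv, norm in zip(self.convs, self.norms):
            x = torch.relu(norm(conv(x, edge_index)))
        return x   # phi(z): node embeddings of the Semantic Relevance Graph

# Fully connected view graph over 5 views (1 global + 4 local).
n = 5
edge_index = torch.tensor(
    [[i, j] for i in range(n) for j in range(n) if i != j]).T
phi_z = SemanticRelevanceGNN()(torch.randn(n, 128), edge_index)
```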
Transitivity Recovering Decompositions: Generating Interpretable Relational Representations (Proxy/Concept Graph)
Obtain a proxy/concept graph for each class via online clustering of the Semantic Relevance node and edge embeddings (φ_n(z), φ_e(z)) using the Sinkhorn-Knopp algorithm (sketched below).
The number of node clusters is |V|, and the number of edge clusters is |V|(|V| − 1)/2.
The set of class proxy graphs is P = {G_P1, …, G_Pk}, where k is the number of classes.
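The Sinkhorn-Knopp step can be sketched as a balanced soft assignment of embeddings to cluster prototypes, in the style of SwAV-like online clustering. The prototype count, temperature, and iteration budget below are assumptions, not the paper's settings.

```python
import torch

def sinkhorn_knopp(scores, n_iters=3, eps=0.05):
    """scores: (batch, n_clusters) similarity logits.
    Returns a doubly-normalized soft assignment matrix."""
    Q = torch.exp(scores / eps).T              # (n_clusters, batch)
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True); Q /= K   # equal mass per cluster
        Q /= Q.sum(dim=0, keepdim=True); Q /= B   # one unit per sample
    return (Q * B).T                           # rows sum to 1

emb = torch.nn.functional.normalize(torch.randn(32, 128), dim=-1)
prototypes = torch.nn.functional.normalize(torch.randn(10, 128), dim=-1)
assign = sinkhorn_knopp(emb @ prototypes.T)    # (32, 10) soft assignments
```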
Transitivity Recovering Decompositions: Generating Interpretable Relational Representations (Inference and Learning Objective)
To recover end-to-end explainability, the method avoids converting the instance and proxy graphs to vector-valued embeddings; matching between G_s and G_P is instead performed via a graph kernel.
Graph Edit Distance is chosen as the kernel; the Learnable HED (L-HED) adds two learnable costs, for node insertions and deletions, computed with a GNN, where α = 1/(2|V|) and h(·, ·) is the distance metric used in the Proxy Anchor Loss (a simplified sketch follows below).
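Since the paper's L-HED involves GNN-computed costs and the full HED recursion, the sketch below simplifies it to a differentiable Hausdorff-style set distance over node embeddings with learnable insertion/deletion costs. The nearest-neighbor matching rule and the scalar cost parameters are assumptions, not the authors' exact formulation.

```python
import torch

class SoftHED(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Learnable node insertion / deletion costs (shape is an assumption).
        self.c_ins = torch.nn.Parameter(torch.tensor(1.0))
        self.c_del = torch.nn.Parameter(torch.tensor(1.0))

    def forward(self, Zs, Zp):
        """Zs: (n, d) instance node embeddings; Zp: (m, d) proxy node embeddings."""
        cost = torch.cdist(Zs, Zp)   # pairwise substitution costs
        # Each node matches its cheapest counterpart, capped by the cost
        # of deleting (instance side) or inserting (proxy side) it instead.
        d_sp = torch.minimum(cost.min(dim=1).values, self.c_del).sum()
        d_ps = torch.minimum(cost.min(dim=0).values, self.c_ins).sum()
        return (d_sp + d_ps) / (Zs.shape[0] + Zp.shape[0])

hed = SoftHED()
dist = hed(torch.randn(5, 128), torch.randn(5, 128))  # graph dissimilarity
dist.backward()   # differentiable, so usable inside a metric-learning loss
```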
Experimental Results: Implementation Details
The global view is the smallest bounding box containing the largest connected component of the thresholded final-layer feature map of an ImageNet-1K-pretrained ResNet-50, which also serves as the relation-agnostic encoder f (a sketch of this step follows below). The global view is resized to 224×224.
64 local views are obtained by randomly cropping 28×28 regions within the global crop and resizing them to 224×224.
The Semantic Relevance Graph is obtained with an 8-layer GAT with 4 attention heads in each hidden layer, normalized via GraphNorm.
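A sketch of the global-view selection, assuming the final feature map has been aggregated over channels and is thresholded at its mean (the exact thresholding rule is an assumption). The returned box is in feature-map coordinates and would be scaled back onto the input image.

```python
import numpy as np
from scipy import ndimage

def global_view_bbox(feat_map):
    """feat_map: (H, W) channel-aggregated final-layer activations.
    Returns (top, left, bottom, right) of the largest connected component."""
    mask = feat_map > feat_map.mean()        # assumed threshold
    labels, n = ndimage.label(mask)
    if n == 0:                               # fall back to the full map
        return 0, 0, feat_map.shape[0], feat_map.shape[1]
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    largest = labels == (np.argmax(sizes) + 1)
    rows = np.any(largest, axis=1)
    cols = np.any(largest, axis=0)
    top, bottom = np.where(rows)[0][[0, -1]]
    left, right = np.where(cols)[0][[0, -1]]
    return top, left, bottom + 1, right + 1

fm = np.random.rand(7, 7)     # e.g. a ResNet-50 final map pooled over channels
print(global_view_bbox(fm))   # bbox defining the global crop
```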
Experimental Results: Datasets
Small-scale: Soy and Cotton Cultivar datasets.
Medium-scale: FGVC-Aircraft, Stanford Cars, CUB, and NABirds.
Large-scale: iNaturalist.