[NS][Lab_Seminar_240622]Vision HGNN: An Image is More than a Graph of Nodes.pptx
thanhdowork
About This Presentation
Vision HGNN: An Image is More than a Graph of Nodes
Size: 1.47 MB
Language: en
Added: Jun 28, 2024
Slides: 14 pages
Slide Content
Vision HGNN: An Image is More than a Graph of Nodes
Tien-Bach-Thanh Do, Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea
E-mail: osfa19730@catholic.ac.kr
2024/06/22
Yan Han et al., ICCV 2023
Introduction
Advances in image representation using Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and Graph Neural Networks (GNNs).
Limitation: existing models often capture only simple relationships (e.g., pairwise edges in graphs), leading to inefficiencies and limitations in representing complex interactions.
Figure 1. Illustration of the image topologies modeled by different visual backbones: (a) CNNs treat images as regular grids, (b) ViT parses images as fully-connected graphs, (c) ViG processes images as sparse graphs with pairwise edges, while (d) ViHGNN models images as a hypergraph, a "more universal" structure.
Vision Graph Neural Network (ViG)
Concept: treats image patches as nodes in a graph, connecting nearest neighbors.
Issue: excessive and redundant edges due to purely pairwise relationships, leading to high computational cost and inefficiency.
ViHGNN
Innovation: uses hypergraphs to capture higher-order relationships among image patches.
Hypergraph advantage: hyperedges can connect multiple nodes, representing complex interactions beyond pairwise connections.
Overview
Figure 2. Overview of our ViHGNN model. HGNN and FFN denote the hypergraph neural network and feed-forward network modules, respectively.
Node and Hyperedge Construction
Let I denote an image of size H×W. Divide the image into N patches; after transforming each patch into a feature vector, obtain a data matrix in which each column corresponds to the feature vector of one patch.
Nodes: image patches, transformed into feature vectors.
Hyperedges: formed using Fuzzy C-Means clustering.
Process: clusters patches into overlapping groups.
Benefit: captures more complex and nuanced relationships among patches.
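The construction step above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function names, the membership threshold, and the fuzziness exponent m are assumptions made for the example. It shows how Fuzzy C-Means memberships over patch features could be thresholded into an overlapping hypergraph incidence matrix.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=50, eps=1e-8, seed=0):
    """Plain Fuzzy C-Means on patch features X of shape (N, D).

    Returns the membership matrix U of shape (N, n_clusters), where
    U[i, k] is the degree to which patch i belongs to cluster k.
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Random soft initialisation of memberships (each row sums to 1).
    U = rng.random((N, n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Update cluster centers as membership-weighted means of the patches.
        W = U ** m                                                  # (N, C)
        centers = (W.T @ X) / (W.sum(axis=0)[:, None] + eps)       # (C, D)
        # Update memberships from distances to the centers.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1) + eps  # (N, C)
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U

def build_incidence(U, threshold=0.2):
    """Turn soft memberships into a hypergraph incidence matrix H of shape (N, C).

    Each cluster becomes one hyperedge; a patch joins every hyperedge whose
    membership exceeds the threshold, so hyperedges can overlap.
    """
    H = (U >= threshold).astype(np.float32)
    # Ensure every patch belongs to at least its best-matching hyperedge.
    H[np.arange(U.shape[0]), U.argmax(axis=1)] = 1.0
    return H

# Example: 196 patches (14x14 grid) with 192-dim features, 16 hyperedges.
X = np.random.randn(196, 192).astype(np.float32)
U = fuzzy_c_means(X, n_clusters=16)
H = build_incidence(U)   # H[i, e] = 1 if patch i belongs to hyperedge e
```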
Overview
Message Passing in ViHGNN
Two-stage process:
Node-to-hyperedge aggregation: aggregates information from nodes to hyperedges.
Hyperedge-to-node aggregation: propagates the aggregated hyperedge information back to the nodes.
Formulation (the slide annotates the output, input, activation, hyperedge weight matrix, and learnable parameter), written here in the standard HGNN form:
X^(l+1) = σ( D_v^(-1/2) H W D_e^(-1) H^T D_v^(-1/2) X^(l) Θ^(l) )
where X^(l) is the input, X^(l+1) the output, σ the activation, H the incidence matrix, W the hyperedge weight matrix, D_v and D_e the node and hyperedge degree matrices, and Θ^(l) the learnable parameter.
Difference between message passing in vanilla GNNs and HGNNs: a vanilla GNN amounts to running a simple GNN on the clique expansion of the hypergraph, whereas in an HGNN messages are received and transformed on both the node and hyperedge sides.
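A minimal sketch of the two-stage aggregation, using the incidence matrix H from the previous sketch. The degree normalisation, ReLU activation, and mean-style aggregation are assumptions for illustration and not the paper's exact operators.

```python
import numpy as np

def hgnn_layer(X, H, W_e, Theta, eps=1e-8):
    """One hypergraph message-passing layer (a common HGNN-style formulation).

    X:     (N, D)  node (patch) features, the layer input
    H:     (N, E)  incidence matrix (node i belongs to hyperedge e)
    W_e:   (E,)    hyperedge weights (the hyperedge weight matrix, as a vector)
    Theta: (D, D') learnable parameter
    """
    Dv = H @ W_e                       # (N,) weighted node degrees
    De = H.sum(axis=0)                 # (E,) hyperedge degrees
    Dv_inv_sqrt = 1.0 / np.sqrt(Dv + eps)
    De_inv = 1.0 / (De + eps)

    # Stage 1: node-to-hyperedge aggregation.
    edge_feat = (H * Dv_inv_sqrt[:, None]).T @ X        # (E, D)
    edge_feat = (W_e * De_inv)[:, None] * edge_feat     # weight + normalise per hyperedge

    # Stage 2: hyperedge-to-node aggregation.
    node_feat = (H * Dv_inv_sqrt[:, None]) @ edge_feat  # (N, D)

    # Linear transform + activation produce the layer output.
    return np.maximum(node_feat @ Theta, 0.0)

# Example usage with random patch features and a random incidence matrix.
N, E, D = 196, 16, 192
X = np.random.randn(N, D).astype(np.float32)
H = (np.random.rand(N, E) > 0.8).astype(np.float32)
W_e = np.ones(E, dtype=np.float32)
Theta = (np.random.randn(D, D) * 0.02).astype(np.float32)
X_out = hgnn_layer(X, H, W_e, Theta)
```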
Adaptive Hypergraph Learning
Dynamic updates: the hypergraph structure and the patch embeddings are iteratively updated.
Feedback loop: embeddings and hypergraph structure reinforce each other through continuous refinement; dynamic updates alternate between the two, enhancing structure-aware image representations (a sketch of this alternation follows below).
Keeping the number of hyperedges fixed acts as a regularizer that further prevents trivial hypergraph structures, enhancing effectiveness.
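One way to read the feedback loop is as an alternation between re-clustering and message passing. The sketch below is a simplification, not the paper's exact schedule, and reuses the hypothetical fuzzy_c_means, build_incidence, and hgnn_layer helpers defined in the earlier sketches.

```python
import numpy as np

def adaptive_hgnn_block(X, n_hyperedges, Thetas):
    """Alternate hypergraph re-estimation and message passing.

    The number of hyperedges stays fixed across iterations (the regularizer
    mentioned above); only the memberships, and thus the structure, change
    as the embeddings are refined.
    """
    for Theta in Thetas:
        # Re-estimate the hypergraph from the current embeddings ...
        U = fuzzy_c_means(X, n_clusters=n_hyperedges)
        H = build_incidence(U)
        W_e = np.ones(n_hyperedges, dtype=np.float32)
        # ... then refine the embeddings on that structure.
        X = hgnn_layer(X, H, W_e, Theta)
    return X

# Example: two refinement rounds with square (D -> D) transforms.
Thetas = [np.random.randn(192, 192).astype(np.float32) * 0.02 for _ in range(2)]
X_refined = adaptive_hgnn_block(np.random.randn(196, 192).astype(np.float32),
                                n_hyperedges=16, Thetas=Thetas)
```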
Image Classification
Dataset: ImageNet
Metrics: Top-1 and Top-5 accuracy
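For reference, the two reported metrics can be computed as below; this is a generic top-k accuracy helper, not code from the paper.

```python
import numpy as np

def topk_accuracy(logits, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    topk = np.argsort(logits, axis=1)[:, -k:]        # (B, k) indices of the top-k classes
    hits = (topk == labels[:, None]).any(axis=1)
    return hits.mean()

# Example: 4 samples, 10 classes.
logits = np.random.randn(4, 10)
labels = np.array([3, 1, 7, 2])
print(topk_accuracy(logits, labels, k=1), topk_accuracy(logits, labels, k=5))
```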
Object Detection
Dataset: COCO 2017
Task: object detection using the RetinaNet and Mask R-CNN frameworks