[NS][Lab_Seminar_240622]Vision HGNN: An Image is More than a Graph of Nodes.pptx


About This Presentation

Vision HGNN: An Image is More than a Graph of Nodes


Slide Content

Vision HGNN: An Image is More than a Graph of Nodes Tien-Bach-Thanh Do Network Science Lab Dept. of Artificial Intelligence The Catholic University of Korea E-mail: osfa19730@catholic.ac.kr 2024/06/22 Yan Han et al. ICCV 2023

Introduction Advances in image representation using Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and Graph Neural Networks (GNNs) Limitation: Existing models often capture only simple relationships (e.g., pairwise edges in graphs), leading to inefficiencies and limitations in representing complex interactions Figure 1. The illustration of image topologies modeled in different visual backbones. (a) CNNs treat images as regular grids, (b) ViT parses images as fully-connected graphs, (c) ViG processes images as sparse graphs with pairwise edges, while (d) our ViHGNN models images as a hypergraph, a “more universal” structure

Vision Graph Neural Network (ViG) Concept: Treats image patches as nodes in a graph, connecting nearest neighbors Issue: Excessive and redundant edges due to pairwise relationships, leading to high computational costs and inefficiencies
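A minimal sketch of the ViG-style graph construction described above, assuming patch embeddings are already available as a tensor X of shape (N, D); the function name knn_graph and the choice k = 9 are illustrative assumptions, not the authors' implementation:

```python
import torch

def knn_graph(x: torch.Tensor, k: int = 9) -> torch.Tensor:
    """Connect each patch (node) to its k nearest neighbours in feature space."""
    dist = torch.cdist(x, x)                      # (N, N) pairwise Euclidean distances
    dist.fill_diagonal_(float("inf"))             # exclude self-loops
    nn_idx = dist.topk(k, largest=False).indices  # (N, k) indices of nearest patches
    src = torch.arange(x.size(0)).repeat_interleave(k)
    dst = nn_idx.reshape(-1)
    return torch.stack([src, dst])                # (2, N*k) directed pairwise edges

edges = knn_graph(torch.randn(196, 192), k=9)     # e.g. 14x14 patches with 192-dim features
```

Every node contributes k pairwise edges, which is exactly the kind of redundancy the hypergraph formulation tries to avoid.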

ViHGNN Innovation: Uses hypergraphs to capture higher-order relationships among image patches Hypergraph Advantage: Hyperedges can connect multiple nodes, representing complex interactions beyond pairwise connections
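A toy illustration (not from the paper) of why an incidence matrix describes group structure more compactly than pairwise edges; the 4-patch, 2-hyperedge example below is made up:

```python
import torch

# Incidence matrix H: H[i, j] = 1 if patch i belongs to hyperedge j.
H = torch.tensor([[1., 0.],
                  [1., 0.],
                  [1., 1.],   # a patch may belong to several hyperedges (overlap is allowed)
                  [0., 1.]])
# Hyperedge 0 groups patches {0, 1, 2} in a single relation; a pairwise graph would need
# three separate edges (0-1, 0-2, 1-2) to express the same grouping.
```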

Overview Figure 2. Overview of our ViHGNN model. HGNN and FFN denote the hypergraph neural network and the feed-forward network, respectively

Node and Hyperedge Construction Let I denote an image of size H×W; divide each image into N patches and transform each patch into a feature vector, obtaining a data matrix in which each column corresponds to the feature vector of one patch Nodes: Image patches, transformed into feature vectors Hyperedges: Formed using Fuzzy C-Means clustering Process: Clusters patches into overlapping groups Benefit: Captures more complex and nuanced relationships among patches
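A sketch of how such overlapping hyperedges could be built with off-the-shelf Fuzzy C-Means, assuming the scikit-fuzzy package; the membership threshold tau, the number of hyperedges, and the fuzzifier m = 2.0 are illustrative assumptions rather than the paper's exact construction:

```python
import numpy as np
import skfuzzy as fuzz

def build_incidence(x: np.ndarray, n_hyperedges: int = 8, tau: float = 0.2) -> np.ndarray:
    """x: (N, D) patch features -> incidence matrix H of shape (N, n_hyperedges)."""
    # skfuzzy expects data as (features, samples), hence the transpose
    cntr, u, *_ = fuzz.cluster.cmeans(x.T, c=n_hyperedges, m=2.0, error=1e-4, maxiter=100)
    # u: (n_hyperedges, N) soft memberships; thresholding keeps every cluster a patch
    # belongs to strongly, so hyperedges overlap instead of partitioning the image
    return (u.T >= tau).astype(np.float32)
```

Because memberships are soft, a single patch can join several hyperedges, which is what lets the structure capture overlapping, higher-order relationships.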

Overview

Message Passing in ViHGNN Two-Stage Process: Node-to-Hyperedge Aggregation: Aggregates information from nodes to hyperedges Hyperedge-to-Node Aggregation: Propagates aggregated hyperedge information back to nodes Formulation: $X^{(l+1)} = \sigma\left(D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2} X^{(l)} \Theta^{(l)}\right)$, where $W$ is the hyperedge weight matrix, $X^{(l+1)}$ the output, $\sigma$ the activation, $X^{(l)}$ the input, and $\Theta^{(l)}$ a learnable parameter Difference in MP between vanilla GNNs and HGNNs: a vanilla GNN amounts to running simple message passing on the clique expansion of the hypergraph, whereas in an HGNN messages are received and transformed on both the node and the hyperedge sides
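A minimal sketch of one such layer, assuming an incidence matrix H, a diagonal hyperedge weight matrix built from a weight vector W, and HGNN-style symmetric normalization; the shapes, the epsilon, and the ReLU are assumptions for illustration, not the authors' exact code:

```python
import torch

def hgnn_layer(X, H, W, theta, eps: float = 1e-6):
    """X: (N, D) nodes, H: (N, E) incidence, W: (E,) hyperedge weights, theta: (D, D_out)."""
    w = torch.diag(W)                           # diagonal hyperedge weight matrix
    dv = (H * W).sum(dim=1) + eps               # weighted node degrees
    de = H.sum(dim=0) + eps                     # hyperedge degrees
    Dv_inv_sqrt = torch.diag(dv.rsqrt())
    De_inv = torch.diag(1.0 / de)
    # Stage 1: node-to-hyperedge aggregation, De^{-1} H^T (Dv^{-1/2} X)
    edge_feat = De_inv @ H.t() @ (Dv_inv_sqrt @ X)
    # Stage 2: hyperedge-to-node aggregation, Dv^{-1/2} H W edge_feat, then transform
    return torch.relu(Dv_inv_sqrt @ H @ w @ edge_feat @ theta)

X = torch.randn(196, 192)                       # patch embeddings
H = (torch.rand(196, 8) > 0.7).float()          # random incidence matrix, for illustration only
out = hgnn_layer(X, H, torch.ones(8), torch.randn(192, 192) * 0.02)
```

Contrasted with a vanilla GNN on the clique expansion, the intermediate edge_feat makes the hyperedge side an explicit place where messages are received and transformed.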

Adaptive Hypergraph Learning Dynamic Updates: The hypergraph structure and the patch embeddings are iteratively updated Feedback Loop: Embeddings and hypergraph structure reinforce each other through continuous refinement; dynamic updates alternate between the two, enhancing structure-aware image representations Introducing a fixed number of hyperedges acts as a regularizer that further prevents trivial hypergraph structures, enhancing effectiveness
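A hedged sketch of this alternating refinement, reusing build_incidence and hgnn_layer from the sketches above; num_blocks and the uniform hyperedge weights are hypothetical choices, not details from the paper:

```python
import torch

X = torch.randn(196, 192)                 # initial patch embeddings (N, D)
theta = torch.randn(192, 192) * 0.02      # per-block transform (in practice, learned)
num_blocks = 4                            # assumed stack depth

for _ in range(num_blocks):
    # Structure step: re-estimate the hypergraph from the current embeddings ...
    H = torch.from_numpy(build_incidence(X.detach().numpy())).float()
    # ... embedding step: refine the embeddings with the new structure (feedback loop)
    X = hgnn_layer(X, H, torch.ones(H.size(1)), theta)
```

Keeping the number of hyperedges fixed across iterations is what plays the regularizing role mentioned above: the structure cannot collapse into trivially few or trivially many groups.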

Image Classification Dataset: ImageNet Metrics: Top-1 and Top-5 accuracy


Object Detection Dataset: COCO 2017 Task: Object detection using RetinaNet and Mask R-CNN frameworks

Ablation Study
