[NS][Lab_Seminar_240608]Cascade Graph Neural Network for RGB-D Salient Object Detection.pptx
thanhdowork
17 slides
Jul 04, 2024
Slide Content
Cascade Graph Neural Network for RGB-D Salient Object Detection. Tien-Bach-Thanh Do, Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea. E-mail: osfa19730@catholic.ac.kr. 2024/06/08. Paper: Ao Luo et al., ECCV 2020
Introduction. RGB-D: the combination of RGB (color) and depth information. Downstream task: salient object detection, which identifies the most noticeable objects in a scene. Challenges: complexity arising from varying object sizes, shapes, and depth cues.
Motivation. Why cascade graph neural networks? Need for enhanced context understanding: traditional methods have difficulty capturing spatial relationships. GNNs: effective at modeling structured relationships. Cascade approach: incrementally refines detection results, improving accuracy.
Methodology. Cascade GNN objective: enhance RGB-D salient object detection using a cascade graph neural network (CAS-GNN). Approach: multi-stage cascade refinement; graph-based reasoning over spatial relationships; integration of RGB and depth information.
Backbone Network. Role: extract initial feature maps from the RGB image and the depth map. Common backbones: ResNet, VGG. Features: captures low-level to high-level features.
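To make "low-level to high-level features" concrete, the arithmetic below tracks the output stride of each of VGG-16's five convolution groups (VGG-16 is the backbone named on the implementation slide). It is a sketch under an assumption: the dilated variant follows the common DeepLab-style trick of dropping the pooling step before group 5 and dilating group 5's 3x3 convolutions by 2. The helper names are hypothetical, and the paper's exact configuration may differ.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1, dilation=1):
    """Standard output-size formula for a (possibly dilated) convolution or pooling window."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def vgg16_group_strides(dilate_last_group=False):
    """Output stride (input size / feature size) of each of VGG-16's five conv groups.

    Every group uses 3x3 convs with padding 1, which keep the resolution; a 2x2,
    stride-2 max-pool between groups halves it. With dilate_last_group=True the
    pool between groups 4 and 5 is assumed to be dropped (DeepLab-style), so
    group 5 keeps group 4's resolution and uses dilation to widen its receptive field.
    """
    strides, stride = [], 1
    for group in range(1, 6):
        strides.append(stride)
        if group == 4 and dilate_last_group:
            continue  # hypothetical: no pooling before group 5, dilation used instead
        stride *= 2
    return strides


if __name__ == "__main__":
    print(vgg16_group_strides())                        # [1, 2, 4, 8, 16]
    print(vgg16_group_strides(dilate_last_group=True))  # [1, 2, 4, 8, 8]
    # A 3x3 conv with dilation 2 and padding 2 keeps a 32x32 map at 32x32
    print(conv_out_size(32, kernel=3, padding=2, dilation=2))  # 32
```

With a 256x256 input, strides [1, 2, 4, 8, 8] correspond to feature maps of 256, 128, 64, 32, and 32 pixels per side, so the last two groups share one resolution while deeper groups still carry higher-level features.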
Graph Construction
Graph Construction. Purpose: represent image regions and their relationships. Nodes correspond to image regions: appearance nodes (RGB features), geometry nodes (depth features), and guidance nodes, which stay fixed during message passing and deliver guidance information. Edges capture spatial relationships. Multi-scale node embeddings: given the 2D appearance features C and the 3D geometric features D, a pyramid pooling module followed by a convolution layer and an interpolation layer extracts multi-scale features of the two modalities, which serve as the initial node representations. Edge embeddings: edges link nodes of the same modality at different scales, and nodes of different modalities at the same scale.
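The multi-scale node embeddings can be illustrated on a single-channel feature map. This minimal sketch assumes hypothetical bin sizes (1, 2, 4) and replaces the learned convolution with a single scalar weight; it pools the map at each scale, then interpolates back to the input size (nearest-neighbor here, which may differ from the paper's interpolation), yielding one node embedding per scale. Running it on C and on D would give the appearance and geometry nodes.

```python
import math


def adaptive_avg_pool(fmap, out_size):
    """Average-pool a 2-D list into an out_size x out_size grid of (possibly overlapping) bins."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(out_size):
        r0, r1 = (i * h) // out_size, math.ceil((i + 1) * h / out_size)
        row = []
        for j in range(out_size):
            c0, c1 = (j * w) // out_size, math.ceil((j + 1) * w / out_size)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(small, h, w):
    """Nearest-neighbour interpolation of a square grid back to h x w."""
    n = len(small)
    return [[small[(y * n) // h][(x * n) // w] for x in range(w)] for y in range(h)]


def multiscale_node_embeddings(fmap, bin_sizes=(1, 2, 4), conv_weight=1.0):
    """One node embedding per pyramid level: pool, 'conv', then interpolate."""
    h, w = len(fmap), len(fmap[0])
    nodes = []
    for bins in bin_sizes:
        pooled = adaptive_avg_pool(fmap, bins)
        # Hypothetical stand-in for the learned conv layer: a 1x1, single-channel scaling
        pooled = [[conv_weight * v for v in row] for row in pooled]
        nodes.append(upsample_nearest(pooled, h, w))
    return nodes


if __name__ == "__main__":
    fmap = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # toy 4x4 map
    for bins, node in zip((1, 2, 4), multiscale_node_embeddings(fmap)):
        print(bins, node[0])
```

Coarse bins summarize the whole region (the 1x1 level becomes a constant map), while the finest level keeps local detail, which is what lets each node describe the scene at a different scale.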
Graph Construction. Message passing: each node receives messages from its neighboring nodes along the graph edges. Node-state updating: after the message passing step, each node aggregates the information from its neighbors and updates its original feature representation using a GRU.
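The two steps on this slide can be sketched in plain Python. Assumptions, not taken from the slides: messages are the mean of neighbor states (the paper's learned message function is not reproduced), the GRU follows the standard formulation with update gate z and reset gate r, and the weight matrices `Wz`, `Uz`, `Wr`, `Ur`, `W`, `U` are hypothetical names supplied by the caller. Nodes listed in `fixed` (for example guidance nodes) keep their state, as the graph construction slide describes.

```python
import math


def matvec(m, v):
    """Matrix (list of rows) times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gru_update(h, m, p):
    """Standard GRU cell: h is the node's current state, m the aggregated message."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], m), matvec(p["Uz"], h))]  # update gate
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], m), matvec(p["Ur"], h))]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["W"], m), matvec(p["U"], rh))]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def message_passing_step(states, edges, params, fixed=()):
    """One synchronous round: every update reads only the previous states."""
    neighbours = {node: [] for node in states}
    for a, b in edges:  # undirected edges
        neighbours[a].append(b)
        neighbours[b].append(a)
    new_states = {}
    for node, h in states.items():
        if node in fixed or not neighbours[node]:
            new_states[node] = list(h)  # guidance (fixed) and isolated nodes keep their state
            continue
        incoming = [states[n] for n in neighbours[node]]
        message = [sum(vals) / len(incoming) for vals in zip(*incoming)]  # mean aggregation (assumed)
        new_states[node] = gru_update(h, message, params)
    return new_states


if __name__ == "__main__":
    zero = [[0.0, 0.0], [0.0, 0.0]]
    params = {k: zero for k in ("Wz", "Uz", "Wr", "Ur", "W", "U")}
    states = {"a": [2.0, -4.0], "b": [1.0, 1.0], "guide": [3.0, 3.0]}
    print(message_passing_step(states, [("a", "b"), ("a", "guide")], params, fixed={"guide"}))
```

With all-zero weights the update gate is 0.5 and the candidate state is 0, so every non-fixed state is halved while `guide` is unchanged. Some libraries swap the roles of z and 1 - z in the final interpolation; the two conventions are equivalent up to how z is learned.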
Graph Reasoning Module. Graph convolutional layers: propagate features across the graph. Refinement: enhances features by taking the graph structure into account.
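One common way to realize "graph convolutional layers that propagate features across the graph" is the Kipf and Welling propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W). The slides do not specify the exact layer, so this sketch uses that rule as an assumption; nodes are indexed 0..N-1, features are plain lists, and the function names are hypothetical.

```python
import math


def normalized_adjacency(num_nodes, edges):
    """D^(-1/2) (A + I) D^(-1/2) for an undirected graph given as (i, j) index pairs."""
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]  # self-loops
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def gcn_layer(h, edges, weight):
    """One propagation step: aggregate normalised neighbour features, project with weight, ReLU."""
    n, f_in = len(h), len(h[0])
    a_hat = normalized_adjacency(n, edges)
    agg = [[sum(a_hat[i][k] * h[k][f] for k in range(n)) for f in range(f_in)] for i in range(n)]
    f_out = len(weight[0])
    out = [[sum(row[f] * weight[f][o] for f in range(f_in)) for o in range(f_out)] for row in agg]
    return [[max(0.0, v) for v in row] for row in out]


if __name__ == "__main__":
    features = [[1.0], [3.0], [5.0]]  # node 2 is isolated
    print(gcn_layer(features, [(0, 1)], [[1.0]]))
```

Connected nodes 0 and 1 end up sharing the averaged value 2.0, while the isolated node keeps 5.0: features spread only along edges, which is how the graph structure shapes the refined representation.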
Cascade Refinement. Stages: each stage refines the results of the previous stage. Focus: harder cases are progressively improved.
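The control flow of cascade refinement fits in a few lines: each stage receives one level's features plus the previous stage's output as guidance and returns a refined prediction. This is a sketch only; `toy_stage`, which moves the guidance a fixed fraction toward the level's features, is a hypothetical placeholder for a real graph-reasoning stage, and keeping every stage's output (for example for per-stage supervision) is an assumed design choice.

```python
from typing import Callable, List, Sequence

# A stage maps (features of one level, guidance from the previous stage) to a refined map.
Stage = Callable[[Sequence[float], Sequence[float]], List[float]]


def run_cascade(level_features: Sequence[Sequence[float]],
                stages: Sequence[Stage],
                init: Sequence[float]) -> List[List[float]]:
    """Apply the stages in order; each one refines the previous stage's output."""
    if len(level_features) != len(stages):
        raise ValueError("expected one feature level per stage")
    outputs: List[List[float]] = []
    guidance = list(init)
    for feats, stage in zip(level_features, stages):
        guidance = stage(feats, guidance)
        outputs.append(guidance)  # keep every stage's prediction (assumed, e.g. for supervision)
    return outputs


def toy_stage(step: float) -> Stage:
    """Hypothetical placeholder stage: move the guidance `step` of the way toward the features."""
    def stage(feats: Sequence[float], guidance: Sequence[float]) -> List[float]:
        return [g + step * (f - g) for f, g in zip(feats, guidance)]
    return stage


if __name__ == "__main__":
    target = [1.0, 0.0]
    preds = run_cascade([target] * 3, [toy_stage(0.5)] * 3, init=[0.5, 0.5])
    for k, p in enumerate(preds, start=1):
        print(k, p)
```

In this toy run the gap to the target halves at every stage ([0.75, 0.25], then [0.875, 0.125], then [0.9375, 0.0625]), mirroring how later stages only need to correct what earlier stages left unresolved.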
Multi-modal Fusion. Integration: combines RGB and depth features. Method: fusion at different stages to leverage complementary information.
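A gated sum is one simple way to combine RGB and depth features element by element, applied independently at each stage. It is a swapped-in technique for illustration, not the paper's fusion operator: the gate parameters `a`, `b`, `c` are hypothetical and would normally be learned; with all three at zero the gate is 0.5 and fusion reduces to averaging.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(rgb, depth, a=0.0, b=0.0, c=0.0):
    """Element-wise gate w = sigmoid(a*rgb + b*depth + c); output w*rgb + (1 - w)*depth."""
    fused = []
    for x, y in zip(rgb, depth):
        w = sigmoid(a * x + b * y + c)
        fused.append(w * x + (1.0 - w) * y)
    return fused


def fuse_levels(rgb_levels, depth_levels, **gate):
    """Fuse the two modalities separately at every stage/level."""
    if len(rgb_levels) != len(depth_levels):
        raise ValueError("both modalities need the same number of levels")
    return [gated_fusion(r, d, **gate) for r, d in zip(rgb_levels, depth_levels)]


if __name__ == "__main__":
    print(fuse_levels([[1.0, 3.0], [0.0]], [[3.0, 1.0], [4.0]]))  # plain average
    print(gated_fusion([1.0], [3.0], c=50.0))                     # gate saturates toward RGB
```

Because the gate depends on both inputs, a trained version can lean on depth where color is ambiguous and on color where depth is noisy, which is the kind of complementarity the slide refers to.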
Results: Implementation. Two VGG-16 networks serve as backbones, one extracting 2D appearance (RGB) features and the other extracting 3D geometric (depth) features. Dilated convolutions ensure that the last two groups of each backbone have the same resolution. In the graph-based reasoning module, three nodes per modality capture information at multiple scales, giving a graph with six nodes in total. G links all nodes of the same modality; between different modalities, an edge only connects nodes with the same scale. In the CPR module, the outputs of the second, third, and fifth groups of each backbone (at different resolutions) are used as inputs for cascade graph reasoning.
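The graph described on the implementation slide (three scales per modality, full connectivity within a modality, and cross-modal edges only between matching scales) can be enumerated directly. The modality labels and the (modality, scale) node naming are hypothetical; edges are treated as undirected.

```python
from itertools import combinations

MODALITIES = ("appearance", "geometry")  # hypothetical labels: RGB (C) and depth (D) nodes


def build_cross_modal_graph(num_scales=3):
    """Nodes are (modality, scale) pairs; returns (nodes, sorted undirected edges)."""
    nodes = [(m, s) for m in MODALITIES for s in range(num_scales)]
    edges = []
    # Same modality: link every pair of scales (all nodes of a modality are connected)
    for m in MODALITIES:
        for s1, s2 in combinations(range(num_scales), 2):
            edges.append(((m, s1), (m, s2)))
    # Different modalities: only nodes with the same scale are linked
    for s in range(num_scales):
        edges.append(((MODALITIES[0], s), (MODALITIES[1], s)))
    return nodes, sorted(edges)


if __name__ == "__main__":
    nodes, edges = build_cross_modal_graph()
    print(len(nodes), "nodes,", len(edges), "edges")
    for u, v in edges:
        print(u, "--", v)
```

With three scales this gives 6 nodes and 9 edges: 3 within each modality plus 3 cross-modal ones, so information crosses modalities only between features of comparable scale.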
Results: Ablation Analysis
Results: Comparison with State-of-the-Art (SOTA) Methods
Results
Conclusion. Proposes a novel deep model based on graph-based techniques for RGB-D salient object detection. Proposes a cascade structure that enhances the GNN model so it can better take advantage of the rich, complementary information in multi-level features. CAS-GNN distills useful information from both the 2D appearance (color) and the 3D geometry (depth) information.