[NS][Lab_Seminar_240608]Cascade Graph Neural Network for RGB-D Salient Object Detection.pptx


About This Presentation

Cascade Graph Neural Network for RGB-D Salient Object Detection


Slide Content

Cascade Graph Neural Network for RGB-D Salient Object Detection
Tien-Bach-Thanh Do, Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea
E-mail: osfa19730@catholic.ac.kr
2024/06/08
Ao Luo et al., ECCV 2020

Introduction
RGB-D: combination of RGB (color) and depth information
Downstream task: salient object detection, which identifies the most noticeable objects in a scene
Challenges: complexity due to varying object sizes, shapes, and depth cues

Motivation
Why Cascade Graph Neural Networks?
Need for enhanced context understanding: traditional methods have difficulty capturing spatial relationships
GNNs: effective at modeling structured relationships
Cascade approach: incrementally refines detection results, improving accuracy

Methodology: Cascade GNN
Objective: enhance RGB-D salient object detection using CGNN
Approach:
  Multi-stage cascade refinement
  Graph-based reasoning for spatial relationships
  Integration of RGB and depth information

Backbone Network
Role: extract initial feature maps from RGB-D images
Common backbones: ResNet, VGG
Features: capture low-level to high-level features
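
A minimal sketch (assumed PyTorch/torchvision, not the authors' code) of this step: a VGG-16 feature extractor is split into its five convolutional groups so that low- to high-level feature maps can be collected; the use of two identical backbones for RGB and depth follows the implementation slide later in this deck, and the input size and channel choices here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision

class VGGBackbone(nn.Module):
    """Collect multi-level feature maps from a VGG-16 feature extractor."""
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None).features
        # Split the sequential extractor at its max-pool layers so that each
        # stage returns one conv block's output.
        self.stages = nn.ModuleList([
            vgg[:5], vgg[5:10], vgg[10:17], vgg[17:24], vgg[24:31]
        ])

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)          # low-level -> high-level features
        return feats

rgb_backbone = VGGBackbone()         # 2D appearance (RGB) stream
depth_backbone = VGGBackbone()       # 3D geometry stream (depth repeated to 3 channels, an assumption)
feats_rgb = rgb_backbone(torch.randn(1, 3, 224, 224))
```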

Graph Construction

Graph Construction
Purpose: represent image regions and their relationships
Nodes: correspond to image regions
  Appearance nodes (RGB-related features)
  Geometry nodes (depth features)
  Guidance nodes: fixed during message passing, deliver the guidance information
Edges: capture spatial relationships
Multi-scale node embeddings: given 2D appearance features C and 3D geometry features D, a pyramid pooling module followed by a convolution layer and an interpolation layer extracts multi-scale features from the two modalities as the initial node representations
Edge embeddings: edges link nodes of the same modality at different scales, and nodes of different modalities at the same scale
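
A minimal sketch (assumed PyTorch, not the authors' code) of the multi-scale node-embedding step described above: pyramid pooling over a backbone feature map, a convolution to the node dimension, and interpolation back to a shared resolution. The pooling scales and channel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleNodeEmbedding(nn.Module):
    """Turn one backbone feature map into one node embedding per scale."""
    def __init__(self, in_ch, node_ch, scales=(1, 3, 5)):
        super().__init__()
        self.scales = scales
        self.convs = nn.ModuleList(
            [nn.Conv2d(in_ch, node_ch, kernel_size=1) for _ in scales]
        )

    def forward(self, feat, out_size):
        nodes = []
        for s, conv in zip(self.scales, self.convs):
            pooled = F.adaptive_avg_pool2d(feat, s)          # pyramid pooling
            emb = conv(pooled)                               # project to node dimension
            emb = F.interpolate(emb, size=out_size,
                                mode='bilinear', align_corners=False)
            nodes.append(emb)                                # one node per scale
        return nodes

embed = MultiScaleNodeEmbedding(in_ch=512, node_ch=128)
appearance_nodes = embed(torch.randn(1, 512, 28, 28), out_size=(28, 28))
```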

Graph Construction
Message passing
Node-state updating: after each message-passing step, every node aggregates information from its neighboring nodes and updates its original feature representation using a GRU
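
A minimal sketch (assumed PyTorch) of the node-state update: each node's aggregated neighbor message is treated as the GRU input and its current state as the hidden state. Flattening nodes to vectors and using a dense adjacency matrix are simplifications here, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class NodeUpdate(nn.Module):
    """GRU-based node-state update after one round of message passing."""
    def __init__(self, node_dim):
        super().__init__()
        self.gru = nn.GRUCell(input_size=node_dim, hidden_size=node_dim)

    def forward(self, node_states, adjacency):
        # node_states: (N, node_dim); adjacency: (N, N) edge weights
        messages = adjacency @ node_states       # aggregate neighbor information
        # GRU: the message is the input, the old node state is the hidden state.
        return self.gru(messages, node_states)

N, D = 6, 128                                    # 6 nodes, as in the paper's graph
states = torch.randn(N, D)
adj = torch.rand(N, N)
states = NodeUpdate(D)(states, adj)
```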

Graph Reasoning Module
Graph convolutional layers: used to propagate features across the graph
Refinement: enhances features by considering the graph structure
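
As a hedged illustration of feature propagation over the constructed graph, a standard GCN-style layer (H' = ReLU(A_norm H W)); the paper's reasoning layer may differ in detail, so treat this as a generic example rather than the exact module.

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One graph-convolution style propagation step: H' = ReLU(A_norm @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, H, A):
        # Symmetrically normalize the adjacency, with self-loops added.
        A_hat = A + torch.eye(A.size(0))
        d_inv_sqrt = A_hat.sum(dim=1).clamp(min=1e-6).pow(-0.5)
        A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
        return torch.relu(self.weight(A_norm @ H))

H = torch.randn(6, 128)        # 6 node embeddings
A = torch.rand(6, 6)
H = GraphConvLayer(128, 128)(H, A)
```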

Cascade Refinement
Stages: each stage refines the results of the previous stage
Focus: harder cases are progressively improved
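
A minimal sketch (assumed PyTorch) of the cascade idea: each stage receives one level of backbone features together with the previous stage's prediction and outputs a refined saliency map. The channel counts and feature sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineStage(nn.Module):
    """Refine the previous stage's prediction using the current feature level."""
    def __init__(self, feat_ch):
        super().__init__()
        self.fuse = nn.Conv2d(feat_ch + 1, feat_ch, kernel_size=3, padding=1)
        self.pred = nn.Conv2d(feat_ch, 1, kernel_size=1)

    def forward(self, feat, prev_pred):
        prev_pred = F.interpolate(prev_pred, size=feat.shape[-2:],
                                  mode='bilinear', align_corners=False)
        x = torch.relu(self.fuse(torch.cat([feat, prev_pred], dim=1)))
        return self.pred(x)                      # refined saliency logits

# Coarse-to-fine cascade over three feature levels (shapes are assumptions).
stages = nn.ModuleList([RefineStage(c) for c in (512, 256, 128)])
feats = [torch.randn(1, c, s, s) for c, s in ((512, 14), (256, 28), (128, 56))]
pred = torch.zeros(1, 1, 14, 14)
for stage, feat in zip(stages, feats):
    pred = stage(feat, pred)                     # each stage refines the last
```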

Multi-modal Fusion
Integration: combines RGB and depth features
Method: fusion at different stages to leverage complementary information
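
A minimal sketch (assumed PyTorch) of one possible stage-wise fusion: RGB and depth features at the same stage are concatenated and projected back to a shared channel size. In the actual model the two modalities interact through the graph described earlier, so this concat-and-project block is only a simplified stand-in for the idea of combining complementary information.

```python
import torch
import torch.nn as nn

class ModalFusion(nn.Module):
    """Concatenate RGB and depth features at one stage and re-project them."""
    def __init__(self, ch):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(2 * ch, ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb_feat, depth_feat):
        return self.proj(torch.cat([rgb_feat, depth_feat], dim=1))

fused = ModalFusion(256)(torch.randn(1, 256, 28, 28), torch.randn(1, 256, 28, 28))
```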

Results
Datasets: NJUD, STEREO, NLPR, LFSD, RGBD135, SSD
Evaluation metrics: MAE, PR curve, F-measure, S-measure, E-measure
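
For reference, a minimal NumPy sketch of two of the listed metrics, MAE and F-measure. The beta^2 = 0.3 weighting and the adaptive threshold of twice the mean saliency are common conventions in the SOD literature, not necessarily the exact protocol of the paper.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a saliency map and ground truth in [0, 1]."""
    return np.abs(pred - gt).mean()

def f_measure(pred, gt, beta2=0.3):
    """F-measure with an adaptive threshold (2x mean saliency, capped at 1)."""
    thresh = min(2.0 * pred.mean(), 1.0)
    binary = pred >= thresh
    tp = np.logical_and(binary, gt > 0.5).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / ((gt > 0.5).sum() + 1e-8)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)
```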

Results: Implementation
Two VGG-16 backbones are used, one for extracting 2D appearance (RGB) features and the other for extracting 3D geometry (depth) features
Dilated convolutions are employed so that the last two groups of each backbone have the same resolution
Graph-based reasoning module: 3 nodes are used per modality to capture information at multiple scales, resulting in a graph with 6 nodes in total
The graph G links all nodes of the same modality; for nodes of different modalities, an edge only connects nodes with the same scale
CPR module: the features from the outputs of the second, third, and fifth groups of each backbone (at different resolutions) are used as inputs for cascade graph reasoning
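
A minimal NumPy sketch (an illustration, not the authors' code) of the connectivity just described: three scales per modality give six nodes; edges link every pair of same-modality nodes, and cross-modality nodes only when their scales match.

```python
import numpy as np

modalities = ("rgb", "depth")
scales = (0, 1, 2)
nodes = [(m, s) for m in modalities for s in scales]    # 6 nodes in total

A = np.zeros((len(nodes), len(nodes)), dtype=float)
for i, (mi, si) in enumerate(nodes):
    for j, (mj, sj) in enumerate(nodes):
        if i == j:
            continue
        same_modality = mi == mj
        same_scale = si == sj
        # Same modality: connect across all scales.
        # Different modality: connect only at the same scale.
        if same_modality or same_scale:
            A[i, j] = 1.0

print(A)
```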

Results: Ablation Analysis

Results: SOTA Comparison

Results

Conclusion
Proposes a novel deep model based on graph-based techniques for RGB-D salient object detection
Proposes a cascade structure to enhance the GNN model so that it better takes advantage of the rich, complementary information in multi-level features
CAS-GNN distills useful information from both the 2D (color) appearance and the 3D geometry (depth) information