[NS][Lab_Seminar_240923]Prompt-supervised Dynamic Attention Graph Convolutional Network for Skeleton-based Action Recognition.pptx
thanhdowork
20 slides
Sep 24, 2024
Slide Content
Prompt-supervised Dynamic Attention Graph Convolutional Network for Skeleton-based Action Recognition
Tien-Bach-Thanh Do
Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea
E-mail: osfa19730@catholic.ac.kr
2024/09/23
Paper: Shasha Zhu et al., Neurocomputing 2024
Introduction
Overview of Skeleton-based Action Recognition
- Core task in video understanding, used in human-computer interaction and health monitoring
- Skeleton sequences: high information density, low redundancy, clear structure
Problem Statement
- Existing methods fail to utilize precise high-level semantic action descriptions
Objective
- Propose a Prompt-supervised Dynamic Attention Graph Convolutional Network (PDA-GCN) to improve accuracy in recognizing human actions
Motivation & Challenges
- Complexity of human actions: similar action manifestations can have different semantics
- Traditional methods (CNNs, RNNs, GCNs) fail to capture both global and local relationships effectively
Proposed Model
- Prompt Supervision (PS) module: uses pre-trained large language models (LLMs) as knowledge engines
- Dynamic Attention Graph Convolution (DA-GC) module: a self-attention mechanism captures relationships between joints, while dynamic convolution focuses on local details, improving model accuracy
Model
Main branch:
- Encoder: processes skeleton sequence data and extracts joint relationships
- Spatial modeling: DA-GC block for context-sensitive topology extraction
- Temporal modeling: multi-scale temporal convolution over skeleton sequences in time
Supervised branch:
- Prompt supervision: uses pre-stored text features from LLMs to refine classification
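The spatial modeling step above builds on graph convolution over the skeleton topology. A minimal numpy sketch of a plain spatial graph-convolution step (not the paper's DA-GC formulation; the A·X·W rule, shapes, and the toy 3-joint chain are assumptions for illustration):

```python
import numpy as np

def spatial_graph_conv(x, adj, weight):
    """One spatial graph-convolution step over skeleton joints.

    x:      (N, C_in)     joint features (N joints, C_in channels)
    adj:    (N, N)        (normalized) joint adjacency / topology
    weight: (C_in, C_out) learnable channel projection
    """
    # Aggregate neighbor features along the skeleton topology,
    # then project channels: X' = A @ X @ W.
    return adj @ x @ weight

# Toy example: 3 joints in a chain (0-1-2), with self-loops.
adj = np.array([[1.0, 1.0, 0.0],
                [1.0, 1.0, 1.0],
                [0.0, 1.0, 1.0]])
adj = adj / adj.sum(axis=1, keepdims=True)   # row-normalize
x = np.eye(3)                                # one-hot joint features
w = np.eye(3)                                # identity projection
out = spatial_graph_conv(x, adj, w)          # here out equals adj
```

With one-hot features and an identity projection, the output reduces to the normalized adjacency itself, which makes the neighbor-aggregation role of A easy to see.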
Key Innovations
- Dynamic Attention Graph Convolution (DA-GC): combines standard and dynamic convolution for local and global feature integration
- Prompt Supervision (PS): enhances the model's learning by introducing LLM-based action descriptions, improving discriminative power at minimal computation cost
Model
Fig. 1. Architecture overview of PDA-GCN, where ⊕ represents the splicing (concatenation) operation, ⊗ represents element-wise multiplication, and PE and GAP represent position embedding and global average pooling, respectively. The CTR-GC block and MS-TC block are shown in the green dotted box at the top of the figure; the DA-GC block and PS block are described in detail later.
Model
- Input data is first pre-processed to convert the input skeleton sequence into an initial joint representation
- Supervision loss and overall loss: equations not captured in the text extraction
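Since the slide's loss equations were not captured, here is a hedged numpy sketch of the usual pattern for combining a main classification loss with an auxiliary supervision loss; the weight `lam` and the cross-entropy form are assumptions, not values from the paper:

```python
import numpy as np

def cross_entropy(logits, label):
    """Standard softmax cross-entropy for a single sample."""
    z = logits - logits.max()                # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def overall_loss(cls_logits, ps_logits, label, lam=0.5):
    """Classification loss plus weighted prompt-supervision loss.

    `lam` (the supervision-branch weight) is an assumed hyperparameter.
    """
    l_cls = cross_entropy(cls_logits, label)
    l_ps = cross_entropy(ps_logits, label)   # supervision loss from the PS branch
    return l_cls + lam * l_ps

loss = overall_loss(np.array([2.0, 0.1, -1.0]),
                    np.array([1.5, 0.2, -0.5]),
                    label=0)
```

Setting `lam=0` recovers the plain classification loss, so the supervision branch can be ablated without touching the main branch.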
Model
Dynamic attention graph convolution module
Fig. 2. Overview of the DA-GC module, where ⊕ and ⊗ denote the splicing operation and element-wise product, DConv is a dynamic convolution, A is the predefined topology, BN is batch normalization, and ReLU is an activation function.
Model
Dynamic attention graph convolution module
- Attention graph A′: a dynamic topology, reset from the predefined topology (equations not captured in the text extraction)
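A common way to form such an attention graph A′ is to add a self-attention map over joint features to the predefined topology A. A numpy sketch under that assumption (the scaling `alpha` and the additive combination are illustrative, since the slide's equations were not captured):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_topology(x, adj, wq, wk, alpha=1.0):
    """Refine the predefined topology A with a joint self-attention map.

    x:   (N, C) joint features;  adj: (N, N) predefined skeleton graph.
    A' = A + alpha * softmax(Q K^T / sqrt(d)).
    """
    q, k = x @ wq, x @ wk
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return adj + alpha * attn

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))        # 3 joints, 4 channels
adj = np.eye(3)                        # trivial predefined topology
wq = rng.standard_normal((4, 4))
wk = rng.standard_normal((4, 4))
a_prime = attention_topology(x, adj, wq, wk)
```

Because the attention map is input-dependent, A′ changes per sample, which is what makes the topology "dynamic" rather than fixed like the skeleton graph.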
Model
Dynamic attention graph convolution module
- A dynamic convolution is proposed to enhance local context information
- Attention weight: equation not captured in the text extraction
Model
Fig. 3. Overview of the dynamic convolution, where DWConv is a depthwise convolution.
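Dynamic convolution is typically realized by keeping K candidate kernels and mixing them with input-dependent attention weights before a single depthwise pass. A self-contained numpy sketch under that assumption (kernel count, sizes, and the given attention logits are illustrative, not the paper's settings):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def dynamic_depthwise_conv1d(x, kernels, attn_logits):
    """Dynamic convolution: attention-weighted mix of K depthwise kernels.

    x:           (C, T)    per-channel temporal signal
    kernels:     (K, C, S) K candidate depthwise kernels of size S
    attn_logits: (K,)      input-dependent kernel attention (given here)
    """
    w = softmax(attn_logits)                    # attention weights over kernels
    kernel = np.tensordot(w, kernels, axes=1)   # (C, S) aggregated kernel
    C, T = x.shape
    S = kernel.shape[-1]
    pad = S // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))        # same-length output
    out = np.empty_like(x)
    for c in range(C):                          # depthwise: one kernel per channel
        for t in range(T):
            out[c, t] = xp[c, t:t + S] @ kernel[c]
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((2, 5))
kernels = np.stack([np.tile([[0.0, 1.0, 0.0]], (2, 1)),   # identity kernel
                    np.tile([[1/3, 1/3, 1/3]], (2, 1))])  # smoothing kernel
# Attention strongly prefers the identity kernel, so output ~ input.
y = dynamic_depthwise_conv1d(x, kernels, np.array([10.0, -10.0]))
```

With the attention saturated toward the identity kernel, the output reproduces the input, which is a quick sanity check that the kernel aggregation works as intended.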
Model
Prompt supervision module
Fig. 4. Overview of the PS module, where N is the number of joint nodes, C is the number of current channels, cls is the number of action categories, and GAP is global average pooling.
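Given the figure's ingredients (N joints, C channels, cls classes, GAP), a plausible PS-branch computation is: pool the joint features, project into the text-embedding space, and match against the pre-stored per-class text features. A numpy sketch under that assumption (the linear projection and cosine-similarity matching are illustrative, not the paper's exact PS formulation):

```python
import numpy as np

def prompt_supervision_logits(feat, w_proj, text_feats):
    """PS branch: pool joint features, project, match to class text features.

    feat:       (C, N)   skeleton features (C channels, N joint nodes)
    w_proj:     (C, D)   projection into the text-embedding dimension D
    text_feats: (cls, D) pre-stored LLM text features, one per action class
    """
    pooled = feat.mean(axis=1)                 # GAP over the N joints -> (C,)
    z = pooled @ w_proj                        # project into text space -> (D,)
    z = z / np.linalg.norm(z)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    return t @ z                               # cosine similarity per class

rng = np.random.default_rng(2)
feat = rng.standard_normal((8, 25))            # e.g. 25 joints (NTU skeleton)
w_proj = rng.standard_normal((8, 16))
text_feats = rng.standard_normal((4, 16))      # 4 hypothetical action classes
logits = prompt_supervision_logits(feat, w_proj, text_feats)
```

Because the text features are pre-stored, this branch adds almost no inference cost, matching the slide's claim of minimal computational overhead.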
Experiments
Experimental Settings
- Datasets: NTU RGB+D 60 and 120 (standard benchmarks for skeleton-based action recognition)
- Evaluation protocols: Cross-Subject and Cross-View splits, used to measure model performance
Experiments
Results (five slides of result tables/figures, not captured in the text extraction)
Conclusion
- PDA-GCN provides a robust, efficient solution for skeleton-based action recognition
- Combines dynamic attention and prompt supervision for superior accuracy
- Future work: extend the model to larger datasets and explore further integration with pre-trained models for human-centric tasks