[NS][Lab_Seminar_240923]Prompt-supervised Dynamic Attention Graph Convolutional Network for Skeleton-based Action Recognition.pptx


About This Presentation

Prompt-supervised Dynamic Attention Graph Convolutional Network for Skeleton-based Action Recognition


Slide Content

Prompt-supervised Dynamic Attention Graph Convolutional Network for Skeleton-based Action Recognition
Tien-Bach-Thanh Do
Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea
E-mail: osfa19730@catholic.ac.kr
2024/09/23
Shasha Zhu et al., Neurocomputing 2024

Introduction
Overview of skeleton-based action recognition: a core task in video understanding, used in human-computer interaction and health monitoring
Skeleton sequences: high information density, low redundancy, clear structure
Problem statement: existing methods fail to utilize precise high-level semantic action descriptions
Objective: propose a Prompt-supervised Dynamic Attention Graph Convolutional Network (PDA-GCN) to improve accuracy in recognizing human actions

Motivation & Challenges Complexity in human actions: similar action manifestations can have different semantics Traditional methods: CNNs RNNs GCNs fail to capture both global and local relationship effectively

Proposed model
Prompt Supervision (PS) module: uses pre-trained large language models (LLMs) as knowledge engines
Dynamic Attention Graph Convolution (DA-GC) module: a self-attention mechanism captures relationships between joints, while dynamic convolution focuses on local details, improving model accuracy

Model
Main branch:
  Encoder: processes skeleton sequence data and extracts joint relationships
  Spatial modeling: DA-GC block for context-sensitive topology extraction
  Temporal modeling: multi-scale temporal convolution over the skeleton sequence in time
Supervised branch:
  Prompt supervision: uses pre-stored text features from LLMs to refine classification
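Below is a minimal PyTorch sketch of how the main branch described above could be organized. SpatialGC stands in for the DA-GC block and TemporalConv for the multi-scale temporal convolution; the class names, channel widths, and the 3-channel coordinate input are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class SpatialGC(nn.Module):
    """Graph convolution over joints with a fixed adjacency A (stand-in for the DA-GC block)."""
    def __init__(self, in_ch, out_ch, A):
        super().__init__()
        self.register_buffer("A", A)                  # (V, V) predefined skeleton topology
        self.proj = nn.Conv2d(in_ch, out_ch, 1)       # per-joint channel projection

    def forward(self, x):                             # x: (N, C, T, V)
        x = torch.einsum("nctv,vw->nctw", x, self.A)  # aggregate features over neighboring joints
        return self.proj(x)

class TemporalConv(nn.Module):
    """Plain temporal convolution along the frame axis (stand-in for the MS-TC block)."""
    def __init__(self, ch, k=9):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, kernel_size=(k, 1), padding=(k // 2, 0))

    def forward(self, x):
        return self.conv(x)

class MainBranch(nn.Module):
    def __init__(self, num_classes, num_joints, A, dims=(64, 128, 256)):
        super().__init__()
        self.stem = nn.Conv2d(3, dims[0], 1)          # lift (x, y, z) joint coordinates to dims[0] channels
        self.pe = nn.Parameter(torch.zeros(1, dims[0], 1, num_joints))  # position embedding (PE in Fig. 1)
        layers, c = [], dims[0]
        for d in dims:
            layers += [SpatialGC(c, d, A), TemporalConv(d)]
            c = d
        self.layers = nn.Sequential(*layers)
        self.fc = nn.Linear(dims[-1], num_classes)    # action classifier

    def forward(self, x):                             # x: (N, 3, T, V) skeleton sequence
        x = self.stem(x) + self.pe
        x = self.layers(x)
        x = x.mean(dim=(2, 3))                        # GAP over frames and joints
        return self.fc(x)
```

Stacking a spatial graph operation followed by a temporal convolution per stage is the standard ST-GCN-style layout that Fig. 1 follows.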

Key Innovations
Dynamic Attention Graph Convolution (DA-GC): combines standard and dynamic convolution for local and global feature integration
Prompt Supervision (PS): enhances the model's learning by introducing LLM-based action descriptions, improving discriminative power at minimal computational cost

Model
Fig. 1. Architecture overview of PDA-GCN, where ⊕ represents the splicing (concatenation) operation, ⊗ represents element-wise multiplication, and PE and GAP represent position embedding and global average pooling, respectively. The CTR-GC block and MS-TC block are shown in the green dotted box at the top of the figure; the DA-GC block and PS block are described in detail later.

Model
Input data is first pre-processed to convert the input skeleton sequence into an initial joint representation
Supervision loss
Overall loss
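The supervision-loss and overall-loss formulas on this slide did not survive the text export. A plausible reconstruction, assuming the usual pattern of adding a weighted auxiliary term to the classification cross-entropy, would be:

```latex
% Assumed form, not the paper's exact equations:
% L_cls is the cross-entropy of the main classifier, L_sup compares the pooled
% skeleton feature f with the pre-stored prompt (text) features t_c of each
% action class, and lambda balances the two terms.
\mathcal{L}_{\mathrm{sup}} =
  -\log \frac{\exp\!\big(\mathrm{sim}(f, t_y)/\tau\big)}
             {\sum_{c=1}^{\mathrm{cls}} \exp\!\big(\mathrm{sim}(f, t_c)/\tau\big)},
\qquad
\mathcal{L} = \mathcal{L}_{\mathrm{cls}} + \lambda\, \mathcal{L}_{\mathrm{sup}}
```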

Model
Dynamic attention graph convolution module
Fig. 2. Overview of the DA-GC module, where ⊕ and ⊗ denote the splicing operation and element-wise product, DConv is a dynamic convolution, A is the predefined topology, BN is group normalization, and ReLU is an activation function.

Model
Dynamic attention graph convolution module
Attention graph A′: dynamic topology, reset according to the equation on the slide
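A minimal sketch of how such an attention graph A′ could be formed, assuming the common pattern in attention-based GCNs of adding a feature-derived joint-to-joint attention map to the predefined topology A; the projection sizes, temporal pooling, and learnable gate are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class DynamicTopology(nn.Module):
    """Builds a sample-specific adjacency A' = A + alpha * attention(X)."""
    def __init__(self, in_ch, embed_ch, A):
        super().__init__()
        self.register_buffer("A", A)               # (V, V) predefined skeleton graph
        self.q = nn.Conv2d(in_ch, embed_ch, 1)     # query projection
        self.k = nn.Conv2d(in_ch, embed_ch, 1)     # key projection
        self.alpha = nn.Parameter(torch.zeros(1))  # learnable gate on the dynamic part

    def forward(self, x):                          # x: (N, C, T, V)
        q = self.q(x).mean(dim=2)                  # pool over time -> (N, E, V)
        k = self.k(x).mean(dim=2)
        attn = torch.einsum("nev,new->nvw", q, k)  # (N, V, V) joint-to-joint attention
        attn = attn.softmax(dim=-1)
        return self.A + self.alpha * attn          # sample-specific topology A'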

Model
Dynamic attention graph convolution module
Dynamic convolution is proposed to enhance local context information
Attention weight: computed as in the equation on the slide

Model
Fig. 3. Overview of the dynamic convolution, where DWConv is a depthwise convolution.
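A minimal sketch of an attention-weighted (dynamic) depthwise convolution in the spirit of Fig. 3, assuming the standard dynamic-convolution recipe in which K candidate kernels are mixed per sample by attention weights predicted from globally pooled features; the kernel count, sizes, and names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicDWConv(nn.Module):
    """Depthwise convolution whose kernel is an attention-weighted mix of K candidates."""
    def __init__(self, ch, kernel_size=3, num_kernels=4):
        super().__init__()
        self.ch, self.k, self.K = ch, kernel_size, num_kernels
        # K candidate depthwise kernels, shape (K, C, 1, k, k)
        self.weight = nn.Parameter(torch.randn(num_kernels, ch, 1, kernel_size, kernel_size) * 0.02)
        self.attn = nn.Linear(ch, num_kernels)       # predicts per-sample kernel attention

    def forward(self, x):                             # x: (N, C, H, W), e.g. H=frames, W=joints
        n, c, h, w = x.shape
        a = F.softmax(self.attn(x.mean(dim=(2, 3))), dim=-1)        # (N, K) attention weights
        kernels = torch.einsum("nk,kcihw->ncihw", a, self.weight)   # (N, C, 1, k, k) mixed kernels
        x = x.reshape(1, n * c, h, w)                 # fold batch into channels for per-sample conv
        out = F.conv2d(x, kernels.reshape(n * c, 1, self.k, self.k),
                       padding=self.k // 2, groups=n * c)           # per-sample depthwise conv
        return out.reshape(n, c, h, w)
```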

Model
Prompt supervision module
Fig. 4. Overview of the PS module, where N is the number of joint nodes, C is the number of current channels, cls is the number of action categories, and GAP is global average pooling.
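A minimal sketch of the prompt-supervision idea, assuming the common recipe of pre-computing one text feature per action description with a frozen language model and pulling the pooled skeleton features toward the feature of the ground-truth class; the projection layer, cosine similarity, and temperature are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptSupervision(nn.Module):
    """Auxiliary loss aligning pooled skeleton features with pre-stored prompt features."""
    def __init__(self, skel_ch, text_dim, text_features):
        super().__init__()
        # text_features: (cls, text_dim) prompt embeddings, computed once and kept frozen
        self.register_buffer("text", F.normalize(text_features, dim=-1))
        self.proj = nn.Linear(skel_ch, text_dim)      # map skeleton features to the text space

    def forward(self, feat, labels):                  # feat: (N, C, T, V) intermediate features
        z = feat.mean(dim=(2, 3))                     # GAP over frames and joints -> (N, C)
        z = F.normalize(self.proj(z), dim=-1)
        logits = z @ self.text.t() / 0.07             # cosine similarity with temperature
        return F.cross_entropy(logits, labels)        # supervision loss added to the main objective
```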

Experiments
Experimental settings
Datasets: NTU RGB+D 60 and NTU RGB+D 120 (standard benchmarks for skeleton-based action recognition)
Evaluation protocols: Cross-Subject and Cross-View splits are used to measure model performance

Experiments
Results: comparison tables shown on the slides for NTU RGB+D 60 and 120

Conclusion
PDA-GCN provides a robust, efficient solution for skeleton-based action recognition
It combines dynamic attention and prompt supervision for superior accuracy
Future work: extend the model to larger datasets and explore further integration with pre-trained models for human-centric tasks