[20240902_LabSeminar_Huy]Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition.pptx

thanhdowork 69 views 18 slides Sep 02, 2024


Quang-Huy Tran
Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea
E-mail: [email protected]
2024-09-02
Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition
Jianyang Xie et al.
AAAI-2024: The Thirty-Eighth AAAI Conference on Artificial Intelligence

OUTLINE
- Motivation
- Methodology
- Experiment & Result
- Conclusion

MOTIVATION
Overview and Limitation
- Human action recognition (HAR) is an essential topic in computer vision with a wide range of applications; the focus here is HAR based on skeleton sensor data.
- Traditional methods (CNN/RNN) or STGNNs extract handcrafted features from the skeleton sequence.
Challenges:
- SOTA ST-GCN considers a fixed graph, which is insufficient to capture changeable movements.
- Adaptive-adjacency-based methods ignore semantic information, so they are insufficient to capture the semantic properties of actions.
- Semantic-guided methods use an explicit input encoding, which is not flexible and does not cooperate well in deeper GCN layers.

INTRODUCTION
- Propose a temporal-causal SFD network (TC-SFDN) architecture to detect forgeries at the frame, clip, and action levels: a hierarchical GCN architecture that learns both low-level skeleton representations based on physical body connections and high-level action representations based on the temporal-causal graph for each action instance.
Contribution
- Propose a dynamic semantic-based graph convolution network (DS-GCN):
  - It encodes the dynamic semantic information of joints and edges implicitly.
  - Each joint/edge type is encoded with a different transform function, each of which represents a specific distribution.
- A group of SSL tasks is designed to efficiently train TC-SFDN for multilevel SFD.

METHODOLOGY
Problem Definition
- Skeleton data is constructed as a spatial-temporal graph G = (V, E, X):
  - V: N body joints in T frames.
  - E: spatial and temporal links.
  - X: joint coordinates as the node features; d is the feature dimension.
  - Spatial graph: intra-body connections.
  - Temporal graph: the same joints linked along consecutive frames.
- ST-GCN can be divided into S-GCN (the focus here) and T-GCN, which uses 1D temporal convolution.
- Topology-Fixed Graph Convolution Network: updates each node representation by aggregating information from its neighborhood. The adjacency matrix is split into three partitions, and the S-GCN output is computed from the input by aggregating over these partitions.
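
The topology-fixed update can be sketched as below. This is a minimal illustration under stated assumptions, not the formula from the slide (which did not survive extraction): it assumes the commonly used form X_out = sum over p of A_p X W_p with three partitions (self, centripetal, centrifugal). The 5-joint toy skeleton, the hop distances, and the identity weights are hypothetical.

```python
# Sketch of a topology-fixed spatial graph convolution (S-GCN).
# Assumed form: X_out = sum_p  normalize(A_p) @ X @ W_p  over three partitions.
# The toy skeleton and all weights are illustrative, not taken from the slides.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def row_normalize(a):
    """Divide each row by its sum; all-zero rows stay zero."""
    out = []
    for row in a:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out

def three_partitions(num_joints, bones, hop_to_center):
    """Split neighbours into self, centripetal (closer to centre), centrifugal."""
    def zeros():
        return [[0.0] * num_joints for _ in range(num_joints)]
    a_self, a_in, a_out = zeros(), zeros(), zeros()
    for i in range(num_joints):
        a_self[i][i] = 1.0
    for i, j in bones:
        for u, v in ((i, j), (j, i)):  # joint u aggregates from neighbour v
            if hop_to_center[v] < hop_to_center[u]:
                a_in[u][v] = 1.0
            else:
                a_out[u][v] = 1.0
    return [row_normalize(a) for a in (a_self, a_in, a_out)]

def s_gcn(x, partitions, weights):
    """x: N x d_in features; weights: one d_in x d_out matrix per partition."""
    out = [[0.0] * len(weights[0][0]) for _ in x]
    for a_p, w_p in zip(partitions, weights):
        y = matmul(matmul(a_p, x), w_p)
        out = [[o + v for o, v in zip(ro, ry)] for ro, ry in zip(out, y)]
    return out

# Toy usage: joint 0 is the centre, with two limbs 0-1-2 and 0-3-4.
bones = [(0, 1), (1, 2), (0, 3), (3, 4)]
parts = three_partitions(5, bones, hop_to_center=[0, 1, 2, 1, 2])
eye = [[1.0, 0.0], [0.0, 1.0]]
coords = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
print(s_gcn(coords, parts, [eye, eye, eye]))
```

Because the three adjacency partitions are built once from the bone list, the topology is the same for every sample and every layer, which is the limitation the adaptive variants address.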

METHODOLOGY
Problem Definition
- Topology-Adaptive Graph Convolution Network: the adaptive matrix is learned dynamically with a self-attention mechanism. Given two transformation functions, the correlation between two joints is computed from their transformed features, and the resulting adaptive matrix is used in S-GCN.
- Semantic-Guided Graph Convolution Network: the input feature is refined by adding a one-hot vector of joint types.
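
A minimal sketch of the two ideas above, under stated assumptions. The adaptive matrix is assumed to take the usual self-attention form A[i][j] = softmax over j of theta(x_i) . phi(x_j), with theta and phi reduced to plain linear maps. The semantic-guided refinement is interpreted here as concatenating a one-hot joint index to each feature. All function names and weights are hypothetical.

```python
import math

def linear(x, w):
    """Apply a d_in x d_out matrix w to every joint feature in x (N x d_in)."""
    return [[sum(xi * wij for xi, wij in zip(row, col)) for col in zip(*w)] for row in x]

def softmax(values):
    """Numerically stable softmax over a list of scores."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_adjacency(x, w_theta, w_phi):
    """Data-dependent adjacency: row i holds the attention of joint i over all joints."""
    q, k = linear(x, w_theta), linear(x, w_phi)
    scores = [[sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    return [softmax(row) for row in scores]

def add_joint_type(x):
    """Semantic-guided input: append a one-hot joint-type vector to each feature."""
    n = len(x)
    return [row + [1.0 if j == i else 0.0 for j in range(n)] for i, row in enumerate(x)]

# Toy usage with three joints and 2-D features.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
for row in adaptive_adjacency(feats, w, w):
    print([round(v, 3) for v in row])
print(add_joint_type(feats))
```

The one-hot encoding is fixed at the input, whereas the adaptive matrix changes with the features; neither carries joint-type information into the learned topology itself.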

METHODOLOGY Main Architecture: DS-GCN

METHODOLOGY
Dynamic Semantic-Based GCN
- Builds on the topology-adaptive GCN, with joint and edge types encoded dynamically.
- The skeleton is modeled as a directed graph G = (V, E, A, R, X), where A and R denote the type mapping functions for nodes and edges, respectively.
- A semantic-based adaptive graph is built for nodes and for edges.

METHODOLOGY
Dynamic Semantic-Based GCN
Node Type-Aware Adaptive Topology
- Joints are projected into their individual feature spaces with a node type mapping function.
- Correlations are calculated according to the non-local mechanism.
- With s and t as two nodes of different types, a node-aware feature representation is computed for each.
- A directed correlation between nodes s and t is computed along the channel dimension.
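
The node type-aware steps above might look like the following sketch. The slide's formulas are not recoverable, so the pairwise function here is an assumption: a channel-wise tanh of the difference between type-specific projections, in the style of channel-wise topology refinement [11]. The type assignment, projection matrices, and function names are hypothetical.

```python
import math

def project(x, w):
    """Map one joint feature x (length d_in) through a d_in x d_out matrix w."""
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]

def node_type_aware_correlation(feats, node_type, theta, phi):
    """Return corr[s][t], a per-channel vector for every ordered joint pair.

    feats: N joint features; node_type[i]: type index of joint i;
    theta / phi: one projection matrix per node type (source / target role).
    """
    src = [project(x, theta[node_type[i]]) for i, x in enumerate(feats)]
    dst = [project(x, phi[node_type[i]]) for i, x in enumerate(feats)]
    # Assumed pairwise function: channel-wise tanh of the projected difference.
    n = len(feats)
    return [[[math.tanh(a - b) for a, b in zip(src[s], dst[t])]
             for t in range(n)] for s in range(n)]

# Toy usage: two joints of different types, each type with its own projection.
identity = [[1.0, 0.0], [0.0, 1.0]]
doubled = [[2.0, 0.0], [0.0, 2.0]]
corr = node_type_aware_correlation(
    feats=[[1.0, 0.0], [0.0, 1.0]],
    node_type=[0, 1],
    theta=[identity, doubled],
    phi=[identity, identity],
)
print(corr[0][1], corr[1][0])
```

Because each type has its own projection and the source and target roles differ, corr[s][t] and corr[t][s] are generally not equal, which is what makes the resulting graph directed.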

METHODOLOGY
Dynamic Semantic-Based GCN
Edge Type-Aware Adaptive Topology
- A separate convolution kernel is applied on the adaptive graph for each edge type.
- Given three nodes s, t, and u of different types, an edge type-aware adaptive correlation is computed.
- In the resulting edge type-aware topology, s and t are node type indices and M is the number of types.
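
One way to picture the edge type-aware step is sketched below. It rests on two assumptions that the slide does not confirm: an edge type is identified by the ordered pair of node types of its endpoints (giving M × M types), and the separate convolution kernel is reduced to a per-channel scaling vector. The names `edge_type` and `edge_type_aware_topology` are hypothetical.

```python
def edge_type(node_type, s, t, num_types):
    """Index of the directed edge type for the ordered joint pair (s, t)."""
    return node_type[s] * num_types + node_type[t]

def edge_type_aware_topology(corr, node_type, num_types, kernels):
    """Scale each pairwise correlation by the kernel of its edge type.

    corr[s][t]: per-channel correlation vector for the pair (s, t);
    kernels[r]: per-channel weights for edge type r (a stand-in for a 1 x 1 kernel).
    """
    n = len(corr)
    return [[[c * k for c, k in
              zip(corr[s][t], kernels[edge_type(node_type, s, t, num_types)])]
             for t in range(n)] for s in range(n)]

# Toy usage: three joints, M = 2 node types, hence 4 directed edge types.
node_type = [0, 1, 1]
ones = [[[1.0] for _ in range(3)] for _ in range(3)]
kernels = [[1.0], [2.0], [3.0], [4.0]]
topology = edge_type_aware_topology(ones, node_type, 2, kernels)
print(topology[0][1], topology[1][0], topology[1][2])
```

With this indexing, the edges (s, t) and (t, s) fall into different types whenever the two joints have different node types, so the kernel can treat the two directions differently.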

METHODOLOGY
Dynamic Semantic-Based GCN
Process of DS-GCN:
- DS-GCN is decomposed into three branches: the node type-aware branch, the edge type-aware branch, and the general branch.
- A learnable branch-wise weight is used in the combination with a shared correction matrix.
- For each branch, the combination of the shared correction matrix and a self-adaptive graph is used for the spatial graph convolution operation.
- The three branches are concatenated along the feature channel dimension, followed by a 1 × 1 convolution kernel.
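
The branch combination described above can be sketched as follows. The exact way the shared matrix and the branch graph are combined is not given on the slide; this sketch assumes topology_b = shared + alpha_b * graph_b, with alpha_b the branch-wise weight, and models the 1 × 1 convolution as a per-joint linear map over the concatenated channels. All names and values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def ds_gcn_combine(x, shared, branch_graphs, alphas, w_mix):
    """Run one spatial aggregation per branch, concatenate, then mix channels.

    x: N x d features; shared: N x N matrix shared by all branches;
    branch_graphs: one N x N self-adaptive graph per branch;
    alphas: one weight per branch (learnable in a real model);
    w_mix: (branches * d) x d_out matrix standing in for the 1 x 1 convolution.
    """
    outputs = []
    for graph, alpha in zip(branch_graphs, alphas):
        # Assumed combination: shared matrix plus weighted branch graph.
        topology = [[s + alpha * g for s, g in zip(rs, rg)]
                    for rs, rg in zip(shared, graph)]
        outputs.append(matmul(topology, x))
    # Concatenate the branch outputs along the channel dimension, joint by joint.
    concat = [sum((out[i] for out in outputs), []) for i in range(len(x))]
    return matmul(concat, w_mix)

# Toy usage: two joints, one channel, three branches.
x = [[1.0], [2.0]]
shared = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
y = ds_gcn_combine(x, shared, [swap, swap, swap], [0.0, 1.0, 2.0], [[1.0], [1.0], [1.0]])
print(y)
```

Setting a branch weight to zero reduces that branch to the shared matrix alone, which gives a simple way to check how much each adaptive graph contributes.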

METHODOLOGY
Model Architecture
- Ten blocks in series, followed by global average pooling and a softmax classifier.
- The number of basic feature channels is 64, doubled at the 5th and 8th blocks.
- Each block consists of one DS-GCN and a multi-scale temporal module (temporal convolution network).
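
The channel schedule implied by this description can be worked out with a short sketch. It assumes that doubling at a block means that block's output width is twice the previous one; the function name and parameters are hypothetical.

```python
def block_channels(num_blocks=10, base=64, double_at=(5, 8)):
    """Output channels of each block; blocks are numbered from 1."""
    channels, width = [], base
    for block in range(1, num_blocks + 1):
        if block in double_at:
            width *= 2  # the width doubles starting with this block
        channels.append(width)
    return channels

print(block_channels())
# [64, 64, 64, 64, 128, 128, 128, 256, 256, 256]
```

Under this reading, the global average pooling and softmax classifier operate on 256-channel features.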

EXPERIMENT AND RESULT
Experiment Settings
- Dataset: human action recognition on NTU-RGB+D and Kinetics-400.
- Baselines:
  - STGNN or GNN: ST-GCN [1], SGN [2], AS-GCN [3], RA-GCN [4], 2s-AGCN [5], DGNN [6], FGCN [7], Shift-GCN [8], DSTA-Net [9], MS-G3D [10], CTR-GCN [11], and ST-GCN++ [12].
  - CNN: PoseConv3D [13].
[1] Yan, S., Xiong, Y., & Lin, D. (2018, April). Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI conference on artificial intelligence (Vol. 32, No. 1).
[2] Zhang, P., Lan, C., Zeng, W., Xing, J., Xue, J., & Zheng, N. (2020). Semantics-guided neural networks for efficient skeleton-based human action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1112-1121).
[3] Li, M., Chen, S., Chen, X., Zhang, Y., Wang, Y., & Tian, Q. (2019). Actional-structural graph convolutional networks for skeleton-based action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3595-3603).
[4] Song, Y. F., Zhang, Z., Shan, C., & Wang, L. (2020). Richly activated graph convolutional network for robust skeleton-based action recognition. IEEE Transactions on Circuits and Systems for Video Technology, 31(5), 1915-1925.
[5] Shi, L., Zhang, Y., Cheng, J., & Lu, H. (2019). Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12026-12035).
[6] Shi, L., Zhang, Y., Cheng, J., & Lu, H. (2019). Skeleton-based action recognition with directed graph neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 7912-7921).
[7] Yang, H., Yan, D., Zhang, L., Sun, Y., Li, D., & Maybank, S. J. (2021). Feedback graph convolutional network for skeleton-based action recognition. IEEE Transactions on Image Processing, 31, 164-175.
[8] Cheng, K., Zhang, Y., He, X., Chen, W., Cheng, J., & Lu, H. (2020). Skeleton-based action recognition with shift graph convolutional network. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 183-192).
[9] Shi, L., Zhang, Y., Cheng, J., & Lu, H. (2020). Decoupled spatial-temporal attention network for skeleton-based action-gesture recognition. In Proceedings of the Asian conference on computer vision.
[10] Liu, Z., Zhang, H., Chen, Z., Wang, Z., & Ouyang, W. (2020). Disentangling and unifying graph convolutions for skeleton-based action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 143-152).
[11] Chen, Y., Zhang, Z., Yuan, C., Li, B., Deng, Y., & Hu, W. (2021). Channel-wise topology refinement graph convolution for skeleton-based action recognition. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 13359-13368).
[12] Duan, H., Wang, J., Chen, K., & Lin, D. (2022, October). PYSKL: Towards good practices for skeleton action recognition. In Proceedings of the 30th ACM International Conference on Multimedia (pp. 7351-7354).
[13] Duan, H., Zhao, Y., Chen, K., Lin, D., & Dai, B. (2022). Revisiting skeleton-based action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2969-2978).
- Measurement: Accuracy (ACC).
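
For reference, the accuracy measurement can be sketched as below. It assumes the usual top-1 definition (the predicted class is the one with the highest score); the function name and the toy scores are hypothetical and unrelated to the reported results.

```python
def accuracy(scores, labels):
    """Top-1 accuracy: share of samples whose highest-scoring class is the label."""
    correct = sum(
        1 for row, label in zip(scores, labels)
        if max(range(len(row)), key=row.__getitem__) == label
    )
    return correct / len(labels)

# Toy usage: three samples, two classes; the third sample is misclassified.
toy_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
toy_labels = [1, 0, 0]
print(accuracy(toy_scores, toy_labels))
```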

EXPERIMENT AND RESULT
Result: Overall Performance
Tab. Classification accuracy comparison against state-of-the-art methods.

EXPERIMENT AND RESULT
Result: Ablation Study
Tab. Generalization of the proposed semantic module.
Tab. Ablation on the edge/node type encoding.
Tab. Comparison of DS-GCN with different learnable weight manners.
Tab. Exploration of the semantic encoding stage.

CONCLUSION
Summarization
- Proposed two dynamic semantic-based adaptive graphs: a node type-aware and an edge type-aware adaptive graph.
- They can be applied to any ST-GCN model for skeleton-based recognition.
- The resulting dynamic semantic-based graph neural network for skeleton-based human action recognition notably outperforms SOTA methods on both NTU-RGB+D and Kinetics-400.
