[20240902_LabSeminar_Huy]Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition.pptx

thanhdowork 69 views 18 slides Sep 02, 2024

About This Presentation

Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition


Slide Content

Quang-Huy Tran
Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea
E-mail: [email protected]
2024-09-02

Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition
Jianyang Xie et al.
AAAI-2024: The Thirty-Eighth AAAI Conference on Artificial Intelligence

OUTLINE
- MOTIVATION
- METHODOLOGY
- EXPERIMENT & RESULT
- CONCLUSION

MOTIVATION
Overview and Limitation
- Human action recognition (HAR) is an essential topic in computer vision with a wide range of applications; here it is based on skeleton sensor data.
- Traditional methods (CNN/RNN) or STGNN extract handcrafted features from the skeleton sequence.
Challenges:
- SOTA ST-GCN models consider a fixed graph, which is insufficient to capture changeable movements.
- Adaptive adjacency-based methods ignore semantic information, so they are insufficient to capture the semantic properties of actions.
- Semantic-guided methods use explicit input encoding, which is not flexible and does not cooperate well in deeper GCN layers.

INTRODUCTION
Contribution
- Propose the temporal-causal SFD network (TC-SFDN) architecture to detect forgeries at the frame, clip and action levels: a hierarchical GCN architecture that learns both
  - low-level skeleton representations based on physical body connections, and
  - high-level action representations based on the temporal-causal graph for each action instance.
- Propose the dynamic semantic-based graph convolution network (DS-GCN):
  - it encodes the dynamic semantic information of joints and edges implicitly;
  - each joint/edge type is encoded with a different transform function, each of which represents a specific distribution.
- A group of SSL tasks is designed to efficiently train TC-SFDN for multilevel SFD.

METHODOLOGY
Problem Definition
- Skeleton data is constructed as a spatial-temporal graph of N body joints over T frames, made up of:
  - the joints as nodes;
  - the spatial and temporal links as edges;
  - the joint coordinates as node features, where d is the feature dimension.
- Spatial graph: intra-body links between joints within a frame.
- Temporal graph: links between the same joints along consecutive frames.
- ST-GCN can be divided into S-GCN (the focus here) and T-GCN, where T-GCN uses 1D temporal convolution.
Topology-Fixed Graph Convolution Network:
- Updates each node representation by aggregating information from its neighborhood.
- The adjacency matrix is divided into three partitions.
- The output of S-GCN is computed from the input features using these fixed partitions.
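The topology-fixed aggregation described on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the toy three-joint chain, the way the three partitions are built (self, inward, outward), and the identity weights are all assumptions, and the adjacency normalization used in practice is omitted.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fixed_sgcn(x, partitions, weights):
    """Hypothetical helper: sum over partitions k of A_k @ X @ W_k.

    x: N x d_in node features; partitions: K fixed N x N adjacency matrices;
    weights: K matrices of shape d_in x d_out. Returns N x d_out features.
    """
    out = None
    for a_k, w_k in zip(partitions, weights):
        term = matmul(matmul(a_k, x), w_k)  # aggregate neighbors, then transform
        out = term if out is None else add(out, term)
    return out


# Toy chain of 3 joints (0 - 1 - 2), with joint 0 treated as the root (assumption).
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # each joint itself
inward = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]    # neighbor closer to the root
outward = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # neighbor farther from the root
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # 2-D coordinates per joint
w = [[1.0, 0.0], [0.0, 1.0]]                  # identity weights, for readability
y = fixed_sgcn(x, [identity, inward, outward], [w, w, w])
```

Because the partitions never change, every action is aggregated over the same neighborhoods, which is the limitation the adaptive variants address.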

METHODOLOGY
Problem Definition
Topology-Adaptive Graph Convolution Network:
- The adaptive matrix is learned dynamically with a self-attention mechanism.
- Given two transformation functions, the correlation between two joints is computed from their transformed features.
Semantic-Guided Graph Convolution Network:
- The input feature is refined by adding a one-hot vector of joint types.
- S-GCN then aggregates with the adaptive matrix.
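A rough sketch of the two ideas on this slide, under stated assumptions: the adaptive matrix is taken to be a row-wise softmax over dot products of two linearly transformed copies of the features (a common self-attention form), and the semantic-guided refinement is realized by appending a one-hot joint index to each feature vector. The names `adaptive_adjacency`, `add_joint_type`, `w_theta` and `w_phi` are hypothetical, and the exact formulas on the original slide are not reproduced here.

```python
import math


def linear(x, w):
    """Apply a d x h weight matrix to N x d features."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def adaptive_adjacency(x, w_theta, w_phi):
    """Data-dependent N x N matrix: softmax over theta(x_i) . phi(x_j)."""
    q = linear(x, w_theta)
    k = linear(x, w_phi)
    scores = [[sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    return [softmax(r) for r in scores]


def add_joint_type(x):
    """Semantic-guided input: append a one-hot joint index to each feature row."""
    n = len(x)
    return [row + [1.0 if j == i else 0.0 for j in range(n)]
            for i, row in enumerate(x)]


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_theta = [[1.0, 0.0], [0.0, 1.0]]
w_phi = [[1.0, 0.0], [0.0, 1.0]]
adj = adaptive_adjacency(x, w_theta, w_phi)  # each row sums to 1
x_sem = add_joint_type(x)                    # features grow from 2 to 5 dims
```

The one-hot encoding is fixed at the input, which is why the slide describes semantic-guided methods as explicit input encoding.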

METHODOLOGY Main Architecture: DS-GCN

METHODOLOGY
Dynamic Semantic-Based GCN
- Built on topology-adaptive GCN, with joint and edge types encoded dynamically.
- The skeleton is modeled as a directed graph G = (V, E, A, R, X), where A and R denote the type mapping functions for nodes and edges, respectively.
- A semantic-based adaptive graph is built for node types and for edge types.

METHODOLOGY
Dynamic Semantic-Based GCN
Node Type-Aware Adaptive Topology
- Nodes are projected into their individual feature spaces with a node type mapping function.
- The correlation is calculated according to the non-local mechanism.
- With s and t as two nodes of different types, a node-aware feature representation is computed for each.
- The directed correlation between nodes s and t is computed along the channel dimension.
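The node type-aware step can be illustrated as follows. This is only a sketch under assumptions: each node type gets its own projection matrix, and the directed, channel-wise correlation is modeled as `tanh` of the difference between the projected source and target features, a form borrowed from channel-wise topology refinement work rather than taken from this slide. All names (`node_type_aware_correlation`, `type_weights`, `node_type`) are hypothetical.

```python
import math


def project(vec, w):
    """Multiply a feature vector by a d x h weight matrix."""
    return [sum(a * b for a, b in zip(vec, col)) for col in zip(*w)]


def node_type_aware_correlation(x, node_type, type_weights):
    """Return corr[s][t] as a list with one value per channel.

    x: N x d features; node_type[i]: type index of node i;
    type_weights[k]: projection matrix used for nodes of type k.
    """
    # Each node is projected with the transform that belongs to its own type.
    z = [project(x[i], type_weights[node_type[i]]) for i in range(len(x))]
    # Assumed correlation: antisymmetric in (s, t), hence directed.
    return [[[math.tanh(a - b) for a, b in zip(zs, zt)] for zt in z] for zs in z]


x = [[1.0, 2.0], [0.5, 0.5], [2.0, 0.0]]
node_type = [0, 1, 0]  # e.g. two hypothetical joint types
type_weights = {
    0: [[1.0, 0.0], [0.0, 1.0]],
    1: [[2.0, 0.0], [0.0, 2.0]],
}
corr = node_type_aware_correlation(x, node_type, type_weights)
```

Swapping s and t flips the sign of every channel in this sketch, which is one simple way to make the correlation depend on direction.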

METHODOLOGY
Dynamic Semantic-Based GCN
Edge Type-Aware Adaptive Topology
- A separate convolution kernel is applied on the adaptive graph for each edge type.
- Given three nodes s, t and u of different types, an edge type-aware adaptive correlation is computed.
- The edge type-aware topology is indexed by node type: s and t are node type indices, and M is the number of types.
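To make the edge type-aware idea concrete, the sketch below treats an edge type as the ordered pair of its endpoint node types, so M node types give M × M edge types, and reduces each "separate kernel" to a single scalar weight. Both simplifications are assumptions made for illustration, as are the names `edge_type_aware` and `edge_kernel`.

```python
def edge_type_aware(corr, node_type, edge_kernel):
    """Rescale each adaptive correlation by the kernel of its edge type.

    corr: N x N adaptive correlations; node_type[i]: type index of node i;
    edge_kernel[(a, b)]: weight for edges from a type-a node to a type-b node.
    """
    n = len(corr)
    return [[edge_kernel[(node_type[s], node_type[t])] * corr[s][t]
             for t in range(n)] for s in range(n)]


corr = [[0.0, 0.2, 0.4],
        [0.2, 0.0, 0.6],
        [0.4, 0.6, 0.0]]
node_type = [0, 1, 0]  # M = 2 hypothetical node types
edge_kernel = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 1.0}
topology = edge_type_aware(corr, node_type, edge_kernel)
```

In this toy case the symmetric input becomes asymmetric, because the (0, 1) and (1, 0) edge types carry different weights.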

METHODOLOGY
Dynamic Semantic-Based GCN
Process of DS-GCN:
- DS-GCN is decomposed into three branches: the node type-aware branch, the edge type-aware branch, and the general branch.
- A learnable branch-wise weight is used for the combination with a shared correction matrix.
- For each branch, the combination of the shared correction matrix and a self-adaptive graph is used for the spatial graph convolution operation.
- The three branches are concatenated along the feature channel dimension and followed by a 1 × 1 convolution kernel.
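The three-branch process can be sketched as below. The assumptions are worth stating plainly: the branch-wise weight is taken to scale the shared matrix before it is added to the branch's self-adaptive graph (the slide does not say which term it scales), and the 1 × 1 convolution is written as a per-node linear map over the concatenated channels. Names such as `branch_output` and `fuse` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def branch_output(x, shared, adaptive, alpha):
    """One branch: aggregate features with (alpha * shared + adaptive)."""
    topo = [[alpha * s + a for s, a in zip(rs, ra)]
            for rs, ra in zip(shared, adaptive)]
    return matmul(topo, x)


def fuse(branch_outputs, w_fuse):
    """Concatenate branches along channels, then mix them with a 1 x 1 kernel."""
    concat = [sum(rows, []) for rows in zip(*branch_outputs)]  # N x (3 * d)
    return matmul(concat, w_fuse)


x = [[1.0], [2.0]]                   # 2 joints, 1 channel
shared = [[1.0, 0.0], [0.0, 1.0]]    # shared matrix, identity for the toy case
branches = [
    branch_output(x, shared, [[0.0, 0.0], [0.0, 0.0]], 1.0),  # node type-aware
    branch_output(x, shared, [[0.0, 1.0], [1.0, 0.0]], 0.5),  # edge type-aware
    branch_output(x, shared, [[0.5, 0.5], [0.5, 0.5]], 0.0),  # general
]
w_fuse = [[1.0], [1.0], [1.0]]       # 3 input channels to 1 output channel
out = fuse(branches, w_fuse)
```

Concatenating before the 1 × 1 kernel lets the model learn how much each branch contributes per output channel, instead of fixing the mix in advance.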

METHODOLOGY
Model Architecture
- Ten blocks in series, followed by global average pooling and a softmax classifier.
- The number of basic feature channels is 64, doubled at the 5th and 8th blocks.
- Each block consists of one DS-GCN and a multi-scale temporal module (temporal convolution network).
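The channel widths implied by this description can be worked out with a small helper. The function name `channel_schedule` and its parameters are hypothetical, and the sketch assumes the doubling applies to the output width of the 5th and 8th blocks.

```python
def channel_schedule(num_blocks=10, base=64, double_at=(5, 8)):
    """Return the output channel width of each block, numbered from 1."""
    channels, width = [], base
    for block in range(1, num_blocks + 1):
        if block in double_at:
            width *= 2  # width doubles when entering this block
        channels.append(width)
    return channels


widths = channel_schedule()
# widths == [64, 64, 64, 64, 128, 128, 128, 256, 256, 256]
```

Under this reading, the global average pooling and the softmax classifier operate on 256-channel features.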

EXPERIMENT AND RESULT
Experiment Settings
- Dataset: human action recognition on NTU-RGB+D and Kinetics-400.
- Baselines:
  - STGNN or GNN: ST-GCN [1], SGN [2], AS-GCN [3], RA-GCN [4], 2s-AGCN [5], DGNN [6], FGCN [7], Shift-GCN [8], DSTA-Net [9], MS-G3D [10], CTR-GCN [11] and ST-GCN++ [12].
  - CNN: PoseConv3D [13].
- Measurement: Accuracy (ACC).

[1] Yan, S., Xiong, Y., & Lin, D. (2018, April). Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI conference on artificial intelligence (Vol. 32, No. 1).
[2] Zhang, P., Lan, C., Zeng, W., Xing, J., Xue, J., & Zheng, N. (2020). Semantics-guided neural networks for efficient skeleton-based human action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1112-1121).
[3] Li, M., Chen, S., Chen, X., Zhang, Y., Wang, Y., & Tian, Q. (2019). Actional-structural graph convolutional networks for skeleton-based action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3595-3603).
[4] Song, Y. F., Zhang, Z., Shan, C., & Wang, L. (2020). Richly activated graph convolutional network for robust skeleton-based action recognition. IEEE Transactions on Circuits and Systems for Video Technology, 31(5), 1915-1925.
[5] Shi, L., Zhang, Y., Cheng, J., & Lu, H. (2019). Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12026-12035).
[6] Shi, L., Zhang, Y., Cheng, J., & Lu, H. (2019). Skeleton-based action recognition with directed graph neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 7912-7921).
[7] Yang, H., Yan, D., Zhang, L., Sun, Y., Li, D., & Maybank, S. J. (2021). Feedback graph convolutional network for skeleton-based action recognition. IEEE Transactions on Image Processing, 31, 164-175.
[8] Cheng, K., Zhang, Y., He, X., Chen, W., Cheng, J., & Lu, H. (2020). Skeleton-based action recognition with shift graph convolutional network. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 183-192).
[9] Shi, L., Zhang, Y., Cheng, J., & Lu, H. (2020). Decoupled spatial-temporal attention network for skeleton-based action-gesture recognition. In Proceedings of the Asian conference on computer vision.
[10] Liu, Z., Zhang, H., Chen, Z., Wang, Z., & Ouyang, W. (2020). Disentangling and unifying graph convolutions for skeleton-based action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 143-152).
[11] Chen, Y., Zhang, Z., Yuan, C., Li, B., Deng, Y., & Hu, W. (2021). Channel-wise topology refinement graph convolution for skeleton-based action recognition. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 13359-13368).
[12] Duan, H., Wang, J., Chen, K., & Lin, D. (2022, October). Pyskl: Towards good practices for skeleton action recognition. In Proceedings of the 30th ACM International Conference on Multimedia (pp. 7351-7354).
[13] Duan, H., Zhao, Y., Chen, K., Lin, D., & Dai, B. (2022). Revisiting skeleton-based action recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2969-2978).

EXPERIMENT AND RESULT
Result: Overall Performance
Tab. Classification accuracy comparison against state-of-the-art methods.

EXPERIMENT AND RESULT
Result: Ablation Study
Tab. Generalization of the proposed semantic module.
Tab. Ablation on the edge/node type encoding.
Tab. Comparison of DS-GCN with different learnable weight manners.
Tab. Exploration of the semantic encoding stage.

CONCLUSION
Summarization
- Proposed two dynamic semantic-based adaptive graphs: the node type-aware and the edge type-aware adaptive graph.
  - They can be applied to any ST-GCN model for skeleton-based recognition.
- Built a dynamic semantic-based graph neural network for skeleton-based human action recognition, which outperforms SOTA methods notably on both NTU-RGB+D and Kinetics-400.