[20240705_LabSeminar_Huy]Spatial-Temporal Graph-Based AU Relationship Learning for Facial Action Unit Detection.pptx


About This Presentation

Spatial-Temporal Graph-Based AU Relationship Learning for Facial Action Unit Detection


Slide Content

Quang-Huy Tran, Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea. E-mail: [email protected]. 2024-07-05. Spatial-Temporal Graph-Based AU Relationship Learning for Facial Action Unit Detection, Zihan Wang et al., CVPR 2023: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023.

OUTLINE
- MOTIVATION
- METHODOLOGY
- EXPERIMENT & RESULT
- CONCLUSION

MOTIVATION
Overview and Limitations
- Human facial Action Units (AUs) play a significant role in human behavior understanding. They are annotated based on the anatomical characteristics of multiple facial muscle movements.
- AU detection is a challenging multi-label classification task: AUs are subtle movements of facial muscles, and different facial muscles have different ranges of movement (person-specific factors such as gender and age, or context such as background and illumination).
Challenges:
- Previous works do not consider temporal information.
- AU annotations exhibit a notable imbalance, which can lead to a biased model that is predisposed to learn the AU patterns annotated more frequently in the training set.

INTRODUCTION
Contribution
- Propose a spatio-temporal facial AU graph representation learning framework that jointly models the spatio-temporal relationships among the AUs of all face frames. Relationships between different AUs and the temporal information of a specific AU sequence interact and jointly guide the graph to learn a representation for each AU node.
- Pre-train an MAE model on human face databases, which can generate a strong facial representation from each input facial display.
- Overcome the data imbalance problem in action unit detection.

METHODOLOGY
Problem Definition
Given $T$ consecutive facial frames $X = \{x_1, \dots, x_T\}$, predict the multiple AU labels $\hat{y}_t = \{\hat{y}_t^1, \dots, \hat{y}_t^N\}$ for every frame, where $N$ represents the number of predicted AUs, $t$ denotes the frame index, and each $\hat{y}_t^i$ can be either activated (1) or inactivated (0).
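To make the input/output shapes concrete, here is a minimal sketch of the multi-label prediction target; the frame count, AU count and array names are illustrative assumptions, not values from the paper.

```python
import numpy as np

T, N = 8, 12                               # T consecutive frames, N action units (illustrative)
frames = np.zeros((T, 224, 224, 3))        # the input face frame sequence

# The model predicts one binary label per AU per frame:
# y_hat[t, i] == 1 means AU i is activated in frame t, 0 means inactivated.
y_hat = np.random.randint(0, 2, size=(T, N))
print(y_hat.shape)                         # (8, 12)
```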

METHODOLOGY Main Architecture

METHODOLOGY
Facial Representation Encoder
Pre-train an MAE model using a large amount of face images from CASIA-WebFace, AffectNet, IMDB-WIKI and CelebA:
- Encoder: randomly masked face images are fed in to generate latent features.
- Decoder: reconstructs the original image from these latent features.
Input: the facial image sequence $X = \{x_1, \dots, x_T\}$.
Output: the set of facial representations $F = \{f_1, \dots, f_T\}$ with $f_t \in \mathbb{R}^{m \times d}$, where $f_t$ represents a global facial representation of a face image, $m$ is the number of patches and $d$ denotes the dimension of each patch.
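A minimal sketch of MAE-style random patch masking, assuming a standard 75% mask ratio and ViT-like patch tokens; this only illustrates the idea and is not the authors' implementation.

```python
import torch

# Illustrative MAE-style random masking (a sketch, not the paper's code):
# the image is split into patch tokens, a large fraction is masked, and only the
# visible patches are passed to the encoder; the decoder later reconstructs the
# masked patches with an MSE loss against the original pixels.
def random_mask(patches: torch.Tensor, mask_ratio: float = 0.75):
    """patches: (num_patches, dim). Returns visible patches plus kept/masked indices."""
    num_patches = patches.shape[0]
    num_keep = int(num_patches * (1 - mask_ratio))
    perm = torch.randperm(num_patches)
    keep_idx, mask_idx = perm[:num_keep], perm[num_keep:]
    return patches[keep_idx], keep_idx, mask_idx

patches = torch.randn(196, 768)              # e.g. 14 x 14 patches, 768-dim tokens
visible, keep_idx, mask_idx = random_mask(patches)
print(visible.shape)                          # torch.Size([49, 768])
```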

METHODOLOGY
Spatial-Temporal Graph Learning
AU-specific Feature Generator (AFG): consists of N branches, one per AU. Each branch is an FC layer followed by a global average pooling (GAP) layer.
Spatial GCN module: the Facial Graph Generator (FGG) constructs the adjacency, where every edge is first assumed to be connected. Each feature vector from the AFG is a node; feature similarities between nodes are calculated, and the top-K nearest neighbours with the highest similarity scores are chosen as each node's neighbours.
The new representation of AU $i$ in frame $t$ produced by the GCN is
$v_i^{t\,\prime} = \sigma\big( g\big( \textstyle\sum_{j} a_{ij}\, r(v_j^{t}) \big) \big)$,
where $\sigma$ is the activation function, $g$ and $r$ denote differentiable functions of the GCN layer, and $a_{ij}$ denotes the connectivity between $v_i$ and $v_j$.
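The FGG's top-K similarity graph can be sketched as follows; the use of cosine similarity, the value of K and the tensor names are assumptions for illustration, since the slides do not spell out the exact similarity function.

```python
import torch
import torch.nn.functional as F

# Sketch of a top-K similarity graph: each AU-specific feature from the AFG is a
# node, and edges connect every node to its K most similar neighbours.
def build_topk_adjacency(node_feats: torch.Tensor, k: int = 3) -> torch.Tensor:
    """node_feats: (N, d) AU node features. Returns an (N, N) 0/1 adjacency."""
    normed = F.normalize(node_feats, dim=-1)
    sim = normed @ normed.T                         # cosine similarity between nodes
    topk = sim.topk(k + 1, dim=-1).indices          # +1: every node is most similar to itself
    adj = torch.zeros_like(sim)
    adj.scatter_(1, topk, 1.0)                      # keep only the top-K (plus self) edges
    return adj

nodes = torch.randn(12, 512)                        # e.g. 12 AU nodes, 512-dim features
print(build_topk_adjacency(nodes).sum(dim=1))       # each row has k+1 ones
```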

METHODOLOGY
Spatial-Temporal Graph Learning
Temporal transformer module: apply a Transformer along the time dimension to the sequence of each node in $V$, where FFN is the feed-forward network of the Transformer, Att denotes the self-attention function, and the query, key and value projections are trainable weight matrices.
A cosine similarity calculating (SC) strategy is employed to predict the occurrence probability of each AU: $p_i^t = \sigma\big(\cos(v_i^t, s_i)\big)$, where $\sigma$ is the activation function and $s_i$ is a trainable vector.
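A hedged sketch of such a cosine-similarity prediction head follows; the class name, feature dimensions and the choice of a sigmoid activation are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Each AU i owns a trainable vector s_i; the occurrence probability comes from the
# cosine similarity between the AU's node feature and s_i, squashed by a sigmoid.
class CosineHead(torch.nn.Module):
    def __init__(self, num_aus: int, dim: int):
        super().__init__()
        self.s = torch.nn.Parameter(torch.randn(num_aus, dim))  # one trainable vector per AU

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        """node_feats: (num_aus, dim) -> (num_aus,) occurrence probabilities."""
        cos = F.cosine_similarity(node_feats, self.s, dim=-1)
        return torch.sigmoid(cos)

head = CosineHead(num_aus=12, dim=512)
print(head(torch.randn(12, 512)).shape)   # torch.Size([12])
```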

METHODOLOGY
Spatial-Temporal Graph Learning
Loss Function: a two-stage training strategy is used to train the AU detection model.
- First stage (MAE pre-training): a Mean Square Error (MSE) loss constrains the difference between the reconstructed patches and the original patches at the pixel level; the loss compares the ground-truth pixels with the reconstructed pixels.
- Second stage (AU detection): this is a multi-label binary classification problem in which most AUs are inactivated for most face frames, so an asymmetric loss is used to optimize the network; the loss is computed between the prediction and the ground truth over all N AUs and T frames of the input face sequence.
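The slides do not reproduce the asymmetric loss formula, so the following is a sketch in the spirit of the asymmetric loss commonly used for imbalanced multi-label classification; the focusing exponents and the function name are assumptions.

```python
import torch

# Negatives (inactivated AUs) dominate the labels, so they are down-weighted more
# aggressively than positives via a larger focusing exponent.
def asymmetric_loss(probs: torch.Tensor, targets: torch.Tensor,
                    gamma_pos: float = 0.0, gamma_neg: float = 4.0) -> torch.Tensor:
    """probs, targets: (T, N) tensors in [0, 1]; returns a scalar loss."""
    eps = 1e-8
    pos_term = targets * (1 - probs) ** gamma_pos * torch.log(probs + eps)
    neg_term = (1 - targets) * probs ** gamma_neg * torch.log(1 - probs + eps)
    return -(pos_term + neg_term).mean()

probs = torch.sigmoid(torch.randn(8, 12))          # predicted AU probabilities
targets = (torch.rand(8, 12) > 0.8).float()        # sparse ground-truth activations
print(asymmetric_loss(probs, targets))
```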

EXPERIMENT AND RESULT
Experiment Settings
Datasets:
- MAE pre-training: CASIA-WebFace, AffectNet, IMDB-WIKI and CelebA.
- AU detection: Aff-Wild2.
Baselines: ME-Graph [1] and Netease [2].
Measurement: the average F1-score across all AUs, $F1_{avg} = \frac{1}{N}\sum_{i=1}^{N} F1_i$, where $N$ denotes the number of AUs. For an individual AU class, $F1_i = \frac{2 P_i R_i}{P_i + R_i}$, where $P_i$ is the calculated precision for the $i$-th AU and $R_i$ is its recall rate.
[1] Luo, C., Song, S., Xie, W., Shen, L., & Gunes, H. (2022). Learning multi-dimensional edge feature-based AU relation graph for facial action unit recognition. arXiv preprint arXiv:2205.01782.
[2] Zhang, W., Ma, B., Qiu, F., & Ding, Y. (2023). Multi-modal facial affective analysis based on masked autoencoder. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 5793-5802).
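As a concrete reading of the metric, here is a minimal sketch of per-AU F1 followed by macro averaging; the array names and the smoothing constant are assumptions.

```python
import numpy as np

# Per-AU F1 from binary predictions and labels, then averaged over all AUs.
def average_f1(preds: np.ndarray, labels: np.ndarray) -> float:
    """preds, labels: (num_frames, num_aus) binary arrays."""
    f1s = []
    for i in range(labels.shape[1]):
        tp = np.sum((preds[:, i] == 1) & (labels[:, i] == 1))
        fp = np.sum((preds[:, i] == 1) & (labels[:, i] == 0))
        fn = np.sum((preds[:, i] == 0) & (labels[:, i] == 1))
        precision = tp / (tp + fp + 1e-8)
        recall = tp / (tp + fn + 1e-8)
        f1s.append(2 * precision * recall / (precision + recall + 1e-8))
    return float(np.mean(f1s))

preds = np.random.randint(0, 2, (100, 12))
labels = np.random.randint(0, 2, (100, 12))
print(average_f1(preds, labels))
```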

EXPERIMENT AND RESULT RESULT – Overall Performance

EXPERIMENT AND RESULT RESULT – Ablation Study

CONCLUSION
Summarization
- Proposes an effective spatio-temporal AU relational GNN for AU occurrence recognition.
- MAE is introduced as the facial representation encoder for pre-training.
- A spatio-temporal graph learning module models spatial relationships between different AUs and temporal dependencies among different frames.