Facial Emotion Recognition in Digital Identity Systems

martinamattioli5 25 views 30 slides Jul 01, 2024

About This Presentation

Facial Emotion Recognition in Digital Identity Systems: presentation for the Brussels Summer Academy for Global Law


Slide Content

Facial Emotion Recognition in Digital Identity Systems. Part 2: The unbearable (technical) unreliability of automated facial emotion recognition. Martina Mattioli, Ph.D. Student in Artificial Intelligence. [email protected] (Ca’ Foscari University of Venice); [email protected] (Polytechnic of Turin). 5 July 2024.

Summary. 01 What is an emotion? 02 How FER works. 03 Controversies surrounding FER. 04 The dubious reliability of FER. 05 Conclusions and questions.

What is an emotion? In 1884 William James published «What is an emotion?» [1], taking part in a major debate that was becoming relevant in the scientific community. “Almost everyone agrees that the study of emotion is one of the most confused (and still open) chapters in the history of psychology” [2] Plutchik (2001)

A difficult history. How did the use of the term “emotion” evolve within the annals of philosophy? Diagram labels: Affections, Passions, Intellect, Reason, Low appetites, Emotions.

A difficult history (2). How did the use of the term “emotion” evolve within the annals of philosophy? Stoicism: apatheia. Christian theology: Thomas Aquinas and sin. Descartes: «Passions of the Soul» (1649). Hume: «A Treatise of Human Nature» (1740). Darwin: «On the Origin of Species» (1859).

A difficult history (3). The search for an answer to the question posed by William James has occupied numerous scholars from diverse psychological traditions, resulting in a great number of proposed models: behavioural theories, cognitive theories, psychoanalytic theories, and basic emotion theories.

How do we measure an emotion? The assessment and measurement of emotions are contingent upon the theories and models of reference. Labels on the slide: self-descriptions; evaluation of the product of the behavior; evaluation of the behavior; recording of physiological modifications; face; emotion labels; ordinal values.

How do we measure an emotion? (2) The assessment and measurement of emotions are contingent upon the theories and models of reference. Labels on the slide: self-descriptions; evaluation of the product of the behavior; evaluation of the behavior; recording of physiological modifications; face; emotion labels; ordinal values; Russell’s [3] Circumplex Model.

The emotion paradox. Lisa Feldman Barrett [4] argues that the idea of measuring an emotion is problematic and challenges the natural-kind view of emotions, proposing the “emotion paradox”. This refers to the apparent contradiction between the subjective experience of emotions and the objective scientific understanding of emotions. Fig. 1: The Natural-Kind View of Emotions.

How FER works. Facial Emotion Recognition (FER) is a technology that identifies and interprets human emotions from facial expressions. FER is not a unitary field of study; it comprises a multitude of approaches and technologies.

General pipeline. Fig. 2: Samples from the Aff-Wild2 dataset.

General pipeline (2). FER approaches can generally be divided into traditional approaches and deep-learning-based approaches. Moreover, the process of identifying facial emotions involves several distinct stages. Fig. 3: Graphic representation of the general computational process of FER.

Examples from the real world: car driving safety; autism spectrum disorders support; support for the selection of human resources; human-computer interaction; medicine; public surveillance.

FER controversies: challenging emotion fingerprints; ethics; reliability.

Challenging emotion fingerprints. “When it comes to emotion, a face doesn’t speak for itself. In fact, the poses of the basic emotion method were not discovered by observing faces in the real world. Scientists stipulated those facial poses, inspired by Darwin’s book, and asked actors to portray them. And now these faces are simply assumed to be the universal expressions of emotion”. [4] Lisa Feldman Barrett (2001)

Ethics and privacy concerns. In recent years, awareness of the ethical aspects of AI has increased. These discussions ask whether such technologies can harm individuals and create power imbalances. This can lead to: racial biases; privacy harms; fairness and transparency concerns; surveillance capitalism; general human rights infringements.

On the dubious reliability of FER. In our study [5] we did not focus on the conceptual challenges of emotions or on the scenarios in which the application of FER is unethical. Instead, we focused on the reliability of the ground truth employed by these systems.

Hypotheses. H1: Can the inter-rater reliability of our FER ground truth be considered sufficient to support reliable research and analysis? H2: Does providing some sort of contextual information have any effect on the ground truth reliability? H3: Is intra-rater reliability high enough?

Hypotheses (2). What does “high enough” mean? Krippendorff [6] suggested that researchers should consider data reliable when their reliability values are higher than 0.8 (adequacy threshold), and should use data with values between 0.67 and 0.8 only to draw tentative conclusions (0.67 being the acceptability threshold). Data whose reliability measures are lower than 0.67 should be discarded.
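A minimal sketch of how the two thresholds above could be applied to a reliability score. The function name and the returned labels are illustrative, not taken from the study, and treating the boundary values 0.8 and 0.67 as inclusive is an assumption, since the slide does not say how exact boundary values are handled.

```python
# Thresholds as quoted on the slide from Krippendorff [6].
ADEQUACY_THRESHOLD = 0.8
ACCEPTABILITY_THRESHOLD = 0.67


def interpret_reliability(value: float) -> str:
    """Map a reliability score to the three bands described on the slide.

    Assumption: values exactly equal to a threshold fall in the higher band.
    """
    if value >= ADEQUACY_THRESHOLD:
        return "reliable"
    if value >= ACCEPTABILITY_THRESHOLD:
        return "tentative"  # usable only for tentative conclusions
    return "discard"
```

For instance, `interpret_reliability(0.72)` falls in the "tentative" band, while any score below 0.67 is flagged for discarding.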

First experiment. Objective: Assess the inter-rater reliability of FER ground truth. Participants: Students from two master's degree courses at the University of Milano-Bicocca. Method: Participants annotated 30 pictures with emotions based on Ekman’s basic emotion model and a 5-value ordinal scale through an online questionnaire. The participants were divided into a no-context group (shown only the 30 pictures) and a context group (shown the 28-second original video before the pictures). Reliability measure: Measured reliability using Krippendorff's metric.
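To make the reliability measure concrete, here is a hedged sketch that assumes "Krippendorff's metric" refers to Krippendorff's alpha and computes it for nominal labels only. The study's ordinal, multi-label and distribution-based representations would need other distance functions, which are not shown. The function name and the toy ratings are hypothetical and are not data from the experiment.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list with one inner list per rated picture, holding the
    labels given by the raters who rated it (missing ratings are omitted).
    """
    # Pictures with fewer than two ratings carry no agreement information.
    units = [u for u in units if len(u) >= 2]
    n = sum(len(u) for u in units)  # total number of pairable ratings
    if n < 2:
        raise ValueError("need at least one picture with two ratings")

    # Observed disagreement: mismatching ordered pairs within each picture,
    # each picture weighted by 1 / (number of its ratings - 1).
    observed = 0.0
    for u in units:
        mismatches = sum(1 for a, b in permutations(u, 2) if a != b)
        observed += mismatches / (len(u) - 1)
    observed /= n

    # Expected disagreement: mismatching ordered pairs across all ratings.
    totals = Counter(label for u in units for label in u)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")

    return 1.0 - observed / expected


# Hypothetical toy data: four pictures, two raters each.
toy_ratings = [["joy", "joy"], ["joy", "fear"], ["fear", "fear"], ["fear", "fear"]]
toy_alpha = krippendorff_alpha_nominal(toy_ratings)
```

Alpha equals 1 when raters agree perfectly and drops toward 0 as observed disagreement approaches what chance alone would produce.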

First experiment (2). Fig. 4: Six pictures depicting three subjects whose related emotions had to be recognized by the sample of raters involved in the study. The top images were associated with the highest agreement scores (easiest facial expressions to interpret and emotions to detect); the bottom ones with the lowest scores (hardest emotions to detect).

First experiment (3). Fig. 5: Example from the questionnaire.

Second experiment. Objective: Assess the intra-rater reliability of FER ground truth. Participants: Different students from the same classes as in the first experiment. Method: Participants were tasked with filling in a slightly modified version of the questionnaire shown to the context group. From the 30 pictures in the questionnaire, 5 pictures were repeated randomly among the others. Reliability measure: Measured reliability using Krippendorff's metric.

First Experiment Results. Total responses: 198 complete responses, 5,940 expressions, 41,580 emotion ratings. Participants: no-context group, 101 participants; context group, 97 participants. Reliability values: all values are significantly lower than the adequacy threshold. Values are also significantly lower than the acceptability threshold, except for the ordinal representation of “Anger” (both groups) and “Fear” (context group). Comparison between groups: context group reliability values are generally higher than those of the no-context group.

Second Experiment Results. Total responses: 51 complete responses, 1,530 expressions, 10,710 emotion ratings. Reliability values: all values are significantly lower than the adequacy threshold. The reliability values for the multi-label and distribution-based representations were significantly lower than the acceptability threshold.
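The totals reported for the two experiments can be cross-checked with a short sketch. The figure of 30 expressions per response comes from the slides; the figure of 7 emotion ratings per expression is not stated there and is only inferred from the reported totals (41,580 / 5,940 = 7), so it should be read as an assumption. The names below are illustrative.

```python
PICTURES_PER_RESPONSE = 30   # stated on the experiment slides
RATINGS_PER_EXPRESSION = 7   # assumption, inferred from 41580 / 5940


def response_totals(complete_responses: int) -> tuple[int, int]:
    """Return (expressions, emotion ratings) for a number of complete responses."""
    expressions = complete_responses * PICTURES_PER_RESPONSE
    ratings = expressions * RATINGS_PER_EXPRESSION
    return expressions, ratings


first_experiment = response_totals(101 + 97)   # no-context + context groups
second_experiment = response_totals(51)
```

Under these assumptions both results match the slides: 5,940 expressions and 41,580 ratings for the first experiment, 1,530 and 10,710 for the second.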

Discussion. These results provide evidence of the intrinsic subjectivity of classifying emotions from facial expressions, which consequently affects automatic expression recognition systems. The FER community seems to suffer from a problem of ground truth reliability. In fact, all of the reliability scores are below the adequacy threshold, and most values are even below the acceptability threshold.

Conclusions. Our results are consistent with the low reliability values for FER ground truths previously reported in the literature. This means that excessively low reliability of FER data necessarily entails low accuracy of FER applications. We confirmed that access to situational context improves emotion recognition by humans (in terms of reduced disagreement). Besides any ethical considerations, the low reliability of emotion classification poses important challenges to the validation, and hence certification, of FER technologies. Our findings thus provide empirical support for the inherent subjectivity and ambiguity of FER tasks, which have already been discussed in psychology but rarely related to their potential impact on the development of FER technologies.

Conclusions (2): psychological foundations; ethical concerns; reliability issues.

References
[1] James, W. (1884). What is an emotion? Mind, 9(34), 188-205.
[2] Plutchik, R. (2001). The nature of emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. American Scientist, 89(4), 344-350.
[3] Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161.
[4] Barrett, L. F. (2006). Solving the emotion paradox: Categorization and the experience of emotion. Personality and Social Psychology Review, 10(1), 20-46.
[5] Cabitza, F., Campagner, A., & Mattioli, M. (2022). The unbearable (technical) unreliability of automated facial emotion recognition. Big Data & Society, 9(2), 20539517221129549.
[6] Krippendorff, K. (2018). Content analysis: An introduction to its methodology. Sage Publications.

Suggested Readings
[1] Barrett, L. F. (2006). Solving the emotion paradox: Categorization and the experience of emotion. Personality and Social Psychology Review, 10(1), 20-46.
[2] Barrett, L. F., Mesquita, B., & Gendron, M. (2011). Context in emotion perception. Current Directions in Psychological Science, 20(5), 286-290.
[3] Cabitza, F., Campagner, A., & Mattioli, M. (2022). The unbearable (technical) unreliability of automated facial emotion recognition. Big Data & Society, 9(2), 20539517221129549.
[4] Le Mau, T., Hoemann, K., Lyons, S. H., Fugate, J. M., Brown, E. N., Gendron, M., & Barrett, L. F. (2021). Professional actors demonstrate variability, not stereotypical expressions, when portraying emotional states in photographs. Nature Communications, 12(1), 5037.
[5] Plutchik, R. (2001). The nature of emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. American Scientist, 89(4), 344-350.