IVE 2024 Short Course Lecture 10 - Multimodal Emotion Recognition in Conversational Settings


About This Presentation

IVE 2024 short course Lecture 10 on Multimodal Emotion Recognition in Conversational Settings.

Lecture taught by Nastaran Saffaryazdi on July 17th 2024 at the University of South Australia.


Slide Content

Multimodal Emotion Recognition in Conversational Settings
Nastaran Saffaryazdi
Empathic Computing Lab

Motivation
●Emotions are multimodal processes that play a crucial role in our everyday lives
●Recognizing emotions is becoming more essential in a wide range of application domains
○Healthcare
○Education
○Entertainment
○Advertisement
○Customer services

Motivation
●Many of these application areas include human-human or human-machine conversations
●Conversational emotion recognition has focused mainly on facial expressions and text
●Human behavior can be controlled or faked
●Designing a general model using behavioral changes is difficult because of cultural or language differences
●Physiological data are more reliable, but the signals themselves are very weak

Research Focus
How can we combine various behavioral and physiological cues to recognize emotions in human conversations and enhance empathy in human-machine interactions?

Research Questions
RQ1: How can human body responses be employed to identify emotions?
RQ2: How can the data be obtained, captured, and processed simultaneously from multiple sensors?
RQ3: Can a combination of physiological cues be used to recognize emotions in conversations accurately?
RQ4: Can we increase the level of empathy between humans and machines using neural and physiological signals?

RQ1
What are the human body's responses to emotional stimuli, and how can these diverse responses be employed to identify emotions?
Reviewing and replicating existing research in human emotion recognition using various modalities.

Multimodal Emotion Recognition
●Behavioral datasets
○CMU-MOSEI
○SEMAINE
○IEMOCAP
○…
●Multimodal datasets with neural or physiological signals
○DEAP
○MAHNOB-HCI
○RECOLA
○SEED-IV
○AMIGOS
No research has specifically studied brain activity and physiological signals for recognizing emotion in human-human and human-machine conversations.

Study 1: Multimodal Emotion Recognition in Watching Video
23 participants, 21 to 44 years old (mean = 30, SD = 6), 13 females, 10 males, watching video stimuli
Saffaryazdi, N., Wasim, S. T., Dileep, K., Nia, A. F., Nanayakkara, S., Broadbent, E., & Billinghurst, M. (2022). Using facial micro-expressions in combination with EEG and physiological signals for emotion recognition. Frontiers in Psychology, 13, 864047.

Sensors
●OpenBCI EEG cap
○EEG
●Shimmer3 GSR+ module
○EDA
○PPG
●RealSense camera
○Facial video


RQ1 Summary
●Identified various modalities and recognition methods
●Fused facial micro-expressions with EEG and physiological signals
●Identified the limitations and challenges
○Acquiring data from multiple sensors simultaneously
○Real-time data monitoring
○Automatic scenario running
○Personality differences

RQ2
How can the required data for emotion recognition be obtained, captured, and processed simultaneously in conversation from multiple sensors?
Developing software for simultaneously acquiring, visualizing, and processing multimodal data.

Octopus Sensing
●Octopus-sensing
○Simple unified interface for
■Simultaneous data acquisition
■Simultaneous data recording
○Study design components
●Octopus-sensing-monitoring
○Real-time monitoring
●Octopus-sensing-visualizer
○Offline synchronous data visualizer
●Octopus-sensing-processing
○Real-time processing

Octopus Sensing
●Multiplatform
●Open-source (https://github.com/octopus-sensing)
●https://octopus-sensing.nastaran-saffar.me/
●Supports various sensors
a. OpenBCI
b. Brainflow
c. Shimmer3
d. Camera
e. Audio
f. Network (Unity and MATLAB)
Saffaryazdi, N., Gharibnavaz, A., & Billinghurst, M. (2022). Octopus Sensing: A Python library for human behavior studies. Journal of Open Source Software, 7(71), 4045.


How to use
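A minimal recording sketch following the usage pattern published in the Octopus Sensing documentation and JOSS paper; the module paths, class names (DeviceCoordinator, Shimmer3Streaming, CameraStreaming), and constructor parameters are assumptions about the current API and may differ between library versions:

from octopus_sensing.device_coordinator import DeviceCoordinator
from octopus_sensing.devices import Shimmer3Streaming, CameraStreaming
from octopus_sensing.common.message_creators import start_message, stop_message

# Register the devices once; the coordinator handles synchronized recording
device_coordinator = DeviceCoordinator()
device_coordinator.add_devices([
    Shimmer3Streaming(name="shimmer", output_path="./output"),            # EDA + PPG
    CameraStreaming(camera_no=0, name="camera", output_path="./output"),  # facial video
])

# Mark the start and end of each stimulus so all streams share the same event markers
device_coordinator.dispatch(start_message("experiment-1", "stimulus-1"))
# ... present the stimulus / run the conversation ...
device_coordinator.dispatch(stop_message("experiment-1", "stimulus-1"))

device_coordinator.terminate()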

Octopus-sensing-monitoring
●Monitor data from any machine on the same network
●monitoring = MonitoringEndpoint(device_coordinator)
 monitoring.start()
●pipenv run octopus-sensing-monitoring
●http://machine-IP:8080
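A hedged sketch of how the monitoring endpoint could be wired into the recording script above; the MonitoringEndpoint import path and the stop() call are assumptions based on the library's documented layout and may differ between versions:

from octopus_sensing.monitoring_endpoint import MonitoringEndpoint  # assumed module path

# device_coordinator is the DeviceCoordinator from the recording sketch above
monitoring = MonitoringEndpoint(device_coordinator)
monitoring.start()

# ... run the study while another machine on the same network runs
# "pipenv run octopus-sensing-monitoring" and opens http://machine-IP:8080 ...

monitoring.stop()  # assumed counterpart of start()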

Octopus-sensing-visualizer
●pipenv run octopus-sensing-visualizer
●http://localhost:8080
●Visualizes raw or processed data using a config file

Octopus-sensing-processing

RQ3
Can a combination of physiological cues be used to recognize emotions in conversations accurately?
Conducting user studies and creating datasets of multimodal data in various settings to compare human responses and explore recognition methods in different settings.

Multimodal Emotion Recognition in Conversation
●Creating a conversational setting in which people could feel emotions spontaneously
●10-minute conversation for each emotion topic
●Self-report (beginning, middle, end): arousal, valence, emotion

Study 2: Multimodal Emotion Recognition in Conversation
23 participants, 21 to 44 years old (mean = 30, SD = 6), 13 females, 10 males, face-to-face conversation
Saffaryazdi, N., Goonesekera, Y., Saffaryazdi, N., Hailemariam, N. D., Temesgen, E. G., Nanayakkara, S., ... & Billinghurst, M. (2022, March). Emotion recognition in conversations using brain and physiological signals. In 27th International Conference on Intelligent User Interfaces (pp. 229-242).

Study 2 - Result
●Self-report vs. target emotion

Study 2 - Result
●Recognition F-score

Study 3: Face-to-Face vs. Remote Conversations (Comparing Various Conversational Settings)
15 participants, 21 to 36 years old (mean = 28.6, SD = 5.7), 7 females, 8 males, face-to-face and Zoom conversations
Saffaryazdi, N., Kirkcaldy, N., Lee, G., Loveys, K., Broadbent, E., & Billinghurst, M. (2024). Exploring the impact of computer-mediated emotional interactions on human facial and physiological responses. Telematics and Informatics Reports, 14, 100131.

Study 3: Features
Data | Features
EDA | Phasic statistics, tonic statistics, and peak statistics
PPG | Heart rate variability time-domain features, extracted with the heartpy and neurokit2 libraries
Face | OpenFace action units 1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 17, 20, 23, 25, 26, 28, and 45
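A hedged sketch of how the EDA and PPG features in this table might be computed with neurokit2; the function names are real neurokit2 API, but the sampling rate and the exact feature set are illustrative assumptions rather than the study's actual pipeline:

import neurokit2 as nk

SAMPLING_RATE = 128  # assumption: real rates depend on the Shimmer3 configuration

def eda_features(eda_signal):
    # Decompose EDA into tonic and phasic components and summarize them
    signals, info = nk.eda_process(eda_signal, sampling_rate=SAMPLING_RATE)
    return {
        "tonic_mean": signals["EDA_Tonic"].mean(),
        "phasic_mean": signals["EDA_Phasic"].mean(),
        "phasic_std": signals["EDA_Phasic"].std(),
        "n_scr_peaks": signals["SCR_Peaks"].sum(),
    }

def ppg_hrv_features(ppg_signal):
    # Detect systolic peaks in the PPG signal, then compute HRV time-domain features
    signals, info = nk.ppg_process(ppg_signal, sampling_rate=SAMPLING_RATE)
    hrv = nk.hrv_time(info, sampling_rate=SAMPLING_RATE)
    return hrv.iloc[0].to_dict()  # e.g. HRV_MeanNN, HRV_SDNN, HRV_RMSSD, ...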

Study 3: Result
●3-way repeated-measures ART ANOVA
●Physiological differences between face-to-face and remote conversations
○Facial action units
■Action units associated with negative emotions were higher face-to-face
■Action units associated with positive emotions were higher in the remote condition
○EDA (tonic, phasic, and peak statistics)
■Reactions were substantial and immediate face-to-face (phasic mean higher face-to-face)
○PPG (HRV time-domain features)
■Higher HRV in remote conversations -> lower stress level, enhanced emotion regulation, more engagement

Study 3: Result
●One-way and two-way repeated-measures ART ANOVA
●Significant empathy factors
○Interviewer to participant

Study 3: Result
●Emotion recognition (see the sketch below)
○Feature extraction
○Random forest classifier
○Leave-one-subject-out cross-validation
●High accuracy -> high similarity between responses in the two settings
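A minimal sketch of this classification setup using scikit-learn's RandomForestClassifier with leave-one-subject-out evaluation via LeaveOneGroupOut; the feature, label, and participant arrays are hypothetical placeholders:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# X: one row of EDA/PPG/face features per sample, y: arousal or valence label,
# groups: participant ID for each sample (hypothetical file names)
X = np.load("features.npy")
y = np.load("labels.npy")
groups = np.load("participants.npy")

clf = RandomForestClassifier(n_estimators=100, random_state=0)
logo = LeaveOneGroupOut()  # each fold holds out all samples of one participant
scores = cross_val_score(clf, X, y, cv=logo, groups=groups, scoring="f1_macro")
print(f"Mean F1 across held-out participants: {scores.mean():.3f}")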

Findings (RQ3)
●People showed more positive facial expressions in remote conversations
●People felt stronger and more immediate emotions in the face-to-face condition
●People felt a lower level of stress in the remote condition
●Limitations
○Sample size
○Effect of the interviewer
○Familiarity with remote conversations
●Cross-usage of the multimodal datasets is quite successful
●Physiological emotion recognition is effective for conversational emotion recognition
●We can use these datasets to train models for real-time emotion recognition

RQ4
Can we increase the level of empathy between humans and machines by using neural and physiological signals to detect emotions in real-time during conversations?
●Developing a real-time emotion recognition system using multimodal data (see the sketch below)
●Prototyping an empathetic conversational agent by feeding it the detected emotions in real-time
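A schematic sketch of what such a real-time loop could look like: buffer a short window of physiological samples, extract features, classify, and forward the predicted emotion to the conversational agent. Every name here (read_latest_window, extract_features, send_to_agent, the 5-second window) is a hypothetical placeholder, not the actual system's API:

import time
import joblib

WINDOW_SECONDS = 5  # hypothetical analysis window

model = joblib.load("emotion_model.pkl")  # e.g. a random forest trained offline

def realtime_loop(read_latest_window, extract_features, send_to_agent):
    # read_latest_window(seconds): returns the last few seconds of EDA/PPG/EEG samples
    # extract_features(window): the same feature extraction used for training
    # send_to_agent(label): pushes the predicted emotion to the conversational agent
    while True:
        window = read_latest_window(WINDOW_SECONDS)
        features = extract_features(window)
        label = model.predict([features])[0]
        send_to_agent(label)
        time.sleep(WINDOW_SECONDS)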

Study 4: Interaction with a Digital Human from Soul Machines
Human-computer conversation: 23 participants, 21 to 44 years old (mean = 30, SD = 6), 17 females, 6 males, human-digital human conversation

Human-Machine Conversation (Study 4)
●Interaction between human and digital human
■Neutral
■Empathetic
●Real-time emotion recognition based on physiological cues
●Evaluating interaction with the digital human
■Empathy factors
■Quality of interaction
■Degree of rapport
■Degree of liking
■Degree of social presence

Study 4 - Result
●Emotion recognition in real-time
○Arousal 69.1% and valence 57.3%
●Induction method evaluation

Real-time expression evaluation
●Appropriateness of reflected emotions in different agents
○Appropriate emotion (F(1, 166) = 10, p < 0.002)
○Appropriate time (F(1, 166) = 6, p < 0.01)

Empathy evaluation
●Overall Empathy: F(1, 166) = 27, p < 0.001
●Cognitive empathy: F(1, 166) = 4.7, p < 0.03
●Affective empathy: F(1, 166) = 5.4, p < 0.02

Human-Agent Rapport
●Degree of Rapport (DoR)
○F(1, 42) = 8.38, p = 0.006
●Degree of Liking (DoL)
○F(1, 42) = 6.64, p < 0.01
●Degree of Social Presence (DSP)
○Not significantly different
●Quality of Interaction (QoI)
○Not significantly different

Physiological Responses
●EEG
○No significant differences
○Shared neural processes
●EDA
○Higher skin conductance in interaction with the empathetic agent
■Higher emotional arousal (excitement, engagement)
●PPG
○Higher HRV in interaction with the empathetic agent
■Better mental health
■Lower anxiety
■Improved regulation of emotional responses
■Higher empathy
■Increased attention/engagement

Conclusions
●Multimodal emotion recognition using physiological modalities is a promising approach
●Four multimodal datasets collected in various settings
●The Octopus Sensing software suite
●Improved empathy between humans and machines using neural and physiological data

Thank you!