Motivation
●Emotions are multimodal processes that play a crucial role in our everyday lives
●Recognizing emotions is becoming increasingly important in a wide range of application domains
○Health-care
○Education
○Entertainment
○Advertisement
○Customer services
Motivation
●Many of these application areas involve human-human or human-machine conversations
●Conversational emotion recognition has mostly focused on facial expressions and text
●Behavioral cues can be consciously controlled or faked
●Designing a general model based on behavioral changes is difficult because of cultural or language differences
●Physiological signals are more reliable, but they are weak and subtle
Research Focus
How can we combine various behavioral and physiological cues to recognize emotions in human conversations and enhance empathy in human-machine interactions?
Research Questions
●RQ1: How can human body responses be employed to identify emotions?
●RQ2: How can the data be obtained, captured, and processed simultaneously from multiple sensors?
●RQ3: Can a combination of physiological cues be used to recognize emotions in conversations accurately?
●RQ4: Can we increase the level of empathy between humans and machines using neural and physiological signals?
RQ1: What are the human body's responses to emotional stimuli, and how can these diverse responses be employed to identify emotions?
Approach: Reviewing and replicating existing research on human emotion recognition using various modalities.
●Behavioral datasets
○CMU-MOSEI
○SEMAINE
○IEMOCAP
○…
●Multimodal datasets with neural or physiological signals
○DEAP
○MAHNOB-HCI
○RECOLA
○SEED-IV
○AMIGOS
No research has specifically studied brain activity and physiological signals
for recognizing emotion in human-human and human-machine
conversations.
Multimodal Emotion Recognition
Study 1: Multimodal Emotion Recognition in Watching Videos
●23 participants (13 females, 10 males), aged 21 to 44 years (M = 30, SD = 6)
●Task: watching videos
Saffaryazdi, N., Wasim, S. T., Dileep, K., Nia, A. F., Nanayakkara, S., Broadbent, E., & Billinghurst, M. (2022). Using facial micro-expressions in combination with EEG and physiological signals for emotion recognition. Frontiers in Psychology, 13, 864047.
Sensors
●OpenBCI EEG cap
○EEG
●Shimmer3 GSR+ module
○EDA
○PPG
●RealSense camera
○Facial video
Summary RQ1
●Identified various modalities and recognition methods
●Fused facial micro-expressions with EEG and physiological signals
●Identified limitations and challenges
○Acquiring data from multiple sensors simultaneously
○Real-time data monitoring
○Automatic scenario running
○Personality differences
RQ2: How can the required data for emotion recognition be obtained, captured, and processed simultaneously in conversation from multiple sensors?
Approach: Developing software for simultaneously acquiring, visualizing, and processing multimodal data.
Octopus Sensing
●Octopus-sensing
○Simple unified interface for:
■Simultaneous data acquisition
■Simultaneous data recording
○Study design components
●Octopus-sensing-monitoring
○Real-time monitoring
●Octopus-sensing-visualizer
○Offline synchronized data visualizer
●Octopus-sensing-processing
○Real-time processing
Octopus Sensing
●Multiplatform
●Open-source (https://github.com/octopus-sensing)
●Website: https://octopus-sensing.nastaran-saffar.me/
●Supports various sensors:
a.OpenBCI
b.Brainflow
c.Shimmer3
d.Camera
e.Audio
f.Network (Unity and MATLAB)
Saffaryazdi, N., Gharibnavaz, A., & Billinghurst, M. (2022). Octopus Sensing: A Python library for human behavior studies. Journal of Open Source Software, 7(71), 4045.
How to use
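A minimal sketch of a typical recording script, assuming the usage pattern shown in the library's examples; the module paths, constructor arguments, and start/stop message helpers are reproduced from those examples and may differ slightly between releases, so treat this as an illustration rather than exact API:

    import time
    from octopus_sensing.device_coordinator import DeviceCoordinator
    from octopus_sensing.devices import Shimmer3Streaming
    from octopus_sensing.common.message_creators import start_message, stop_message

    # Register one or more sensors with a coordinator that keeps them synchronized
    shimmer = Shimmer3Streaming(name="Shimmer3", output_path="./output")
    coordinator = DeviceCoordinator()
    coordinator.add_devices([shimmer])

    # Mark the start of a stimulus, record for its duration, then mark its end
    coordinator.dispatch(start_message("experiment_01", "stimulus_00"))
    time.sleep(10)
    coordinator.dispatch(stop_message("experiment_01", "stimulus_00"))

    # Stop all devices and close their output files
    coordinator.terminate()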
Octopus-sensing-monitoring
●Monitors data in real time from any machine on the same network
●Add the monitoring endpoint to the study script:
  monitoring = MonitoringEndpoint(device_coordinator)
  monitoring.start()
●Run the web dashboard: pipenv run octopus-sensing-monitoring
●Open http://machine-IP:8080 in a browser
Octopus-sensing-visualizer
●Visualizes raw or processed data offline, configured through a config file
●Run: pipenv run octopus-sensing-visualizer
●Open http://localhost:8080 in a browser
Octopus-sensing-processing
RQ3: Can a combination of physiological cues be used to recognize emotions in conversations accurately?
Approach: Conducting user studies and creating multimodal datasets in various settings to compare human responses and explore recognition methods.
Multimodal Emotion Recognition in Conversation
●Creating a conversational setting in which people could feel emotions spontaneously
●A 10-minute conversation for each emotion topic
●Self-reports at the beginning, middle, and end: arousal, valence, emotion
Study 2: Multimodal Emotion Recognition in Conversation
●23 participants (13 females, 10 males), aged 21 to 44 years (M = 30, SD = 6)
●Task: face-to-face conversation
Saffaryazdi, N., Goonesekera, Y., Saffaryazdi, N., Hailemariam, N. D., Temesgen, E. G., Nanayakkara, S., ... & Billinghurst, M. (2022, March). Emotion recognition in conversations using brain and physiological signals. In 27th International Conference on Intelligent User Interfaces (pp. 229-242).
Study 2: Results
●Self-reported vs. target emotions
●Recognition F-scores
Comparing Various Conversational Settings
Study 3: Face-to-Face vs. Remote Conversations
●15 participants (7 females, 8 males), aged 21 to 36 years (M = 28.6, SD = 5.7)
●Task: face-to-face and Zoom conversations
Saffaryazdi, N., Kirkcaldy, N., Lee, G., Loveys, K., Broadbent, E., & Billinghurst, M. (2024). Exploring the impact of computer-mediated emotional interactions on human facial and physiological responses. Telematics and Informatics Reports, 14, 100131.
Study 3: Features
●EDA: phasic, tonic, and peak statistics
●PPG: heart rate variability time-domain features, extracted with the heartpy and neurokit2 libraries (see the sketch below)
●Face: OpenFace action units 1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 17, 20, 23, 25, 26, 28, and 45
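As an illustration of this feature-extraction step (not the exact code used in the study), the sketch below uses the neurokit2 library named above to split EDA into tonic and phasic components and to compute time-domain HRV features from PPG; the sampling rate, windowing, and chosen feature names are assumptions:

    import neurokit2 as nk

    def extract_physiological_features(eda_raw, ppg_raw, sampling_rate=128):
        """Illustrative EDA and HRV features for one conversation segment."""
        # Decompose EDA into tonic/phasic components and detect SCR peaks
        eda_signals, eda_info = nk.eda_process(eda_raw, sampling_rate=sampling_rate)
        # Detect PPG peaks and derive time-domain HRV measures (RMSSD, SDNN, ...)
        ppg_signals, ppg_info = nk.ppg_process(ppg_raw, sampling_rate=sampling_rate)
        hrv = nk.hrv_time(ppg_info["PPG_Peaks"], sampling_rate=sampling_rate)
        return {
            "eda_tonic_mean": eda_signals["EDA_Tonic"].mean(),
            "eda_phasic_mean": eda_signals["EDA_Phasic"].mean(),
            "scr_peak_count": len(eda_info["SCR_Peaks"]),
            "hrv_rmssd": float(hrv["HRV_RMSSD"].iloc[0]),
            "hrv_sdnn": float(hrv["HRV_SDNN"].iloc[0]),
        }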
Study 3: Results
Three-way repeated-measures ART ANOVA
●Physiological differences between face-to-face and remote conversations
○Facial action units:
■Action units associated with negative emotions were higher face-to-face
■Action units associated with positive emotions were higher in remote conversations
○EDA (tonic, phasic, and peak statistics):
■Reactions were more substantial and immediate face-to-face (higher phasic mean)
○PPG (HRV time-domain features):
■Higher HRV in remote conversations, suggesting a lower stress level, enhanced emotion regulation, and more engagement
Study 3: Results
●One-way and two-way repeated-measures ART ANOVA
●Significant empathy factors
○Interviewer to participant
Study 3: Results
●Emotion recognition pipeline (see the sketch below)
○Feature extraction
○Random Forest classifier
○Leave-one-subject-out cross-validation
●Higher recognition accuracy indicates higher similarity between responses in the two settings
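A minimal sketch of this leave-one-subject-out evaluation, assuming scikit-learn and placeholder arrays in place of the study's actual features and labels:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

    # Placeholder data: one row of extracted features per conversation segment,
    # an emotion label per segment, and the subject each segment came from
    X = np.random.rand(300, 40)
    y = np.random.randint(0, 4, size=300)
    groups = np.repeat(np.arange(15), 20)  # 15 subjects, 20 segments each

    # Each fold trains on 14 subjects and tests on all segments of the held-out one
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(clf, X, y, groups=groups,
                             cv=LeaveOneGroupOut(), scoring="f1_macro")
    print(f"Mean macro F1 over held-out subjects: {scores.mean():.2f}")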
Findings (RQ3)
●People showed more positive facial expressions in remote conversations
●People felt stronger and more immediate emotions in the face-to-face condition
●People felt a lower level of stress in the remote condition
●Limitations
○Sample size
○Effect of the interviewer
○Familiarity with remote conversations
●Cross-use of the multimodal datasets was quite successful
●Physiological emotion recognition is effective for conversational emotion recognition
●These datasets can be used to train models for real-time emotion recognition
RQ4: Can we increase the level of empathy between humans and machines by using neural and physiological signals to detect emotions in real time during conversations?
Approach:
●Developing a real-time emotion recognition system using multimodal data
●Prototyping an empathetic conversational agent by feeding it the detected emotions in real time (see the sketch below)
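Purely as an illustration of this prototype's control flow (not the actual implementation), the sketch below runs a generic real-time loop: dummy functions stand in for reading buffered physiological samples and for the interface to the digital human, and the classifier is a placeholder trained on random data rather than on the study datasets.

    import time
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    WINDOW_SECONDS = 5
    EMOTIONS = ["neutral", "happy", "sad", "angry"]  # placeholder label set

    # Placeholder model: the real system would load a classifier trained offline
    # on the multimodal datasets collected in the earlier studies
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(np.random.rand(100, 4), np.random.randint(0, len(EMOTIONS), 100))

    def get_latest_window():
        # Hypothetical stand-in for the last few seconds of buffered EDA/PPG samples
        return np.random.rand(WINDOW_SECONDS * 128)

    def extract_features(window):
        # Hypothetical stand-in for the features used during offline training
        return [window.mean(), window.std(), window.min(), window.max()]

    def send_emotion_to_agent(label):
        # Hypothetical stand-in for the interface to the empathetic digital human
        print("Detected emotion sent to agent:", label)

    # Classify a fresh window every few seconds and forward the result to the agent
    for _ in range(3):
        label = EMOTIONS[clf.predict([extract_features(get_latest_window())])[0]]
        send_emotion_to_agent(label)
        time.sleep(WINDOW_SECONDS)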
Human-Computer Conversation
Study 4: Interaction with a Digital Human from Soul Machines
●23 participants (17 females, 6 males), aged 21 to 44 years (M = 30, SD = 6)
●Task: human-digital human conversation
Human-Machine Conversation (Study 4)
●Interaction between a human and a digital human in two conditions:
■Neutral
■Empathetic
●Real-time emotion recognition based on physiological cues
●Evaluating the interaction with the digital human:
■Empathy factors
■Quality of interaction
■Degree of rapport
■Degree of liking
■Degree of social presence
Study 4: Results
●Real-time emotion recognition performance
○Arousal: 69.1%, valence: 57.3%
●Evaluation of the emotion induction method
Real-time Expression Evaluation
●Appropriateness of the reflected emotions in the different agents
○Appropriate emotion: F(1, 166) = 10, p < 0.002
○Appropriate time: F(1, 166) = 6, p < 0.01
Empathy Evaluation
●Overall empathy: F(1, 166) = 27, p < 0.001
●Cognitive empathy: F(1, 166) = 4.7, p < 0.03
●Affective empathy: F(1, 166) = 5.4, p < 0.02
Human-Agent Rapport
●Degree of Rapport (DoR)
○F(1, 42) = 8.38, p = 0.006
●Degree of Liking (DoL)
○F(1, 42) = 6.64, p < 0.01
●Degree of Social Presence (DSP)
○Not significantly different
●Quality of Interaction (QoI)
○Not significantly different
Physiological Responses
●EEG
○No significant differences
○Suggesting shared neural processes
●EDA
○Higher skin conductance when interacting with the empathetic agent
■Higher emotional arousal (excitement, engagement)
●PPG
○Higher HRV when interacting with the empathetic agent
■Better mental health
■Lower anxiety
■Improved regulation of emotional responses
■Higher empathy
■Increased attention and engagement
Conclusions
●Multimodal emotion recognition using physiological modalities is a promising approach
●Four multimodal datasets collected in various settings
●The Octopus Sensing software suite
●Improved empathy between humans and machines using neural and physiological data