Sarcastically Speaking: Unlocking Multi-modal Sentiment Analysis with NLP and Facial Expressions
David vonThenen
41 slides
Sep 25, 2025
About This Presentation
Sentiment analysis is easy—until sarcasm enters the chat. Traditional natural language processing models often stumble when trying to decode sarcastic nuances, missing crucial contextual cues and delivering misleading results. To tackle this, we will explore a multi-modal approach that integrates facial expression analysis with textual inputs, dramatically improving accuracy in sentiment detection, particularly for sarcasm. By combining transformer-based NLP models and facial landmark detection, we create a richer, context-aware understanding of sentiment.
In this session, you'll explore how facial movements, like subtle eye rolls or eyebrow raises, can be quantified, combined with language embeddings, and processed to uncover hidden sarcastic sentiment. We'll walk through real-world datasets, demonstrate model training and evaluation, and share insights on deploying these models effectively in production. Attendees will leave with practical strategies and code examples, ready to integrate facial and textual analysis to tackle sarcasm head-on in their own NLP applications. As always, there will be plenty of live demos.
Slide Content
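Thank you for downloading this PPT template from the ibaotu platform. To protect the interests of you, ibaotu, and the original author, please do not copy, distribute, or sell it; otherwise you will bear legal liability. ibaotu will enforce its rights to the work and seek compensation of ten times the number of distributions and downloads.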
ibaotu.com
David vonThenen
@davidvonthenen
Sarcastically Speaking
Unlocking Multimodal Sarcasm Analysis With NLP and Facial Expressions
●Are you Human or an AI?
●I want 5 Kubernetes
●Virtual Machines are Real
●Cloudy, cloudy, cloudy…
●There is storage for that!
David vonThenen
@davidvonthenen
Agenda
●Sarcasm Is A Difficult Problem
●Breaking The Problem Down
○Dataset Discussion
○Capturing Visual Cues
○Audio Characteristics
●Multimodal Classification
●Workshop Materials
●Q&A
Sarcasm Is Tough…
Video Credit:
The Big Bang Theory (S02E14 "The Financial Permeability"), WarnerMedia
Always Starts With Data
●Multimodal Sarcasm Detection Dataset
○github.com/soujanyaporia/MUStARD
○Paper: https://aclanthology.org/P19-1455.pdf
●Raw Video Clips
○The Big Bang Theory
○Friends
○The Golden Girls
○Sarcasmaholics Anonymous
●Capturing:
○Facial, Acoustic, Text
Speaker-Dependent Classification
[Diagram: Training Data | Validation Data | Test Data]
Speaker-Independent Classification
[Diagram: Training Data | Validation Data | Test Data]
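The difference between the two setups comes down to how utterances are assigned to splits. The sketch below is a minimal illustration, not the dataset authors' or the speaker's code: the record fields (`speaker`, `utterance`, `sarcasm`), the split ratios, and the helper names are all hypothetical.

```python
import random

def speaker_dependent_split(samples, train=0.7, val=0.15, seed=0):
    """Shuffle utterances; the same speaker can appear in every split."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def speaker_independent_split(samples, val_speakers, test_speakers):
    """Hold out whole speakers so test-time voices and faces are unseen."""
    train, val, test = [], [], []
    for sample in samples:
        if sample["speaker"] in test_speakers:
            test.append(sample)
        elif sample["speaker"] in val_speakers:
            val.append(sample)
        else:
            train.append(sample)
    return train, val, test

# Toy records; the field names and speakers are illustrative only.
samples = [{"speaker": spk, "utterance": f"line {i}", "sarcasm": i % 2 == 0}
           for i, spk in enumerate(["A", "B", "C", "D"] * 5)]

dep_train, dep_val, dep_test = speaker_dependent_split(samples)
ind_train, ind_val, ind_test = speaker_independent_split(samples, {"C"}, {"D"})
```

In the speaker-independent case a model cannot lean on a particular character's face or voice, which is why it is the harder and more realistic test.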
What Were Their Results?
●Speaker-Dependent Training
○Speaker Crossover
○Best: Text+Video ≈ 72% weighted F1
○Unimodal:
■Text ≈ 65%
■Speech ≈ 65%
■Facial ≈ 67%
○~13% Error Reduction From Multimodal
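For readers unfamiliar with the two metrics on this slide, here is a small sketch of how weighted F1 and relative error reduction can be computed. It treats error as 1 minus the score, which is an assumption about how the reduction was derived. Plugging in the rounded slide figures (67% best unimodal, 72% multimodal) gives roughly 15%, so the slide's ~13% presumably comes from unrounded scores or a different metric in the paper.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)

def relative_error_reduction(baseline_score, improved_score):
    """Share of the baseline's error (1 - score) removed by the improved model."""
    baseline_error = 1.0 - baseline_score
    improved_error = 1.0 - improved_score
    return (baseline_error - improved_error) / baseline_error

# Toy labels (1 = sarcastic, 0 = not sarcastic), not real results.
toy_f1 = weighted_f1([1, 1, 0, 0], [1, 0, 0, 0])

# Rounded figures from the slide: best unimodal vs. best multimodal.
reduction = relative_error_reduction(0.67, 0.72)
```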
Results Continued…
●Speaker-Independent Training
○New Speaker at Test Time
○Best: Text+Audio ≈ 63% weighted F1
○Unimodal:
■Text ≈ 60%
■Speech ≈ 63%
■Facial ≈ 54%
○Video features hurt: they capture character/show bias rather than speaker-agnostic cues.
How To Improve?
●A Lot of Data Grooming…
○Visually Isolate The Speaker
○Acoustic Speaker Isolation
○Removing "Dead" Audio
○Removing "Dead" Video
●Additional Prosodic Features
●Change Facial LSTM to Use Bucketing Approach
○Handle Variable Length
●Facial Anchoring
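As one illustration of the grooming steps above, removing "dead" audio can be approximated with a simple energy gate. This is a sketch under stated assumptions, not the workshop's implementation: the 25 ms frame size, the -40 dB threshold, and the function name `drop_dead_audio` are all hypothetical choices.

```python
import numpy as np

def drop_dead_audio(signal, sample_rate, frame_ms=25, threshold_db=-40.0):
    """Keep only frames whose RMS energy is within threshold_db of the loudest frame.

    Any trailing partial frame is discarded.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return signal[:0], np.zeros(0, dtype=bool)
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Level of each frame relative to the loudest frame, in decibels.
    rms_db = 20 * np.log10(np.maximum(rms, 1e-10) / max(rms.max(), 1e-10))
    keep = rms_db > threshold_db
    return frames[keep].reshape(-1), keep

sr = 16000
t = np.arange(sr) / sr
speech = 0.5 * np.sin(2 * np.pi * 220 * t)  # 1 s tone standing in for speech
silence = np.zeros(sr)                       # 1 s of dead air
trimmed, keep_mask = drop_dead_audio(np.concatenate([silence, speech, silence]), sr)
```

A real pipeline would likely add smoothing so that short pauses inside an utterance are not cut, since pauses can themselves carry sarcastic timing.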
Video Grooming: Before
Video Credit:
The Big Bang Theory (S01E08 "The Grasshopper Experiment")
Video Grooming: After
Video Credit:
The Big Bang Theory (S01E08 "The Grasshopper Experiment")
Audio Grooming: Before
Video Credit:
The Big Bang Theory (S03E11 "The Maternal Congruence")
Audio Grooming: After
Video Credit:
The Big Bang Theory (S03E11 "The Maternal Congruence")
●Landmark Acceleration
●Landmark Velocity
●Interpolate and Fill
○Missing Landmarks in Frames
●Handle Variable Length Clips
●Nose Tip Anchoring
Image Credit:
Entertainment Weekly
https://ew.com/movies/rudolph-the-red-nosed-reindeer-christmas-classic/
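The landmark features listed on this slide can be sketched with a few array operations. The example assumes landmarks are stored as an array of shape (frames, 68, 2) with NaN rows where detection failed, and that index 30 is the nose tip, as in the common 68-point layout. Both are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np

NOSE_TIP = 30  # nose-tip index in the common 68-point layout (assumed here)

def fill_missing(landmarks):
    """Linearly interpolate frames where detection failed (rows of NaN)."""
    filled = landmarks.copy()
    t = np.arange(len(filled))
    valid = ~np.isnan(filled).any(axis=(1, 2))
    flat = filled.reshape(len(filled), -1)  # view: writes go through to `filled`
    for col in range(flat.shape[1]):
        flat[:, col] = np.interp(t, t[valid], flat[valid, col])
    return filled

def anchor_to_nose(landmarks):
    """Express every point relative to the nose tip to cancel head translation."""
    return landmarks - landmarks[:, NOSE_TIP:NOSE_TIP + 1, :]

def motion_features(landmarks):
    """Frame-to-frame velocity and acceleration via finite differences."""
    velocity = np.diff(landmarks, axis=0, prepend=landmarks[:1])
    acceleration = np.diff(velocity, axis=0, prepend=velocity[:1])
    return velocity, acceleration

# Toy clip: a rigid face drifting (2, 1) pixels per frame, frame 2 undetected.
base = np.random.default_rng(0).uniform(0, 100, size=(68, 2))
frames = np.stack([base + i * np.array([2.0, 1.0]) for i in range(5)])
frames[2] = np.nan

filled = fill_missing(frames)
anchored = anchor_to_nose(filled)
velocity, acceleration = motion_features(filled)
```

In the toy clip the anchored coordinates are identical in every frame, because the only motion was whole-head translation. That is the point of anchoring: what remains after subtraction is expression movement rather than head position.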
Original Video
Video With Landmarks
Data Processing Pipeline
●Video -> Frame Features to CSV
○Using dlib Landmarker
○Get X and Y Coordinates
●CSV File <-> Seq. Modeling
○Movement Data Across Frames
○Features Captured:
■Velocity/Accel, Buckets, Nose Tip Anchoring, etc.
○LSTM to Capture Temporal Dependencies
●Tuning -> Final Model
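The CSV hand-off between landmark extraction and sequence modeling might look like the sketch below. It assumes the per-frame points have already been extracted (in the real pipeline they would come from dlib's shape predictor, for example via `shape.part(i).x` and `shape.part(i).y`). The column layout and the function names are hypothetical, not the workshop's actual format.

```python
import csv
import io

N_POINTS = 68  # assumes a 68-point landmark layout

def write_landmark_csv(frames, out):
    """Write one row per frame: frame index, then x0,y0,...,x67,y67.

    Frames with no detected face (None) are written with empty cells.
    """
    writer = csv.writer(out)
    header = ["frame"] + [f"{axis}{i}" for i in range(N_POINTS) for axis in ("x", "y")]
    writer.writerow(header)
    for idx, points in enumerate(frames):
        if points is None:
            writer.writerow([idx] + [""] * (2 * N_POINTS))
        else:
            writer.writerow([idx] + [coord for point in points for coord in point])

def read_landmark_csv(src):
    """Return a list of per-frame point lists, or None for missing frames."""
    frames = []
    for row in csv.DictReader(src):
        if row["x0"] == "":
            frames.append(None)
        else:
            frames.append([(float(row[f"x{i}"]), float(row[f"y{i}"]))
                           for i in range(N_POINTS)])
    return frames

# Toy data standing in for detector output; the middle frame has no face.
detected = [(float(i), float(i) + 0.5) for i in range(N_POINTS)]
buffer = io.StringIO()
write_landmark_csv([detected, None, detected], buffer)
buffer.seek(0)
round_trip = read_landmark_csv(buffer)
```

Keeping missing frames as explicit empty rows preserves the frame index, so a later interpolation step knows where the gaps are.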
Model Architecture
●Long Short-Term Memory (LSTM)
○Think Time-Series Data
●LSTM Buckets
○Use Similar Sizes For Inference
●Nose Tip Anchoring
●Hyperparameter Tuning
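Bucketing is one way to feed variable-length clips to an LSTM without padding every clip to the longest one. The sketch below groups sequences by length and zero-pads within each bucket. The bucket boundaries, the 136-value feature width (68 points times x and y), and the function name are assumptions for illustration, not the values used in the workshop.

```python
import numpy as np
from collections import defaultdict

def bucket_sequences(sequences, boundaries=(32, 64, 128)):
    """Group variable-length sequences into length buckets.

    Each sequence is zero-padded to its bucket's boundary; sequences longer
    than the largest boundary are truncated to it.
    """
    buckets = defaultdict(list)
    for seq in sequences:
        size = next((b for b in boundaries if len(seq) <= b), boundaries[-1])
        padded = np.zeros((size, seq.shape[1]), dtype=seq.dtype)
        n = min(len(seq), size)
        padded[:n] = seq[:n]
        buckets[size].append(padded)
    return {size: np.stack(items) for size, items in buckets.items()}

# Toy clips of 20, 30, 50, and 200 frames with 136 features per frame.
rng = np.random.default_rng(0)
clips = [rng.normal(size=(length, 136)) for length in (20, 30, 50, 200)]
batches = bucket_sequences(clips)
```

At inference time a new clip would be routed to the bucket closest to its length, which matches the slide's note about using similar sizes for inference.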
All Material Contained in This Session:
https://github.com/davidvonthenen/2025-ai-dev-europe
Workshop Includes:
●Build/Test of Facial Sarcasm Model
●Build/Test of Acoustic Sarcasm Model
●Running Multimodal Inference
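The slides do not spell out how the facial and acoustic models are combined at inference time. One common option is late fusion, where each model outputs a sarcasm probability and the two are averaged. The sketch below shows that idea only; the weight, the threshold, and the function names are hypothetical.

```python
def late_fusion(p_facial, p_acoustic, w_facial=0.5):
    """Weighted average of two per-modality sarcasm probabilities."""
    if not 0.0 <= w_facial <= 1.0:
        raise ValueError("w_facial must be in [0, 1]")
    return w_facial * p_facial + (1.0 - w_facial) * p_acoustic

def classify(p_facial, p_acoustic, w_facial=0.5, threshold=0.5):
    """Fuse the two probabilities and apply a decision threshold."""
    score = late_fusion(p_facial, p_acoustic, w_facial)
    label = "sarcastic" if score >= threshold else "not sarcastic"
    return label, score

# Toy probabilities standing in for the two models' outputs.
label, score = classify(p_facial=0.8, p_acoustic=0.4, w_facial=0.5)
```

The fusion weight could be tuned on validation data. Given the speaker-independent results earlier in the deck, leaning toward the acoustic model may be reasonable when the speaker is unseen.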
Other Resources:
●SCaLE 22x - Parkinson's Gait and Audio Classifier
●RTC Conference 2024 Keynote - Parkinson's Gait
Thank You!
David vonThenen
Senior AI/ML Engineer
@davidvonthenen