Research paper presentation for a project .pptx

MaryamAziz47 7 views 40 slides Jul 13, 2024


Slide Content

By Renxi Wang and Shi Feng

Findings of the Association for Computational Linguistics: EACL 2023, pages 2120–2127, May 2–6, 2023. ©2023 Association for Computational Linguistics

Global-Local Modeling with Prompt-Based Knowledge Enhancement for Emotion Inference in Conversation

RNN RNN stands for Recurrent Neural Network, a type of neural network architecture commonly used in natural language processing and other sequential-data tasks.

PLM PLM stands for "Pre-trained Language Model." Pre-trained language models are a type of artificial neural network designed for natural language processing tasks.

Emotion Recognition in Conversation (ERC) vs. Emotion Inference in Conversation (EIC) ERC aims to identify the emotion label of an utterance given the entire dialogue history. EIC predicts the addressee's emotion without access to the current utterance, using only the dialogue history and knowledge of the current addressee.

Example

Emotion Recognition in Conversation (ERC) ERC Defined: The task of identifying emotion labels in an utterance. ERC Objectives: Understand and classify emotions in a conversation. Previous Approaches: Sequence-based neural networks have been used for ERC but have limitations, especially in feature extraction.

Emotion Recognition Challenges Challenges in ERC: Feature extraction may lead to information loss. Distinguishing Emotions: As model layers deepen, distinguishing between similar emotions becomes difficult.

Emotion Inference in Conversation (EIC) EIC Defined: Predicting emotions with dialogue history and the current addressee's information, excluding the last utterance. Unique Aspects of EIC: Focus on understanding emotions in conversation without the current utterance.

Global-Local Modeling Approach Proposed Approach: Global-local modeling combines sequence-based and pre-trained models. Benefits: Improved representation of the dialogue history and close-to-addressee utterances, enhancing EIC performance.

Emotion Detection Enhancement External Knowledge: Researchers have used external knowledge to enhance emotion detection in EIC. Limitations: Previous attempts had limitations, such as knowledge limited to certain event types.

Knowledge Generation for EIC Knowledge Generation Method: We propose a knowledge generation method based on prompt learning. Two Kinds of Knowledge: Pseudo utterances generated by GPT and GPT's responses on how the addressee feels.

Benefits of Knowledge Generation Advantages: Knowledge is precise and diverse, based on the entire dialogue history, making it valuable for EIC. Quality Improvement: Quality of knowledge is enhanced, unlike knowledge generated from single utterances.

Problem Definition Given a dialogue D = [(U1, p1), (U2, p2), · · ·, (Um, pm), pm+1], where Ui is the utterance in the i-th turn and pi is the participant in the i-th turn. For i = m + 1, pi is the addressee; otherwise pi is the speaker. The task is to predict the addressee's emotion e from D.
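The problem setup above can be sketched in a few lines of Python; the variable names and example turns are illustrative, not taken from the authors' code:

```python
# A dialogue D is a list of (utterance, participant) turns plus the
# addressee p_{m+1} of the unseen next turn. Names are hypothetical.
dialogue = [
    ("How was the interview?", "Alice"),      # (U1, p1)
    ("I think it went really well!", "Bob"),  # (U2, p2)
    ("That's great to hear.", "Alice"),       # (U3, p3)
]
addressee = "Bob"  # p_{m+1}: the participant whose emotion e we infer

# EIC: the addressee's next utterance U_{m+1} is NOT observed; the model
# must predict e from the history and the addressee's identity alone.
speakers = {p for _, p in dialogue}
```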

Global model They use DialogueInfer (Li et al., 2021a) as the global model for Emotion Inference in Conversation (EIC).

Approach The approach involves the following steps: fine-tuning a RoBERTa-Large model (Liu et al., 2019) to predict the emotion label of utterances, treating it as the Emotion Recognition in Conversation (ERC) task; then using the fine-tuned model to extract features from the utterances, yielding a 1024-dimensional vector ui for each utterance Ui. These utterance representations are then fed into the DialogueInfer model to derive the overall representation of the dialogue.

Continue… DialogueInfer is specifically designed for the EIC task, incorporating addressee-aware modules aimed at capturing the persistence and contagiousness of utterances.

Formally, the output of the global model is defined as follows:
ht, ct = 1{pt = pm+1} LSTMa(ut, (ht−1, ct−1)) + 1{pt ≠ pm+1} LSTMo(ut, (ht−1, ct−1))
hg = hm+1 (1)
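The indicator terms in Eq. (1) simply route each turn to one of two LSTMs depending on whether its speaker is the addressee. A minimal pure-Python sketch of that routing, with trivial stub cells standing in for the real LSTMa and LSTMo:

```python
# Sketch of the addressee-aware routing in Eq. (1). The two "cells" are
# stubs standing in for LSTM_a (addressee turns) and LSTM_o (other
# speakers); only the routing logic reflects the equation.
def lstm_a(u, state):          # stub for the addressee-turn cell
    h, c = state
    return u + h, c

def lstm_o(u, state):          # stub for the other-speaker cell
    h, c = state
    return u - h, c

def global_model(turns, addressee):
    """turns: list of (u_t, p_t), with scalar features u_t for simplicity."""
    h, c = 0.0, 0.0
    for u_t, p_t in turns:
        # 1{p_t = p_{m+1}} LSTM_a(...) + 1{p_t != p_{m+1}} LSTM_o(...)
        if p_t == addressee:
            h, c = lstm_a(u_t, (h, c))
        else:
            h, c = lstm_o(u_t, (h, c))
    return h  # h_g = h_{m+1}: the final hidden state

h_g = global_model([(1.0, "A"), (2.0, "B"), (3.0, "A")], addressee="A")
```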

Local model The local model is based on RoBERTa (Liu et al., 2019), which shares its architecture with BERT (Devlin et al., 2019) and is trained with a masked language modeling objective. To create the input for the local model, the last k utterances are concatenated. This input is made addressee-aware by prepending a speaker prefix indicating whether each utterance comes from the addressee.

Continue… The final text input is formulated as follows:
Ut = prefix(pm−k+1) Um−k+1 </s> prefix(pm−k+2) Um−k+2 · · · prefix(pm) Um (2)
Here, prefix(pi) denotes the speaker prefix: "I:" if pi = pm+1, "Other:" if pi ≠ pm+1 (3)
The special token </s> separates utterances. To incorporate the global information, the global representation hg is added to the first token's embedding of the text input:
ĥg = Wᵀhg + b (4)
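The prefixing scheme of Eqs. (2)–(3) can be sketched as follows; the function name and example turns are illustrative:

```python
# Sketch of the text-input construction: concatenate the last k
# utterances, each preceded by an addressee-aware speaker prefix
# ("I:" for the addressee, "Other:" otherwise) and the </s> separator.
def build_local_input(turns, addressee, k):
    parts = []
    for utterance, speaker in turns[-k:]:
        prefix = "I:" if speaker == addressee else "Other:"
        parts.append(f"{prefix} {utterance}</s>")
    return " ".join(parts)

text = build_local_input(
    [("Hi there.", "A"), ("How are you?", "B"), ("Fine, thanks.", "A")],
    addressee="B", k=2)
# text == "I: How are you?</s> Other: Fine, thanks.</s>"
```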

Continue….
H = [Emb(Ut[0]) + ĥg ; Emb(Ut[1:])] (5)
he = RoBERTa-Model(H) (6)
Here, W ∈ R^(d1×d2) is the matrix used for dimension projection, d2 is the hidden dimension of the RoBERTa model, and Emb denotes the embedding layer of RoBERTa.
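Eqs. (4)–(5) amount to projecting hg into the local model's hidden size and adding it to the first token's embedding only. A toy sketch with plain lists standing in for tensors (shapes and values are illustrative):

```python
# Sketch of Eqs. (4)-(5): project h_g (size d1) to the local model's
# hidden size d2, then add it to the first token embedding only.
def project(h_g, W, b):
    # ĥ_g = W^T h_g + b, with W given as a d1 x d2 nested list
    d2 = len(b)
    return [sum(W[i][j] * h_g[i] for i in range(len(h_g))) + b[j]
            for j in range(d2)]

def inject_global(token_embeddings, h_g_hat):
    # H = [Emb(Ut[0]) + ĥ_g ; Emb(Ut[1:])]
    first = [e + g for e, g in zip(token_embeddings[0], h_g_hat)]
    return [first] + token_embeddings[1:]

h_g_hat = project([1.0, 2.0], W=[[1.0, 0.0], [0.0, 1.0]], b=[0.5, 0.5])
H = inject_global([[0.0, 0.0], [3.0, 3.0]], h_g_hat)
# H == [[1.5, 2.5], [3.0, 3.0]]: only the first token was shifted
```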

Framework to infer emotions

They employ GPT-3 (Brown et al., 2020), a powerful model known for generating informative and accurate texts, particularly when provided with suitable examples. The model is further fine-tuned to align with user requirements, resulting in outputs that are more truthful and less toxic (Ouyang et al., 2022). They use this fine-tuned model, known as InstructGPT, to generate two kinds of knowledge.

1. Pseudo Utterances They take the dialogue history as input and use InstructGPT to generate potential utterances that the addressee might say. After obtaining these knowledge texts, they first prepend the addressee prefix to them and then append them to the text input of the local model.

2. Feelings and Corresponding Reasons InstructGPT is versatile and capable of performing various tasks, so they directly ask it about the addressee's emotions and the corresponding reasons. The model's output is treated as knowledge and used in the same way as the pseudo utterances.
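The paper's exact prompts are not shown in this deck, so the templates below are purely hypothetical illustrations of how the two knowledge queries might be phrased from the dialogue history:

```python
# Hypothetical prompt templates for the two knowledge types; the actual
# prompts used with InstructGPT may differ.
def pseudo_utterance_prompt(history):
    # Knowledge type 1: ask for a plausible next utterance by the addressee
    return history + "\nWhat might the addressee say next?"

def feeling_prompt(history):
    # Knowledge type 2: ask for the addressee's emotion and its reason
    return history + "\nHow does the addressee feel, and why?"

history = "A: I lost my keys again.\nB: Oh no, not again!"
p1 = pseudo_utterance_prompt(history)
p2 = feeling_prompt(history)
```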

Classifier In the model, they use the representation of the first token, denoted he, as the final output. To predict emotions, they apply a linear projection followed by a softmax layer: pe = softmax(Wᵀhe + b). Here, W is the projection matrix with dimensions d2 × c, where c is the number of emotions. The output pe is a probability distribution over the emotions.
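The classifier step can be sketched in a few lines; the weights and the three-way label set below are illustrative, not the paper's:

```python
import math

# Sketch of the classifier: linear projection of h_e followed by softmax.
def classify(h_e, W, b):
    # p_e = softmax(W^T h_e + b), with W given as a d2 x c nested list
    logits = [sum(W[i][j] * h_e[i] for i in range(len(h_e))) + b[j]
              for j in range(len(b))]
    m = max(logits)                      # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

p_e = classify([1.0, -1.0],
               W=[[2.0, 0.0, 1.0], [0.0, 2.0, 1.0]],
               b=[0.0, 0.0, 0.0])
# p_e is a valid probability distribution over the c = 3 emotions
```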

Experiments The model was trained on three distinct datasets: DailyDialog (Li et al., 2017), MELD (Poria et al., 2019a), and EmoryNLP (Zahiri and Choi, 2018).

Training Process Initial fine-tuning: A RoBERTa-Large model was fine-tuned on each dataset's training set. Batch size: 16. Model selection: the checkpoint with the best performance on the development set was kept.

For emotion inference: Learning rate: 1e-5. Optimizer: AdamW. Training scheme: in the first two epochs, only the global model was updated while the local model was frozen; afterward, the entire model was fine-tuned. This scheme improved training stability.

Loss Function Cross entropy was used as the loss function.
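For a one-hot gold label, cross entropy reduces to the negative log-probability the model assigns to that label. A minimal sketch:

```python
import math

# Cross-entropy loss for a single example: with a one-hot gold label,
# L = -log p_e[gold]. Values below are illustrative.
def cross_entropy(p_e, gold_index):
    return -math.log(p_e[gold_index])

loss = cross_entropy([0.7, 0.2, 0.1], gold_index=0)  # -ln(0.7) ≈ 0.357
```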

Main results

Ablation study To assess the effectiveness of different modules in our model, we conducted ablation studies on the three datasets.

Continue …. Removing Addressee Information: To understand the impact of addressee information, we replaced the global model with a single LSTM and the addressee prefix with the speaker's name.

Findings of ablation studies Key Finding: Results indicate that the local model plays a crucial role in our model's performance and is generally more important than other modules.

Continue… Dataset Variations: Interestingly, the significance of addressee information varies across datasets. In DailyDialog, which is dyadic, the addressee information is less critical, since the second-to-last utterance in the input texts must come from the addressee.

Ablation studies result

Limitation Since the global model must first compute the global representation before the local model can output the emotion distribution, the framework takes longer to train and to run inference than other models. The framework also relies on a pre-trained model, which requires large GPU memory.