Exploring Cutting-Edge Advances in Speech Emotion Recognition (SER)

Uploaded by l228296 · 16 slides · May 02, 2024

About This Presentation

Exploring cutting-edge advances in speech emotion recognition entails delving into revolutionary techniques that leverage deep learning and signal processing to discern and interpret emotional cues embedded in human speech. These techniques harness sophisticated neural network architectures, such as...


Slide Content

Exploring Cutting-Edge Advances in Speech Emotion Recognition: A Deep Dive into Revolutionary Techniques (Abdur Rehman, 23L-8023)

Thesis Presentation Outline
01 Title
02 Introduction
03 SER
04 Methods in SER
05 Motivation
06 Problem Statement
07 Research Objectives
08 Proposed Methodology
09 Literature Review

Introduction
01 Speech Emotion Recognition (SER) is a rapidly evolving field within affective computing.
02 Advances in SER have been driven by the exploration of cutting-edge techniques and model architectures.
03 The ability of machines to understand and respond to human emotions in speech has significant implications for various applications.
04 SER plays a vital role in fields such as customer service, healthcare, and emergency call management.

SER
01 SER involves the automatic identification and classification of emotions expressed in speech.
02 It is essential for improving user experience, human-computer interaction, and emotional analysis in various scenarios.
03 The accuracy and efficiency of SER systems have been continuously improving with the development of advanced techniques.
04 SER has applications in diverse fields, including sentiment analysis, virtual assistants, and mental health monitoring.

Techniques in SER
01 Attention-based models, such as CNN-LSTM architectures, have shown effectiveness in capturing subtle emotional cues in speech.
02 Support Vector Machine (SVM) algorithms combined with MFCC features have demonstrated good performance in emotion recognition tasks.
03 Novel approaches like Bag-of-Audio-Words (BoAW) embeddings and hybrid data augmentation techniques have significantly improved accuracy in recognizing basic emotions.
04 The integration of advanced deep learning architectures and attention mechanisms has led to breakthroughs in SER performance.
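The SVM-with-MFCC approach mentioned above can be sketched in a few lines. This is an illustrative toy, not any paper's actual pipeline: the 13-dimensional vectors below are synthetic stand-ins for mean-pooled MFCCs (which would normally be extracted from speech with a library such as librosa), and the two "emotion" clusters, class means, and resulting accuracy are fabricated for the demonstration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for mean-pooled MFCC features: two well-separated
# Gaussian clusters playing the role of two emotion classes.
rng = np.random.default_rng(0)
n_per_class, n_mfcc = 100, 13
happy = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, n_mfcc))
angry = rng.normal(loc=2.0, scale=1.0, size=(n_per_class, n_mfcc))

X = np.vstack([happy, angry])
y = np.array([0] * n_per_class + [1] * n_per_class)  # 0 = happy, 1 = angry

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Standardise the features, then fit an RBF-kernel SVM classifier.
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="rbf", C=1.0).fit(scaler.transform(X_train), y_train)
accuracy = clf.score(scaler.transform(X_test), y_test)
```

In a real SER system, each utterance would contribute one feature vector (e.g. the mean and variance of its MFCC frames), and the kernel and `C` would be tuned by cross-validation on the training split.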

Motivation
The motivation behind SER research lies in enhancing human-computer interaction by enabling machines to understand and respond to human emotions. SER has the potential to revolutionize various applications, from customer care to emergency call management. The growing interest in SER is fueled by the increasing demand for emotionally intelligent systems. Emotion recognition in speech can lead to more personalized and effective human-machine interactions.

Problem Statement
Despite advancements in SER, there are challenges in accurately recognizing complex emotions and dealing with limited-data scenarios. Noise in speech signals poses a challenge for emotion recognition systems. The need for robust and accurate SER systems in real-world applications motivates the exploration of new methodologies.

Research Objectives
Objective #1: The primary objective of this research is to enhance Speech Emotion Recognition performance through innovative methodologies.
Objective #2: To explore the effectiveness of novel techniques like BLSTM-DSA in improving emotion recognition accuracy.
Objective #3: To address the challenges of recognizing complex emotions and optimizing SER performance in various scenarios.

Proposed Methodology
Phase 1: The proposed methodology integrates Bi-directional Long Short-Term Memory with Directional Self-Attention (BLSTM-DSA) for Speech Emotion Recognition.
Phase 2: BLSTM-DSA aims to capture long-term dependencies and improve robustness in recognizing hidden emotions.
Phase 3: By incorporating autocorrelation of speech frames, the algorithm automatically assigns weights to select frames carrying emotional information, enhancing temporal network performance.
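The frame-weighting idea in Phase 3 can be illustrated with a simplified NumPy sketch. This is not the BLSTM-DSA algorithm itself: the function below is a hypothetical stand-in that scores each frame by its normalised autocorrelation with a neighbouring frame, turns the scores into softmax weights, and pools the frames into one utterance-level vector, the way an attention layer would pool BLSTM hidden states. The lag, feature dimensions, and random data are all assumptions made for the demonstration.

```python
import numpy as np

def autocorr_frame_weights(frames: np.ndarray, lag: int = 1) -> np.ndarray:
    """Assign each frame a softmax weight from its lag-`lag` autocorrelation.

    `frames` has shape (T, D): T frames of D-dim features (e.g. BLSTM
    hidden states). Frames whose features correlate strongly with their
    neighbours are treated as more informative and receive larger weights.
    A simplified stand-in for directional self-attention.
    """
    T = frames.shape[0]
    scores = np.zeros(T)
    for t in range(T):
        u = frames[t]
        v = frames[min(t + lag, T - 1)]  # clamp at the last frame
        denom = np.linalg.norm(u) * np.linalg.norm(v) + 1e-8
        scores[t] = float(u @ v) / denom  # normalised autocorrelation
    # Softmax turns raw scores into attention-style weights summing to 1.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def attention_pool(frames: np.ndarray) -> np.ndarray:
    """Weighted sum of frames -> one utterance-level vector."""
    return autocorr_frame_weights(frames) @ frames

rng = np.random.default_rng(1)
hidden = rng.normal(size=(50, 8))      # T=50 frames, D=8 features (toy data)
utterance_vec = attention_pool(hidden)  # shape (8,), fed to a classifier
```

In the full model, the weighting would be learned jointly with the recurrent layers rather than computed by a fixed rule, but the pooling structure is the same: per-frame scores, softmax, weighted sum.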

Literature Review

Experimental Setup
Dataset: The evaluation of the proposed methodology was conducted on the IEMOCAP and EMO-DB databases.
Method: The experimental setup involved training and testing the BLSTM-DSA model on emotional datasets to assess its performance.
Performance Metrics: Accuracy, precision, and recall were used to evaluate the effectiveness of the proposed methodology.
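The three metrics named above can be computed with scikit-learn. The labels below are hypothetical stand-ins for a four-class emotion task (the class names and predictions are invented for illustration, not results from the slides); macro averaging weights each emotion class equally, which matters for imbalanced corpora such as IEMOCAP.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Toy ground truth and predictions for a 4-class emotion task
# (0=neutral, 1=happy, 2=sad, 3=angry) -- illustrative only.
y_true = [0, 1, 2, 3, 1, 2, 3, 0, 1, 3]
y_pred = [0, 1, 2, 3, 1, 0, 3, 0, 2, 3]

accuracy = accuracy_score(y_true, y_pred)
# Macro averaging: compute the metric per class, then take the unweighted mean.
precision = precision_score(y_true, y_pred, average="macro", zero_division=0)
recall = recall_score(y_true, y_pred, average="macro", zero_division=0)
```

Here 8 of 10 predictions match, so accuracy is 0.8, while macro precision and recall average the per-class scores and so penalise the classes (sad, happy) that are confused.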

Research Results
Result #1: The evaluation results on the IEMOCAP and EMO-DB databases showed satisfactory performance of the BLSTM-DSA model.
Result #2: BLSTM-DSA achieved the highest accuracies in recognizing the happiness and anger emotions.
Result #3: The experimental results demonstrate the effectiveness of the proposed methodology in enhancing Speech Emotion Recognition performance.

Conclusions / Future Work
Conclusion: The study highlights the remarkable progress made in Speech Emotion Recognition through innovative techniques and advanced model architectures.
Future Work: Explore multi-feature fusion, keyword spotting, and optimization methods to further enhance SER performance. Continued research and development in SER hold immense potential to revolutionize human-computer interaction and improve emotion recognition in speech applications.

References
1. Singh, J., Saheer, L.B., Faust, O.: Speech emotion recognition using attention model. International Journal of Environmental Research and Public Health 20(6) (2023). DOI: 10.3390/ijerph20065140
2. Pratama, A., Sihwi, S.W.: Speech emotion recognition model using support vector machine through MFCC audio feature (2022). DOI: 10.1109/ICITEE56407.2022.9954111
3. Chamishka, S., Madhavi, I., Nawaratne, R., Alahakoon, D., De Silva, D., Chilamkurti, N., Nanayakkara, V.: A voice-based real-time emotion detection technique using recurrent neural network empowered feature modelling. Multimedia Tools and Applications 81(24), 35173–35194 (2022). DOI: 10.1007/s11042-022-13363-4
4. Pham, N.T., Dang, D.N.M., Nguyen, N.D., Nguyen, T.T., Nguyen, H., Manavalan, B., Lim, C.P., Nguyen, S.D.: Hybrid data augmentation and deep attention-based dilated convolutional-recurrent neural networks for speech emotion recognition. Expert Systems with Applications 230, 120608 (2023). DOI: 10.1016/j.eswa.2023.120608
5. Li, D., Liu, J., Yang, Z., Sun, L., Wang, Z.: Speech emotion recognition using recurrent neural networks with directional self-attention. Expert Systems with Applications 173, 114683 (2021)
6. Xu, M., Zhang, F., Zhang, W.: Head fusion: Improving the accuracy and robustness of speech emotion recognition on the IEMOCAP and RAVDESS dataset. IEEE Access 9, 74539–74549 (2021). DOI: 10.1109/ACCESS.2021.3067460
7. Aggarwal, A., Srivastava, A., Agarwal, A., Chahal, N., Singh, D., Alnuaim, A.A., Alhadlaq, A., Lee, H.-N.: Two-way feature extraction for speech emotion recognition using deep learning. Sensors 22(6) (2022). DOI: 10.3390/s22062378
8. Rajapakshe, T., Rana, R., Khalifa, S., Liu, J., Schuller, B.: A novel policy for pre-trained deep reinforcement learning for speech emotion recognition (2022). DOI: 10.1145/3511616.3513104
9. Selvan, A.K., Nimmi, K., Janet, B., Sivakumaran, N.: Emotion detection on phone calls during emergency using ensemble model with hyper parameter tuning. International Journal of Information Technology 15(2), 745–757 (2023). DOI: 10.1007/s41870-022-01091-9
10. Al-onazi, B.B., Nauman, M.A., Jahangir, R., Malik, M.M., Alkhammash, E.H., Elshewey, A.M.: Transformer-based multilingual speech emotion recognition using data augmentation and feature fusion. Applied Sciences 12(18) (2022). DOI: 10.3390/app12189188

Questions