Sri-Pratap-College-Department-of-Information-Technology Presentation.pptx

TAHIRZAMAN81 7 views 36 slides Jul 07, 2024

About This Presentation

Facial expression recognition using cnn


Slide Content

Sri Pratap College, Department of Information Technology. Presentation on Facial Expression Recognition Using CNN. -Tahir Zaman (182103)

Introduction to Facial Expression Recognition A Facial Expression Recognition (FER) system is a computer application for automatically identifying or classifying a person's emotion based on his or her facial expression in a digital image. Research has shown that humans share seven different facial expressions, as described in the picture. Facial expressions play a crucial role in identifying basic human emotions: anger, fear, happiness/joy, sadness, surprise, disgust, and neutral.

Continued… The emotional state of a human can be obtained from verbal and non-verbal information captured by various sensors, such as facial changes, tone of voice, and physiological signals. In 1967, Mehrabian [1] indicated that 7% of a message is conveyed by spoken words, 38% by voice intonation, and 55% by facial expressions. Facial expressions are produced by movements of facial features. The Facial Action Coding System (FACS) proposed by Ekman and Friesen offers a method for identifying emotions. It involves a set of 46 Action Units (AUs) that correspond to the movements of facial muscles. Each action unit is linked to a muscle, a set of muscles, or a complex movement, and certain muscle movements activate specific action units. As a result, a facial expression is formed by one or several action units, and each of the seven emotions is represented by a different set of valid action units.

Fundamental Concepts of Facial Expressions Facial expressions are a fundamental mode of nonverbal communication, serving as a window into an individual's inner emotional state. The study of facial expressions is rooted in the pioneering work of psychologists such as Paul Ekman, who identified seven basic emotional expressions: anger, disgust, fear, happiness, sadness, surprise, and neutral. These universal expressions are believed to be innate and cross-cultural, allowing for the universal recognition of emotions across different societies. Beyond the seven basic emotions, researchers have also explored the nuances of facial expressions, uncovering a rich tapestry of subtle variations that convey a range of more complex emotions and mental states. Factors such as eyebrow position, eye gaze, lip movement, and muscle tension all contribute to the intricate language of the human face, enabling us to communicate a wide spectrum of feelings, intentions, and social cues.

Continued…
1. Basic emotions (BEs): There are seven basic human emotions: happiness, surprise, anger, sadness, fear, disgust, and neutral.
2. Compound emotions: A combination of two basic emotions. Du et al. introduced 22 emotion categories, comprising the basic emotions, 12 compound emotions most typically expressed by humans, and three additional emotions (appall, hate, and awe).
3. Micro expressions (MEs): Subtle and spontaneous movements that occur involuntarily and reveal a person's genuine underlying emotion within a short period of time. Recognizing micro-expressions is difficult, as they are often hidden intentionally or unintentionally and are only visible in a small number of frames. These expressions require precise motion tracking and recognition algorithms.
4. Spontaneous datasets: These contain expressions stimulated in a participant; the participant is aware that they are being monitored, but their emotions are still displayed in a natural way.
5. In-the-wild datasets: These refer to data collected in real-world settings, where participants are filmed in their natural environment. This type of data acquisition is less laboured and more spontaneous, allowing researchers to capture behaviour in a more authentic setting. In-the-wild datasets are often used in fields such as psychology, neuroscience, and computer vision to study human behaviour, social interactions, and environmental factors that influence decision-making. They provide a more realistic and comprehensive understanding of the behaviour being studied, which can lead to more accurate and applicable findings.

Importance of Facial Expression Recognition Facial expression recognition is a critical tool in understanding human emotions and behaviors. By analyzing the intricate movements of the face, this technology can provide valuable insights into a person's mental state, mood, and intentions. This information is essential for applications ranging from clinical psychology and human-computer interaction to security and marketing. Accurate facial expression recognition can enhance interpersonal communication, assist in early detection of mental health issues, and enable more natural and intuitive human-machine interactions. Additionally, it plays a vital role in areas such as customer service, user experience design, and emotion-based marketing, where understanding customer sentiment is crucial for improving products and services.

Applications of Facial Expression Recognition
Clinical Psychology: Facial expression recognition plays a crucial role in clinical psychology, helping clinicians assess patients' emotional states, detect early signs of mental health issues, and monitor the progress of therapeutic interventions. By analyzing subtle facial cues, this technology can provide valuable insights into an individual's psychological well-being.
Human-Computer Interaction: In the field of human-computer interaction, facial expression recognition enables more natural and intuitive interfaces. By interpreting users' emotional responses, computers can adapt their behavior, provide personalized recommendations, and create a more engaging and empathetic user experience.
Customer Service and Marketing: Facial expression recognition is invaluable in customer service and marketing, helping businesses understand customer sentiment, gauge their reactions to products or services, and tailor their offerings to better meet the needs and preferences of their target audience. This technology can also enhance customer experience and improve overall satisfaction.

Continued…
Education: Education is a crucial part of a country's economy, and it is essential to disseminate practical knowledge and ensure appropriate learning. Monitoring and feedback from both learners and instructors are necessary for the learning process of every institution.
Entertainment Industry: FER can be used to analyze the emotional responses of an audience. In filmmaking and animation, this information can be used to understand how viewers are engaging with the content presented to them.
Security: Facial Expression Recognition (FER) technology can enhance identity recognition systems and improve their functionality for security purposes. Biometric systems, such as face recognition, can be used for identity authentication and applied to security, access control, forensics, and more. While security surveillance systems can monitor an environment and provide detailed information on events within a specified time frame, they may not be able to prevent imminent attacks.

Challenges in Facial Expression Recognition
Occlusion: This challenge is posed by disturbances or hindrances that obscure the characteristic features of the expression image. The problem ranges from natural occurrences like moustaches and beards to self-made ones like wearing glasses, cosmetics, headscarves, or a hijab.
Ageing: Age categories contribute to variations in how people express emotion through the face. For example, emotional states are subtly observed in children's faces, obviously noticed in adults, and mildly displayed in elders. In their investigation of an optical-flow and high-gradient detection algorithm, Cohn et al. found that the algorithm performed worse on infants than on adults. The degradation in performance was attributed to infant skin texture, more fatty tissue, facial conformation, and the absence of transient furrows.

Continued…
Pose and illumination variation: The position of the face at the time of data collection can also be challenging. In 2D approaches, the head should be in a frontal view; using a 2D image reduces computation cost, but determining the appropriate facial features is extremely difficult. The reverse is the case for 3D images. A side-view position can affect the performance of the system, and non-frontal views and rigid head motion are challenges peculiar to spontaneous data. Most facial expression databases are collected in a controlled environment, with static expression images acted by either professional or non-professional actors; FER systems developed in such monitored environments degrade in the real world, where spontaneous and sequential images occur.
Databases: The facial expression database is as cogent an aspect of the FER system as feature extraction and classifiers, and it is one of the factors that contributes immensely to the robustness of the system. The early facial expression databases were posed databases collected in a controlled environment. The choice of database for facial expression recognition development depends on the type of application. Apart from posed databases, there are also spontaneous databases captured in real scenes: naturally expressed facial databases.

Anatomy of Facial Expressions The human face is an intricate canvas of muscles, bones, and soft tissues that work in concert to produce the vast array of facial expressions we use to communicate our emotions and intentions. The primary muscles responsible for facial expressions are the frontalis, orbicularis oculi, corrugator supercilii, procerus, zygomaticus major, risorius, orbicularis oris, and depressor anguli oris. These muscles, in combination with the positioning of the eyebrows, eyelids, lips, and other facial features, allow us to convey a wide range of emotions, from joy and surprise to anger and disgust. Understanding the underlying anatomy and the specific muscle movements that generate each expression is crucial for developing accurate facial expression recognition algorithms.

Techniques for Facial Expression Recognition
Facial Landmark Detection: Facial landmarks (FLs) are specific points on the face, such as the tip of the nose, the eyebrows, and the corners of the mouth, that are used to identify unique facial features. The relative positions between landmark points, or the local texture around the landmarks, are used as a feature vector for facial emotion recognition. There are three types of FL detection approaches: shape- and appearance-based models (the active shape model, ASM, and the active appearance model, AAM), regression-based models combining local and global models, and CNN-based models. FL models are trained using appearance and shape variations from a coarse initialization; the initial shape is then gradually moved to a better position until convergence.
Facial Action Coding System (FACS): As described earlier, FACS, proposed by Ekman and Friesen, identifies emotions through a set of 46 Action Units (AUs) corresponding to the movements of facial muscles. A facial expression is formed by one or several action units, and each of the seven emotions is represented by a different set of valid action units.
Machine Learning Techniques: Cutting-edge machine learning algorithms, including Support Vector Machines (SVMs), Random Forests, and deep neural networks, have revolutionized facial expression recognition. These models are trained on large datasets of labeled facial expressions, enabling them to learn the complex patterns and subtle nuances that distinguish different emotional states.
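As a small illustration of the landmark-based feature idea described above, the sketch below builds a feature vector from the pairwise distances between hypothetical landmark coordinates. The five points and their values are made up for illustration; a real system would obtain them from a landmark detector.

```python
import numpy as np

# Hypothetical (x, y) coordinates for five facial landmarks, roughly where
# a detector might place eye corners, nose tip, and mouth corners.
landmarks = np.array([
    [30.0, 40.0],   # left eye corner
    [70.0, 40.0],   # right eye corner
    [50.0, 60.0],   # nose tip
    [35.0, 80.0],   # left mouth corner
    [65.0, 80.0],   # right mouth corner
])

def pairwise_distance_features(points):
    """Build a feature vector from the distances between every landmark pair."""
    n = len(points)
    feats = []
    for i in range(n):
        for j in range(i + 1, n):
            feats.append(np.linalg.norm(points[i] - points[j]))
    return np.array(feats)

features = pairwise_distance_features(landmarks)
print(features.shape)  # (10,) -- C(5, 2) pairwise distances
```

A vector like this could then be fed to any of the classifiers discussed later in the deck.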

FER Databases Facial expression databases play a crucial role in the development of Facial Expression Recognition (FER) systems. Traditionally, facial expressions have been studied using either 2D static images or 2D video sequences, but there are also databases that contain 3D images. FER systems built on 2D approaches have limitations in handling different poses, since most 2D databases only contain frontal faces. A 3D approach, on the other hand, has the potential to handle pose variation problems more effectively. Most FER databases are labeled with the six basic emotions (anger, disgust, fear, happiness, sadness, and surprise), as well as a neutral expression. Early facial expression databases were posed databases collected in controlled environments, generally inside a laboratory with controlled lighting conditions, where subjects were asked to pose a certain emotion towards a reference. Apart from posed datasets, there are also spontaneous databases captured in real-world settings, which are naturally expressed facial expression databases. Recently, there has been a growing need to take FER beyond the laboratory to real-world applications, which requires facial expression databases collected in unconstrained and uncontrolled environments, also known as In-The-Wild databases.

Continued…
- Facial Expression Recognition 2013 Database (FER-2013)
- Japanese Female Facial Expression Database (JAFFE)
- The Extended Cohn-Kanade Database (CK+)
- Emotion Recognition in the Wild Database (EmotiW)
- The Binghamton University 3D Facial Expression Database (BU-3DFE)
- The MMI Database
- eNTERFACE'05 Audiovisual Emotion Database
- Compound Emotion (CE) Dataset

Database used: The dataset used for training the model is from the Kaggle Facial Expression Recognition Challenge held a few years back (FER2013). The data consist of 48x48-pixel grayscale images of faces. The faces have been automatically registered so that each face is more or less centered and occupies about the same amount of space in each image. The task is to categorize each face, based on the emotion shown in the facial expression, into one of seven categories (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral). The training set consists of 28,709 examples. The public test set used for the leaderboard consists of 3,589 examples. The final test set, which was used to determine the winner of the competition, consists of another 3,589 examples.
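FER2013 is distributed as a CSV in which each face is a single space-separated string of 48 x 48 = 2304 grayscale values. The sketch below shows one plausible way to turn such a string into a normalized image array; a synthetic row stands in for the real file, which is not bundled here.

```python
import numpy as np

def parse_fer2013_pixels(pixel_string, size=48):
    """Convert a FER2013-style 'pixels' string into a normalized (48, 48, 1) array."""
    values = np.array(pixel_string.split(), dtype=np.float32)
    assert values.size == size * size, "expected 2304 pixel values"
    # Reshape to image-with-channel layout and scale intensities to [0, 1].
    return values.reshape(size, size, 1) / 255.0

# Synthetic stand-in for one CSV row's 'pixels' column.
row = " ".join(str(i % 256) for i in range(48 * 48))
img = parse_fer2013_pixels(row)
print(img.shape)  # (48, 48, 1)
```

With the real file, the same function would be applied to each row of the `pixels` column after loading the CSV (e.g. with Pandas, which is already in the deck's software list).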

Emotion labels in the dataset:
0 (Angry): 4,593 images
1 (Disgust): 547 images
2 (Fear): 5,121 images
3 (Happy): 8,989 images
4 (Sad): 6,077 images
5 (Surprise): 4,002 images
6 (Neutral): 6,198 images

Structure of the Facial Expression Recognition System. The general FER architecture comprises three major phases: pre-processing, feature extraction, and classification or recognition. These phases carry out their respective tasks sequentially on a particular FER database to establish ground truth for the system to achieve its goal. Although the core of the architecture consists of the feature extraction and classification/recognition phases, in most cases the pre-processing stage is a crucial phase that should not be left out. Automatic FER architectures therefore usually begin with the pre-processing phase. The next phase is feature extraction, where discriminating features are extracted using feature extraction techniques. The last phase is classification, where each expression image is assigned to one of the basic emotion classes.

Pre-processing phase: Facial feature pre-processing is a crucial step in Facial Expression Recognition (FER), as it helps to preserve relevant features by limiting the infiltration of redundant information during data extraction. Whether conventional machine learning methods or deep learning models are used, pre-processing has a significant influence on the performance of the system. In fact, pre-processing is one of the most important stages in any machine-learning-based system, as analyzing raw data without any kind of screening can lead to unwanted results. It is therefore vital to ensure the quality of the data before extracting its relevant features. The pre-processing phase includes various techniques that can help to improve data quality. Some of the most popular and effective pre-processing techniques are data normalization, noise reduction, image enhancement, and feature extraction. Data normalization involves scaling the data to ensure that it has a consistent range and distribution. Noise reduction techniques help to remove unwanted noise from the data, such as background noise or sensor noise. Image enhancement techniques improve the visual quality of the images, which can in turn improve the accuracy of the system. Finally, feature extraction techniques extract the most relevant features from the data, which can then be used to train the machine learning model.

Pre-processing phase.
Face detection: This is a crucial step in a Facial Expression Recognition (FER) system. It involves selecting the Region of Interest (ROI) in an input image that will be used in the subsequent stages of the system.
Geometric transformations: The detected faces may not be in proper condition to be analyzed due to issues such as rotation, scale, and noise. To ensure that the face to be analyzed is geometrically similar to the faces used when training the classifier, various geometric transformations are necessary.
Image processing: To enhance the accuracy of FER systems, several image processing techniques can be applied to accentuate relevant features that will be used in the classifier. Some of the most commonly used techniques in FER systems include edge detection, histogram equalization, smoothing, and noise reduction. By applying these techniques, the FER system can more accurately and efficiently detect and classify facial expressions.
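Histogram equalization, one of the image processing techniques named above, can be sketched directly in NumPy. This is a minimal implementation of the classic 8-bit equalization mapping, applied here to a synthetic low-contrast image standing in for a face crop (OpenCV's `cv2.equalizeHist` would be the usual production choice).

```python
import numpy as np

def equalize_histogram(image):
    """Spread an 8-bit grayscale image's intensity histogram across [0, 255]."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # Classic equalization mapping: rescale the CDF to the full 8-bit range.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[image]

# A low-contrast face-crop stand-in: all values squeezed into [100, 139].
low_contrast = (np.arange(48 * 48, dtype=np.uint32) % 40 + 100) \
    .astype(np.uint8).reshape(48, 48)
equalized = equalize_histogram(low_contrast)
print(low_contrast.max() - low_contrast.min(),   # 39: narrow input range
      equalized.max() - equalized.min())         # 255: full output range
```

Stretching the intensity range this way makes edges and facial furrows more prominent before feature extraction.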

Feature extraction After the pre-processing phase, the next step in developing a facial expression recognition (FER) system is feature extraction. This is a crucial step because the accuracy of the system heavily relies on the quality of the extracted features. In a conventional FER system, the relevant features are facial features, such as the position of the eyes, nose, and mouth, as well as the shape and orientation of the face. To extract these features, several techniques have been developed in the field of Computer Vision. These techniques are designed to analyze the facial features and extract the most relevant information that can be used to identify different expressions. Some of the most popular feature extraction techniques are given below.

Feature extraction (continued):
Optical Flow (OF): A method used to analyze the magnitude and direction of motion in a sequence of frames, typically a video. This technique calculates the motion between two frames at the pixel level, outputting a vector field that shows the movement of pixels from the first frame to the second.
Histogram of Oriented Gradients (HOG): A method that describes the local appearance and shape of an object in an image by analyzing the distribution of intensity gradients or edge directions. The image is divided into cells of a specific number of pixels, and for each cell a histogram of gradient directions is created.
Scale-Invariant Feature Transform (SIFT): A Computer Vision algorithm that detects and describes local features in images. SIFT features are invariant to uniform scaling, orientation, and illumination changes; however, they are vulnerable to blur and affine changes.
Gabor Filter: A method used to represent texture information. It provides characteristic selectivity for orientation and scale, and it remains robust even under harsh illumination conditions. By capturing the spatial information of frequency, position, and orientation from an image, it can effectively extract subtle local transformations.
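The per-cell histogram at the heart of HOG, as described above, can be sketched in a few lines of NumPy. This computes a magnitude-weighted histogram of unsigned gradient orientations for a single cell; a full HOG descriptor would additionally tile the image into cells and block-normalize the histograms (e.g. via `skimage.feature.hog`).

```python
import numpy as np

def cell_gradient_histogram(cell, n_bins=9):
    """Histogram of gradient orientations for one HOG cell (unsigned, 0-180 deg)."""
    gy, gx = np.gradient(cell.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0     # unsigned orientation
    bins = np.minimum((angle / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())    # magnitude-weighted votes
    return hist

# A vertical edge: intensity changes only along x, so all gradients point
# horizontally (0 degrees) and the votes land in the first orientation bin.
cell = np.zeros((8, 8))
cell[:, 4:] = 255.0
hist = cell_gradient_histogram(cell)
print(np.argmax(hist))  # 0
```

Concatenating such histograms over all cells yields the HOG feature vector fed to a classifier.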

Classification/Regression: In machine learning, there are two main methods used for prediction: classification and regression. A classification model predicts a label or category for an input image or features. On the other hand, a regression model determines the relationship between a dependent variable and independent variables. The reviewed works show that classification is the most popular method used. The following are some of the most commonly used classification and regression algorithms:

Continued…
Convolutional Neural Network (CNN): CNN is a type of neural network mainly used in Computer Vision (deep learning) due to its ability to solve many image classification problems. CNNs can even outperform humans on some of these problems, since they can identify underlying patterns that are too complex for the human eye. An input image is processed through several hidden layers of the CNN that decompose it into features.
Support Vector Machine (SVM): SVM is a popular machine learning algorithm mainly used for classification and regression tasks. The core idea of SVM is to map input features into a higher-dimensional space so that the features belonging to each class are separated by a gap that is as wide as possible. During the training phase, SVM creates a model that represents these features in space, which is then used for predictions.
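The SVM idea above can be shown with scikit-learn (already in the deck's software list). The two synthetic clusters below stand in for extracted facial features; real FER features would be HOG, Gabor, or landmark vectors rather than these made-up points.

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in for extracted facial features: two well-separated clusters
# labelled "happy" (1) and "sad" (0).
rng = np.random.default_rng(0)
X_happy = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(20, 2))
X_sad = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2))
X = np.vstack([X_happy, X_sad])
y = np.array([1] * 20 + [0] * 20)

# The RBF kernel implicitly maps features into a higher-dimensional space,
# where the classifier finds a maximum-margin separating boundary.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)
preds = clf.predict([[4.8, 5.2], [0.1, -0.3]])
print(preds)  # [1 0]
```

Swapping in real feature vectors (and seven emotion labels instead of two) is all that changes for the FER case.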

Continued…
Decision Tree (DT): A Decision Tree (DT) is a flowchart-like model used as a classifier. It splits a database into smaller subsets of data until no further splits can be made, with the resulting leaves representing the classes used for classification. This classifier is particularly useful for learning nonlinear relationships in data and handling high-dimensional data, and it is easy to implement. However, it has the main disadvantage of overfitting, as it may continue branching until it memorizes the data during the training phase.
Euclidean Distance (ED): This refers to the distance between two points in Euclidean space. In some reviewed works, this metric is used for classification: for instance, by calculating the distance between the facial features of a given facial expression and the mean vector of facial features for each emotion, then assigning to the input face the emotion with the closest distance. The ED between two points x and y can be defined through the following equation:
d(x, y) = √( Σᵢ (xᵢ − yᵢ)² )
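The nearest-mean rule described for Euclidean distance can be sketched directly. The per-emotion mean vectors below are made-up numbers; in practice they would be the averaged feature vectors of each emotion's training images.

```python
import numpy as np

# Hypothetical per-emotion mean feature vectors (e.g. averaged landmark distances).
emotion_means = {
    "happy":    np.array([2.0, 8.0, 1.0]),
    "sad":      np.array([7.0, 1.0, 3.0]),
    "surprise": np.array([4.0, 4.0, 9.0]),
}

def classify_by_distance(features):
    """Assign the emotion whose mean vector is closest in Euclidean distance."""
    return min(emotion_means,
               key=lambda e: np.linalg.norm(features - emotion_means[e]))

label = classify_by_distance(np.array([2.5, 7.5, 1.2]))
print(label)  # happy
```

`np.linalg.norm(a - b)` computes exactly the d(x, y) formula above.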

Continued…
Naive Bayes: Naive Bayes classifiers are a group of probabilistic machine learning classifiers that rely on Bayes' theorem and assume strong independence between features. Bayes' theorem is expressed in the following equation:
P(A|B) = P(B|A) · P(A) / P(B)    (1)
This equation gives the likelihood of event A happening given that B is true.
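A worked instance of equation (1) with made-up numbers makes the posterior computation concrete: a prior P(happy), a likelihood P(smile | happy), and the evidence P(smile) combine into P(happy | smile).

```python
# Illustrative probabilities (invented for this example, not measured).
p_happy = 0.30              # prior: fraction of frames labelled happy
p_smile_given_happy = 0.90  # likelihood: smiles among happy frames
p_smile = 0.40              # evidence: overall fraction of frames with a smile

# Bayes' theorem: P(happy | smile) = P(smile | happy) * P(happy) / P(smile)
p_happy_given_smile = p_smile_given_happy * p_happy / p_smile
print(round(p_happy_given_smile, 3))  # 0.675
```

A Naive Bayes classifier repeats this computation per class, multiplying one such likelihood per feature under the independence assumption, and picks the class with the largest posterior.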

Introduction to Convolutional Neural Networks A Convolutional Neural Network (CNN) is a type of deep learning neural network architecture commonly used in Computer Vision. Computer Vision is a field of Artificial Intelligence that enables a computer to understand and interpret images and other visual data. A regular neural network has three types of layers:
Input layer: The layer through which we give input to our model. The number of neurons in this layer equals the total number of features in the data (the number of pixels, in the case of an image).
Hidden layers: The input from the input layer is fed into the hidden layers. There can be many hidden layers depending on the model and data size, and each can have a different number of neurons, generally greater than the number of features. The output of each layer is computed by matrix multiplication of the previous layer's output with that layer's learnable weights, followed by the addition of learnable biases and an activation function that makes the network nonlinear.
Output layer: The output from the hidden layers is fed into a logistic function, such as sigmoid or softmax, which converts the score for each class into a probability.
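The layer computation just described (matrix multiply, bias, nonlinearity, then softmax) can be written out in NumPy. Shapes here are illustrative: 4 input features, one 8-unit hidden layer, and 7 outputs matching the seven emotion classes; the random weights stand in for learned ones.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(42)
x = rng.random(4)                                     # 4 input "pixels"
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)     # hidden layer params
W2, b2 = rng.standard_normal((7, 8)), np.zeros(7)     # 7 emotion classes

hidden = np.maximum(0.0, W1 @ x + b1)   # matmul + bias + ReLU nonlinearity
probs = softmax(W2 @ hidden + b2)       # logistic output layer
print(probs.shape, round(float(probs.sum()), 6))
```

Training would adjust `W1`, `b1`, `W2`, `b2` by backpropagation; the forward pass itself is just these few lines.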

Continued… CNN architecture: A Convolutional Neural Network consists of multiple layers: the input layer, convolutional layers, pooling layers, and fully connected layers.
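A toy NumPy version of the convolution and pooling layers named above shows what each one does to a FER2013-sized input (a framework such as Keras would provide these as `Conv2D` and `MaxPooling2D`; this sketch is not the presented model itself).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling, halving each spatial dimension."""
    h, w = feature_map.shape
    return feature_map[:h // size * size, :w // size * size] \
        .reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.random.default_rng(1).random((48, 48))   # FER2013-sized input
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)      # vertical-edge detector
feature_map = conv2d_valid(image, edge_kernel)      # (46, 46)
pooled = max_pool2d(feature_map)                    # (23, 23)
print(feature_map.shape, pooled.shape)
```

Stacking several such conv/pool stages and flattening the result into fully connected layers (as in the forward-pass sketch earlier) yields the full CNN.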

SOFTWARE REQUIREMENTS: Anaconda, Spyder, Jupyter Notebook; Operating system (Windows 10, macOS, etc.); Programming language: Python; Python libraries and frameworks: NumPy, Pandas, Matplotlib, Scikit-learn, TensorFlow, Keras, OpenCV, Dlib

Hardware Requirements: Camera (webcam), Graphics Processing Unit (GPU), Central Processing Unit (CPU), Memory (RAM), Monitor, Peripheral devices (keyboard and mouse)

Working of the model
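The original slide body here (likely a diagram) is not recoverable from the text. As a rough, hedged sketch of how such a model is typically applied at inference time: preprocess a 48x48 face crop, run the classifier, and map the 7-way output to a label. The `predict_fn` stub below stands in for a trained network (e.g. a Keras model's `predict` method); the label ordering follows the FER2013 categories listed earlier.

```python
import numpy as np

EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def predict_emotion(face_image, predict_fn):
    """Preprocess a 48x48 face crop and map the model's 7-way output to a label.

    predict_fn is injected so the pipeline can be shown without a trained
    network; a real deployment would pass a trained model's predict method.
    """
    face = face_image.astype(np.float32) / 255.0   # normalize to [0, 1]
    batch = face.reshape(1, 48, 48, 1)             # add batch/channel dims
    probs = predict_fn(batch)[0]
    return EMOTIONS[int(np.argmax(probs))]

# Stub "model" that always scores "Happy" highest.
stub = lambda batch: np.array([[0.01, 0.01, 0.02, 0.90, 0.02, 0.02, 0.02]])
frame = np.zeros((48, 48), dtype=np.uint8)         # stand-in webcam face crop
print(predict_emotion(frame, stub))  # Happy
```

In a live system, the face crop would come from the face-detection step of the pre-processing phase applied to each webcam frame.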

Recent Advancements and Trends The field of facial expression recognition has seen remarkable advancements in recent years, driven by the rapid progress of artificial intelligence and computer vision technologies. Deep learning models, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), have revolutionized the accuracy and robustness of facial expression recognition systems, enabling them to detect and classify a wide range of subtle and complex emotional states with unprecedented precision. Additionally, the integration of 3D facial analysis and dynamic expression modeling has further enhanced the capabilities of facial expression recognition, allowing for a more comprehensive and nuanced understanding of facial movements and their emotional significance. Emerging trends also include the incorporation of multimodal data, such as combining facial expressions with speech recognition and body language analysis, to provide a more holistic assessment of emotional states.

Conclusion and Future Directions Facial expression recognition has emerged as a powerful tool, offering invaluable insights into human emotions, behaviors, and cognitive states. As this technology continues to evolve, it holds immense promise for transforming a wide range of applications, from clinical psychology and human-computer interaction to customer service and marketing. Going forward, the field is poised to witness several exciting advancements. Researchers and developers will likely focus on: enhancing the accuracy and robustness of recognition models, particularly in challenging real-world scenarios such as occlusions, diverse facial features, and cultural variations; leveraging multimodal data integration, combining facial expressions with other modalities like speech, body language, and physiological signals, to provide a more holistic understanding of emotional states; exploring personalized and adaptive facial expression recognition systems, tailored to individual preferences and cultural norms, to improve the generalizability and real-world applicability of this technology; and advancing real-time and embedded facial expression recognition solutions, enabling seamless integration into devices and applications for immediate, context-aware emotional analysis.

THANK YOU