

Unit-I Machine Learning Basics

Course Outcomes
DEPARTMENT OF INFORMATION TECHNOLOGY (2024-2025), 6th Semester, Deep Learning (BTCS 704-18):
CO.1 Comprehend the advancements in learning techniques.
CO.2 Compare and explain various deep learning architectures and algorithms.
CO.3 Demonstrate the applications of Convolution Networks.
CO.4 Apply Recurrent Networks for sequence modelling.
CO.5 Deploy Deep Generative Models.
Text: Goodfellow I., Bengio Y. and Courville A., Deep Learning, MIT Press (2016).

Learning, Under-fitting, Overfitting, Estimators, Bias, Variance, Maximum Likelihood Estimation, Bayesian Statistics, Supervised Learning, Unsupervised Learning and Stochastic Gradient Descent.

Machine Learning Machine learning is a subset of artificial intelligence that focuses primarily on creating algorithms that enable a computer to learn independently from data and previous experience. Without being explicitly programmed, a machine learning system automatically learns from data, improves its performance with experience, and makes predictions.

Need for Machine Learning
Rapid increase in the production of data
Solving complex problems that are difficult for a human
Decision making in various sectors, including finance
Finding hidden patterns and extracting useful information from data

Learning in ML Learning in ML refers to the process of enabling a model (algorithm) to improve its performance on a given task by exposing it to data. The model identifies patterns, relationships, or rules from the data and uses this knowledge to make predictions or decisions.

Types of Learning in ML
Supervised Learning
Unsupervised Learning
Reinforcement Learning

Supervised Learning In supervised learning, sample labeled data are provided to the machine learning system for training, and the system then predicts the output based on the training data. The system uses the labeled data to build a model that understands the datasets and learns about each one. After training and processing are done, we test the model with sample data to see if it can accurately predict the output. The mapping of input data to output data is the objective of supervised learning. Supervised learning depends on supervision; it is analogous to a student learning under the guidance of a teacher. Spam filtering is an example of supervised learning. Supervised learning can be grouped further into two categories of algorithms: Classification and Regression.
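A minimal sketch of this workflow, using scikit-learn's built-in Iris dataset purely as an illustration (this assumes scikit-learn is installed; the dataset and classifier are not part of the slides):

```python
# Minimal supervised-learning sketch with scikit-learn (illustrative choices).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                      # labeled data (features, targets)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)               # hold out data for testing

model = LogisticRegression(max_iter=1000)              # a simple classifier
model.fit(X_train, y_train)                            # learn the input-to-output mapping
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```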

Unsupervised Learning Unsupervised learning is a learning method in which a machine learns without any supervision. The training is provided to the machine with a set of data that has not been labeled, classified, or categorized, and the algorithm needs to act on that data without any supervision. The goal of unsupervised learning is to restructure the input data into new features or groups of objects with similar patterns. In unsupervised learning, we do not have a predetermined result; the machine tries to find useful insights from a huge amount of data. It can be further classified into two categories of algorithms: Clustering and Association.
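A minimal clustering sketch, again assuming scikit-learn is available; the synthetic two-blob data is invented for illustration:

```python
# Minimal unsupervised-learning sketch: clustering unlabeled points (illustrative data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),              # unlabeled data drawn from
               rng.normal(5, 1, (50, 2))])             # two separate blobs

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster sizes:", np.bincount(kmeans.labels_))   # groups found without any labels
```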

Semi-supervised Learning Combines a small amount of labeled data with a large amount of unlabeled data, improving learning efficiency by leveraging both. Example: speech analysis (labeled: transcriptions; unlabeled: audio files). Useful when labeling data is costly or time-consuming.

Reinforcement Learning Reinforcement learning is a feedback-based learning method in which a learning agent gets a reward for each right action and a penalty for each wrong action. The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it. The goal of the agent is to collect the maximum reward points, and in doing so it improves its performance. A robotic dog that automatically learns the movement of its limbs is an example of reinforcement learning.

Key Steps in ML Learning
Data Collection: Gather relevant data from various sources.
Data Preprocessing: Clean and prepare the data for training.
Model Selection: Choose an algorithm based on the problem type.
Training: Teach the model using training data.
Evaluation: Assess the model's performance using test data.
Prediction/Deployment: Use the model to make predictions on unseen data.

Learning Metrics
To measure the performance of learning:
Accuracy: Correct predictions / Total predictions.
Precision, Recall, F1-Score: Metrics for classification tasks.
Mean Squared Error (MSE): Used in regression tasks.
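These metrics can be computed directly with scikit-learn; the labels below are made up for illustration:

```python
# Computing the metrics above with scikit-learn (illustrative labels).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_squared_error)

y_true = [1, 0, 1, 1, 0, 1]          # actual classes
y_pred = [1, 0, 0, 1, 0, 1]          # model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))     # correct / total
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))

# MSE for a regression task
print("MSE      :", mean_squared_error([2.0, 3.5, 4.0], [2.1, 3.0, 4.4]))
```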

Deep learning Deep learning is a branch of machine learning that uses neural networks to teach computers to do what comes naturally to humans: learn from example. In deep learning, a model learns to perform classification or regression tasks directly from data such as images, text, or sound. Deep learning models can achieve state-of-the-art accuracy, often exceeding human-level performance.

How Does Deep Learning Work? Deep learning models are based on neural network architectures. Inspired by the human brain, a neural network consists of interconnected nodes, or neurons, arranged in a layered structure that relates the inputs to the desired outputs. The layers between the input and output layers of a neural network are referred to as hidden layers. The term “deep” usually refers to the number of hidden layers in the neural network. Deep learning models can have hundreds or even thousands of hidden layers.
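To make the idea of layers concrete, here is a minimal sketch of a small feed-forward network, assuming PyTorch is available; the layer sizes are arbitrary:

```python
# A tiny feed-forward network: input layer -> two hidden layers -> output layer.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 16),   # input layer: 4 features -> 16 hidden units
    nn.ReLU(),
    nn.Linear(16, 16),  # second hidden layer
    nn.ReLU(),
    nn.Linear(16, 3),   # output layer: 3 classes
)

x = torch.randn(1, 4)   # one example with 4 input features
print(model(x).shape)   # torch.Size([1, 3])
```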

Neural Network Architecture (figure: a typical neural network architecture)

Contd.. Deep learning models are trained by using large sets of labeled data and can often learn features directly from the data without the need for manual feature extraction. While the first artificial neural network was theorized in 1958, deep learning requires substantial computing power that was not available until the 2000s. Now, researchers have access to computing resources that make it possible to build and train networks with hundreds of connections and neurons. High-performance GPUs have a parallel architecture that is efficient for deep learning. When combined with clusters or cloud computing, this enables development teams to reduce training time for a deep learning network from weeks to hours or less.

Challenges in Learning
Overfitting: The model performs well on training data but poorly on new data.
Underfitting: The model fails to capture the underlying pattern of the data.
Data quality and quantity: Poor or insufficient data hampers learning.

Overfitting Overfitting is a common problem in machine learning where a model learns not only the underlying patterns in the training data but also the noise and random fluctuations. This makes the model perform well on the training data but poorly on unseen data (e.g., the test set or real-world data).
Characteristics of Overfitting
High Training Accuracy, Low Test Accuracy: the model performs exceptionally well on the training set but struggles to generalize to new data.
Complex Model: the model may have too many parameters or be too flexible, capturing unnecessary details.
Symptoms: an extremely close fit to the training data points; poor performance on validation or test data.

Over-fitting When a model performs very well for training data but has poor performance with test data (new data), it is known as overfitting. In this case, the machine learning model learns the details and noise in the training data such that it negatively affects the performance of the model on test data. Overfitting can happen due to low bias and high variance.

Reasons for Overfitting
Data used for training is not cleaned and contains noise (garbage values)
The model has high variance
The size of the training dataset is not sufficient
The model is too complex

Ways to Tackle Overfitting
Using K-fold cross-validation
Using regularization techniques such as Lasso and Ridge
Training the model with sufficient data
Adopting ensembling techniques
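As a rough illustration of the first two items, the sketch below combines Ridge (L2) regularization with 5-fold cross-validation on synthetic data (scikit-learn assumed; the data and the alpha value are illustrative):

```python
# Sketch: K-fold cross-validation with Ridge regularization to curb overfitting.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))                   # many features relative to samples
y = X[:, 0] * 3.0 + rng.normal(scale=0.5, size=100)

model = Ridge(alpha=1.0)                         # L2 penalty shrinks the weights
scores = cross_val_score(model, X, y, cv=5)      # 5-fold cross-validation (R^2 by default)
print("Per-fold R^2:", np.round(scores, 3))
print("Mean R^2    :", scores.mean())
```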

What is Underfitting? When a model has not learned the patterns in the training data well and is unable to generalize to new data, it is known as underfitting. An underfit model performs poorly even on the training data and produces unreliable predictions. Underfitting occurs due to high bias and low variance.

Reasons for Underfitting
Data used for training is not cleaned and contains noise (garbage values)
The model has high bias
The size of the training dataset is not sufficient
The model is too simple

Ways to Tackle Underfitting
Increase the number of features in the dataset
Increase model complexity
Reduce noise in the data
Increase the duration of training

What Is a Good Fit In Machine Learning? To find a good-fit model, track the performance of the machine learning model over time on the training data. As the algorithm learns, the error on the training data decreases, and so does the error on the test dataset. If you train the model for too long, it may learn unnecessary details and noise in the training set, leading to overfitting. To achieve a good fit, stop training at the point where the error on the test data starts to increase.

Estimator
An estimator is a function that uses data to estimate an unknown quantity. This quantity could be:
Model parameters: for example, the weights and biases in a neural network.
Statistical properties: such as the mean or variance of a population.
Predictions: the output of a model for new, unseen data.

Contd..
Bias: the systematic difference between the expected value of the estimator and the true value of the quantity being estimated.
Variance: the variability of the estimator across different samples of data.
Trade-off: there is often a trade-off between bias and variance. High-bias estimators tend to underfit the data, while high-variance estimators tend to overfit.
Consistency: an estimator is consistent if it converges to the true value as the sample size increases.

Common Estimators
Maximum Likelihood Estimation (MLE): finds the parameters that maximize the likelihood of observing the given data.
Least Squares Estimation: minimizes the sum of squared differences between the predicted values and the actual values.
Bayesian Estimation: incorporates prior knowledge about the parameters to obtain a posterior distribution.

Example: Linear Regression In linear regression, the goal is to find the best-fitting line that represents the relationship between two variables. The estimator in this case is the function that estimates the slope and intercept of the line. (figure: linear regression line fitted to data points)
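A minimal sketch of a least-squares estimator for the slope and intercept, assuming NumPy; the data are synthetic:

```python
# Sketch: least-squares estimator for the slope and intercept of a line (NumPy only).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(scale=1.0, size=x.size)   # noisy line: true slope 2.5, intercept 1.0

# np.polyfit with degree 1 returns the least-squares slope and intercept.
slope, intercept = np.polyfit(x, y, 1)
print("Estimated slope    :", round(slope, 3))
print("Estimated intercept:", round(intercept, 3))
```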

Key Considerations
Choosing the right estimator: the choice of estimator depends on the specific problem, the available data, and the desired properties of the estimator (e.g., bias, variance, computational cost).
Evaluating estimator performance: various metrics can be used to evaluate the performance of an estimator, such as mean squared error (MSE), accuracy, and F1-score.

Bias Bias is the error that arises because the model cannot fully represent the true relationship, so there is a systematic difference between the model's predicted values and the actual values. These differences between actual (or expected) values and predicted values are known as bias error, or error due to bias. Bias is a systematic error that occurs due to wrong assumptions in the machine learning process. Let Y be the true value of a parameter, and let Y^ be an estimator of Y based on a sample of data. Then the bias of the estimator Y^ is given by Bias(Y^) = E(Y^) − Y, where E(Y^) is the expected value of the estimator Y^. Bias measures how well the model fits the data.

Cont.. Low Bias: a low bias value means fewer assumptions are made when building the target function. In this case, the model will closely match the training dataset. High Bias: a high bias value means more assumptions are made when building the target function. In this case, the model will not match the training dataset closely.

Ways to Reduce High Bias in Machine Learning
Use a more complex model: one of the main reasons for high bias is an overly simplified model that cannot capture the complexity of the data. In such cases we can make the model more complex, for example by increasing the number of hidden layers in a deep neural network, or by using a more complex model such as polynomial regression for non-linear datasets, a CNN for image processing, or an RNN for sequence learning.
Increase the number of features: adding more features to the training dataset increases the complexity of the model and improves its ability to capture the underlying patterns in the data.
Reduce regularization of the model: regularization techniques such as L1 or L2 regularization help prevent overfitting and improve the generalization ability of the model, but if the model has high bias, reducing the strength of regularization or removing it altogether can improve its performance.
Increase the size of the training data: increasing the size of the training data can help reduce bias by providing the model with more examples to learn from.

Variance Variance is the measure of spread in data from its mean position. In machine learning, variance is the amount by which the performance of a predictive model changes when it is trained on different subsets of the training data. More specifically, variance measures how sensitive the model is to a different subset of the training dataset, i.e. how much it adjusts to a new subset of the training data. Let Y be the actual values of the target variable and Y^ be the predicted values. Then the variance of a model can be measured as the expected value of the squared difference between the predicted values and the expected value of the predicted values: Variance = E[(Y^ − E[Y^])^2], where E[Y^] is the expected value of the predicted values, averaged over all the training data.
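A rough empirical illustration of this definition: train the same flexible model on many random training subsets and look at how much its predictions vary (the model and data are illustrative, scikit-learn assumed):

```python
# Sketch: measuring how much a model's predictions vary across training subsets
# (an empirical view of variance; dataset and model are illustrative).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x_test = np.linspace(0, 1, 20).reshape(-1, 1)             # fixed test inputs
preds = []

for _ in range(50):                                        # 50 different training subsets
    x_train = rng.uniform(0, 1, (30, 1))
    y_train = np.sin(4 * x_train[:, 0]) + rng.normal(scale=0.3, size=30)
    model = DecisionTreeRegressor().fit(x_train, y_train)  # flexible, high-variance model
    preds.append(model.predict(x_test))

preds = np.array(preds)                                    # shape: (50 runs, 20 test points)
print("Average prediction variance:", preds.var(axis=0).mean())
```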

Contd.. Variance errors are either low-variance or high-variance. Low variance: the model is less sensitive to changes in the training data and produces consistent estimates of the target function across different subsets of data from the same distribution; combined with high bias, this is the case of underfitting, when the model fails to generalize on both training and test data. High variance: the model is very sensitive to changes in the training data, so the estimate of the target function changes significantly when it is trained on different subsets of data from the same distribution; this is the case of overfitting, when the model performs well on the training data but poorly on new, unseen test data because it fits the training data too closely.

Ways to Reduce Variance in Machine Learning
Cross-validation: by splitting the data into training and testing sets multiple times, cross-validation can help identify whether a model is overfitting or underfitting and can be used to tune hyperparameters to reduce variance.
Feature selection: choosing only the relevant features decreases the model's complexity and can reduce the variance error.
Regularization: L1 or L2 regularization can be used to reduce variance in machine learning models.
Ensemble methods: combining multiple models improves generalization performance. Bagging, boosting, and stacking are common ensemble methods that can help reduce variance.
Simplifying the model: reducing the complexity of the model, such as decreasing the number of parameters or layers in a neural network, can also help reduce variance and improve generalization performance.
Early stopping: a technique used to prevent overfitting by stopping the training of a deep learning model when performance on the validation set stops improving.

Different Combinations of Bias-Variance
There can be four combinations of bias and variance:
High Bias, Low Variance: a model with high bias and low variance is said to be underfitting.
High Variance, Low Bias: a model with high variance and low bias is said to be overfitting.
High Bias, High Variance: the model is unable to capture the underlying patterns in the data (high bias) and is also too sensitive to changes in the training data (high variance). As a result, it produces inconsistent and inaccurate predictions on average.
Low Bias, Low Variance: the model captures the underlying patterns in the data (low bias) and is not too sensitive to changes in the training data (low variance). This is the ideal scenario for a machine learning model, as it generalizes well to new, unseen data and produces consistent, accurate predictions, but it is difficult to achieve in practice.

Maximum Likelihood Estimation (MLE)
Maximum Likelihood Estimation (MLE) is a powerful statistical method used to determine the parameters of a model that maximize the likelihood of observing the given data. In machine learning, MLE is employed to find the optimal values for model parameters, such as weights and biases in neural networks or coefficients in regression models.
Likelihood Function: represents the probability of observing the given data, given a specific set of model parameters.
Parameter Estimation: MLE aims to find the values of the model parameters that maximize the likelihood function.
Optimization Algorithms: various optimization algorithms, such as gradient descent, are used to iteratively update the parameter values to maximize the likelihood.

Steps Involved in MLE
Define the Likelihood Function: the likelihood function is typically expressed as the product of the probabilities of individual data points, assuming they are independent.
Take the Log-Likelihood: to simplify calculations and improve numerical stability, the log-likelihood function is often used.
Optimize the Log-Likelihood: optimization algorithms are employed to find the parameter values that maximize the log-likelihood.
Evaluate the Model: the estimated parameters are used to evaluate the model's performance on new, unseen data.

Maximum Likelihood Estimation Method (figure: the role the likelihood value plays in determining the optimum PDF curve) The likelihood L is the joint probability of each X_i taking a specific value x_i under a particular PDF, L = P(X_1 = x_1) · P(X_2 = x_2) · ... · P(X_n = x_n). Since we are looking for the maximum likelihood value, we differentiate the likelihood function with respect to the parameter P and set the derivative to zero: ∂L/∂P = 0.

This way, we can obtain the PDF curve that has the maximum likelihood of fitting the random sample data. But, if you observe carefully, differentiating L with respect to P is not an easy task, because all the probabilities in the likelihood function appear as a product, and the calculation becomes computationally expensive. To solve this, we take the log of the likelihood function L.
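A minimal numerical sketch of this recipe, assuming SciPy is available: the negative log-likelihood of a Gaussian is minimized to recover the mean and standard deviation (the data and optimizer choice are illustrative):

```python
# Sketch: maximizing a log-likelihood numerically (Gaussian mean and std; illustrative data).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.5, size=200)        # sample with "unknown" true parameters

def neg_log_likelihood(params):
    mu, sigma = params
    if sigma <= 0:                                     # keep the scale parameter valid
        return np.inf
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=[0.0, 1.0], method="Nelder-Mead")
print("MLE estimates (mu, sigma):", np.round(result.x, 3))
```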

Applications of MLE in Machine Learning
Training Neural Networks: MLE is a fundamental technique used to train neural networks; the weights and biases of the network are adjusted to maximize the likelihood of the training data.
Logistic Regression: MLE is used to estimate the coefficients of a logistic regression model, which predicts the probability of a binary outcome.
Gaussian Mixture Models: MLE is employed to estimate the parameters of Gaussian mixture models, which represent data as a mixture of Gaussian distributions.
Hidden Markov Models: MLE is used to estimate the transition probabilities and emission probabilities of hidden Markov models, which model sequential data.

Bayesian Statistics
Bayesian statistics is a powerful framework for statistical inference that incorporates prior knowledge or beliefs about a phenomenon into the analysis. It is based on Bayes' theorem, which provides a way to update our beliefs about a hypothesis or parameter in light of new evidence.
Prior Probability: represents our initial belief or knowledge about a parameter or hypothesis before observing any data. It is often subjective and based on past experience, expert opinion, or common sense.
Likelihood: quantifies the probability of observing the data given a specific value of the parameter. It is derived from the statistical model assumed for the data.
Posterior Probability: represents our updated belief about the parameter or hypothesis after considering both the prior probability and the likelihood of the observed data. It is calculated using Bayes' theorem.

Bayes’ Theorem is used to determine the conditional probability of an event. It finds the probability of an event based on prior knowledge of conditions that might be related to that event. Bayes’ theorem states: “The conditional probability of an event A, given the occurrence of another event B, is equal to the product of the probability of B given A and the probability of A, divided by the probability of event B,” i.e.
P(A|B) = P(B|A) P(A) / P(B)
where
P(A) and P(B) are the probabilities of events A and B,
P(A|B) is the probability of event A when event B happens,
P(B|A) is the probability of event B when A happens.

Applications of Bayesian Statistics
Machine Learning: Bayesian methods are widely used in machine learning for tasks such as classification, regression, and dimensionality reduction.
Medical Diagnosis: Bayesian inference is used to update the probability of a disease given the results of medical tests.
Finance: Bayesian methods are employed in risk assessment, portfolio management, and option pricing.
Natural Language Processing: Bayesian techniques are used for tasks such as text classification, spam filtering, and machine translation.

Advantages of Bayesian Statistics
Incorporates Prior Knowledge: Bayesian methods allow us to incorporate prior knowledge or beliefs into the analysis, which can lead to more accurate and robust inferences.
Flexibility: Bayesian methods can handle a wide range of statistical models and data types.
Interpretability: Bayesian inference provides a natural way to quantify uncertainty and make probabilistic predictions.

Example: Coin Toss
Suppose we have a coin and we want to estimate the probability of getting heads (θ). We toss the coin 10 times and observe 7 heads and 3 tails. Using Bayesian inference, we can update our prior belief about θ based on the observed data.
Prior: assume a uniform prior distribution for θ, meaning that all values between 0 and 1 are equally likely.
Likelihood: the likelihood of observing 7 heads and 3 tails given θ is given by the binomial distribution.
Posterior: using Bayes' theorem, we can calculate the posterior distribution for θ. This distribution will be centered around 0.7, reflecting the observed data, but it will also be influenced by the prior distribution.
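A small sketch of this example, assuming SciPy is available: with the uniform Beta(1, 1) prior and 7 heads out of 10 tosses, conjugacy gives a Beta(8, 4) posterior for θ.

```python
# Sketch of the coin-toss example: uniform prior Beta(1, 1) + 7 heads, 3 tails
# gives a Beta(8, 4) posterior for theta.
from scipy.stats import beta

heads, tails = 7, 3
prior_a, prior_b = 1, 1                     # Beta(1, 1) is the uniform prior
posterior = beta(prior_a + heads, prior_b + tails)

print("Posterior mean:", round(posterior.mean(), 3))                  # ~0.667
print("Posterior mode:", round((prior_a + heads - 1) /                # 0.7, matching the data
                                (prior_a + heads + prior_b + tails - 2), 3))
print("95% credible interval:", [round(v, 3) for v in posterior.interval(0.95)])
```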

Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent (SGD) is a powerful optimization algorithm commonly used in machine learning to train models efficiently, especially with large datasets. It is a variant of the gradient descent algorithm that processes training data in small batches or as individual data points instead of the entire dataset at once.
Gradient Descent: an iterative optimization algorithm that aims to minimize a function (often the loss function in machine learning) by moving in the direction of steepest descent.
Stochastic Approximation: SGD approximates the true gradient using a single data point or a small batch of data. This introduces randomness into the optimization process, hence the term "stochastic."

SGD Working
Initialization: start with an initial set of model parameters (e.g., weights and biases in a neural network).
Data Selection: randomly select a single data point or a small batch of data from the training set.
Gradient Calculation: compute the gradient of the loss function with respect to the current model parameters, evaluated on the selected data.
Parameter Update: update the model parameters by moving them in the opposite direction of the calculated gradient, scaled by a learning rate.
Iteration: repeat steps 2-4 for multiple iterations until the model converges or reaches a predefined stopping criterion.
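A minimal from-scratch sketch of these steps for a one-variable linear regression (NumPy only; the data, learning rate, and epoch count are illustrative):

```python
# Sketch: plain SGD for a 1-D linear regression, following the steps above (NumPy only).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = 4.0 * x + 2.0 + rng.normal(scale=0.1, size=200)   # data from y = 4x + 2 plus noise

w, b = 0.0, 0.0                                       # step 1: initialize parameters
lr = 0.1                                              # learning rate

for epoch in range(20):                               # step 5: iterate over many passes
    for i in rng.permutation(len(x)):                 # step 2: visit one data point at a time
        error = (w * x[i] + b) - y[i]
        grad_w = 2 * error * x[i]                     # step 3: gradient of squared error w.r.t. w
        grad_b = 2 * error                            #         ... and w.r.t. b
        w -= lr * grad_w                              # step 4: move against the gradient
        b -= lr * grad_b

print("Learned w, b:", round(w, 2), round(b, 2))      # should be close to 4.0 and 2.0
```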

Advantages & Disadvantages
Advantages of SGD:
Efficiency: SGD can be significantly faster than traditional gradient descent, especially for large datasets, as it processes data in smaller chunks.
Online Learning: SGD can be used in online learning scenarios where data arrives sequentially.
Simplicity: the algorithm is relatively simple to implement and understand.
Disadvantages of SGD:
Noisy Updates: the randomness introduced by using small batches or single data points can make the optimization process noisier and less stable.
Learning Rate Sensitivity: the choice of learning rate can significantly impact the convergence and performance of SGD.
Potential for Oscillations: SGD can sometimes oscillate around the minimum of the loss function due to the noisy updates.

Variants of SGD
Mini-batch SGD: uses small batches of data instead of single data points for each update, which can improve stability and reduce noise.
Momentum SGD: incorporates momentum to smooth out the updates and accelerate convergence.
Adam: combines the advantages of adaptive learning rate methods (like AdaGrad) with the momentum technique.
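As a rough sketch of how a momentum update differs from the plain SGD step above (the loss function, learning rate, and momentum coefficient here are illustrative, not from the slides):

```python
# Sketch: the momentum variant of an SGD parameter update (values are illustrative).
import numpy as np

lr = 0.1            # learning rate
beta = 0.9          # momentum coefficient
velocity = np.zeros(3)
params = np.array([0.5, -0.2, 0.1])

def grad(p):
    # Hypothetical gradient of the simple quadratic loss ||p||^2 (stands in for a real model).
    return 2 * p

for step in range(100):
    velocity = beta * velocity + grad(params)   # accumulate an exponentially decaying average
    params -= lr * velocity                     # move using the smoothed direction

print("Parameters after momentum SGD:", np.round(params, 4))
```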