FDP and Machine Learning for faculty .pptx



See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/362814047. Mathematics and Statistics Behind Machine Learning. Presentation, August 2022. DOI: 10.13140/RG.2.2.15915.92969. Author: R. C. Mittal, Jaypee Institute of Information Technology (164 publications, 4,039 citations). All content following this page was uploaded by R. C. Mittal on 20 August 2022. The user has requested enhancement of the downloaded file.

Mathematics and Statistics Behind Machine Learning

R. C. Mittal
Department of Mathematics, Jaypee Institute of Information Technology, Sector 62, NOIDA (U.P.)

8/20/2022 FDP On Machine Learning 1

Contents

Introduction
Artificial Intelligence
Machine Learning
Statistics and Probability in Machine Learning
Linear Algebra in Machine Learning
Calculus in Machine Learning
Optimization in Machine Learning
Different Machine Learning Models
References

Introduction


Artificial Intelligence

Intelligence can be defined as the ability to learn new skills and to improve them with experience. Artificial intelligence is not real intelligence: the ability has been acquired through the intelligence of someone else. The term "Artificial Intelligence" was first coined in 1955 by John McCarthy to describe the science and engineering of making intelligent machines.

Some History of the Development of AI

Turing Test: the great mathematician and computer scientist Alan Turing proposed a test: if a machine can mimic human thinking, it has intelligence.
Unimate: the first industrial robot, developed in 1961.
Deep Blue: in 1997, a chess-playing computer from IBM defeated the then world champion Garry Kasparov.

Eugene: a chatbot that passed the Turing Test in 2014, showing human-like behavior.
AlphaGo: Google's AI AlphaGo beat world champion Ke Jie in the board game Go. This shows that machines can emulate humans, learn from experience, and take decisions.


Machine Learning

Machine learning gives computers the ability to learn without being explicitly programmed for the task at hand.


Statistics and Probability in Machine Learning

Mean or Expectation Value: Let X be a random variable with n observations x_1, x_2, ..., x_n. Then the mean value of X is given by

    mean(X) = (x_1 + x_2 + ... + x_n) / n

The mean or expectation value is a measure of central tendency.

Variance and Standard Deviation: Let X be a random variable with N observations. Then the variance of X is given by

    Var(X) = (1/N) * sum over i of (x_i − mean(X))^2

The standard deviation is the square root of the variance and is a measure of uncertainty or volatility.
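As a quick illustration, these statistics can be computed directly from their definitions; the sketch below uses a small made-up list of observations:

```python
import math

def mean(xs):
    # Mean: sum of the observations divided by their count.
    return sum(xs) / len(xs)

def variance(xs):
    # Population variance: average squared deviation from the mean.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def std_dev(xs):
    # Standard deviation: square root of the variance.
    return math.sqrt(variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data))      # 5.0
print(variance(data))  # 4.0
print(std_dev(data))   # 2.0
```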

Central Limit Theorem

The Central Limit Theorem (CLT) states that the sample mean of samples drawn from a probability distribution is itself a random variable, with mean given by the population mean and standard deviation given by the population standard deviation divided by the square root of N, where N is the sample size.

Let mu be the population mean and sigma the population standard deviation. If we draw a small sample of size N from the population, then according to the CLT the mean of the sample mean is

    mean(X̄) = mu

and the standard deviation of the sample mean is

    sigma(X̄) = sigma / sqrt(N)
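The CLT can be checked empirically; the sketch below draws many samples of size N from a uniform population (an arbitrary choice for illustration) and compares the spread of the sample means with sigma / sqrt(N):

```python
import random
import statistics

random.seed(0)

# Population: uniform on [0, 1); mean 0.5, std dev 1/sqrt(12) ≈ 0.2887.
N = 100            # sample size
num_samples = 2000

# Draw many samples of size N and record each sample mean.
sample_means = [statistics.mean(random.random() for _ in range(N))
                for _ in range(num_samples)]

# By the CLT the sample means cluster around the population mean 0.5,
# with spread close to sigma / sqrt(N) ≈ 0.2887 / 10 ≈ 0.0289.
print(statistics.mean(sample_means))
print(statistics.stdev(sample_means))
```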

Correlation and Covariance

Correlation and covariance are measures of co-movement in a dataset. To quantify the degree of correlation between two features X and Y, we can compute the covariance using the equation

    Cov(X, Y) = (1/N) * sum over i of (x_i − mean(X)) * (y_i − mean(Y))

and collect these values for every pair of features into the covariance matrix.
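A minimal illustration using NumPy's built-in routines; the two toy features below are made up so that they move together:

```python
import numpy as np

# Two features: x, and y which moves with x (positive co-movement).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 6.0, 7.9, 10.0])

# Covariance matrix: diagonal entries are variances,
# off-diagonal entries are covariances between the features.
cov = np.cov(x, y)        # 2x2 sample covariance matrix (ddof=1)
corr = np.corrcoef(x, y)  # correlation matrix, entries in [-1, 1]

print(cov)
print(corr[0, 1])  # close to 1: strong positive correlation
```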

Probability Distribution

Even though we often assume that variables or features in a dataset are normally distributed, it is important to plot the probability distribution to visualize how a feature is actually distributed. For example, using the heights in a dataset, we can compute the probability distribution of all heights, as shown in the figure on the next slide.


Bayes' Theorem

Bayes' theorem plays an important role in binary classification problems, where it is used to interpret the output of a binary classification algorithm. Bayes' theorem states that the conditional probabilities of two events A and B are related by

    P(A | B) = P(B | A) * P(A) / P(B)
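A small worked example of the theorem; the probabilities below are invented purely for illustration:

```python
def bayes(p_b_given_a, p_a, p_b):
    # Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
    return p_b_given_a * p_a / p_b

# Hypothetical spam-filter numbers (made up for illustration):
# P(spam) = 0.2, P(word "free" appears) = 0.11,
# P("free" appears | spam) = 0.4.
p_spam_given_free = bayes(0.4, 0.2, 0.11)
print(p_spam_given_free)  # 0.08 / 0.11 ≈ 0.727
```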

Linear Algebra in Machine Learning


Calculus in Machine Learning

Optimization in Machine Learning


Machine Learning Models

In machine learning, an algorithm learns with experience and modifies itself. This is done by feeding the obtained output back as input. Mathematically, the task is to predict the value of a dependent variable for given values of the independent variables. This is accomplished with the help of different kinds of predictive models, each used for a particular type of situation.

Linear Regression

Logistic Regression

Logistic regression is used when the outcome is categorical. The model is linear, but it is used for classification.

Logistic Function (Sigmoid Function):

    f(x) = 1 / (1 + e^(−x))

It maps real values to values between 0 and 1, indicating probabilities. We assume a linear regression on the log-odds:

    log[y / (1 − y)] = b_0 + b_1 x_1 + ... + b_n x_n

Here y is the predicted probability,

    y = 1 / (1 + e^(−(b_0 + b_1 x_1 + ... + b_n x_n)))


Support Vector Machine

A simple linear SVM classifier works by drawing a straight line between two classes: all of the data points on one side of the line are assigned to one category, and the data points on the other side of the line are put into a different category.
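The decision rule of such a linear classifier can be sketched as follows; the weights below are chosen by hand to illustrate the idea of a separating line, and no actual SVM training is performed:

```python
# A linear classifier's decision rule: the separating line (hyperplane)
# is w . x + b = 0; the side a point falls on decides its class.
# These weights are hand-picked for illustration, not learned by an
# SVM solver, which would choose the line with the maximum margin.
w = [1.0, -1.0]
b = 0.0

def classify(point):
    # Sign of the score w . x + b decides the category.
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return +1 if score >= 0 else -1

print(classify([3.0, 1.0]))  # +1: below the line y = x
print(classify([1.0, 3.0]))  # -1: above the line y = x
```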

Decision Tree

A Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome. A decision tree therefore has two kinds of nodes: decision nodes and leaf nodes. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.


K Nearest Neighbor

"Birds of a feather flock together." The KNN algorithm assumes that similar things exist in close proximity; in other words, similar things are near each other. It uses a norm or metric between two items to decide their closeness; commonly used metrics are the Euclidean, Manhattan, and Hamming distances. To select the K that is right for your data, run the KNN algorithm several times with different values of K and choose the K that reduces the number of errors while maintaining the algorithm's ability to make accurate predictions on data it has not seen before.

The K-Nearest Neighbor algorithm works as follows:
Step 1: Select the number K of neighbors.
Step 2: Calculate the Euclidean distance between the new data point and the training data points.
Step 3: Take the K nearest neighbors according to the calculated Euclidean distance.
Step 4: Among these K neighbors, count the number of data points in each category.
Step 5: Assign the new data point to the category with the largest number of neighbors.
New data points are classified in this manner.
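The steps above can be sketched in a few lines; the tiny training set below is made up for illustration:

```python
import math
from collections import Counter

def euclidean(p, q):
    # Step 2: Euclidean distance between two points.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(train, new_point, k):
    # train is a list of (point, label) pairs.
    # Step 3: sort by distance and keep the k nearest neighbors.
    neighbors = sorted(train, key=lambda pl: euclidean(pl[0], new_point))[:k]
    # Steps 4-5: count labels among the neighbors; the majority wins.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]
print(knn_classify(train, (2, 2), k=3))  # A
print(knn_classify(train, (6, 5), k=3))  # B
```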


K Means

K-Means Clustering is an unsupervised machine learning algorithm which groups an unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process: if K = 2 there will be two clusters, for K = 3 there will be three clusters, and so on. It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a way that each data point belongs to only one group of points with similar properties. It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of the algorithm is to minimize the sum of distances between the data points and the centroids of their clusters.
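A minimal sketch of the two alternating steps (assignment and centroid update), using a made-up set of points and hand-picked starting centroids:

```python
import math

def kmeans(points, centroids, iters=10):
    # Minimal K-Means sketch: alternate assignment and update steps.
    for _ in range(iters):
        # Assignment: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)),
                    key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # Update: each centroid moves to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = tuple(sum(v) / len(cl) for v in zip(*cl))
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)  # roughly (1.33, 1.33) and (8.33, 8.33)
```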


Principal Component Analysis

Principal Component Analysis, or PCA, is a dimensionality-reduction method that is often used to reduce the dimensionality of large datasets by transforming a large set of variables into a smaller one that still contains most of the information in the large set. The following steps are used in PCA:

1. Normalization of the data: compute the mean mu and standard deviation sigma of each variable, then standardize each value using

    z = (x − mu) / sigma

2. Computation of the covariance matrix: the aim of this step is to understand how the variables of the input dataset vary from the mean with respect to each other, or in other words, to see if there is any relationship between them. Sometimes variables are highly correlated in such a way that they contain redundant information. In order to identify these correlations, we compute the covariance matrix.

The covariance matrix is symmetric, with Cov(x, x) = Var(x).

3. Computation of eigenvalues and eigenvectors: now compute the eigenvalues and corresponding eigenvectors of the covariance matrix. You will find that some eigenvalues are very small in magnitude; precisely these eigenvalues can be neglected, thus reducing the dimension. Principal components are new variables that are constructed as linear combinations, or mixtures, of the initial variables. These combinations are made in such a way that the new variables (the principal components) are uncorrelated and most of the information within the initial variables is squeezed, or compressed, into the first components.
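The three steps can be sketched with NumPy; the toy data below is generated so that one feature is nearly redundant, which shows up as a near-zero eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples, 3 features; the third is nearly a copy of
# the first, so it carries little extra information.
a = rng.normal(size=100)
b = rng.normal(size=100)
X = np.column_stack([a, b, a + 0.01 * rng.normal(size=100)])

# Step 1: standardize each column (zero mean, unit variance).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized data.
C = np.cov(Z, rowvar=False)

# Step 3: eigenvalues/eigenvectors; eigh returns them in ascending order.
eigvals, eigvecs = np.linalg.eigh(C)

print(eigvals)  # one eigenvalue is near zero: that direction is redundant

# Keep the two components with the largest eigenvalues.
components = eigvecs[:, -2:]
X_reduced = Z @ components  # data projected down to 2 dimensions
print(X_reduced.shape)      # (100, 2)
```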

Naïve Bayes

The Naïve Bayes algorithm is a supervised learning algorithm, based on Bayes' theorem and used for solving classification problems. It is mainly used in text classification with high-dimensional training datasets. It is a probabilistic classifier, which means it predicts on the basis of the probability of an object. The name Naïve Bayes is made up of two words, Naïve and Bayes, which can be described as follows:
Naïve: it is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple; each feature individually contributes to identifying it as an apple, without depending on the others.
Bayes: it is called Bayes because it depends on the principle of Bayes' theorem.

The working of the Naïve Bayes classifier can be understood with the following example. Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play on a particular day according to the weather conditions. To solve this problem, we follow these steps:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
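A minimal sketch of these steps on a made-up weather table (the rows and counts below are invented for illustration):

```python
from collections import Counter

# Hypothetical weather dataset: each row is (outlook, play?).
data = [("Sunny", "No"), ("Sunny", "No"), ("Sunny", "Yes"),
        ("Overcast", "Yes"), ("Overcast", "Yes"),
        ("Rainy", "Yes"), ("Rainy", "Yes"), ("Rainy", "No"),
        ("Overcast", "Yes"), ("Rainy", "Yes")]

# Steps 1-2: frequency tables for the labels and the (feature, label) pairs.
label_counts = Counter(play for _, play in data)  # counts for P(Play)
joint_counts = Counter(data)                      # counts of (outlook, play)

def posterior(outlook, play):
    # Step 3, Bayes: P(Play | Outlook) ∝ P(Outlook | Play) * P(Play).
    p_play = label_counts[play] / len(data)
    p_outlook_given_play = joint_counts[(outlook, play)] / label_counts[play]
    return p_outlook_given_play * p_play

# Compare posteriors: the shared P(Outlook) denominator cancels out.
for play in ("Yes", "No"):
    print(play, posterior("Sunny", play))
```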

Artificial Neural Network

A neural network (NN) is a powerful tool for approximating bounded continuous functions in machine learning. An artificial neural network (ANN) is a computing system inspired by the complex biological neural networks constituting animal brains. The original purpose of constructing an ANN was to mimic the problem-solving process that occurs in the human brain. NNs consist of artificial neurons (simplified biological neurons) and connections, also called edges, between artificial neurons (simplified synapses).

The output of each neuron is a non-linear bounded function of the weighted sum of its inputs. The neurons are usually arranged into layers, where connections are established between layers.
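A single artificial neuron can be sketched as follows; tanh is used here as one possible bounded non-linearity, and the weights are illustrative rather than trained:

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, passed through a bounded non-linearity
    # (here tanh, which squashes the sum into (-1, 1)).
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(z)

# Illustrative weights, not trained values.
out = neuron(inputs=[0.5, -1.0], weights=[0.8, 0.2], bias=0.1)
print(out)  # tanh(0.8*0.5 + 0.2*(-1.0) + 0.1) = tanh(0.3) ≈ 0.291
```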


References

J. M. Phillips, "Mathematical Foundations for Data Analysis" (2018).
R. C. Mittal, "Some Applications of Linear Algebra in Computer Science", https://www.researchgate.net/publication/352180145_Some_ApplicAtionS_of_lineAr_AlgebrA_in_computer_Science
G. Strang, "Introduction to Linear Algebra", MIT Publication (2016).

R. C. Mittal, "Essentials of Mathematics in Data Science", https://www.researchgate.net/publication/359730625_Essentials_of_Mathematics_in_Data_Science
V. S. Joshi, "Artificial Intelligence and Machine Learning: A Primer", e-book.
CBSE and IBM, Notes on Artificial Intelligence.
Michael U. Gutmann, "Pen & Paper Exercises in Machine Learning", e-book.

Statistics and Probability for Data Science, available at www.DataCleanic.ml.
Source of some of the pictorial slides: the Machine Learning group on www.linkedin.com (followed hashtags).