Introduction and Basics of Machine Learning.pptx


About This Presentation

This presentation covers the basic concepts and techniques in Machine Learning.


Slide Content

Machine Learning Techniques
Dr. M. Lilly Florence, Professor, Adhiyamaan College of Engineering (Autonomous), Hosur, Tamil Nadu

Content
Learning
Types of Machine Learning
Supervised Learning
The Brain and the Neuron
Design a Learning System
Perspectives and Issues in Machine Learning
Concept Learning as Task
Concept Learning as Search
Finding a Maximally Specific Hypothesis
Version Spaces and the Candidate Elimination Algorithm
Linear Discriminants
Perceptron
Linear Separability
Linear Regression

Learning
It is said that the term machine learning was first coined by Arthur Lee Samuel, a pioneer in the AI field, in 1959. "Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed." — Arthur L. Samuel, AI pioneer, 1959. "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." — Tom Mitchell, Machine Learning Professor at Carnegie Mellon University. To illustrate this definition with an example, consider the problem of recognizing handwritten digits:
Task T: classifying handwritten digits from images
Performance measure P: percentage of digits classified correctly
Training experience E: a dataset of digit images with their given classifications

Why "Learn"?
Machine learning is programming computers to optimize a performance criterion using example data or past experience. There is no need to "learn" to calculate payroll. Learning is used when:
Human expertise does not exist (navigating on Mars)
Humans are unable to explain their expertise (speech recognition)
The solution changes over time (routing on a computer network)
The solution needs to be adapted to particular cases (user biometrics)

Basic components of the learning process
There are four components, namely data storage, abstraction, generalization, and evaluation.
1. Data storage - Facilities for storing and retrieving huge amounts of data are an important component of the learning process.
2. Abstraction - Abstraction is the process of extracting knowledge about stored data. This involves creating general concepts about the data as a whole. The creation of knowledge involves the application of known models and the creation of new models. The process of fitting a model to a dataset is known as training. When the model has been trained, the data is transformed into an abstract form that summarizes the original information.
3. Generalization - The term generalization describes the process of turning the knowledge about stored data into a form that can be utilized for future action.
4. Evaluation - Evaluation is the process of giving feedback to the user to measure the utility of the learned knowledge.

Learning Model
Learning models can be divided into three categories:
Using a logical expression (Logical models)
Using the geometry of the instance space (Geometric models)
Using probability to classify the instance space (Probabilistic models)

Applications of Machine Learning
Email spam detection • Face detection and matching (e.g., iPhone X) • Web search (e.g., DuckDuckGo, Bing, Google) • Sports predictions • Post office (e.g., sorting letters by zip codes) • ATMs (e.g., reading checks) • Credit card fraud detection • Stock predictions • Smart assistants (Apple Siri, Amazon Alexa, . . . ) • Product recommendations (e.g., Netflix, Amazon) • Self-driving cars (e.g., Uber, Tesla) • Language translation (Google Translate) • Sentiment analysis • Drug design • Medical diagnosis

Types of Machine Learning
The broad categories of machine learning are: Supervised learning, Unsupervised learning, Reinforcement learning, and Evolutionary learning.

Types of Machine Learning

Supervised learning
Supervised learning is the subcategory of machine learning that focuses on learning a classification or regression model, that is, learning from labeled training data. Its two main task types are classification and regression.
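To make this concrete, here is a minimal supervised-learning sketch in Python. It assumes scikit-learn is available; the Iris dataset and logistic regression model are illustrative choices of mine, not taken from the slides.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Labeled training data: feature vectors X and class labels y.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a classification model to the labeled examples.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Evaluate on held-out labeled data.
print("test accuracy:", clf.score(X_test, y_test))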

The Brain and the Neuron
Brain Nerve Cell - Neuron. Each neuron is typically connected to thousands of other neurons, so it is estimated that there are about 100 trillion (10^14) synapses within the brain. After firing, a neuron must wait for some time to recover its energy (the refractory period) before it can fire again. Hebb's Rule - the rule says that the changes in the strength of synaptic connections are proportional to the correlation in the firing of the two connecting neurons. So if two neurons consistently fire simultaneously, then any connection between them will change in strength, becoming stronger. There are other names for this idea that synaptic connections between neurons and assemblies of neurons can be formed when they fire together and can become stronger. It is also known as long-term potentiation and neural plasticity, and it does appear to have correlates in real brains.
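To make Hebb's rule concrete, here is a minimal Python illustration. It assumes the common formulation Δw = η · x · y (the weight change is proportional to the product of pre- and post-synaptic activity); the numbers and variable names are my own, not the slide's.

eta = 0.1   # learning rate
w = 0.0     # strength of the synaptic connection between two neurons

# (pre, post) firing activity observed over five time steps (illustrative data).
firing_pairs = [(1, 1), (1, 1), (0, 1), (1, 0), (1, 1)]
for x, y in firing_pairs:
    w += eta * x * y   # the connection strengthens only when both neurons fire together

print(round(w, 2))     # 0.3: the weight grew once for each of the three co-firing events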

The Brain and the Neuron
McCulloch and Pitts Neurons. Studying neurons isn't actually that easy: you need to be able to extract a neuron from the brain and then keep it alive so that you can see how it reacts in controlled circumstances.
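Despite that difficulty, the McCulloch-Pitts model reduces the neuron to something we can compute with: binary inputs, weights, and a hard threshold. Here is a small Python sketch of that idea; the weights and threshold are illustrative choices (they make the unit compute a logical AND), not values from the slides.

def mcculloch_pitts(inputs, weights, threshold):
    # Weighted sum of the binary inputs.
    total = sum(w * x for w, x in zip(weights, inputs))
    # The neuron fires (outputs 1) only when the sum reaches the threshold.
    return 1 if total >= threshold else 0

# With weights (1, 1) and threshold 2, the neuron fires only when both inputs are 1.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", mcculloch_pitts((a, b), (1, 1), 2))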

Designing a Learning System
The design involves the following key choices:
1. Type of training experience - Direct/Indirect, Supervised/Unsupervised
2. Choosing the Target Function
3. Choosing a representation for the Target Function
4. Choosing a function approximation algorithm for the Target Function
5. The final Design

Designing a Learning System Real-world examples of machine learning problems include “ Is this cancer?”, “ What is the market value of this house?”, “ Which of these people are good friends with each other?”, “ Will this rocket engine explode on take off?”, “ Will this person like this movie?”, “ Who is this?”, “What did you say?”, and “ How do you fly this thing?” All of these problems are excellent targets for an ML project; in fact ML has been applied to each of them with great success.

PERSPECTIVES AND ISSUES IN MACHINE LEARNING
Issues in Machine Learning:
What algorithms exist for learning general target functions from specific training examples? In what settings will particular algorithms converge to the desired function, given sufficient training data? Which algorithms perform best for which types of problems and representations?
How much training data is sufficient? What general bounds can be found to relate the confidence in learned hypotheses to the amount of training experience and the character of the learner's hypothesis space?
When and how can prior knowledge held by the learner guide the process of generalizing from examples? Can prior knowledge be helpful even when it is only approximately correct?
What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy alter the complexity of the learning problem?
What is the best way to reduce the learning task to one or more function approximation problems? Put another way, what specific functions should the system attempt to learn? Can this process itself be automated?
How can the learner automatically alter its representation to improve its ability to represent and learn the target function?

EnjoySport examples

Concept Learning as Search
The goal of this search is to find the hypothesis that best fits the training examples. By selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses that the program can ever represent and therefore can ever learn. Consider, for example, the instances X and hypotheses H in the EnjoySport learning task. In viewing learning as a search problem, it is natural that our study of learning algorithms will examine the different strategies for searching the hypothesis space.

Concept Learning as Search
General-to-Specific Ordering of Hypotheses. To illustrate the general-to-specific ordering, consider the two hypotheses h1 = (Sunny, ?, ?, Strong, ?, ?) and h2 = (Sunny, ?, ?, ?, ?, ?). Now consider the sets of instances that are classified positive by h1 and by h2. Because h2 imposes fewer constraints on the instance, it classifies more instances as positive. In fact, any instance classified positive by h1 will also be classified positive by h2. Therefore, we say that h2 is more general than h1. First, for any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1.
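Here is a small Python sketch of these two ideas ("x satisfies h" and "more general than") for such conjunctive hypotheses; the instance x below is an illustrative EnjoySport-style example of my own choosing.

def satisfies(x, h):
    # h(x) = 1 iff every constraint in h is '?' or matches the attribute value in x.
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(hg, hs):
    # hg >= hs iff every instance that satisfies hs also satisfies hg.
    return all(g == '?' or g == s for g, s in zip(hg, hs))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
x  = ('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Same')

print(satisfies(x, h1), satisfies(x, h2))   # True True: x satisfies both hypotheses
print(more_general_or_equal(h2, h1))        # True: h2 is more general than h1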

Concept Learning as Search

Finding a Maximally Specific Hypothesis
Three main concepts: Concept Learning, General Hypothesis, Specific Hypothesis. A hypothesis h is a most specific hypothesis if it covers none of the negative examples and there is no other hypothesis h′ that covers no negative examples such that h is strictly more general than h′.

Finding a Maximally Specific Hypothesis
The Find-S algorithm finds the most specific hypothesis that fits all the positive examples. Find-S moves from the most specific hypothesis toward the most general hypothesis. Important representation:
? indicates that any value is acceptable for the attribute.
A single value (e.g., Cold) specifies that exactly that value is required for the attribute.
ϕ indicates that no value is acceptable.
The most general hypothesis is represented by: {?, ?, ?, ?, ?, ?}
The most specific hypothesis is represented by: {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}

Find-S Algorithm
Steps involved in Find-S:
Start with the most specific hypothesis: h = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
Take the next example; if it is negative, no changes are made to the hypothesis.
If the example is positive and the current hypothesis is too specific for it, generalize the current hypothesis just enough to cover it.
Keep repeating the above steps until all the training examples have been processed.
After all the training examples have been processed, we have the final hypothesis, which can be used to classify new examples.
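A minimal Python sketch of Find-S for this representation; the tiny four-attribute dataset below is invented purely for illustration and is not the table used in the walkthrough that follows.

def find_s(examples):
    n = len(examples[0][0])
    h = ['ϕ'] * n                      # start with the most specific hypothesis
    for x, label in examples:
        if label != 'Positive':        # negative examples are ignored
            continue
        for i, value in enumerate(x):
            if h[i] == 'ϕ':            # first positive example: copy its attribute values
                h[i] = value
            elif h[i] != value:        # mismatch: generalize that attribute to '?'
                h[i] = '?'
    return h

data = [
    (('Sunny', 'Warm', 'Normal', 'Strong'), 'Positive'),
    (('Sunny', 'Warm', 'High',   'Strong'), 'Positive'),
    (('Rainy', 'Cold', 'High',   'Strong'), 'Negative'),
]
print(find_s(data))   # ['Sunny', 'Warm', '?', 'Strong']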

First, we set the hypothesis to the most specific hypothesis. Hence, our hypothesis would be: h = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
Consider example 1: The data in example 1 is {GREEN, HARD, NO, WRINKLED}. We see that our initial hypothesis is more specific, and we have to generalize it for this example. Hence, the hypothesis becomes: h = {GREEN, HARD, NO, WRINKLED}
Consider example 2: Here we see that this example has a negative outcome. Hence we ignore this example and our hypothesis remains the same: h = {GREEN, HARD, NO, WRINKLED}

Consider example 3: Here we see that this example has a negative outcome. Hence we ignore this example and our hypothesis remains the same: h = {GREEN, HARD, NO, WRINKLED}
Consider example 4: The data in example 4 is {ORANGE, HARD, NO, WRINKLED}. We compare every attribute with the current hypothesis and, if any mismatch is found, we replace that particular attribute with the general case ("?"). After doing this, the hypothesis becomes: h = {?, HARD, NO, WRINKLED}
Consider example 5: The data in example 5 is {GREEN, SOFT, YES, SMOOTH}. We compare every attribute with the current hypothesis and, if any mismatch is found, we replace that particular attribute with the general case ("?"). After doing this, the hypothesis becomes: h = {?, ?, ?, ?}
Since we have reached a point where all the attributes in our hypothesis have the general condition, examples 6 and 7 would result in the same hypothesis with all general attributes: h = {?, ?, ?, ?}
Hence, for the given data the final hypothesis would be: Final Hypothesis: h = {?, ?, ?, ?}

Version Space
A version space is a hierarchical representation of knowledge that enables you to keep track of all the useful information supplied by a sequence of learning examples without remembering any of the examples. The version space method is a concept learning process accomplished by managing multiple models within a version space. Definition (Version space): A concept is complete if it covers all positive examples. A concept is consistent if it covers none of the negative examples. The version space is the set of all complete and consistent concepts. This set is convex and is fully defined by its least and most general elements.

Version Space
One way to represent the version space is simply to list all of its members. This leads to a simple learning algorithm, which we might call the LIST-THEN-ELIMINATE algorithm. The LIST-THEN-ELIMINATE algorithm first initializes the version space to contain all hypotheses in H, then eliminates any hypothesis found inconsistent with any training example. The version space of candidate hypotheses thus shrinks as more examples are observed, until ideally just one hypothesis remains that is consistent with all the observed examples.
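A small Python sketch of LIST-THEN-ELIMINATE over a deliberately tiny hypothesis space (conjunctions over two attributes, each constraint either '?' or a specific value); the attribute values and training examples are invented for illustration.

from itertools import product

def consistent(h, x, label):
    # h classifies x as positive iff every constraint is '?' or matches.
    covers = all(hv == '?' or hv == xv for hv, xv in zip(h, x))
    return covers == bool(label)

# 1. Initialize the version space to contain every hypothesis in H.
H = list(product(['Sunny', 'Rainy', '?'], ['Warm', 'Cold', '?']))

# 2. Eliminate any hypothesis inconsistent with any training example.
examples = [(('Sunny', 'Warm'), 1), (('Rainy', 'Cold'), 0)]
version_space = [h for h in H
                 if all(consistent(h, x, label) for x, label in examples)]

print(version_space)   # [('Sunny', 'Warm'), ('Sunny', '?'), ('?', 'Warm')]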

Version Space

Candidate Elimination Learning Algorithm

Problem 1: Learning the concept of "Japanese Economy Car"
Features: (Country of Origin, Manufacturer, Color, Decade, Type)

Origin   Manufacturer   Color   Decade   Type      Example Type
Japan    Honda          Blue    1980     Economy   Positive
Japan    Toyota         Green   1970     Sports    Negative
Japan    Toyota         Blue    1990     Economy   Positive
USA      Chrysler       Red     1980     Economy   Negative
Japan    Honda          White   1980     Economy   Positive

Solution:
1. Positive Example: (Japan, Honda, Blue, 1980, Economy)
Initialize G to a singleton set that includes everything: G = { (?, ?, ?, ?, ?) }
Initialize S to a singleton set that includes the first positive example: S = { (Japan, Honda, Blue, 1980, Economy) }
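The remaining examples are processed the same way: positive examples generalize S and prune G, while negative examples specialize G. A compact Python sketch of the whole algorithm, run on the table above, is given here; it assumes conjunctive hypotheses ('?' = any value) and a noise-free dataset whose first example is positive, and it is my own illustration rather than code from the slides.

def covers(h, x):
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = list(examples[0][0])        # specific boundary, seeded by the first positive example
    G = [['?'] * n]                 # general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]                   # drop members that miss a positive
            S = [s if s == xv else '?' for s, xv in zip(S, x)]   # minimally generalize S
        else:
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                for i in range(n):  # minimal specializations of g that stay more general than S
                    if g[i] == '?' and S[i] != '?' and S[i] != x[i]:
                        new_G.append(g[:i] + [S[i]] + g[i + 1:])
            # keep only the maximally general members
            G = [g for g in new_G
                 if not any(h != g and more_general_or_equal(h, g) for h in new_G)]
    return S, G

cars = [
    (('Japan', 'Honda',    'Blue',  '1980', 'Economy'), True),
    (('Japan', 'Toyota',   'Green', '1970', 'Sports'),  False),
    (('Japan', 'Toyota',   'Blue',  '1990', 'Economy'), True),
    (('USA',   'Chrysler', 'Red',   '1980', 'Economy'), False),
    (('Japan', 'Honda',    'White', '1980', 'Economy'), True),
]
S, G = candidate_elimination(cars)
print(S)   # ['Japan', '?', '?', '?', 'Economy']
print(G)   # [['Japan', '?', '?', '?', 'Economy']]

Both boundaries converge to (Japan, ?, ?, ?, Economy), i.e., the concept "Japanese Economy Car".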

Linear Discriminant Analysis
In 1936, Ronald A. Fisher formulated the Linear Discriminant for the first time and showed some practical uses as a classifier. It was described for a 2-class problem, and later generalized as 'Multi-class Linear Discriminant Analysis' or 'Multiple Discriminant Analysis' by C. R. Rao in 1948. Linear Discriminant Analysis is the most commonly used dimensionality reduction technique in supervised learning. Basically, it is a preprocessing step for pattern classification and machine learning applications. It projects the dataset onto a lower-dimensional space with good class separability, which reduces overfitting and computational cost.

Working of Linear Discriminant Analysis - Assumptions
Every feature (variable, dimension, or attribute) in the dataset has a Gaussian distribution, i.e., features have a bell-shaped curve.
Each feature has the same variance, with values varying around the mean by the same amount on average.
Each feature is assumed to be sampled randomly.
There should be no multicollinearity among the independent features; as correlations between independent features increase, the predictive power decreases.

LDA achieves this via a three-step process.
First step: compute the separability between the various classes, i.e., the distance between the means of the different classes; this is also known as the between-class variance.

Second step: compute the distance between the mean and the samples of each class; this is also known as the within-class variance.

Third step: construct the lower-dimensional space that maximizes the between-class variance and minimizes the within-class variance. With P denoting the projection onto the lower-dimensional space, this objective is known as Fisher's criterion.
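In symbols, a hedged sketch of the standard formulation (the notation S_B, S_W, N_c, μ_c, and μ is mine and may differ from the original slide): the between-class scatter, the within-class scatter, and Fisher's criterion for the projection P are

S_B = \sum_{c} N_c \,(\mu_c - \mu)(\mu_c - \mu)^{\mathsf T}, \qquad
S_W = \sum_{c} \sum_{x \in c} (x - \mu_c)(x - \mu_c)^{\mathsf T}, \qquad
P^{*} = \arg\max_{P} \frac{\lvert P^{\mathsf T} S_B P \rvert}{\lvert P^{\mathsf T} S_W P \rvert}

where N_c is the number of samples in class c, μ_c is the mean of class c, and μ is the overall mean.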

For example, LDA can be used for classification tasks such as speech recognition, microarray data classification, face recognition, image retrieval, bioinformatics, biometrics, and chemistry. A worked numerical example: https://people.revoledu.com/kardi/tutorial/LDA/Numerical%20Example.html

Perceptron
A perceptron is a single-layer neural network; a multi-layer perceptron is called a neural network. The perceptron is a linear (binary) classifier, and it is used in supervised learning.

The perceptron consists of 4 parts:
Input values, or one input layer
Weights and bias
Net sum
Activation function

The perceptron works in these simple steps:
a. All the inputs x are multiplied by their weights w; call each product k.
b. Add all the multiplied values together; call the result the weighted sum.
c. Apply the weighted sum to the appropriate activation function.

Why do we need Weights and Bias? Weights show the strength of a particular node. A bias value allows you to shift the activation function curve up or down.

Why do we need an Activation Function? In short, activation functions are used to map the input to the required range of values, such as (0, 1) or (-1, 1).
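Here is a minimal perceptron sketch in Python that ties these pieces together: inputs times weights plus a bias, passed through a step activation, trained with the perceptron learning rule. The AND-gate data, learning rate, and number of passes are illustrative choices of mine, not taken from the slides.

def step(z):
    # Activation function: map the weighted sum to 0 or 1.
    return 1 if z >= 0 else 0

def predict(x, w, b):
    # Weighted sum of the inputs plus the bias, then the activation.
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Perceptron learning rule: adjust weights and bias in the direction that reduces each error.
w, b, lr = [0.0, 0.0], 0.0, 1.0
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]   # logical AND, a linearly separable task

for _ in range(10):                      # a few passes over the training data
    for x, target in data:
        error = target - predict(x, w, b)
        w = [wi + lr * error * xi for wi, xi in zip(w, x)]
        b += lr * error

print([predict(x, w, b) for x, _ in data])   # [0, 0, 0, 1]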