This presentation covers the basic concepts and techniques in Machine Learning.
Machine Learning Techniques
Dr. M. Lilly Florence, Professor, Adhiyamaan College of Engineering (Autonomous), Hosur, Tamil Nadu
Content
- Learning
- Types of Machine Learning
- Supervised Learning
- The Brain and the Neuron
- Design a Learning System
- Perspectives and Issues in Machine Learning
- Concept Learning as Task
- Concept Learning as Search
- Finding a Maximally Specific Hypothesis
- Version Spaces and the Candidate Elimination Algorithm
- Linear Discriminants
- Perceptron
- Linear Separability
- Linear Regression
Learning
It is said that the term machine learning was first coined by Arthur Lee Samuel, a pioneer in the AI field, in 1959.
“Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.” — Arthur L. Samuel, AI pioneer, 1959
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” — Tom Mitchell, Machine Learning Professor at Carnegie Mellon University
To illustrate this quote with an example, consider the problem of recognizing handwritten digits:
- Task T: classifying handwritten digits from images
- Performance measure P: percentage of digits classified correctly
- Training experience E: a dataset of digit images with their given classifications
Why “Learn”?
Machine learning is programming computers to optimize a performance criterion using example data or past experience. There is no need to “learn” to calculate payroll. Learning is used when:
- Human expertise does not exist (navigating on Mars)
- Humans are unable to explain their expertise (speech recognition)
- The solution changes over time (routing on a computer network)
- The solution needs to be adapted to particular cases (user biometrics)
Basic components of the learning process
There are four components, namely data storage, abstraction, generalization, and evaluation.
1. Data storage: facilities for storing and retrieving huge amounts of data are an important component of the learning process.
2. Abstraction: the process of extracting knowledge from stored data. This involves creating general concepts about the data as a whole. The creation of knowledge involves the application of known models and the creation of new models. The process of fitting a model to a dataset is known as training. When the model has been trained, the data is transformed into an abstract form that summarizes the original information.
3. Generalization: the process of turning the knowledge about stored data into a form that can be utilized for future action.
4. Evaluation: the process of giving feedback to the user to measure the utility of the learned knowledge.
Learning Model
Learning models can be divided into three categories:
- Using a logical expression (logical models)
- Using the geometry of the instance space (geometric models)
- Using probability to classify the instance space (probabilistic models)
Applications of Machine Learning
- Email spam detection
- Face detection and matching (e.g., iPhone X)
- Web search (e.g., DuckDuckGo, Bing, Google)
- Sports predictions
- Post office (e.g., sorting letters by zip codes)
- ATMs (e.g., reading checks)
- Credit card fraud detection
- Stock predictions
- Smart assistants (Apple Siri, Amazon Alexa, ...)
- Product recommendations (e.g., Netflix, Amazon)
- Self-driving cars (e.g., Uber, Tesla)
- Language translation (Google Translate)
- Sentiment analysis
- Drug design
- Medical diagnosis
Types of Machine Learning
The four broad categories of machine learning are summarized in the following figure:
- Supervised learning
- Unsupervised learning
- Reinforcement learning
- Evolutionary learning
Supervised learning
Supervised learning is the subcategory of machine learning that focuses on learning a classification or regression model, that is, learning from labeled training data.
- Classification
- Regression
The Brain and the Neuron
The nerve cell of the brain is the neuron. Each neuron is typically connected to thousands of other neurons, so it is estimated that there are about 100 trillion (= 10^14) synapses within the brain. After firing, a neuron must wait for some time to recover its energy (the refractory period) before it can fire again.
Hebb’s rule says that the changes in the strength of synaptic connections are proportional to the correlation in the firing of the two connecting neurons. So if two neurons consistently fire simultaneously, then any connection between them will change in strength, becoming stronger. This idea, that synaptic connections between neurons and assemblies of neurons can be formed when they fire together and can become stronger, is also known as long-term potentiation and neural plasticity, and it does appear to have correlates in real brains.
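Hebb’s rule above can be sketched as a weight update proportional to the product of the two neurons’ activity. This is a minimal illustrative sketch: the learning rate `eta` and the toy activation values are assumptions, not part of the original rule.

```python
# Minimal sketch of Hebb's rule: delta_w = eta * x * y,
# i.e. the weight change is proportional to the correlation
# between the firing of the two connected neurons.

def hebb_update(w, x, y, eta=0.1):
    """Return the updated synaptic weight for pre-synaptic
    activity x and post-synaptic activity y."""
    return w + eta * x * y

w = 0.5
# Both neurons fire together repeatedly: the connection strengthens.
for _ in range(5):
    w = hebb_update(w, x=1.0, y=1.0)
print(round(w, 2))  # weight has grown from 0.5 to 1.0
```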
The Brain and the Neuron
McCulloch and Pitts Neurons
Studying real neurons isn’t actually that easy: you would need to extract a neuron from the brain and keep it alive so that you can see how it reacts in controlled circumstances. McCulloch and Pitts instead proposed a simple mathematical model of the neuron.
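The McCulloch-Pitts model can be sketched as a unit that sums its (weighted) inputs and fires when the sum reaches a threshold. The weights and threshold below are illustrative assumptions chosen to realize a logical AND.

```python
# Sketch of a McCulloch-Pitts neuron: inputs are summed
# (here with weights) and compared against a threshold theta.

def mcculloch_pitts(inputs, weights, theta):
    """Fire (return 1) iff the weighted sum reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= theta else 0

# A two-input AND gate: both inputs must be active for the neuron to fire.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", mcculloch_pitts([a, b], [1, 1], theta=2))
```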
Designing a Learning System
The design has the following key components:
1. Type of training experience: direct/indirect, supervised/unsupervised
2. Choosing the target function
3. Choosing a representation for the target function
4. Choosing a function approximation algorithm for the target function
5. The final design
Designing a Learning System
Real-world examples of machine learning problems include “Is this cancer?”, “What is the market value of this house?”, “Which of these people are good friends with each other?”, “Will this rocket engine explode on take off?”, “Will this person like this movie?”, “Who is this?”, “What did you say?”, and “How do you fly this thing?” All of these problems are excellent targets for an ML project; in fact, ML has been applied to each of them with great success.
PERSPECTIVES AND ISSUES IN MACHINE LEARNING
Issues in Machine Learning:
- What algorithms exist for learning general target functions from specific training examples? In what settings will particular algorithms converge to the desired function, given sufficient training data? Which algorithms perform best for which types of problems and representations?
- How much training data is sufficient? What general bounds can be found to relate the confidence in learned hypotheses to the amount of training experience and the character of the learner's hypothesis space?
- When and how can prior knowledge held by the learner guide the process of generalizing from examples? Can prior knowledge be helpful even when it is only approximately correct?
- What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy alter the complexity of the learning problem?
- What is the best way to reduce the learning task to one or more function approximation problems? Put another way, what specific functions should the system attempt to learn? Can this process itself be automated?
- How can the learner automatically alter its representation to improve its ability to represent and learn the target function?
EnjoySport examples
Concept Learning as Search
The goal of this search is to find the hypothesis that best fits the training examples. By selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses that the program can ever represent and therefore can ever learn. Consider, for example, the instances X and hypotheses H in the EnjoySport learning task. Viewing learning as a search problem, it is natural that our study of learning algorithms will examine the different strategies for searching the hypothesis space.
Concept Learning as Search
General-to-Specific Ordering of Hypotheses
To illustrate the general-to-specific ordering, consider the two hypotheses:
h1 = (Sunny, ?, ?, Strong, ?, ?)
h2 = (Sunny, ?, ?, ?, ?, ?)
First, for any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1. Now consider the sets of instances that are classified positive by h1 and by h2. Because h2 imposes fewer constraints on the instance, it classifies more instances as positive. In fact, any instance classified positive by h1 will also be classified positive by h2. Therefore, we say that h2 is more general than h1.
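The "more general than or equal to" relation between such conjunctive hypotheses can be sketched directly, since h2 is at least as general as h1 exactly when every constraint of h2 is either '?' or equal to the corresponding constraint of h1. The function name is my own.

```python
# Sketch of the "more general than or equal to" ordering over
# conjunctive hypotheses, where '?' accepts any attribute value.

def more_general_or_equal(h2, h1):
    """True iff every instance satisfying h1 also satisfies h2."""
    return all(a2 == "?" or a2 == a1 for a2, a1 in zip(h2, h1))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))  # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))  # False: h1 constrains Wind to Strong
```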
Finding a Maximally Specific Hypothesis
Three main concepts:
- Concept Learning
- General Hypothesis
- Specific Hypothesis
A hypothesis h is a most specific hypothesis if it covers none of the negative examples and there is no other hypothesis h′ that also covers no negative examples such that h is strictly more general than h′.
Finding a Maximally Specific Hypothesis
The Find-S algorithm finds the most specific hypothesis that fits all the positive examples. Find-S moves from the most specific hypothesis towards more general hypotheses.
Important representation:
- ? indicates that any value is acceptable for the attribute.
- A single value (e.g., Cold) indicates that exactly that value is required for the attribute.
- ϕ indicates that no value is acceptable.
The most general hypothesis is represented by: {?, ?, ?, ?, ?, ?}
The most specific hypothesis is represented by: {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
Find-S Algorithm
Steps involved in Find-S:
1. Start with the most specific hypothesis: h = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
2. Take the next example; if it is negative, make no changes to the hypothesis.
3. If the example is positive and the current hypothesis is too specific, generalize the hypothesis just enough to cover it.
4. Keep repeating the above steps until all the training examples have been processed.
5. After processing all the training examples, we have the final hypothesis, which can be used to classify new examples.
First, we take the hypothesis to be the most specific hypothesis. Hence, our hypothesis is:
h = {ϕ, ϕ, ϕ, ϕ}
Consider example 1: the data in example 1 is {GREEN, HARD, NO, WRINKLED}. Our initial hypothesis is more specific, so we have to generalize it to cover this example. Hence, the hypothesis becomes:
h = {GREEN, HARD, NO, WRINKLED}
Consider example 2: this example has a negative outcome. Hence we ignore it and our hypothesis remains the same:
h = {GREEN, HARD, NO, WRINKLED}
Consider example 3: this example has a negative outcome. Hence we ignore it and our hypothesis remains the same:
h = {GREEN, HARD, NO, WRINKLED}
Consider example 4: the data in example 4 is {ORANGE, HARD, NO, WRINKLED}. We compare every attribute with the current hypothesis, and wherever there is a mismatch we replace that attribute with the general case (“?”). After this, the hypothesis becomes:
h = {?, HARD, NO, WRINKLED}
Consider example 5: the data in example 5 is {GREEN, SOFT, YES, SMOOTH}. We again compare every attribute with the current hypothesis and replace each mismatched attribute with “?”. After this, the hypothesis becomes:
h = {?, ?, ?, ?}
Since we have reached a point where all the attributes in our hypothesis are general, examples 6 and 7 leave the hypothesis unchanged:
h = {?, ?, ?, ?}
Hence, for the given data the final hypothesis is:
Final hypothesis: h = {?, ?, ?, ?}
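The Find-S walkthrough above can be sketched directly in code. The positive rows below match the walkthrough; Find-S ignores negative rows, so their attribute values here are illustrative placeholders, not data from the slides.

```python
# Sketch of the Find-S algorithm applied to the walkthrough above.

def find_s(examples):
    h = None  # stands for the all-phi most specific hypothesis
    for attrs, label in examples:
        if label != "positive":
            continue  # Find-S ignores negative examples
        if h is None:
            h = list(attrs)  # first positive example replaces all phi
        else:
            # Generalize each mismatched attribute to '?'
            h = [hi if hi == ai else "?" for hi, ai in zip(h, attrs)]
    return h

examples = [
    (("GREEN", "HARD", "NO", "WRINKLED"), "positive"),
    (("RED", "SOFT", "YES", "SMOOTH"), "negative"),     # placeholder values
    (("ORANGE", "SOFT", "YES", "SMOOTH"), "negative"),  # placeholder values
    (("ORANGE", "HARD", "NO", "WRINKLED"), "positive"),
    (("GREEN", "SOFT", "YES", "SMOOTH"), "positive"),
]
print(find_s(examples))  # ['?', '?', '?', '?']
```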
Version Space
A version space is a hierarchical representation of knowledge that enables you to keep track of all the useful information supplied by a sequence of learning examples without remembering any of the examples. The version space method is a concept learning process accomplished by managing multiple models within a version space.
Definition (Version space). A concept is complete if it covers all positive examples. A concept is consistent if it covers none of the negative examples. The version space is the set of all complete and consistent concepts. This set is convex and is fully defined by its least and most general elements.
Version Space
One way to represent the version space is simply to list all of its members. This leads to a simple learning algorithm, which we might call the LIST-THEN-ELIMINATE algorithm. The LIST-THEN-ELIMINATE algorithm first initializes the version space to contain all hypotheses in H, then eliminates any hypothesis found inconsistent with any training example. The version space of candidate hypotheses thus shrinks as more examples are observed, until ideally just one hypothesis remains that is consistent with all the observed examples.
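LIST-THEN-ELIMINATE can be sketched over a tiny hypothesis space. The two-attribute domains here are illustrative assumptions (real hypothesis spaces are usually far too large to enumerate), and for brevity the all-ϕ hypotheses are omitted from the enumeration.

```python
# Sketch of LIST-THEN-ELIMINATE: enumerate every conjunctive
# hypothesis, then eliminate those inconsistent with any example.

from itertools import product

def matches(h, x):
    return all(a == "?" or a == v for a, v in zip(h, x))

def list_then_eliminate(domains, examples):
    # Each hypothesis slot is either a concrete value or '?'.
    space = list(product(*[d + ["?"] for d in domains]))
    # Keep only hypotheses that agree with every labeled example.
    return [h for h in space
            if all(matches(h, x) == label for x, label in examples)]

domains = [["Sunny", "Rainy"], ["Warm", "Cold"]]
examples = [(("Sunny", "Warm"), True), (("Rainy", "Cold"), False)]
for h in list_then_eliminate(domains, examples):
    print(h)
# The surviving version space: (Sunny, Warm), (Sunny, ?), (?, Warm)
```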
Candidate Elimination Learning Algorithm
Problem 1: Learning the concept of "Japanese Economy Car"
Features: (Country of Origin, Manufacturer, Color, Decade, Type)

Origin  Manufacturer  Color  Decade  Type     Example Type
Japan   Honda         Blue   1980    Economy  Positive
Japan   Toyota        Green  1970    Sports   Negative
Japan   Toyota        Blue   1990    Economy  Positive
USA     Chrysler      Red    1980    Economy  Negative
Japan   Honda         White  1980    Economy  Positive
Solution:
1. Positive Example: (Japan, Honda, Blue, 1980, Economy)
Initialize G to a singleton set that includes everything: G = { (?, ?, ?, ?, ?) }
Initialize S to a singleton set that includes the first positive example: S = { (Japan, Honda, Blue, 1980, Economy) }
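The boundary-set updates that this solution continues can be sketched in code. As a hedge: this is a simplified version, not the full Candidate Elimination algorithm; in particular, G is specialized only with attribute values taken from S (a common tutorial simplification rather than iterating over full attribute domains), and all function names are my own.

```python
# Simplified sketch of Candidate Elimination for conjunctive
# hypotheses: positive examples generalize the specific boundary S,
# negative examples specialize the general boundary G.

def matches(h, x):
    return all(a == "?" or a == v for a, v in zip(h, x))

def more_general(h2, h1):
    return all(a == "?" or a == b for a, b in zip(h2, h1))

def candidate_elimination(examples):
    first_x, first_label = examples[0]
    assert first_label, "this sketch assumes the first example is positive"
    S = tuple(first_x)                 # singleton specific boundary
    G = [("?",) * len(first_x)]        # singleton general boundary
    for x, positive in examples[1:]:
        if positive:
            # Generalize S minimally to cover x; G must still cover x.
            S = tuple(s if s == v else "?" for s, v in zip(S, x))
            G = [g for g in G if matches(g, x)]
        else:
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)    # g already rejects the negative
                    continue
                # Minimal specializations: constrain one '?' using S.
                for i, a in enumerate(g):
                    if a == "?" and S[i] != "?" and S[i] != x[i]:
                        new_G.append(g[:i] + (S[i],) + g[i + 1:])
            # Prune members less general than another member of G.
            G = [g for g in new_G
                 if not any(h != g and more_general(h, g) for h in new_G)]
    return S, G

cars = [
    (("Japan", "Honda", "Blue", "1980", "Economy"), True),
    (("Japan", "Toyota", "Green", "1970", "Sports"), False),
    (("Japan", "Toyota", "Blue", "1990", "Economy"), True),
    (("USA", "Chrysler", "Red", "1980", "Economy"), False),
    (("Japan", "Honda", "White", "1980", "Economy"), True),
]
S, G = candidate_elimination(cars)
print("S =", S)  # ('Japan', '?', '?', '?', 'Economy')
print("G =", G)  # [('Japan', '?', '?', '?', 'Economy')]
```

On this dataset the two boundaries converge to the single hypothesis (Japan, ?, ?, ?, Economy), i.e. a Japanese economy car.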
Linear Discriminant Analysis
In 1936, Ronald A. Fisher formulated the Linear Discriminant for the first time and showed some practical uses as a classifier. It was described for a 2-class problem, and was later generalized as ‘Multi-class Linear Discriminant Analysis’ or ‘Multiple Discriminant Analysis’ by C. R. Rao in 1948. Linear Discriminant Analysis is the most commonly used dimensionality reduction technique in supervised learning. Basically, it is a preprocessing step for pattern classification and machine learning applications. It projects the dataset onto a lower-dimensional space with good class separability, which reduces overfitting and computational costs.
Working of Linear Discriminant Analysis: Assumptions
- Every feature (variable, dimension, or attribute) in the dataset has a Gaussian distribution, i.e., features have a bell-shaped curve.
- Each feature holds the same variance: values vary around the mean by the same amount on average.
- Each feature is assumed to be sampled randomly.
- There is a lack of multicollinearity in the independent features: as correlation between independent features increases, the power of prediction decreases.
LDA achieves this via a three-step process.
First step: compute the separability between the various classes, i.e., the distance between the means of the different classes; this is also known as the between-class variance.
Second step: compute the distance between the mean and the samples of each class; this is also known as the within-class variance.
Third step: construct the lower-dimensional space that maximizes the between-class variance and minimizes the within-class variance. With P as the projection onto the lower-dimensional space, the ratio being maximized is known as Fisher’s criterion.
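The three steps above can be sketched for a two-class problem with NumPy; the toy Gaussian data, the random seed, and the variable names are illustrative assumptions. For two classes, the direction maximizing Fisher's criterion is w = Sw⁻¹(m1 − m2).

```python
# Sketch of the three LDA steps for two classes.

import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0, 0], 0.5, size=(50, 2))  # class 1 samples
X2 = rng.normal([3, 1], 0.5, size=(50, 2))  # class 2 samples

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Step 1: between-class scatter (distance between class means).
Sb = np.outer(m1 - m2, m1 - m2)

# Step 2: within-class scatter (spread of samples around their mean).
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Step 3: direction maximizing between- over within-class variance.
w = np.linalg.solve(Sw, m1 - m2)

# The projected class means are well separated along w.
print((X1 @ w).mean(), (X2 @ w).mean())
```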
For example, LDA can be used as a classifier for speech recognition, microarray data classification, face recognition, image retrieval, bioinformatics, biometrics, chemistry, etc.
See also: https://people.revoledu.com/kardi/tutorial/LDA/Numerical%20Example.html
Perceptron
A perceptron is a single-layer neural network; a multi-layer perceptron is called a neural network. The perceptron is a linear (binary) classifier, and it is used in supervised learning.
The perceptron consists of 4 parts:
- Input values (one input layer)
- Weights and bias
- Net sum
- Activation function
The perceptron works in these simple steps:
a. All the inputs x are multiplied by their weights w. Let’s call these products k.
b. Add all the multiplied values together; the result is called the weighted sum.
c. Apply the weighted sum to the appropriate activation function.
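The three steps above can be sketched as a forward pass: multiply inputs by weights, sum them together with the bias, and apply an activation function. The particular weights, bias, and step activation below are illustrative assumptions chosen to realize a logical OR.

```python
# Sketch of the perceptron's forward pass.

def step(z):
    """Unit step activation: maps the weighted sum to 0 or 1."""
    return 1 if z >= 0 else 0

def perceptron(inputs, weights, bias):
    # Steps a and b: multiply inputs by weights and sum, plus bias.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Step c: apply the activation function.
    return step(weighted_sum)

# An OR gate realized by a single perceptron.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron([a, b], weights=[1, 1], bias=-0.5))
```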
Why do we need weights and bias? Weights show the strength of a particular node. A bias value allows you to shift the activation function curve up or down.
Why do we need an activation function? In short, activation functions are used to map the input to a required range of values such as (0, 1) or (−1, 1).
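Two standard choices for these ranges can be sketched directly: the sigmoid squashes the weighted sum into (0, 1) and tanh squashes it into (−1, 1). This is a minimal sketch using only the standard library.

```python
# Sketch of common activation functions and the ranges they map to.

import math

def sigmoid(z):
    """Maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Maps any real z into the open interval (-1, 1)."""
    return math.tanh(z)

for z in (-5.0, 0.0, 5.0):
    print(z, round(sigmoid(z), 4), round(tanh(z), 4))
```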