Instance Based Learning in machine learning


About This Presentation

Instance based learning


Slide Content

Instance Based Learning

Unsupervised Learning: Customer Segmentation: Unsupervised learning groups customers into distinct buying segments, so companies can identify those segments and advertise to each group more effectively. Market Basket Analysis: This also extends to recommendations. It uncovers relationships between products that are frequently bought together; think of a store shelving peanut butter and jelly next to each other because of such an association.

Model-Based Learning: Model-based learning involves creating a mathematical model that predicts outcomes from input data. The model is trained on a large dataset and then used to make predictions on new data; it can be thought of as a set of rules the machine uses to make predictions. The model is typically built with statistical algorithms such as linear regression, logistic regression, decision trees, and neural networks. A method is parametric if it learns a predefined mapping function with a fixed set of parameters.
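To make the contrast with instance-based learning concrete, here is a minimal Python sketch of model-based learning; the data and the linear form y = w·x + b are illustrative assumptions, not from the slides:

```python
# Model-based learning: fit a model once, then discard the training data
# and predict from the learned parameters alone.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # feature (illustrative)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # target (illustrative)

# Least-squares fit of the predefined mapping function y = w * x + b
A = np.column_stack([X, np.ones_like(X)])
w, b = np.linalg.lstsq(A, y, rcond=None)[0]

# Prediction uses only the learned parameters (w, b), not the training set.
x_new = 6.0
print(w * x_new + b)
```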

Instance-based learning: Sometimes called memory-based learning, this is a family of learning algorithms that, instead of performing explicit generalization, compare new problem instances with instances seen in training, which have been stored in memory. Rather than summarizing the training data into a model, they use the training instances themselves to make predictions.

Lazy Learning: Unlike eager learning algorithms (which generalize the training data into a model), instance-based learning algorithms delay processing until a prediction is needed. Some instance-based learning algorithms are:
- K-Nearest Neighbors (KNN)
- Self-Organizing Map (SOM)
- Learning Vector Quantization (LVQ)
- Locally Weighted Learning (LWL)
- Case-Based Reasoning (CBR)

KNN Algorithm: The k-nearest neighbors (KNN) algorithm is a supervised ML algorithm that can be used for both classification and regression problems, though in industry it is mainly used for classification. Lazy learning algorithm: KNN is a lazy learner because it has no specialized training phase; it uses all of the data when classifying. Non-parametric learning algorithm: KNN is also non-parametric because it assumes nothing about the underlying data distribution.

It makes predictions based on the similarity (typically distance) between the new data point (the new instance) and the stored instances.

Classification Using KNN

NAME     AGE   GENDER   CLASS OF SPORTS
Ajay     32    0        Football
Mark     40    0        Neither
Sara     16    1        Cricket
Zaira    34    1        Cricket
Sachin   55    0        Neither
Rahul    40    0        Cricket
Pooja    20    1        Neither
Smith    15    0        Cricket
Laxmi    55    1        Football
Michael  15    0        Football

Let's find which class Angelina (age 5, gender 1) falls into, with k = 3. We use the Euclidean distance d = √((x2-x1)² + (y2-y1)²) to find the distance between any two points.

Distance between Ajay and Angelina, using d = √((age2-age1)² + (gender2-gender1)²):
d = √((5-32)² + (1-0)²) = √(729 + 1) = √730 ≈ 27.02

Similarly, we find all the distances one by one:

NAME     DISTANCE   CLASS (3 nearest)
Ajay     27.02
Mark     35.01
Sara     11.00      Cricket
Zaira    29.00
Sachin   50.01
Rahul    35.01
Pooja    15.00
Smith    10.05      Cricket
Laxmi    50.00
Michael  10.05      Football

The three nearest neighbors are Smith (Cricket), Michael (Football), and Sara (Cricket); by majority vote, Angelina is classified as Cricket.
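A short Python sketch of this worked example, using only the standard library; the data is copied from the table above:

```python
# Classify Angelina (age 5, gender 1) with k = 3 via Euclidean distance.
import math
from collections import Counter

# (name, age, gender, class)
data = [
    ("Ajay", 32, 0, "Football"), ("Mark", 40, 0, "Neither"),
    ("Sara", 16, 1, "Cricket"),  ("Zaira", 34, 1, "Cricket"),
    ("Sachin", 55, 0, "Neither"), ("Rahul", 40, 0, "Cricket"),
    ("Pooja", 20, 1, "Neither"), ("Smith", 15, 0, "Cricket"),
    ("Laxmi", 55, 1, "Football"), ("Michael", 15, 0, "Football"),
]
query = (5, 1)  # Angelina: age 5, gender 1
k = 3

# Lazy learning: no training step; just measure the distance from the
# query to every stored instance and sort.
dists = sorted((math.dist(query, (age, gender)), label)
               for _, age, gender, label in data)

# Majority vote among the k nearest neighbors.
votes = Counter(label for _, label in dists[:k])
print(votes.most_common(1)[0][0])  # -> Cricket
```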

Classification Using KNN: A Second Example

BRIGHTNESS   SATURATION   CLASS
40           20           Red
50           50           Blue
60           90           Blue
10           25           Red
70           70           Blue
60           10           Red
25           80           Blue

New instance to classify, with K = 5:

BRIGHTNESS   SATURATION   CLASS
20           35           ?

Distance from (20, 35) to each training instance:

BRIGHTNESS   SATURATION   CLASS   DISTANCE
40           20           Red     25
50           50           Blue    33.54
60           90           Blue    68.01
10           25           Red     14.14
70           70           Blue    61.03
60           10           Red     47.17
25           80           Blue    45.28

Sorted by distance:

BRIGHTNESS   SATURATION   CLASS   DISTANCE
10           25           Red     14.14
40           20           Red     25
50           50           Blue    33.54
25           80           Blue    45.28
60           10           Red     47.17
70           70           Blue    61.03
60           90           Blue    68.01

Among the K = 5 nearest neighbors, Red appears three times and Blue twice, so the new instance is classified as Red:

BRIGHTNESS   SATURATION   CLASS
40           20           Red
50           50           Blue
60           90           Blue
10           25           Red
70           70           Blue
60           10           Red
25           80           Blue
20           35           Red
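The same result can be checked with scikit-learn's KNeighborsClassifier (this assumes scikit-learn is installed; note that fit only stores the data, consistent with lazy learning):

```python
# KNN classification of the brightness/saturation data with K = 5.
from sklearn.neighbors import KNeighborsClassifier

X = [[40, 20], [50, 50], [60, 90], [10, 25], [70, 70], [60, 10], [25, 80]]
y = ["Red", "Blue", "Blue", "Red", "Blue", "Red", "Blue"]

clf = KNeighborsClassifier(n_neighbors=5)  # Euclidean distance by default
clf.fit(X, y)                              # "fit" merely stores the instances
print(clf.predict([[20, 35]]))             # -> ['Red']
```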

How It Works: Training Phase: In k-NN there is no explicit training phase; the algorithm simply stores the training data.

Prediction Phase: When a new instance is introduced for prediction, the algorithm follows these steps:
1. Compute Distances: Calculate the distance between the new instance and every instance in the training set. Common metrics include Euclidean or Manhattan distance for continuous variables and Hamming distance for categorical variables (see the sketches below).
2. Identify Neighbors: Select the 'k' instances from the training set that are closest to the new instance (the 'k' nearest neighbors).
3. Aggregate the Output:
   - For classification: perform a majority vote among the 'k' nearest neighbors; the class that appears most frequently among them is assigned to the new instance.
   - For regression: calculate the average of the values of the 'k' nearest neighbors and assign this average to the new instance.
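As a quick sketch, minimal (hypothetical) implementations of the three metrics named above:

```python
# Three common KNN distance metrics for equal-length feature vectors a, b.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    # For categorical features: count the positions where a and b differ.
    return sum(x != y for x, y in zip(a, b))

print(round(euclidean((5, 1), (32, 0)), 2))  # 27.02, the Ajay-Angelina distance
```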

Regression Using KNN

Step 1: Dataset and New Point

x   y
1   2
2   3
3   5
4   4
5   7

New point: x_new = 3.5

Step 2: Compute the distance between the new instance and each sample in the dataset (in one dimension, Euclidean distance reduces to an absolute difference):
D1 = √((3.5-1)²) = 2.5
D2 = 1.5
D3 = 0.5
D4 = 0.5
D5 = 1.5
Select the 3 nearest neighbors: (x3, y3) = (3, 5), (x4, y4) = (4, 4), (x2, y2) = (2, 3)

Step 3: Compute Weights. Weights are the inverse of the distances. To avoid division by zero, we add a small value (0.00001) to the distances:
W3 = 1/(0.5 + 0.00001) = 1.99996
W4 = 1/(0.5 + 0.00001) = 1.99996
W2 = 1/(1.5 + 0.00001) = 0.66666

Step 4: Compute the Weighted Average. Compute the weighted sum of the target values and the sum of the weights:
Weighted sum of y = (5 × 1.99996) + (4 × 1.99996) + (3 × 0.66666) = 19.99962
Sum of weights = 1.99996 + 1.99996 + 0.66666 = 4.66658
Weighted average = 19.99962 / 4.66658 = 4.2857

Predicted point: (3.5, 4.2857)
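A compact Python sketch of this distance-weighted regression, mirroring the steps above:

```python
# Distance-weighted k-NN regression with k = 3 and epsilon = 1e-5
# added to each distance to avoid division by zero.
data = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 7)]
x_new, k, eps = 3.5, 3, 1e-5

# In one dimension, Euclidean distance is the absolute difference.
neighbors = sorted(data, key=lambda p: abs(p[0] - x_new))[:k]

# Weight each neighbor by the inverse of its distance.
weights = [1 / (abs(x - x_new) + eps) for x, _ in neighbors]
y_hat = sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)
print(round(y_hat, 4))  # -> 4.2857
```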

Once we add distance weighting, there is really no harm in allowing all training examples to influence the classification of the query instance x_q, because very distant examples have very little effect on f(x_q). Using all training examples is called a global method (Shepard's method); using only the k nearest is a local method. Considering all examples makes the classifier run more slowly.

CASE-BASED REASONING: The k-NEAREST NEIGHBOR algorithm (1) is lazy, (2) classifies new query instances by analyzing similar instances while ignoring instances that are very different from the query, and (3) represents instances as real-valued points in an n-dimensional Euclidean space. Case-based reasoning (CBR) is a learning paradigm based on the first two of these principles. In CBR, instances are typically represented using richer symbolic descriptions, and the methods used to retrieve similar instances are correspondingly more elaborate. CBR has been applied to problems such as conceptual design of mechanical devices based on a stored library of previous designs, and reasoning about new legal cases based on previous rulings.

It has also been applied to solving planning and scheduling problems by reusing and combining portions of previous solutions to similar problems. The CADET system: CADET (Case-based Design Tool) is a system that aids the conceptual design of electro-mechanical devices and is based on the case-based reasoning paradigm. It uses a library of approximately 75 previous designs and design fragments to suggest conceptual designs that meet the specifications of new design problems. Each instance stored in memory (e.g., a water pipe) is represented by describing both its structure and its qualitative function.

Given the functional specification for a new design problem, CADET searches its library for stored cases whose functional descriptions match the design problem. If an exact match is found, indicating that some stored case implements exactly the desired function, that case can be returned as a suggested solution to the design problem. If no exact match occurs, CADET may find cases that match various subgraphs of the desired functional specification; for example, the function of a T-junction matches a subgraph of the water faucet's function graph.

By retrieving multiple cases that match different subgraphs, the entire design can sometimes be pieced together. This may also require backtracking on earlier choices of design subgoals and, therefore, rejecting cases that were previously retrieved.