ML1: Introduction to Supervised Learning and K-Nearest Neighbors



Slide Content

Introduction to Supervised Learning

Legal Notices and Disclaimers: This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer, or learn more at intel.com. This sample source code is released under the Intel Sample Source Code License Agreement. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Copyright © 2017, Intel Corporation. All rights reserved.

What is Machine Learning? Machine learning allows computers to learn and infer from data.

Machine Learning in Our Daily Lives: spam filtering, web search, postal mail routing, fraud detection, movie recommendations, vehicle driver assistance, web advertisements, social networks, speech recognition.

Types of Machine Learning
Supervised: data points have a known outcome
Unsupervised: data points have an unknown outcome

Types of Supervised Learning
Regression: outcome is continuous (numerical)
Classification: outcome is a category

Supervised Learning Overview: fit a model on data with answers, then use the fitted model to predict answers for data without answers (fit + predict).

Regression: Numeric Answers. Fit a model on movie data with known revenue, then predict revenue for movies where it is unknown.

Classification: Categorical Answers. Fit a model on labeled data, for example emails labeled as spam or not spam, then predict the label (spam or not spam) for unlabeled emails.
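
The fit + predict pattern above maps directly onto scikit-learn code. Below is a minimal sketch of the pattern; the toy spam/not-spam data and variable names are illustrative assumptions, not taken from the deck.

    # Minimal sketch of the fit + predict pattern with scikit-learn.
    # The toy features, labels, and variable names are made up.
    from sklearn.neighbors import KNeighborsClassifier

    # "Data with answers": features plus known labels
    X_train = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]
    y_train = ["not spam", "not spam", "spam", "spam"]

    # fit: learn the model from the data with answers
    model = KNeighborsClassifier(n_neighbors=3)
    model = model.fit(X_train, y_train)

    # predict: produce answers for data without answers
    X_new = [[1.2, 1.9], [5.5, 8.5]]
    print(model.predict(X_new))   # e.g. ['not spam' 'spam']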

Machine Learning Vocabulary
Target: predicted category or value of the data (the column to predict)
Features: properties of the data used for prediction (the non-target columns)
Example: a single data point within the data (one row)
Label: the target value for a single data point

In the iris sample below, species is the target, the four measurement columns are the features, each row is an example, and each row's species value is its label.

    sepal length  sepal width  petal length  petal width  species
    6.7           3.0          5.2           2.3          virginica
    6.4           2.8          5.6           2.1          virginica
    4.6           3.4          1.4           0.3          setosa
    6.9           3.1          4.9           1.5          versicolor
    4.4           2.9          1.4           0.2          setosa
    4.8           3.0          1.4           0.1          setosa
    5.9           3.0          5.1           1.8          virginica
    5.4           3.9          1.3           0.4          setosa
    4.9           3.0          1.4           0.2          setosa
    5.4           3.4          1.7           0.2          setosa
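
As a companion to the table, here is a short sketch that loads scikit-learn's bundled copy of iris and pulls features and target apart. The as_frame option and the integer encoding of species (0, 1, 2) are details of scikit-learn's copy of the data, not of the slide.

    # Map the vocabulary onto the iris data set. Requires a reasonably
    # recent scikit-learn (as_frame) and pandas installed.
    from sklearn.datasets import load_iris

    iris = load_iris(as_frame=True)
    df = iris.frame                  # one row per example

    X = df.drop(columns="target")    # features: the non-target columns
    y = df["target"]                 # target: the column to predict

    print(X.columns.tolist())        # the four feature names
    print(y.iloc[0])                 # label of one example (species as 0/1/2)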


K-Nearest Neighbors

What is Classification? A flower shop wants to predict which flower a customer is most likely to purchase, based on its similarity to the customer's most recent purchase.

What is Needed for Classification? Model data with:
Features that can be quantified
Labels that are known
A method to measure similarity

K Nearest Neighbors Classification

The example plots patients by age against number of malignant nodes, with each point labeled "survived" or "did not survive". To classify a new point, look at the labels of its K nearest neighbors and take a vote. In the slides, the neighbor tallies for the point being predicted run 1 (K = 1), 1 vs. 1 (K = 2), 2 vs. 1 (K = 3), and 3 vs. 1 (K = 4); the majority label among the K neighbors is the prediction.
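
The neighbor-counting procedure is simple enough to write from scratch. Below is a sketch with made-up (age, nodes) points standing in for the slide's scatter plot; the data and the tie-breaking behavior at even K are illustrative only.

    # From-scratch KNN voting on made-up (age, malignant nodes) data.
    import numpy as np
    from collections import Counter

    X = np.array([[55, 2], [60, 1], [35, 12], [40, 15], [58, 3]])
    y = np.array(["survived", "survived", "did not survive",
                  "did not survive", "survived"])

    def knn_predict(x_new, X, y, k):
        # Euclidean distance from the new point to every stored point
        dists = np.sqrt(((X - x_new) ** 2).sum(axis=1))
        nearest = np.argsort(dists)[:k]       # indices of the K closest
        votes = Counter(y[nearest])           # tally the neighbor labels
        return votes.most_common(1)[0][0]     # majority label wins

    for k in (1, 2, 3, 4):
        print(k, knn_predict(np.array([57, 4]), X, y, k))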

What is Needed to Select a KNN Model?
The correct value for K
A way to measure the closeness of neighbors

K Nearest Neighbors Decision Boundary

The value of K affects the decision boundary. The slides contrast the same age vs. number of malignant nodes data at K = 1 and K = all: a small K yields a flexible boundary that bends around individual points, while K = all assigns every new point the overall majority class. Methods for determining K will be discussed in the next lesson.

Measurement of Distance in KNN: nearness is measured as a distance between points in feature space (here, age and number of malignant nodes).

Euclidean Distance (L2 Distance): d = √(ΔAge² + ΔNodes²)

Manhattan Distance (L1 or City Block Distance): d = |ΔAge| + |ΔNodes|
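
Both formulas can be checked numerically. A quick sketch with one made-up pair of points:

    # Compare L2 and L1 distance for one pair of (age, nodes) points.
    import numpy as np

    a = np.array([50.0, 3.0])   # (age, number of malignant nodes), made up
    b = np.array([40.0, 8.0])

    euclidean = np.sqrt(((a - b) ** 2).sum())   # L2: sqrt(dAge^2 + dNodes^2)
    manhattan = np.abs(a - b).sum()             # L1: |dAge| + |dNodes|

    print(euclidean)   # 11.18...
    print(manhattan)   # 15.0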

Scale Is Important for Distance Measurement

The example plots patients by age (roughly 18 to 60) against number of surgeries (1 to 5). Because age spans a far larger numeric range than surgeries, raw distances are dominated by age, and the points flagged as nearest neighbors reflect age alone. After feature scaling puts both features on a comparable range, a different set of points becomes the nearest neighbors.

Comparison of Feature Scaling Methods
Standard Scaler: mean-center the data and scale it to unit variance
Minimum-Maximum Scaler: scale the data to a fixed range (usually 0 to 1)
Maximum Absolute Value Scaler: scale the data by its maximum absolute value
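
A small sketch comparing the three scalers on the same made-up (age, surgeries) matrix makes the differences concrete; the numbers are arbitrary.

    # Apply each scaler to the same matrix and compare the outputs.
    import numpy as np
    from sklearn.preprocessing import StandardScaler, MinMaxScaler, MaxAbsScaler

    X = np.array([[20.0, 1.0],
                  [40.0, 3.0],
                  [60.0, 5.0]])

    print(StandardScaler().fit_transform(X))  # zero mean, unit variance
    print(MinMaxScaler().fit_transform(X))    # each column mapped to [0, 1]
    print(MaxAbsScaler().fit_transform(X))    # divided by max |value| per column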

Feature Scaling: The Syntax

Import the class containing the scaling method:

    from sklearn.preprocessing import StandardScaler

Create an instance of the class:

    StdSc = StandardScaler()

Fit the scaling parameters and then transform the data:

    StdSc = StdSc.fit(X_data)
    X_scaled = StdSc.transform(X_data)

Other scaling methods exist: MinMaxScaler, MaxAbsScaler.
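
One practical note that goes beyond the slide: fit and transform can be combined with fit_transform, and in practice the scaler is fitted on training data only and then reused on new data, so both are scaled with the same parameters. A sketch under those assumptions:

    # fit_transform combines the two steps; fit on training data only,
    # then reuse the fitted scaler on new data (standard practice,
    # not something the slide states explicitly).
    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X_train = np.array([[20.0, 1.0], [40.0, 3.0], [60.0, 5.0]])  # illustrative
    X_new = np.array([[30.0, 2.0]])

    StdSc = StandardScaler()
    X_train_scaled = StdSc.fit_transform(X_train)  # fit + transform in one call
    X_new_scaled = StdSc.transform(X_new)          # same parameters, new data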

Multiclass KNN Decision Boundary (K = 5)

With three classes (full remission, partial remission, did not survive) plotted by age against number of malignant nodes, KNN extends directly: each point is assigned the majority class among its five nearest neighbors, which partitions the plane into three regions.

Regression with KNN

KNN also works for regression: the prediction is the average of the target values of the K nearest neighbors. The slide compares fits at K = 1, K = 3, and K = 20: a small K follows the training points closely, while a large K produces a smoother fit.
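
For completeness, a sketch of KNN regression on a synthetic 1-D problem, showing how K trades flexibility for smoothness; the noisy sine-wave data is invented for illustration.

    # KNN regression at several K on synthetic data.
    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 10, 40)).reshape(-1, 1)
    y = np.sin(X).ravel() + rng.normal(0, 0.2, 40)

    for k in (1, 3, 20):
        model = KNeighborsRegressor(n_neighbors=k).fit(X, y)
        # K = 1 reproduces nearby training points; K = 20 averages widely
        print(k, model.predict([[5.0]]))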

Characteristics of a KNN Model
Fast to fit, because the model simply stores the data
Slow to predict, because it requires many distance calculations
Can require a lot of memory if the data set is large

K Nearest Neighbors: The Syntax

Import the class containing the classification method:

    from sklearn.neighbors import KNeighborsClassifier

Create an instance of the class:

    KNN = KNeighborsClassifier(n_neighbors=3)

Fit the instance on the data and then predict the expected value:

    KNN = KNN.fit(X_data, y_data)
    y_predict = KNN.predict(X_data)

The fit and predict/transform syntax will show up throughout the course. Regression can be done with KNeighborsRegressor.
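
Putting the deck's pieces together, a hedged end-to-end sketch: scaling plus KNN classification on iris. The train/test split and accuracy check go beyond the slide, which fits and predicts on the same X_data.

    # Scaling + KNN classification on iris, end to end.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    StdSc = StandardScaler().fit(X_train)      # fit scaler on training data only
    KNN = KNeighborsClassifier(n_neighbors=3)
    KNN = KNN.fit(StdSc.transform(X_train), y_train)

    y_predict = KNN.predict(StdSc.transform(X_test))
    print((y_predict == y_test).mean())        # fraction predicted correctly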