Machine Learning ebook.pdf

149 views · 70 slides · May 27, 2023

Machine Learning: Basic Concepts

[Figure: data points plotted against Feature 1 and Feature 2, separated by a decision boundary.]

Terminology
Machine Learning, Data Science, Data Mining, Data Analysis, Statistical Learning, Knowledge Discovery in Databases, Pattern Discovery.

Data everywhere!
1. Google: processes 24 petabytes of data per day.
2. Facebook: 10 million photos uploaded every hour.
3. YouTube: 1 hour of video uploaded every second.
4. Twitter: 400 million tweets per day.
5. Astronomy: satellite data is in hundreds of petabytes.
6. ...
7. "By 2020 the digital universe will reach 44 zettabytes..."
The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things, April 2014.
That's 44 trillion gigabytes!

Data types
Data comes in different sizes and also flavors (types):
Texts
Numbers
Clickstreams
Graphs
Tables
Images
Transactions
Videos
Some or all of the above!

Smile, we are 'DATAFIED'!
Wherever we go, we are "datafied".
Smartphones are tracking our locations.
We leave a data trail in our web browsing.
We interact in social networks.
Privacy is an important issue in Data Science.

The Data Science process

[Figure: the Data Science process over time, in five stages.]
1. Data collection: static data, databases, domain expertise.
2. Data preparation: data cleaning, feature/variable engineering.
3. EDA: visualization, descriptive statistics, clustering; research questions?
4. Machine learning: classification, scoring, predictive models, clustering, density estimation, etc.
5. Data-driven decisions: application deployment, model (f), predicted class/risk, dashboard.

Applications of ML
We all use it on a daily basis. Examples:
Spam filtering
Credit card fraud detection
Digit recognition on checks, zip codes
Detecting faces in images
MRI image analysis
Recommendation systems
Search engines
Handwriting recognition
Scene classification
etc.

ML is an interdisciplinary field

[Diagram: ML at the intersection of Statistics, Visualization, Economics, Databases, Signal processing, Engineering, and Biology.]

ML versus Statistics
Statistics:
Hypothesis testing
Experimental design
ANOVA
Linear regression
Logistic regression
GLM
PCA
Machine Learning:
Decision trees
Rule induction
Neural networks
SVMs
Clustering methods
Association rules
Feature selection
Visualization
Graphical models
Genetic algorithms
http://statweb.stanford.edu/~jhf/ftp/dm-stat.pdf

Machine Learning definition
"How do we create computer programs that improve with experience?"
Tom Mitchell
http://videolectures.net/mlas06_mitchell_itm/

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Tom Mitchell. Machine Learning 1997.

Supervised vs. Unsupervised
Given: training data (x_1, y_1), ..., (x_n, y_n), where x_i ∈ R^d and y_i is the label.

example x_1 -> x_11 x_12 ... x_1d   y_1 (label)
...
example x_i -> x_i1 x_i2 ... x_id   y_i (label)
...
example x_n -> x_n1 x_n2 ... x_nd   y_n (label)

Supervised vs. Unsupervised
Unsupervised learning: learning a model from unlabeled data.
Supervised learning: learning a model from labeled data.

Unsupervised Learning
Training data: "examples" x.
x_1, ..., x_n, with x_i ∈ X ⊂ R^n
Clustering/segmentation:
f : R^d -> {C_1, ..., C_k} (set of clusters).
Example: find clusters in the population, fruits, species.

Unsupervised learning

[Figure: unlabeled points in the Feature 1 / Feature 2 plane, grouped into clusters.]
Methods: K-means, Gaussian mixtures, hierarchical clustering, spectral clustering, etc.

Supervised learning
Training data: "examples" x with "labels" y.
(x_1, y_1), ..., (x_n, y_n), with x_i ∈ R^d
Classification: y is discrete. To simplify, y ∈ {-1, +1}.
f : R^d -> {-1, +1}; f is called a binary classifier.
Example: approve credit yes/no, spam/ham, banana/orange.

Supervised learning

[Figure: labeled points in the Feature 1 / Feature 2 plane, separated by a decision boundary.]
Methods: Support Vector Machines, neural networks, decision trees, K-nearest neighbors, naive Bayes, etc.

Supervised learning
Classification:

[Figure: several classification examples in the Feature 1 / Feature 2 plane, with different decision boundaries.]

Supervised learning
Nonlinear classification

Supervised learning
Training data: "examples" x with "labels" y.
(x_1, y_1), ..., (x_n, y_n), with x_i ∈ R^d
Regression: y is a real value, y ∈ R.
f : R^d -> R; f is called a regressor.
Example: amount of credit, weight of fruit.

Supervised learning
Regression:

[Figure: Y plotted against Feature 1, with a fitted regression curve.]
Example: income as a function of age, weight of the fruit as a function of its length.

Training and Testing

[Figure: a training set (income, gender, age, family status, zipcode) is fed to an ML algorithm, which outputs a model (f) predicting credit amount ($) or credit yes/no.]

K-nearest neighbors
Not every ML method builds a model!
Our first ML method: KNN.
Main idea: uses the similarity between examples.
Assumption: two similar examples should have the same label.
Assumes all examples (instances) are points in the d-dimensional space R^d.

K-nearest neighbors
KNN uses the standard Euclidean distance to define nearest neighbors.
Given two examples x_i and x_j:

d(x_i, x_j) = sqrt( sum_{k=1}^{d} (x_ik - x_jk)^2 )

K-nearest neighbors
Training algorithm:
Add each training example (x, y) to the dataset D, where x ∈ R^d and y ∈ {+1, -1}.

Classification algorithm:
Given an example x_q to be classified, let N_k(x_q) be the set of the K nearest neighbors of x_q. Then:

y_hat_q = sign( sum_{x_i ∈ N_k(x_q)} y_i )
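The training and classification steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: "training" just stores the labeled examples, and classification returns the sign of the summed neighbor labels; the toy dataset is invented for the example.

```python
import math

def knn_classify(train, x_query, k=3):
    """train: list of (x, y) pairs with x a tuple in R^d and y in {-1, +1}.
    Classify x_query by the sign of the sum of its k nearest labels."""
    def dist(a, b):
        # standard Euclidean distance, as on the previous slide
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    # "training" = keeping the dataset; here we just sort it by distance
    neighbors = sorted(train, key=lambda pair: dist(pair[0], x_query))[:k]
    vote = sum(y for _, y in neighbors)
    return 1 if vote >= 0 else -1

# Toy dataset: two clusters in R^2
train = [((0.0, 0.0), -1), ((0.1, 0.2), -1), ((0.2, 0.1), -1),
         ((1.0, 1.0), +1), ((1.1, 0.9), +1), ((0.9, 1.1), +1)]

print(knn_classify(train, (0.15, 0.15), k=3))  # -> -1
print(knn_classify(train, (1.05, 1.0), k=3))   # -> 1
```

Note that sorting all n examples for every query already hints at the O(nd)-per-query cost discussed below.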

K-nearest neighbors

[Figure: 3-NN example. Credit: Introduction to Statistical Learning.]
Question: Draw an approximate decision boundary for K = 3.

[Figure: KNN decision boundaries. Credit: Introduction to Statistical Learning.]

K-nearest neighbors
Question: What are the pros and cons of K-NN?
Pros:
+ ...
+ ... parameters.
Cons:
- ...
- With n examples and d features, the method takes O(nd) to run.
- Curse of dimensionality.

Applications of K-NN
1. ...
2. ... large databases.
3.-6. ...

Training and Testing

[Figure: training set -> ML algorithm -> model (f), as before.]
Question: How can we be confident about f?

Training and Testing
We calculate E_train, the in-sample error (training error, or empirical error/risk):

E_train(f) = sum_{i=1}^{n} loss(y_i, f(x_i))

Examples of loss functions:
- 0/1 loss: loss(y_i, f(x_i)) = 1 if sign(y_i) ≠ sign(f(x_i)), and 0 otherwise.
- Squared loss: loss(y_i, f(x_i)) = (y_i - f(x_i))^2.

We aim to have E_train(f) small, i.e., to minimize E_train(f).
We hope that E_test(f), the out-of-sample error (test/true error), will be small too.
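The training error under the 0/1 loss can be computed directly from these definitions. A minimal sketch, using an invented 1-D threshold classifier and toy labeled data purely for illustration:

```python
def zero_one_loss(y, fx):
    # 1 when the predicted sign disagrees with the label's sign, else 0
    return 1 if (fx >= 0) != (y >= 0) else 0

def train_error(f, data):
    """E_train(f) = sum of per-example losses over the training set."""
    return sum(zero_one_loss(y, f(x)) for x, y in data)

# Hypothetical classifier: predict +1 when x > 0.5, else -1
f = lambda x: 1 if x > 0.5 else -1
data = [(0.2, -1), (0.4, -1), (0.6, +1), (0.8, +1), (0.3, +1)]

print(train_error(f, data))  # -> 1 (only x = 0.3 is misclassified)
```

Swapping in the squared loss from the slide would turn the same loop into a regression training error.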

Overfitting/underfitting
An intuitive example

Structural Risk Minimization

[Figure: prediction error versus model complexity (low to high). The training error decreases steadily as complexity grows, while the test error is U-shaped: high bias / low variance (underfitting) at low complexity, low bias / high variance (overfitting) at high complexity, with good models in between.]

Training and Testing

[Figure: three fits of income as a function of age: high bias (underfitting), just right, and high variance (overfitting).]

Avoid overfitting
In general, use simple models!
Reduce the number of features manually, or do feature selection.
Do a model selection (ML course).
Use regularization (keep the features but reduce their importance by setting small parameter values) (ML course).
Do a cross-validation to estimate the test error.

Regularization: Intuition
We want to minimize:
Classification term + C × Regularization term

sum_{i=1}^{n} loss(y_i, f(x_i)) + C R(f)

Regularization: Intuition

[Figure: three fits of income as a function of age, of increasing polynomial degree.]
f(x) = θ_0 + θ_1 x                                    ... (1)
f(x) = θ_0 + θ_1 x + θ_2 x^2                          ... (2)
f(x) = θ_0 + θ_1 x + θ_2 x^2 + θ_3 x^3 + θ_4 x^4      ... (3)
Hint: Avoid high-degree polynomials.
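The regularized objective above can be evaluated for a candidate polynomial model. A small sketch under stated assumptions: the slide leaves R(f) unspecified, so this example uses a ridge-style penalty (sum of squared parameters, excluding θ_0), and the data points are made up for illustration.

```python
def squared_loss_total(coeffs, data):
    # f(x) = theta_0 + theta_1*x + theta_2*x^2 + ...
    f = lambda x: sum(t * x ** i for i, t in enumerate(coeffs))
    return sum((y - f(x)) ** 2 for x, y in data)

def regularized_objective(coeffs, data, C=1.0):
    """Loss term + C * R(f), with R(f) = sum of squared parameters
    (a ridge-style choice; the slide leaves R unspecified)."""
    R = sum(t ** 2 for t in coeffs[1:])  # theta_0 is usually not penalized
    return squared_loss_total(coeffs, data) + C * R

data = [(0.0, 1.0), (1.0, 2.1), (2.0, 2.9)]
line = [1.0, 1.0]  # model (1): theta_0 + theta_1*x
print(round(regularized_objective(line, data, C=0.1), 2))  # -> 0.12
```

Larger C pushes the minimizer toward smaller coefficients, which is exactly the "avoid high-degree polynomials" hint expressed numerically.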

Train, Validation and Test
TRAIN | VALIDATION | TEST
Example: Split the data randomly into 60% for training, 20% for validation and 20% for testing.
1. Training set: used to learn the model (e.g., a classification model).
2. Validation set: not used for learning the model, but can help tune model parameters (e.g., selecting K in K-NN). Validation helps control overfitting.
3. Test set: used to evaluate the final model and provide an estimation of the test error.
Note: Never use the test set in any way to further tune the parameters or revise the model.
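The 60/20/20 random split from the example can be sketched as follows; the function name and fixed seed are illustrative choices, not part of the slides.

```python
import random

def split_data(data, seed=0, frac_train=0.6, frac_val=0.2):
    """Random 60/20/20 split into train/validation/test,
    matching the slide's example proportions."""
    data = list(data)
    random.Random(seed).shuffle(data)  # fixed seed for reproducibility
    n = len(data)
    n_train = int(frac_train * n)
    n_val = int(frac_val * n)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

train, val, test = split_data(range(100))
print(len(train), len(val), len(test))  # -> 60 20 20
```

Because the split is random, repeating it with different seeds gives a feel for how sensitive the error estimates are to the particular split.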

K-fold Cross Validation
A method for estimating test error using training data.
Algorithm:
Given a learning algorithm A and a dataset D:
Step 1: Randomly partition D into k equal-size subsets D_1, ..., D_k.
Step 2: For j = 1 to k:
    Train A on all D_i, i ∈ {1, ..., k} and i ≠ j, to get f_j.
    Apply f_j to D_j and compute the error E_Dj.
Step 3: Average the error over all folds:

(1/k) sum_{j=1}^{k} E_Dj
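The three steps above translate directly into a short loop. A minimal sketch: the round-robin partition stands in for a random one, and the "learner" (predict the mean label) and its error function are invented placeholders for an arbitrary algorithm A.

```python
def k_fold_cv(learn, error, data, k=5):
    """Estimate test error: partition data into k folds, train on the
    other k-1 folds, evaluate on the held-out fold, average the errors."""
    folds = [data[j::k] for j in range(k)]   # simple round-robin partition
    errs = []
    for j in range(k):
        held_out = folds[j]
        rest = [x for i, fold in enumerate(folds) if i != j for x in fold]
        model = learn(rest)                  # f_j, trained without fold j
        errs.append(error(model, held_out))  # E_Dj
    return sum(errs) / k                     # Step 3: average over folds

# Hypothetical learner: predict the mean label; error: mean squared error
learn = lambda d: sum(y for _, y in d) / len(d)
error = lambda m, d: sum((y - m) ** 2 for _, y in d) / len(d)
data = [(x, 2.0) for x in range(10)]         # constant labels -> zero error

print(k_fold_cv(learn, error, data, k=5))    # -> 0.0
```

Running this with each candidate value of a hyperparameter (e.g., K in K-NN) and keeping the one with the lowest average error is the usual model-selection recipe.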

Confusion matrix and evaluation metrics

                     Actual Positive       Actual Negative
Predicted Positive   True Positive (TP)    False Positive (FP)
Predicted Negative   False Negative (FN)   True Negative (TN)

Accuracy = (TP + TN) / (TP + TN + FP + FN): the percentage of predictions that are correct.
Precision = TP / (TP + FP): the percentage of positive predictions that are correct.
Sensitivity (Recall) = TP / (TP + FN): the percentage of positive cases that were predicted as positive.
Specificity = TN / (TN + FP): the percentage of negative cases that were predicted as negative.
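The four metrics fall straight out of the confusion-matrix counts. A small sketch with made-up counts for illustration:

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall (sensitivity), and specificity
    from the confusion-matrix counts, as defined above."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "precision":   tp / (tp + fp),
        "recall":      tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

m = metrics(tp=40, fp=10, fn=5, tn=45)
print(m["accuracy"])     # -> 0.85
print(m["precision"])    # -> 0.8
print(round(m["recall"], 3))       # -> 0.889
print(round(m["specificity"], 3))  # -> 0.818
```

Note that with imbalanced classes, accuracy alone can look good while recall or specificity is poor, which is why all four are reported.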

Terminology review
Review the concepts and terminology:
Instance, example, feature, label, supervised learning, unsupervised learning, classification, regression, clustering, prediction, training set, validation set, test set, K-fold cross validation, classification error, loss function, overfitting, underfitting, regularization.

Machine Learning Books
1. ...
2. Y. Abu-Mostafa, M. Magdon-Ismail, and H.-T. Lin. Learning From Data. AMLBook.
3. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. T. Hastie, R. Tibshirani, J. Friedman.
4. ...
5. ... Classification. Wiley.

Machine Learning Resources
Major journals/conferences: ICML, NIPS, UAI, ECML/PKDD, JMLR, MLJ, etc.
Machine learning video lectures: http://videolectures.net/Top/Computer_Science/Machine_Learning/
Machine Learning (Theory): http://hunch.net/
LinkedIn ML groups: "Big Data" Scientist, etc.
Women in Machine Learning: https://groups.google.com/forum/#!forum/women-in-machine-learning
KDnuggets: http://www.kdnuggets.com/

Credit
The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 10th printing, 2009. T. Hastie, R. Tibshirani, J. Friedman.
Machine Learning. 1997. Tom Mitchell.