Machine Learning ebook.pdf

149 views · 70 slides · May 27, 2023

Machine Learning: Basic Concepts

[Figure: data points plotted against Feature 1 and Feature 2, separated by a decision boundary.]

Terminology
Machine Learning, Data Science, Data Mining, Data Analysis, Statistical Learning, Knowledge Discovery in Databases, Pattern Discovery.

Data everywhere!
1. Google: processes 24 petabytes of data per day.
2. Facebook: 10 million photos uploaded every hour.
3. YouTube: 1 hour of video uploaded every second.
4. Twitter: 400 million tweets per day.
5. Astronomy: satellite data is in hundreds of petabytes.
6. ...
7. "By 2020 the digital universe will reach 44 zettabytes..."
The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things, April 2014.
That's 44 trillion gigabytes!

Data types
Data comes in different sizes and also flavors (types):
Texts
Numbers
Clickstreams
Graphs
Tables
Images
Transactions
Videos
Some or all of the above!

Smile, we are 'DATAFIED'!
Wherever we go, we are "datafied".
Smartphones are tracking our locations.
We leave a data trail in our web browsing.
We interact in social networks.
Privacy is an important issue in Data Science.

The Data Science process

[Figure: the Data Science process over time, in five stages.]
1. Data collection: static data, databases, domain expertise.
2. Data preparation: data cleaning, feature/variable engineering.
3. EDA: visualization, descriptive statistics, clustering; research questions?
4. Machine learning: classification, scoring, predictive models, clustering, density estimation, etc.
5. Data-driven decisions: application deployment, model (f), predicted class/risk, dashboard.

Applications of ML
We all use it on a daily basis. Examples:
Spam filtering
Credit card fraud detection
Digit recognition on checks, zip codes
Detecting faces in images
MRI image analysis
Recommendation systems
Search engines
Handwriting recognition
Scene classification
etc.

ML is an interdisciplinary field

[Diagram: ML at the intersection of Statistics, Visualization, Economics, Databases, Signal processing, Engineering, and Biology.]

ML versus Statistics
Statistics:
Hypothesis testing
Experimental design
ANOVA
Linear regression
Logistic regression
GLM
PCA
Machine Learning:
Decision trees
Rule induction
Neural networks
SVMs
Clustering methods
Association rules
Feature selection
Visualization
Graphical models
Genetic algorithms
http://statweb.stanford.edu/~jhf/ftp/dm-stat.pdf

Machine Learning definition
"How do we create computer programs that improve with experience?"
Tom Mitchell
http://videolectures.net/mlas06_mitchell_itm/

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Tom Mitchell. Machine Learning 1997.

Supervised vs. Unsupervised
Given: training data (x_1, y_1), ..., (x_n, y_n), where x_i ∈ R^d and y_i is the label.

example x_1 -> x_11 x_12 ... x_1d   y_1 (label)
...
example x_i -> x_i1 x_i2 ... x_id   y_i (label)
...
example x_n -> x_n1 x_n2 ... x_nd   y_n (label)

Supervised vs. Unsupervised
Unsupervised learning: learning a model from unlabeled data.
Supervised learning: learning a model from labeled data.

Unsupervised Learning
Training data: "examples" x.
x_1, ..., x_n, with x_i ∈ X ⊂ R^n
Clustering/segmentation:
f : R^d -> {C_1, ..., C_k} (set of clusters).
Example: find clusters in the population, fruits, species.

Unsupervised learning

[Figure: unlabeled points in the Feature 1 / Feature 2 plane, grouped into clusters.]
Methods: K-means, Gaussian mixtures, hierarchical clustering, spectral clustering, etc.

Supervised learning
Training data: "examples" x with "labels" y.
(x_1, y_1), ..., (x_n, y_n), with x_i ∈ R^d
Classification: y is discrete. To simplify, y ∈ {-1, +1}.
f : R^d -> {-1, +1}; f is called a binary classifier.
Example: approve credit yes/no, spam/ham, banana/orange.

Supervised learning

[Figure: labeled points in the Feature 1 / Feature 2 plane, separated by a decision boundary.]
Methods: Support Vector Machines, neural networks, decision trees, K-nearest neighbors, naive Bayes, etc.

Supervised learning
Classification:

[Figure: several classification examples in the Feature 1 / Feature 2 plane, with different decision boundaries.]

Supervised learning
Nonlinear classification

Supervised learning
Training data: "examples" x with "labels" y.
(x_1, y_1), ..., (x_n, y_n), with x_i ∈ R^d
Regression: y is a real value, y ∈ R.
f : R^d -> R; f is called a regressor.
Example: amount of credit, weight of fruit.

Supervised learning
Regression:

[Figure: Y plotted against Feature 1, with a fitted regression curve.]
Example: income as a function of age, weight of the fruit as a function of its length.

Training and Testing

[Figure: a training set (income, gender, age, family status, zipcode) is fed to an ML algorithm, which outputs a model (f) predicting credit amount ($) or credit yes/no.]

K-nearest neighbors
Not every ML method builds a model!
Our first ML method: KNN.
Main idea: uses the similarity between examples.
Assumption: two similar examples should have the same label.
Assumes all examples (instances) are points in the d-dimensional space R^d.

K-nearest neighbors
KNN uses the standard Euclidean distance to define nearest neighbors.
Given two examples x_i and x_j:

d(x_i, x_j) = sqrt( sum_{k=1}^{d} (x_ik - x_jk)^2 )

K-nearest neighbors
Training algorithm:
Add each training example (x, y) to the dataset D, where x ∈ R^d and y ∈ {+1, -1}.

Classification algorithm:
Given an example x_q to be classified, let N_k(x_q) be the set of the K nearest neighbors of x_q. Then:

y_hat_q = sign( sum_{x_i ∈ N_k(x_q)} y_i )
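The training and classification steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: "training" just stores the labeled examples, and classification returns the sign of the summed neighbor labels; the toy dataset is invented for the example.

```python
import math

def knn_classify(train, x_query, k=3):
    """train: list of (x, y) pairs with x a tuple in R^d and y in {-1, +1}.
    Classify x_query by the sign of the sum of its k nearest labels."""
    def dist(a, b):
        # standard Euclidean distance, as on the previous slide
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    # "training" = keeping the dataset; here we just sort it by distance
    neighbors = sorted(train, key=lambda pair: dist(pair[0], x_query))[:k]
    vote = sum(y for _, y in neighbors)
    return 1 if vote >= 0 else -1

# Toy dataset: two clusters in R^2
train = [((0.0, 0.0), -1), ((0.1, 0.2), -1), ((0.2, 0.1), -1),
         ((1.0, 1.0), +1), ((1.1, 0.9), +1), ((0.9, 1.1), +1)]

print(knn_classify(train, (0.15, 0.15), k=3))  # -> -1
print(knn_classify(train, (1.05, 1.0), k=3))   # -> 1
```

Note that sorting all n examples for every query already hints at the O(nd)-per-query cost discussed below.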

K-nearest neighbors

[Figure: 3-NN example. Credit: Introduction to Statistical Learning.]
Question: Draw an approximate decision boundary for K = 3.

[Figure: KNN decision boundaries. Credit: Introduction to Statistical Learning.]

K-nearest neighbors
Question: What are the pros and cons of K-NN?
Pros:
+ ...
+ ... parameters.
Cons:
- ...
- With n examples and d features, the method takes O(nd) to run.
- Curse of dimensionality.

Applications of K-NN
1. ...
2. ... large databases.
3.-6. ...

Training and Testing

[Figure: training set -> ML algorithm -> model (f), as before.]
Question: How can we be confident about f?

Training and Testing
We calculate E_train, the in-sample error (training error, or empirical error/risk):

E_train(f) = sum_{i=1}^{n} loss(y_i, f(x_i))

Examples of loss functions:
- 0/1 loss: loss(y_i, f(x_i)) = 1 if sign(y_i) ≠ sign(f(x_i)), and 0 otherwise.
- Squared loss: loss(y_i, f(x_i)) = (y_i - f(x_i))^2.

We aim to have E_train(f) small, i.e., to minimize E_train(f).
We hope that E_test(f), the out-of-sample error (test/true error), will be small too.
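The training error under the 0/1 loss can be computed directly from these definitions. A minimal sketch, using an invented 1-D threshold classifier and toy labeled data purely for illustration:

```python
def zero_one_loss(y, fx):
    # 1 when the predicted sign disagrees with the label's sign, else 0
    return 1 if (fx >= 0) != (y >= 0) else 0

def train_error(f, data):
    """E_train(f) = sum of per-example losses over the training set."""
    return sum(zero_one_loss(y, f(x)) for x, y in data)

# Hypothetical classifier: predict +1 when x > 0.5, else -1
f = lambda x: 1 if x > 0.5 else -1
data = [(0.2, -1), (0.4, -1), (0.6, +1), (0.8, +1), (0.3, +1)]

print(train_error(f, data))  # -> 1 (only x = 0.3 is misclassified)
```

Swapping in the squared loss from the slide would turn the same loop into a regression training error.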

Overfitting/underfitting
An intuitive example

Structural Risk Minimization

[Figure: prediction error versus model complexity (low to high). The training error decreases steadily as complexity grows, while the test error is U-shaped: high bias / low variance (underfitting) at low complexity, low bias / high variance (overfitting) at high complexity, with good models in between.]

Training and Testing

[Figure: three fits of income as a function of age: high bias (underfitting), just right, and high variance (overfitting).]

Avoid overfitting
In general, use simple models!
Reduce the number of features manually, or do feature selection.
Do a model selection (ML course).
Use regularization (keep the features but reduce their importance by setting small parameter values) (ML course).
Do a cross-validation to estimate the test error.

Regularization: Intuition
We want to minimize:
Classification term + C × Regularization term

sum_{i=1}^{n} loss(y_i, f(x_i)) + C R(f)

Regularization: Intuition

[Figure: three fits of income as a function of age, of increasing polynomial degree.]
f(x) = θ_0 + θ_1 x                                    ... (1)
f(x) = θ_0 + θ_1 x + θ_2 x^2                          ... (2)
f(x) = θ_0 + θ_1 x + θ_2 x^2 + θ_3 x^3 + θ_4 x^4      ... (3)
Hint: Avoid high-degree polynomials.
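The regularized objective above can be evaluated for a candidate polynomial model. A small sketch under stated assumptions: the slide leaves R(f) unspecified, so this example uses a ridge-style penalty (sum of squared parameters, excluding θ_0), and the data points are made up for illustration.

```python
def squared_loss_total(coeffs, data):
    # f(x) = theta_0 + theta_1*x + theta_2*x^2 + ...
    f = lambda x: sum(t * x ** i for i, t in enumerate(coeffs))
    return sum((y - f(x)) ** 2 for x, y in data)

def regularized_objective(coeffs, data, C=1.0):
    """Loss term + C * R(f), with R(f) = sum of squared parameters
    (a ridge-style choice; the slide leaves R unspecified)."""
    R = sum(t ** 2 for t in coeffs[1:])  # theta_0 is usually not penalized
    return squared_loss_total(coeffs, data) + C * R

data = [(0.0, 1.0), (1.0, 2.1), (2.0, 2.9)]
line = [1.0, 1.0]  # model (1): theta_0 + theta_1*x
print(round(regularized_objective(line, data, C=0.1), 2))  # -> 0.12
```

Larger C pushes the minimizer toward smaller coefficients, which is exactly the "avoid high-degree polynomials" hint expressed numerically.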

Train, Validation and Test
TRAIN | VALIDATION | TEST
Example: Split the data randomly into 60% for training, 20% for validation and 20% for testing.
1. Training set: used to learn the model (e.g., a classification model).
2. Validation set: not used for learning the model, but can help tune model parameters (e.g., selecting K in K-NN). Validation helps control overfitting.
3. Test set: used to evaluate the final model and provide an estimation of the test error.
Note: Never use the test set in any way to further tune the parameters or revise the model.
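The 60/20/20 random split from the example can be sketched as follows; the function name and fixed seed are illustrative choices, not part of the slides.

```python
import random

def split_data(data, seed=0, frac_train=0.6, frac_val=0.2):
    """Random 60/20/20 split into train/validation/test,
    matching the slide's example proportions."""
    data = list(data)
    random.Random(seed).shuffle(data)  # fixed seed for reproducibility
    n = len(data)
    n_train = int(frac_train * n)
    n_val = int(frac_val * n)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

train, val, test = split_data(range(100))
print(len(train), len(val), len(test))  # -> 60 20 20
```

Because the split is random, repeating it with different seeds gives a feel for how sensitive the error estimates are to the particular split.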

K-fold Cross Validation
A method for estimating test error using training data.
Algorithm:
Given a learning algorithm A and a dataset D:
Step 1: Randomly partition D into k equal-size subsets D_1, ..., D_k.
Step 2: For j = 1 to k:
    Train A on all D_i, i ∈ {1, ..., k} and i ≠ j, to get f_j.
    Apply f_j to D_j and compute the error E_Dj.
Step 3: Average the error over all folds:

(1/k) sum_{j=1}^{k} E_Dj
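The three steps above translate directly into a short loop. A minimal sketch: the round-robin partition stands in for a random one, and the "learner" (predict the mean label) and its error function are invented placeholders for an arbitrary algorithm A.

```python
def k_fold_cv(learn, error, data, k=5):
    """Estimate test error: partition data into k folds, train on the
    other k-1 folds, evaluate on the held-out fold, average the errors."""
    folds = [data[j::k] for j in range(k)]   # simple round-robin partition
    errs = []
    for j in range(k):
        held_out = folds[j]
        rest = [x for i, fold in enumerate(folds) if i != j for x in fold]
        model = learn(rest)                  # f_j, trained without fold j
        errs.append(error(model, held_out))  # E_Dj
    return sum(errs) / k                     # Step 3: average over folds

# Hypothetical learner: predict the mean label; error: mean squared error
learn = lambda d: sum(y for _, y in d) / len(d)
error = lambda m, d: sum((y - m) ** 2 for _, y in d) / len(d)
data = [(x, 2.0) for x in range(10)]         # constant labels -> zero error

print(k_fold_cv(learn, error, data, k=5))    # -> 0.0
```

Running this with each candidate value of a hyperparameter (e.g., K in K-NN) and keeping the one with the lowest average error is the usual model-selection recipe.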

Confusion matrix and evaluation metrics

                     Actual Positive       Actual Negative
Predicted Positive   True Positive (TP)    False Positive (FP)
Predicted Negative   False Negative (FN)   True Negative (TN)

Accuracy = (TP + TN) / (TP + TN + FP + FN): the percentage of predictions that are correct.
Precision = TP / (TP + FP): the percentage of positive predictions that are correct.
Sensitivity (Recall) = TP / (TP + FN): the percentage of positive cases that were predicted as positive.
Specificity = TN / (TN + FP): the percentage of negative cases that were predicted as negative.
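The four metrics fall straight out of the confusion-matrix counts. A small sketch with made-up counts for illustration:

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall (sensitivity), and specificity
    from the confusion-matrix counts, as defined above."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "precision":   tp / (tp + fp),
        "recall":      tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

m = metrics(tp=40, fp=10, fn=5, tn=45)
print(m["accuracy"])     # -> 0.85
print(m["precision"])    # -> 0.8
print(round(m["recall"], 3))       # -> 0.889
print(round(m["specificity"], 3))  # -> 0.818
```

Note that with imbalanced classes, accuracy alone can look good while recall or specificity is poor, which is why all four are reported.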

Terminology review
Review the concepts and terminology:
Instance, example, feature, label, supervised learning, unsupervised learning, classification, regression, clustering, prediction, training set, validation set, test set, K-fold cross validation, classification error, loss function, overfitting, underfitting, regularization.

Machine Learning Books
1. ...
2. Y. Abu-Mostafa, M. Magdon-Ismail, and H.-T. Lin. Learning From Data. AMLBook.
3. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. T. Hastie, R. Tibshirani, J. Friedman.
4. ...
5. ... Classification. Wiley.

Machine Learning Resources
Major journals/conferences: ICML, NIPS, UAI, ECML/PKDD, JMLR, MLJ, etc.
Machine learning video lectures: http://videolectures.net/Top/Computer_Science/Machine_Learning/
Machine Learning (Theory): http://hunch.net/
LinkedIn ML groups: "Big Data" Scientist, etc.
Women in Machine Learning: https://groups.google.com/forum/#!forum/women-in-machine-learning
KDnuggets: http://www.kdnuggets.com/

Credit
The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 10th printing, 2009. T. Hastie, R. Tibshirani, J. Friedman.
Machine Learning. 1997. Tom Mitchell.