Terminology
Machine Learning, Data Science, Data Mining, Data Analysis, Statistical Learning, Knowledge Discovery in Databases, Pattern Discovery.
Data everywhere!
1. Google: processes 24 petabytes of data per day.
2. Facebook: 10 million photos uploaded every hour.
3. YouTube: 1 hour of video uploaded every second.
4. Twitter: 400 million tweets per day.
5. Astronomy: satellite data is in hundreds of PB.
6. ...
7. "By 2020 the digital universe will reach 44 zettabytes."
The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things, April 2014.
That's 44 trillion gigabytes!
Data types
Data comes in different sizes and also flavors (types):
Texts
Numbers
Clickstreams
Graphs
Tables
Images
Transactions
Videos
Some or all of the above!
Smile, we are 'DATAFIED'!
Wherever we go, we are "datafied".
Smartphones are tracking our locations.
We leave a data trail in our web browsing.
Interaction in social networks.
Privacy is an important issue in Data Science.
Machine Learning definition
"How do we create computer programs that improve with experience?"
Tom Mitchell
http://videolectures.net/mlas06_mitchell_itm/
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Tom Mitchell. Machine Learning, 1997.
Supervised vs. Unsupervised
Given: Training data $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in \mathbb{R}^d$ and $y_i$ is the label.
example $x_1 \rightarrow$   $x_{11}\; x_{12}\; \ldots\; x_{1d}$   $y_1$ (label)
...
example $x_i \rightarrow$   $x_{i1}\; x_{i2}\; \ldots\; x_{id}$   $y_i$ (label)
...
example $x_n \rightarrow$   $x_{n1}\; x_{n2}\; \ldots\; x_{nd}$   $y_n$ (label)
Supervised vs. Unsupervised
Unsupervised learning:
Learning a model from unlabeled data.
Supervised learning:
Learning a model from labeled data.
Unsupervised Learning
Training data: "examples" $x$.
$x_1, \ldots, x_n$, with $x_i \in X \subseteq \mathbb{R}^d$
Clustering/segmentation:
$f : \mathbb{R}^d \rightarrow \{C_1, \ldots, C_k\}$ (set of clusters).
Example: Find clusters in the population, fruits, species.
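To make the clustering map $f$ concrete, here is a minimal sketch (an addition, not from the slides) using scikit-learn's KMeans on synthetic 2-D points; the blob data and the choice of k = 3 are assumptions made only for illustration.

```python
# A minimal clustering sketch (assumes scikit-learn is available);
# the synthetic data and the choice of k = 3 are illustrative only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three blobs in R^2 standing in for "population, fruits, species" style data.
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(4, 4), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 4), scale=0.5, size=(50, 2)),
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])        # cluster index C_1..C_k assigned to the first examples
print(kmeans.cluster_centers_)    # one center per cluster
```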
Supervised learning
Training data: "examples" $x$ with "labels" $y$.
$(x_1, y_1), \ldots, (x_n, y_n)$, with $x_i \in \mathbb{R}^d$
Regression: $y$ is a real value, $y \in \mathbb{R}$
$f : \mathbb{R}^d \rightarrow \mathbb{R}$; $f$ is called a regressor.
Example: amount of credit, weight of fruit.
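As a small illustration of a regressor $f : \mathbb{R}^d \rightarrow \mathbb{R}$ (not part of the original slides), here is a hedged sketch fitting a linear model to synthetic fruit length/weight data with scikit-learn; the data and coefficients are invented for the example.

```python
# A minimal regression sketch (assumes scikit-learn); the synthetic
# "weight of fruit as a function of its length" data is made up for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
length = rng.uniform(5, 15, size=(100, 1))             # feature x in R^1
weight = 12.0 * length[:, 0] + rng.normal(0, 5, 100)   # real-valued label y

f = LinearRegression().fit(length, weight)             # f : R^d -> R, the regressor
print(f.predict([[10.0]]))                             # predicted weight for a fruit of length 10
```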
Supervised learning
Regression:
[Figure: regression example, e.g., income as a function of age]
Example: Income as a function of age, weight of the fruit as a function of its length.
K-nearest neighbors
Not every ML method builds a model!
Our first ML method: KNN.
Main idea: Uses the similarity between examples.
Assumption: Two similar examples should have the same label.
Assumes all examples (instances) are points in the d-dimensional space $\mathbb{R}^d$.
K-nearest neighbors
KNN uses the standard Euclidean distance to define nearest neighbors.
Given two examples $x_i$ and $x_j$:
$d(x_i, x_j) = \sqrt{\sum_{k=1}^{d} (x_{ik} - x_{jk})^2}$
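The distance above translates directly into code; the following NumPy sketch (an addition, not from the slides) computes it for two example vectors.

```python
# Euclidean distance between two examples x_i, x_j in R^d (plain NumPy sketch).
import numpy as np

def euclidean_distance(x_i, x_j):
    x_i, x_j = np.asarray(x_i, dtype=float), np.asarray(x_j, dtype=float)
    return np.sqrt(np.sum((x_i - x_j) ** 2))

print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # 5.0
```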
K-nearest neighbors
Training algorithm:
Add each training example $(x, y)$ to the dataset $D$.
$x \in \mathbb{R}^d$, $y \in \{+1, -1\}$.
Classification algorithm:
Given an example $x_q$ to be classified. Suppose $N_k(x_q)$ is the set of the K nearest neighbors of $x_q$.
$\hat{y}_q = \mathrm{sign}\left(\sum_{x_i \in N_k(x_q)} y_i\right)$
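Putting the training and classification steps together, here is a from-scratch sketch of the rule $\hat{y}_q = \mathrm{sign}(\sum_{x_i \in N_k(x_q)} y_i)$ (not from the slides); the toy dataset and the tie-breaking choice (ties go to +1) are assumptions made for illustration.

```python
# A from-scratch K-NN sketch matching the slide's rule
# y_hat = sign(sum of neighbor labels), with labels in {+1, -1}.
# The tiny dataset below is invented for illustration.
import numpy as np

def knn_classify(x_q, X, y, k=3):
    """Classify query x_q from training set (X, y) with labels in {+1, -1}."""
    dists = np.sqrt(np.sum((X - x_q) ** 2, axis=1))   # Euclidean distance to every example
    nearest = np.argsort(dists)[:k]                    # indices of the k nearest neighbors
    vote = np.sum(y[nearest])                          # sum of the neighbor labels
    return 1 if vote >= 0 else -1                      # sign of the vote (ties -> +1 here)

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([+1, +1, +1, -1, -1, -1])
print(knn_classify(np.array([0.5, 0.5]), X, y, k=3))   # expected +1
print(knn_classify(np.array([5.5, 5.5]), X, y, k=3))   # expected -1
```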
K-nearest neighbors
[Figure] 3-NN. Credit: Introduction to Statistical Learning.
Question: Draw an approximate decision boundary for K = 3?
K-nearest neighbors
[Figure] Credit: Introduction to Statistical Learning.
K-nearest neighbors
Question: What are the pros and cons of K-NN?
Pros:
+ Simple to implement.
+ Training is trivial (just store the examples).
+ Few parameters to tune (K and the distance metric).
+ Naturally handles multi-class problems.
Cons:
- All training examples must be stored.
- Classifying a new example is expensive: with n examples and d features, the method takes O(nd) to run.
- Suffers from the curse of dimensionality.
Applications of K-NN
1. ...
2. ... large databases.
3. ...
4. ...
5. ...
6. ...
Training and Testing
[Figure: training pipeline, a training set with features (income, gender, age, family status, zipcode) and labels (credit amount $, credit yes/no) is fed to an ML algorithm, which outputs a model f(x)]
Question: How can we be confident about f?
Training and Testing
We calculate $E_{\mathrm{train}}$, the in-sample error (training error or empirical error/risk):
$E_{\mathrm{train}}(f) = \sum_{i=1}^{n} \mathrm{loss}(y_i, f(x_i))$
Examples of loss functions:
- $\mathrm{loss}(y_i, f(x_i)) = \begin{cases} 1 & \text{if } \mathrm{sign}(y_i) \neq \mathrm{sign}(f(x_i)) \\ 0 & \text{otherwise} \end{cases}$
- $\mathrm{loss}(y_i, f(x_i)) = (y_i - f(x_i))^2$
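As an illustration (not from the slides), the snippet below evaluates $E_{\mathrm{train}}(f)$ under both example losses for a handful of made-up labels $y_i$ and predictions $f(x_i)$.

```python
# Computing the training error E_train(f) for the two example losses above
# (a sketch; the predictions f(x_i) and labels y_i below are made up).
import numpy as np

y      = np.array([+1, -1, +1, +1, -1], dtype=float)   # true labels y_i
f_of_x = np.array([+0.8, -0.3, -0.2, +1.5, -0.9])      # model outputs f(x_i)

zero_one_loss = np.sum(np.sign(y) != np.sign(f_of_x))  # counts sign disagreements
squared_loss  = np.sum((y - f_of_x) ** 2)              # sum of squared residuals

print(zero_one_loss)   # 1 misclassified example in this toy case
print(squared_loss)
```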
We aim to have $E_{\mathrm{train}}(f)$ small, i.e., we minimize $E_{\mathrm{train}}(f)$.
We hope that $E_{\mathrm{test}}(f)$, the out-of-sample error (test/true error), will be small too.
Training and Testing
[Figure: three fits of the same data, high bias (underfitting), just right, high variance (overfitting)]
Avoid overfitting
In general, use simple models!
Reduce the number of features manually or do feature selection.
Do a model selection (ML course).
Use regularization (keep the features but reduce their importance by setting small parameter values) (ML course).
Do a cross-validation to estimate the test error.
Regularization: Intuition
We want to minimize:
Classification term $+\; C \times$ Regularization term
$\sum_{i=1}^{n} \mathrm{loss}(y_i, f(x_i)) + C\, R(f)$
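To make the objective concrete, here is a small sketch (an addition, not from the slides) that evaluates it for a linear model $f(x) = w \cdot x$ with squared loss and $R(f) = \lVert w \rVert^2$; these particular choices of loss and regularizer are assumptions made for illustration.

```python
# A sketch of the regularized objective on the slide, for a linear model
# f(x) = w . x with squared loss and R(f) = ||w||^2 (illustrative choices).
import numpy as np

def regularized_objective(w, X, y, C):
    """sum_i (y_i - w.x_i)^2  +  C * ||w||^2"""
    residuals = y - X @ w
    loss_term = np.sum(residuals ** 2)   # data-fit (classification/regression) term
    reg_term = np.sum(w ** 2)            # regularization term R(f)
    return loss_term + C * reg_term

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
print(regularized_objective(np.array([1.0, 2.0]), X, y, C=0.1))
```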
Train, Validation and Test
[TRAIN | VALIDATION | TEST]
Example: Split the data randomly into 60% for training, 20% for validation and 20% for testing.
1. Training set: used by the learning algorithm to fit a model (e.g., a classification model).
2. Validation set: not used for learning the model but can help tune model parameters (e.g., selecting K in K-NN). Validation helps control overfitting.
3. Test set: used only to assess the final model and provide an estimation of the test error.
Note: Never use the test set in any way to further tune the parameters or revise the model.
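One possible way to produce the 60/20/20 split described above, sketched with scikit-learn's train_test_split (the data here is a random placeholder standing in for your own X and y):

```python
# A sketch of the 60/20/20 split (assumes scikit-learn; X and y are placeholders).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.default_rng(0).normal(size=(100, 5))
y = np.random.default_rng(1).integers(0, 2, size=100)

# First carve out 20% for the test set, then split the rest 75/25
# so the final proportions are 60% train, 20% validation, 20% test.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```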
K-fold Cross Validation
A method for estimating the test error using training data.
Algorithm:
Given a learning algorithm $A$ and a dataset $D$:
Step 1: Randomly partition $D$ into $k$ equal-size subsets $D_1, \ldots, D_k$.
Step 2: For $j = 1$ to $k$:
  Train $A$ on all $D_i$, $i \in \{1, \ldots, k\}$ and $i \neq j$, and get $f_j$.
  Apply $f_j$ to $D_j$ and compute $E_{D_j}$.
Step 3: Average the error over all folds:
$\frac{1}{k} \sum_{j=1}^{k} E_{D_j}$
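The algorithm above can be sketched in a few lines; the snippet below (an addition, not from the slides) uses scikit-learn's KFold with K-NN (K = 3) standing in for the learning algorithm $A$ on toy data.

```python
# A K-fold cross-validation sketch following the algorithm above
# (assumes scikit-learn; the toy data and the choice of K-NN are illustrative).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy labels

k = 5
errors = []
for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
    f_j = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])  # train A on D \ D_j
    errors.append(1.0 - f_j.score(X[test_idx], y[test_idx]))                   # error E_{D_j} on fold D_j

print(np.mean(errors))   # average error over the k folds
```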
Terminology review
Review the concepts and terminology:
Instance, example, feature, label, supervised learning, unsupervised learning, classification, regression, clustering, prediction, training set, validation set, test set, K-fold cross validation, classification error, loss function, overfitting, underfitting, regularization.
Machine Learning Books
1. ...
2. Learning From Data. Y. Abu-Mostafa, M. Magdon-Ismail, H.-T. Lin (Hsuan-Tien). AMLBook.
3. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. T. Hastie, R. Tibshirani, J. Friedman.
4. ...
5. Pattern Classification. R. Duda, P. Hart, D. Stork. Wiley.
Machine Learning Resources
Major journals/conferences: ICML, NIPS, UAI, ECML/PKDD, JMLR, MLJ, etc.
Machine learning video lectures:
http://videolectures.net/Top/Computer_Science/Machine_Learning/
Machine Learning (Theory):
http://hunch.net/
LinkedIn ML groups: "Big Data" Scientist, etc.
Women in Machine Learning:
https://groups.google.com/forum/#!forum/women-in-machine-learning
KDnuggets: http://www.kdnuggets.com/
Credit
The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd Edition, 2009. T. Hastie, R. Tibshirani, J. Friedman.
Machine Learning, 1997. Tom Mitchell.