It gives information about the relation between training and test error.
Added: Apr 14, 2021
VC Dimension in Machine Learning
Dr. Varun Kumar
Dr. Varun Kumar Lecture 18 1 / 10
Outline
1. General Classification Problem
2. Usage of VC Dimension in ML
3. Introduction to Vapnik-Chervonenkis (VC) Dimension
4. How to Determine the VC Dimension for a Given Classifier or Hypothesis?
5. References
General classification problem
1. Always look at the test error along with the training error.
2. Improving the training error does not necessarily improve the test error.
3. Increasing the machine capacity may give poor performance on test data.
Is there an equation that relates the training and test errors?
Usage of VC dimension in ML
Model complexity determines the performance/cost on both the training
and test sets.
$$P\left(\text{Test error} \le \text{Training error} + \sqrt{\frac{h\,(\log(2N/h) + 1) - \log(\eta/4)}{N}}\right) \ge 1 - \eta$$
Note: The above expression gives an upper bound on the test error that holds with probability at least $1 - \eta$.
h → VC dimension
h measures the power (capacity) of the classifier
h does not depend on the choice of training set
N → total number of training samples
To reduce the residual (the square-root term): h → low or N → high
Test error ≤ Training error + Penalty(Complexity)
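The effect of h and N on the penalty term can be checked numerically. The following is a minimal sketch that assumes the square-root form of the bound given above with natural logarithms; the function names `vc_penalty` and `bound_on_test_error` and the default `eta=0.05` are illustrative choices, not part of the lecture.

```python
import math


def vc_penalty(h: int, n: int, eta: float = 0.05) -> float:
    """Square-root complexity term: sqrt((h(log(2N/h) + 1) - log(eta/4)) / N)."""
    if h <= 0 or n <= 0:
        raise ValueError("h and n must be positive")
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie strictly between 0 and 1")
    numerator = h * (math.log(2 * n / h) + 1) - math.log(eta / 4)
    return math.sqrt(numerator / n)


def bound_on_test_error(train_error: float, h: int, n: int, eta: float = 0.05) -> float:
    """Upper bound on the test error: training error plus the penalty."""
    return train_error + vc_penalty(h, n, eta)


if __name__ == "__main__":
    # The penalty shrinks as N grows (h fixed) and grows with h (N fixed).
    for n in (100, 1_000, 10_000):
        print(f"h=10, N={n}: penalty={vc_penalty(10, n):.3f}")
    for h in (5, 10, 50):
        print(f"h={h}, N=1000: penalty={vc_penalty(h, 1_000):.3f}")
```

For small N the bound can exceed 1, in which case it says nothing useful about the test error; it becomes informative only when N is large relative to h.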
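Continued
⇒ Let our training data be i.i.d. from some distribution $f_X(x)$.
⇒ Types of risk
(i) $R(\theta)$ → Long-term observation → Test observation
$$R(\theta) = \text{Test error} = E\left[\delta\left(c \ne \hat{c}(x;\theta)\right)\right]$$
(ii) $R_{\text{emp}}(\theta)$ → Finite-sample observation → Training observation
$$R_{\text{emp}}(\theta) = \text{Training error} = \frac{1}{m}\sum_{i}\delta\left(c^{(i)} \ne \hat{c}(x^{(i)};\theta)\right)$$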
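The empirical risk is simply the fraction of training samples that the classifier gets wrong. A minimal sketch follows, assuming a one-dimensional threshold classifier as a stand-in for the classifier with parameter θ; the names `empirical_risk` and `threshold_classifier` and the toy data are hypothetical.

```python
from typing import Callable, Sequence


def empirical_risk(
    predict: Callable[[float], int],
    xs: Sequence[float],
    cs: Sequence[int],
) -> float:
    """Training error: (1/m) * number of samples with c_i != predict(x_i)."""
    m = len(xs)
    if m == 0 or m != len(cs):
        raise ValueError("xs and cs must be non-empty and of equal length")
    mistakes = sum(1 for x, c in zip(xs, cs) if predict(x) != c)
    return mistakes / m


def threshold_classifier(theta: float) -> Callable[[float], int]:
    """Illustrative classifier: predict class 1 when x > theta, else class 0."""
    return lambda x: 1 if x > theta else 0


# Toy training set (made up for illustration): four samples, one mislabelled
# relative to the threshold at 2.0.
xs = [0.5, 1.5, 2.5, 3.5]
cs = [0, 0, 1, 0]
risk = empirical_risk(threshold_classifier(2.0), xs, cs)
print(risk)  # 0.25: one of the four samples is misclassified
```

The true risk R(θ) replaces this finite average by an expectation over the data distribution, which is why it can only be estimated, for example on held-out test data.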
Introduction to Vapnik-Chervonenkis (VC) Dimension
Key features:
⇒ VC dimension is a measure of the capacity (complexity, expressive power, richness, or flexibility) of a set of functions.
⇒ The set of functions is one that can be learned by a statistical binary classification algorithm.
⇒ It is defined as the cardinality of the largest set of points that the algorithm can shatter.
Cardinality refers to the size of a set. Ex: $A = \{1, 4, 6\}$, cardinality $|A| = 3$
⇒ The capacity of a classification model is related to how complicated it can be. → Overfitting
VC dimension of a set-family
Let $H$ be a set family (a set of sets) and $C$ a set. Their intersection is defined as the set family
$$H \cap C := \{h \cap C \mid h \in H\}.$$
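The intersection of a set family with a set can be computed directly for small finite examples. The sketch below also tests the standard set-family notion of shattering, which the slide itself does not state at this point: C is shattered by H when H ∩ C contains every subset of C, that is, when |H ∩ C| = 2^|C|. The names `restrict` and `is_shattered` are illustrative.

```python
from typing import FrozenSet, Iterable, Set


def restrict(H: Iterable[FrozenSet[int]], C: Set[int]) -> Set[FrozenSet[int]]:
    """Return H ∩ C = {h ∩ C | h in H} as a set of frozensets."""
    return {frozenset(h & C) for h in H}


def is_shattered(H: Iterable[FrozenSet[int]], C: Set[int]) -> bool:
    """C is shattered by H when H ∩ C contains all 2^|C| subsets of C."""
    return len(restrict(H, C)) == 2 ** len(C)


# Small made-up family: every subset of {1, 2}.
H = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]

print(is_shattered(H, {1, 2}))     # True: all four subsets of {1, 2} appear
print(is_shattered(H, {1, 2, 3}))  # False: only 4 of the 8 subsets appear
```

Under this definition, the VC dimension of H is the largest cardinality of a set that H shatters.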
Relationship between risk and model complexity
How to determine the VC dimension for a given classifier or hypothesis?
1. General position:
Statement: In an n-dimensional feature space, a set of m points (m > n) is in general position if and only if no subset of (n + 1) points lies on an (n − 1)-dimensional hyperplane.
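In the plane (n = 2), general position means that no three points lie on a common line. A minimal sketch for this special case only, assuming points are given as (x, y) tuples; the function names and the tolerance `tol` are illustrative choices.

```python
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def collinear(p: Point, q: Point, r: Point, tol: float = 1e-12) -> bool:
    """Three points are collinear when the triangle they span has zero area."""
    twice_area = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(twice_area) <= tol


def in_general_position_2d(points: Sequence[Point]) -> bool:
    """True when no three of the given 2-D points lie on one line."""
    return not any(collinear(p, q, r) for p, q, r in combinations(points, 3))


square = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(in_general_position_2d(square))             # True
print(in_general_position_2d(square + [(2, 0)]))  # False: (0,0), (1,0), (2,0) share a line
```

The check tests every triple, so its cost grows as the cube of the number of points; that is acceptable for the handful of points used in VC-dimension arguments.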
2. Shattering:
Statement: A hypothesis class H shatters m points in n-dimensional space if every possible assignment of class labels to the m points is correctly classified by some hypothesis in H.
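Shattering can be verified by brute force for simple hypothesis classes. The sketch below uses closed intervals [a, b] on the real line as the hypothesis class (points inside the interval get label 1, points outside get label 0); this class is chosen for illustration and is not taken from the lecture. It assumes the points are distinct, and it tries only intervals whose endpoints are data points, which is enough because any interval labels the points the same way as the interval between its smallest and largest positive point.

```python
from itertools import product
from typing import Sequence, Set, Tuple

Labeling = Tuple[int, ...]


def interval_labelings(points: Sequence[float]) -> Set[Labeling]:
    """All labelings of `points` produced by some interval [a, b]."""
    # An interval containing none of the points labels every point 0.
    achievable = {tuple(0 for _ in points)}
    for a in points:
        for b in points:
            if a <= b:
                achievable.add(tuple(1 if a <= x <= b else 0 for x in points))
    return achievable


def shattered_by_intervals(points: Sequence[float]) -> bool:
    """True when intervals realise all 2^m labelings of the m points."""
    every_labeling = set(product((0, 1), repeat=len(points)))
    return interval_labelings(points) == every_labeling


print(shattered_by_intervals([1.0, 2.0]))       # True
print(shattered_by_intervals([1.0, 2.0, 3.0]))  # False: labeling (1, 0, 1) is impossible
```

Two points can be shattered but three cannot, which is consistent with the known VC dimension of 2 for intervals on the real line.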
References
E. Alpaydin, Introduction to machine learning. MIT Press, 2020.
T. M. Mitchell, The discipline of machine learning. Carnegie Mellon University, School of Computer Science, Machine Learning, 2006, vol. 9.
J. Grus, Data science from scratch: first principles with Python. O'Reilly Media, 2019.