Title: Vapnik-Chervonenkis Dimension
1. Vapnik-Chervonenkis Dimension
- Definition and Lower Bound
- Adapted from Yishai Mansour
2. PAC Learning Model
- There exists a distribution D over domain X
- Examples: ⟨x, c(x)⟩
- use c for the target function (rather than c_t)
- Goal:
- with high probability (1-δ),
- find h in H such that
- error(h, c) < ε,
- with ε arbitrarily small.
3. VC Motivation
- Handle infinite classes.
- VC-dim replaces finite class size.
- Previous lecture (on PAC): specific examples
- rectangle
- interval
- Goal: develop a general methodology.
4. The VC Dimension
- C: a collection of subsets of a universe U
- VC(C): the VC dimension of C
- the size of the largest subset T ⊆ U shattered by C
- T is shattered if every subset T′ ⊆ T is expressible as
- T′ = T ∩ (an element of C)
- Example:
- C = {{a}, {a,c}, {a,b,c}, {b,c}, {b}}
- VC(C) = 2: {b,c} is shattered by C (verified in the sketch below)
- Plays an important role in learning theory, finite automata, comparability theory, and computational geometry
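A brute-force check of this example (a minimal Python sketch; the helper names powerset and shatters are mine, not from the lecture):

    from itertools import chain, combinations

    def powerset(s):
        # all subsets of s, as frozensets
        s = list(s)
        return [frozenset(c) for c in chain.from_iterable(
            combinations(s, r) for r in range(len(s) + 1))]

    def shatters(C, T):
        # T is shattered iff every subset of T equals T & c for some c in C
        traces = {frozenset(T) & frozenset(c) for c in C}
        return all(sub in traces for sub in powerset(T))

    C = [{'a'}, {'a', 'c'}, {'a', 'b', 'c'}, {'b', 'c'}, {'b'}]
    print(shatters(C, {'b', 'c'}))   # True
    print(shatters(C, {'a', 'b'}))   # False: no c in C gives the empty trace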
5. Definitions: Projection
- Given a concept c over X,
- associate it with a set (all positive examples)
- Projection (sets):
- for a concept class C and a subset S,
- Π_C(S) = { c ∩ S : c ∈ C }
- Projection (vectors):
- for a concept class C and S = {x_1, …, x_m},
- Π_C(S) = { ⟨c(x_1), …, c(x_m)⟩ : c ∈ C }
6. Definition: VC-dim
- Clearly |Π_C(S)| ≤ 2^m
- C shatters S if |Π_C(S)| = 2^m
- (S is shattered by C)
- VC dimension of a class C:
- the size d of the largest set S that is shattered by C
- Can be infinite.
- For a finite class C:
- VC-dim(C) ≤ log |C|
7. Example: S is Shattered by C
The VC dimension: a combinatorial measure of the complexity of a function class.
8. Calculating the VC Dimension
- The VC dimension is at least d if there exists some sample S with |S| = d that is shattered by C.
- This does not mean that all samples of size d are shattered by C (three points on a single line in 2D).
- Conversely, to show that the VC dimension is at most d, one must show that no sample of size d+1 is shattered.
- Naturally, proving an upper bound is more difficult than proving a lower bound on the VC dimension. (Both directions appear in the brute-force search below.)
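The two directions translate directly into a search (a sketch reusing the hypothetical shatters helper and the running example C from the earlier sketch; exponential, so only for tiny finite classes):

    from itertools import combinations

    def vc_dimension(C, U):
        # largest d such that some d-subset of universe U is shattered by C
        best = 0
        for d in range(1, len(U) + 1):
            if any(shatters(C, T) for T in combinations(U, d)):
                best = d      # lower bound: one shattered sample suffices
            else:
                return best   # upper bound: no sample of this size works
        return best

    print(vc_dimension(C, ['a', 'b', 'c']))   # 2 for the running example

The early return is justified because shattering is downward closed: if no set of size d is shattered, no set of size d+1 can be either.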
9. Example 1: Interval
[figure: the unit interval, points x ≤ z labeled 1, points x > z labeled 0]
C_1 = { c_z : z ∈ [0,1] },  c_z(x) = 1 ⟺ x ≤ z
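For intuition, enumerate the projection of C_1 on a two-point sample (a sketch; the sample {0.2, 0.8} and the threshold grid are arbitrary choices of mine):

    # projection of the threshold class on S = (0.2, 0.8):
    # c_z(x) = 1 iff x <= z, for thresholds z sweeping [0, 1]
    S = (0.2, 0.8)
    patterns = {tuple(int(x <= z) for x in S)
                for z in [i / 100 for i in range(101)]}
    print(sorted(patterns))   # [(0, 0), (1, 0), (1, 1)]: (0, 1) is missing

Since x ≤ z is monotone, the left point can never be labeled 0 while the right point is labeled 1, so no two points are shattered and VC-dim(C_1) = 1.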
10. Example 2: Line (half-plane)
C_2 = { c_w : w = (a,b,c) },  c_w(x,y) = 1 ⟺ ax + by ≥ c
11. Line (Hyperplane): VC-dim ≥ 3 (three points can be shattered)
12. VC-dim < 4: four points cannot be shattered (checked by the LP sketch below)
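Both slides can be checked by brute force: enumerate every labeling of the points and test linear separability with a small feasibility LP (a sketch assuming numpy and scipy; the point sets are arbitrary choices):

    import numpy as np
    from itertools import product
    from scipy.optimize import linprog

    def separable(points, labels):
        # feasible iff some (a, b, c) has a*x + b*y - c >= 1 on positives
        # and <= -1 on negatives (margin 1 is w.l.o.g. by scaling)
        A, rhs = [], []
        for (x, y), lab in zip(points, labels):
            s = 1 if lab else -1
            A.append([-s * x, -s * y, s])   # -s*(a*x + b*y - c) <= -1
            rhs.append(-1.0)
        res = linprog(c=[0, 0, 0], A_ub=np.array(A), b_ub=np.array(rhs),
                      bounds=[(None, None)] * 3)
        return res.success

    def shattered_by_halfplanes(points):
        return all(separable(points, labs)
                   for labs in product([0, 1], repeat=len(points)))

    triangle = [(0, 0), (1, 0), (0, 1)]
    print(shattered_by_halfplanes(triangle))                  # True: VC-dim >= 3
    print(shattered_by_halfplanes(triangle + [(0.3, 0.3)]))   # False

The second call fails on the labeling that puts the interior point on the opposite side of the three vertices; in general, for any four points either one lies in the convex hull of the other three or they split into two crossing pairs, so no set of four is shattered.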
13. Example 3: Axis-parallel Rectangle
14. VC Dim of Rectangles
15. Example 4: Finite union of intervals
Any labeling of any set of points can be realized by taking enough intervals; thus the VC dimension is infinite.
16. Example 5: Parity
- n Boolean input variables
- T ⊆ {1, …, n}
- f_T(x) = ⊕_{i∈T} x_i
- Lower bound: n (unit vectors; see the sketch below)
- Upper bound:
- number of concepts (2^n, so VC-dim ≤ n)
- linear dependency (any n+1 vectors in GF(2)^n are linearly dependent)
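The lower bound via unit vectors can be verified directly (a minimal sketch; n = 4 is an arbitrary choice): for any desired labeling b of e_1, …, e_n, the set T = {i : b_i = 1} realizes it, since f_T(e_i) = [i ∈ T].

    import itertools

    n = 4
    unit_vectors = [tuple(int(i == j) for j in range(n)) for i in range(n)]

    def parity(T, x):
        # f_T(x): XOR of the coordinates of x indexed by T
        return sum(x[i] for i in T) % 2

    for b in itertools.product([0, 1], repeat=n):
        T = [i for i in range(n) if b[i] == 1]
        assert all(parity(T, e) == b[i] for i, e in enumerate(unit_vectors))
    print('all', 2 ** n, 'labelings realized: VC-dim(parity) >= n')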
17. Example 6: OR
- n Boolean input variables
- P and N subsets of {1, …, n}
- f_{P,N}(x) = (∨_{i∈P} x_i) ∨ (∨_{i∈N} ¬x_i)
- Lower bound: n (unit vectors; see the sketch below)
- Upper bound:
- trivial: 2n
- use ELIM (get n+1)
- show a second vector removes 2 (get n)
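The same unit-vector argument works here with N = ∅ (continuing the previous sketch; the helper name disjunction is mine):

    def disjunction(P, N, x):
        # f_{P,N}(x) = OR of x_i for i in P and of (not x_j) for j in N
        return int(any(x[i] for i in P) or any(1 - x[j] for j in N))

    # with N empty, f(e_i) = [i in P], so P = {i : b_i = 1}
    # realizes any labeling b of the unit vectors
    for b in itertools.product([0, 1], repeat=n):
        P = [i for i in range(n) if b[i] == 1]
        assert all(disjunction(P, [], e) == b[i]
                   for i, e in enumerate(unit_vectors))
    print('unit vectors shattered: VC-dim(OR) >= n')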
18. Example 7: Convex polygons
19. Example 7: Convex polygons (cont.)
20. Example 8: Hyperplane
C_8 = { c_{w,c} : w ∈ ℝ^d },  c_{w,c}(x) = 1 ⟺ ⟨w, x⟩ ≥ c
- VC-dim(C_8) = d+1
- Lower bound:
- unit vectors and the zero vector (see the sketch below)
- Upper bound!
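The lower-bound construction can be made explicit (a sketch assuming numpy; d = 3 and the particular choices of w and c are mine):

    import numpy as np
    from itertools import product

    d = 3
    pts = [np.zeros(d)] + list(np.eye(d))   # zero vector + unit vectors

    # for a labeling (b_0, b_1, ..., b_d): take w_i = 2*b_i - 1 and
    # c = -1/2 if b_0 = 1 else +1/2; then <w, e_i> = +/-1 and <w, 0> = 0
    # fall on the required sides of the threshold
    for b in product([0, 1], repeat=d + 1):
        w = np.array([2 * bi - 1 for bi in b[1:]])
        c = -0.5 if b[0] == 1 else 0.5
        labels = tuple(int(np.dot(w, p) >= c) for p in pts)
        assert labels == b
    print('d + 1 =', d + 1, 'points shattered by halfspaces')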
21. Complexity Questions
- Given C, compute VC(C)
- since VC(C) ≤ log |C|, it can be computed in n^{O(log n)} time (Linial-Mansour-Rivest 88)
- probably can't do better: the problem is LOGNP-complete (Papadimitriou-Yannakakis 96)
- Often C has a small implicit representation
- C(i, x) is a polynomial-size circuit such that
- C(i, x) = 1 iff x belongs to set i
- the implicit version is Σ_3-complete (Schaefer 99)
- (as hard as deciding ∃a ∀b ∃c φ(a, b, c) for a CNF formula φ)
22. Sampling Lemma
Lemma: Let W ⊆ X with |W| ≥ ε|X|. A set of O((1/ε) ln(1/δ)) points sampled independently and uniformly at random from X intersects W with probability at least 1-δ.
Proof: Each sample lands in W with probability at least ε, so the probability that all m samples miss W is at most (1-ε)^m ≤ e^{-εm}, which is at most δ for m = (1/ε) ln(1/δ).
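A quick simulation of the lemma (a sketch; the universe size, ε, δ, and trial count are arbitrary choices):

    import math
    import random

    eps, delta = 0.1, 0.05
    m = math.ceil((1 / eps) * math.log(1 / delta))   # sample size from the lemma

    N = 10_000
    X = range(N)
    W = set(range(int(eps * N)))                     # a fixed eps-fraction of X

    trials = 20_000
    misses = sum(all(random.choice(X) not in W for _ in range(m))
                 for _ in range(trials))
    print(f'empirical miss rate {misses / trials:.4f} <= delta = {delta}')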
23. ε-Net Theorem
Theorem: Let the VC dimension of (X, C) be d ≥ 2 and 0 < ε ≤ ½. There exists an ε-net for (X, C) of size at most O((d/ε) ln(1/ε)). If we choose O((d/ε) ln(d/ε) + (1/ε) ln(1/δ)) points at random from X, then the resulting set N is an ε-net with probability at least 1-δ.
Exercise 3, submission next week: a polynomial bound on the sample size for PAC learning.
24. Radon's Theorem
- Definitions:
- convex set
- convex hull conv(S)
- Theorem:
- let T be a set of d+2 points in ℝ^d
- there exists a subset S of T such that
- conv(S) ∩ conv(T \ S) ≠ ∅
- Proof! (A constructive sketch follows.)
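Radon's theorem is constructive: a nontrivial affine dependence among the d+2 points yields the partition (a sketch assuming numpy; the example points are arbitrary):

    import numpy as np

    def radon_partition(T):
        # split d+2 points in R^d into two parts whose convex hulls intersect
        T = np.asarray(T, dtype=float)       # shape (d+2, d)
        m, d = T.shape
        # a in the null space of M encodes sum_i a_i x_i = 0, sum_i a_i = 0;
        # M has d+1 rows and d+2 columns, so a nontrivial a always exists
        M = np.vstack([T.T, np.ones(m)])
        a = np.linalg.svd(M)[2][-1]
        pos = a > 0
        # both sign classes are nonempty; dividing each side by its (equal)
        # total weight gives a single point in conv(S) and in conv(T \ S)
        point = (a[pos] @ T[pos]) / a[pos].sum()
        return T[pos], T[~pos], point

    S, rest, p = radon_partition([[0, 0], [1, 0], [0, 1], [0.4, 0.3]])
    print(p)   # a point lying in both conv(S) and conv(rest)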
25. Hyperplane: Finishing the Proof
- Assume a set T of d+2 points can be shattered.
- Use Radon's Theorem to find S such that
- conv(S) ∩ conv(T \ S) ≠ ∅
- Assign the points in S label 1
- and the points not in S label 0
- By shattering, there is a separating hyperplane
- How will it label conv(S) ∩ conv(T \ S)?
26. Lower Bounds: Setting
- Static learning algorithm:
- asks for a sample S of size m(ε,δ)
- based on S, selects a hypothesis
27. Lower Bounds: Setting
- Theorem:
- if VC-dim(C) = ∞, then C is not learnable.
- Proof:
- let m = m(0.1, 0.1)
- find 2m points which are shattered (call this set T)
- let D be the uniform distribution on T
- set c_t(x_i) = 1 with probability ½
- expected error ¼ (the sample misses at least m points of T, and on each the hypothesis errs with probability ½)
- Finish the proof!
28. Lower Bound: Feasible Case
- Theorem:
- if VC-dim(C) = d+1, then m(ε,δ) = Ω(d/ε)
- Proof:
- let T = {z_0, z_1, …, z_d} be a set of d+1 points which is shattered
- D samples:
- z_0 with prob. 1-8ε
- each z_i with prob. 8ε/d
29. Continued
- Set c_t(z_0) = 1 and each c_t(z_i) = 1 with probability ½
- Expected error 2ε
- Bound the confidence
- for accuracy ε
30. Lower Bound: Non-Feasible Case
- Theorem:
- for two hypotheses, m(ε,δ) = Ω((log 1/δ)/ε²)
- Proof (see the simulation below):
- let H = {h_0, h_1}, where h_b(x) = b
- Two distributions:
- D_0: Prob[⟨x,1⟩] = ½ - γ and Prob[⟨y,0⟩] = ½ + γ
- D_1: Prob[⟨x,1⟩] = ½ + γ and Prob[⟨y,0⟩] = ½ - γ
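The heart of the argument is that telling D_0 from D_1 apart requires on the order of 1/γ² samples; a majority-vote simulation illustrates the transition (a sketch; γ, the sample sizes, and the trial count are arbitrary, and the domain is collapsed to the label bias):

    import random

    gamma, trials = 0.05, 2000

    def guess_correct(m):
        # draw b at random, take m labels from D_b, guess b by majority label
        b = random.randint(0, 1)
        p = 0.5 + gamma if b == 1 else 0.5 - gamma
        ones = sum(random.random() < p for _ in range(m))
        return (1 if 2 * ones > m else 0) == b

    for m in [10, 100, 400, 1600]:            # 1/gamma^2 = 400
        acc = sum(guess_correct(m) for _ in range(trials)) / trials
        print(f'm = {m:5d}: distinguishing accuracy ~ {acc:.2f}')

Accuracy stays near ½ until m approaches 1/γ², matching the Ω((log 1/δ)/ε²) bound with γ playing the role of ε.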