Title: Rigorous Learning Curve Bounds from Statistical Mechanics
1. Rigorous Learning Curve Bounds from Statistical Mechanics
- D. Haussler, M. Kearns, H. S. Seung, N. Tishby
Presentation: Talya Meltzer
2. Motivation
- According to VC theory, minimizing the empirical error within a function class F on a random sample leads to generalization error bounds
  - Realizable case
  - Unrealizable case
- The VC bounds are the best distribution-independent upper bounds
3. Motivation
- Yet, these bounds are vacuous for m < d (sample size smaller than the VC dimension)
- And they fail to capture the true behavior of particular learning curves
  - Experimental learning curves fit a variety of functional forms, including exponentials
  - Learning curves analyzed with statistical mechanics methods exhibit phase transitions (sudden drops in the generalization error)
4. Main Ideas
- Decompose the hypothesis class into error shells
- Assign each hypothesis its correct generalization error, taking the specific distribution into account
- Use the thermodynamic limit method
  - Identify the correct scale at which to analyze a learning curve
  - Express the learning curve as a competition between an entropy function and an energy function
5. Overview: The PAC Learning Model
- The hypothesis class
- Input
- Assumptions
  - The examples in the training set S are sampled i.i.d. according to a distribution D over X
  - D is unknown
  - D is fixed throughout the learning process
  - There exists a target function f: X → Y, i.e. yi = f(xi)
- Goal: find the target function
6. Overview: The PAC Learning Model
- Training (empirical) error
- Generalization error
- The class F is PAC-learnable if there exists a learning algorithm which, given ε and δ, returns h ∈ F such that
  - The training error is minimal
  - With probability at least 1 − δ, the generalization error of h is at most ε
7. The Finite Realizable Case
- The version space VS(S)
- The ε-ball B(ε)
- If B(ε) contains VS(S), then any function in the version space has generalization error at most ε
8. The Finite Realizable Case
9. Decomposition into error shells
In a finite class, there is only a finite number of possible error values: 0 ≤ ε1 < ε2 < … < εr ≤ 1, with r ≤ |F| < ∞.
10. Decomposition into error shells
So we can replace the union bound by the exact sum over the error shells.
Now, with probability at least 1 − δ, any h consistent with the sample obeys the resulting shell bound.
To understand the behavior of this bound, we will use the thermodynamic limit method (a small numerical sketch of the shell bound follows below).
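As a concrete illustration, here is a minimal numerical sketch of the shell bound, not from the paper; the shell structure at the end is a hypothetical toy class.

    # Minimal sketch of the error-shell bound (illustrative, not the paper's code).
    # Input: a list of (eps_j, n_j) pairs, where n_j is the number of hypotheses
    # whose generalization error is exactly eps_j under the fixed distribution D.
    from math import comb

    def shell_bound(shells, m, delta):
        # Exclude shells from the largest error downwards while their cumulative
        # survival probability sum n_j * (1 - eps_j)^m stays below delta.
        # The first shell that cannot be excluded sets the bound: with probability
        # >= 1 - delta, every h consistent with the m-sample has error at most
        # the returned value.
        tail = 0.0
        for eps_j, n_j in sorted(shells, reverse=True):
            term = n_j * (1.0 - eps_j) ** m
            if tail + term > delta:
                return eps_j
            tail += term
        return 0.0   # every nonzero-error shell was excluded: perfect learning

    # Hypothetical toy class: one target at error 0 plus binomial-sized shells.
    toy = [(j / 20.0, comb(20, j)) for j in range(21)]
    print(shell_bound(toy, m=100, delta=0.05))   # -> 0.05 for this toy class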
11. The Thermodynamic Limit
- We consider an infinite sequence of function classes F1, F2, …, FN, …
  - FN: a class of functions f : XN → {0,1}, with N = log2(|FN|)
- We are often interested in a parametric class of functions
- The number of functions in the class at any given error value may have a limiting asymptotic behavior as the number of parameters grows
12. The Thermodynamic Limit
- Rewrite the expression (see the reconstruction below)
- Introduce the scaling function t(N): when chosen properly, it captures the scale at which the learning curve is most interesting
- Find a permissible entropy bound s(ε) that tightly captures the behavior of the shell sizes
- In the rewritten sum, the entropy of the j-th error shell is POSITIVE and the minus-energy of the j-th error shell is NEGATIVE
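A plausible reconstruction of the rewritten expression (the displayed formula is missing from this transcript); it assumes the permissible entropy bound takes the form |Fj| ≤ exp(t(N)·s(εj)) and that m = α·t(N):

    % reconstruction under the assumptions |F_j| <= exp(t(N) s(eps_j)) and m = alpha t(N)
    \Pr\big[\exists\, h \in VS(S):\ \epsilon(h) \ge \epsilon\big]
      \;\le\; \sum_{j:\,\epsilon_j \ge \epsilon} |F_j|\,(1-\epsilon_j)^m
      \;\le\; \sum_{j:\,\epsilon_j \ge \epsilon}
              \exp\!\big( t(N)\,[\, s(\epsilon_j) \;+\; \alpha \ln(1-\epsilon_j) \,]\big)

Here s(εj) is the positive entropy term and α ln(1 − εj) is the negative minus-energy term referred to above.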
13. The Thermodynamic Limit
- Formal definitions
  - t(N): a mapping from the natural numbers to the natural numbers, such that …
  - s(ε): a continuous function
  - s(ε) is called a permissible entropy bound if there exists a natural number N0 such that for all N ≥ N0 and for all 1 ≤ j ≤ r(N), …
14. The Thermodynamic Limit
α = m/t(N) remains constant as m, N → ∞; α controls the competition between the entropy and the energy.
15. The Thermodynamic Limit
- In order to describe infinite systems:
  - We describe a system of finite size, then let the size grow to infinity
  - We normalize extensive variables by the volume
  - We keep the density fixed: ρ = N/V constant as N, V → ∞
16. The Thermodynamic Limit
The Learning System vs. The Thermodynamic System
17. The Thermodynamic Limit
- Benefit: N is isolated in the factor t(N), and the remaining factor is the continuous function s(ε) + α ln(1 − ε)
- Define ε* as the largest ε such that s(ε) ≥ −α ln(1 − ε)
- In the thermodynamic limit, under certain conditions, we can bound the generalization error of any consistent hypothesis by ε* + τ
18. The Thermodynamic Limit
We will see that for ε > ε*, the thermodynamic limit of the sum is 0. Let 0 < τ ≤ 1 be an arbitrarily small quantity.
19. The Thermodynamic Limit
The limit will indeed be zero, provided that r(N) = o(exp(t(N)·τ)).
20. The Thermodynamic Limit
- Summary
  - ε* is the rightmost crossing point of s(ε) and −α ln(1 − ε)
  - In the thermodynamic limit, any hypothesis h consistent with m = α·t(N) examples will have εgen(h) ≤ ε* + τ (with probability 1)
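A minimal numerical sketch of this recipe (an illustration, not the paper's code): given any permissible entropy bound s and a value of α, it locates the rightmost ε at which s(ε) still matches or exceeds the energy −α ln(1 − ε).

    # Sketch: rightmost crossing of the entropy s(eps) with the energy -alpha*ln(1-eps).
    import numpy as np

    def epsilon_star(s, alpha, n_grid=100_000):
        # s: a permissible entropy bound, given as a vectorized callable on (0, 1).
        # Returns the largest grid point where s(eps) >= -alpha*ln(1 - eps),
        # i.e. the bound eps* of this slide, or 0.0 if the energy dominates everywhere.
        eps = np.linspace(1e-6, 1.0 - 1e-6, n_grid)
        ok = s(eps) >= -alpha * np.log(1.0 - eps)
        return float(eps[ok][-1]) if ok.any() else 0.0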
21. Scaled Learning Curves
- Extracting scaled learning curves (see the usage sketch below):
  - Let the value of α vary
  - Apply the thermodynamic limit method to each value
  - Plot the generalization error bound as a function of α (instead of m, hence "scaled")
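For instance, reusing the epsilon_star sketch above with the trivial entropy bound s(ε) = 1 of the next slide gives a smooth scaled curve:

    # Scaled learning curve: sweep alpha and record the bound for each value.
    alphas = np.linspace(0.1, 10.0, 200)
    curve = [epsilon_star(lambda e: np.ones_like(e), a) for a in alphas]
    # With s(eps) = 1 the crossing is at eps*(alpha) = 1 - exp(-1/alpha),
    # so the bound decays smoothly (no phase transition) as alpha grows.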
22. Artificial Examples
Using the weak permissible entropy bound s(ε) = 1, for some scaling function t(N).
23. Artificial Examples
Using a single-peak permissible entropy bound.
24. Artificial Examples
Using a different single-peak permissible entropy bound.
25. Artificial Examples
Using a double-peak permissible entropy bound (a toy reproduction follows below).
26. Phase Transitions
- The sudden drops in the learning curves are called phase transitions
- In thermodynamic systems, a phase transition is the transformation from one phase to another
- A critical point is the set of conditions (such as temperature and pressure) at which the transition occurs
27. Phase Transitions
Well-known phase transitions: solid to liquid, liquid to gas...
28. Phase Transitions (more)
29. Phase Transitions in Learning
- In some learning curves, we see a transition from a finite generalization error to perfect learning
- The transition occurs at a critical α, i.e. when the sample reaches the size m = αC·t(N)
- At this critical point the system "realizes" the problem all at once
30. (Almost) Real Examples: The Ising Perceptron
fN: an arbitrary target function, defined by a weight vector w0
31. (Almost) Real Examples: The Ising Perceptron
Due to the spherically symmetric input distribution, the generalization error of a hypothesis depends only on its overlap with the target.
The number of perceptrons at Hamming distance j from the target is the binomial coefficient C(N, j).
32. (Almost) Real Examples: The Ising Perceptron
We have already seen this entropy bound: it is the single-peak example from the artificial curves.
- The phase transition to perfect learning occurs at αC ≈ 1.448 (a numerical check follows below)
- The critical m for perfect learning given by the VC and cardinality bounds is considerably larger than the m = αC·t(N) obtained here
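A numerical sketch reproducing the critical value quoted above (not from the paper; it assumes the standard Ising-perceptron setup: for spherically symmetric inputs a hypothesis at Hamming distance j from w0 has error ε = arccos(1 − 2j/N)/π, there are C(N, j) such hypotheses, and t(N) = N, so s(ε) is the binary entropy H(q) with q = (1 − cos(πε))/2):

    # Critical alpha for the Ising perceptron: perfect learning once the energy
    # -alpha*ln(1 - eps) exceeds the entropy s(eps) for every eps > 0, i.e.
    # alpha_c = max over eps of s(eps) / (-ln(1 - eps)).
    import numpy as np

    def H(q):                                   # binary entropy in nats
        return -q * np.log(q) - (1.0 - q) * np.log(1.0 - q)

    eps = np.linspace(1e-4, 1.0 - 1e-4, 200_000)
    q = (1.0 - np.cos(np.pi * eps)) / 2.0       # assumed Hamming-fraction <-> error map
    alpha_c = np.max(H(q) / (-np.log(1.0 - eps)))
    print(alpha_c)                              # ~1.448, matching the slide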
33. (Almost) Real Examples: The Ising Perceptron
- The right zero crossing yields the upper bound on the generalization error
- With high probability, there are no hypotheses in VS(S) with error less than the left zero crossing, except for the target itself
- So VS(S), minus the target, is contained between these two zero crossings
34. The Thermodynamic Limit: Lower Bound
- The thermodynamic limit method can also provide a lower bound on the generalization error
- The lower bound shows that the behavior seen in scaled learning curves, including phase transitions, can actually occur for certain function classes and distributions
- We will use the energy function 2αε
- The qualitative behavior of the curves obtained by intersecting s(ε) with 2αε and with −α ln(1 − ε) is essentially the same
35. The Thermodynamic Limit: Lower Bound
- We can construct
  - a function class sequence FN over XN
  - a distribution sequence DN over XN
  - a target function sequence fN
- such that
  - s(ε) is a permissible entropy bound with respect to t(N) = N
  - for the largest ε ≤ ½ for which 2αε ≤ s(ε), there is a constant probability of finding a consistent hypothesis with εgen(h) ≥ ε
  - ⇒ this ε is a lower bound on the error of the worst consistent hypothesis
36. The Finite Unrealizable Case
- The data can be labeled according to a function not within our class
- Or sampled from a distribution DN over XN × {0,1}, which can also model noise in the examples
- Use u(ε) as a permissible energy bound if, for any h in F and any sample size m, …
  (for the realizable case we had the energy −ln(1 − ε), with the exact equality Pr[h is consistent] = (1 − ε)^m)
37. The Finite Unrealizable Case
- We can always choose … (and in certain cases we can do better)
- This yields the standard cardinality bound
- Since the class is finite, we can slice it into error shells and apply the thermodynamic limit, just as in the realizable case
- Choosing ε* to be the rightmost intersection of s(ε) and α·u(ε), we get, for any τ > 0, the bound sketched below
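A reconstruction of the resulting statement, by direct analogy with the realizable-case summary of slide 20 (the displayed formula is missing from this transcript, so this is a reading of the slide, not a quote):

    % assumes eps* is the rightmost solution of s(eps) = alpha*u(eps), by analogy with slide 20
    \epsilon^{*} = \max\{\epsilon : s(\epsilon) \ge \alpha\, u(\epsilon)\},
    \qquad
    \epsilon_{\mathrm{gen}}(\hat h) \;\le\; \epsilon^{*} + \tau
    \quad \text{in the thermodynamic limit, for any } \tau > 0,

where ĥ is a hypothesis minimizing the training error.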
38. The Infinite Case
- The covering approach: build a finite cover of the infinite class at some chosen resolution ⇒ the best error εmin achievable within the cover depends on that resolution
- Apply the thermodynamic limit by building a sequence of nested covers
- Result: a bound on the error given by the rightmost crossing of the cover's entropy bound s(ε) and α·u(ε)
- Trade-off
  - The best error achievable in the chosen cover improves as the resolution gets finer
  - The size of the cover increases as the resolution gets finer
39. Real World Example
Sufficient Dimensionality Reduction with Irrelevance Statistics (A. Globerson, G. Chechik, N. Tishby)
- In this example:
  - Main data: images of all the male subjects with a neutral facial expression, illuminated either from the right or from the left
  - Irrelevance data: created similarly from female images
40. Real World Example
41. Summary
- Benefits of the method
  - Derives tighter bounds
  - Also describes the behavior for small samples ⇒ useful in practice, where we often want to work with m < d
  - Captures the phase transitions in learning curves, including transitions to perfect learning, which can actually occur experimentally in certain problems
- Further work to be done
  - Refined extensions to the infinite case