Title: An Exact Approach for Learning Vector Quantization
1. An Exact Approach for Learning Vector Quantization
- Aree Witoelar
- Michael Biehl
- University of Groningen
- Barbara Hammer
- Clausthal University of Technology
2. Outline
- Learning Vector Quantization
- Mathematical treatment
- Performance of LVQ algorithms
- LVQ1, LVQ 2.1
- Unsupervised algorithms
- Winner Takes All, Neural Gas
- Summary, Outlook
3. Learning Vector Quantization
- Classification of data using prototype vectors
- Assign each data point to the nearest prototype vector (a minimal sketch follows below)
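A minimal sketch of nearest-prototype classification, assuming a set of prototypes `W` with class labels `c` (the names are illustrative, not from the slides):

```python
import numpy as np

def classify(xi, W, c):
    """Assign a data point xi to the class of its nearest prototype.

    xi : (N,) data vector
    W  : (S, N) array of S prototype vectors
    c  : (S,) array of prototype class labels
    """
    # squared Euclidean distances to all prototypes
    d = np.sum((W - xi) ** 2, axis=1)
    return c[np.argmin(d)]
```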
4. Online learning
- Sequence of independent random data ξ¹, ξ², …, presented one at a time and drawn according to the density P(ξ)
- Update of the prototype(s), described by the given LVQ algorithm:
  w_s^µ = w_s^(µ-1) + (η/N) f_s · (ξ^µ − w_s^(µ-1))
  - η: learning rate, step size
  - f_s: strength, direction of the update, etc.
  - (ξ^µ − w_s^(µ-1)): moves the prototype towards the current data
- Distances are Euclidean: d_s = (ξ − w_s)²
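A hedged sketch of one online learning step of the generic form above; the modulation function `f` is algorithm specific and only a placeholder here:

```python
import numpy as np

def online_step(W, c, xi, sigma, eta, f):
    """One online update w_s <- w_s + (eta/N) * f_s * (xi - w_s) for each prototype s.

    W     : (S, N) prototypes, updated in place
    c     : (S,) prototype class labels
    xi    : (N,) current data point with class label sigma
    eta   : learning rate
    f     : modulation function f(s, winner, c, sigma, d) -> strength/direction of the update
    """
    N = xi.shape[0]
    d = np.sum((W - xi) ** 2, axis=1)      # squared Euclidean distances
    winner = np.argmin(d)                  # closest prototype
    for s in range(W.shape[0]):
        W[s] += (eta / N) * f(s, winner, c, sigma, d) * (xi - W[s])
    return W
```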
5. Model
Mixture of M spherical (Gaussian) clusters of high-dimensional data; random vectors ξ ∈ R^N drawn according to P(ξ)
- Projected onto Span(B1, B2): separable in (at most) M dimensions
- Projected onto Span(v1, v2), with v1, v2 random vectors: not separable
6. Model
Mixture of M spherical (Gaussian) clusters of high-dimensional data; random vectors ξ ∈ R^N drawn according to P(ξ)
- For cluster k:
  - class σ_k ∈ {+1, −1}
  - prior probability p_k, with Σ_k p_k = 1
  - offset of the center from the origin: distance ℓ_k, orientation B_k with |B_k| = 1
  - variance v_k
[Figure: sketch of five clusters, labeled with their variances v_k, offsets ℓ_k and orientations B_k]
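A minimal sketch of drawing data from this model, i.e. from a mixture of M spherical Gaussian clusters in R^N with priors p_k, centers ℓ_k·B_k, variances v_k and class labels σ_k (function and argument names are illustrative):

```python
import numpy as np

def draw_data(P, ell, B, v, sigma, num, rng=None):
    """Draw `num` samples from a mixture of M spherical Gaussian clusters.

    P     : (M,) prior probabilities, summing to 1
    ell   : (M,) offsets of the cluster centers from the origin
    B     : (M, N) unit orientation vectors of the centers
    v     : (M,) cluster variances
    sigma : (M,) class labels of the clusters (+1 / -1)
    """
    rng = np.random.default_rng(0) if rng is None else rng
    M, N = B.shape
    k = rng.choice(M, size=num, p=P)            # pick a cluster for each sample
    noise = rng.standard_normal((num, N))       # isotropic Gaussian noise
    xi = ell[k, None] * B[k] + np.sqrt(v[k])[:, None] * noise
    return xi, sigma[k]
```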
7. Mathematical analysis
8. Mathematical analysis
In the thermodynamic limit N → ∞ ...
- the projections of the data onto the prototypes and cluster centers, h_s = w_s · ξ and b_k = B_k · ξ,
- become correlated Gaussian quantities (Central Limit Theorem)
9. Mathematical analysis
3. Derive ordinary differential equations, closed in the order parameters R_sk = w_s · B_k and Q_st = w_s · w_t
4. Solve for R_sk(α), Q_st(α) (a generic integration sketch follows below):
- dynamics/asymptotic behavior (α → ∞)
- generalization error
- sensitivity to initial conditions, learning rates, structure of the data
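The ODE system itself is not reproduced on these slides; the sketch below only illustrates step 4, numerically integrating a generic closed system dR/dα = F_R(R, Q), dQ/dα = F_Q(R, Q) with simple Euler steps. `F_R` and `F_Q` are placeholders for the averaged update terms derived in the analysis.

```python
import numpy as np

def integrate(R0, Q0, F_R, F_Q, alpha_max, d_alpha=0.01):
    """Euler integration of order-parameter ODEs dR/dalpha = F_R(R, Q), dQ/dalpha = F_Q(R, Q)."""
    R, Q = R0.copy(), Q0.copy()
    trajectory = []
    for step in range(int(alpha_max / d_alpha)):
        alpha = step * d_alpha
        R = R + d_alpha * F_R(R, Q)       # one Euler step for each set of order parameters
        Q = Q + d_alpha * F_Q(R, Q)
        trajectory.append((alpha, R.copy(), Q.copy()))
    return trajectory
```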
10. Results
11. Supervised algorithms
- LVQ1: update the winner (the closest prototype) towards the data if their classes agree, away from it otherwise
  - w_s: winner; a factor ±1 encodes class agreement; Θ: Heaviside function selecting the winner
- LVQ 2.1: update the two closest prototypes w_J, w_K if c_J ≠ c_K and ξ^µ falls inside a window
  - δ: window parameter, 0 < δ ≤ 1
12. Supervised algorithms
- LVQ1: update the winner (the closest prototype) towards/away from the data
  - w_s: winner; a factor ±1 encodes class agreement; Θ: Heaviside function
- LVQ +/-: update the two closest prototypes w_J, w_K if their classes are different
(a sketch of an LVQ1 step follows below)
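A sketch of one LVQ1 step in this notation, assuming winner-take-all selection and a ±1 class-agreement factor as described above:

```python
import numpy as np

def lvq1_step(W, c, xi, sigma, eta):
    """LVQ1: move the winner towards the data if the classes agree, away otherwise."""
    N = xi.shape[0]
    d = np.sum((W - xi) ** 2, axis=1)        # squared Euclidean distances
    s = np.argmin(d)                         # winner (closest prototype)
    psi = 1.0 if c[s] == sigma else -1.0     # +1: same class, -1: different class
    W[s] += (eta / N) * psi * (xi - W[s])
    return W
```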
13. LVQ1
Order parameters
A simple model: 2 prototypes, 2 clusters;
p1 = 0.7, p2 = 0.3, v1 = v2 = 1, ℓ1 = ℓ2 = 1, B1 · B2 = 0
14. Learning rate
(Asymptotic) performance depends on the learning rate.
[Figure: generalization error ε_g vs α for learning rates η = 2, 1, 0.5, and for the limit η → 0, compared with the optimal decision boundary]
15. LVQ1, 3 prototypes
- Does adding more prototypes always increase performance?
[Figure: generalization error ε_g vs α for ℓ1 = ℓ2 = 1.0, p1 = 0.7, p2 = 0.3, v1 = 0.81, v2 = 0.25]
16. Asymptotics
[Figures: asymptotic ε_g vs η (for p+ = p− = 0.5, ℓ± = 1) and asymptotic ε_g vs the prior p (for v+ > v−: v+ = 0.81, v− = 0.25)]
- The optimal class assignment does not depend on p
- The optimal assignment places more prototypes on the class with the larger variance
17. Multiple prototypes
With multiple prototypes there are many order parameters to observe, and further problems arise
18. Computing averages ⟨·⟩
The averages require integrations in N dimensions; for the given LVQ algorithm they reduce to low-dimensional Gaussian integrals:
- S = 2: (1-dim.) Gaussian integration → analytical
- S = 3: (2-dim.) Gaussian integration → numerical (a Monte Carlo sketch follows below)
[Figure: integration domains in the (y1, y2) plane]
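For S = 3 the 2-dimensional Gaussian averages can be estimated numerically; below is a minimal Monte Carlo sketch over correlated Gaussian projections. The mean `mu`, covariance `C` and observable `g` are placeholders for the quantities obtained in the analysis, not values from the slides.

```python
import numpy as np

def gaussian_average(g, mu, C, num=100_000, rng=None):
    """Estimate <g(y)> for y ~ N(mu, C) by Monte Carlo sampling."""
    rng = np.random.default_rng(0) if rng is None else rng
    y = rng.multivariate_normal(mu, C, size=num)   # correlated Gaussian samples
    return np.mean(g(y))

# illustrative usage: probability that y1 > y2 for a correlated 2-d Gaussian
# avg = gaussian_average(lambda y: (y[:, 0] > y[:, 1]).astype(float),
#                        mu=np.zeros(2), C=np.array([[1.0, 0.3], [0.3, 1.0]]))
```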
19. More complex models
[Figure: asymptotic ε_g for different prototype configurations, e.g. 1 + 1 prototypes, ε_g = 0.1802]
There exists an optimal number of prototypes for
LVQ1
20. LVQ 2.1
- Without a window, the prototypes diverge because of the repulsion
- But for very high N, the window does not work
- Alternatives (a windowed update sketch follows below):
  - Early stopping
  - An alternative window for N → ∞: update only while a distance-based criterion stays below a threshold δ, with 1 < δ < ∞
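A sketch of one LVQ 2.1 step with the classical window rule; this is the standard formulation used as an illustration, not the alternative window discussed above, and `delta` plays the role of the window parameter:

```python
import numpy as np

def lvq21_step(W, c, xi, sigma, eta, delta):
    """LVQ 2.1: update the two closest prototypes w_J, w_K when their classes differ
    and the data point falls inside a window around the decision boundary."""
    N = xi.shape[0]
    d = np.sum((W - xi) ** 2, axis=1)
    J, K = np.argsort(d)[:2]                 # two closest prototypes
    if c[J] == c[K]:
        return W                             # update only if their classes differ
    # window test (one common formulation): accept data near the decision boundary
    ratio = min(d[J] / d[K], d[K] / d[J])
    if ratio > (1.0 - delta) / (1.0 + delta):
        for s in (J, K):
            psi = 1.0 if c[s] == sigma else -1.0
            W[s] += (eta / N) * psi * (xi - W[s])
    return W
```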
21. LVQ 2.1 vs LVQ1
[Figure: generalization error ε_g vs α for LVQ 2.1 and LVQ1]
- The window can improve performance (with the right parameter δ)
22. Unsupervised algorithms
- Clustering problem: minimize the quantization error E, the average distance to the nearest prototype
- Winner Takes All: minimizes the quantization error directly
- Neural Gas: less sensitive to initialization?
(a sketch of both updates follows below)
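A minimal sketch of the two unsupervised updates compared here, in their standard formulations (used as an illustration; `lam` is the Neural Gas neighborhood range, not a value from the slides):

```python
import numpy as np

def quantization_error(W, data):
    """Average squared distance of each data point to its nearest prototype."""
    d = np.sum((data[:, None, :] - W[None, :, :]) ** 2, axis=2)
    return np.mean(np.min(d, axis=1))

def wta_step(W, xi, eta):
    """Winner Takes All: only the closest prototype moves towards the data."""
    d = np.sum((W - xi) ** 2, axis=1)
    s = np.argmin(d)
    W[s] += eta * (xi - W[s])
    return W

def neural_gas_step(W, xi, eta, lam):
    """Neural Gas: all prototypes move, weighted by their distance rank."""
    d = np.sum((W - xi) ** 2, axis=1)
    rank = np.argsort(np.argsort(d))          # 0 for the winner, 1 for the runner-up, ...
    h = np.exp(-rank / lam)                   # rank-based neighborhood function
    W += eta * h[:, None] * (xi - W)
    return W
```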
23. Sensitivity to initialization
- For an initialization far away from the cluster means:
[Figure: quantization error E(W) over the order parameters R_S1, R_S2, for Neural Gas]
- WTA: some prototypes rarely win
- Neural Gas: fewer prototypes get left behind
24. Robustness
- Winner Takes All: the asymptotic configuration is sensitive to initial conditions
- Neural Gas: more robust with respect to initial conditions
25. Global minima
- However, Neural Gas is not guaranteed to find the global minimum
- Parameters: p3 = 0.51, p1 = p2 = 0.49, v1 = v2 = v3 = 1
- (here initialized at the global minimum) E∞ < E0
26. Summary
- An exact approach to investigate the typical learning behavior of LVQ algorithms for certain data structures, e.g.:
- LVQ1
  - An optimal number of prototypes exists, depending on the structure of the data
  - More prototypes at the class with the larger variance
- LVQ2.1
- Windows can slow down divergence
- Good performance with the right parameter
- Neural Gas (vs. Winner Takes All)
- Better for initialization
- More robust w.r.t. initial conditions
- Does not guarantee optimal solution
27. Outlook
- Analysis for more prototypes allows a more general study of LVQ algorithms
- Extensions:
- LVQ 2.1, Generalized LVQ, Robust Soft LVQ
- Self Organising Maps
- Offline (batch) learning
- An optimal LVQ algorithm, regardless of data
structure, number of prototypes, etc.?
28. The End
- Thank you for your attention
29. Central Limit Theorem
The joint density of the projections h_1, …, h_S, b_+, b_− becomes that of correlated Gaussian quantities as N → ∞
[Figure: histograms (frequency) of the projections; Monte Carlo simulations with N = 100 over 10000 data points]
30. Validity
- Good agreement with Monte Carlo simulations with the same parameters (N = 200)
31. Self-averaging
Fluctuations decrease with a larger number of degrees of freedom N.
As N → ∞, the fluctuations vanish (the variance becomes zero).
Monte Carlo simulations over 100 independent runs
32. Optimal class assignment, LVQ1, 3 prototypes
With unequal variances v+ > v−, the best decision boundary is a hyperquadric.
[Figure: decision boundaries for the class assignments C = (1, 1, −1) (optimal) and C = (1, −1, −1)]
More prototypes can produce lower performance