Title: An Exact Approach for Learning Vector Quantization
1. An Exact Approach for Learning Vector Quantization
- Aree Witoelar
- Michael Biehl
- University of Groningen
- Barbara Hammer
- Clausthal University of Technology
2. Outline
- Learning Vector Quantization
- Mathematical treatment
- Performance of LVQ algorithms
- LVQ1, LVQ 2.1
- Unsupervised algorithms
- Winner Takes All, Neural Gas
- Summary, Outlook
3. Learning Vector Quantization
- Classification of data using prototype vectors
- Assign each data point to the nearest prototype vector (a minimal sketch follows below)
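A minimal sketch of nearest-prototype classification, assuming a set of prototypes `W` with class labels `c` (the names are illustrative, not from the slides):

```python
import numpy as np

def classify(xi, W, c):
    """Assign a data point xi to the class of its nearest prototype.

    xi : (N,) data vector
    W  : (S, N) array of S prototype vectors
    c  : (S,) array of prototype class labels
    """
    # squared Euclidean distances to all prototypes
    d = np.sum((W - xi) ** 2, axis=1)
    return c[np.argmin(d)]
```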
4. Online learning
- Sequence of independent random data ξ¹, ξ², …, presented one at a time and drawn according to the density P(ξ)
- Update of the prototype(s), described by the given LVQ algorithm:
  w_s^µ = w_s^(µ-1) + (η/N) f_s · (ξ^µ − w_s^(µ-1))
  - η: learning rate, step size
  - f_s: strength, direction of the update, etc.
  - (ξ^µ − w_s^(µ-1)): moves the prototype towards the current data
- Distances are Euclidean: d_s = (ξ − w_s)²
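A hedged sketch of one online learning step of the generic form above; the modulation function `f` is algorithm specific and only a placeholder here:

```python
import numpy as np

def online_step(W, c, xi, sigma, eta, f):
    """One online update w_s <- w_s + (eta/N) * f_s * (xi - w_s) for each prototype s.

    W     : (S, N) prototypes, updated in place
    c     : (S,) prototype class labels
    xi    : (N,) current data point with class label sigma
    eta   : learning rate
    f     : modulation function f(s, winner, c, sigma, d) -> strength/direction of the update
    """
    N = xi.shape[0]
    d = np.sum((W - xi) ** 2, axis=1)      # squared Euclidean distances
    winner = np.argmin(d)                  # closest prototype
    for s in range(W.shape[0]):
        W[s] += (eta / N) * f(s, winner, c, sigma, d) * (xi - W[s])
    return W
```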
5. Model
Mixture of M spherical (Gaussian) clusters of high-dimensional data; random vectors ξ ∈ R^N drawn according to P(ξ)
- Projected onto Span(B1, B2): separable in (at most) M dimensions
- Projected onto Span(v1, v2), with v1, v2 random vectors: not separable
6. Model
Mixture of M spherical (Gaussian) clusters of high-dimensional data; random vectors ξ ∈ R^N drawn according to P(ξ)
- For cluster k:
  - class σ_k ∈ {+1, −1}
  - prior probability p_k, with Σ_k p_k = 1
  - offset of the center from the origin: distance ℓ_k, orientation B_k with |B_k| = 1
  - variance v_k
[Figure: sketch of five clusters, labeled with their variances v_k, offsets ℓ_k and orientations B_k]
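A minimal sketch of drawing data from this model, i.e. from a mixture of M spherical Gaussian clusters in R^N with priors p_k, centers ℓ_k·B_k, variances v_k and class labels σ_k (function and argument names are illustrative):

```python
import numpy as np

def draw_data(P, ell, B, v, sigma, num, rng=None):
    """Draw `num` samples from a mixture of M spherical Gaussian clusters.

    P     : (M,) prior probabilities, summing to 1
    ell   : (M,) offsets of the cluster centers from the origin
    B     : (M, N) unit orientation vectors of the centers
    v     : (M,) cluster variances
    sigma : (M,) class labels of the clusters (+1 / -1)
    """
    rng = np.random.default_rng(0) if rng is None else rng
    M, N = B.shape
    k = rng.choice(M, size=num, p=P)            # pick a cluster for each sample
    noise = rng.standard_normal((num, N))       # isotropic Gaussian noise
    xi = ell[k, None] * B[k] + np.sqrt(v[k])[:, None] * noise
    return xi, sigma[k]
```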
7. Mathematical analysis
8. Mathematical analysis
In the thermodynamic limit N → ∞ ...
- the projections of the data onto the prototypes and cluster centers, h_s = w_s · ξ and b_k = B_k · ξ,
- become correlated Gaussian quantities (Central Limit Theorem)
9. Mathematical analysis
3. Derive ordinary differential equations, closed in the order parameters R_sk = w_s · B_k and Q_st = w_s · w_t
4. Solve for R_sk(α), Q_st(α) (a generic integration sketch follows below):
- dynamics/asymptotic behavior (α → ∞)
- generalization error
- sensitivity to initial conditions, learning rates, structure of the data
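The ODE system itself is not reproduced on these slides; the sketch below only illustrates step 4, numerically integrating a generic closed system dR/dα = F_R(R, Q), dQ/dα = F_Q(R, Q) with simple Euler steps. `F_R` and `F_Q` are placeholders for the averaged update terms derived in the analysis.

```python
import numpy as np

def integrate(R0, Q0, F_R, F_Q, alpha_max, d_alpha=0.01):
    """Euler integration of order-parameter ODEs dR/dalpha = F_R(R, Q), dQ/dalpha = F_Q(R, Q)."""
    R, Q = R0.copy(), Q0.copy()
    trajectory = []
    for step in range(int(alpha_max / d_alpha)):
        alpha = step * d_alpha
        R = R + d_alpha * F_R(R, Q)       # one Euler step for each set of order parameters
        Q = Q + d_alpha * F_Q(R, Q)
        trajectory.append((alpha, R.copy(), Q.copy()))
    return trajectory
```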
10. Results
11. Supervised algorithms
- LVQ1: update the winner (the closest prototype) towards the data if their classes agree, away from it otherwise
  - w_s: winner; a factor ±1 encodes class agreement; Θ: Heaviside function selecting the winner
- LVQ 2.1: update the two closest prototypes w_J, w_K if c_J ≠ c_K and ξ^µ falls inside a window
  - δ: window parameter, 0 < δ ≤ 1
12. Supervised algorithms
- LVQ1: update the winner (the closest prototype) towards/away from the data
  - w_s: winner; a factor ±1 encodes class agreement; Θ: Heaviside function
- LVQ +/-: update the two closest prototypes w_J, w_K if their classes are different
(a sketch of an LVQ1 step follows below)
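A sketch of one LVQ1 step in this notation, assuming winner-take-all selection and a ±1 class-agreement factor as described above:

```python
import numpy as np

def lvq1_step(W, c, xi, sigma, eta):
    """LVQ1: move the winner towards the data if the classes agree, away otherwise."""
    N = xi.shape[0]
    d = np.sum((W - xi) ** 2, axis=1)        # squared Euclidean distances
    s = np.argmin(d)                         # winner (closest prototype)
    psi = 1.0 if c[s] == sigma else -1.0     # +1: same class, -1: different class
    W[s] += (eta / N) * psi * (xi - W[s])
    return W
```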
13. LVQ1
Order parameters
A simple model: 2 prototypes, 2 clusters;
p1 = 0.7, p2 = 0.3, v1 = v2 = 1, ℓ1 = ℓ2 = 1, B1 · B2 = 0
14. Learning rate
(Asymptotic) performance depends on the learning rate.
[Figure: generalization error ε_g vs α for learning rates η = 2, 1, 0.5, and for the limit η → 0, compared with the optimal decision boundary]
15. LVQ1, 3 prototypes
- Does adding more prototypes always increase performance?
[Figure: generalization error ε_g vs α for ℓ1 = ℓ2 = 1.0, p1 = 0.7, p2 = 0.3, v1 = 0.81, v2 = 0.25]
16. Asymptotics
[Figures: asymptotic ε_g vs η (for p+ = p− = 0.5, ℓ± = 1) and asymptotic ε_g vs the prior p (for v+ > v−: v+ = 0.81, v− = 0.25)]
- The optimal class assignment does not depend on p
- The optimal assignment places more prototypes on the class with the larger variance
17. Multiple prototypes
With multiple prototypes there are many order parameters to observe, and further problems arise
18. Computing averages ⟨·⟩
The averages require integrations in N dimensions; for the given LVQ algorithm they reduce to low-dimensional Gaussian integrals:
- S = 2: (1-dim.) Gaussian integration → analytical
- S = 3: (2-dim.) Gaussian integration → numerical (a Monte Carlo sketch follows below)
[Figure: integration domains in the (y1, y2) plane]
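For S = 3 the 2-dimensional Gaussian averages can be estimated numerically; below is a minimal Monte Carlo sketch over correlated Gaussian projections. The mean `mu`, covariance `C` and observable `g` are placeholders for the quantities obtained in the analysis, not values from the slides.

```python
import numpy as np

def gaussian_average(g, mu, C, num=100_000, rng=None):
    """Estimate <g(y)> for y ~ N(mu, C) by Monte Carlo sampling."""
    rng = np.random.default_rng(0) if rng is None else rng
    y = rng.multivariate_normal(mu, C, size=num)   # correlated Gaussian samples
    return np.mean(g(y))

# illustrative usage: probability that y1 > y2 for a correlated 2-d Gaussian
# avg = gaussian_average(lambda y: (y[:, 0] > y[:, 1]).astype(float),
#                        mu=np.zeros(2), C=np.array([[1.0, 0.3], [0.3, 1.0]]))
```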
19. More complex models
[Figure: asymptotic ε_g for different prototype configurations, e.g. 1 + 1 prototypes, ε_g = 0.1802]
There exists an optimal number of prototypes for
LVQ1
20. LVQ 2.1
- Without a window, the prototypes diverge because of the repulsion
- But for very high N, the window does not work
- Alternatives (a windowed update sketch follows below):
  - Early stopping
  - An alternative window for N → ∞: update only while a distance-based criterion stays below a threshold δ, with 1 < δ < ∞
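A sketch of one LVQ 2.1 step with the classical window rule; this is the standard formulation used as an illustration, not the alternative window discussed above, and `delta` plays the role of the window parameter:

```python
import numpy as np

def lvq21_step(W, c, xi, sigma, eta, delta):
    """LVQ 2.1: update the two closest prototypes w_J, w_K when their classes differ
    and the data point falls inside a window around the decision boundary."""
    N = xi.shape[0]
    d = np.sum((W - xi) ** 2, axis=1)
    J, K = np.argsort(d)[:2]                 # two closest prototypes
    if c[J] == c[K]:
        return W                             # update only if their classes differ
    # window test (one common formulation): accept data near the decision boundary
    ratio = min(d[J] / d[K], d[K] / d[J])
    if ratio > (1.0 - delta) / (1.0 + delta):
        for s in (J, K):
            psi = 1.0 if c[s] == sigma else -1.0
            W[s] += (eta / N) * psi * (xi - W[s])
    return W
```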
21. LVQ 2.1 vs LVQ1
[Figure: generalization error ε_g vs α for LVQ 2.1 and LVQ1]
- The window can improve performance (with the right parameter δ)
22. Unsupervised algorithms
- Clustering problem: minimize the quantization error E, the average distance to the nearest prototype
- Winner Takes All: minimizes the quantization error directly
- Neural Gas: less sensitive to initialization?
(a sketch of both updates follows below)
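A minimal sketch of the two unsupervised updates compared here, in their standard formulations (used as an illustration; `lam` is the Neural Gas neighborhood range, not a value from the slides):

```python
import numpy as np

def quantization_error(W, data):
    """Average squared distance of each data point to its nearest prototype."""
    d = np.sum((data[:, None, :] - W[None, :, :]) ** 2, axis=2)
    return np.mean(np.min(d, axis=1))

def wta_step(W, xi, eta):
    """Winner Takes All: only the closest prototype moves towards the data."""
    d = np.sum((W - xi) ** 2, axis=1)
    s = np.argmin(d)
    W[s] += eta * (xi - W[s])
    return W

def neural_gas_step(W, xi, eta, lam):
    """Neural Gas: all prototypes move, weighted by their distance rank."""
    d = np.sum((W - xi) ** 2, axis=1)
    rank = np.argsort(np.argsort(d))          # 0 for the winner, 1 for the runner-up, ...
    h = np.exp(-rank / lam)                   # rank-based neighborhood function
    W += eta * h[:, None] * (xi - W)
    return W
```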
23. Sensitivity to initialization
- For an initialization far away from the cluster means:
[Figure: quantization error E(W) over the order parameters R_S1, R_S2, for Neural Gas]
- WTA: some prototypes rarely win
- Neural Gas: fewer prototypes get left behind
24. Robustness
- Winner Takes All: the asymptotic configuration is sensitive to initial conditions
- Neural Gas: more robust with respect to initial conditions
25. Global minima
- However, Neural Gas is not guaranteed to find the global minimum
- Parameters: p3 = 0.51, p1 = p2 = 0.49, v1 = v2 = v3 = 1
- (here initialized at the global minimum) E∞ < E0
26. Summary
- An exact approach to investigate the typical learning behavior of LVQ algorithms for certain data structures, e.g.:
- LVQ1
  - An optimal number of prototypes exists, depending on the structure of the data
  - More prototypes at the class with the larger variance
- LVQ2.1
- Windows can slow down divergence
- Good performance with the right parameter
- Neural Gas (vs. Winner Takes All)
- Better for initialization
- More robust w.r.t. initial conditions
- Does not guarantee optimal solution
27. Outlook
- Analysis for more prototypes allows a more general study of LVQ algorithms
- Extensions:
- LVQ 2.1, Generalized LVQ, Robust Soft LVQ
- Self Organising Maps
- Offline (batch) learning
- An optimal LVQ algorithm, regardless of data
structure, number of prototypes, etc.?
28. The End
- Thank you for your attention
29. Central Limit Theorem
The joint density of the projections h_1, …, h_S, b_+, b_− becomes that of correlated Gaussian quantities as N → ∞
[Figure: histograms (frequency) of the projections; Monte Carlo simulations with N = 100 over 10000 data points]
30. Validity
- Good agreement with Monte Carlo simulations with the same parameters (N = 200)
31. Self-averaging
Fluctuations decrease with a larger number of degrees of freedom N.
As N → ∞, the fluctuations vanish (the variance becomes zero).
Monte Carlo simulations over 100 independent runs
32. Optimal class assignment, LVQ1, 3 prototypes
With unequal variances v+ > v−, the best decision boundary is a hyperquadric.
[Figure: decision boundaries for the class assignments C = (1, 1, −1) (optimal) and C = (1, −1, −1)]
More prototypes can produce lower performance