Title: Recent advances in LVQ
1 Recent advances in LVQ
- Clausthal University of Technology
- Institute of Informatics
- Address:
- Julius-Albert-Str. 4
- 38678 Clausthal-Zellerfeld
- Germany
- Tel. 0049 5323 72-7100
- Email: info_at_in.tu-clausthal.de
Barbara Hammer, Institute of Informatics, hammer_at_in.tu-clausthal.de
2 Outline
- Introduction
  - AI and ML
  - LVQ
- Mathematical background
  - Formal analysis
- Foundation by means of a cost function
- Metric adaptation
  - Relevance learning
  - Matrix LVQ
  - General metric
3 Introduction
4 AI and ML
5 AI and ML
6 AI and ML
Challenges
Evolution of Solutions
Machine Learning
Statistical ML / Pattern Recognition
Unsupervised
Supervised
7 LVQ
8 Prototype-based methods
- LVQ network
- the solution is represented by prototypes within the data space
- the classification is given by the receptive fields
9 Prototype-based methods
- LVQ1 training
- initialize the prototypes randomly
- repeat:
- present a training data point
- determine the closest prototype
- move it towards/away from the data point, depending on the class
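The LVQ1 loop above can be sketched in a few lines of Python; the toy one-informative-dimension data, learning rate, and epoch count below are illustrative assumptions, not part of the original talk:

```python
import random

def lvq1_train(data, labels, prototypes, proto_labels, eta=0.1, epochs=20):
    """LVQ1: for each presented point, move the closest prototype toward it
    (same class) or away from it (different class)."""
    for _ in range(epochs):
        for x, y in zip(data, labels):
            # determine the closest prototype (squared Euclidean distance)
            j = min(range(len(prototypes)),
                    key=lambda k: sum((a - b) ** 2
                                      for a, b in zip(x, prototypes[k])))
            sign = 1.0 if proto_labels[j] == y else -1.0  # attract or repel
            prototypes[j] = [w + sign * eta * (a - w)
                             for a, w in zip(x, prototypes[j])]
    return prototypes

# two well-separated classes; prototypes initialized near the origin
random.seed(0)
data = [(random.gauss(-2, 0.3), 0.0) for _ in range(50)] + \
       [(random.gauss(+2, 0.3), 0.0) for _ in range(50)]
labels = [0] * 50 + [1] * 50
protos = lvq1_train(data, labels, [[-0.1, 0.0], [0.1, 0.0]], [0, 1])
```

After training, each prototype has drifted toward the mean of its class.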
10 Prototype-based methods
- LVQ 2.1 training
- initialize the prototypes randomly
- repeat:
- present a training data point
- determine the closest correct and the closest wrong prototype
- move them towards/away from the data point
11 LVQ
- LVQ1
- adapt the closest prototype: wj ← wj ± η(xi − wj), depending on the class
- LVQ 2.1
- adapt the closest correct prototype: w+ ← w+ + η(xi − w+)
- and the closest wrong prototype: w− ← w− − η(xi − w−)
- possibly restrict the adaptation to a window around the decision border
- Learning from mistakes (LFM)
- perform the LVQ2.1 update only if the data point is misclassified
- and further variants
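One LVQ2.1 step with the window rule can be sketched as follows; the window parameter and learning rate are illustrative assumptions:

```python
def lvq21_step(x, y, prototypes, proto_labels, eta=0.05, window=0.3):
    """One LVQ2.1 step: find the closest correct and closest wrong
    prototype and adapt both, but only if x falls into a window
    around the current decision border."""
    def sq_dist(w):
        return sum((a - b) ** 2 for a, b in zip(x, w))
    correct = [k for k, c in enumerate(proto_labels) if c == y]
    wrong = [k for k, c in enumerate(proto_labels) if c != y]
    p = min(correct, key=lambda k: sq_dist(prototypes[k]))  # closest correct
    q = min(wrong, key=lambda k: sq_dist(prototypes[k]))    # closest wrong
    dp, dq = sq_dist(prototypes[p]), sq_dist(prototypes[q])
    # window rule: adapt only if the point lies near the decision border
    if min(dp / dq, dq / dp) > (1 - window) / (1 + window):
        prototypes[p] = [w + eta * (a - w) for a, w in zip(x, prototypes[p])]
        prototypes[q] = [w - eta * (a - w) for a, w in zip(x, prototypes[q])]
    return prototypes

protos = [[-1.0], [1.0]]
lvq21_step([0.1], 1, protos, [0, 1])  # borderline point: both prototypes move
```

Points far from the border fail the window test and leave the prototypes unchanged, which counteracts the divergence of plain LVQ2.1.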
12 Mathematical background
13 Theory of online learning
14 Theory of online learning
- The theory of online learning uses techniques from theoretical physics
- exact investigation of the learning behavior
- in terms of a few characteristic quantities
- for typical model situations
- in the limit of infinite data dimension (because there the theory becomes tractable)
15 Theory of online learning
- Model situation
- mixture of two N-dimensional Gaussians
- data i.i.d.
- orthonormal centers
- priors p+ and p−
- two prototypes
16 Theory of online learning
- Strategy
- describe the update rules in terms of a few characteristic quantities (here: projections of the prototypes onto the two relevant dimensions, and their correlations), such that a random data point occurs only within dot products
- average over the data points; for N → ∞ the sums are completely characterized by mean and variance
- self-averaging of the characteristic quantities: their variance vanishes for N → ∞
- choose the learning rate as η/N and a continuous learning time α; the dynamics are then described by deterministic ODEs
- express the generalization error in terms of the characteristic quantities
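As a rough illustration of this strategy (a simplified single-prototype variant, not the exact model of the analysis), one can simulate online learning on a high-dimensional two-Gaussian mixture and track only the projections of the prototype onto the cluster centers instead of all N coordinates:

```python
import random

# toy order-parameter view: in N dimensions, track only the projections
# R1, R2 of one prototype onto the two orthonormal cluster centers B1, B2
random.seed(1)
N = 100
eta = 0.5 / N                                      # learning rate ~ 1/N
B1 = [1.0 if i == 0 else 0.0 for i in range(N)]    # orthonormal centers
B2 = [1.0 if i == 1 else 0.0 for i in range(N)]
w = [0.0] * N                                      # one prototype

for _ in range(30 * N):                            # learning time = steps / N
    center = B1 if random.random() < 0.5 else B2
    x = [c + random.gauss(0, 1) for c in center]   # isotropic Gaussian cloud
    if center is B1:                               # attracted by class 1 only
        w = [wi + eta * (xi - wi) for wi, xi in zip(w, x)]

R1 = sum(wi * bi for wi, bi in zip(w, B1))  # projection onto B1
R2 = sum(wi * bi for wi, bi in zip(w, B2))  # projection onto B2
```

Although w lives in N dimensions, its behavior is captured by the two numbers R1 and R2: R1 approaches 1 while R2 stays near 0, and the fluctuations shrink as N grows (self-averaging).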
17 Theory of online learning
18 Theory of online learning
learning curve (p = 0.2, λ = 1.2, η = 1.2)
19 Theory of online learning
20 Theory of online learning
LFM (learning from mistakes)
21 Theory of online learning
Equal variance / unequal variance
- dotted: optimal linear decision boundary
- dashed: LVQ2.1 with idealized stopping
- solid: LVQ1
- chain: LFM
22 Foundation by means of a cost function
23 Cost function
- function class F given by the possible LVQ networks
- training data (xi, yi) → machine learner → LVQ function f in F
- often f(xi) = yi for the training points (i.e. small empirical error)
- desired: P(f(x) = y) should be large (i.e. small real error)
24 Cost function
- safe classification vs. insecure classification
- (hypothesis) margin of xi: m(xi) = d− − d+, where d+ / d− is the squared distance to the closest correct / wrong prototype
- mathematics ⇒ the generalization error is bounded by
  Eρ/m + O( p²B³(ln 1/δ)^(1/2) / (ρ·m^(1/2)) )
- where Eρ = number of training data with margin smaller than ρ (including errors), δ = confidence, m = number of examples, B = support of the data, p = number of prototypes
- the empirical term counts only data with (too) small margin
- the complexity term scales with 1/margin and does not include the dimensionality
- good bounds for few training errors and a large margin
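The hypothesis margin defined above is straightforward to compute; a minimal sketch (the toy prototypes are an illustrative assumption):

```python
def hypothesis_margin(x, y, prototypes, proto_labels):
    """Hypothesis margin m(x) = d_minus - d_plus, where d_plus / d_minus
    is the squared distance to the closest correct / wrong prototype.
    Positive margin <=> x is classified correctly with a safety distance."""
    def sq_dist(w):
        return sum((a - b) ** 2 for a, b in zip(x, w))
    d_plus = min(sq_dist(w) for w, c in zip(prototypes, proto_labels) if c == y)
    d_minus = min(sq_dist(w) for w, c in zip(prototypes, proto_labels) if c != y)
    return d_minus - d_plus

protos = [[0.0, 0.0], [2.0, 0.0]]
labels = [0, 1]
m = hypothesis_margin([0.5, 0.0], 0, protos, labels)  # point near its own class
```

Here d+ = 0.25 and d− = 2.25, so the margin is 2.0: the point is classified safely.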
25 Cost function
maximize the margin
26 Cost function
maximize Σi (d−(xi) − d+(xi))
27 Cost function
unbounded ⇒
minimize Σi (d+(xi) − d−(xi))
28 Cost function
minimize Σi (d+(xi) − d−(xi)) / (d+(xi) + d−(xi))
29 Cost function
- mathematical objective: min Σi (d+(xi) − d−(xi)) / (d+(xi) + d−(xi))
- derivatives ⇒ LVQ2.1-like updates with additional scaling factors
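A sketch of the resulting stochastic gradient step: the attraction/repulsion structure of LVQ2.1 reappears, scaled by factors obtained from differentiating the summand (d+ − d−)/(d+ + d−); constant factors are absorbed into the assumed learning rate:

```python
def glvq_step(x, y, prototypes, proto_labels, eta=0.1):
    """One gradient step on the cost term (d_plus - d_minus)/(d_plus + d_minus):
    the closest correct prototype is attracted and the closest wrong one
    repelled, each scaled by the derivative of the cost term."""
    def sq_dist(w):
        return sum((a - b) ** 2 for a, b in zip(x, w))
    p = min((k for k, c in enumerate(proto_labels) if c == y),
            key=lambda k: sq_dist(prototypes[k]))
    q = min((k for k, c in enumerate(proto_labels) if c != y),
            key=lambda k: sq_dist(prototypes[k]))
    dp, dq = sq_dist(prototypes[p]), sq_dist(prototypes[q])
    s = (dp + dq) ** 2
    fp = 2 * dq / s   # scaling for the correct prototype (d cost / d d_plus)
    fq = 2 * dp / s   # scaling for the wrong prototype (|d cost / d d_minus|)
    prototypes[p] = [w + eta * fp * (a - w) for a, w in zip(x, prototypes[p])]
    prototypes[q] = [w - eta * fq * (a - w) for a, w in zip(x, prototypes[q])]
    return prototypes

protos = [[0.0], [2.0]]
glvq_step([0.5], 0, protos, [0, 1])  # attract prototype 0, repel prototype 1
```

The normalization keeps the cost bounded in [−1, 1] per point, so the updates cannot diverge the way plain LVQ2.1 can.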
30 Metric adaptation
31 Relevance learning
32 Relevance learning
- the Euclidean metric is sensitive to noise and scaling
- minimize Σi (d+(xi) − d−(xi)) / (d+(xi) + d−(xi))
33 Relevance learning
minimize Σi (dλ+(xi) − dλ−(xi)) / (dλ+(xi) + dλ−(xi))
where dλ(x, y) = Σl λl (xl − yl)²
⇒ relevance learning
34 Relevance learning
- mathematical objective: min Σi (dλ+(xi) − dλ−(xi)) / (dλ+(xi) + dλ−(xi))
- derivatives ⇒ LVQ2.1-like prototype updates with scaling, plus a relevance update for λ
- intuitive, fast, well founded, flexible, suited for large dimensions
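The relevance update can be sketched as a gradient step on λ followed by clipping and renormalization; the learning rate and the exact normalization scheme are assumptions of this sketch:

```python
def grlvq_relevance_step(x, y, prototypes, proto_labels, lam, eps=0.01):
    """Sketch of a GRLVQ relevance update: gradient step of the
    relative-difference cost with respect to the relevance factors lambda_l
    of the weighted metric d_lambda(x, w) = sum_l lambda_l (x_l - w_l)^2,
    then clip to >= 0 and renormalize to sum 1."""
    def wdist(w):
        return sum(l * (a - b) ** 2 for l, a, b in zip(lam, x, w))
    p = min((k for k, c in enumerate(proto_labels) if c == y),
            key=lambda k: wdist(prototypes[k]))
    q = min((k for k, c in enumerate(proto_labels) if c != y),
            key=lambda k: wdist(prototypes[k]))
    dp, dq = wdist(prototypes[p]), wdist(prototypes[q])
    s = (dp + dq) ** 2
    for l in range(len(lam)):
        # derivative of (dp - dq)/(dp + dq) with respect to lambda_l
        grad = (2 * dq / s) * (x[l] - prototypes[p][l]) ** 2 \
             - (2 * dp / s) * (x[l] - prototypes[q][l]) ** 2
        lam[l] = max(0.0, lam[l] - eps * grad)
    total = sum(lam)
    return [l / total for l in lam]

# dimension 0 separates the classes; dimension 1 is pure noise
protos = [[0.0, 0.0], [2.0, 0.0]]
lam = grlvq_relevance_step([0.4, 3.0], 0, protos, [0, 1], [0.5, 0.5])
```

The informative dimension gains relevance while the noise dimension loses it, which is exactly the pruning effect exploited in the experiments below.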
35Relevance learning
noise 1N(0.05), 1N(0.1),1N(0.2),1N(0.5),U(0.5
),U(0.2),N(0.5),N(0.2)
36 Application: clinical proteomics
Relevance learning
- unhappy because possibly ill ...
- take serum
- put it into a mass spectrometer
- observe a characteristic spectrum which tells us more about the molecules in the serum
37 Relevance learning
- prostate cancer: National Cancer Institute, Prostate Cancer Dataset, www.cancer.gov, 2004
- 318 examples, SELDI-TOF from blood serum, 130 dimensions after preprocessing (normalization, peak detection)
- 2 classes (healthy versus cancer in different stages)
- potential biomarkers
38 Matrix learning
39 Matrix learning
GMLVQ can be applied locally (one matrix per prototype) or globally (one matrix)
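A common way to keep such a matrix metric positive semi-definite is the parametrization Λ = ΩᵀΩ, so that the dissimilarity is (x − w)ᵀΛ(x − w); a minimal sketch under that assumption:

```python
def gmlvq_dist(x, w, omega):
    """Matrix dissimilarity d(x, w) = (x - w)^T Omega^T Omega (x - w).
    Writing Lambda = Omega^T Omega keeps the matrix positive semi-definite,
    so d is a valid squared dissimilarity for any real matrix Omega."""
    diff = [a - b for a, b in zip(x, w)]
    # apply Omega to the difference vector; the metric is its squared norm
    mapped = [sum(row[j] * diff[j] for j in range(len(diff))) for row in omega]
    return sum(v * v for v in mapped)

# identity Omega recovers the squared Euclidean distance
d_eucl = gmlvq_dist([1.0, 2.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
# an Omega mixing dimensions yields a correlated (anisotropic) metric
d_aniso = gmlvq_dist([1.0, 2.0], [0.0, 0.0], [[1.0, -1.0], [0.0, 0.5]])
```

The local variant simply stores one Omega per prototype and evaluates each distance with the winner's own matrix.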
40 Matrix learning
41 Matrix learning
42 General metrics
43 General metrics
minimize Σi (dλ+(xi) − dλ−(xi)) / (dλ+(xi) + dλ−(xi))
where dλ(x, y) can be an arbitrary differentiable dissimilarity
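With an arbitrary differentiable dissimilarity, the same gradient machinery applies; where no closed-form derivative is at hand, a numerical gradient can stand in for illustration. The smooth Manhattan-like measure below is a made-up example, not a similarity used in the talk:

```python
def num_grad(f, w, h=1e-6):
    """Central-difference gradient of a scalar function f at point w."""
    g = []
    for i in range(len(w)):
        wp = list(w); wp[i] += h
        wm = list(w); wm[i] -= h
        g.append((f(wp) - f(wm)) / (2 * h))
    return g

# any differentiable dissimilarity can be plugged in, e.g. a smoothed
# Manhattan-like variant instead of the squared Euclidean distance
def dissim(x, w):
    return sum(((a - b) ** 2 + 1e-8) ** 0.5 for a, b in zip(x, w))

x = [1.0, 2.0]
w = [0.0, 0.0]
g = num_grad(lambda v: dissim(x, v), w)  # direction to move the prototype in
```

In practice one would use the analytic derivative for speed; the numerical version is just a compact way to show that nothing in the scheme depends on the metric being Euclidean.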
44 General metrics
- Online detection of faults for piston engines
45 General metrics
- Detection based on heterogeneous data
- time-dependent signals from sensors measuring pressure and oscillation, process characteristics, characteristics of the pV diagram, ...
- sensors
46 General metrics
- Data
- ca. 30 time series with 36 entries per series
- ca. 20 values from a time interval
- ca. 40 global features
- ca. 15 classes, ca. 100 training patterns
- similarity measure
47 General metrics
- Splicing for higher eukaryotes
- copy of DNA
- branch site
- donor consensus A64G73 G100T100G62A68G84T63 (indices: occurrence in %)
- acceptor consensus C65A100G100
- reading frames
- 18-40 bp pyrimidines, i.e. T, C
- donor / acceptor
- ATCGATCGATCGATCGATCGATCGATCGAGTCAATGACC
- no / yes
48 General metrics
- IPsplice (UCI): human DNA, 3 classes, ca. 3200 points, window size 60, old
- C. elegans (Sonnenburg et al.): only acceptors/decoys, 1000/10000 training examples, 10000 test examples, window size 50, decoys are close to acceptors
- GRLVQ with few prototypes (8 resp. 5 per class)
- LIK similarity
- local correlations
49 General metrics
50 General metrics
- SVM with a competitive kernel
- GRLVQ yields sparser solutions, is orders of magnitude faster, and gives intuitive results
51 Handwritten digit recognition
General metrics
52 General metrics
- USPS data set: 9298 patterns, 256 dimensions
- GRLVQ with a correlation measure, 20 prototypes per class
53 Trap set by bear to catch hunter