Title: Recent advances in LVQ
1 Recent advances in LVQ
- Clausthal University of Technology
- Institute of Informatics
- Address:
- Julius-Albert-Str. 4
- 38678 Clausthal-Zellerfeld
- Germany
- Tel. 0049 5323 72-7100
- Email: info_at_in.tu-clausthal.de
Barbara Hammer, Institute of Informatics, hammer_at_in.tu-clausthal.de
2 Outline
- Introduction
  - AI and ML
  - LVQ
- Mathematical background
  - Formal analysis
- Foundation by means of a cost function
- Metric adaptation
  - Relevance learning
  - Matrix LVQ
  - General metric
3 Introduction
4 AI and ML
5 AI and ML
6 AI and ML
Challenges
Evolution of Solutions
Machine Learning
Statistical ML / Pattern Recognition
Unsupervised
Supervised
7 LVQ
8 Prototype-based methods
- LVQ network
- the solution is represented by prototypes within the data space
- the classification is given by the receptive fields
9 Prototype-based methods
- LVQ1 training
- initialize the prototypes randomly
- repeat:
- present a training data point
- determine the closest prototype
- move it towards/away from the data point, depending on the class
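The LVQ1 loop above can be sketched in a few lines of Python; the toy one-informative-dimension data, learning rate, and epoch count below are illustrative assumptions, not part of the original talk:

```python
import random

def lvq1_train(data, labels, prototypes, proto_labels, eta=0.1, epochs=20):
    """LVQ1: for each presented point, move the closest prototype toward it
    (same class) or away from it (different class)."""
    for _ in range(epochs):
        for x, y in zip(data, labels):
            # determine the closest prototype (squared Euclidean distance)
            j = min(range(len(prototypes)),
                    key=lambda k: sum((a - b) ** 2
                                      for a, b in zip(x, prototypes[k])))
            sign = 1.0 if proto_labels[j] == y else -1.0  # attract or repel
            prototypes[j] = [w + sign * eta * (a - w)
                             for a, w in zip(x, prototypes[j])]
    return prototypes

# two well-separated classes; prototypes initialized near the origin
random.seed(0)
data = [(random.gauss(-2, 0.3), 0.0) for _ in range(50)] + \
       [(random.gauss(+2, 0.3), 0.0) for _ in range(50)]
labels = [0] * 50 + [1] * 50
protos = lvq1_train(data, labels, [[-0.1, 0.0], [0.1, 0.0]], [0, 1])
```

After training, each prototype has drifted toward the mean of its class.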
10 Prototype-based methods
- LVQ 2.1 training
- initialize the prototypes randomly
- repeat:
- present a training data point
- determine the closest correct and the closest wrong prototype
- move them towards/away from the data point
11 LVQ
- LVQ1
- adapt the closest prototype: wj ← wj ± η(xi − wj), depending on the class
- LVQ 2.1
- adapt the closest correct prototype: w+ ← w+ + η(xi − w+)
- and the closest wrong prototype: w− ← w− − η(xi − w−)
- possibly restrict the adaptation to a window around the decision border
- Learning from mistakes (LFM)
- perform the LVQ2.1 update only if the data point is misclassified
- and further variants
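One LVQ2.1 step with the window rule can be sketched as follows; the window parameter and learning rate are illustrative assumptions:

```python
def lvq21_step(x, y, prototypes, proto_labels, eta=0.05, window=0.3):
    """One LVQ2.1 step: find the closest correct and closest wrong
    prototype and adapt both, but only if x falls into a window
    around the current decision border."""
    def sq_dist(w):
        return sum((a - b) ** 2 for a, b in zip(x, w))
    correct = [k for k, c in enumerate(proto_labels) if c == y]
    wrong = [k for k, c in enumerate(proto_labels) if c != y]
    p = min(correct, key=lambda k: sq_dist(prototypes[k]))  # closest correct
    q = min(wrong, key=lambda k: sq_dist(prototypes[k]))    # closest wrong
    dp, dq = sq_dist(prototypes[p]), sq_dist(prototypes[q])
    # window rule: adapt only if the point lies near the decision border
    if min(dp / dq, dq / dp) > (1 - window) / (1 + window):
        prototypes[p] = [w + eta * (a - w) for a, w in zip(x, prototypes[p])]
        prototypes[q] = [w - eta * (a - w) for a, w in zip(x, prototypes[q])]
    return prototypes

protos = [[-1.0], [1.0]]
lvq21_step([0.1], 1, protos, [0, 1])  # borderline point: both prototypes move
```

Points far from the border fail the window test and leave the prototypes unchanged, which counteracts the divergence of plain LVQ2.1.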
12 Mathematical background
13 Theory of online learning
14 Theory of online learning
- The theory of online learning uses techniques from theoretical physics
- exact investigation of the learning behavior
- in terms of a few characteristic quantities
- for typical model situations
- in the limit of infinite data dimension (because there the theory becomes tractable)
15 Theory of online learning
- Model situation
- mixture of two N-dimensional Gaussians
- data i.i.d.
- orthonormal centers
- priors p+ and p−
- two prototypes
16 Theory of online learning
- Strategy
- describe the update rules in terms of a few characteristic quantities (here: projections of the prototypes onto the two relevant dimensions, and their correlations), such that a random data point occurs only within dot products
- average over the data points; for N → ∞ the sums are completely characterized by mean and variance
- self-averaging of the characteristic quantities: their variance vanishes for N → ∞
- choose the learning rate as η/N and a continuous learning time α; the dynamics are then described by deterministic ODEs
- express the generalization error in terms of the characteristic quantities
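As a rough illustration of this strategy (a simplified single-prototype variant, not the exact model of the analysis), one can simulate online learning on a high-dimensional two-Gaussian mixture and track only the projections of the prototype onto the cluster centers instead of all N coordinates:

```python
import random

# toy order-parameter view: in N dimensions, track only the projections
# R1, R2 of one prototype onto the two orthonormal cluster centers B1, B2
random.seed(1)
N = 100
eta = 0.5 / N                                      # learning rate ~ 1/N
B1 = [1.0 if i == 0 else 0.0 for i in range(N)]    # orthonormal centers
B2 = [1.0 if i == 1 else 0.0 for i in range(N)]
w = [0.0] * N                                      # one prototype

for _ in range(30 * N):                            # learning time = steps / N
    center = B1 if random.random() < 0.5 else B2
    x = [c + random.gauss(0, 1) for c in center]   # isotropic Gaussian cloud
    if center is B1:                               # attracted by class 1 only
        w = [wi + eta * (xi - wi) for wi, xi in zip(w, x)]

R1 = sum(wi * bi for wi, bi in zip(w, B1))  # projection onto B1
R2 = sum(wi * bi for wi, bi in zip(w, B2))  # projection onto B2
```

Although w lives in N dimensions, its behavior is captured by the two numbers R1 and R2: R1 approaches 1 while R2 stays near 0, and the fluctuations shrink as N grows (self-averaging).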
17 Theory of online learning
18 Theory of online learning
learning curve (p = 0.2, λ = 1.2, η = 1.2)
19 Theory of online learning
20 Theory of online learning
LFM (learning from mistakes)
21 Theory of online learning
Equal variance / unequal variance
- dotted: optimal linear decision boundary
- dashed: LVQ2.1 with idealized stopping
- solid: LVQ1
- chain: LFM
22 Foundation by means of a cost function
23 Cost function
- function class F given by the possible LVQ networks
- training data (xi, yi) → machine learner → LVQ function f in F
- often f(xi) = yi for the training points (i.e. small empirical error)
- desired: P(f(x) = y) should be large (i.e. small real error)
24 Cost function
- safe classification vs. insecure classification
- (hypothesis) margin of xi: m(xi) = d− − d+, where d+ / d− is the squared distance to the closest correct / wrong prototype
- mathematics ⇒ the generalization error is bounded by
  Eρ/m + O( p²B³(ln 1/δ)^(1/2) / (ρ·m^(1/2)) )
- where Eρ = number of training data with margin smaller than ρ (including errors), δ = confidence, m = number of examples, B = support of the data, p = number of prototypes
- the empirical term counts only data with (too) small margin
- the complexity term scales with 1/margin and does not include the dimensionality
- good bounds for few training errors and a large margin
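The hypothesis margin defined above is straightforward to compute; a minimal sketch (the toy prototypes are an illustrative assumption):

```python
def hypothesis_margin(x, y, prototypes, proto_labels):
    """Hypothesis margin m(x) = d_minus - d_plus, where d_plus / d_minus
    is the squared distance to the closest correct / wrong prototype.
    Positive margin <=> x is classified correctly with a safety distance."""
    def sq_dist(w):
        return sum((a - b) ** 2 for a, b in zip(x, w))
    d_plus = min(sq_dist(w) for w, c in zip(prototypes, proto_labels) if c == y)
    d_minus = min(sq_dist(w) for w, c in zip(prototypes, proto_labels) if c != y)
    return d_minus - d_plus

protos = [[0.0, 0.0], [2.0, 0.0]]
labels = [0, 1]
m = hypothesis_margin([0.5, 0.0], 0, protos, labels)  # point near its own class
```

Here d+ = 0.25 and d− = 2.25, so the margin is 2.0: the point is classified safely.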
25 Cost function
maximize the margin
26 Cost function
maximize Σi (d−(xi) − d+(xi))
27 Cost function
unbounded ⇒
minimize Σi (d+(xi) − d−(xi))
28 Cost function
minimize Σi (d+(xi) − d−(xi)) / (d+(xi) + d−(xi))
29 Cost function
- mathematical objective: min Σi (d+(xi) − d−(xi)) / (d+(xi) + d−(xi))
- derivatives ⇒ LVQ2.1-like updates with additional scaling factors
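A sketch of the resulting stochastic gradient step: the attraction/repulsion structure of LVQ2.1 reappears, scaled by factors obtained from differentiating the summand (d+ − d−)/(d+ + d−); constant factors are absorbed into the assumed learning rate:

```python
def glvq_step(x, y, prototypes, proto_labels, eta=0.1):
    """One gradient step on the cost term (d_plus - d_minus)/(d_plus + d_minus):
    the closest correct prototype is attracted and the closest wrong one
    repelled, each scaled by the derivative of the cost term."""
    def sq_dist(w):
        return sum((a - b) ** 2 for a, b in zip(x, w))
    p = min((k for k, c in enumerate(proto_labels) if c == y),
            key=lambda k: sq_dist(prototypes[k]))
    q = min((k for k, c in enumerate(proto_labels) if c != y),
            key=lambda k: sq_dist(prototypes[k]))
    dp, dq = sq_dist(prototypes[p]), sq_dist(prototypes[q])
    s = (dp + dq) ** 2
    fp = 2 * dq / s   # scaling for the correct prototype (d cost / d d_plus)
    fq = 2 * dp / s   # scaling for the wrong prototype (|d cost / d d_minus|)
    prototypes[p] = [w + eta * fp * (a - w) for a, w in zip(x, prototypes[p])]
    prototypes[q] = [w - eta * fq * (a - w) for a, w in zip(x, prototypes[q])]
    return prototypes

protos = [[0.0], [2.0]]
glvq_step([0.5], 0, protos, [0, 1])  # attract prototype 0, repel prototype 1
```

The normalization keeps the cost bounded in [−1, 1] per point, so the updates cannot diverge the way plain LVQ2.1 can.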
30 Metric adaptation
31 Relevance learning
32 Relevance learning
- the Euclidean metric is sensitive to noise and scaling
- minimize Σi (d+(xi) − d−(xi)) / (d+(xi) + d−(xi))
33 Relevance learning
minimize Σi (dλ+(xi) − dλ−(xi)) / (dλ+(xi) + dλ−(xi))
where dλ(x, y) = Σl λl (xl − yl)²
⇒ relevance learning
34 Relevance learning
- mathematical objective: min Σi (dλ+(xi) − dλ−(xi)) / (dλ+(xi) + dλ−(xi))
- derivatives ⇒ LVQ2.1-like prototype updates with scaling, plus a relevance update for λ
- intuitive, fast, well founded, flexible, suited for large dimensions
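The relevance update can be sketched as a gradient step on λ followed by clipping and renormalization; the learning rate and the exact normalization scheme are assumptions of this sketch:

```python
def grlvq_relevance_step(x, y, prototypes, proto_labels, lam, eps=0.01):
    """Sketch of a GRLVQ relevance update: gradient step of the
    relative-difference cost with respect to the relevance factors lambda_l
    of the weighted metric d_lambda(x, w) = sum_l lambda_l (x_l - w_l)^2,
    then clip to >= 0 and renormalize to sum 1."""
    def wdist(w):
        return sum(l * (a - b) ** 2 for l, a, b in zip(lam, x, w))
    p = min((k for k, c in enumerate(proto_labels) if c == y),
            key=lambda k: wdist(prototypes[k]))
    q = min((k for k, c in enumerate(proto_labels) if c != y),
            key=lambda k: wdist(prototypes[k]))
    dp, dq = wdist(prototypes[p]), wdist(prototypes[q])
    s = (dp + dq) ** 2
    for l in range(len(lam)):
        # derivative of (dp - dq)/(dp + dq) with respect to lambda_l
        grad = (2 * dq / s) * (x[l] - prototypes[p][l]) ** 2 \
             - (2 * dp / s) * (x[l] - prototypes[q][l]) ** 2
        lam[l] = max(0.0, lam[l] - eps * grad)
    total = sum(lam)
    return [l / total for l in lam]

# dimension 0 separates the classes; dimension 1 is pure noise
protos = [[0.0, 0.0], [2.0, 0.0]]
lam = grlvq_relevance_step([0.4, 3.0], 0, protos, [0, 1], [0.5, 0.5])
```

The informative dimension gains relevance while the noise dimension loses it, which is exactly the pruning effect exploited in the experiments below.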
35Relevance learning
noise 1N(0.05), 1N(0.1),1N(0.2),1N(0.5),U(0.5
),U(0.2),N(0.5),N(0.2)
36 Application: clinical proteomics
Relevance learning
- unhappy because possibly ill ...
- take serum
- put it into a mass spectrometer
- observe a characteristic spectrum which tells us more about the molecules in the serum
37 Relevance learning
- prostate cancer: National Cancer Institute, Prostate Cancer Dataset, www.cancer.gov, 2004
- 318 examples, SELDI-TOF from blood serum, 130 dimensions after preprocessing (normalization, peak detection)
- 2 classes (healthy versus cancer in different stages)
- potential biomarkers
38 Matrix learning
39 Matrix learning
GMLVQ can be applied locally (one matrix per prototype) or globally (one matrix)
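A common way to keep such a matrix metric positive semi-definite is the parametrization Λ = ΩᵀΩ, so that the dissimilarity is (x − w)ᵀΛ(x − w); a minimal sketch under that assumption:

```python
def gmlvq_dist(x, w, omega):
    """Matrix dissimilarity d(x, w) = (x - w)^T Omega^T Omega (x - w).
    Writing Lambda = Omega^T Omega keeps the matrix positive semi-definite,
    so d is a valid squared dissimilarity for any real matrix Omega."""
    diff = [a - b for a, b in zip(x, w)]
    # apply Omega to the difference vector; the metric is its squared norm
    mapped = [sum(row[j] * diff[j] for j in range(len(diff))) for row in omega]
    return sum(v * v for v in mapped)

# identity Omega recovers the squared Euclidean distance
d_eucl = gmlvq_dist([1.0, 2.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
# an Omega mixing dimensions yields a correlated (anisotropic) metric
d_aniso = gmlvq_dist([1.0, 2.0], [0.0, 0.0], [[1.0, -1.0], [0.0, 0.5]])
```

The local variant simply stores one Omega per prototype and evaluates each distance with the winner's own matrix.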
40 Matrix learning
41 Matrix learning
42 General metrics
43 General metrics
minimize Σi (dλ+(xi) − dλ−(xi)) / (dλ+(xi) + dλ−(xi))
where dλ(x, y) can be an arbitrary differentiable dissimilarity
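With an arbitrary differentiable dissimilarity, the same gradient machinery applies; where no closed-form derivative is at hand, a numerical gradient can stand in for illustration. The smooth Manhattan-like measure below is a made-up example, not a similarity used in the talk:

```python
def num_grad(f, w, h=1e-6):
    """Central-difference gradient of a scalar function f at point w."""
    g = []
    for i in range(len(w)):
        wp = list(w); wp[i] += h
        wm = list(w); wm[i] -= h
        g.append((f(wp) - f(wm)) / (2 * h))
    return g

# any differentiable dissimilarity can be plugged in, e.g. a smoothed
# Manhattan-like variant instead of the squared Euclidean distance
def dissim(x, w):
    return sum(((a - b) ** 2 + 1e-8) ** 0.5 for a, b in zip(x, w))

x = [1.0, 2.0]
w = [0.0, 0.0]
g = num_grad(lambda v: dissim(x, v), w)  # direction to move the prototype in
```

In practice one would use the analytic derivative for speed; the numerical version is just a compact way to show that nothing in the scheme depends on the metric being Euclidean.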
44 General metrics
- Online detection of faults for piston engines
45 General metrics
- Detection based on heterogeneous data
- time-dependent signals from sensors measuring pressure and oscillation, process characteristics, characteristics of the pV diagram, ...
- sensors
46 General metrics
- Data
- ca. 30 time series with 36 entries per series
- ca. 20 values from a time interval
- ca. 40 global features
- ca. 15 classes, ca. 100 training patterns
- similarity measure
47 General metrics
- Splicing for higher eukaryotes
- copy of DNA
- branch site
- donor consensus A64G73 G100T100G62A68G84T63 (indices: occurrence in %)
- acceptor consensus C65A100G100
- reading frames
- 18-40 bp pyrimidines, i.e. T, C
- donor / acceptor
- ATCGATCGATCGATCGATCGATCGATCGAGTCAATGACC
- no / yes
48 General metrics
- IPsplice (UCI): human DNA, 3 classes, ca. 3200 points, window size 60, old
- C. elegans (Sonnenburg et al.): only acceptors/decoys, 1000/10000 training examples, 10000 test examples, window size 50, decoys are close to acceptors
- GRLVQ with few prototypes (8 resp. 5 per class)
- LIK similarity
- local correlations
49 General metrics
50 General metrics
- SVM with a competitive kernel
- GRLVQ yields sparser solutions, is orders of magnitude faster, and gives intuitive results
51 Handwritten digit recognition
General metrics
52 General metrics
- USPS data set: 9298 patterns, 256 dimensions
- GRLVQ with a correlation measure, 20 prototypes per class
53 Trap set by bear to catch hunter