Title: Learning to make specific predictions using Slow Feature Analysis
Slide 1: Learning to make specific predictions using Slow Feature Analysis
Slide 2: Memory/prediction hierarchy with temporal invariances
- Slow: temporally invariant abstractions
- Fast: quickly changing input
- But how does each module work: learn, map, and predict?
Slide 3: My (old) module
- Quantize the high-dim input space
- Map to a low-dim output space
- Discover temporal sequences in input space
- Map sequences to a low-dim sequence language
- Feedback: the same map run backwards
- Problems:
  - Sequence-mapping (step 4) depends on several previous steps → brittle, not robust
  - Sequence-mapping is not well-defined statistically
Slide 4: New module design: Slow Feature Analysis (SFA)
- Pros of SFA:
  - Nearly guaranteed to find some slow features
  - No quantization
  - Defined over the entire input space
  - Hierarchical stacking is easy
  - Statistically robust building blocks (simple polynomials, Principal Components Analysis, variance reduction, etc.)
- → a great way to find invariant functions
- → invariants change slowly, hence are easily predictable
Slide 5: BUT...
- No feedback!
- Can't get specific output from invariant input
- It's hard to take a low-dim signal and turn it into the right high-dim one (underdetermined)
- Here's my solution (straightforward, probably done before somewhere): do feedback with a separate map
Slide 6: First, show it working; then, show how and why
- Input space: a 20-dim retina
- Input shapes: Gaussian blurs (wrapped) of 3 different widths
- Input sequences: constant-velocity motion (0.3 pixels/step)
[Figure: example frames of a moving blur at T = 0, 2, 4 and T = 23, 25, 27, plotted across the 20 pixels]
Slide 7: Sanity check: the extracted slow features match the generating parameters
- Slow feature 1 ↔ "what" (the Gaussian std. dev.)
- Slow feature 2 ↔ "where" (the Gaussian center position)
- (So far, this is plain-vanilla SFA, nothing new; a sketch follows below.)
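A minimal numpy-only sketch of this setup, assuming the standard quadratic-expansion formulation of SFA (whiten an expanded signal, then keep the directions whose temporal derivative has the least variance). The function names, sequence lengths, and width values are illustrative assumptions, not the original code.

```python
import numpy as np

def make_sequences(n_steps=500, n_pix=20, widths=(1.0, 2.0, 3.0), v=0.3, seed=0):
    """Wrapped Gaussian blurs drifting at 0.3 pixels/step (slide 6 setup)."""
    rng = np.random.default_rng(seed)
    frames = []
    for w in widths:
        c = rng.uniform(0, n_pix)                    # random start position
        for _ in range(n_steps):
            d = np.arange(n_pix) - c
            d = (d + n_pix / 2) % n_pix - n_pix / 2  # wrap around the retina
            frames.append(np.exp(-d ** 2 / (2 * w ** 2)))
            c = (c + v) % n_pix
    return np.array(frames)

def quad_expand(X):
    """Linear + quadratic monomials: the polynomial basis of quadratic SFA."""
    n = X.shape[1]
    quads = [X[:, i] * X[:, j] for i in range(n) for j in range(i, n)]
    return np.column_stack([X] + quads)

def sfa(X, n_out=2):
    """Plain SFA: whiten the expanded signal, then keep the directions whose
    temporal derivative has the smallest variance."""
    Z = quad_expand(X)
    Z -= Z.mean(axis=0)
    d, U = np.linalg.eigh(np.cov(Z, rowvar=False))
    keep = d > 1e-10 * d.max()                       # drop near-null directions
    Zw = Z @ (U[:, keep] / np.sqrt(d[keep]))         # whitened expanded signal
    # boundary jumps between the three sequences are ignored in this sketch
    dd, V = np.linalg.eigh(np.cov(np.diff(Zw, axis=0), rowvar=False))
    return Zw @ V[:, :n_out]                         # the n_out slowest features

X = make_sequences()
S = sfa(X)   # compare S against blob width ("what") and position ("where")
```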
Slide 8: New contribution: predict all the pixels of the next image, given the previous images
- Given the frames at T = 0, 2, 4, predict the unknown frame at T = 5
- Reference prediction: just reuse the previous image ("tomorrow's weather is just like today's"), i.e. predict frame T = 5 as frame T = 4
Slide 9: Plot the ratio (mean-squared prediction error) / (mean-squared reference error)
- Median ratio over all points: 0.06 (including discontinuities)
- Median ratio over high-confidence points: 0.03 (tossing the worst 20%)
(A sketch of this metric follows below.)
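For concreteness, the ratio metric might be computed like this. The function name `error_ratio` and its `drop_worst` argument are hypothetical, and I'm reading "toss worst 20" as dropping the least-confident 20% of points.

```python
import numpy as np

def error_ratio(X_true, X_pred, drop_worst=0.0):
    """Median ratio of prediction MSE to the persistence reference that
    simply repeats the previous frame ("tomorrow's weather = today's")."""
    err_pred = np.mean((X_pred[1:] - X_true[1:]) ** 2, axis=1)
    err_ref = np.mean((X_true[:-1] - X_true[1:]) ** 2, axis=1)
    ratio = err_pred / err_ref
    if drop_worst > 0:                     # e.g. 0.2 tosses the worst 20%
        ratio = np.sort(ratio)[: int(len(ratio) * (1 - drop_worst))]
    return np.median(ratio)
```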
Slide 10: Take-home messages
- SFA can be inverted
- SFA can be used to make specific predictions
- The prediction works very well
- The prediction can be further improved by using confidence estimates
- So why is it hard, and how is it done?...
Slide 11: Low-dim slow features from high-dim input
- Easy: high-dim input (x1, x2, x3, ..., x20) → low-dim slow features, e.g.
  S1 = 0.3 x1 + 0.1 x1^2 + 1.4 x2 x3 + 1.1 x4^2 + ... + 0.5 x5 x9
- HARD: given S1 = 1.4, S2 = -0.33, what are x1, x2, x3, x4, x5, x6, ..., x20?
  - Infinitely many possibilities for the x's
  - Vastly under-determined
  - No simple polynomial-inverse formula (no analogue of, e.g., the quadratic formula)
Slide 12: A very simple, graphable example
- (x1, x2) 2-dim → S1 1-dim
- S1(t) = x1^2 + x2^2 is nearly constant, i.e. slow
- x1(t), x2(t): approximately circular motion in the plane (sketched in code below)
- Next: a series of six clue/trick pairs for learning the specific-prediction mapping
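The toy example is easy to reproduce; a minimal sketch (the jitter amplitude is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 20 * np.pi, 2000)
r = 1.0 + 0.01 * rng.standard_normal(len(t))     # slightly noisy radius
x1, x2 = r * np.cos(t), r * np.sin(t)            # approximately circular motion
S1 = x1 ** 2 + x2 ** 2                           # nearly constant, i.e. slow
# S1 changes far more slowly than the raw coordinates:
print(np.var(np.diff(S1)) / np.var(np.diff(x1)))   # << 1
```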
Slide 13: Clue 1: The actual input data is a small subset of all possible input data (i.e., it lies on a manifold)
- Trick 1: Find a set of points that represent where the actual input data is: 20-80 anchor points A_i
- (Found using k-means, k-medoids, etc. This is quantization, but only for feedback; see the sketch below.)
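A sketch of Trick 1, using scikit-learn's k-means as one of the clustering options the slide mentions; the function name and anchor count are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def find_anchors(X, n_anchors=50, seed=0):
    """Trick 1: summarize where the data actually lives with 20-80 anchor
    points A_i on the manifold (k-medoids would work equally well)."""
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=seed).fit(X)
    return km.cluster_centers_, km.labels_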
Slide 14: Clue 2: The actual input data is not distributed evenly about those anchor points
- Trick 2: Calculate the covariance matrix C_i of the data around each A_i; the eigenvectors of C_i capture the local spread of the data (see the sketch below)
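Continuing the sketch, Trick 2 might look like this; using the k-means labels above to define each anchor's neighborhood is an assumption (any local neighborhood would do).

```python
import numpy as np

def local_covariances(X, anchors, labels):
    """Trick 2: the scatter of the data around each anchor A_i, as a
    covariance matrix C_i whose leading eigenvectors span the local manifold."""
    covs = []
    for i in range(len(anchors)):
        pts = X[labels == i] - anchors[i]
        covs.append(pts.T @ pts / max(len(pts) - 1, 1))
    return np.array(covs)
```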
Slide 15: Clue 3: S(x) is locally linear about each anchor point
- Trick 3: Construct linear (affine) Taylor-series mappings SL_i approximating S(x) about each A_i (NB: this doesn't require polynomial SFA, just a differentiable S; see the sketch below)
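One way to build the affine maps SL_i, assuming only that the slow-feature map S can be evaluated pointwise; the central finite differences here are a generic stand-in for whatever derivative the original used.

```python
import numpy as np

def local_linear_maps(S_func, anchors, eps=1e-4):
    """Trick 3: first-order Taylor expansion of S(x) about each anchor,
    S(x) ≈ S(A_i) + J_i (x - A_i), for any differentiable S."""
    S_at = np.array([S_func(a) for a in anchors])
    jacs = []
    for a in anchors:
        cols = []
        for k in range(len(a)):
            step = np.zeros_like(a)
            step[k] = eps
            cols.append((S_func(a + step) - S_func(a - step)) / (2 * eps))
        jacs.append(np.column_stack(cols))   # Jacobian J_i, one column per x_k
    return S_at, np.array(jacs)
```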
Slide 16: Good news: linear SL_i can be pseudo-inverted (via SVD). Bad news: we don't want any old (x1, x2); we want an (x1, x2) on the data manifold.
- Clue 4: Covariance eigenvectors tell us about the local data manifold
- Trick 4:
  - Get the SVD pseudo-inverse: ΔX = SL_i^{-1}(S_new - S(A_i))
  - Then stretch ΔX onto the manifold by multiplying by the "chopped" C_i: a projection matrix keeping only as many eigenvectors as S has dimensions (see the sketch below)
[Figure: ΔS = S_new - S(A_i) is pseudo-inverted to ΔX, which is then stretched onto the manifold]
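A sketch of Trick 4. `np.linalg.pinv` computes the SVD pseudo-inverse, and I'm reading "chopped C_i" as projection onto the top eigenvectors of C_i (as many as S has dimensions), per the slide.

```python
import numpy as np

def local_inverse(S_new, S_at_anchor, J, C, k):
    """Trick 4: pseudo-invert the local linear map (SVD), then stretch the
    step onto the manifold via the chopped-C_i projection."""
    dX = np.linalg.pinv(J) @ (S_new - S_at_anchor)   # minimum-norm ΔX
    _, V = np.linalg.eigh(C)                         # eigenvalues ascending
    Vk = V[:, -k:]                                   # keep k leading eigenvectors
    return Vk @ (Vk.T @ dX)                          # stretched ΔX on the manifold
```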
Slide 17: Good news: given A_i and C_i, we can invert S_new → X_new. Bad news: how do we choose which A_i and SL_i^{-1} to use?
[Figure: three different points in input space that all have the same value of S_new]
Slide 18: Clue 5:
  a) We need an anchor A_i such that S(A_i) is close to S_new
  b) We need a hint of which anchors are close in X-space (a "hint region")
- Trick 5: Choose the anchor A_i such that
  - A_i is close to the hint, AND
  - S(A_i) is close to S_new
  (see the sketch below)
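Trick 5 as a sketch, intersecting two candidate shortlists exactly as in the summary on slide 23; the shortlist length is an arbitrary assumption.

```python
import numpy as np

def choose_anchor(S_new, X_hint, anchors, S_at, n_candidates=4):
    """Trick 5: pick an anchor close to S_new in slow-feature space AND
    close to the hint X(t) in input space."""
    s_rank = np.argsort(np.linalg.norm(S_at - S_new, axis=1))[:n_candidates]
    x_rank = np.argsort(np.linalg.norm(anchors - X_hint, axis=1))[:n_candidates]
    common = [i for i in s_rank if i in x_rank]
    # fall back to the best S-space candidate if the shortlists don't intersect
    return common[0] if common else s_rank[0]
```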
Slide 19: All tricks together
- Map a local linear inverse about each anchor point
[Figure: anchor points in x-space, each with its S(A_i) value and local neighborhood]
Slide 20: Clue 6: The local data scatter can decide whether a given point is probable (on the manifold) or improbable
- Trick 6: Use Gaussian hyper-ellipsoid probabilities about the closest A_i; this can tell whether a prediction makes sense or not
Slide 21: The estimated uncertainty, -log(P), increases away from the anchor points (see the sketch below)
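Trick 6 as a sketch: a Gaussian hyper-ellipsoid density around the chosen anchor, whose negative log gives the uncertainty plotted above. The regularizer is an assumption, there to keep near-singular C_i invertible.

```python
import numpy as np

def neg_log_prob(x, anchor, C, reg=1e-6):
    """Trick 6: -log of a Gaussian density N(anchor, C_i); large values flag
    predictions that have wandered off the data manifold."""
    d = x - anchor
    Creg = C + reg * np.eye(len(C))          # regularize the covariance
    _, logdet = np.linalg.slogdet(Creg)
    maha = d @ np.linalg.solve(Creg, d)      # squared Mahalanobis distance
    return 0.5 * (maha + logdet + len(d) * np.log(2 * np.pi))
```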
Slide 22: Summary of the SFA inverse/prediction method
1. We have X(t-2), X(t-1), X(t); we want X(t+1). Calculate the slow features S(t-2), S(t-1), S(t).
2. Extrapolate that trend linearly to S_new (NB: S varies slowly/smoothly in time).
3. Find candidate S(A_i)'s close to S_new, e.g. candidates i = 1, 16, 3, 7.
Slide 23: Summary, cont'd
4. Take X(t) as the hint, and find candidate A_i's close to it, e.g. candidates i = 8, 3, 5, 17.
5. Find the best candidate A_i, whose index is high on both candidate lists:

   S(A_i) close to S_new   A_i close to X(t)
   i = 1                   i = 8
   i = 16                  i = 3
   i = 3                   i = 5
   i = 7                   i = 17

   (Here i = 3 ranks high on both lists.)
Slide 24: Summary, cont'd
6. Use the chosen A_i and the pseudo-inverse (i.e., ΔX = SL_i^{-1}(S_new - S(A_i)), computed via SVD) to get ΔX.
7. Stretch ΔX onto the low-dim manifold using the chopped C_i.
8. Add the stretched ΔX back onto A_i to get the final prediction.
Slide 25: Summary, cont'd
9. Use the covariance hyper-ellipsoids to estimate confidence in this prediction. (An end-to-end sketch combining these steps follows below.)
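Putting steps 1-9 together, reusing the trick functions sketched earlier. The three-frame history and one-step linear extrapolation follow the summary slides; everything else is the same illustrative code as above, not the original implementation.

```python
import numpy as np

def predict_next(X_hist, S_func, anchors, S_at, jacs, covs, k):
    """Steps 1-9: predict X(t+1) from recent frames, plus a confidence score.
    Uses choose_anchor, local_inverse, and neg_log_prob from the sketches above."""
    S_hist = np.array([S_func(x) for x in X_hist[-3:]])       # step 1
    S_new = S_hist[-1] + (S_hist[-1] - S_hist[-2])            # step 2: extrapolate
    i = choose_anchor(S_new, X_hist[-1], anchors, S_at)       # steps 3-5
    dX = local_inverse(S_new, S_at[i], jacs[i], covs[i], k)   # steps 6-7
    X_pred = anchors[i] + dX                                  # step 8
    return X_pred, neg_log_prob(X_pred, anchors[i], covs[i])  # step 9
```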
- This method uses virtually everything we know about the data; any improvements would presumably need further clues:
  - Discrete sub-manifolds
  - Discrete sequence steps
  - Better nonlinear mappings
Slide 26: Next steps
- Online learning
  - Adjust anchor points and covariances as new data arrive
  - Use weighted k-medoid clusters to mix old data with new
- Hierarchy
  - Set the output of one layer as the input to the next
  - Enforce ever-slower features up the hierarchy
- Test with more complex stimuli and natural movies
- Let feedback from above modify the slow-feature polynomials
- Find slow features in the unpredicted input (input prediction)