Title: Learning to make specific predictions using Slow Feature Analysis
Slide 1: Learning to make specific predictions using Slow Feature Analysis
Slide 2: Memory/prediction hierarchy with temporal invariances
- Slow: temporally invariant abstractions
- Fast: quickly changing input
- But how does each module work: learn, map, and predict?
Slide 3: My (old) module
- Quantize the high-dim input space
- Map to a low-dim output space
- Discover temporal sequences in input space
- Map sequences to a low-dim sequence language
- Feedback: the same map run backwards
- Problems:
  - Sequence-mapping (step 4) depends on several previous steps → brittle, not robust
  - Sequence-mapping is not well-defined statistically
Slide 4: New module design: Slow Feature Analysis (SFA)
- Pros of SFA:
  - Nearly guaranteed to find some slow features
  - No quantization
  - Defined over the entire input space
  - Hierarchical stacking is easy
  - Statistically robust building blocks (simple polynomials, Principal Components Analysis, variance reduction, etc.)
- → a great way to find invariant functions
- → invariants change slowly, hence are easily predictable
Slide 5: BUT...
- No feedback!
- Can't get specific output from invariant input
- It's hard to take a low-dim signal and turn it into the right high-dim one (underdetermined)
- Here's my solution (straightforward, probably done before somewhere): do feedback with a separate map
Slide 6: First, show it working; then, show how and why
- Input space: a 20-dim retina
- Input shapes: Gaussian blurs (wrapped) of 3 different widths
- Input sequences: constant-velocity motion (0.3 pixels/step)
[Figure: example frames of a moving blur at T = 0, 2, 4 and T = 23, 25, 27, plotted across the 20 pixels]
Slide 7: Sanity check: the extracted slow features match the generating parameters
- Slow feature 1 ↔ "what" (the Gaussian std. dev.)
- Slow feature 2 ↔ "where" (the Gaussian center position)
- (So far, this is plain-vanilla SFA, nothing new; a sketch follows below.)
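A minimal numpy-only sketch of this setup, assuming the standard quadratic-expansion formulation of SFA (whiten an expanded signal, then keep the directions whose temporal derivative has the least variance). The function names, sequence lengths, and width values are illustrative assumptions, not the original code.

```python
import numpy as np

def make_sequences(n_steps=500, n_pix=20, widths=(1.0, 2.0, 3.0), v=0.3, seed=0):
    """Wrapped Gaussian blurs drifting at 0.3 pixels/step (slide 6 setup)."""
    rng = np.random.default_rng(seed)
    frames = []
    for w in widths:
        c = rng.uniform(0, n_pix)                    # random start position
        for _ in range(n_steps):
            d = np.arange(n_pix) - c
            d = (d + n_pix / 2) % n_pix - n_pix / 2  # wrap around the retina
            frames.append(np.exp(-d ** 2 / (2 * w ** 2)))
            c = (c + v) % n_pix
    return np.array(frames)

def quad_expand(X):
    """Linear + quadratic monomials: the polynomial basis of quadratic SFA."""
    n = X.shape[1]
    quads = [X[:, i] * X[:, j] for i in range(n) for j in range(i, n)]
    return np.column_stack([X] + quads)

def sfa(X, n_out=2):
    """Plain SFA: whiten the expanded signal, then keep the directions whose
    temporal derivative has the smallest variance."""
    Z = quad_expand(X)
    Z -= Z.mean(axis=0)
    d, U = np.linalg.eigh(np.cov(Z, rowvar=False))
    keep = d > 1e-10 * d.max()                       # drop near-null directions
    Zw = Z @ (U[:, keep] / np.sqrt(d[keep]))         # whitened expanded signal
    # boundary jumps between the three sequences are ignored in this sketch
    dd, V = np.linalg.eigh(np.cov(np.diff(Zw, axis=0), rowvar=False))
    return Zw @ V[:, :n_out]                         # the n_out slowest features

X = make_sequences()
S = sfa(X)   # compare S against blob width ("what") and position ("where")
```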
Slide 8: New contribution: predict all the pixels of the next image, given the previous images
- Given the frames at T = 0, 2, 4, predict the unknown frame at T = 5
- Reference prediction: just reuse the previous image ("tomorrow's weather is just like today's"), i.e. predict frame T = 5 as frame T = 4
Slide 9: Plot the ratio (mean-squared prediction error) / (mean-squared reference error)
- Median ratio over all points: 0.06 (including discontinuities)
- Median ratio over high-confidence points: 0.03 (tossing the worst 20%)
(A sketch of this metric follows below.)
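For concreteness, the ratio metric might be computed like this. The function name `error_ratio` and its `drop_worst` argument are hypothetical, and I'm reading "toss worst 20" as dropping the least-confident 20% of points.

```python
import numpy as np

def error_ratio(X_true, X_pred, drop_worst=0.0):
    """Median ratio of prediction MSE to the persistence reference that
    simply repeats the previous frame ("tomorrow's weather = today's")."""
    err_pred = np.mean((X_pred[1:] - X_true[1:]) ** 2, axis=1)
    err_ref = np.mean((X_true[:-1] - X_true[1:]) ** 2, axis=1)
    ratio = err_pred / err_ref
    if drop_worst > 0:                     # e.g. 0.2 tosses the worst 20%
        ratio = np.sort(ratio)[: int(len(ratio) * (1 - drop_worst))]
    return np.median(ratio)
```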
Slide 10: Take-home messages
- SFA can be inverted
- SFA can be used to make specific predictions
- The prediction works very well
- The prediction can be further improved by using confidence estimates
- So why is it hard, and how is it done?...
Slide 11: Low-dim slow features from high-dim input
- Easy: high-dim input (x1, x2, x3, ..., x20) → low-dim slow features, e.g.
  S1 = 0.3 x1 + 0.1 x1^2 + 1.4 x2 x3 + 1.1 x4^2 + ... + 0.5 x5 x9
- HARD: given S1 = 1.4, S2 = -0.33, what are x1, x2, x3, x4, x5, x6, ..., x20?
  - Infinitely many possibilities for the x's
  - Vastly under-determined
  - No simple polynomial-inverse formula (no analogue of, e.g., the quadratic formula)
Slide 12: A very simple, graphable example
- (x1, x2) 2-dim → S1 1-dim
- S1(t) = x1^2 + x2^2 is nearly constant, i.e. slow
- x1(t), x2(t): approximately circular motion in the plane (sketched in code below)
- Next: a series of six clue/trick pairs for learning the specific-prediction mapping
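The toy example is easy to reproduce; a minimal sketch (the jitter amplitude is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 20 * np.pi, 2000)
r = 1.0 + 0.01 * rng.standard_normal(len(t))     # slightly noisy radius
x1, x2 = r * np.cos(t), r * np.sin(t)            # approximately circular motion
S1 = x1 ** 2 + x2 ** 2                           # nearly constant, i.e. slow
# S1 changes far more slowly than the raw coordinates:
print(np.var(np.diff(S1)) / np.var(np.diff(x1)))   # << 1
```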
Slide 13: Clue 1: The actual input data is a small subset of all possible input data (i.e., it lies on a manifold)
- Trick 1: Find a set of points that represent where the actual input data is: 20-80 anchor points A_i
- (Found using k-means, k-medoids, etc. This is quantization, but only for feedback; see the sketch below.)
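A sketch of Trick 1, using scikit-learn's k-means as one of the clustering options the slide mentions; the function name and anchor count are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def find_anchors(X, n_anchors=50, seed=0):
    """Trick 1: summarize where the data actually lives with 20-80 anchor
    points A_i on the manifold (k-medoids would work equally well)."""
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=seed).fit(X)
    return km.cluster_centers_, km.labels_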
Slide 14: Clue 2: The actual input data is not distributed evenly about those anchor points
- Trick 2: Calculate the covariance matrix C_i of the data around each A_i; the eigenvectors of C_i capture the local spread of the data (see the sketch below)
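Continuing the sketch, Trick 2 might look like this; using the k-means labels above to define each anchor's neighborhood is an assumption (any local neighborhood would do).

```python
import numpy as np

def local_covariances(X, anchors, labels):
    """Trick 2: the scatter of the data around each anchor A_i, as a
    covariance matrix C_i whose leading eigenvectors span the local manifold."""
    covs = []
    for i in range(len(anchors)):
        pts = X[labels == i] - anchors[i]
        covs.append(pts.T @ pts / max(len(pts) - 1, 1))
    return np.array(covs)
```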
Slide 15: Clue 3: S(x) is locally linear about each anchor point
- Trick 3: Construct linear (affine) Taylor-series mappings SL_i approximating S(x) about each A_i (NB: this doesn't require polynomial SFA, just a differentiable S; see the sketch below)
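One way to build the affine maps SL_i, assuming only that the slow-feature map S can be evaluated pointwise; the central finite differences here are a generic stand-in for whatever derivative the original used.

```python
import numpy as np

def local_linear_maps(S_func, anchors, eps=1e-4):
    """Trick 3: first-order Taylor expansion of S(x) about each anchor,
    S(x) ≈ S(A_i) + J_i (x - A_i), for any differentiable S."""
    S_at = np.array([S_func(a) for a in anchors])
    jacs = []
    for a in anchors:
        cols = []
        for k in range(len(a)):
            step = np.zeros_like(a)
            step[k] = eps
            cols.append((S_func(a + step) - S_func(a - step)) / (2 * eps))
        jacs.append(np.column_stack(cols))   # Jacobian J_i, one column per x_k
    return S_at, np.array(jacs)
```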
Slide 16: Good news: linear SL_i can be pseudo-inverted (via SVD). Bad news: we don't want any old (x1, x2); we want an (x1, x2) on the data manifold.
- Clue 4: Covariance eigenvectors tell us about the local data manifold
- Trick 4:
  - Get the SVD pseudo-inverse: ΔX = SL_i^{-1}(S_new - S(A_i))
  - Then stretch ΔX onto the manifold by multiplying by the "chopped" C_i: a projection matrix keeping only as many eigenvectors as S has dimensions (see the sketch below)
[Figure: ΔS = S_new - S(A_i) is pseudo-inverted to ΔX, which is then stretched onto the manifold]
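A sketch of Trick 4. `np.linalg.pinv` computes the SVD pseudo-inverse, and I'm reading "chopped C_i" as projection onto the top eigenvectors of C_i (as many as S has dimensions), per the slide.

```python
import numpy as np

def local_inverse(S_new, S_at_anchor, J, C, k):
    """Trick 4: pseudo-invert the local linear map (SVD), then stretch the
    step onto the manifold via the chopped-C_i projection."""
    dX = np.linalg.pinv(J) @ (S_new - S_at_anchor)   # minimum-norm ΔX
    _, V = np.linalg.eigh(C)                         # eigenvalues ascending
    Vk = V[:, -k:]                                   # keep k leading eigenvectors
    return Vk @ (Vk.T @ dX)                          # stretched ΔX on the manifold
```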
Slide 17: Good news: given A_i and C_i, we can invert S_new → X_new. Bad news: how do we choose which A_i and SL_i^{-1} to use?
[Figure: three different points in input space that all have the same value of S_new]
Slide 18: Clue 5:
  a) We need an anchor A_i such that S(A_i) is close to S_new
  b) We need a hint of which anchors are close in X-space (a "hint region")
- Trick 5: Choose the anchor A_i such that
  - A_i is close to the hint, AND
  - S(A_i) is close to S_new
  (see the sketch below)
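Trick 5 as a sketch, intersecting two candidate shortlists exactly as in the summary on slide 23; the shortlist length is an arbitrary assumption.

```python
import numpy as np

def choose_anchor(S_new, X_hint, anchors, S_at, n_candidates=4):
    """Trick 5: pick an anchor close to S_new in slow-feature space AND
    close to the hint X(t) in input space."""
    s_rank = np.argsort(np.linalg.norm(S_at - S_new, axis=1))[:n_candidates]
    x_rank = np.argsort(np.linalg.norm(anchors - X_hint, axis=1))[:n_candidates]
    common = [i for i in s_rank if i in x_rank]
    # fall back to the best S-space candidate if the shortlists don't intersect
    return common[0] if common else s_rank[0]
```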
Slide 19: All tricks together
- Map a local linear inverse about each anchor point
[Figure: anchor points in x-space, each with its S(A_i) value and local neighborhood]
Slide 20: Clue 6: The local data scatter can decide whether a given point is probable (on the manifold) or improbable
- Trick 6: Use Gaussian hyper-ellipsoid probabilities about the closest A_i; this can tell whether a prediction makes sense or not
Slide 21: The estimated uncertainty, -log(P), increases away from the anchor points (see the sketch below)
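Trick 6 as a sketch: a Gaussian hyper-ellipsoid density around the chosen anchor, whose negative log gives the uncertainty plotted above. The regularizer is an assumption, there to keep near-singular C_i invertible.

```python
import numpy as np

def neg_log_prob(x, anchor, C, reg=1e-6):
    """Trick 6: -log of a Gaussian density N(anchor, C_i); large values flag
    predictions that have wandered off the data manifold."""
    d = x - anchor
    Creg = C + reg * np.eye(len(C))          # regularize the covariance
    _, logdet = np.linalg.slogdet(Creg)
    maha = d @ np.linalg.solve(Creg, d)      # squared Mahalanobis distance
    return 0.5 * (maha + logdet + len(d) * np.log(2 * np.pi))
```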
Slide 22: Summary of the SFA inverse/prediction method
1. We have X(t-2), X(t-1), X(t); we want X(t+1). Calculate the slow features S(t-2), S(t-1), S(t).
2. Extrapolate that trend linearly to S_new (NB: S varies slowly/smoothly in time).
3. Find candidate S(A_i)'s close to S_new, e.g. candidates i = 1, 16, 3, 7.
Slide 23: Summary, cont'd
4. Take X(t) as the hint, and find candidate A_i's close to it, e.g. candidates i = 8, 3, 5, 17.
5. Find the best candidate A_i, whose index is high on both candidate lists:

   S(A_i) close to S_new   A_i close to X(t)
   i = 1                   i = 8
   i = 16                  i = 3
   i = 3                   i = 5
   i = 7                   i = 17

   (Here i = 3 ranks high on both lists.)
Slide 24: Summary, cont'd
6. Use the chosen A_i and the pseudo-inverse (i.e., ΔX = SL_i^{-1}(S_new - S(A_i)), computed via SVD) to get ΔX.
7. Stretch ΔX onto the low-dim manifold using the chopped C_i.
8. Add the stretched ΔX back onto A_i to get the final prediction.
Slide 25: Summary, cont'd
9. Use the covariance hyper-ellipsoids to estimate confidence in this prediction. (An end-to-end sketch combining these steps follows below.)
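Putting steps 1-9 together, reusing the trick functions sketched earlier. The three-frame history and one-step linear extrapolation follow the summary slides; everything else is the same illustrative code as above, not the original implementation.

```python
import numpy as np

def predict_next(X_hist, S_func, anchors, S_at, jacs, covs, k):
    """Steps 1-9: predict X(t+1) from recent frames, plus a confidence score.
    Uses choose_anchor, local_inverse, and neg_log_prob from the sketches above."""
    S_hist = np.array([S_func(x) for x in X_hist[-3:]])       # step 1
    S_new = S_hist[-1] + (S_hist[-1] - S_hist[-2])            # step 2: extrapolate
    i = choose_anchor(S_new, X_hist[-1], anchors, S_at)       # steps 3-5
    dX = local_inverse(S_new, S_at[i], jacs[i], covs[i], k)   # steps 6-7
    X_pred = anchors[i] + dX                                  # step 8
    return X_pred, neg_log_prob(X_pred, anchors[i], covs[i])  # step 9
```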
- This method uses virtually everything we know about the data; any improvements would presumably need further clues:
  - Discrete sub-manifolds
  - Discrete sequence steps
  - Better nonlinear mappings
Slide 26: Next steps
- Online learning
  - Adjust anchor points and covariances as new data arrive
  - Use weighted k-medoid clusters to mix old data with new
- Hierarchy
  - Set the output of one layer as the input to the next
  - Enforce ever-slower features up the hierarchy
- Test with more complex stimuli and natural movies
- Let feedback from above modify the slow-feature polynomials
- Find slow features in the unpredicted input (input prediction)