1
Learning to make specific predictions using Slow Feature Analysis
2
Memory/prediction hierarchy with temporal invariances
Slow: temporally invariant abstractions
Fast: quickly changing input
But how does each module work? How does it learn, map, and predict?
3
  • My (old) module:
  1. Quantize the high-dim input space
  2. Map to a low-dim output space
  3. Discover temporal sequences in the input space
  4. Map sequences to a low-dim sequence language
  5. Feedback: the same map run backwards
  • Problems:
  • Sequence-mapping (step 4) depends on several previous steps → brittle, not robust
  • Sequence-mapping is not well-defined statistically

4
New module design: Slow Feature Analysis (SFA)
  • Pros of SFA:
  • Nearly guaranteed to find some slow features
  • No quantization
  • Defined over the entire input space
  • Hierarchical stacking is easy
  • Statistically robust building blocks (simple polynomials, Principal Components Analysis, variance reduction, etc.)
  • → a great way to find invariant functions
  • → invariants change slowly, hence are easily predictable

5
  • BUT...
  • ...no feedback!
  • Can't get a specific output from an invariant input
  • It's hard to take a low-dim signal and turn it into the right high-dim one (underdetermined)
  • Here's my solution (straightforward, probably done before somewhere):
  • Do feedback with a separate map

6
First, show it working; then, show how and why.
Input space: a 20-dim retina. Input shapes: Gaussian blurs (wrapped) of 3 different widths.
Input sequences: constant-velocity motion (0.3 pixels/step; a data-generation sketch follows).
[Figure: example input frames at T = 0, 2, 4 and T = 23, 25, 27, plotted over the pixel axis]
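To make the setup concrete, here is a minimal sketch of how such input sequences could be generated. The 20-pixel retina, wrapped Gaussian shape, and 0.3 px/step velocity come from the slide; the function names, sequence length, and width value are assumptions:

```python
import numpy as np

def wrapped_gaussian_frame(center, width, n_pixels=20):
    # One retina frame: a Gaussian blur wrapped around the pixel ring
    pix = np.arange(n_pixels)
    d = np.abs(pix - center)
    d = np.minimum(d, n_pixels - d)  # wrap-around distance
    return np.exp(-0.5 * (d / width) ** 2)

def make_sequence(n_steps=200, width=2.0, velocity=0.3, n_pixels=20):
    # Constant-velocity motion of the blur, 0.3 pixels per time step
    centers = (np.arange(n_steps) * velocity) % n_pixels
    return np.stack([wrapped_gaussian_frame(c, width, n_pixels) for c in centers])

X = make_sequence()  # shape (200, 20): one row per frame
```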
7
Sanity check: the extracted slow features match the generating parameters.
[Figure: slow feature 1 tracks the Gaussian std. dev. ('what'); slow feature 2 tracks the Gaussian center position ('where')]
(So far, this is plain-vanilla SFA, nothing new.)
8
New contribution: predict all pixels of the next image, given the previous images.
[Figure: frames at T = 0, 2, 4; every pixel of the T = 5 frame is to be predicted]
The reference prediction is to use the previous image (tomorrow's weather is just like today's).
[Figure: the reference prediction reuses the T = 4 frame as the guess for T = 5]
9
Plot the ratio: (mean-squared prediction error) / (mean-squared reference error), relative to the reference prediction above.
Median ratio over all points: 0.06 (including discontinuities); over high-confidence points: 0.03 (tossing the worst 20%; a sketch of this evaluation follows).
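A minimal sketch of how this ratio could be computed; the names (e.g. `X_pred`, holding the model's predicted frames) are hypothetical:

```python
import numpy as np

def error_ratio(pred, actual, reference):
    # Per-frame ratio: mean-squared prediction error / mean-squared reference error
    pred_err = np.mean((pred - actual) ** 2, axis=1)
    ref_err = np.mean((reference - actual) ** 2, axis=1)
    return pred_err / ref_err

# Reference prediction: "tomorrow's weather is just like today's", i.e. X[t].
# ratios = error_ratio(X_pred, X[1:], X[:-1])
# np.median(ratios)                                    # all points
# np.median(np.sort(ratios)[:int(0.8 * len(ratios))])  # drop worst 20%
```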
10
  • Take-home messages
  • SFA can be inverted
  • SFA can be used to make specific predictions
  • The prediction works very well
  • The prediction can be further improved by using confidence estimates
  • So why is it hard, and how is it done?....

11
  • Why it's hard

Low-dim slow features, e.g. S1 = 0.3·x1 + 0.1·x1² + 1.4·x2·x3 + 1.1·x4² + … + 0.5·x5·x9
Going from the high-dim input x1, x2, x3, …, x20 to S is easy.
But given S1 = 1.4, S2 = -0.33: x1 = ? x2 = ? x3 = ? … x20 = ?  HARD.
  • Infinitely many possibilities for the x's
  • Vastly under-determined
  • No simple polynomial-inverse formula (unlike, e.g., the quadratic formula)
12
  • Very simple, graphable example:
  • (x1, x2) 2-dim → S1 1-dim

S1(t) = x1² + x2² is nearly constant, i.e. slow; x1(t), x2(t) trace approximately circular motion in the plane (a runnable sketch follows).
I illustrate a series of six clue/trick pairs for learning the specific-prediction mapping.
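A minimal sketch of this toy example; the noise level and trajectory length are assumptions:

```python
import numpy as np

# (x1, x2) moving on a near-circle, so S1 = x1^2 + x2^2 is nearly constant
t = np.linspace(0, 20 * np.pi, 2000)
r = 1.0 + 0.01 * np.random.randn(t.size)              # radius wobbles slightly
x = np.stack([r * np.cos(t), r * np.sin(t)], axis=1)  # shape (2000, 2)

S1 = x[:, 0] ** 2 + x[:, 1] ** 2                      # the slow feature
# S1 changes far more slowly than the raw coordinates:
print(np.var(np.diff(S1)) / np.var(np.diff(x[:, 0]))) # << 1
```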
13
  • Clue 1: The actual input data is a small subset of all possible input data (i.e. it lies on a manifold).

[Figure: the actual data occupies a small region of the possible input space]
Trick 1: Find a set of points that represent where the actual input data is: 20-80 anchor points Ai.
(Found using k-means, k-medoids, etc. This is quantization, but only for the feedback; a sketch follows.)
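A minimal k-means sketch for finding the anchor points (the slide also allows k-medoids); k = 40 and the iteration count are assumptions:

```python
import numpy as np

def find_anchors(X, k=40, n_iters=50, seed=0):
    # Plain k-means: k anchor points A_i covering where the data actually lies
    rng = np.random.default_rng(seed)
    A = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # assign every point to its nearest anchor, then move anchors to cluster means
        labels = np.argmin(((X[:, None, :] - A[None, :, :]) ** 2).sum(-1), axis=1)
        for i in range(k):
            if np.any(labels == i):
                A[i] = X[labels == i].mean(axis=0)
    return A, labels

A, labels = find_anchors(X)  # 20-80 anchors, per the slide
```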
14
  • Clue 2: The actual input data is not distributed evenly about those anchor points.

[Figure: two example scatters about an anchor, labeled 'yes' and 'no']
Trick 2: Calculate the covariance matrix Ci of the data around each Ai (sketch below).
[Figure: local data cloud with the eigenvectors of Ci]
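A sketch of Trick 2, reusing the anchor assignment above; the identity fallback for near-empty clusters is an assumption:

```python
import numpy as np

def local_covariances(X, A, labels):
    # Covariance C_i of the data assigned to each anchor A_i
    C = []
    for i in range(len(A)):
        pts = X[labels == i]
        if len(pts) > 1:
            C.append(np.cov(pts.T))       # rows of pts are samples
        else:
            C.append(np.eye(X.shape[1]))  # fallback for near-empty clusters
    return C

C = local_covariances(X, A, labels)
```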
15
Clue 3: S(x) is locally linear about each anchor point.
Trick 3: Construct linear (affine) Taylor-series mappings SLi approximating S(x) about each Ai.
(NB this doesn't require polynomial SFA, just a differentiable S; see the sketch below.)
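One way to build these local affine maps is by finite differences, which keeps the "only differentiable" requirement. Here `S` is assumed to be the trained slow-feature function mapping an input vector to the slow-feature vector:

```python
import numpy as np

def local_linear_map(S, a, eps=1e-4):
    # Affine Taylor approximation about anchor a: S(x) ~ S(a) + J @ (x - a)
    s_a = np.asarray(S(a))
    J = np.empty((s_a.size, a.size))
    for j in range(a.size):
        da = np.zeros_like(a)
        da[j] = eps
        J[:, j] = (S(a + da) - S(a - da)) / (2 * eps)  # central difference
    return s_a, J

# e.g. S_of_A[i], J_list[i] = local_linear_map(S, A[i]) for each anchor
```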
16
Good news: a linear SLi can be pseudo-inverted (SVD).
Bad news: we don't want any old (x1, x2); we want an (x1, x2) on the data manifold.
  • Clue 4: The covariance eigenvectors tell us about the local data manifold.
  • Trick 4:
  • Get the SVD pseudo-inverse ΔX = SLi⁻¹(Snew − S(Ai))
  • Then stretch ΔX onto the manifold by multiplying by the 'chopped' Ci: a projection matrix keeping only as many eigenvectors as S has dimensions (sketch below)
[Figure: ΔS = Snew − S(Ai) is pseudo-inverted to ΔX, which is then stretched onto the manifold]
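A sketch of Trick 4. Following the slide's note, the 'chopped' Ci is taken as a projection onto the top eigenvectors of Ci (as many as S has dimensions); all names are assumptions:

```python
import numpy as np

def invert_locally(S_new, a, s_a, J, C_a, n_slow):
    # Step 1: SVD pseudo-inverse of the local linear map SLi
    dX = np.linalg.pinv(J) @ (S_new - s_a)
    # Step 2: 'chop' C_a, keeping as many eigenvectors as S has dimensions,
    # and project dX onto their span to stretch it onto the local manifold
    w, V = np.linalg.eigh(C_a)       # eigenvalues in ascending order
    U = V[:, -n_slow:]               # top n_slow eigenvectors
    dX_on_manifold = U @ (U.T @ dX)  # projection onto the local manifold
    return a + dX_on_manifold        # candidate X_new near this anchor
```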
17
  • Good news: Given Ai and Ci, we can invert Snew → Xnew.

Bad news: how do we choose which Ai and SLi⁻¹ to use?
[Figure: three different points on the manifold all share the same value of Snew]
18
Clue 5: (a) We need an anchor Ai such that S(Ai) is close to Snew.
[Figure: close candidate S(Ai)'s around Snew in slow-feature space]
(b) We need a hint of which anchors are close in X-space.
[Figure: hint region in the input space]
  • Trick 5: Choose the anchor Ai such that
  • Ai is close to the hint, AND
  • S(Ai) is close to Snew (a scoring sketch follows)

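A minimal scoring sketch of Trick 5. Combining the two distances with a weighted sum (and the weight alpha) is an assumption; the slides only require both criteria to hold:

```python
import numpy as np

def choose_anchor(S_new, x_hint, A, S_of_A, alpha=1.0):
    d_S = np.linalg.norm(S_of_A - S_new, axis=1)  # close in slow-feature space
    d_X = np.linalg.norm(A - x_hint, axis=1)      # close to the hint in X-space
    return int(np.argmin(d_S + alpha * d_X))      # best on both criteria
```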
19
  • All tricks together:
  • Map the local linear inverse about each anchor point

[Figure: anchors Ai with their neighborhoods; the S(Ai)'s, neighbors, and x marked]
20
  • Clue 6: The local data scatter can decide whether a given point is probable (on the manifold) or not.

[Figure: probable vs. improbable points relative to the local scatter]
Trick 6: Use Gaussian hyper-ellipsoid probabilities about the closest Ai; this can tell whether a prediction makes sense or not (sketch below).
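A sketch of the Gaussian hyper-ellipsoid confidence score; the jitter regularizer is an assumption:

```python
import numpy as np

def log_prob(x, a, C_a, jitter=1e-6):
    # Gaussian log-probability of x under the hyper-ellipsoid around anchor a;
    # very negative values flag predictions that fall off the data manifold
    d = x - a
    Creg = C_a + jitter * np.eye(a.size)
    maha = d @ np.linalg.solve(Creg, d)  # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(Creg)
    return -0.5 * (maha + logdet + a.size * np.log(2 * np.pi))
```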
21
  • Estimated uncertainty increases away from the anchor points.

[Figure: -log(P) surface over the input space]
22
Summary of the SFA inverse/prediction method
  • We have X(t-2), X(t-1), X(t); we want X(t+1).

1. Calculate the slow features S(t-2), S(t-1), S(t).
[Figure: S plotted against t]
2. Extrapolate that trend linearly to Snew (NB: S varies slowly/smoothly in time).
[Figure: the S-vs-t trend extended one step to Snew]
3. Find candidate S(Ai)'s close to Snew among all the S(Ai), e.g. candidates i = 1, 16, 3, 7 (a sketch of steps 1-3 follows).
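Steps 1-3 in a sketch; the two-point linear extrapolation is one natural reading of "extrapolate that trend linearly", and the names are assumptions:

```python
import numpy as np

def extrapolate_S(S_hist):
    # Continue the last linear trend of the slow features one step ahead;
    # this works because S varies slowly and smoothly in time
    return S_hist[-1] + (S_hist[-1] - S_hist[-2])

# S_new = extrapolate_S(np.stack([S(X_tm2), S(X_tm1), S(X_t)]))
# candidates_S = np.argsort(np.linalg.norm(S_of_A - S_new, axis=1))[:4]  # step 3
```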
23
Summary, cont'd
  • 4. Take X(t) as the hint, and find candidate Ai's close to it, e.g. candidates i = 8, 3, 5, 17.

5. Find the best candidate Ai, whose index is high on both candidate lists:

  S(Ai) close to Snew (i)   Ai close to X(t) (i)
  1                         8
  16                        3
  3                         5
  6                         17
24
6. Use the chosen Ai and the pseudo-inverse (i.e. ΔX = SLi⁻¹(Snew − S(Ai)), via SVD) to get ΔX.
[Figure: the step from S(Ai) back to ΔX]
7. Stretch ΔX onto the low-dim manifold using the chopped Ci.
[Figure: ΔX before and after the stretch]
8. Add the stretched ΔX back onto Ai to get the final prediction (a combined sketch follows).
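Steps 3-8 composed into one call, reusing the sketches above; all names are assumptions:

```python
def predict_next(X_t, S_new, A, S_of_A, J_list, C, n_slow):
    i = choose_anchor(S_new, X_t, A, S_of_A)  # steps 3-5: pick the anchor
    # steps 6-8: pseudo-invert, stretch onto the manifold, add back onto A[i]
    return invert_locally(S_new, A[i], S_of_A[i], J_list[i], C[i], n_slow)
```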
25
9. Use the covariance hyper-ellipsoids to estimate confidence in this prediction.
  • This method uses virtually everything we know about the data; any improvements would presumably need further clues:
  • Discrete sub-manifolds
  • Discrete sequence steps
  • Better nonlinear mappings
26
Next steps
  • Online learning
  • Adjust the anchor points and covariances as new data arrive
  • Use weighted k-medoid clusters to mix old data in with new
  • Hierarchy
  • Set the output of one layer as the input to the next
  • Enforce ever-slower features up the hierarchy
  • Test with more complex stimuli and natural movies
  • Let feedback from above modify the slow-feature polynomials
  • Find slow features in the unpredicted input (input prediction)