Predictive State Representation (presentation transcript)

1
  • Predictive State Representation
  • Masoumeh Izadi
  • School of Computer Science
  • McGill University

UdeM-McGill Machine Learning Seminar
2
Outline
  • Predictive Representations
  • PSR model specifications
  • Learning PSR
  • Using PSR in Control Problem
  • Conclusion
  • Future Directions

3
Motivation
  • In a dynamical system
  • Knowing the exact state of the system is
    usually an unrealistic assumption.
  • Real-world tasks exhibit uncertainty.
  • POMDPs maintain a belief b = (p(s0), ..., p(sn))
    over the hidden state variables si as the state.
  • Beliefs are not verifiable!
  • POMDPs are hard to learn and to solve.

4
Motivation
  • Potential alternatives
  • K-Markov Model
  • not general!
  • Predictive Representations

5
Predictive Representations
  • The state representation is in terms of experience.
  • The state is represented by the predictions that
    can be made from it.
  • Predictions represent cause and effect.
  • Predictions are testable, maintainable, and
    learnable.
  • No explicit notion of topological relationships.

6
Predictive State Representation
  • Test: a sequence of action-observation pairs
  • Prediction: for a test, given a history
  • Sufficient statistic: predictions for a set of
    core tests, Q

q = a1o1...akok
p(q|h) = Pr(o1...ok | h, a1...ak)
7
Core Tests
A set of tests Q is a core set if its predictions
form a sufficient statistic for the dynamical
system.
p(Q|h) = [p(q1|h), ..., p(qn|h)]
For any test t: p(t|h) = f_t(p(Q|h))
8
Linear PSR Model
For any test q, there exists a projection vector
m_q such that p(q|h) = p(Q|h)^T m_q.
Given a new action-observation pair ao, the
prediction for each qi ∈ Q is updated by
p(qi | hao) = p(ao qi | h) / p(ao | h)
            = p(Q|h)^T m_aoqi / p(Q|h)^T m_ao
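To make the update concrete, here is a minimal numpy
sketch, assuming the model parameters (the projection
vectors m_ao and m_aoqi) are already known; all names
are illustrative, not from any specific library.

```python
import numpy as np

def update_prediction(p_Q, m_ao, m_aoq):
    """One-step PSR update after taking action a, observing o.

    p_Q   : prediction vector p(Q|h), shape (n,)
    m_ao  : projection vector for the one-step test ao, shape (n,)
    m_aoq : projection vectors for the extended tests ao qi,
            shape (n, n), column i for core test qi
    """
    p_ao = p_Q @ m_ao            # p(ao|h) = p(Q|h)^T m_ao
    return (p_Q @ m_aoq) / p_ao  # p(qi|hao) = p(Q|h)^T m_aoqi / p(ao|h)
```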
9
PSR Model Parameters
  • The set of core tests Q = {q1, ..., qn}
  • Projection vectors m_ao for one-step tests (for
    all ao pairs)
  • Projection vectors m_aoqi for one-step
    extensions of core tests (for all ao pairs)

10
Linear PSR vs. POMDP
A linear PSR representation can be more compact
than the POMDP representation. A POMDP with n
nominal states can represent a dynamical system
of linear dimension at most n.
11
POMDP Model
  • The model is an n-tuple ⟨S, A, Ω, T, O, R⟩
  • Sufficient statistic: belief state (probability
    distribution over S)

S: set of states
A: set of actions
Ω: set of observations
T: transition probability distribution for each action
O: observation probability distribution for each
   action
R: reward function for each action
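As one concrete reading of this tuple, the model can be
held as plain numpy arrays; the index conventions below
are an assumption for the later sketches, not a fixed
standard.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class POMDP:
    T: np.ndarray  # T[a, s, s'] = Pr(s' | s, a), shape (|A|, |S|, |S|)
    O: np.ndarray  # O[a, s', o] = Pr(o | s', a), shape (|A|, |S|, |Ω|)
    R: np.ndarray  # R[a, s] = expected reward,   shape (|A|, |S|)
```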
12
Belief State
  • Posterior probability distribution over states

[Figure: belief simplex with vertex S3, showing a
belief b updated to b' after action a1 and
observation o1.]

0 ≤ b(s) ≤ 1 for all s ∈ S, and Σ_{s∈S} b(s) = 1
b'(s') = O(s',a,o) Σ_s T(s,a,s') b(s) / Pr(o | a,b)
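The belief update above translates directly into code; a
minimal sketch, assuming the array shapes from the POMDP
sketch earlier.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """b'(s') = O(s',a,o) * sum_s T(s,a,s') b(s) / Pr(o | a, b)."""
    unnormalized = O[a, :, o] * (b @ T[a])  # elementwise over s'
    pr_o = unnormalized.sum()               # Pr(o | a, b)
    return unnormalized / pr_o
```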
13
Construct PSR from POMDP
Outcome function u (t) the predictions for test
t from all POMDP states. Definition A test
t is said to be independent of a set of tests T
if its outcome vector is linearly independent of
the predictions for tests in T.
14
State Prediction Matrix
  • The rank of the matrix determines the size of Q.
  • Core tests correspond to linearly independent
    columns.
  • Entries are computed using the POMDP model (see
    the sketch below).

[Figure: state prediction matrix with rows s1, ...,
sn (states) and columns t1, t2, ..., tj (all
possible tests); column u(tj) holds the predictions
for test tj from each state.]
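Assuming the POMDP model is known (arrays as in the
earlier sketches), the outcome vectors can be computed
recursively and core tests kept greedily whenever they
raise the rank; a hedged sketch, with `candidate_tests`
supplied by the caller (e.g. one-step extensions).

```python
import numpy as np

def outcome_vector(test, T, O):
    """u(t)[s] = Pr(o1..ok | s, a1..ak) for t = [(a1,o1),...,(ak,ok)]."""
    u = np.ones(T.shape[1])
    for a, o in reversed(test):
        u = T[a] @ (O[a, :, o] * u)  # fold in one action-observation pair
    return u

def find_core_tests(candidate_tests, T, O):
    """Keep tests whose outcome vectors increase the matrix rank."""
    core, U = [], np.empty((T.shape[1], 0))
    for t in candidate_tests:
        u = outcome_vector(t, T, O)
        if np.linalg.matrix_rank(np.column_stack([U, u])) > U.shape[1]:
            core.append(t)
            U = np.column_stack([U, u])
    return core, U
```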
15
Linearly Independent States
  • Definition: A linearly dependent state of an MDP
    is a state whose transition function, for every
    action, is a linear combination of the
    transition functions of the other states.
  • Having the same dynamical structure is a special
    case of linear dependency.

16
Example
[Figure: a small MDP whose states emit observations
O1-O4, with transition probabilities 0.3, 0.2, 0.7,
and 0.8 on the edges.]

A linear PSR needs only two tests to represent the
system; e.g., ao1 and ao4 can predict any other
test.
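The linear-dependence definition from the previous slide
can be checked numerically; a small sketch under one
reading of the definition (the same mixing coefficients
must work across all actions), using the T array assumed
earlier.

```python
import numpy as np

def is_linearly_dependent_state(s, T):
    """True if state s's transitions (for every action) lie in the
    span of the other states' transitions."""
    # Row s holds state s's transition distributions for all actions.
    rows = np.concatenate([T[a] for a in range(T.shape[0])], axis=1)
    full_rank = np.linalg.matrix_rank(rows)
    reduced = np.delete(rows, s, axis=0)  # drop state s
    return np.linalg.matrix_rank(reduced) == full_rank
```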
17
State Space Compression
Theorem: For any controlled dynamical system,
linearly dependent states in the underlying MDP
yield a more compact PSR than the corresponding
POMDP.
The reverse direction does not always hold, due to
possible structure in the observations.
18
Exploiting Structure
PSRs exploit linear-independence structure in the
dynamics of a system. PSRs also exploit
regularities in dynamics. Lossless compression
requires invariance of the state representation in
terms of values as well as dynamics. Including the
reward as part of the observation makes linear
PSRs similar to linear lossless compressions of
POMDPs.
19
POMDP Example
States: 20 (direction, grid position)
Actions: 3 (turn left, turn right, move)
Observations: 2 (wall, nothing)
20
Structure Captured by PSR
Aliased states (by immediate observation)
Predictive classes (by PSR core tests)
21
Generalization
  • Good generalization results when similar
    situations have similar representations.
  • Good generalization makes it possible to learn
    with a small amount of experience.
  • Predictive representation
  • generalizes the state space well.
  • makes the problem simpler and yet precise.
  • assists reinforcement learning algorithms
    [Rafols et al., 2005].

22
Learning the PSR Model
  • The set of core tests Q = {q1, ..., qn}
  • Projection vectors m_ao for one-step tests (for
    all ao pairs)
  • Projection vectors m_aoqi for one-step
    extensions of core tests (for all ao pairs)

23
System Dynamics Vector
ti = a1o1...akok
p(ti) = Pr(o1...ok | a1...ak)

[Figure: the system dynamics vector, with one entry
p(ti) for each possible test t1, t2, ..., ti, ...]

Predictions of all possible future events can be
generated from any exact model of the system.
24
System Dynamics Matrix
tj = a1o1...akok
hi = a1o1...anon

[Figure: the system dynamics matrix, with rows
h1 = ε, h2, ..., hi, ... (histories) and columns
t1, t2, ..., tj, ... (tests); entry (i, j) is
P(tj | hi).]

p(tj | hi) = Pr(o_{n+1} = o1, ..., o_{n+k} = ok |
             a1o1...anon, a1...ak)

The linear dimension of a dynamical system is
determined by the rank of the system dynamics
matrix.
25
POMDP in System Dynamics Matrix
  • Any model must be able to generate the system
    dynamics matrix.
  • Core beliefs B = {b1, b2, ..., bN}
  • Span the reachable subspace of the continuous
    belief space
  • Can be beneficial in POMDP solution methods
    [Izadi et al., 2005]
  • Represent reduced state space dimensions in
    structured domains

[Figure: the system dynamics matrix restricted to
core beliefs, with rows b1, b2, ..., bi and columns
t1, t2, ..., tj; entry (i, j) is P(tj | bi).]
26
Core Test Discovery
  • Extend tests and histories one step and estimate
    the entries of Z by counting data samples.
  • Find the rank and keep the linearly independent
    tests and histories.
  • Keep extending until the rank doesn't change.

Z_ij = P(tj | hi), with rows indexed by histories
(H) and columns by tests (T).
27
System Dynamics Matrix
Extending all possible tests and histories requires
processing a huge matrix in large domains.

[Figure: the system dynamics matrix, rows h1 = ε,
h2, ..., hi (histories) and columns t1, t2, ..., tj
(tests), with entries P(tj | hi).]
28
Core Test Discovery
[Figure: the submatrix of one-step histories h1, h2
and one-step tests t1, t2.]

  • Millions of samples can be required even for
    problems with only a few states.

Repeat one-step extensions to Qi until the rank
doesn't change (see the sketch below).
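This discovery loop can be written down directly. Below
is a high-level sketch; `estimate_p(t, h)` is an assumed
estimator for P(t|h) (in practice obtained by counting
data samples, which is where the sample cost lies), and
the pruning of dependent rows/columns between iterations
is omitted.

```python
import numpy as np
from itertools import product

def discover_core_tests(actions, observations, estimate_p, max_len=5):
    tests, hists = [()], [()]  # start from the empty test/history
    rank = 0
    for _ in range(max_len):
        # One-step extensions of all current tests and histories.
        pairs = list(product(actions, observations))
        tests = tests + [t + (ao,) for t in tests for ao in pairs]
        hists = hists + [h + (ao,) for h in hists for ao in pairs]
        Z = np.array([[estimate_p(t, h) for t in tests] for h in hists])
        new_rank = np.linalg.matrix_rank(Z)
        if new_rank == rank:  # rank stopped changing: done
            break
        rank = new_rank
    return tests, rank
```

Note how quickly `tests` and `hists` grow here: this is
exactly the "huge matrix" problem from the previous
slides.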
29
PSR Learning
  • Structure Learning
  • which tests to choose for Q from data
  • Parameter Learning
  • how to tune m-vectors given the structure
  • and experience data

30
Learning Parameters
  • PSR
  • Gradient algorithm [Singh et al., 2003]
  • Principal-component-based algorithm for TPSR
    (uncontrolled systems) [Rosencrantz et al., 2004]
  • Suffix-History algorithm [James et al., 2004]
  • POMDP
  • EM
31
Results on PSR Model Learning
32
Planning
  • States are expressed in predictive form.
  • Planning and reasoning should be in terms of
    experience.
  • Rewards are treated as part of observations.
  • Tests are of the form t = a1(o1 r1)...an(on rn).
  • General POMDP methods (e.g. dynamic programming)
    can be used.

33
Predictive Space
[Figure: the predictive simplex with vertex Q3,
showing the prediction vector P(Q|h) updated to
P(Q|hao) after action a1 and observation o1.]

0 ≤ p(qi | h) ≤ 1 for all i
p(qi | hao) = p(Q|h)^T m_aoqi / p(Q|h)^T m_ao
34
Forward Search
Compare alternative future experiences.

[Figure: a forward-search tree branching on actions
a1, a2 and observations o1, o2 at each level.]

Complexity is exponential in the search depth.
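A depth-limited forward search over the PSR prediction
vector makes the exponential branching explicit; a
sketch, where `model` is a hypothetical interface
bundling the PSR helpers from the earlier slides
(`p_ao`, `update_prediction`) plus an assumed expected
reward function.

```python
def forward_search(p_Q, model, depth):
    """Best expected d-step return from prediction vector p(Q|h)."""
    if depth == 0:
        return 0.0
    best = float('-inf')
    for a in model.actions:
        value = model.reward(p_Q, a)       # expected immediate reward
        for o in model.observations:
            pr_o = model.p_ao(p_Q, a, o)   # p(ao|h) = p(Q|h)^T m_ao
            if pr_o > 0.0:
                p_next = model.update_prediction(p_Q, a, o)
                value += pr_o * forward_search(p_next, model, depth - 1)
        best = max(best, value)
    return best
```

With |A| actions and |Ω| observations, the tree has
(|A| |Ω|)^d leaves at depth d, hence the exponential
complexity noted above.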
35
DP for Finite-Horizon POMDPs
[Figure: finite-horizon policy trees p1, p2, p3;
each tree is rooted at an action, branches on
observations o1, o2, and continues with actions a1,
a2, a3 at the next level.]

The value function for a set of policy trees is
always piecewise linear and convex (PWLC).
36
Value Iteration in POMDPs
  • Value iteration
  • Initialize the value function
  • V(b) = max_a Σ_s R(s,a) b(s)
  • This produces one alpha-vector per action.
  • Compute the value function at the next iteration
    using Bellman's equation (a point-based version
    is sketched after this list)
  • V'(b) = max_a Σ_s [ R(s,a) b(s) +
    γ Σ_{s'} T(s,a,s') O(s',a,z) α(s') ]
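Exact DP backs up whole sets of alpha-vectors; as a
simpler hedged sketch, here is a backup at a single
belief b (in the style of point-based methods such as
PBVI, not the exact algorithm), using the array shapes
assumed earlier; gamma is the discount factor.

```python
import numpy as np

def backup(b, alphas, T, O, R, gamma):
    """One Bellman backup at belief b; returns the new alpha-vector.

    alphas : current alpha-vectors, shape (k, |S|)
    """
    nA, nS, nZ = O.shape
    best_alpha, best_val = None, float('-inf')
    for a in range(nA):
        alpha_a = R[a].astype(float)
        for z in range(nZ):
            # projected[i, s] = sum_s' T(s,a,s') O(s',a,z) alphas[i, s']
            projected = alphas @ (T[a] * O[a, :, z]).T
            i_best = np.argmax(projected @ b)  # best successor alpha at b
            alpha_a = alpha_a + gamma * projected[i_best]
        val = alpha_a @ b
        if val > best_val:
            best_alpha, best_val = alpha_a, val
    return best_alpha
```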

37
DP for Finite-Horizon PSRs
There is a scalar reward for each test:
R(ht, a) = Σ_r r · Pr(r | ht, a)
The value of a policy tree is a linear function of
the prediction vector:
V_p(p(Q|h)) = p(Q|h)^T (n_a + γ Σ_o M_ao w)
Theorem: the value function for a finite horizon is
still piecewise linear and convex.
38
Value Iteration in PSRs
  • Value iteration proceeds just as in POMDPs
  • V(p(Q|h)) = max_a V_a(p(Q|h))
  • Represent any finite-horizon solution by a
    finite set of alpha-vectors (policy trees); see
    the sketch below.
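Because the PSR value function is also PWLC, evaluating
a finite-horizon solution looks identical to the POMDP
case, with alpha-vectors defined over prediction vectors
instead of beliefs; a minimal sketch (the `actions_of`
mapping from alpha-vector index to root action is an
illustrative assumption).

```python
import numpy as np

def psr_value(p_Q, alphas):
    """V(p(Q|h)) = max_i p(Q|h)^T alpha_i."""
    return np.max(alphas @ p_Q)

def psr_greedy_action(p_Q, alphas, actions_of):
    """Pick the root action of the maximizing alpha-vector."""
    return actions_of[int(np.argmax(alphas @ p_Q))]
```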

39
Results on PSR Control
[James et al., 2004]
40
Results on PSR Control
  • Current PSR planning algorithms show no
    advantage over POMDP planning [Izadi and Precup,
    2003; James et al., 2004].
  • Planning requires a precise definition of the
    predictive space.
  • It is important to analyze the impact of PSR
    planning on structured domains.

41
Predictive Representations
  • Linear PSR
  • EPSR: action sequence plus last observation
    [Rudary and Singh, 2004]
  • mPSR: augmented with history [James et al., 2005]
  • TD networks: temporal-difference learning with a
    network of interrelated predictions [Tanner and
    Sutton, 2004]

42
Summary
  • A good state representation should be
  • compact
  • useful for planning
  • efficiently learnable
  • Predictive state representations provide a
    lossless compression that reflects the
    underlying structure.
  • PSRs generalize the state space and facilitate
    planning.

43
Limitations
  • Learning and discovery in PSRs still lack
    efficient algorithms.
  • Current algorithms need far too many data
    samples.
  • Due to these model-learning limitations,
    experiments on many ideas can so far only be
    done on toy problems.

44
Future Work
  • Theory of PSRs and possible extensions
  • Efficient algorithms for learning predictive
    models
  • More on combining temporal abstraction with PSRs
  • More on planning algorithms for PSRs and EPSRs
  • Approximation methods are yet to be developed
  • PSRs for continuous systems
  • Generalization across states in stochastic
    systems
  • Nonlinear PSRs and exponential compression (?)