Title: Structured Representations for POMDPs
1. Structured Representations for POMDPs
- Guy Shani
- Machine Learning and Applied Statistics
- Microsoft Research
2. Structured vs. Flat
- Flat: states, actions, observations
- Structured:
- States → state variables
- Actions → action variables
- Observations → observation variables
- State variables: X = {X1, ..., Xn}
- State: s = <x1, ..., xn>
3. System Dynamics as DBNs [Boutilier and Poole, 1996]
- Dynamic Bayesian Networks (DBNs): 2-layered, model dynamic changes
- Nodes: variables
- Edges: dependencies
- CPT: conditional probability table
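[Figure: DBN for the transition given action a, over variables X1 to X4 at time t and X'1 to X'4 at time t+1.]
Example CPT entries: Pr(X'1=T | X1=T, X3=F, a) = 0.2, Pr(X'1=F | X1=T, X3=F, a) = 0.8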
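A CPT like the one quoted for X'1 can be stored as a table keyed by the parent values. The following is a minimal sketch, assuming binary variables; only the row for X1=T, X3=F (0.2 and 0.8) comes from the slide, the other rows and the names `cpt_x1` and `pr_next` are hypothetical.

```python
# CPT for X1' under action a, with parents X1 and X3 (binary variables).
# Key: (x1, x3) -> Pr(X1' = True | x1, x3, a).
# Only the (True, False) row is from the slide; the rest are illustrative.
cpt_x1 = {
    (True, True): 0.9,    # hypothetical
    (True, False): 0.2,   # from the slide
    (False, True): 0.5,   # hypothetical
    (False, False): 0.5,  # hypothetical
}


def pr_next(cpt, next_value, *parents):
    """Return Pr(X' = next_value | parents, a) from a binary CPT."""
    p_true = cpt[parents]
    return p_true if next_value else 1.0 - p_true


p_t = pr_next(cpt_x1, True, True, False)   # Pr(X1'=T | X1=T, X3=F, a)
p_f = pr_next(cpt_x1, False, True, False)  # Pr(X1'=F | X1=T, X3=F, a)
```

Storing only Pr(X' = True | parents) is enough for binary variables, since the False entry is its complement.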
4. Example: RockSample [Smith and Simmons, 2004]
Goal: move to interesting rocks and sample them.
[Figure: DBN for the action "Sample rock i", with variables X, Y, Ri at time t and X', Y', R'i at time t+1.]
5. CPTs as Decision Diagrams
- Decision diagrams
- Inner nodes: variables
- Edges: values (left = False, right = True)
- Leaves: hold values
- Algebraic Decision Diagrams (ADDs)
- Nodes with identical children are removed
- Context-specific independence
[Figure: the same function over X1 and X3 shown as a CPT, as a decision diagram, and as an ADD, with leaf values .5, .9 and .2.]
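The step from a full decision diagram to an ADD (removing nodes with identical children) can be sketched as follows. This is an illustrative toy, not the representation used in any particular ADD package: a leaf is a float, an inner node is a tuple `(variable, low, high)`, and the example values only loosely follow the figure.

```python
# A leaf is a float; an inner node is a tuple (variable, low_child, high_child),
# where low is the False branch and high is the True branch.

def reduce_dd(node, unique=None):
    """Remove nodes whose two children are identical and share equal subgraphs."""
    if unique is None:
        unique = {}
    if not isinstance(node, tuple):
        return node                      # leaf: nothing to reduce
    var, low, high = node
    low = reduce_dd(low, unique)
    high = reduce_dd(high, unique)
    if low == high:                      # the test on var is redundant
        return low
    key = (var, low, high)
    return unique.setdefault(key, key)   # reuse an existing identical node


def evaluate(node, assignment):
    """Follow the branches selected by assignment down to a leaf value."""
    while isinstance(node, tuple):
        var, low, high = node
        node = high if assignment[var] else low
    return node


def count_nodes(node):
    """Count inner nodes, counting shared nodes once."""
    seen = set()

    def visit(n):
        if isinstance(n, tuple) and id(n) not in seen:
            seen.add(id(n))
            visit(n[1])
            visit(n[2])

    visit(node)
    return len(seen)


# Full diagram: X3 is tested even when X1 is False, although the value is .5 either way.
full = ("X1", ("X3", 0.5, 0.5), ("X3", 0.9, 0.2))
reduced = reduce_dd(full)
```

In the reduced diagram the X3 test under X1 = False disappears, which is the context-specific independence the slide refers to: given X1 = False, the value does not depend on X3.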
6. ADD Operations [Bryant, 1986]
- Product
- Sum
- Inner product
- Variable elimination
- Replacing each Xi by the sum of its children
- Translation
- Replacing each occurrence of X by Y
- Assuming that Y did not appear in the original ADD
- Reduce: reduces an ADD to its minimal form
- The order of variables is important
- All operations are implemented using traversals over the ADDs
- Execution is enhanced by caching visited paths
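The operations listed above can be sketched on a toy ADD representation. This is a minimal sketch under stated assumptions, not a real ADD library: leaves are floats, inner nodes are tuples `(variable, low, high)`, the variable order `ORDER` is fixed and global, and all function names are hypothetical.

```python
import operator

ORDER = ["X1", "X2", "X3"]   # assumed global variable order


def top_var(f, g):
    """Earliest variable (in ORDER) tested at the root of f or g, or None."""
    roots = [n[0] for n in (f, g) if isinstance(n, tuple)]
    return min(roots, key=ORDER.index) if roots else None


def cofactors(node, var):
    """Children of node with respect to var (node itself if var is not its root)."""
    if isinstance(node, tuple) and node[0] == var:
        return node[1], node[2]
    return node, node


def make(var, low, high):
    """Build a node, dropping it when both children are identical (Reduce)."""
    return low if low == high else (var, low, high)


def apply_op(op, f, g, cache=None):
    """Combine two ADDs leaf-wise with op, caching visited pairs."""
    cache = {} if cache is None else cache
    key = (f, g)
    if key not in cache:
        var = top_var(f, g)
        if var is None:                       # both are leaves
            cache[key] = op(f, g)
        else:
            f0, f1 = cofactors(f, var)
            g0, g1 = cofactors(g, var)
            cache[key] = make(var,
                              apply_op(op, f0, g0, cache),
                              apply_op(op, f1, g1, cache))
    return cache[key]


def eliminate(f, var):
    """Sum out var: replace each var node by the sum of its children."""
    if isinstance(f, tuple) and ORDER.index(f[0]) < ORDER.index(var):
        return make(f[0], eliminate(f[1], var), eliminate(f[2], var))
    # var is at the root, or was skipped by reduction (then both branches are f)
    low, high = cofactors(f, var)
    return apply_op(operator.add, low, high)


f = ("X1", 0.5, ("X3", 0.9, 0.2))
g = ("X3", 1.0, 2.0)
prod = apply_op(operator.mul, f, g)   # product of two ADDs
summed = eliminate(f, "X3")           # variable elimination of X3
```

Because the recursion always splits on the earliest variable in `ORDER`, both operands must respect the same order, which is one reason the order of variables matters. The `cache` dictionary plays the role of the cached visited paths.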
7. System Dynamics in Factored Form
- tr(s, a, s') = tr(<x1, ..., xn>, a, <x'1, ..., x'n>)
- O(a, s', o) = O(a, <x'1, ..., x'n>, o)
- P_{a,o}: complete action-observation diagram
- [Hansen and Feng, 2000]
- Can be computed by joining together CPTs (no products)
- Problem: the resulting ADD might be large
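To make the factored form concrete, the sketch below recovers the flat transition probability tr(s, a, s') from per-variable CPTs. It assumes a DBN in which each next-state variable depends only on the current state and the action, so the joint probability is the product of the CPT entries; the two-variable model and its numbers are hypothetical.

```python
from itertools import product

# Hypothetical 2-variable model for one action a. Each entry gives
# Pr(X'_i = True | s, a) as a function of the current state s = (x1, x2).
cpts = [
    lambda s: 0.2 if s[0] else 0.5,              # X1' depends on X1 only
    lambda s: 0.9 if (s[0] and s[1]) else 0.1,   # X2' depends on X1 and X2
]


def tr(s, s_next):
    """tr(s, a, s') as the product of the per-variable CPT entries."""
    p = 1.0
    for cpt, x_next in zip(cpts, s_next):
        p_true = cpt(s)
        p *= p_true if x_next else 1.0 - p_true
    return p


states = list(product([False, True], repeat=2))

# Each row of the flat transition matrix must sum to 1.
row_sums = {s: sum(tr(s, s2) for s2 in states) for s in states}
```

The flat table has one entry per pair of states, which is exponential in the number of variables, while the factored model stores one small CPT per variable.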
8. Value Iteration
- Beliefs as ADDs
- α-vectors as ADDs
- Point-based backup:
- ADDs
- Belief update:
- ADDs
- Need to normalize using pr(o | b, a)
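For reference, the standard flat belief update that these ADD operations implement can be written as below. This is the textbook form, stated with the slide's tr and O; the superscript notation b^{a,o} for the updated belief is an assumption.

```latex
\begin{align*}
pr(o \mid b, a) &= \sum_{s'} O(a, s', o) \sum_{s} tr(s, a, s')\, b(s) \\
b^{a,o}(s') &= \frac{O(a, s', o) \sum_{s} tr(s, a, s')\, b(s)}{pr(o \mid b, a)}
\end{align*}
```

With ADDs, the products and the sum over s become ADD product and variable elimination operations, and the final division is the normalization step mentioned in the last bullet.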
9. Compressing ADDs [Hansen and Feng, 2001]
- ADD size is influenced by the number of distinct values
- ADDs can be compressed by joining similar values
- α-vectors: join values that are ε-close
- Beliefs: after joining values we must normalize
- Never join zero and non-zero values
[Figure: an ADD over X1 and X3 with leaf values .5, .6, .9 and .2, shown before compression, after compressing 0.1 differences, and after Reduce.]
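The joining rule can be sketched on a flat table of values standing in for the ADD leaves. This is a minimal sketch, assuming non-negative values and a greedy grouping in which each group is represented by its smallest member; the function names are hypothetical and the grouping strategy is one simple choice among several.

```python
def compress_values(values, eps):
    """Map each distinct value to a representative at most eps below it.

    Zero is kept as its own group so zero and non-zero values never merge.
    """
    mapping = {}
    rep = None
    for v in sorted(set(values)):
        if v == 0.0:
            mapping[v] = 0.0
            continue
        if rep is None or v - rep > eps:
            rep = v                      # start a new group at v
        mapping[v] = rep
    return mapping


def join_values(table, eps):
    """Apply the compression to a dict state -> value (e.g. an alpha-vector)."""
    mapping = compress_values(table.values(), eps)
    return {s: mapping[v] for s, v in table.items()}


def compress_belief(belief, eps):
    """Join close values, then renormalize so the belief sums to 1."""
    joined = join_values(belief, eps)
    total = sum(joined.values())
    return {s: v / total for s, v in joined.items()}


alpha = {"s0": 0.5, "s1": 0.6, "s2": 0.9, "s3": 0.2, "s4": 0.0}
alpha_c = join_values(alpha, 0.1)

belief = {"s0": 0.5, "s1": 0.3, "s2": 0.2, "s3": 0.0}
belief_c = compress_belief(belief, 0.1)
```

Fewer distinct leaf values allow Reduce to merge more nodes. Keeping zero separate preserves which states are impossible under the belief.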
10. Relevant Variables [Shani et al., 2008]
- Some variables do not influence transitions or observations: pr(x'i | xi, a) = 1.0, where x'i is the same value as xi
- A variable is relevant if it affects the transition or observation given an action.
- The complete action-observation diagram can specify only relevant variables
- Advantage: complete action diagrams become smaller
- Exact method: no approximations
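One way to read the relevance test is sketched below: for a given action, a variable is kept if its own value may change, if it influences a variable that may change, or if the observation depends on it. The function and its arguments are hypothetical, and the dependency structure used for the "Sample rock 0" action is an assumption made for illustration, not taken from the slides.

```python
def relevant_variables(transition_parents, identity, observation_parents):
    """Variables that affect, or are affected by, the dynamics of one action.

    transition_parents: dict var -> set of variables its next value depends on
    identity: set of variables that keep their value with probability 1.0
    observation_parents: set of variables the observation depends on
    """
    relevant = set(observation_parents)
    for var, parents in transition_parents.items():
        if var not in identity:
            relevant.add(var)          # var itself may change
            relevant.update(parents)   # its parents influence the change
    return relevant


# Assumed structure for "Sample rock 0": only R0 can change, and whether it
# changes depends on the robot position (X, Y) and on R0 itself.
transition_parents = {
    "X": {"X"},
    "Y": {"Y"},
    "R0": {"X", "Y", "R0"},
    "R1": {"R1"},
    "R2": {"R2"},
}
identity = {"X", "Y", "R1", "R2"}
observation_parents = set()

relevant = relevant_variables(transition_parents, identity, observation_parents)
```

Under these assumptions R1 and R2 drop out, so the action diagram for this action only needs to mention X, Y and R0.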
11. Relevant Variables
Goal: sample all good rocks.
Actions: Move (north, south, east, west), Check (long range sensor), Sample (drill into rock).
[Figure: DBN for the action "Sample rock 0", with variables X, Y, R0, R1, R2 at time t and at time t+1.]
12. Relevant Variables: Results
- Relevant variables and variable orders over the RockSample domain.
13. Example: Network Administration
Given: computers connected in a network.
Goal: reduce downtime.
Actions: Ping a machine, Restart a machine, No-op.
[Figure: machines M0, M1, M2 and M3.]
14. Example: Network Administration
No effect locality! After a few time steps everything is influenced by everything. The relevant variables trick does not hold.
[Figure: DBN over machines M0 to M6 at time steps t, t+1 and t+2.]
15. Beliefs as Product of Marginals [Boyen and Koller, 1998; Poupart, 2005]
- Intuition: separate variables with low correlations
- Replace a single belief ADD with a set of ADDs over disjoint sets of variables (components)
- The belief over all variables is the product of the components' ADDs
- Exact if components are independent.
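The factorization can be illustrated on a flat belief table. The sketch below assumes binary variables and uses dictionaries in place of ADDs; the component split, the numbers and the function names are hypothetical.

```python
from itertools import product


def marginal(belief, variables, component):
    """Sum a joint belief down to the variables in component."""
    idx = [variables.index(v) for v in component]
    out = {}
    for state, p in belief.items():
        key = tuple(state[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def product_of_marginals(marginals, variables, components):
    """Rebuild a joint belief as the product of the component marginals."""
    joint = {}
    for state in product([False, True], repeat=len(variables)):
        p = 1.0
        for comp, m in zip(components, marginals):
            key = tuple(state[variables.index(v)] for v in comp)
            p *= m.get(key, 0.0)
        joint[state] = p
    return joint


variables = ["M0", "M1", "M2"]
components = [["M0"], ["M1", "M2"]]

# A belief in which M0 is independent of (M1, M2), built from two factors.
p0 = {False: 0.3, True: 0.7}
p12 = {(False, False): 0.1, (False, True): 0.2,
       (True, False): 0.3, (True, True): 0.4}
belief = {(a, b, c): p0[a] * p12[(b, c)]
          for a, b, c in product([False, True], repeat=3)}

marginals = [marginal(belief, variables, comp) for comp in components]
rebuilt = product_of_marginals(marginals, variables, components)
max_error = max(abs(belief[s] - rebuilt[s]) for s in belief)
```

Here the components are independent by construction, so the product reproduces the joint belief up to rounding. For a belief with correlations across components the same procedure would only give an approximation.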
16. Beliefs as Products of Marginals
[Figure: a belief over M0, M1, M2, M3 and M4 with its values, split into components over subsets of these variables.]
17. Beliefs as Product of Marginals
- Straightforward solution
- First compute the complete belief ADD
- Then eliminate variables
- Advantage: products are computed only once
- Variable elimination
- Eliminate variables after each ADD product
- Keeps intermediate ADDs small
- Runtime depends on ADD size
- Need heuristics to order the products and eliminations
- Disadvantage: products are recomputed repeatedly
18. Experiments
19. Basis Functions [Guestrin, Koller and Parr, 2001]
- Problem: α-vectors become exponential in size
- Idea: restrict α-vectors to linear combinations of basis functions
- Basis function: a fixed function (a vector) over a subset of the state variables.
- Reduction to basis functions can be done using LP
- Can we compute the reduction without explicitly computing the complete function first?
- As we do for the belief marginals.
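One common way to pose such a reduction as a linear program is a max-norm fit of the weights. The formulation below is a sketch of that idea; the symbols h_j (basis functions), w_j (weights) and ε (fitting error) are notation introduced here, and whether the slide refers to exactly this LP is an assumption.

```latex
\begin{align*}
\alpha(s) &\approx \sum_{j} w_j\, h_j(s) \\
\min_{w,\,\varepsilon}\ \varepsilon \quad \text{s.t.} \quad
  -\varepsilon &\le \alpha(s) - \sum_{j} w_j\, h_j(s) \le \varepsilon
  \quad \text{for all states } s
\end{align*}
```

Written this way the LP has two constraints per state, which is why computing the reduction without enumerating the complete function is the open question raised above.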
20. Relational POMDPs [Wang, 2007]
- Captures identical dynamics over objects
- Move(A,B)
- Pre: Clear(B), Clear(A), ¬On(A,B)
- Effect:
- 0.7: On(A,B)
- 0.3: On(A,B)
- Stronger structure than regular factored POMDP
- FODD: First Order Decision Diagram
- ADD over propositions
21. Other Structures
- Exploiting different types of structure
- Hierarchical: a hierarchy of POMDPs
- Pineau, 2002; Hansen, 2003; Foka et al., 2007
- Value-directed compression: exploit structure in the value function
- Poupart, 2003
- Belief compression: exploit structure in reachable beliefs
- Roy et al., 2004; Pineau et al., 2003
22. Summary
- Flat methods got us far
- 10 states in 1998
- 200,000 states in 2008
- Factored methods got us to the next step
- 20,000,000 states
- We need to exploit more structure in order to scale up
- Much research is needed