Structured Representations for POMDPs - PowerPoint PPT Presentation




Slides: 23

Provided by: guys3


Transcript and Presenter's Notes


1
Structured Representations for POMDPs

Guy Shani
Machine Learning and Applied Statistics
Microsoft Research

2
Structured vs. Flat

Flat: States, Actions, Observations
Structured:
States → State variables
Actions → Action variables
Observations → Observation variables
State variables: X = {X1, ..., Xn}
State: s = <x1, ..., xn>

3
System Dynamics as DBNs [Boutilier and Poole, 1996]

Dynamic Bayesian Networks: 2-layered, model dynamic changes
Nodes: variables
Edges: dependency
CPT: conditional probability table

DBN for transition given action a
[Figure: two-layer DBN over the variables X1 to X4]
Pr(X1' = T | X1 = T, X3 = F, a) = 0.2
Pr(X1' = F | X1 = T, X3 = F, a) = 0.8
4
Example: RockSample [Smith and Simmons, 2004]
Goal: move to interesting rocks and sample them.
[Figure: DBN for the action "Sample rock i" over X, Y and Ri at time t and X, Y and R'i at time t+1]
5
CPTs as Decision Diagrams

Decision Diagrams:
Inner nodes: variables
Edges: values (left False, right True)
Leaves: hold values
Algebraic Decision Diagrams (ADD):
Nodes with identical children are removed
Context-specific independence

[Figure: the same CPT over X1 and X3 shown as a table, as a decision diagram, and as an ADD, with leaf values .5, .9 and .2]
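The reduction rule on this slide (a node whose two children are identical is removed) can be sketched on a small tree. This is only an illustration: the nested-tuple encoding, the function names, and the leaf values are assumptions made for the example, and merging of shared subgraphs, which a real ADD package would also do, is left out.

```python
def reduce_dd(node):
    """Remove every inner node whose two children are identical.

    Hypothetical encoding for this sketch: a leaf is a number and an
    inner node is a tuple (variable, false_child, true_child).
    """
    if not isinstance(node, tuple):
        return node  # leaves are already minimal
    var, lo, hi = node
    lo, hi = reduce_dd(lo), reduce_dd(hi)
    # If both branches agree, the test on `var` is redundant.
    return lo if lo == hi else (var, lo, hi)


def size(node):
    """Count inner nodes and leaves when the diagram is drawn as a tree."""
    if not isinstance(node, tuple):
        return 1
    _, lo, hi = node
    return 1 + size(lo) + size(hi)


# Illustrative values: when X1 is False the value does not depend on X3.
full = ("X1", ("X3", 0.5, 0.5), ("X3", 0.9, 0.2))
reduced = reduce_dd(full)
print(reduced, size(full), size(reduced))
```

The dropped X3 test under X1 = False is an instance of context-specific independence: X3 matters only in the context X1 = True.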
6
ADD Operations [Bryant, 1986]

Product
Sum
Inner product
Variable elimination: replacing each Xi by the sum of its children
Translation: replacing each occurrence of X by Y, assuming that Y did not appear in the original ADD
Reduce: reduces an ADD to its minimal form
The order of variables is important
All operations are implemented using traversals over the ADDs
Execution is enhanced by caching visited paths
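A minimal sketch of how product, sum, and variable elimination might be written as cached traversals under a fixed variable order. The tuple encoding, the `ORDER` table, and all function names are assumptions for illustration and are not taken from any particular ADD library. The case where the eliminated variable is absent from a path is handled by doubling the value, which is a detail added here and not stated on the slide.

```python
import operator

# Hypothetical variable order; a smaller index is closer to the root.
ORDER = {"X1": 0, "X2": 1, "X3": 2}


def is_leaf(node):
    return not isinstance(node, tuple)


def level(node):
    """Position of the node's variable in the order; leaves come last."""
    return len(ORDER) if is_leaf(node) else ORDER[node[0]]


def make(var, lo, hi):
    """Build a node, dropping it when both children are identical (reduce)."""
    return lo if lo == hi else (var, lo, hi)


def apply_op(op, f, g, cache=None):
    """Combine two ADDs pointwise with a binary operator such as mul or add."""
    if cache is None:
        cache = {}
    key = (f, g)
    if key not in cache:  # caching avoids revisiting the same pair of subgraphs
        if is_leaf(f) and is_leaf(g):
            cache[key] = op(f, g)
        else:
            top = min(level(f), level(g))
            var = f[0] if level(f) == top else g[0]
            f_lo, f_hi = (f[1], f[2]) if level(f) == top else (f, f)
            g_lo, g_hi = (g[1], g[2]) if level(g) == top else (g, g)
            cache[key] = make(var,
                              apply_op(op, f_lo, g_lo, cache),
                              apply_op(op, f_hi, g_hi, cache))
    return cache[key]


def eliminate(f, var):
    """Sum out `var`: a node labelled `var` is replaced by the sum of its children."""
    if level(f) > ORDER[var]:
        # `var` does not appear on this path, so both of its values contribute f.
        return apply_op(operator.add, f, f)
    if f[0] == var:
        return apply_op(operator.add, f[1], f[2])
    return make(f[0], eliminate(f[1], var), eliminate(f[2], var))


f = ("X1", 0.5, ("X3", 0.25, 0.75))
g = ("X3", 2.0, 4.0)
prod = apply_op(operator.mul, f, g)
print(prod)
print(eliminate(prod, "X3"))
print(eliminate(f, "X3"))
```

Both operands must follow the same variable order for the traversal to line up, which is one reason the order of variables matters.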

7
System Dynamics in Factored Form

tr(s, a, s') = tr(<x1, ..., xn>, a, <x1', ..., xn'>)
O(a, s', o) = O(a, <x1', ..., xn'>, o)
P^{a,o}: Complete Action-Observation Diagram [Hansen and Feng, 2000]
Can be computed by joining together CPTs (no products)
Problem: resulting ADD might be large

8
Value Iteration

Beliefs as ADDs
α-vectors as ADDs
Point-based backup: computed over ADDs
Belief update: computed over ADDs
Need to normalize using pr(o | b, a)
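The equations for this slide did not survive in the transcript. As a reminder of what the normalization refers to, here is a sketch of the standard flat belief update, with plain lists standing in for ADDs. The function name, argument layout, and numbers are assumptions for illustration; in the structured setting the same sums and products would be carried out with ADD operations.

```python
def belief_update(b, a, o, tr, obs):
    """Return (b', pr(o | b, a)) for a flat POMDP.

    b:   list of probabilities over states
    tr:  tr[a][s][s2] is the probability of moving from s to s2 under a
    obs: obs[a][s2][o] is the probability of observing o after reaching s2 with a
    """
    n = len(b)
    unnormalized = [
        obs[a][s2][o] * sum(b[s] * tr[a][s][s2] for s in range(n))
        for s2 in range(n)
    ]
    pr_o = sum(unnormalized)  # pr(o | b, a), the normalizing constant
    if pr_o == 0.0:
        raise ValueError("observation has zero probability under b and a")
    return [v / pr_o for v in unnormalized], pr_o


# Two states, an action that leaves the state unchanged, and a noisy sensor.
tr = {"check": [[1.0, 0.0], [0.0, 1.0]]}
obs = {"check": [[0.75, 0.25], [0.25, 0.75]]}
new_b, pr_o = belief_update([0.5, 0.5], "check", 0, tr, obs)
print(new_b, pr_o)
```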

9
Compressing ADDs [Hansen and Feng, 2001]

ADD size is influenced by the number of distinct values
ADDs can be compressed by joining similar values
α-vectors: join values that are ε-close
Beliefs: after joining values we must normalize
Never join zero and non-zero values

[Figure: an ADD over X1 and X3 before and after the steps "Compress 0.1 differences" and "Reduce"; the leaf values shown are .5, .6, .9 and .2]
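The joining rule can be sketched on the list of leaf values alone. The greedy grouping over sorted values below is an assumption of this example, not necessarily how the cited work chooses representatives; the function names are hypothetical as well. Zero is kept apart from every other value, and a compressed belief is renormalized.

```python
def merge_close_values(values, eps):
    """Map each distinct value to a representative that is at most eps below it.

    Zero always maps to itself, so zero and non-zero values are never joined.
    """
    mapping = {}
    rep = None
    for v in sorted(set(values)):
        if v == 0:
            mapping[v] = v
            rep = None  # do not carry a representative across zero
            continue
        if rep is None or v - rep > eps:
            rep = v  # start a new group
        mapping[v] = rep
    return mapping


def compress_belief(belief, eps):
    """Join close probabilities, then renormalize so they sum to one."""
    mapping = merge_close_values(belief, eps)
    merged = [mapping[p] for p in belief]
    total = sum(merged)
    return [p / total for p in merged]


leaf_map = merge_close_values([0.5, 0.6, 0.9, 0.2, 0.0], eps=0.1)
compressed = compress_belief([0.3, 0.35, 0.35, 0.0], eps=0.1)
print(leaf_map)
print(compressed)
```

After the values are joined, a reduce pass over the diagram removes the nodes whose children have become identical, which is where the size saving comes from.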
10
Relevant Variables [Shani et al., 2008]

Some variables do not influence transitions or observations: pr(xi' = xi | xi, a) = 1.0
A variable is relevant if it affects the transition or observation given an action.
The complete action-observation diagram can specify only relevant variables
Advantage: complete action diagrams become smaller
Exact method: no approximations
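One way to read the definition above is sketched below for a single action. The data layout, the `IDENTITY` table, and the variable names are assumptions made for the example: a variable is treated as irrelevant when it keeps its value with probability 1 and is not a parent of any changing variable or of the observation.

```python
# pr(x' = True | x) for a variable that keeps its value under the action.
IDENTITY = {(False,): 0.0, (True,): 1.0}


def relevant_variables(transition, obs_parents):
    """Return the variables that matter for one action.

    transition:  {var: (parents, cpt)} where cpt maps parent values to pr(var' = True)
    obs_parents: variables the observation depends on under this action
    """
    relevant = set(obs_parents)
    for var, (parents, cpt) in transition.items():
        unchanged = parents == (var,) and cpt == IDENTITY
        if not unchanged:
            relevant.add(var)          # the action can change this variable
            relevant.update(parents)   # and these variables influence the change
    return relevant


# Hypothetical "sample rock 0" action: only R0 changes, and only when the
# agent is at rock 0; the other rocks keep their values.
sample_rock_0 = {
    "AtRock0": (("AtRock0",), IDENTITY),
    "R0": (("AtRock0", "R0"), {(False, False): 0.0, (False, True): 1.0,
                               (True, False): 0.0, (True, True): 0.0}),
    "R1": (("R1",), IDENTITY),
    "R2": (("R2",), IDENTITY),
}
relevant = relevant_variables(sample_rock_0, obs_parents=())
print(sorted(relevant))
```

Because R1 and R2 are left out, a diagram for this action needs to mention only two variables, and nothing is approximated.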

11
Relevant Variables
Goal: sample all good rocks
Actions: Move (north, south, east, west), Check (long range sensor), Sample (drill into rock)
[Figure: DBN for the action "Sample rock 0" over X, Y, R0, R1 and R2 at times t and t+1]
12
Relevant Variables Results

Relevant variables and variable orders over the
RockSample domain.

13
Example: Network Administration
Given computers connected in a network
Goal: reduce downtime
Actions: Ping a machine, Restart a machine, No-op
[Figure: network of machines M0 to M3]
14
Example: Network Administration
No effect locality! After a few time steps everything is influenced by everything. The relevant variables trick does not hold.
[Figure: DBN over machines M0 to M6 at time steps t, t+1 and t+2]
15
Beliefs as Product of Marginals [Boyen and Koller, 1998; Poupart, 2005]

Intuition: separate variables with low correlations
Replace a single belief ADD with a set of ADDs over disjoint sets of variables (components)
The belief over all variables is the product of the components' ADDs
Exact if components are independent.
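A small sketch of the idea, with dictionaries standing in for ADDs. The function names, the two-machine example, and the probabilities are assumptions for illustration. It shows that the product of marginals reproduces an independent joint exactly and loses the correlation in a dependent one.

```python
from itertools import product


def marginal(joint, variables, keep):
    """Sum a joint belief down to the variables in `keep`.

    joint maps full assignments (tuples of booleans ordered as `variables`)
    to probabilities.
    """
    idx = [variables.index(v) for v in keep]
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def product_of_marginals(joint, variables, components):
    """Approximate the joint by the product of marginals over disjoint components."""
    margs = [marginal(joint, variables, comp) for comp in components]
    approx = {}
    for assignment in product([False, True], repeat=len(variables)):
        p = 1.0
        for comp, m in zip(components, margs):
            key = tuple(assignment[variables.index(v)] for v in comp)
            p *= m.get(key, 0.0)
        approx[assignment] = p
    return approx


variables = ["M0", "M1"]
components = [["M0"], ["M1"]]

# Independent machines: pr(M0 up) = 0.5 and pr(M1 up) = 0.25.
independent = {(False, False): 0.375, (False, True): 0.125,
               (True, False): 0.375, (True, True): 0.125}
# Perfectly correlated machines: both up or both down.
correlated = {(False, False): 0.5, (True, True): 0.5}

exact = product_of_marginals(independent, variables, components)
lossy = product_of_marginals(correlated, variables, components)
print(exact == independent, lossy)
```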

16
Beliefs as Products of Marginals
[Figure: a belief over machines M0 to M4 with its values, and the same belief split into components]
17
Beliefs as Product of Marginals

Straightforward solution:
First compute the complete belief ADD
Then eliminate variables
Advantage: products are computed only once
Variable elimination:
Eliminate variables after each ADD product
Keeps intermediate ADDs small
Runtime depends on ADD size
Need heuristics to order the products and eliminations
Disadvantage: products are recomputed repeatedly

18
Experiments
19
Basis Functions [Guestrin, Koller and Parr, 2001]

Problem: α-vectors become exponential in size
Idea: restrict α-vectors to linear combinations of basis functions
Basis function: a fixed function (a vector) over a subset of the state variables.
Reduction to basis functions can be done using LP
Can we compute the reduction without explicitly computing the complete function first?
As we do for the belief marginals.
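To make the representation concrete, the sketch below evaluates an α-vector stored as weights over basis functions, each defined on a small subset of the state variables. The data layout, names, and numbers are assumptions for illustration, and the LP that would choose the weights is not shown.

```python
def alpha_value(state, weights, basis):
    """Evaluate an α-vector written as sum_i w_i * h_i(state).

    basis[i] is (scope, table): scope is a tuple of variable names and
    table maps the values of those variables to a number.
    """
    total = 0.0
    for w, (scope, table) in zip(weights, basis):
        total += w * table[tuple(state[v] for v in scope)]
    return total


# Hypothetical basis: a constant plus one indicator per machine being up.
basis = [
    ((), {(): 1.0}),
    (("M0",), {(False,): 0.0, (True,): 1.0}),
    (("M1",), {(False,): 0.0, (True,): 1.0}),
]
weights = [0.5, 2.0, 1.0]

value = alpha_value({"M0": True, "M1": False, "M2": True}, weights, basis)
print(value)
```

Only three weights are stored here, whatever the number of state variables, whereas a flat α-vector needs one entry per state.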

20
Relational POMDPs [Wang, 2007]