Distributed Planning in Hierarchical Factored MDPs


1
Distributed Planning in Hierarchical Factored MDPs
  • Carlos Guestrin
  • Stanford University
  • Geoffrey Gordon
  • Carnegie Mellon University

2
Multiagent Coordination Examples
  • Search and rescue
  • Factory management
  • Supply chain
  • Firefighting
  • Network routing
  • Air traffic control
  • Access only local information
  • Distributed Control
  • Distributed Planning

3
Hierarchical Decomposition
[Figure: part-of hierarchy of a car, with subsystems Engine, Cylinders, Injection, Exhaust, Chassis, and Steering]
  • Subsystems can share variables
  • Each subsystem only observes its local variables
  • Parallel decomposition → exponential state space

4
Outline
  • Object-based Representation
  • Hierarchical Factored MDPs
  • Distributed planning
  • Message passing algorithm based on LP
    decomposition
  • Hierarchical action selection mechanism
  • Limited observability and communication
  • Reusing plans and computation
  • Exploit classes of objects

5
Basic Subsystem MDP
  • Subsystem j decomposed
  • Internal variables Xj
  • External variables Yj
  • Actions Aj
  • Subsystem model
  • Rewards: Rj(Xj, Yj, Aj)
  • Transitions: Pj(Xj' | Xj, Yj, Aj)
  • Subsystem can be modeled with any representation
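
As a rough illustration of the ingredients listed above, a basic subsystem could be held in a small container like the one below. The class name `Subsystem`, its field names, and the toy speed-control numbers are all assumptions made for this sketch; the slides deliberately leave the representation open.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

State = Tuple[int, ...]   # one assignment to a set of variables

@dataclass
class Subsystem:
    """Hypothetical container for one basic subsystem MDP M_j."""
    name: str
    internal_vars: Tuple[str, ...]   # X_j
    external_vars: Tuple[str, ...]   # Y_j
    actions: Tuple[str, ...]         # A_j
    reward: Callable[[State, State, str], float]                   # R_j(x, y, a)
    transition: Callable[[State, State, str], Dict[State, float]]  # P_j(. | x, y, a)

# Toy speed-control subsystem (all numbers invented for illustration).
def speed_reward(x, y, a):
    return 1.0 if x == (1,) else 0.0          # reward for being at speed

def speed_transition(x, y, a):
    if a == "accelerate" and y == (0,):       # flat road: usually speeds up
        return {(1,): 0.9, (0,): 0.1}
    return {(0,): 1.0}                        # otherwise the car slows down

speed_control = Subsystem(
    name="speed_control",
    internal_vars=("speed",),
    external_vars=("road_slope",),
    actions=("accelerate", "coast"),
    reward=speed_reward,
    transition=speed_transition,
)
```

Keeping the reward and transition as plain callables is one way to respect the point that a subsystem can be modeled with any representation.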

[Figure: speed control subsystem, showing its actions, external variables, and internal variables]
6
Hierarchical Subsystem Tree
  • Subsystem tree
  • Nodes are subsystems
  • Hierarchical decomposition
  • Tree reward = sum of subsystem rewards
  • Consistent subsystem tree
  • Running intersection property
  • Consistent dynamics
  • Lemma: a consistent subsystem tree yields a
    well-defined global MDP
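
The running intersection property mentioned above can be checked mechanically. The sketch below assumes the usual definition from junction trees (for every variable, the subsystems that mention it form a connected part of the tree); the function name and the variable names in the example are hypothetical.

```python
from collections import deque

def has_running_intersection(scopes, edges):
    """scopes: {node: set of variable names}; edges: list of (parent, child).

    Returns True if, for every variable, the nodes whose scope contains it
    form a connected subgraph of the tree.
    """
    neighbors = {n: set() for n in scopes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    all_vars = set().union(*scopes.values())
    for var in all_vars:
        holders = {n for n, s in scopes.items() if var in s}
        start = next(iter(holders))
        seen, queue = {start}, deque([start])
        while queue:                       # BFS restricted to holder nodes
            node = queue.popleft()
            for nb in neighbors[node] & holders:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if seen != holders:
            return False
    return True

# Hypothetical three-node tree: M1 is the parent of M2 and M3.
scopes = {
    "M1": {"engine_temp", "speed"},
    "M2": {"speed", "throttle"},
    "M3": {"engine_temp", "fan"},
}
edges = [("M1", "M2"), ("M1", "M3")]
consistent = has_running_intersection(scopes, edges)
```

If M2 and M3 both mentioned `fan` while M1 did not, the check would fail, because the two holders of `fan` would only be connected through a node that does not contain it.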

[Figure: subsystem tree with nodes M2 (Speed control) and M3 (Cooling), each labeled with its SepSet]
7
Relationship to Factored MDPs
Hierarchical Factored MDP
Multiagent Factored MDP [Guestrin et al. 01]
  • Representational power equivalent
  • Hierarchical factored MDP ⇔ multiagent factored
    MDP with particular choice of basis functions
  • New capabilities
  • Fully distributed planning algorithm
  • Reuse for knowledge representation
  • Reuse of computation
  • MDP counterpart to Object-Oriented Bayes Nets
    (OOBNs) [Koller and Pfeffer 97]

8
Planning for Hierarchical Factored MDPs
  • Action space: joint action a = (a1, ..., an) for all
    subsystems
  • State space: joint state x of entire system
  • Reward function: total reward r
  • Action and state spaces are exponential in the
    number of subsystems
  • Exploit hierarchical structure
  • Efficient, distributed approximate planning
    algorithm
  • Simple message passing approach
  • Each subsystem accesses only its local model
  • Each local model solved by any standard MDP
    algorithm

9
Solving MDPs as LPs
  • Bellman constraint: if action a takes x to y with reward r, then
  • V(x) ≥ V(y) + r = Q(a, x)
  • Similarly for stochastic transitions
  • Optimal V satisfies all Bellman constraints, and
    is componentwise smallest

min  V(x) + V(y) + V(z) + V(g)
s.t. V(x) ≥ V(y) + 1
     V(y) ≥ V(g) + 3
     V(x) ≥ V(z) + 2
     V(z) ≥ V(g) + 1
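
The four constraints in this example are small enough to work through by hand or in a few lines of code. The sketch below does not call an LP solver; it raises each value just far enough to satisfy every constraint, which for this deterministic example reaches the same componentwise-smallest solution. Treating g as a terminal state with V(g) = 0 is an assumption that the slide does not state.

```python
# Deterministic transitions from the slide: (state, next_state, reward).
edges = [("x", "y", 1), ("y", "g", 3), ("x", "z", 2), ("z", "g", 1)]
states = ["x", "y", "z", "g"]

# Assumption: g is a terminal goal with V(g) = 0. Without some anchor of
# this kind the LP would be unbounded below.
V = {s: 0.0 for s in states}

# Repeatedly raise V just enough to satisfy every Bellman constraint.
for _ in range(len(states)):
    for s, t, r in edges:
        V[s] = max(V[s], V[t] + r)

# Every constraint V(s) >= V(t) + r holds ...
assert all(V[s] >= V[t] + r for s, t, r in edges)
# ... and each non-goal state has a tight constraint, so lowering any
# single V(s) would violate one of them.
assert all(any(V[s] == V[t] + r for s2, t, r in edges if s2 == s)
           for s in states if s != "g")
print(V)  # {'x': 4.0, 'y': 3.0, 'z': 1.0, 'g': 0.0}
```

The value V(x) = 4 comes from the route through y (1 + 3), which beats the route through z (2 + 1).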
10
Decomposable Value Functions
Linear combination of restricted domain functions:
V(x) ≈ Σi wi hi(x)
[Bellman et al. 63; Schweitzer & Seidmann 85; Tsitsiklis & Van Roy 96; Koller & Parr 99, 00; Guestrin et al. 01]
  • Each hi is status of small part(s) of a complex
    system
  • Status of a machine and neighbors
  • Load on machine
  • Must find w giving good approximate value
    function
  • Well-designed hi ⇒ exponentially fewer parameters
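
To make the parameter saving concrete, the sketch below builds a value function as a weighted sum of basis functions that each look at one or two variables. The particular basis (one indicator per machine, one per adjacent pair), the weights, and the choice of ten binary variables are all invented for illustration.

```python
n = 10                                   # binary machine-status variables

# Assumed basis: one indicator per machine ("machine i is up") plus one per
# adjacent pair ("i and i+1 are both up"); each looks at at most 2 variables.
basis = [((i,), lambda s: float(s[0])) for i in range(n)]
basis += [((i, i + 1), lambda s: float(s[0] and s[1])) for i in range(n - 1)]
weights = [1.0] * n + [0.5] * (n - 1)    # arbitrary illustrative weights w

def value(x):
    """V(x) = sum_i w_i * h_i(x restricted to the scope of h_i)."""
    return sum(w * h(tuple(x[v] for v in scope))
               for w, (scope, h) in zip(weights, basis))

num_states = 2 ** n            # entries in a tabular value function
num_params = len(weights)      # entries in the factored one
```

With these choices a table would need 1024 entries, while the factored form needs 19 weights; the gap widens exponentially as n grows.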

11
Approximate Linear Programming
  • To solve subsystem tree MDP as LP
  • Overall state is cross-product of subsystem
    states
  • Bellman LP has exponentially many constraints,
    variables
  • ⇒ we need to approximate
  • Write V(x) = V1(X1) + V2(X2) + ...
  • Minimize V1(X1) + V2(X2) + ... s.t.
  • V1(X1) + V2(X2) + ... ≥ V1(Y1) + V2(Y2) + ...
    + R1 + R2 + ...
  • One variable Vi(Xi) for each state of each
    subsystem
  • One constraint for every state and action
  • Vi, Qi depend on small sets of variables/actions
  • Generates polynomially-sized LPs for factored
    MDPs [Guestrin et al. 01]
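
The additive constraint above can be checked numerically in the simplest possible case. The sketch below does not solve the approximate LP. It takes two fully independent toy subsystems, solves each one alone by value iteration, and confirms that the sum of the local value functions satisfies every joint Bellman constraint. The discount factor, the two-state dynamics, and the rewards are assumptions introduced for the example.

```python
from itertools import product

GAMMA = 0.9  # assumed discount factor; the slide's constraint omits it

def solve(states, actions, step, reward, sweeps=500):
    """Plain value iteration for a small deterministic MDP."""
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        V = {s: max(reward(s, a) + GAMMA * V[step(s, a)] for a in actions)
             for s in states}
    return V

states, actions = (0, 1), ("stay", "move")
step = lambda s, a: s if a == "stay" else 1 - s
r1 = lambda s, a: 1.0 if s == 1 else 0.0           # subsystem 1 likes state 1
r2 = lambda s, a: 2.0 if (s, a) == (0, "move") else 0.0

V1 = solve(states, actions, step, r1)
V2 = solve(states, actions, step, r2)

# Joint Bellman constraints: one per joint state and joint action.
violations = 0
for x1, x2, a1, a2 in product(states, states, actions, actions):
    lhs = V1[x1] + V2[x2]
    rhs = (r1(x1, a1) + r2(x2, a2)
           + GAMMA * (V1[step(x1, a1)] + V2[step(x2, a2)]))
    if lhs < rhs - 1e-9:
        violations += 1
```

Here 16 joint constraints are covered by only 4 local values. When subsystems share variables the local problems are no longer independent, which is the gap the message passing algorithm is meant to close.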

12
Overview of Algorithm
  • Each subsystem solves a local (stand-alone) MDP
  • Each subsystem computes messages by solving a
    simple local LP
  • Sends constraint message to its parent
  • Sends reward messages to its children
  • Repeat until convergence

[Figure: subsystem tree fragment Ml, Mj, Mk; reward messages flow from parent to child, constraint messages flow from child to parent]
13
Stand-alone MDPs and Reward Messages
Reward messages
  • Sj from parent
  • Sk to children
Subsystem MDP
  • State: (Xj, Yj)
  • Actions: Aj
  • Rewards: Rj(Xj, Yj, Aj)
  • Transitions: Pj(Xj' | Xj, Yj, Aj)
Stand-alone MDP
  • State: Xj
  • Actions: (Aj, Yj)
  • Rewards: Rj(Xj, Yj, Aj) + Sj − Σk Sk
  • Transitions: Pj(Xj' | Xj, Yj, Aj)
  • Reward messages are over SepSets
  • Solve stand-alone MDP using any algorithm
  • Obtain visitation frequencies of resulting
    policy
  • φj: discounted frequency of visits to each
    state-action pair
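
The step from a subsystem MDP to its stand-alone MDP can be sketched as a small wrapper. The version below assumes that the parent's reward message is added and the children's messages are subtracted, and that messages can be evaluated on a local (x, y) pair; the function name, argument names, and toy numbers are hypothetical.

```python
from itertools import product

def make_stand_alone(actions, y_values, reward, msg_from_parent, msgs_to_children):
    """Return (actions, reward) of the stand-alone MDP for one subsystem.

    reward(x, y, a)         -> R_j
    msg_from_parent(x, y)   -> S_j  (assumed to enter with a plus sign)
    msgs_to_children        -> list of functions (x, y) -> S_k (assumed subtracted)
    """
    # The external variables Y_j become part of the action choice.
    sa_actions = list(product(actions, y_values))

    def sa_reward(x, sa_action):
        a, y = sa_action
        return (reward(x, y, a) + msg_from_parent(x, y)
                - sum(s_k(x, y) for s_k in msgs_to_children))

    return sa_actions, sa_reward

# Toy usage with invented numbers.
sa_actions, sa_reward = make_stand_alone(
    actions=("slow", "fast"),
    y_values=(0, 1),
    reward=lambda x, y, a: 1.0 if a == "fast" else 0.0,
    msg_from_parent=lambda x, y: 0.5 * y,
    msgs_to_children=[lambda x, y: 0.25 * x],
)
```

Because the state is now only Xj, the result is an ordinary MDP that any standard solver can handle, which is the point of the construction.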

14
Visitation Frequencies
Dual
  • Discounted frequency of visits to each
    state-action pair
  • Subsystems must agree on the frequency for shared
    variables → reward messages
  • Approximation → relaxed enforcement of constraints
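
For a fixed policy, discounted visitation frequencies can be accumulated directly from their definition. The sketch below uses a three-state chain, a deterministic policy, and a discount factor of 0.9, all of which are assumptions chosen to keep the arithmetic easy to follow.

```python
GAMMA = 0.9                                   # assumed discount factor
states = [0, 1, 2]
policy = {0: "right", 1: "right", 2: "stay"}  # fixed deterministic policy
next_state = {(0, "right"): 1, (1, "right"): 2, (2, "stay"): 2}
reward = {(0, "right"): 0.0, (1, "right"): 0.0, (2, "stay"): 1.0}
start = {0: 1.0, 1: 0.0, 2: 0.0}              # initial state distribution

# phi(s, policy(s)) = sum_t GAMMA^t * Pr(state at time t is s)
phi = {s: 0.0 for s in states}
dist, discount = dict(start), 1.0
for _ in range(1000):
    for s in states:
        phi[s] += discount * dist[s]
    new_dist = {s: 0.0 for s in states}
    for s in states:
        new_dist[next_state[(s, policy[s])]] += dist[s]
    dist, discount = new_dist, discount * GAMMA

total_mass = sum(phi.values())                # approaches 1 / (1 - GAMMA)
policy_value = sum(phi[s] * reward[(s, policy[s])] for s in states)
```

The frequencies come out near 1, 0.9, and 8.1, and weighting them by the rewards gives the policy's value from the start distribution. This is why agreeing on frequencies amounts to agreeing on behavior.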

15
Overview of Algorithm: Detailed
  • Each subsystem solves a local (stand-alone) MDP
  • Compute local visitation frequencies φj
  • Add constraint to reward message LP
  • Each subsystem computes messages by solving a
    simple local LP
  • Sends constraint message to its parent:
    visitation frequencies for SepSet variables
  • Sends reward messages to its children
  • Repeat until convergence

[Figure: subsystem tree fragment Ml, Mj, Mk]
16
Reward Message LP
Dual
  • LP yields reward messages Sk for children
  • Dual yields mixing weights pj, pk ⇒ enforce
    consistent frequencies

17
Computing Reward Messages
Rows of φjj and Lj correspond to the visitation
frequencies and value of each policy visited by
Mj.
Rows of φjk are the frequencies marginalized to
SepSet[Mk].
Messages
  • Dual of reward message LP generates mixed
    policies
  • pj and pk are mixing parameters, force parents
    and children to agree on visitation of SepSet
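
The agreement condition on SepSet visitation can be illustrated with two small helpers. In the sketch below the mixing weights are picked by hand, whereas in the slides they come from the dual of the reward message LP; the frequency tables and the helper names `mix` and `marginalize` are invented for illustration.

```python
def mix(rows, weights):
    """Convex combination of per-policy frequency tables (one dict per policy)."""
    assert abs(sum(weights) - 1.0) < 1e-9 and min(weights) >= 0.0
    keys = rows[0].keys()
    return {k: sum(w * row[k] for w, row in zip(weights, rows)) for k in keys}

def marginalize(freq, sepset_index):
    """Sum frequencies over everything except the SepSet coordinate."""
    out = {}
    for assignment, f in freq.items():
        key = assignment[sepset_index]
        out[key] = out.get(key, 0.0) + f
    return out

# Parent Mj: two visited policies, frequencies over (x_j, shared).
parent_rows = [{(0, 0): 6.0, (0, 1): 4.0},
               {(0, 0): 2.0, (0, 1): 8.0}]
# Child Mk: one visited policy, frequencies over (shared, x_k).
child_rows = [{(0, 0): 4.0, (1, 0): 6.0}]

parent_marginal = marginalize(mix(parent_rows, [0.5, 0.5]), sepset_index=1)
child_marginal = marginalize(mix(child_rows, [1.0]), sepset_index=0)
agree = parent_marginal == child_marginal
```

Neither of the parent's pure policies matches the child on the shared variable, but their equal mixture does, which is the role the mixing parameters play.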

18
Convergence Result
In a finite number of iterations, the algorithm
produces the best possible value function (i.e., the
same as a centralized planner)
  • Planning algorithm is a special case of nested
    Benders decomposition
  • One Benders split for each internal node N of
    subsystem tree
  • One subproblem is N itself
  • Remaining subproblems are subtrees for N's
    children (decompose these recursively)
  • Master problem is to determine reward messages
  • Result follows from correctness of Benders
    decomposition

19
Hierarchical Action Selection
  • Distributed planning obtains value function
  • Distributed message passing obtains action choice
    (policy)
  • Sends conditional value to its parent
  • Sends action choice to its children
  • Limited observability
  • Limited communication
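
The two message directions above resemble a generic two-pass maximization on a tree. The sketch below shows that generic scheme for one parent and one child; it is only a reading of the bullets, and the quantities the authors actually exchange may differ. All scores and names are invented.

```python
from itertools import product

shared_values = (0, 1)          # assignments to the SepSet between Mj and Mk
parent_actions = ("a", "b")
child_actions = ("u", "v")

# Invented local scores, given each subsystem's current local observations.
Q_parent = {(0, "a"): 1.0, (0, "b"): 0.0, (1, "a"): 0.5, (1, "b"): 2.0}
Q_child = {(0, "u"): 3.0, (0, "v"): 0.0, (1, "u"): 0.0, (1, "v"): 1.5}

# Upward pass: child reports the value of its best response to each setting.
conditional_value = {c: max(Q_child[(c, a)] for a in child_actions)
                     for c in shared_values}

# Parent chooses using its own score plus the child's conditional value.
c_star, parent_choice = max(
    product(shared_values, parent_actions),
    key=lambda ca: Q_parent[ca] + conditional_value[ca[0]])

# Downward pass: child receives the choice and picks its matching action.
child_choice = max(child_actions, key=lambda a: Q_child[(c_star, a)])

# Same result as maximizing over all joint choices at once.
best_joint = max(Q_parent[(c, ap)] + Q_child[(c, ac)]
                 for c, ap, ac in product(shared_values, parent_actions,
                                          child_actions))
```

Each subsystem only ever looks at its own table plus one short message, which is how the scheme stays within limited observability and communication.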

[Figure: subsystem tree fragment Ml, Mj, Mk; action choices flow from parent to child, values of conditional policies flow from child to parent]
20
Reusing Models and Computation
  • Classes of objects
  • Basic subsystems with same rewards and
    transitions
  • Reuse in knowledge representation
  • Library of subsystems
  • Reusing computation
  • Compute policy (visitation frequencies) for one
    subsystem, use it in all subsystems of the same
    class
  • Compute messages for one subtree, use them in all
    equivalent subtrees
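
Reuse of computation across a class can be as simple as caching one solve per class. The sketch below uses a placeholder in place of a real solver and assumes that all instances of a class face identical local problems, including identical reward messages; the class and instance names are hypothetical.

```python
from functools import lru_cache

solve_calls = 0

@lru_cache(maxsize=None)
def solve_class(class_name):
    """Stand-in for an expensive local solve; runs once per subsystem class."""
    global solve_calls
    solve_calls += 1
    return f"policy-for-{class_name}"      # placeholder result

# Hypothetical system: four cylinders of one class, one cooling subsystem.
instances = {"cyl1": "Cylinder", "cyl2": "Cylinder", "cyl3": "Cylinder",
             "cyl4": "Cylinder", "cool": "Cooling"}
policies = {name: solve_class(cls) for name, cls in instances.items()}
```

Five instances trigger only two solves here. If instances of the same class received different reward messages, the cache key would need to include the messages as well.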

21
Related Work
  • Serial decompositions
  • one subsystem active at a time
  • Kushner & Chen 74 (rooms in a maze)
  • Dean & Lin, IJCAI-95 (combines w/ abstraction)
  • hierarchical reinforcement learning is similar (MAXQ, HAM, etc.)
  • Parallel decompositions
  • more expressive (exponentially larger state
    space)
  • Singh & Cohn, NIPS-98 (enumerates states)
  • Meuleau et al., AAAI-98 (heuristic for resources)

22
Related Work
  • Dantzig-Wolfe or Benders decomposition
  • Dantzig 65
  • first used for MDPs in Kushner & Chen 74
  • we are first to apply to parallel subsystems
  • Variable elimination
  • well-known from Bayes nets
  • Guestrin, Koller & Parr, NIPS-01

23
Summary: Hierarchical Factored MDPs
  • Parallel decomposition → exponential state space
  • Efficient distributed planning algorithm
  • Solve local stand-alone MDPs with any algorithm
  • Reward sharing coordinates subsystem plans
  • Simple message passing algorithm computes rewards
  • Hierarchical action selection
  • Limited communication
  • Limited observability
  • Reuse for knowledge representation and
    computation
  • General approach for modeling and planning in
    large stochastic systems