1
Multi-Agent Planning in Complex Uncertain Environments
  • Daphne Koller
  • Stanford University

Joint work with Carlos Guestrin (CMU) and Ronald Parr (Duke)
2
Collaborative Multiagent Planning
Long-term goals
Multiple agents
Coordinated decisions
  • Search and rescue, firefighting
  • Factory management
  • Multi-robot tasks (Robosoccer)
  • Network routing
  • Air traffic control
  • Computer game playing

3
Joint Planning Space
  • Joint action space
  • Each agent i takes an action ai at each step
  • Joint action a = (a1, …, an) for all agents
  • Joint state space
  • Assignment (x1, …, xn) to a set of variables X1, …, Xn
  • Joint state x = (x1, …, xn) of the entire system
  • Joint system: payoffs and state dynamics depend on the joint state and joint action
  • Cooperative agents: want to maximize the total payoff

4
Exploiting Structure
  • Real-world problems have
  • Hundreds of objects
  • Googols of states
  • Real-world problems have structure!

Approach: Exploit the structured representation to obtain an efficient approximate solution
5
Outline
  • Action Coordination
    • Factored Value Functions
    • Coordination Graphs
    • Context-Specific Coordination
  • Joint Planning
    • Multi-Agent Markov Decision Processes
    • Efficient Linear Programming Solution
    • Decentralized Market-Based Solution
  • Generalizing to New Environments
    • Relational MDPs
    • Generalizing Value Functions

6
One-Shot Optimization Task
  • Q-function Q(x, a) encodes the agents' payoff for joint action a in joint state x
  • Agents' task: compute the maximizing joint action argmax_a Q(x, a)
  • The # of joint actions is exponential
  • Naively requires complete state observability
  • Naively requires full agent communication

7
Factored Payoff Function
  • Approximate the Q-function as a sum of Q sub-functions
  • Each sub-function depends on a local part of the system, e.g.:
  • Two interacting agents
  • An agent and an important resource
  • Two inter-dependent pieces of machinery

Q(A1, …, A4, X1, …, X4) ≈ Q1(A1, A4, X1, X4) + Q2(A1, A2, X1, X2) + Q3(A2, A3, X2, X3) + Q4(A3, A4, X3, X4)

[K., Parr '99, '00; Guestrin, K., Parr '01]
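As a concrete illustration (a minimal Python sketch, not from the talk; the scopes and payoff numbers are placeholders), a factored Q-function can be stored as a list of local sub-functions, each with a small scope and a payoff table, and Q(x, a) evaluated by summing the local terms:

    from itertools import product

    def factored_Q(sub_functions, assignment):
        """Evaluate Q(x, a) as a sum of local sub-functions.
        sub_functions: list of (scope, table) pairs; scope is a tuple of variable
        names, table maps a value tuple (in scope order) to a payoff.
        assignment: dict mapping every state variable Xi and action Ai to a value."""
        return sum(table[tuple(assignment[v] for v in scope)]
                   for scope, table in sub_functions)

    # Illustrative sub-function Q2(A1, A2, X1, X2) over binary variables;
    # the payoffs are placeholders.
    Q2_scope = ("A1", "A2", "X1", "X2")
    Q2_table = {vals: float(sum(vals)) for vals in product([0, 1], repeat=4)}
    sub_functions = [(Q2_scope, Q2_table)]   # Q1, Q3, Q4 would use the same format

    print(factored_Q(sub_functions, {"A1": 1, "A2": 0, "X1": 1, "X2": 1}))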
8
Distributed Q Function
  • Q sub-functions assigned to relevant agents

Q(A1, …, A4, X1, …, X4) ≈ Q1(A1, A4, X1, X4) + Q2(A1, A2, X1, X2) + Q3(A2, A3, X2, X3) + Q4(A3, A4, X3, X4), with each Q_i assigned to one of the agents in its scope

[Guestrin, K., Parr '01]
9
Multiagent Action Selection
Instantiate current state x
Maximal joint action: argmax_a Σ_i Q_i(x, a)
Distributed Q function
Q2(A1, A2, X1,X2)
Q1(A1, A4, X1,X4)
Q3(A2, A3, X2,X3)
Q4(A3, A4, X3,X4)
10
Instantiating State x
Limited observability: agent i only needs to observe the variables in Q_i
Q2(A1, A2, X1,X2)
Q1(A1, A4, X1,X4)
Q3(A2, A3, X2,X3)
Q4(A3, A4, X3,X4)
11
Choosing Action at State x
Instantiate current state x
Q1(A1, A4, X1, X4) → Q1(A1, A4)
Q2(A1, A2, X1, X2) → Q2(A1, A2)
Q3(A2, A3, X2, X3) → Q3(A2, A3)
Q4(A3, A4, X3, X4) → Q4(A3, A4)
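A small sketch of this instantiation step under the same (scope, table) representation used above (a hypothetical helper, not the authors' code): conditioning a local Q_i on the observed values of its state variables leaves a factor over actions only.

    def condition(scope, table, observation):
        """Restrict a factor to observed state values, keeping only action variables."""
        keep = [i for i, v in enumerate(scope) if v not in observation]
        new_scope = tuple(scope[i] for i in keep)
        new_table = {}
        for values, payoff in table.items():
            if all(values[i] == observation[v]
                   for i, v in enumerate(scope) if v in observation):
                new_table[tuple(values[i] for i in keep)] = payoff
        return new_scope, new_table

    # Q1(A1, A4, X1, X4) becomes Q1(A1, A4) once X1 and X4 are observed, e.g.
    # condition(("A1", "A4", "X1", "X4"), Q1_table, {"X1": 1, "X4": 0})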
12
Variable Elimination
  • Use variable elimination for the maximization:
    max_a [ Q1(A1, A4) + Q2(A1, A2) + Q3(A2, A3) + Q4(A3, A4) ]
  • Limited communication suffices for the optimal action choice
  • Communication bandwidth = tree-width of the coordination graph
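To make the elimination step concrete, here is a generic max-sum variable-elimination sketch over the same factor representation (an illustrative implementation, not the authors' code): agents are eliminated one at a time, and a back-substitution pass recovers the maximizing joint action.

    from itertools import product

    def eliminate_max(factors, elim_order, domains):
        """Maximize a sum of local factors over all joint actions.
        factors: list of (scope, table); elim_order: agent variables to eliminate;
        domains: dict mapping each agent variable to its list of actions."""
        factors = list(factors)
        back = []
        for var in elim_order:
            involved = [f for f in factors if var in f[0]]
            factors = [f for f in factors if var not in f[0]]
            new_scope = tuple(sorted({v for s, _ in involved for v in s} - {var}))
            new_table, best = {}, {}
            for values in product(*(domains[v] for v in new_scope)):
                ctx = dict(zip(new_scope, values))
                scored = []
                for a in domains[var]:
                    ctx[var] = a
                    scored.append((sum(t[tuple(ctx[v] for v in s)]
                                       for s, t in involved), a))
                new_table[values], best[values] = max(scored)
            factors.append((new_scope, new_table))
            back.append((var, new_scope, best))
        value = sum(t[()] for _, t in factors)        # only empty scopes remain
        action = {}
        for var, scope, best in reversed(back):       # back-substitution
            action[var] = best[tuple(action[v] for v in scope)]
        return value, action

    # Four agents with binary actions and pairwise payoffs (numbers illustrative).
    Q1 = (("A1", "A4"), {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 3.0})
    Q2 = (("A1", "A2"), {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 0.0, (1, 1): 1.0})
    Q3 = (("A2", "A3"), {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 0.0})
    Q4 = (("A3", "A4"), {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0})
    domains = {name: [0, 1] for name in ("A1", "A2", "A3", "A4")}
    print(eliminate_max([Q1, Q2, Q3, Q4], ["A3", "A1", "A2", "A4"], domains))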
13
Choosing Action at State x
14
Choosing Action at State x
Eliminate A3:  g1(A2, A4) = max_A3 [ Q3(A2, A3) + Q4(A3, A4) ]
Remaining problem:  max over A1, A2, A4 of [ Q1(A1, A4) + Q2(A1, A2) + g1(A2, A4) ]
15
Coordination Graphs
  • Communication follows triangulated graph
  • Computation grows exponentially in tree width
  • Graph-theoretic measure of connectedness
  • Arises in BNs, CSPs, …
  • Cost is exponential in the worst case, but fairly low for many real graphs

[Figure: example coordination graph over agents A1, A2, A5, A10, A11, …]
16
Context-Specific Interactions
  • Payoff structure can vary by context
  • Agents A1, A2 are both trying to pass through the same narrow corridor
  • Can use context-specific value rules, e.g.:
    ⟨ At(X, A1) ∧ At(X, A2) ∧ A1 = fwd ∧ A2 = fwd : −100 ⟩
  • Hope: context-specific payoffs will induce context-specific coordination
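A minimal sketch of such value rules (hypothetical representation, not the authors' code): each rule carries a context over state variables and actions plus a value, and the payoff is the sum of the rules that fire.

    # Each value rule fires only when its context over state variables and
    # agent actions holds, so the payoff structure varies with the context.
    rules = [
        # "both agents at the corridor and both move forward" costs 100
        ({"At(X,A1)": True, "At(X,A2)": True, "A1": "fwd", "A2": "fwd"}, -100.0),
    ]

    def rule_value(rules, assignment):
        """Sum the values of all rules whose context is satisfied."""
        return sum(value for context, value in rules
                   if all(assignment.get(var) == want for var, want in context.items()))

    print(rule_value(rules, {"At(X,A1)": True, "At(X,A2)": True,
                             "A1": "fwd", "A2": "fwd"}))   # -100.0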

17
Context-Specific Coordination
[Figure: coordination graph over agents A1 … A6]
Instantiate current state x = true
18
Context-Specific Coordination
Coordination structure varies based on context
19
Context-Specific Coordination
Coordination structure varies based on communication
Maximizing out A1
Rule-based variable elimination [Zhang & Poole '99]
20
Context-Specific Coordination
Coordination structure varies based on agent decisions
Eliminate A1 from the graph
Rule-based variable elimination [Zhang & Poole '99]
21
Robot Soccer
[Kok, Vlassis & Groen, University of Amsterdam]
  • UvA Trilearn 2002 won the German Open 2002, but placed fourth in RoboCup 2002.
  • "The improvements introduced in UvA Trilearn 2003 include an extension of the intercept skill, improved passing behavior and especially the usage of coordination graphs to specify the coordination requirements between the different agents."

22
RoboSoccer Value Rules
  • Coordination graph rules include conditions on
    player role and aspects of global system state
  • Example rules for player i, in role of passer

Depends on distance of j to goal after move
23
UvA Trilearn 2003 Results
  • UvA Trilearn won
  • German Open 2003
  • US Open 2003
  • RoboCup 2003
  • German Open 2004


24
Outline
  • Action Coordination
    • Factored Value Functions
    • Coordination Graphs
    • Context-Specific Coordination
  • Joint Planning
    • Multi-Agent Markov Decision Processes
    • Efficient Linear Programming Solution
    • Decentralized Market-Based Solution
  • Generalizing to New Environments
    • Relational MDPs
    • Generalizing Value Functions

25
Real-time Strategy Game
  • Peasants collect resources and build
  • Footmen attack enemies
  • Buildings train peasants and footmen
[Figure: game screenshot showing a peasant, a footman, and a building]
26
Planning Over Time
Markov Decision Process (MDP) representation
  • Action space: joint agent actions a = (a1, …, an)
  • State space: joint state descriptions x = (x1, …, xn)
  • Momentary reward function R(x, a)
  • Probabilistic system dynamics P(x′ | x, a)

27
Policy
At state x, joint action a for all agents
Policy: π(x) = a
28
Value of Policy
Expected long-term reward starting from x
Value V^π(x)
29
Optimal Long-term Plan
Optimal Q-function Q*(x, a)
Optimal policy π*(x) = argmax_a Q*(x, a)
Bellman equation:  Q*(x, a) = R(x, a) + γ Σ_x′ P(x′ | x, a) max_a′ Q*(x′, a′)
30
Solving an MDP
Solve the Bellman equation → optimal value V*(x) → optimal policy π*(x)
Many algorithms solve the Bellman equations
  • Policy iteration [Howard '60; Bellman '57]
  • Value iteration [Bellman '57]
  • Linear programming [Manne '60]

31
LP Solution to MDP
  • One variable V(x) for each state x
  • One constraint for each state x and action a:
    V(x) ≥ R(x, a) + γ Σ_x′ P(x′ | x, a) V(x′)
  • Polynomial-time solution
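A minimal sketch of this LP on a toy MDP (the numbers are made up for illustration; it simply minimizes Σ_x V(x) subject to one constraint per state-action pair):

    import numpy as np
    from scipy.optimize import linprog

    def solve_mdp_lp(P, R, gamma):
        """P[a] is an (S, S) transition matrix, R is (S, A), 0 <= gamma < 1."""
        S, A = R.shape
        c = np.ones(S)                        # minimize the sum of state values
        A_ub, b_ub = [], []
        for a in range(A):
            # V >= R[:, a] + gamma * P[a] @ V   <=>   (gamma * P[a] - I) V <= -R[:, a]
            A_ub.append(gamma * P[a] - np.eye(S))
            b_ub.append(-R[:, a])
        res = linprog(c, A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
                      bounds=[(None, None)] * S)
        return res.x                          # optimal value function V*

    # Toy 2-state, 2-action MDP (illustrative numbers only).
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
                  [[0.5, 0.5], [0.1, 0.9]]])  # action 1
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
    print(solve_mdp_lp(P, R, gamma=0.95))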

32
Are We Done?
  • Planning is polynomial in the numbers of states and actions
  • The # of states is exponential in the number of variables
  • The # of actions is exponential in the number of agents

Efficient approximation by exploiting structure!
33
Structured Representation
Factored MDP
[Boutilier et al. '95]
P(F′ | F, G, AB, AF)
  • State
  • Dynamics
  • Decisions
  • Rewards

Complexity of representation: exponential in the # of parents (worst case)
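A minimal sketch of such factored dynamics (variable names and probabilities are hypothetical): each next-step variable carries a conditional probability table over its parents only, which may include both state and action variables.

    import random

    # P(F' | F, G, A_F): a conditional probability table for one variable F,
    # whose parents are the state variables F, G and the action variable A_F.
    cpt_F = {
        (True,  True,  "repair"): {True: 0.99, False: 0.01},
        (True,  True,  "idle"):   {True: 0.90, False: 0.10},
        (True,  False, "repair"): {True: 0.95, False: 0.05},
        (True,  False, "idle"):   {True: 0.60, False: 0.40},
        (False, True,  "repair"): {True: 0.50, False: 0.50},
        (False, True,  "idle"):   {True: 0.05, False: 0.95},
        (False, False, "repair"): {True: 0.30, False: 0.70},
        (False, False, "idle"):   {True: 0.01, False: 0.99},
    }
    dynamics = {"F": (("F", "G", "A_F"), cpt_F)}   # variable -> (parents, CPT)

    def sample_next_state(assignment, dynamics):
        """assignment maps current state and action variables to their values."""
        nxt = {}
        for var, (parents, cpt) in dynamics.items():
            dist = cpt[tuple(assignment[p] for p in parents)]
            values = list(dist)
            nxt[var] = random.choices(values, weights=[dist[v] for v in values])[0]
        return nxt

    print(sample_next_state({"F": True, "G": False, "A_F": "idle"}, dynamics))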
34
Structured Value Function?
Does a factored MDP imply structure in the exact value function V*? In general, no.
But a factored V often provides a good approximate value function.
35
Structured Value Functions
[Bellman et al. '63; Tsitsiklis & Van Roy '96; K., Parr '99, '00]
  • Approximate V* as a factored value function
  • In the rule-based case:
  • h_i is a rule concerning a small part of the system
  • w_i is the value associated with the rule
  • Goal: find w giving a good approximation Ṽ to V*

Factored value function Ṽ = Σ_i w_i h_i
Factored Q-function Q̃ = Σ_i Q_i
Can use the coordination graph
36
Approximate LP Solution
  • One variable w_i for each basis function → only a polynomial number of LP variables
  • One constraint for every state and action → exponentially many LP constraints:
    Σ_i w_i h_i(x) ≥ R(x, a) + γ Σ_x′ P(x′ | x, a) Σ_i w_i h_i(x′)   for all x, a
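For comparison with the exact LP above, here is a sketch of the approximate LP over basis-function weights on a toy problem (illustrative only; it still enumerates every state-action constraint, which is exactly what the factored LP of the next slides avoids). Including a constant basis function keeps the LP feasible.

    import numpy as np
    from scipy.optimize import linprog

    def approx_lp(P, R, H, gamma, alpha=None):
        """P[a]: (S, S) transitions, R: (S, A) rewards, H: (S, k) basis matrix."""
        S, A = R.shape
        k = H.shape[1]
        alpha = np.ones(S) if alpha is None else alpha
        c = alpha @ H                          # minimize sum_x alpha(x) * (H w)(x)
        A_ub, b_ub = [], []
        for a in range(A):
            # H w >= R[:, a] + gamma * P[a] @ H @ w, rewritten as <= constraints
            A_ub.append(gamma * P[a] @ H - H)
            b_ub.append(-R[:, a])
        res = linprog(c, A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
                      bounds=[(None, None)] * k)
        return res.x                           # weights w, with V approximated by H @ w

    # Tiny usage: a constant basis plus an indicator of state 1 (illustrative).
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.1, 0.9]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
    H = np.array([[1.0, 0.0],
                  [1.0, 1.0]])
    print(approx_lp(P, R, H, gamma=0.95))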

37
So What Now?
[Guestrin, K., Parr '01]
The exponentially many linear constraints can be replaced by a single equivalent nonlinear constraint: a maximization over all states and actions.
38
Variable Elimination Revisited
[Guestrin, K., Parr '01]
  • Use variable elimination to represent the constraints compactly

Exponentially fewer constraints
A polynomial-size LP finds a good factored approximation to V*
39
Network Management Problem
  • Each computer runs processes
  • Computer status ∈ {good, dead, faulty}
  • Dead neighbors increase the probability of dying
  • Reward for successful processes
  • Each SysAdmin takes a local action: reboot or don't reboot

Network topologies: ring, ring of rings, star, k-grid
40
Scaling of Factored LP
41
Multiagent Running Time
[Figure: running-time comparison for ring of rings, star (pair basis), and star (single basis)]
42
Strategic 2x2
Factored MDP model: 2 peasants, 2 footmen, enemy, gold, wood, barracks; about 1 million state/action pairs
Factored LP computes the value function Q; the coordination graph computes argmax_a Q(x, a) to act in the world
43
Demo: Strategic 2x2
[Guestrin, Koller, Gearhart & Kanodia]
44
Limited Interaction MDPs
[Guestrin & Gordon '02]
  • Some MDPs have additional structure
  • Agents are largely autonomous
  • Interact in limited ways
  • e.g., competing for resources
  • Can decompose MDP as set of agent-based MDPs,
    with limited interface

45
Limited Interaction MDPs
[Guestrin & Gordon '02]
  • In such MDPs, our LP matrix is highly structured
  • Can use Dantzig-Wolfe LP decomposition to solve
    LP optimally, in a decentralized way
  • Gives rise to a market-like algorithm with
    multiple agents and a centralized auctioneer

46
Auction-style planning
Set pricing based on conflicts
[Guestrin & Gordon '02]
  • Each agent solves its local (stand-alone) MDP
  • Agents send constraint messages to the auctioneer
  • They must agree on a policy for the shared variables
  • The auctioneer sends pricing messages to the agents
  • Pricing reflects penalties for constraint violations
  • Prices influence the agents' rewards in their local MDPs (a simplified pricing-loop sketch follows below)
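As an illustration of the pricing loop only (a simplified dual-decomposition stand-in, not the authors' Dantzig-Wolfe decomposition): each agent best-responds to the current price of a shared resource, and the auctioneer adjusts the price according to how far the combined demand exceeds the shared budget.

    def best_response(utility, price):
        """Agent picks the local option maximizing utility minus resource cost."""
        return max(utility, key=lambda use: utility[use] - price * use)

    def auction(agent_utilities, budget, step=0.1, iters=500):
        """Auctioneer adjusts the price until combined demand fits the budget."""
        price = 0.0
        for _ in range(iters):
            demand = sum(best_response(u, price) for u in agent_utilities)
            price = max(0.0, price + step * (demand - budget))   # subgradient step
        return price, [best_response(u, price) for u in agent_utilities]

    # Two UAVs, each choosing a fuel level (0-3 units) with a mission value.
    uav1 = {0: 0.0, 1: 5.0, 2: 8.0, 3: 9.0}
    uav2 = {0: 0.0, 1: 4.0, 2: 7.0, 3: 9.5}
    print(auction([uav1, uav2], budget=4))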

47
Fuel Allocation Problem
[Figure: UAVs, start locations, and targets]
  • UAVs share a pot of fuel
  • Targets have varying priority
  • Ignore target interference

[Bererton, Gordon, Thrun & Khosla]
48
Fuel Allocation Problem
[Bererton, Gordon, Thrun & Khosla '03]
49
High-Speed Robot Paintball
[Bererton, Gordon & Thrun]
50
High-Speed Robot Paintball
[Figure: two game variants; coordination point, sensor placement, start and goal locations]
51
High-Speed Robot Paintball
[Bererton, Gordon & Thrun]
52
Outline
  • Action Coordination
    • Factored Value Functions
    • Coordination Graphs
    • Context-Specific Coordination
  • Joint Planning
    • Multi-Agent Markov Decision Processes
    • Efficient Linear Programming Solution
    • Decentralized Market-Based Solution
  • Generalizing to New Environments
    • Relational MDPs
    • Generalizing Value Functions

53
Generalizing to New Problems
Many problems are similar
Solve Problem 1, Problem 2, …, Problem n → good solution to Problem n+1
But the MDPs are different! Different sets of states, actions, rewards, transitions, …
54
Generalizing with Relational MDPs
Similar domains have similar types of objects → Relational MDP
Exploit the similarities by computing generalizable value functions
Generalization: avoid the need to replan, and tackle larger problems
55
Relational Models and MDPs
[Guestrin, K., Gearhart & Kanodia '03]
  • Classes
  • Peasant, Footman, Gold, Barracks, Enemy
  • Relations
  • Collects, Builds, Trains, Attacks
  • Instances
  • Peasant1, Peasant2, Footman1, Enemy1
  • Builds on Probabilistic Relational Models [K., Pfeffer '98]

56
Relational MDPs
[Guestrin, K., Gearhart & Kanodia '03]
[Figure: class-level DBN for the Footman and Enemy classes]
  • Class-level transition probabilities depend on:
  • Attributes, actions, and attributes of related objects
  • Class-level reward function

Very compact representation! It does not depend on the # of objects
57
World is a Large Factored MDP
Relational MDP + # of objects + links between objects → factored MDP
  • Instantiation (world): # of instances of each class, links between instances
  • Yields a well-defined factored MDP

58
MDP with 2 Footmen and 2 Enemies
59
World is a Large Factored MDP
Relational MDP + # of objects + links between objects → factored MDP
  • Instantiate the world → a well-defined factored MDP
  • Use the factored LP for planning
  • But so far we have gained nothing!

60
Class-level Value Functions
V(F1.H, E1.H, F2.H, E2.H) ≈ VF1(F1.H, E1.H) + VE1(E1.H) + VF2(F2.H, E2.H) + VE2(E2.H)

Units are interchangeable!
VF1 ≡ VF2 ≡ VF,  VE1 ≡ VE2 ≡ VE
At state x, each footman still has a different contribution to V
Given the class-level weights wC, we can instantiate the value function for any world
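A minimal sketch of instantiating a class-level value function for an arbitrary world (the attributes and the numeric weights are hypothetical): every object of a class reuses the same class-level sub-function, so the same weights apply whatever the number of objects.

    def V_footman(footman_health, enemy_health):
        """Class-level sub-function VF, shared by every footman instance."""
        return 2.0 * footman_health - 1.0 * enemy_health    # illustrative weights

    def V_enemy(enemy_health):
        """Class-level sub-function VE, shared by every enemy instance."""
        return -1.5 * enemy_health                          # illustrative weight

    def world_value(footmen, enemies):
        """footmen: list of (F.H, health of the linked enemy); enemies: list of E.H."""
        return (sum(V_footman(fh, eh) for fh, eh in footmen)
                + sum(V_enemy(eh) for eh in enemies))

    # The same class-level functions score a 2-vs-2 world and a 4-vs-4 world.
    print(world_value([(1.0, 0.5), (0.8, 0.2)], [0.5, 0.2]))
    print(world_value([(1.0, 0.5), (0.8, 0.2), (0.9, 0.7), (0.4, 0.1)],
                      [0.5, 0.2, 0.7, 0.1]))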
61
Factored LP-based Generalization
Sample a set of worlds I → class-level factored LP → class value functions VF, VE → generalize to new worlds
How many samples are needed?
62
Sampling Complexity
  • Exponentially many worlds: do we need exponentially many samples?
  • The # of objects in a world is unbounded: must we sample very large worlds?

NO!
63
Theorem
Sample m small worlds, each containing only a small (logarithmic) number of objects. Then the resulting value function is within O(ε) of the class-level value function optimized for all worlds, with probability at least 1 − δ. (R_Cmax is the maximum class reward.)
64
Strategic 2x2
Relational MDP model: 2 peasants, 2 footmen, enemy, gold, wood, barracks; about 1 million state/action pairs
Factored LP computes the value function Q; the coordination graph computes argmax_a Q(x, a) to act in the world
65
Strategic 9x3
Relational MDP model: 9 peasants, 3 footmen, enemy, gold, wood, barracks; about 3 trillion state/action pairs (grows exponentially in the # of agents)
Factored LP computes the value function Q; the coordination graph computes argmax_a Q(x, a) to act in the world
66
Strategic Generalization
Relational MDP model: 2 peasants, 2 footmen, enemy, gold, wood, barracks; about 1 million state/action pairs
Factored LP computes the class-level value function (weights wC); the instantiated Q-functions grow only polynomially in the # of agents
Coordination graph computes argmax_a Q(x, a) to act in the world
67
Demo: Generalized 9x3
[Guestrin, Koller, Gearhart & Kanodia]
68
Tactical Generalization
3 vs. 3 → generalize → 4 vs. 4
  • Planned in 3 footmen versus 3 enemies
  • Generalized to 4 footmen versus 4 enemies

69
Demo: Planned Tactical 3x3
[Guestrin, Koller, Gearhart & Kanodia]
70
Demo: Generalized Tactical 4x4
[Guestrin, Koller, Gearhart & Kanodia]
[Guestrin, K., Gearhart & Kanodia '03]
71
Summary
Effective planning under uncertainty
Distributed coordinated action selection
Generalization to new problems
Structured Multi-Agent MDPs
72
Important Questions
Continuous spaces
Partial observability
Complex actions
Learning to act
How far can we go??
73
Thank You!
http://robotics.stanford.edu/koller
Carlos Guestrin, Ronald Parr
Chris Gearhart, Neal Kanodia, Shobha Venkataraman
Curt Bererton, Geoff Gordon, Sebastian Thrun
Jelle Kok, Matthijs Spaan, Nikos Vlassis