Title: Multi-Agent Planning in Complex Uncertain Environments
1Multi-Agent Planning in Complex Uncertain Environments
- Daphne Koller
- Stanford University
Joint work with Carlos Guestrin (CMU) and Ronald Parr (Duke)
2Collaborative Multiagent Planning
Long-term goals
Multiple agents
Coordinated decisions
- Search and rescue, firefighting
- Factory management
- Multi-robot tasks (Robosoccer)
- Network routing
- Air traffic control
- Computer game playing
3Joint Planning Space
- Joint action space
- Each agent i takes action ai at each step
- Joint action a = (a1, …, an) for all agents
- Joint state space
- Assignment (x1, …, xn) to some set of variables X1, …, Xn
- Joint state x = (x1, …, xn) of the entire system
- Joint system: payoffs and state dynamics depend on the joint state and joint action
- Cooperative agents: want to maximize the total payoff
4Exploiting Structure
- Real-world problems have
- Hundreds of objects
- Googols of states
- Real-world problems have structure!
Approach: exploit the structured representation to obtain an efficient approximate solution
5Outline
- Action Coordination
- Factored Value Functions
- Coordination Graphs
- Context-Specific Coordination
- Joint Planning
- Multi-Agent Markov Decision Processes
- Efficient Linear Programming Solution
- Decentralized Market-Based Solution
- Generalizing to New Environments
- Relational MDPs
- Generalizing Value Functions
6One-Shot Optimization Task
- Q-function Q(x,a) encodes the agents' payoff for joint action a in joint state x
- Agents' task: compute argmaxa Q(x,a)
- Problem: # of joint actions is exponential in the number of agents
- Problem: requires complete state observability
- Problem: requires full agent communication
7Factored Payoff Function
- Approximate the Q function as a sum of Q sub-functions (see the sketch below)
- Each sub-function depends on a local part of the system, e.g.:
- Two interacting agents
- An agent and an important resource
- Two inter-dependent pieces of machinery
Q(A1, …, A4, X1, …, X4) ≈ Q1(A1, A4, X1, X4) + Q2(A1, A2, X1, X2) + Q3(A2, A3, X2, X3) + Q4(A3, A4, X3, X4)
[K. & Parr '99, '00; Guestrin, K. & Parr '01]
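To make the decomposition concrete, here is a minimal Python sketch (not from the talk; the payoff values and function names are illustrative assumptions) of a factored Q-function for the four-agent example above: the global payoff is the sum of local sub-functions, each touching only two agents and two state variables.

```python
# Illustrative sketch of a factored Q-function (toy payoffs, not the talk's model).
# Each sub-function Qi depends only on a local part of the system.

def q1(a1, a4, x1, x4): return 1.0 if a1 == a4 else 0.0
def q2(a1, a2, x1, x2): return 2.0 if (x1 and a1 == a2) else 0.0
def q3(a2, a3, x2, x3): return 1.0 if (x2 or x3) and a2 != a3 else 0.0
def q4(a3, a4, x3, x4): return -1.0 if (a3 == a4 == 0) else 0.5

def q_total(a, x):
    """Q(a1..a4, x1..x4) is approximated as Q1 + Q2 + Q3 + Q4."""
    a1, a2, a3, a4 = a
    x1, x2, x3, x4 = x
    return (q1(a1, a4, x1, x4) + q2(a1, a2, x1, x2) +
            q3(a2, a3, x2, x3) + q4(a3, a4, x3, x4))
```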
8Distributed Q Function
- Q sub-functions assigned to relevant agents
Q(A1, …, A4, X1, …, X4) ≈ Q1(A1, A4, X1, X4) + Q2(A1, A2, X1, X2) + Q3(A2, A3, X2, X3) + Q4(A3, A4, X3, X4)
[Guestrin, K. & Parr '01]
9Multiagent Action Selection
Instantiate the current state x
Maximal joint action: argmaxa Σi Qi
Distributed Q function:
Q2(A1, A2, X1,X2)
Q1(A1, A4, X1,X4)
Q3(A2, A3, X2,X3)
Q4(A3, A4, X3,X4)
10Instantiating State x
Limited observability: agent i only observes the variables in Qi
Q2(A1, A2, X1,X2)
Q1(A1, A4, X1,X4)
Q3(A2, A3, X2,X3)
Q4(A3, A4, X3,X4)
11Choosing Action at State x
Instantiate current state x
Q1(A1, A4, X1, X4) → Q1(A1, A4)
Q2(A1, A2, X1, X2) → Q2(A1, A2)
Q3(A2, A3, X2, X3) → Q3(A2, A3)
Q4(A3, A4, X3, X4) → Q4(A3, A4)
12Variable Elimination
- Use variable elimination to compute maxa [Q1(A1, A4) + Q2(A1, A2) + Q3(A2, A3) + Q4(A3, A4)]
- Limited communication suffices for the optimal action choice
- Communication bandwidth = tree-width of the coordination graph
13Choosing Action at State x
14Choosing Action at State x
Eliminate A3: g1(A2, A4) = maxA3 [Q3(A2, A3) + Q4(A3, A4)]
Remaining factors: Q1(A1, A4), Q2(A1, A2), g1(A2, A4) (a generic sketch of this elimination follows)
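The elimination step above can be written generically. Below is a minimal Python sketch (illustrative, not the authors' implementation); the binary action space and the factor tables are assumptions made for brevity.

```python
from itertools import product

ACTIONS = [0, 1]  # toy binary action space

def eliminate(factors, agent):
    """Max out `agent`: combine all factors that mention it into one new factor."""
    touching = [f for f in factors if agent in f[0]]
    rest = [f for f in factors if agent not in f[0]]
    new_scope = tuple(sorted({v for scope, _ in touching for v in scope} - {agent}))
    new_table = {}
    for assignment in product(ACTIONS, repeat=len(new_scope)):
        ctx = dict(zip(new_scope, assignment))
        new_table[assignment] = max(
            sum(t[tuple({**ctx, agent: a}[v] for v in scope)] for scope, t in touching)
            for a in ACTIONS)
    return rest + [(new_scope, new_table)]

def max_joint_value(factors, order):
    """Compute the max over all joint actions of the sum of the factors."""
    for agent in order:
        factors = eliminate(factors, agent)
    return sum(t[()] for _, t in factors)  # only empty-scope constants remain

# Example: the four-agent ring from the slides, with made-up payoff tables.
def table(scope, fn):
    return (scope, {a: fn(*a) for a in product(ACTIONS, repeat=len(scope))})

Q1 = table(("A1", "A4"), lambda a1, a4: 1.0 if a1 == a4 else 0.0)
Q2 = table(("A1", "A2"), lambda a1, a2: 2.0 * a1 * a2)
Q3 = table(("A2", "A3"), lambda a2, a3: 1.0 if a2 != a3 else 0.0)
Q4 = table(("A3", "A4"), lambda a3, a4: 0.5 * (a3 + a4))

# Eliminating A3 first produces g1(A2, A4) = max_A3 [Q3 + Q4], as on the slide.
print(max_joint_value([Q1, Q2, Q3, Q4], order=["A3", "A4", "A2", "A1"]))
```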
15Coordination Graphs
- Communication follows a triangulated graph
- Computation grows exponentially in the tree width
- Tree width: a graph-theoretic measure of connectedness
- Arises in BNs, CSPs, ...
- Cost is exponential in the worst case, but fairly low for many real graphs
[Figure: example coordination graph over agents A1, ..., A11]
16Context-Specific Interactions
- Payoff structure can vary by context
- Example: agents A1 and A2 both trying to pass through the same narrow corridor
- Can use context-specific value rules, e.g.:
- ⟨ At(X,A1) ∧ At(X,A2) ∧ A1 = fwd ∧ A2 = fwd : -100 ⟩
- Hope: context-specific payoffs will induce context-specific coordination (see the sketch below)
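As an illustration, a context-specific value rule like the corridor rule above can be represented as a (context, value) pair that contributes its value only when its context holds in the current state and joint action. This is a hypothetical sketch; the predicate and attribute names are assumptions, not the talk's code.

```python
# Minimal sketch of a context-specific value rule (names are illustrative).
from dataclasses import dataclass

@dataclass
class ValueRule:
    context: dict   # e.g. {"At(X,A1)": True, "At(X,A2)": True, "A1": "fwd", "A2": "fwd"}
    value: float    # e.g. -100.0

    def applies(self, state_and_action: dict) -> bool:
        # The rule fires only if every context assignment matches.
        return all(state_and_action.get(k) == v for k, v in self.context.items())

corridor_rule = ValueRule(
    context={"At(X,A1)": True, "At(X,A2)": True, "A1": "fwd", "A2": "fwd"},
    value=-100.0)

def rule_based_q(rules, state_and_action):
    """Sum the values of all rules whose context matches."""
    return sum(r.value for r in rules if r.applies(state_and_action))
```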
17Context-Specific Coordination
[Figure: coordination graph over agents A1-A6]
Instantiate the current state x
18Context-Specific Coordination
Coordination structure varies based on context
[Figure: coordination graph over agents A1-A6 in the instantiated context]
19Context-Specific Coordination
Coordination structure varies based on communication
[Figure: coordination graph over agents A1-A6]
Maximizing out A1: rule-based variable elimination [Zhang & Poole '99]
20Context-Specific Coordination
Coordination structure varies based on agent decisions
[Figure: coordination graph with A1 eliminated]
Eliminate A1 from the graph: rule-based variable elimination [Zhang & Poole '99]
21Robot Soccer
Kok, Vlassis & Groen, University of Amsterdam
- UvA Trilearn 2002 won the German Open 2002, but placed fourth in RoboCup-2002.
- The improvements introduced in UvA Trilearn 2003 include an extension of the intercept skill, improved passing behavior, and especially the usage of coordination graphs to specify the coordination requirements between the different agents.
22RoboSoccer Value Rules
- Coordination graph rules include conditions on player role and aspects of the global system state
- Example rules for player i, in the role of passer:
- Rule value depends on the distance of player j to the goal after the move
23UvA Trilearn 2003 Results
- UvA Trilearn won
- German Open 2003
- US Open 2003
- RoboCup 2003
- German Open 2004
24Outline
- Action Coordination
- Factored Value Functions
- Coordination Graphs
- Context-Specific Coordination
- Joint Planning
- Multi-Agent Markov Decision Processes
- Efficient Linear Programming Solution
- Decentralized Market-Based Solution
- Generalizing to New Environments
- Relational MDPs
- Generalizing Value Functions
25Real-Time Strategy Game
- Peasants collect resources and build
- Footmen attack enemies
- Buildings train peasants and footmen
[Figure: game screenshot with a peasant, a footman, and a building labeled]
26Planning Over Time
Markov Decision Process (MDP) representation:
- Action space: joint agent actions a = (a1, …, an)
- State space: joint state descriptions x = (x1, …, xn)
- Momentary reward function R(x,a)
- Probabilistic system dynamics P(x' | x, a)
27Policy
At state x, an action a for all agents
Policy: π(x) = a
28Value of Policy
Expected long-term reward starting from x, following π(x0), π(x1), ...
Value: Vπ(x)
29Optimal Long-term Plan
Optimal Q-function: Q*(x,a)
Optimal policy: π*(x) = argmaxa Q*(x,a)
Bellman equations (standard form shown below)
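The Bellman equations referenced here were an image in the original slides; for reference, the standard discounted-reward form (with discount factor γ) is:

```latex
\begin{align*}
Q^*(\mathbf{x},\mathbf{a}) &= R(\mathbf{x},\mathbf{a})
  + \gamma \sum_{\mathbf{x}'} P(\mathbf{x}' \mid \mathbf{x},\mathbf{a})\, V^*(\mathbf{x}'), \\
V^*(\mathbf{x}) &= \max_{\mathbf{a}} Q^*(\mathbf{x},\mathbf{a}), \qquad
\pi^*(\mathbf{x}) = \arg\max_{\mathbf{a}} Q^*(\mathbf{x},\mathbf{a}).
\end{align*}
```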
30Solving an MDP
Solve the Bellman equation → optimal value V*(x) → optimal policy π*(x)
Many algorithms solve the Bellman equations:
- Policy iteration [Howard '60; Bellman '57]
- Value iteration [Bellman '57]
- Linear programming [Manne '60]
31LP Solution to MDP
- One variable V(x) for each state x
- One constraint for each state x and action a
- Polynomial-time solution in the size of the state and action spaces (see the LP below)
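For reference, the classical LP formulation [Manne '60] can be written as follows, where α(x) > 0 is a state-relevance weighting (the particular weighting used in the talk is not shown here):

```latex
\begin{align*}
\min_{V} \;\; & \sum_{\mathbf{x}} \alpha(\mathbf{x})\, V(\mathbf{x}) \\
\text{s.t.}\;\; & V(\mathbf{x}) \;\ge\; R(\mathbf{x},\mathbf{a})
  + \gamma \sum_{\mathbf{x}'} P(\mathbf{x}' \mid \mathbf{x},\mathbf{a})\, V(\mathbf{x}')
  \quad \forall\, \mathbf{x},\mathbf{a}.
\end{align*}
```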
32Are We Done?
- Planning is polynomial in the number of states and actions
- # of states is exponential in the number of variables
- # of actions is exponential in the number of agents
Efficient approximation by exploiting structure!
33Structured Representation
Factored MDP [Boutilier et al. '95]
[Figure: dynamic Bayesian network for the transition model, e.g., P(F' | F, G, AB, AF)]
- State
- Dynamics
- Decisions
- Rewards
Complexity of representation: exponential in the number of parents (worst case)
34Structured Value Function?
Does a factored MDP imply structure in V? In general, no: the exact value function of a factored MDP need not be factored.
However, a factored V often provides a good approximate value function.
35Structured Value Functions
[Bellman et al. '63; Tsitsiklis & Van Roy '96; K. & Parr '99, '00]
- Approximate V* as a factored value function V ≈ Σi wi hi
- In the rule-based case:
- hi is a rule concerning a small part of the system
- wi is the value (weight) associated with the rule
- Goal: find w giving a good approximation V to V*
- The resulting Q function is also factored: Q ≈ Σi Qi
- Can use the coordination graph
36Approximate LP Solution
- One variable wi for each basis function: only a polynomial number of LP variables
- One "≥" constraint for every state and action: exponentially many LP constraints
- (The approximate LP is written out below.)
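Substituting the factored form V(x) = Σi wi hi(x) into the exact LP gives the approximate LP below (the standard form of approximate linear programming; the objective weights αi are induced by the state-relevance weights):

```latex
\begin{align*}
\min_{w} \;\; & \sum_{i} \alpha_i\, w_i \\
\text{s.t.}\;\; & \sum_i w_i\, h_i(\mathbf{x}) \;\ge\; R(\mathbf{x},\mathbf{a})
  + \gamma \sum_{\mathbf{x}'} P(\mathbf{x}' \mid \mathbf{x},\mathbf{a}) \sum_i w_i\, h_i(\mathbf{x}')
  \quad \forall\, \mathbf{x},\mathbf{a}.
\end{align*}
```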
37So What Now?
[Guestrin, K. & Parr '01]
Exponentially many linear constraints ≡ one nonlinear constraint (see below)
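The exponentially many constraints of the approximate LP can be folded into a single nonlinear constraint, which the factored LP then handles with variable elimination (next slide):

```latex
\[
0 \;\ge\; \max_{\mathbf{x},\mathbf{a}} \Big[\, R(\mathbf{x},\mathbf{a})
  + \gamma \sum_{\mathbf{x}'} P(\mathbf{x}' \mid \mathbf{x},\mathbf{a}) \sum_i w_i\, h_i(\mathbf{x}')
  \;-\; \sum_i w_i\, h_i(\mathbf{x}) \,\Big].
\]
```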
38Variable Elimination Revisited
[Guestrin, K. & Parr '01]
- Use Variable Elimination to represent constraints
Exponentially fewer constraints
Polynomial-size LP for finding a good factored approximation to V*
39Network Management Problem
- Each computer runs processes
- Computer status ∈ {good, faulty, dead}
- Dead neighbors increase the probability of dying
- Reward for successful processes
- Each SysAdmin takes a local action ∈ {reboot, not reboot}
Network topologies: ring, ring of rings, star, k-grid
40Scaling of Factored LP
41Multiagent Running Time
[Plot: running time vs. problem size for ring of rings, star with pair basis, and star with single basis]
42Strategic 2x2
- Factored MDP model: 2 Peasants, 2 Footmen, Enemy, Gold, Wood, Barracks (1 million state/action pairs)
- Factored LP computes the value function Q
- Coordination graph computes argmaxa Q(x,a) to act in the world
43Demo Strategic 2x2
[Guestrin, Koller, Gearhart & Kanodia]
44Limited Interaction MDPs
[Guestrin & Gordon '02]
- Some MDPs have additional structure
- Agents are largely autonomous
- Interact in limited ways
- e.g., competing for resources
- Can decompose the MDP as a set of agent-based MDPs with a limited interface
45Limited Interaction MDPs
[Guestrin & Gordon '02]
- In such MDPs, our LP matrix is highly structured
- Can use Dantzig-Wolfe LP decomposition to solve the LP optimally, in a decentralized way
- Gives rise to a market-like algorithm with multiple agents and a centralized auctioneer
46Auction-style planning
Set pricing based on conflicts [Guestrin & Gordon '02]
- Each agent solves its local (stand-alone) MDP
- Agents send constraint messages to the auctioneer
- Agents must agree on the policy for shared variables
- The auctioneer sends pricing messages to the agents
- Pricing reflects penalties for constraint violations and influences the agents' rewards in their MDPs
- (A simplified sketch of this loop follows.)
47Fuel Allocation Problem
[Figure: map with UAV start locations and targets]
- UAVs share a pot of fuel
- Targets have varying priority
- Ignore target interference
[Bererton, Gordon, Thrun & Khosla]
48Fuel Allocation Problem
[Bererton, Gordon, Thrun & Khosla '03]
49High-Speed Robot Paintball
[Bererton, Gordon & Thrun]
50High-Speed Robot Paintball
[Figures: game variants 1 and 2, with the coordination point, sensor placement, start location (x), and goal location marked]
51High-Speed Robot Paintball
[Bererton, Gordon & Thrun]
52Outline
- Action Coordination
- Factored Value Functions
- Coordination Graphs
- Context-Specific Coordination
- Joint Planning
- Multi-Agent Markov Decision Processes
- Efficient Linear Programming Solution
- Decentralized Market-Based Solution
- Generalizing to New Environments
- Relational MDPs
- Generalizing Value Functions
53Generalizing to New Problems
Many problems are similar: after solving Problems 1, 2, ..., n, we would like a good solution to Problem n+1 without planning from scratch
But the MDPs are different! Different sets of states, actions, rewards, transitions, ...
54Generalizing with Relational MDPs
Similar domains have similar types of objects → Relational MDP
Exploit similarities by computing generalizable value functions
Generalization: avoid the need to replan, and tackle larger problems
55Relational Models and MDPs
[Guestrin, K., Gearhart & Kanodia '03]
- Classes: Peasant, Footman, Gold, Barracks, Enemy
- Relations: Collects, Builds, Trains, Attacks
- Instances: Peasant1, Peasant2, Footman1, Enemy1
- Builds on Probabilistic Relational Models [K. & Pfeffer '98]
56Relational MDPs
[Guestrin, K., Gearhart & Kanodia '03]
[Figure: class-level transition model for the Footman and Enemy classes]
- Class-level transition probabilities depend on: attributes, actions, and attributes of related objects
- Class-level reward function
- Very compact representation! Does not depend on the # of objects
57World is a Large Factored MDP
Relational MDP + # of objects + links between objects → Factored MDP
- Instantiation (world): # of instances of each class, and links between instances
- Yields a well-defined factored MDP
58MDP with 2 Footmen and 2 Enemies
59World is a Large Factored MDP
Relational MDP + # of objects + links between objects → Factored MDP
- Instantiate the world → a well-defined factored MDP
- Use the factored LP for planning
- But we have gained nothing yet!
60Class-level Value Functions
V(F1.H, E1.H, F2.H, E2.H) ≈ VF1(F1.H, E1.H) + VE1(E1.H) + VF2(F2.H, E2.H) + VE2(E2.H)
Units are interchangeable!
VF1 = VF2 = VF, and VE1 = VE2 = VE
At state x, each footman has a different contribution to V
Given the class-level weights wC, we can instantiate the value function for any world (see below)
61Factored LP-based Generalization
Sample a set I of worlds → class-level factored LP computes VF, VE → generalize to new worlds
How many samples are needed?
62Sampling Complexity
Exponentially many worlds → do we need exponentially many samples?
The # of objects in a world is unbounded → must we sample very large worlds?
NO!
63Theorem
Sample m small worlds of up to O(ln 1/ε) objects.
Then the resulting value function is within O(ε) of the class-level value function optimized for all worlds, with probability at least 1-δ.
(RCmax, the maximum class reward, enters the bound on the number of samples m.)
64Strategic 2x2
- Relational MDP model: 2 Peasants, 2 Footmen, Enemy, Gold, Wood, Barracks (1 million state/action pairs)
- Factored LP computes the value function Q
- Coordination graph computes argmaxa Q(x,a) to act in the world
65Strategic 9x3
- Relational MDP model: 9 Peasants, 3 Footmen, Enemy, Gold, Wood, Barracks (3 trillion state/action pairs, growing exponentially in the # of agents)
- Factored LP computes the value function Q
- Coordination graph computes argmaxa Q(x,a) to act in the world
66Strategic Generalization
- Relational MDP model: 2 Peasants, 2 Footmen, Enemy, Gold, Wood, Barracks (1 million state/action pairs)
- Class-level factored LP computes class-level value-function weights wC; the instantiated Q-functions grow only polynomially in the # of agents
- Coordination graph computes argmaxa Q(x,a) to act in the world
67Demo Generalized 9x3
[Guestrin, Koller, Gearhart & Kanodia]
68Tactical Generalization
[Figure: generalizing from the 3 vs. 3 task to the 4 vs. 4 task]
- Planned in 3 Footmen versus 3 Enemies
- Generalized to 4 Footmen versus 4 Enemies
69Demo Planned Tactical 3x3
[Guestrin, Koller, Gearhart & Kanodia]
70Demo Generalized Tactical 4x4
[Guestrin, K., Gearhart & Kanodia '03]
71Summary
Effective planning under uncertainty
Distributed coordinated action selection
Generalization to new problems
Structured Multi-Agent MDPs
72Important Questions
Continuous spaces
Partial observability
Complex actions
Learning to act
How far can we go??
73Thank You!
http://robotics.stanford.edu/koller
Carlos Guestrin, Ronald Parr
Chris Gearhart, Neal Kanodia, Shobha Venkataraman
Curt Bererton, Geoff Gordon, Sebastian Thrun
Jelle Kok, Matthijs Spaan, Nikos Vlassis