Title: A Framework for Sequential Planning in Multiagent Settings
Slide 1: A Framework for Sequential Planning in Multi-agent Settings
Workshop on Game Theory and Decision Theory, AAMAS, July 2004
Authors: Piotr Gmytrasiewicz and Prashant Doshi, Dept. of Computer Science, Univ. of Illinois at Chicago
Presenter: Prashant Doshi
Slide 2: Main Contribution
[Figure: general problem setting]
Slide 3: Main Contribution
- Interactive POMDPs: a framework for sequential optimal planning in partially observable multi-agent settings (POSGs); generalizes POMDPs to multi-agent settings
- Main Ideas
  1. Consider other agents by including agent models as part of the state space
  2. Models of agents include their capabilities, preferences, and their beliefs
  3. The agent maintains beliefs over models of other agents → infinite nesting of beliefs
  4. Compute the best response to beliefs (subjective rationality)
Slide 4: Background: Single-agent POMDPs
- Partially Observable Markov Decision Processes
- Standard optimal sequential planning framework
- Realistic
- POMDP Parameters
  - S, physical state space of the environment
  - A, action space of the agent
  - Ω, observation space of the agent
  - T: S × A × S → [0, 1], transition function
  - O: S × A × Ω → [0, 1], observation function
  - R: S × A → ℝ, preference (reward) function
Slide 5: Background: Single-agent POMDPs
Belief Update → the agent maintains a belief over the physical state and updates it: b'(s') ∝ O(s', a, o) Σ_s T(s, a, s') b(s)
Solution of a POMDP
Policy Computation → the agent computes the optimal action for each belief
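To make the update concrete, here is a minimal sketch in Python; the array layout and function name are my own, not from the slides:

import numpy as np

def pomdp_belief_update(b, a, o, T, O):
    """One Bayes-filter step of the POMDP belief update above.

    b: belief over physical states, shape (|S|,)
    a: index of the action taken
    o: index of the observation received
    T: transition probabilities, T[s, a, s'] = Pr(s' | s, a)
    O: observation probabilities, O[s', a, o] = Pr(o | s', a)
    """
    predicted = b @ T[:, a, :]             # predict: Pr(s' | b, a)
    unnormalized = O[:, a, o] * predicted  # correct by observation likelihood
    return unnormalized / unnormalized.sum()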
Slide 6: I-POMDPs
- Interactive Partially Observable Markov Decision Processes
- Generalization of POMDPs to multi-agent settings
- Main Ideas
  1. Consider other agents as part of the environment
  2. The agent maintains and updates an infinite nesting of beliefs
- Borrows concepts from several fields
  - Bayesian games
  - Interactive epistemology / recursive modeling
  - Decision-theoretic planning
  - Decision-theoretic approach to game theory
Slide 7: I-POMDPs
- I-POMDP_i Parameters: ⟨IS_i, A, T_i, Ω_i, O_i, R_i⟩
  - IS_i = S × M_j, interactive states: physical states paired with models of agent j, where each m_j ∈ M_j is a general (computable) model, e.g., an intentional model or type θ_j = ⟨b_j, θ̂_j⟩ (j's belief plus its frame)
  - A = A_i × A_j, joint actions
  - T_i: S × A × S → [0, 1], transition function
  - Ω_i, observation space; O_i: S × A × Ω_i → [0, 1], observation function
  - R_i: IS_i × A → ℝ; preferences are generally over physical states and actions
- Model Non-manipulability Assumption (MNM): actions don't directly manipulate other agents' models (instead, actions → observations → belief update)
- Model Non-observability Assumption (MNO): models of other agents cannot be directly observed (instead, beliefs → actions → observations)
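One way to picture the interactive state space IS_i = S × M_j in code; a minimal sketch, with all class and field names my own:

from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass(frozen=True)
class Frame:
    """j's frame (theta-hat_j): the type minus the belief."""
    actions: Tuple[str, ...]
    observations: Tuple[str, ...]
    T: Callable  # transition function
    O: Callable  # observation function
    R: Callable  # reward (preference) function

@dataclass(frozen=True)
class IntentionalModel:
    """An intentional model (type) theta_j = <b_j, theta-hat_j>."""
    belief: Any  # j's belief, itself possibly nested
    frame: Frame

@dataclass(frozen=True)
class InteractiveState:
    """An element of IS_i = S x M_j."""
    physical_state: str
    model_of_j: IntentionalModel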
Slide 8: I-POMDPs
- Beliefs
  - Single-agent POMDP: belief over the physical states, b ∈ Δ(S)
  - I-POMDP_i: belief over interactive states, b_i ∈ Δ(IS_i) = Δ(S × M_j); since j's models contain j's beliefs (including beliefs about i's models), this induces an infinite nesting of beliefs, and the interactive state space is uncountably infinite
- Similar belief systems explored in game theory by Mertens & Zamir (1985), Brandenburger & Dekel (1993), and Aumann (1999)
Slide 9: I-POMDPs
- Finitely nested I-POMDP, I-POMDP_{i,l}: beliefs are nested only down to a finite level l
- Computable approximations of I-POMDPs, constructed bottom-up (see the sketch below)
- A 0th-level type is a POMDP: it has beliefs over physical states only, treating the other agent as noise in the environment
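A sketch of the bottom-up construction, assuming hypothetical helpers solve_pomdp (a standard single-agent solver) and best_response (planning against a fixed prediction of the other agent); both are placeholders, not the authors' code:

def solve_nested(level, frame_i, frame_j, horizon, solve_pomdp, best_response):
    """Solve a finitely nested I-POMDP_{i,level} bottom-up (sketch).

    Level 0 treats the other agent as noise in the environment, so a
    0th-level type is just a POMDP. At level l, first solve the other
    agent's level l-1 models, then best-respond to the induced action
    predictions (subjective rationality).
    """
    if level == 0:
        return solve_pomdp(frame_i, horizon)
    policy_j = solve_nested(level - 1, frame_j, frame_i, horizon,
                            solve_pomdp, best_response)
    return best_response(frame_i, policy_j, horizon)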
Slide 10: Multi-agent Tiger game (2 agents)
- Task: maximize the collection of gold over a finite or infinite number of steps while avoiding the tiger
- Each agent hears growls as well as creaks
- Each agent may open doors or listen
- Each agent is unable to perceive the other's action or observation
Multi-agent Tiger game as a level-1 I-POMDP:
S = {TL, TR}; A = {L, OL, OR} × {L, OL, OR}; Ω_i = {GL, GR} × {S, CL, CR}
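A sketch of the game's dynamics and payoffs in Python; the specific numbers (+10 for gold, -100 for the tiger, -1 for listening) follow the usual tiger-problem convention and are assumptions, not read off the slide:

import random

STATES = ["TL", "TR"]        # tiger behind the left / right door
ACTIONS = ["L", "OL", "OR"]  # listen, open left, open right
GROWLS = ["GL", "GR"]        # growl heard from the left / right
CREAKS = ["S", "CL", "CR"]   # silence, creak left, creak right

def transition(state, a_i, a_j):
    """The tiger stays put only while both agents listen; opening any
    door resets the tiger's location uniformly at random."""
    if a_i == "L" and a_j == "L":
        return state
    return random.choice(STATES)

def reward_i(state, a_i):
    """i's payoff depends only on its own action and the tiger's door."""
    if a_i == "L":
        return -1
    tiger_door = "OL" if state == "TL" else "OR"
    return 10 if a_i != tiger_door else -100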
Slide 11: Multi-agent Tiger game
Examples of agent i's level-1 beliefs (figures):
- i is uninformed about j's beliefs
- i knows j is clueless
- i believes j is informed
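These cases can be pictured as distributions over pairs (tiger location, j's belief). Hypothetical encodings, summarizing a model of j by the single number Pr_j(TL):

# i knows j is clueless: all mass on b_j = 0.5, tiger location unknown
b_i_knows_j_clueless = {("TL", 0.5): 0.5, ("TR", 0.5): 0.5}

# i believes j is informed: j's belief concentrates on the true location
b_i_thinks_j_informed = {("TL", 0.99): 0.5, ("TR", 0.01): 0.5}

# "i is uninformed about j's beliefs" is a flat *density* over all of
# j's possible beliefs, so it has no finite dictionary encoding.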
Slide 12: Multi-agent Tiger game
Agent i's belief update process (a code sketch follows below)
[Figure: belief update after i listens (L) and observes ⟨GL, S⟩, i.e., a growl from the left and silence]
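A minimal sketch of that update for a level-1 agent with finitely many candidate models of j; all function names and signatures are my own, not the paper's:

def level1_belief_update(b, a_i, o_i, states, acts_j, obs_j,
                         T, O_i, O_j, predict_j, update_j):
    """One step of agent i's level-1 I-POMDP belief update (sketch).

    b:           dict {(s, b_j): prob}; b_j must be hashable
    a_i, o_i:    i's last action and current observation
    T(s, a_i, a_j, s2):    physical transition probability
    O_i(s2, a_i, a_j, o):  i's observation probability
    O_j(s2, a_i, a_j, o):  j's observation probability
    predict_j(b_j) -> {a_j: prob}: prediction from j's level-0 model
    update_j(b_j, a_j, o_j) -> new b_j: j's own POMDP belief update
    """
    new_b = {}
    for (s, b_j), p in b.items():
        for a_j in acts_j:                 # marginalize over j's action
            p_aj = p * predict_j(b_j).get(a_j, 0.0)
            if p_aj == 0.0:
                continue
            for s2 in states:              # propagate the physical state
                w = p_aj * T(s, a_i, a_j, s2) * O_i(s2, a_i, a_j, o_i)
                if w == 0.0:
                    continue
                for o_j in obs_j:          # j updates on its observation
                    key = (s2, update_j(b_j, a_j, o_j))
                    new_b[key] = new_b.get(key, 0.0) + w * O_j(s2, a_i, a_j, o_j)
    z = sum(new_b.values())                # normalize
    return {k: v / z for k, v in new_b.items()}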
Slide 13: Multi-agent Tiger game
Policy Computation
[Figure: policy traces, plotted against i's marginal beliefs Pr(TL) and Pr(TR) on one axis and j's belief b_j on the other]
Slide 14: Multi-agent Tiger game
Value Function (figure)
Team behavior among agents: i prefers coordination with j
Slide 15: I-POMDPs
Theoretical Results
- Proposition 1 (Sufficiency): In an I-POMDP, the belief over interactive states IS_i is a sufficient statistic for the past history of i's observations
- Proposition 2 (Belief Update): Under the MNM and MNO assumptions, the belief update function for I-POMDP_i combines a prediction of j's actions from its models with the physical-state update and an update of j's model (a reconstruction of the formula follows this list)
- Theorem 1 (Convergence): For any finitely nested I-POMDP, the Value Iteration algorithm starting from an arbitrary value function converges to a unique fixed point
- Theorem 2 (PWLC): For any finitely nested I-POMDP, the value function is piecewise linear and convex
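For reference, the update in Proposition 2 has roughly the following shape for a level-1 belief; this is a reconstruction from the definitions above, and the authors' notation may differ in detail:

\begin{aligned}
b_i^t(\langle s^t, b_j^t\rangle) \;\propto\; \sum_{\langle s^{t-1}, b_j^{t-1}\rangle} & b_i^{t-1}(\langle s^{t-1}, b_j^{t-1}\rangle) \sum_{a_j^{t-1}} \Pr(a_j^{t-1}\mid b_j^{t-1})\, T(s^{t-1}, a^{t-1}, s^t)\, O_i(s^t, a^{t-1}, o_i^t) \\
& \times \sum_{o_j^t} O_j(s^t, a^{t-1}, o_j^t)\; \mathbf{1}\big[\tau_j(b_j^{t-1}, a_j^{t-1}, o_j^t) = b_j^t\big]
\end{aligned}

where a^{t-1} = ⟨a_i^{t-1}, a_j^{t-1}⟩ and τ_j is j's own belief update; this mirrors the Python sketch after Slide 12.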
Slide 16: Research Contributions
- I-POMDPs provide a principled method for
  - Modeling other agents
  - Updating beliefs over models using sensory information
  - Computing best responses to beliefs
- Limitations of Nash equilibrium as a general multi-agent control paradigm in AI
  - Incomplete: does not say what to do off-equilibrium
  - Non-unique: multiple solutions, with no way to choose among them
- Our approach complements Nash equilibrium: it adopts optimality and best response to anticipated actions, rather than stability
- Formalizes greater autonomy amongst agents: the actions and observations of other agents are not known (MNO, MNM)
- Applicable to games of cooperation and competition
Slide 17: Thank You
Questions?