Title: A Framework for Sequential Planning in Multiagent Settings
Slide 1: A Framework for Sequential Planning in Multi-agent Settings
Workshop on Game Theory and Decision Theory, AAMAS, July 2004
Authors: Piotr Gmytrasiewicz and Prashant Doshi, Dept. of Computer Science, Univ. of Illinois at Chicago
Presenter: Prashant Doshi
Slide 2: Main Contribution
[Figure: general problem setting]
Slide 3: Main Contribution
- Interactive POMDPs: a framework for sequential optimal planning in partially observable multi-agent settings (POSGs); generalizes POMDPs to multi-agent settings
- Main Ideas
  1. Consider other agents by including agent models as part of the state space
  2. Models of agents include their capabilities, preferences, and their beliefs
  3. The agent maintains beliefs over models of other agents → infinite nesting of beliefs
  4. Compute the best response to beliefs (subjective rationality)
Slide 4: Background: Single-agent POMDPs
- Partially Observable Markov Decision Processes
- Standard optimal sequential planning framework
- Realistic
- POMDP Parameters
  - S, physical state space of the environment
  - A, action space of the agent
  - Ω, observation space of the agent
  - T: S × A × S → [0, 1], transition function
  - O: S × A × Ω → [0, 1], observation function
  - R: S × A → ℝ, preference (reward) function
Slide 5: Background: Single-agent POMDPs
Belief Update → the agent maintains a belief over the physical state and updates it: b'(s') ∝ O(s', a, o) Σ_s T(s, a, s') b(s)
Solution of a POMDP
Policy Computation → the agent computes the optimal action for each belief
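To make the update concrete, here is a minimal sketch in Python; the array layout and function name are my own, not from the slides:

import numpy as np

def pomdp_belief_update(b, a, o, T, O):
    """One Bayes-filter step of the POMDP belief update above.

    b: belief over physical states, shape (|S|,)
    a: index of the action taken
    o: index of the observation received
    T: transition probabilities, T[s, a, s'] = Pr(s' | s, a)
    O: observation probabilities, O[s', a, o] = Pr(o | s', a)
    """
    predicted = b @ T[:, a, :]             # predict: Pr(s' | b, a)
    unnormalized = O[:, a, o] * predicted  # correct by observation likelihood
    return unnormalized / unnormalized.sum()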
Slide 6: I-POMDPs
- Interactive Partially Observable Markov Decision Processes
- Generalization of POMDPs to multi-agent settings
- Main Ideas
  1. Consider other agents as part of the environment
  2. The agent maintains and updates an infinite nesting of beliefs
- Borrows concepts from several fields
  - Bayesian games
  - Interactive epistemology / recursive modeling
  - Decision-theoretic planning
  - Decision-theoretic approach to game theory
Slide 7: I-POMDPs
- I-POMDP_i Parameters: ⟨IS_i, A, T_i, Ω_i, O_i, R_i⟩
  - IS_i = S × M_j, interactive states: physical states paired with models of agent j, where each m_j ∈ M_j is a general (computable) model, e.g., an intentional model or type θ_j = ⟨b_j, θ̂_j⟩ (j's belief plus its frame)
  - A = A_i × A_j, joint actions
  - T_i: S × A × S → [0, 1], transition function
  - Ω_i, observation space; O_i: S × A × Ω_i → [0, 1], observation function
  - R_i: IS_i × A → ℝ; preferences are generally over physical states and actions
- Model Non-manipulability Assumption (MNM): actions don't directly manipulate other agents' models (instead, actions → observations → belief update)
- Model Non-observability Assumption (MNO): models of other agents cannot be directly observed (instead, beliefs → actions → observations)
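One way to picture the interactive state space IS_i = S × M_j in code; a minimal sketch, with all class and field names my own:

from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass(frozen=True)
class Frame:
    """j's frame (theta-hat_j): the type minus the belief."""
    actions: Tuple[str, ...]
    observations: Tuple[str, ...]
    T: Callable  # transition function
    O: Callable  # observation function
    R: Callable  # reward (preference) function

@dataclass(frozen=True)
class IntentionalModel:
    """An intentional model (type) theta_j = <b_j, theta-hat_j>."""
    belief: Any  # j's belief, itself possibly nested
    frame: Frame

@dataclass(frozen=True)
class InteractiveState:
    """An element of IS_i = S x M_j."""
    physical_state: str
    model_of_j: IntentionalModel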
Slide 8: I-POMDPs
- Beliefs
  - Single-agent POMDP: belief over the physical states, b ∈ Δ(S)
  - I-POMDP_i: belief over interactive states, b_i ∈ Δ(IS_i) = Δ(S × M_j); since j's models contain j's beliefs (including beliefs about i's models), this induces an infinite nesting of beliefs, and the interactive state space is uncountably infinite
- Similar belief systems explored in game theory by Mertens & Zamir (1985), Brandenburger & Dekel (1993), and Aumann (1999)
Slide 9: I-POMDPs
- Finitely nested I-POMDP, I-POMDP_{i,l}: beliefs are nested only down to a finite level l
- Computable approximations of I-POMDPs, constructed bottom-up (see the sketch below)
- A 0th-level type is a POMDP: it has beliefs over physical states only, treating the other agent as noise in the environment
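A sketch of the bottom-up construction, assuming hypothetical helpers solve_pomdp (a standard single-agent solver) and best_response (planning against a fixed prediction of the other agent); both are placeholders, not the authors' code:

def solve_nested(level, frame_i, frame_j, horizon, solve_pomdp, best_response):
    """Solve a finitely nested I-POMDP_{i,level} bottom-up (sketch).

    Level 0 treats the other agent as noise in the environment, so a
    0th-level type is just a POMDP. At level l, first solve the other
    agent's level l-1 models, then best-respond to the induced action
    predictions (subjective rationality).
    """
    if level == 0:
        return solve_pomdp(frame_i, horizon)
    policy_j = solve_nested(level - 1, frame_j, frame_i, horizon,
                            solve_pomdp, best_response)
    return best_response(frame_i, policy_j, horizon)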
Slide 10: Multi-agent Tiger game (2 agents)
- Task: maximize the collection of gold over a finite or infinite number of steps while avoiding the tiger
- Each agent hears growls as well as creaks
- Each agent may open doors or listen
- Each agent is unable to perceive the other's action or observation
Multi-agent Tiger game as a level-1 I-POMDP:
S = {TL, TR}; A = {L, OL, OR} × {L, OL, OR}; Ω_i = {GL, GR} × {S, CL, CR}
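A sketch of the game's dynamics and payoffs in Python; the specific numbers (+10 for gold, -100 for the tiger, -1 for listening) follow the usual tiger-problem convention and are assumptions, not read off the slide:

import random

STATES = ["TL", "TR"]        # tiger behind the left / right door
ACTIONS = ["L", "OL", "OR"]  # listen, open left, open right
GROWLS = ["GL", "GR"]        # growl heard from the left / right
CREAKS = ["S", "CL", "CR"]   # silence, creak left, creak right

def transition(state, a_i, a_j):
    """The tiger stays put only while both agents listen; opening any
    door resets the tiger's location uniformly at random."""
    if a_i == "L" and a_j == "L":
        return state
    return random.choice(STATES)

def reward_i(state, a_i):
    """i's payoff depends only on its own action and the tiger's door."""
    if a_i == "L":
        return -1
    tiger_door = "OL" if state == "TL" else "OR"
    return 10 if a_i != tiger_door else -100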
Slide 11: Multi-agent Tiger game
Examples of agent i's level-1 beliefs (figures):
- i is uninformed about j's beliefs
- i knows j is clueless
- i believes j is informed
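These cases can be pictured as distributions over pairs (tiger location, j's belief). Hypothetical encodings, summarizing a model of j by the single number Pr_j(TL):

# i knows j is clueless: all mass on b_j = 0.5, tiger location unknown
b_i_knows_j_clueless = {("TL", 0.5): 0.5, ("TR", 0.5): 0.5}

# i believes j is informed: j's belief concentrates on the true location
b_i_thinks_j_informed = {("TL", 0.99): 0.5, ("TR", 0.01): 0.5}

# "i is uninformed about j's beliefs" is a flat *density* over all of
# j's possible beliefs, so it has no finite dictionary encoding.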
Slide 12: Multi-agent Tiger game
Agent i's belief update process (a code sketch follows below)
[Figure: belief update after i listens (L) and observes ⟨GL, S⟩, i.e., a growl from the left and silence]
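A minimal sketch of that update for a level-1 agent with finitely many candidate models of j; all function names and signatures are my own, not the paper's:

def level1_belief_update(b, a_i, o_i, states, acts_j, obs_j,
                         T, O_i, O_j, predict_j, update_j):
    """One step of agent i's level-1 I-POMDP belief update (sketch).

    b:           dict {(s, b_j): prob}; b_j must be hashable
    a_i, o_i:    i's last action and current observation
    T(s, a_i, a_j, s2):    physical transition probability
    O_i(s2, a_i, a_j, o):  i's observation probability
    O_j(s2, a_i, a_j, o):  j's observation probability
    predict_j(b_j) -> {a_j: prob}: prediction from j's level-0 model
    update_j(b_j, a_j, o_j) -> new b_j: j's own POMDP belief update
    """
    new_b = {}
    for (s, b_j), p in b.items():
        for a_j in acts_j:                 # marginalize over j's action
            p_aj = p * predict_j(b_j).get(a_j, 0.0)
            if p_aj == 0.0:
                continue
            for s2 in states:              # propagate the physical state
                w = p_aj * T(s, a_i, a_j, s2) * O_i(s2, a_i, a_j, o_i)
                if w == 0.0:
                    continue
                for o_j in obs_j:          # j updates on its observation
                    key = (s2, update_j(b_j, a_j, o_j))
                    new_b[key] = new_b.get(key, 0.0) + w * O_j(s2, a_i, a_j, o_j)
    z = sum(new_b.values())                # normalize
    return {k: v / z for k, v in new_b.items()}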
Slide 13: Multi-agent Tiger game
Policy Computation
[Figure: policy traces, plotted against i's marginal beliefs Pr(TL) and Pr(TR) on one axis and j's belief b_j on the other]
Slide 14: Multi-agent Tiger game
Value Function (figure)
Team behavior among agents: i prefers coordination with j
Slide 15: I-POMDPs
Theoretical Results
- Proposition 1 (Sufficiency): In an I-POMDP, the belief over interactive states IS_i is a sufficient statistic for the past history of i's observations
- Proposition 2 (Belief Update): Under the MNM and MNO assumptions, the belief update function for I-POMDP_i combines a prediction of j's actions from its models with the physical-state update and an update of j's model (a reconstruction of the formula follows this list)
- Theorem 1 (Convergence): For any finitely nested I-POMDP, the Value Iteration algorithm starting from an arbitrary value function converges to a unique fixed point
- Theorem 2 (PWLC): For any finitely nested I-POMDP, the value function is piecewise linear and convex
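For reference, the update in Proposition 2 has roughly the following shape for a level-1 belief; this is a reconstruction from the definitions above, and the authors' notation may differ in detail:

\begin{aligned}
b_i^t(\langle s^t, b_j^t\rangle) \;\propto\; \sum_{\langle s^{t-1}, b_j^{t-1}\rangle} & b_i^{t-1}(\langle s^{t-1}, b_j^{t-1}\rangle) \sum_{a_j^{t-1}} \Pr(a_j^{t-1}\mid b_j^{t-1})\, T(s^{t-1}, a^{t-1}, s^t)\, O_i(s^t, a^{t-1}, o_i^t) \\
& \times \sum_{o_j^t} O_j(s^t, a^{t-1}, o_j^t)\; \mathbf{1}\big[\tau_j(b_j^{t-1}, a_j^{t-1}, o_j^t) = b_j^t\big]
\end{aligned}

where a^{t-1} = ⟨a_i^{t-1}, a_j^{t-1}⟩ and τ_j is j's own belief update; this mirrors the Python sketch after Slide 12.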
Slide 16: Research Contributions
- I-POMDPs provide a principled method for
  - Modeling other agents
  - Updating beliefs over models using sensory information
  - Computing best responses to beliefs
- Limitations of Nash equilibrium as a general multi-agent control paradigm in AI
  - Incomplete: does not say what to do off-equilibrium
  - Non-unique: multiple solutions, with no way to choose among them
- Our approach complements Nash equilibrium: it adopts optimality and best response to anticipated actions, rather than stability
- Formalizes greater autonomy amongst agents: the actions and observations of other agents are not known (MNO, MNM)
- Applicable to games of cooperation and competition
Slide 17: Thank You
Questions?