A Framework for Sequential Planning in Multiagent Settings

1
A Framework for Sequential Planning in
Multi-agent Settings
Workshop on Game Theory and Decision Theory,
AAMAS, July 2004
  • AUTHORS
  • Piotr Gmytrasiewicz and Prashant Doshi
  • Dept. of Computer Science
  • Univ. of Illinois at Chicago

PRESENTER Prashant Doshi
2
Main Contribution
General Problem Setting
3
Main Contribution
  • Interactive POMDPs
  • A framework for optimal sequential planning in
    partially observable multi-agent settings (POSGs)
  • Generalizes POMDPs to multi-agent settings
  • Main Ideas
  • 1. Consider other agents by including agent
    models as part of the state space
  • 2. Models of agents include their capabilities,
    preferences, and beliefs
  • 3. Agent maintains beliefs over models of other
    agents → infinite nesting of beliefs
  • 4. Compute best response to beliefs (subjective
    rationality)

4
Background: Single-agent POMDPs
  • Partially Observable Markov Decision Processes
  • Standard Optimal Sequential Planning Framework
  • Realistic

POMDP Parameters
  • S, physical state space of the environment
  • A, action space of the agent
  • Ω, observation space of the agent
  • T : S × A × S → [0, 1], transition function
  • O : S × A × Ω → [0, 1], observation function
  • R : S × A → ℝ, preference (reward) function
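A minimal sketch of these parameters for the single-agent tiger problem (an illustrative tabular encoding; the specific numbers are the usual ones from the POMDP literature, not taken from these slides):

    # Single-agent tiger POMDP: states, actions, observations, and T, O, R as tables.
    S = ["TL", "TR"]            # tiger behind left / right door
    A = ["L", "OL", "OR"]       # listen, open left, open right
    Omega = ["GL", "GR"]        # growl heard from left / right

    # T[s][a][s2]: listening leaves the state unchanged; opening resets it uniformly.
    T = {s: {"L": {s2: 1.0 if s2 == s else 0.0 for s2 in S},
             "OL": {s2: 0.5 for s2 in S},
             "OR": {s2: 0.5 for s2 in S}} for s in S}

    # O[s2][a][o]: listening is 85% accurate; after opening, growls are uninformative.
    O = {s2: {"L": {o: (0.85 if (s2 == "TL") == (o == "GL") else 0.15) for o in Omega},
              "OL": {o: 0.5 for o in Omega},
              "OR": {o: 0.5 for o in Omega}} for s2 in S}

    # R[s][a]: small cost to listen; penalty for the tiger's door, reward for the other.
    R = {s: {"L": -1.0,
             "OL": -100.0 if s == "TL" else 10.0,
             "OR": -100.0 if s == "TR" else 10.0} for s in S}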

5
Background: Single-agent POMDPs
Belief Update → the agent maintains a belief over the
physical state and updates it after each action and observation
Solution of POMDP
Policy Computation → the agent computes the optimal action for
each belief
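A minimal sketch of this belief update, using the tabular representation from the tiger example above (illustrative code, not from the paper):

    def belief_update(b, a, o, T, O, states):
        """Standard POMDP update: b'(s') = O(s',a,o) * sum_s T(s,a,s') * b(s), normalized."""
        new_b = {s2: O[s2][a][o] * sum(T[s][a][s2] * b[s] for s in states)
                 for s2 in states}
        z = sum(new_b.values())        # z = Pr(o | a, b)
        return {s2: p / z for s2, p in new_b.items()}

    # e.g. starting from a uniform belief, listening and hearing a growl from the left:
    b0 = {"TL": 0.5, "TR": 0.5}
    b1 = belief_update(b0, "L", "GL", T, O, ["TL", "TR"])   # -> {'TL': 0.85, 'TR': 0.15}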
6
I-POMDPs
  • Interactive Partially Observable Markov Decision
    Processes
  • Generalization of POMDPs to multi-agent settings
  • Main Ideas
  • 1. Consider other agents as part of the
    environment
  • 2. Agent maintains and updates an infinite
    nesting of beliefs
  • Borrows concepts from several fields
  • Bayesian games
  • Interactive epistemology / recursive modeling
  • Decision-theoretic planning
  • Decision-theoretic approach to game theory

7
I-POMDPs
  • I-POMDP_i Parameters
  • I-POMDP_i = ⟨IS_i, A, T_i, Ω_i, O_i, R_i⟩, where
    IS_i = S × M_j is the interactive state space and
    each m_j ∈ M_j is a general model of agent j
    (computable)
  • Model Non-manipulability Assumption (MNM):
    actions don't directly manipulate other agents'
    models (instead, actions → observations → belief update)
  • Model Non-observability Assumption (MNO):
    models of other agents cannot be directly
    observed (instead, beliefs → actions → observations)
  • Preferences are generally over physical states
    and actions

e.g., m_j may be an intentional model or type: j's
belief b_j together with j's frame (its capabilities
and preferences)
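An illustrative sketch of these objects (the field names are mine, not the paper's; frozen dataclasses keep them hashable so they can serve as belief keys):

    from dataclasses import dataclass
    from typing import Any

    @dataclass(frozen=True)
    class IntentionalModel:
        """A type of agent j: its belief b_j plus its frame
        (action and observation capabilities, preferences)."""
        belief: tuple                 # b_j, e.g. probabilities over j's own states
        actions: tuple                # A_j
        observations: tuple           # Omega_j
        # a full frame would also carry T_j, O_j, R_j and j's optimality criterion

    @dataclass(frozen=True)
    class InteractiveState:
        """An element of IS_i = S x M_j."""
        physical_state: Any           # s in S
        model_of_j: IntentionalModel  # m_j in M_j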
8
I-POMDPs
  • Beliefs
  • Single-agent POMDP: the belief is a distribution
    over physical states, b ∈ Δ(S)
  • I-POMDP_i: the belief is a distribution over
    interactive states, b_i ∈ Δ(IS_i), a space that is
    uncountably infinite

Similar belief systems were explored in game theory by
Mertens & Zamir (1985), Brandenburger & Dekel (1993),
and Aumann (1999)
9
I-POMDPs
  • Finitely nested I-POMDP, I-POMDP_{i,l}, with
    nesting level l
  • Computable approximations of I-POMDPs

Defined bottom up:
  • A 0th-level type is a POMDP (beliefs over physical
    states only); a level-l type has beliefs over
    physical states and the other agent's level-(l-1)
    models: IS_{i,0} = S, IS_{i,l} = S × M_{j,l-1}
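A minimal sketch of this bottom-up construction (illustrative; the continuous space of j's models is discretized here into finitely many candidates):

    def interactive_states(S, candidate_models_of_j, level):
        """IS_{i,0} = S; IS_{i,l} = S x M_{j,l-1}.
        candidate_models_of_j[k] is a finite list of level-k models of j."""
        if level == 0:
            return [(s, None) for s in S]     # a 0th-level agent is a plain POMDP
        return [(s, m) for s in S
                       for m in candidate_models_of_j[level - 1]]

    # e.g. level-1 interactive states from two candidate level-0 beliefs of j:
    IS_1 = interactive_states(["TL", "TR"], {0: ["b_j=0.5", "b_j=0.9"]}, level=1)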

10
Multi-agent Tiger game
  • Task: maximize the gold collected over a finite
    or infinite number of steps while avoiding the tiger
  • Each agent hears growls as well as creaks
  • Each agent may open doors or listen
  • Each agent is unable to perceive the other's
    action or observation

2 agents
Multi-agent Tiger game as a level-1 I-POMDP:
S = {TL, TR}
A = {L, OL, OR} × {L, OL, OR}
Ω_i = {GL, GR} × {S, CL, CR}
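As an illustrative sketch, the multi-agent spaces can be enumerated directly (the names mirror the sets above):

    from itertools import product

    S = ["TL", "TR"]                          # tiger behind left / right door
    A_i = A_j = ["L", "OL", "OR"]             # each agent: listen, open left, open right
    A = list(product(A_i, A_j))               # joint action space
    growls = ["GL", "GR"]                     # growl heard from left / right
    creaks = ["S", "CL", "CR"]                # silence, creak from left, creak from right
    Omega_i = list(product(growls, creaks))   # i's observations: growl x creak pairs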
11
Multi-agent Tiger game
Example: agent i's level-1 beliefs
[Figure: three example level-1 belief densities: (a) i is
uninformed about j's beliefs; (b) i knows j is clueless;
(c) i believes j is informed]
12
Multi-agent Tiger game
Agent i's belief update process
[Figure: belief update after i listens (L) and observes
⟨GL, S⟩, i.e. a growl from the left and silence]
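A simplified sketch of this update over a finite set of candidate models of j (illustrative code, not from the paper: it marginalizes over j's possible actions and observations, updating the physical state and j's model together):

    def ipomdp_belief_update(b_i, a_i, o_i, spec):
        """Level-1 interactive belief update.
        b_i maps (s, b_j) -> prob, for hashable model labels b_j of agent j.
        spec bundles assumed tabular models:
          spec["states"], spec["obs_j"]          -- S and Omega_j
          spec["act_j"](b_j) -> {a_j: prob}      -- j's predicted action given its model
          spec["T"](s, a_i, a_j, s2)             -- joint transition probability
          spec["O_i"](s2, a_i, a_j, o_i)         -- i's observation probability
          spec["O_j"](s2, a_i, a_j, o_j)         -- j's observation probability
          spec["tau_j"](b_j, a_j, o_j) -> b_j2   -- j's own belief update
        Returns (new belief, Pr(o_i | a_i, b_i))."""
        new_b = {}
        for (s, b_j), p in b_i.items():
            for a_j, p_aj in spec["act_j"](b_j).items():
                for s2 in spec["states"]:
                    w = p * p_aj * spec["T"](s, a_i, a_j, s2) * spec["O_i"](s2, a_i, a_j, o_i)
                    if w == 0.0:
                        continue
                    for o_j in spec["obs_j"]:
                        # j's model advances according to the observation j may have received
                        b_j2 = spec["tau_j"](b_j, a_j, o_j)
                        new_b[(s2, b_j2)] = (new_b.get((s2, b_j2), 0.0)
                                             + w * spec["O_j"](s2, a_i, a_j, o_j))
        z = sum(new_b.values())
        if z == 0.0:
            return {}, 0.0                      # o_i impossible under (a_i, b_i)
        return {is2: v / z for is2, v in new_b.items()}, z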
13
Multi-agent Tiger game
Policy Computation
[Figure: policy traces plotted over i's interactive beliefs;
axes Pr(TL, b_j) and Pr(TR, b_j) against j's belief b_j,
with tick marks at 0.5]
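A one-step-of-lookahead sketch of policy computation on top of the update above (illustrative; the paper computes optimal policies by full value iteration over the belief space, and spec["R"], spec["obs_i"], and spec["actions_i"] are assumed additions to the earlier spec):

    def q_value(b_i, a_i, spec, value_fn, gamma=0.9):
        """Q(b_i, a_i): expected immediate reward plus discounted value of updated beliefs."""
        er = sum(p * p_aj * spec["R"](s, a_i, a_j)
                 for (s, b_j), p in b_i.items()
                 for a_j, p_aj in spec["act_j"](b_j).items())
        ev = 0.0
        for o_i in spec["obs_i"]:
            b2, pr_oi = ipomdp_belief_update(b_i, a_i, o_i, spec)
            if pr_oi > 0.0:
                ev += pr_oi * value_fn(b2)      # weight successors by Pr(o_i | a_i, b_i)
        return er + gamma * ev

    def best_response(b_i, spec, value_fn, gamma=0.9):
        """Subjective rationality: pick the action maximizing Q against i's current belief."""
        return max(spec["actions_i"], key=lambda a_i: q_value(b_i, a_i, spec, value_fn, gamma))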
14
Multi-agent Tiger game
Value Function
Team behavior among agents: i prefers
coordination with j
15
I-POMDPs
  • Theoretical Results
  • Proposition 1 (Sufficiency): In an I-POMDP, the
    belief over interactive states IS_i is a sufficient
    statistic for the past history of i's observations
  • Proposition 2 (Belief Update): Under the MNM and
    MNO assumptions, the belief update function for
    I-POMDP_i updates the belief over physical states
    using i's own observation, and updates the belief
    over j's models by anticipating j's actions and
    observations and applying j's own belief update
  • Theorem 1 (Convergence): For any finitely nested
    I-POMDP, the Value Iteration algorithm starting
    from an arbitrary value function converges to a
    unique fixed point
  • Theorem 2 (PWLC): For any finitely nested
    I-POMDP, the value function is piecewise linear
    and convex
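For reference, a hedged reconstruction of the value function these theorems refer to, written as a Bellman equation over interactive beliefs (ER_i and SE are assumed names for i's expected immediate reward and the interactive belief update; the paper's exact notation may differ):

    U(b_i) \;=\; \max_{a_i \in A_i} \Big[ \sum_{is \in IS_i} b_i(is)\, ER_i(is, a_i)
        \;+\; \gamma \sum_{o_i \in \Omega_i} \Pr(o_i \mid a_i, b_i)\; U\big(SE(b_i, a_i, o_i)\big) \Big]

    \text{where } ER_i(is, a_i) = \sum_{a_j} \Pr(a_j \mid m_j)\, R_i(s, a_i, a_j).

Value iteration applies this backup repeatedly; Theorem 1 says the backup contracts to a unique fixed point, and Theorem 2 says each iterate is piecewise linear and convex in b_i.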

16
Research Contributions
  • I-POMDPs provide a principled method for
  • Modeling other agents
  • Updating beliefs over models using sensory
    information
  • Computing best responses to beliefs
  • Limitations of Nash equilibrium as a general
    multi-agent control paradigm in AI
  • Incomplete: does not say what to do
    off-equilibrium
  • Non-unique: multiple solutions, with no way to
    choose among them
  • Our approach complements Nash equilibrium: it
    adopts optimality and best response to
    anticipated actions, rather than stability
  • Formalizes greater autonomy among agents:
    actions and observations of other agents are not
    directly known (the MNM and MNO assumptions)
  • Applicable to games of cooperation and competition

17
Thank You
Questions