Graphical Models for Online Solutions to Interactive POMDPs

Transcript and Presenter's Notes

1
Graphical Models for Online Solutions to
Interactive POMDPs
International Conference on Autonomous Agents and
Multiagent Systems (AAMAS 2007)
  • Prashant Doshi (University of Georgia, USA)
  • Yifeng Zeng (Aalborg University, Denmark)
  • Qiongyu Chen (National University of Singapore)

2
Decision-Making in Multiagent Settings
[Figure: agents i and j share a physical state S; each chooses actions (Ai, Aj), receives observations (Oi, Oj), and maintains a belief over the state and a model of the other agent.]
Act to optimize preferences given beliefs
3
Finitely Nested I-POMDP (Gmytrasiewicz & Doshi, 2005)
  • A finitely nested I-POMDP of agent i at strategy level l
  • ISi,l: interactive states, combining
  • beliefs about the physical environment, and
  • beliefs about other agents in terms of their preferences, capabilities, and beliefs
  • (i.e., the type of the other agent)
  • A: joint actions, A = Ai x Aj
  • Omega_i: possible observations
  • Ti: transition function, S x A x S -> [0, 1]
  • Oi: observation function, S x A x Omega_i -> [0, 1]
  • Ri: reward function, ISi x A -> R
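The components above can be collected into a minimal sketch; the class and field names, and the toy tiger-problem values, are illustrative assumptions, not from the paper:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical container for the finitely nested I-POMDP of agent i at
# level l; field names are illustrative only.
@dataclass
class IPOMDP:
    interactive_states: list   # IS_i,l: physical states paired with level l-1 models of j
    joint_actions: list        # A = Ai x Aj
    observations: list         # Omega_i
    T: Callable                # Ti: S x A x S -> [0, 1]
    O: Callable                # Oi: S x A x Omega_i -> [0, 1]
    R: Callable                # Ri: IS_i x A -> reals

# A toy instance for the tiger problem: tiger left/right paired with one model of j.
tiger = IPOMDP(
    interactive_states=[("TL", "mj1"), ("TR", "mj1")],
    joint_actions=[("L", "L"), ("OL", "L"), ("OR", "L")],
    observations=["GL", "GR"],
    T=lambda s, a, s2: 0.5,    # placeholder dynamics
    O=lambda s, a, o: 0.5,     # placeholder sensor model
    R=lambda s, a: -1.0,       # placeholder reward
)
```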

4
Belief Update
5
Forget It!
  • A different approach
  • Representation: use the language of Influence Diagrams (IDs) to represent the problem more transparently
  • Belief update and solution: use standard ID algorithms for both

6
Challenges
  • Representing nested models of other agents
  • Influence diagrams are a single-agent oriented language
  • Updating beliefs over the models of other agents
  • Accommodating new models of other agents
  • Over time, agents revise their beliefs over the models of others as they receive observations

7
Related Work
  • Multiagent Influence Diagrams (MAIDs) (Koller & Milch, 2001)
  • Use IDs to represent incomplete-information games
  • Compute Nash equilibrium solutions efficiently by exploiting conditional independence
  • Networks of Influence Diagrams (NIDs) (Gal & Pfeffer, 2003)
  • Allow uncertainty over the game
  • Allow multiple models of an individual agent
  • Solution involves collapsing the models into a MAID or an ID
  • Both model static, single-play games
  • Neither considers agent interactions over time (sequential decision-making)

8
Introduce Model Node and Policy Link
[Figure: a generic level l I-ID for agent i, with chance node S, decision node Ai, observation node Oi, utility node Ri, and the model node Mj,l-1 influencing j's action node Aj through the policy link.]
  • A generic level l Interactive Influence Diagram (I-ID) for agent i situated with one other agent j
  • Model node Mj,l-1: the models of agent j at level l-1
  • Policy link (dashed line): a distribution over the other agent's actions given its models
  • Beliefs over Mj,l-1: P(Mj,l-1 | s)
  • How is this distribution updated?
9
Details of the Model Node
  • Members of the model node: the different chance nodes hold the solutions of the models mj,l-1
  • Mod[Mj] represents the different models of agent j
  • The CPT of the chance node Aj is a multiplexer: it assumes the distribution of one of the action nodes (Aj1, Aj2), depending on the value of Mod[Mj]
[Figure: the model node Mj,l-1 contains Mod[Mj] and the action nodes Aj1, Aj2, the solutions of the candidate models mj,l-1^1 and mj,l-1^2, which could themselves be I-IDs or IDs; Aj multiplexes among them, conditioned on S.]
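As a sketch of how the policy link and the multiplexer combine (all names and probabilities below are illustrative assumptions), the predicted distribution over j's actions marginalizes the selected model's solved policy over the belief on Mod[Mj]:

```python
def action_marginal(model_belief, solution_dists):
    """P(Aj) = sum over models m of P(Mod[Mj] = m) * P(Aj | m).
    The multiplexer CPT assumes the action distribution of whichever
    model Mod[Mj] selects; marginalizing over the belief on the models
    yields the overall distribution over j's actions."""
    marginal = {}
    for m, p_m in model_belief.items():
        for a, p_a in solution_dists[m].items():
            marginal[a] = marginal.get(a, 0.0) + p_m * p_a
    return marginal

# Two candidate models of j with different solved policies (made-up numbers):
solution_dists = {
    "mj1": {"OL": 0.1, "OR": 0.1, "L": 0.8},  # model 1 mostly listens
    "mj2": {"OL": 0.8, "OR": 0.1, "L": 0.1},  # model 2 mostly opens the left door
}
p_aj = action_marginal({"mj1": 0.5, "mj2": 0.5}, solution_dists)
```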
10
Whole I-ID
[Figure: the whole level l I-ID with the model node expanded: S, Ai, Oi, Ri, Aj, Mod[Mj], and the action nodes Aj1, Aj2 produced by the candidate models mj,l-1^1 and mj,l-1^2, which could themselves be I-IDs or IDs.]
11
Interactive Dynamic Influence Diagrams (I-DIDs)
[Figure: a time slice of a level l I-DID: chance node St, decision node Ait, j's action node Ajt, observation node Oit, utility node Ri, and the model node Mj,l-1^t, which connects to the next time slice through the model update link.]
12
Semantics of Model Update Link
[Figure: the model update link expanded between times t and t+1. The two models mj,l-1^{t,1} and mj,l-1^{t,2} at time t, combined with j's actions and its possible observations (Oj1, Oj2), yield up to four updated models mj,l-1^{t+1,1} through mj,l-1^{t+1,4} at time t+1, selected by Mod[Mj]^{t+1}.]
These models differ in their initial beliefs, each of which is the result of j updating its beliefs given its action and a possible observation.
13
Notes
  • The updated set of models at time step (t+1) will have at most |Mt| x |Aj| x |Omega_j| models, where
  • |Mt| is the number of models at time step t,
  • |Aj| is the largest space of actions, and
  • |Omega_j| is the largest space of observations
  • The new distribution over the updated models uses
  • the original distribution over the models,
  • the probability of the other agent performing the action, and
  • the probability of it receiving the observation that led to the updated model
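A minimal sketch of this update (the function and all numbers are illustrative assumptions): each time-t model branches on j's possible actions and observations, and each resulting model is weighted by the product of the three probabilities listed above.

```python
def update_models(model_dist, action_dist, obs_dist):
    """Expand each time-t model by every (action, observation) pair of
    agent j; the updated model (m, a, o) gets weight
    P(m) * P(a | m) * P(o | m, a).
    At most |M^t| * |Aj| * |Omega_j| updated models are produced."""
    updated = {}
    for m, p_m in model_dist.items():
        for a, p_a in action_dist[m].items():
            for o, p_o in obs_dist[(m, a)].items():
                updated[(m, a, o)] = p_m * p_a * p_o
    return updated

# One model, two actions, two observations (made-up numbers) -> 4 updated models.
new_dist = update_models(
    {"m1": 1.0},
    {"m1": {"L": 0.5, "OL": 0.5}},
    {("m1", "L"): {"GL": 0.85, "GR": 0.15},
     ("m1", "OL"): {"GL": 0.5, "GR": 0.5}},
)
```

Note that the weights of the updated models again sum to one, so the result is a proper distribution over the expanded model set.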

14
[Figure: the model update in isolation: the time-t models mj,l-1^{t,1} and mj,l-1^{t,2} expand into the updated models mj,l-1^{t+1,1} through mj,l-1^{t+1,4} with action nodes Aj1 through Aj4.]
15
Example Applications: Emergence of Social Behaviors
  • Followership and Leadership in the persistent
    multiagent tiger problem
  • Altruism and Reciprocity in the public good
    problem with punishment
  • Strategies in a simple version of two-player
    Poker

16
Followership and Leadership in Multiagent
Persistent Tiger
  • Experimental setup
  • Agent j has a better hearing capability (95% accurate) compared to i's (65% accuracy)
  • Agent i has no initial information about the tiger's location
  • Agent i considers two models of agent j, which differ in j's level 0 initial beliefs:
  • agent j likely thinks that the tiger is behind the left door
  • agent j likely thinks that the tiger is behind the right door
  • Solve the corresponding level 1 I-DID expanded over three time steps to obtain the normative behavioral policy of agent i

17
Level 1 I-ID in the Tiger Problem
Expanded over three time steps
The decision nodes of j's models are mapped to chance nodes
18
Policy Tree 1: Agent i has a hearing accuracy of 65%
[Figure: agent i's three-step policy tree. i begins by listening (L); depending on the growls (GL, GR) combined with creaks or silence (CL, CR, S), i either opens a door (OL, OR) or continues to listen (L).]
Conditional Followership
19
Policy Tree 2: Agent i loses its hearing ability (accuracy is 0.5)
[Figure: agent i's policy tree with uninformative growls. i listens (L); on a creak right (CR) it opens the right door (OR), on a creak left (CL) the left door (OL), and on silence (S) it continues to listen (L).]
Unconditional (Blind) Followership
20
Example 2: Altruism and Reciprocity in the Public Good Problem
  • Public good game
  • Two agents are initially endowed with XT amount of resources
  • Each agent may choose to
  • contribute (C) a fixed amount of the resources to a public pot, or
  • not contribute, i.e., defect (D)
  • Agents' actions and the pot are not observable, but agents receive an observation symbolizing the state of the public pot:
  • plenty (PY)
  • meager (MR)
  • The value of the resources in the public pot is discounted by ci for each agent, where ci is the marginal private return
  • To encourage contributions, the contributing agents punish free riders (punishment P) but incur a small cost cp for administering the punishment
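The payoff structure can be sketched as follows; the function and all parameter values are illustrative assumptions, not taken from the paper:

```python
def pg_payoff(own, other, XT=10.0, contrib=1.0, ci=0.9, P=2.0, cp=0.5):
    """One-shot payoff sketch for the public good game with punishment.
    own/other are 'C' (contribute) or 'D' (defect); ci is the marginal
    private return, P the punishment, cp the cost of punishing."""
    pot = contrib * ((own == "C") + (other == "C"))     # total contributions
    payoff = XT - (contrib if own == "C" else 0.0) + ci * pot
    if own == "C" and other == "D":
        payoff -= cp   # contributor pays a small cost to punish the free rider
    if own == "D" and other == "C":
        payoff -= P    # free rider is punished by the contributor
    return payoff
```

With these made-up numbers, a defector facing a contributor earns less (8.9) than a mutual contributor (10.8), which illustrates why, with one action remaining, both types contribute to avoid being punished.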

21
Agent Types
  • Altruistic and non-altruistic types
  • The altruistic agent has a high marginal private return (ci close to 1) and does not punish others who defect
  • Optimal behavior
  • One action remaining: both types of agents choose to contribute to avoid being punished
  • Two actions to go: the altruistic type chooses to contribute, while the other defects. Why?
  • Three steps to go: the altruistic agent contributes to avoid punishment, and the non-altruistic type defects
  • More than three steps: the altruistic agent continues to contribute to the public pot, depending on how close its marginal return is to 1, while the non-altruistic type prescribes defection

22
Level 1 I-ID in the Public Good Game
Expanded over three time steps
23
Policy Tree 1: Altruism in PG
  • If agent i (the altruistic type) believes with probability 1 that j is altruistic, i chooses to contribute at each of the three steps.
  • This behavior persists when i is unaware of whether j is altruistic, and even when i assigns a high probability to j being the non-altruistic type
[Figure: i's policy tree: contribute (C) at all three steps.]
24
Policy Tree 2: Reciprocal Agents
  • Reciprocal type
  • The reciprocal type's marginal private return is lower, and it obtains a greater payoff when its action matches that of the other agent
  • Experimental setup
  • Consider the case where the reciprocal agent i is unsure whether j is altruistic and believes that the public pot is likely to be half full
  • Optimal behavior
  • From this prior belief, i chooses to defect
  • On receiving an observation of plenty (PY), i decides to contribute, while an observation of meager (MR) makes it defect
  • With one action to go, i, believing that j contributes, chooses to contribute too to avoid punishment, regardless of its observations
[Figure: i's policy tree: defect (D) at the first step; then contribute (C) on observing PY or defect (D) on MR; contribute (C) at the final step.]
25
Conclusion and Future Work
  • I-DIDs: a general ID-based formalism for sequential decision-making in multiagent settings
  • Online counterparts of I-POMDPs
  • Solving I-DIDs approximately for computational efficiency (see our AAAI 2007 paper on model clustering)
  • Apply I-DIDs to other application domains
  • Visit our poster on I-DIDs today for more
    information

26
Thank You!