Title: Graphical Models for Online Solutions to Interactive POMDPs
1. Graphical Models for Online Solutions to Interactive POMDPs
International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2007)
- Prashant Doshi (University of Georgia, USA)
- Yifeng Zeng (Aalborg University, Denmark)
- Qiongyu Chen (National Univ. of Singapore)
2. Decision-Making in Multiagent Settings

[Figure: agents i and j act on a shared state S; agent i has actions Ai, observations Oi, and a belief over the state and a model of j; agent j has actions Aj, observations Oj, and a belief over the state and a model of i]

Each agent acts to optimize its preferences given its beliefs.
3. Finitely Nested I-POMDP (Gmytrasiewicz & Doshi, 2005)

- A finitely nested I-POMDP of agent i with strategy level l (written compactly below)
- Interactive states:
  - Beliefs about the physical environment
  - Beliefs about other agents in terms of their preferences, capabilities, and beliefs (their type)
- A: joint actions
- Ωi: possible observations
- Ti: transition function, S × A × S → [0,1]
- Oi: observation function, S × A × Ωi → [0,1]
- Ri: reward function, S × A → ℝ
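In the notation of Gmytrasiewicz & Doshi (2005), the definition can be stated compactly as follows; the tuple form is reconstructed from the cited paper rather than shown on the slide:

```latex
% Finitely nested I-POMDP of agent i at strategy level l
% (reconstructed from Gmytrasiewicz & Doshi, 2005):
\mathrm{I\text{-}POMDP}_{i,l} = \left\langle IS_{i,l},\, A,\, \Omega_i,\, T_i,\, O_i,\, R_i \right\rangle,
\qquad IS_{i,l} = S \times M_{j,l-1},
\qquad A = A_i \times A_j
```

Here M_{j,l-1} denotes the set of level l-1 models of the other agent j.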
4. Belief Update
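The update equation on this slide appears to have been an image that did not survive extraction. For completeness, here is a sketch of the level-l interactive belief update from the cited paper (two-agent case, notation lightly simplified; this reconstruction, not the slide, is the source of the exact form):

```latex
% beta is a normalizing constant; SE denotes j's belief update
% (state estimation); delta is the Kronecker/Dirac delta.
b_{i,l}^{t}(is^{t}) = \beta \sum_{is^{t-1}} b_{i,l}^{t-1}(is^{t-1})
  \sum_{a_j^{t-1}} \Pr\!\left(a_j^{t-1} \mid m_{j,l-1}^{t-1}\right)
  T_i\!\left(s^{t-1}, a^{t-1}, s^{t}\right)
  O_i\!\left(s^{t}, a^{t-1}, o_i^{t}\right)
  \sum_{o_j^{t}} O_j\!\left(s^{t}, a^{t-1}, o_j^{t}\right)
  \delta\!\left(b_{j}^{t} - SE\!\left(b_{j}^{t-1}, a_j^{t-1}, o_j^{t}\right)\right)
```

The nesting is what makes this expensive: updating i's belief requires updating j's beliefs for every action-observation combination of j.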
5. Forget It!

- A different approach: use the language of Influence Diagrams (IDs) to represent the problem more transparently
- Belief update and solution: use standard ID algorithms
6. Challenges

- Representation of nested models of other agents: the influence diagram is a single-agent oriented language
- Updating beliefs over the models of other agents
- New models of other agents: over time, agents revise their beliefs over the models of others as they receive observations
7. Related Work

- Multiagent Influence Diagrams (MAIDs) (Koller & Milch, 2001)
  - Use IDs to represent incomplete-information games
  - Compute Nash equilibrium solutions efficiently by exploiting conditional independence
- Networks of Influence Diagrams (NIDs) (Gal & Pfeffer, 2003)
  - Allow uncertainty over the game
  - Allow multiple models of an individual agent
  - Solution involves collapsing the models into a MAID or ID
- Both model static, single-play games
  - They do not consider agent interactions over time (sequential decision-making)
8. Introduce Model Node and Policy Link

- A generic level l Interactive-ID (I-ID) for agent i situated with one other agent j
- Model node Mj,l-1: the models of agent j at level l-1
- Policy link (dashed line): a distribution over the other agent's actions given its models (formalized below)
- Beliefs on Mj,l-1: P(Mj,l-1 | s)
- Update?

[Figure: the I-ID with chance nodes S and Oi, model node Mj,l-1 with a policy link to chance node Aj, decision node Ai, and utility node Ri]
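One way to state the policy link semantics (my paraphrase; the slide shows only the diagram): the distribution that the chance node Aj inherits is the mixture of the candidate models' solutions under i's belief over those models,

```latex
\Pr(a_j \mid s) \;=\; \sum_{m_{j,l-1} \in M_{j,l-1}}
  P(m_{j,l-1} \mid s)\, \Pr(a_j \mid m_{j,l-1})
```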
9. Details of the Model Node

- Members of the model node: the different chance nodes (Aj1, Aj2) are the solutions of the models mj,l-1^1 and mj,l-1^2
- Mod[Mj] represents the different models of agent j
- The CPT of the chance node Aj is a multiplexer: it assumes the distribution of each of the action nodes (Aj1, Aj2) depending on the value of Mod[Mj] (see the sketch below)
- mj,l-1^1 and mj,l-1^2 could themselves be I-IDs or IDs

[Figure: the model node Mj,l-1 expanded: S feeds Mod[Mj]; the models mj,l-1^1 and mj,l-1^2 supply action nodes Aj1 and Aj2, which feed the multiplexer node Aj]
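To make the multiplexer concrete, here is a minimal Python sketch. The model names (mj1, mj2) and their action distributions are hypothetical, standing in for the solved policies of j's candidate models; this is illustrative code, not the authors' implementation.

```python
# Solutions of j's two candidate models (hypothetical distributions),
# as would be produced by solving each lower-level ID or I-ID.
action_nodes = {
    "mj1": {"L": 0.8, "OL": 0.1, "OR": 0.1},  # Aj1: solution of mj,l-1^1
    "mj2": {"L": 0.1, "OL": 0.8, "OR": 0.1},  # Aj2: solution of mj,l-1^2
}

def aj_cpt(mod_mj):
    """Multiplexer CPT of chance node Aj: assume the distribution of
    the action node picked out by the value of Mod[Mj]."""
    return action_nodes[mod_mj]

def policy_link(p_models):
    """Marginal over j's actions induced by the policy link: mix the
    selected action distributions with i's belief P(Mj,l-1 | s)."""
    marginal = {}
    for mj, p in p_models.items():
        for aj, q in aj_cpt(mj).items():
            marginal[aj] = marginal.get(aj, 0.0) + p * q
    return marginal

# Example: i assigns probability 0.7 to model mj1.
print(policy_link({"mj1": 0.7, "mj2": 0.3}))
# -> roughly {'L': 0.59, 'OL': 0.31, 'OR': 0.1}
```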
10. Whole I-ID

[Figure: the complete level l I-ID: chance nodes S and Oi, decision node Ai, utility node Ri, and the expanded model node with Mod[Mj], candidate models mj,l-1^1 and mj,l-1^2 (which could be I-IDs or IDs), and their action nodes Aj1 and Aj2 feeding Aj]
11. Interactive Dynamic Influence Diagrams (I-DIDs)

[Figure: a time slice of the I-DID with nodes St, Oit, Ait, Ajt, Ri, and model node Mj,l-1t; the model update link connects the model nodes across time steps]
12. Semantics of Model Update Link

[Figure: model nodes Mj,l-1t and Mj,l-1t+1 expanded between states st and st+1; the two models at time t (mj,l-1t,1 and mj,l-1t,2, with action nodes Aj1 and Aj2 and observation nodes Oj1 and Oj2) branch into four updated models at time t+1 (mj,l-1t+1,1 through mj,l-1t+1,4, with action nodes Aj1 through Aj4) feeding Ajt+1 via Mod[Mj]t+1]

- These models differ in their initial beliefs, each of which is the result of j updating its beliefs due to its actions and possible observations
13. Notes

- The updated set of models at time step t+1 will have at most |Mjt| × |Aj| × |Ωj| models, where |Mjt| is the number of models at time step t, |Aj| is the largest space of actions, and |Ωj| is the largest space of observations (e.g., 2 models, 2 actions, and 2 observations yield at most 8 updated models)
- The new distribution over the updated models (see the sketch below) uses:
  - the original distribution over the models,
  - the probability of the other agent performing the action, and
  - the probability of receiving the observation that led to the updated model
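As an illustration, a small Python sketch of this bookkeeping. The callables p_action, p_obs, and update are hypothetical placeholders for the solved model's policy, j's observation function, and j's belief update; they are not part of the slides.

```python
from itertools import product

def update_models(p_models, actions, observations, p_action, p_obs, update):
    """Sketch of the model update link: each model of j at time t
    branches into one updated model per (action, observation) pair,
    so at most |M| * |A| * |Omega| models result.

    p_models: dict mapping model id -> probability at time t
    p_action(m, a): Pr(a | m), from the solved model m
    p_obs(m, a, o): Pr(o | m, a), from j's observation function
    update(m, a, o): id of the model with j's updated belief
    """
    updated = {}
    for (m, p_m), a, o in product(p_models.items(), actions, observations):
        m_next = update(m, a, o)
        # Weight combines the original model probability, the action
        # probability under that model, and the observation likelihood.
        w = p_m * p_action(m, a) * p_obs(m, a, o)
        updated[m_next] = updated.get(m_next, 0.0) + w
    total = sum(updated.values())
    return {m: w / total for m, w in updated.items()}
```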
14. [Figure only: the model update shown once more: models mj,l-1t,1 and mj,l-1t,2 expand into updated models mj,l-1t+1,1 through mj,l-1t+1,4 with action nodes Aj1 through Aj4]
15. Example Applications: Emergence of Social Behaviors

- Followership and leadership in the persistent multiagent tiger problem
- Altruism and reciprocity in the public good problem with punishment
- Strategies in a simple version of two-player poker
16. Followership and Leadership in Multiagent Persistent Tiger

- Experimental setup:
  - Agent j has a better hearing capability (95% accurate) compared to i's (65% accuracy)
  - Agent i does not have initial information about the tiger's location
  - Agent i considers two models of agent j, which differ in j's level-0 initial beliefs:
    - Agent j likely thinks that the tiger is behind the left door
    - Agent j likely thinks that the tiger is behind the right door
- Solve the corresponding level 1 I-DID expanded over three time steps and obtain the normative behavioral policy of agent i
17. Level 1 I-ID in the Tiger Problem

[Figure: the level 1 I-ID expanded over three time steps; the decision nodes of j's models are mapped to chance nodes]
18. Policy Tree 1: Agent i Has Hearing Accuracy of 65%

[Figure: three-step policy tree for agent i; nodes are i's actions (L = listen, OL/OR = open left/right door) and edges are observation combinations of growls (GL, GR) with creaks or silence (CL, CR, S); i opens a door only when its own growl observation agrees with the creak it hears from j]

Conditional Followership
19. Policy Tree 2: Agent i Loses Hearing Ability (Accuracy Is 0.5)

[Figure: policy tree for agent i; growl observations (now uninformative) are ignored and branches depend only on creaks or silence (CR, S, CL): i opens a door (OR, OL) after hearing a creak and otherwise keeps listening (L)]

Unconditional (Blind) Followership
20. Example 2: Altruism and Reciprocity in the Public Good Problem

- Public good game:
  - Two agents, each initially endowed with XT amount of resources
  - Each agent may choose to:
    - contribute (C) a fixed amount of the resources to a public pot, or
    - not contribute, i.e., defect (D)
  - Agents' actions and the pot are not observable, but each agent receives an observation symbolizing the state of the public pot:
    - plenty (PY)
    - meager (MR)
  - The value of the resources in the public pot is discounted by ci for agent i, where ci is the marginal private return
  - To encourage contributions, the contributing agents punish free riders (penalty P) but incur a small cost cp for administering the punishment (a sketch of the implied stage payoff follows)
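A hedged reading of the stage payoff these rules imply; the fixed contribution amount x_c and the indicator notation are my assumptions, as the slide does not give a functional form:

```latex
% One-step payoff of agent i (sketch, not from the slides):
u_i(a_i, a_j) =
  \underbrace{\big(X_T - x_c\,\mathbb{1}[a_i{=}C]\big)}_{\text{kept resources}}
  + \underbrace{c_i\, x_c \big(\mathbb{1}[a_i{=}C] + \mathbb{1}[a_j{=}C]\big)}_{\text{discounted pot}}
  - P\,\mathbb{1}[a_i{=}D,\ a_j{=}C]
  - c_p\,\mathbb{1}[a_i{=}C,\ a_j{=}D]
```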
21. Agent Types

- Altruistic and non-altruistic types:
  - An altruistic agent has a high marginal private return (ci is close to 1) and does not punish others who defect
- Optimal behavior:
  - One action remaining: both types of agents choose to contribute, to avoid being punished
  - Two actions to go: the altruistic type chooses to contribute, while the other defects. Why?
  - Three steps to go: the altruistic agent contributes to avoid punishment, and the non-altruistic type defects
  - More than three steps: the altruistic agent continues to contribute to the public pot depending on how close its marginal return is to 1; the non-altruistic type prescribes defection
22. Level 1 I-ID in the Public Good Game

[Figure: the level 1 I-ID expanded over three time steps]
23. Policy Tree 1: Altruism in PG

[Figure: policy tree in which agent i contributes (C) at every step]

- If agent i (altruistic type) believes with probability 1 that j is altruistic, i chooses to contribute at each of the three steps
- This behavior persists when i is unaware of whether j is altruistic, and even when i assigns a high probability to j being the non-altruistic type
24. Policy Tree 2: Reciprocal Agents

- Reciprocal type:
  - The reciprocal type's marginal private return is lower, and it obtains a greater payoff when its action matches that of the other agent
- Experimental setup:
  - Consider the case where the reciprocal agent i is unsure whether j is altruistic and believes that the public pot is likely to be half full
- Optimal behavior:
  - From this prior belief, i chooses to defect
  - On receiving an observation of plenty (PY), i decides to contribute, while an observation of meager (MR) makes it defect
  - With one action to go, i believes that j contributes, so it chooses to contribute too, to avoid punishment, regardless of its observations

[Figure: policy tree in which i defects (D) initially, contributes (C) after observing PY, defects (D) after observing MR, and contributes (C) at the final step]
25. Conclusion and Future Work

- I-DIDs: a general ID-based formalism for sequential decision-making in multiagent settings
  - Online counterparts of I-POMDPs
- Solving I-DIDs approximately for computational efficiency (see the AAAI 07 paper on model clustering)
- Apply I-DIDs to other application domains
- Visit our poster on I-DIDs today for more information
26. Thank You!