Title: Strategic Negotiation and Cooperation Among Autonomous Agents
1Automated Negotiation
Sarit Kraus, Bar-Ilan University, Israel, and UMD, USA
2Plan of the course
- Introduction
- Rules of Encounters
- Strategic Negotiation
- Auctions
- protocols
- strategies
- Argumentation
3Machines Controlling and Sharing Resources
- Electrical grids (load balancing)
- Telecommunications networks (routing)
- PDAs (schedulers)
- Shared databases (intelligent access)
- Traffic control (coordination)
4Broad Working Assumption
- Designers (from different companies, countries, etc.) come together to agree on standards for how their automated agents will interact (in a given domain).
- They discuss various possibilities and their tradeoffs, and agree on protocols, strategies, and social laws to be implemented in their machines.
5Attributes of Standards
- Efficient: Pareto optimal
- Stable: no incentive to deviate
- Simple: low computational and communication cost
- Distributed: no central decision-maker
- Symmetric: agents play equivalent roles
Designing protocols for specific classes of domains that satisfy some or all of these attributes
6 Distributed Artificial Intelligence (DAI)
- Distributed Problem Solving (DPS): centrally designed systems with built-in cooperation and a global problem to solve.
- Multi-Agent Systems (MAS): a group of utility-maximizing heterogeneous agents co-existing in the same environment, possibly competitive.
7Phone Call Competition Example
- Customer wishes to place a long-distance call
- Carriers simultaneously bid, sending proposed prices
- Phone automatically chooses the carrier (dynamically)
[Figure: the three carriers (ATT, Sprint, MCI) and the three bids (0.20, 0.23, 0.18)]
8Best Bid Wins
- Phone chooses carrier with lowest bid
- Carrier gets amount that it bid
[Figure: the three carriers (ATT, Sprint, MCI) and the three bids (0.20, 0.23, 0.18)]
9Attributes of the Mechanism
- Distributed
- Symmetric
- Stable
- Simple
- Efficient
Carriers have an incentive to invest effort in
strategic behavior
[Figure: the three carriers (ATT, Sprint, MCI) and the three bids (0.20, 0.23, 0.18)]
10Best Bid Wins, Gets Second Price
- Phone chooses carrier with lowest bid
- Carrier gets amount of second-best price
[Figure: the three carriers (ATT, Sprint, MCI) and the three bids (0.20, 0.23, 0.18)]
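The two payment rules (winner is paid its own bid, or winner is paid the second-best price) can be compared with a minimal sketch. The function name `run_auction` and the pairing of carriers with bids are assumptions made for illustration; the slides do not state which carrier submitted which bid.

```python
def run_auction(bids, second_price=False):
    """Lowest bid wins. Return (winner, payment).

    bids: dict mapping a bidder name to its proposed price.
    second_price: if True, the winner is paid the second-lowest bid.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda item: item[1])
    winner, lowest = ranked[0]
    payment = ranked[1][1] if second_price else lowest
    return winner, payment


# Hypothetical pairing of carriers and bids (assumed, not from the slides).
bids = {"carrier_a": 0.20, "carrier_b": 0.23, "carrier_c": 0.18}
first_price_result = run_auction(bids)
second_price_result = run_auction(bids, second_price=True)
```

Under the second-price rule the winner's payment does not depend on its own bid, which is the usual argument for why bidders gain nothing by shading their bids.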
11Attributes of the Mechanism
- Distributed
- Symmetric
- Stable
- Simple
- Efficient
Carriers have no incentive to invest effort in
strategic behavior
[Figure: the three carriers (ATT, Sprint, MCI) and the three bids (0.20, 0.23, 0.18)]
12Database Domain
Common Database
TOD
13Negotiation
"A discussion in which interested parties exchange information and come to an agreement." (Davis and Smith, 1977)
- Two-way exchange of information
- Each party evaluates information from its own perspective
- Final agreement is reached by mutual selection
14Game Theory--Short Introduction
- Game theory is the study of decision making in multi-person situations where the outcome depends on everyone's choice.
- In decision theory and in the theory of competitive equilibrium from economics, the other participants' actions are treated as an environmental parameter. The effect of the decision-maker's actions on the other participants is not taken into consideration.
15Describing a Game
- Essential elements: players, actions, information, strategies, payoffs, outcome, and equilibria.
- Ways to present social interactions as a game:
  - Extensive form: the most complete description.
  - Strategic form: many details are omitted.
  - Coalitional form: binding agreements exist.
16Example of a two-player game
[Figure: payoff matrix of a two-player game; the players are labeled india and sikh, and the actions are op, deal, and blow]
17Nash Equilibrium
- An action profile is an ordered set $a = (a_1, \ldots, a_N)$ of one action for each of the N players in the game.
- An action profile $a$ is a Nash equilibrium (Nash 53) of a strategic game if no agent $j$ has a different action yielding an outcome that it prefers to the one generated when it chooses $a_j$, given that every other player $i$ chooses $a_i$.
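The definition can be checked mechanically for small games: a profile is a pure-strategy Nash equilibrium if no single player gains by deviating unilaterally. The sketch below is illustrative only; the function name and the payoff table (a standard prisoner's dilemma, not the game shown on the slides) are assumptions.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """Return all pure-strategy Nash equilibria of a finite game.

    payoffs maps every action profile (a tuple with one action per player)
    to a tuple of payoffs (one per player). The table must be complete.
    """
    profiles = list(payoffs)
    n_players = len(profiles[0])
    actions = [sorted({p[i] for p in profiles}) for i in range(n_players)]
    equilibria = []
    for profile in product(*actions):
        stable = True
        for j in range(n_players):
            for alternative in actions[j]:
                deviation = profile[:j] + (alternative,) + profile[j + 1:]
                # Player j strictly prefers to deviate: not an equilibrium.
                if payoffs[deviation][j] > payoffs[profile][j]:
                    stable = False
        if stable:
            equilibria.append(profile)
    return equilibria


# Hypothetical payoffs: a prisoner's dilemma with actions C and D.
dilemma = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
equilibria = pure_nash_equilibria(dilemma)
```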
18
[Figure: extensive-form game tree with players Ind and sik, a chance node c with branch probabilities 0.4 and 0.6, and the actions op, blow, yes, and dealH]
19Rules of Encounter
Jeffrey S. Rosenschein and Gilad Zlotkin
20Domain Theory
- Task Oriented Domains
  - Agents have tasks to achieve
  - Task redistribution
- State Oriented Domains
  - Goals specify acceptable final states
  - Side effects
  - Joint plan and schedules
- Worth Oriented Domains
  - Function rating states' acceptability
  - Joint plan, schedules, and goal relaxation
21Postmen Domain
TOD
[Figure: delivery graph with the Post Office and the addresses a through f]
22Database Domain
Common Database
TOD
23Fax Domain
TOD
[Figure: fax network with nodes a through f and the faxes to send]
Cost is only to establish connection
24Slotted Blocks World
SOD
[Figure: slotted blocks world with items labeled 1, 2, and 3]
25The Multi-Agent Tileworld
WOD
[Figure: tileworld grid with agents A and B, tiles, numbered holes, and obstacles]
26Task Oriented Domain (TOD)
- A tuple $\langle T, A, c \rangle$ where
  - $T$ is the set of all possible tasks
  - $A = A_1, \ldots, A_n$ is a list of agents
  - $c$ is a monotonic function $c: 2^T \to \mathbb{R}^+$
An encounter is a list $T_1, \ldots, T_n$ of finite sets of tasks from $T$ such that agent $A_k$ needs to achieve all the tasks in $T_k$ (also called agent $A_k$'s goal).
27Building Blocks
- Domain
- A precise definition of what a goal is
- Agent operations
- Negotiation Protocol
- A definition of a deal
- A definition of utility
- A definition of the conflict deal
- Negotiation Strategy
- In Equilibrium
- Incentive-compatible
28Deal and Utility in two-agent TOD
- A deal $\delta$ is a pair $(D_1, D_2)$ such that $D_1 \cup D_2 = T_1 \cup T_2$
- The conflict deal is $\Theta = (T_1, T_2)$
- $\mathrm{Utility}_i(\delta) = \mathrm{Cost}(T_i) - \mathrm{Cost}(D_i)$
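As a small worked instance of these definitions, the sketch below evaluates a deal in a two-agent encounter. The cost function (one unit per distinct task, loosely in the spirit of the Fax Domain) and all names are assumptions chosen for illustration.

```python
def cost(tasks):
    # Assumed toy cost: one unit per distinct task
    # (e.g. one connection per fax destination).
    return len(set(tasks))


def is_deal(deal, encounter):
    """A deal (D1, D2) must cover exactly the tasks of the encounter (T1, T2)."""
    d1, d2 = deal
    t1, t2 = encounter
    return set(d1) | set(d2) == set(t1) | set(t2)


def utility(i, deal, encounter):
    """Utility_i(deal) = Cost(T_i) - Cost(D_i), with i in {0, 1}."""
    return cost(encounter[i]) - cost(deal[i])


encounter = ({"a", "b", "c"}, {"b", "c", "d"})   # (T1, T2)
conflict_deal = encounter                        # each agent does its own tasks
deal = ({"a", "b"}, {"c", "d"})                  # a redistribution of the tasks

deal_utilities = (utility(0, deal, encounter), utility(1, deal, encounter))
conflict_utilities = (utility(0, conflict_deal, encounter),
                      utility(1, conflict_deal, encounter))
```

The conflict deal always yields utility 0 for both agents, so any deal with non-negative utilities for both is individually rational.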
29Negotiation Protocols
- Agents use a product-maximizing negotiation protocol (as in Nash bargaining theory)
- It should be a symmetric PMM (product-maximizing mechanism)
- Examples: 1-step protocol, monotonic concession protocol
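A product-maximizing mechanism can be illustrated by brute force over pure deals. The sketch below makes two simplifying assumptions that are not in the slides: each task is assigned to exactly one agent (no task is done twice), and the cost function is the toy "one unit per distinct task" cost. All names are hypothetical.

```python
from itertools import product


def cost(tasks):
    # Assumed toy cost: one unit per distinct task.
    return len(set(tasks))


def product_maximizing_deals(encounter, cost):
    """Return (deals, best_product): the individually rational pure deals
    that maximize the product of the two agents' utilities."""
    t1, t2 = (frozenset(t) for t in encounter)
    all_tasks = sorted(t1 | t2)
    best, best_product = [], -1
    # Assign every task to agent 0 or agent 1.
    for owners in product((0, 1), repeat=len(all_tasks)):
        d1 = frozenset(t for t, o in zip(all_tasks, owners) if o == 0)
        d2 = frozenset(t for t, o in zip(all_tasks, owners) if o == 1)
        u1 = cost(t1) - cost(d1)
        u2 = cost(t2) - cost(d2)
        if u1 < 0 or u2 < 0:
            continue  # worse than the conflict deal for some agent
        if u1 * u2 > best_product:
            best, best_product = [(d1, d2)], u1 * u2
        elif u1 * u2 == best_product:
            best.append((d1, d2))
    return best, best_product


encounter = ({"a", "b", "c"}, {"b", "c", "d"})
deals, best_product = product_maximizing_deals(encounter, cost)
```

When several deals tie, a symmetric mechanism would pick among them at random so that neither agent is favored.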
30Building Blocks
- Domain
- A precise definition of what a goal is
- Agent operations
- Negotiation Protocol
- A definition of a deal
- A definition of utility
- A definition of the conflict deal
- Negotiation Strategy
- In Equilibrium
- Incentive-compatible
31Negotiation with Incomplete Information
[Figure: Post Office graph with addresses a through h and the letters of agents 1 and 2]
- What if the agents don't know each other's letters?
32 1-Phase Game: Broadcast Tasks
[Figure: Post Office graph with addresses a through h and the letters of agents 1 and 2]
- Agents will flip a coin to decide who delivers all the letters.
33Hiding Letters
[Figure: Post Office graph with addresses a through h; one of agent 1's letters is marked as hidden]
They then agree that agent 2 delivers to f and e.
34Another Possibility for Deception
[Figure: Post Office graph with addresses a, b, and c, and the letters of agents 1 and 2]
- They will agree to flip a coin to decide who goes to b and who goes to c.
35Phantom Letter
[Figure: Post Office graph with addresses a, b, c, and d; the letters of agents 1 and 2, plus a phantom letter declared by agent 1 for d]
- They agree that agent 1 goes to c.
36Negotiation over Mixed Deals
- Mixed deal: $(D_1, D_2) : p$
- The agents will perform $(D_1, D_2)$ with probability $p$, and the symmetric deal $(D_2, D_1)$ with probability $1 - p$
Theorem: With mixed deals, agents can always agree on the all-or-nothing deal
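For the all-or-nothing deal, one agent performs every task with probability $p$ and the other with probability $1 - p$. Writing $c_1 = c(T_1)$, $c_2 = c(T_2)$, and $C = c(T_1 \cup T_2)$, the expected utilities are $c_1 - pC$ and $c_2 - (1 - p)C$, and their product is maximized at $p = (c_1 - c_2 + C) / 2C$. The sketch below computes this probability; the function names and the example costs are assumptions for illustration.

```python
from fractions import Fraction


def all_or_nothing_probability(c1, c2, c_all):
    """Probability that agent 1 performs all tasks, chosen to maximize the
    product of expected utilities (clamped to the interval [0, 1])."""
    p = Fraction(c1 - c2 + c_all, 2 * c_all)
    return min(max(p, Fraction(0)), Fraction(1))


def expected_utilities(c1, c2, c_all, p):
    """Expected utilities of agents 1 and 2 under the all-or-nothing deal."""
    return c1 - p * c_all, c2 - (1 - p) * c_all


# Hypothetical declared costs: c(T1) = 6, c(T2) = 8, c(T1 u T2) = 8.
p = all_or_nothing_probability(6, 8, 8)
u1, u2 = expected_utilities(6, 8, 8, p)
```

With these assumed costs the agent that declares the cheaper task set receives the smaller chance of delivering everything, and both agents end up with the same expected utility.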
37Hiding Letters with Mixed All-or-Nothing Deals
[Figure: Post Office graph with addresses a through h; one of agent 1's letters is marked as hidden]
- They will agree on the mixed deal where agent 1 has a 3/8 chance of delivering to f and e.
38Phantom Letters with Mixed Deals
[Figure: Post Office graph with addresses a, b, c, and d; the letters of agents 1 and 2, plus a phantom letter declared by agent 1 for d]
- They will agree on the mixed deal where A has a 3/4 chance of delivering all letters, lowering his expected utility.
39Sub-Additive TODs
- A TOD $\langle T, A, c \rangle$ is sub-additive if for all finite sets of tasks $X, Y$ in $T$ we have
- $c(X \cup Y) \le c(X) + c(Y)$
40Sub-Additivity
[Figure: Venn diagram of the task sets X and Y]
$c(X \cup Y) \le c(X) + c(Y)$
41Sub-Additive TODs
- The Postmen Domain, Database Domain, and Fax
Domain are sub-additive.
The Delivery Domain (where postmen don't have to return to the Post Office) is not sub-additive.
42Incentive Compatible Mechanisms
[Figure: the hidden-letter and phantom-letter Post Office examples]
Sub-Additive   Hidden   Phantom
Pure           L        L
A/N            T        T/P
Mix            L        T/P
Theorem: For all encounters in all sub-additive TODs, when using a PMM over all-or-nothing deals, no agent has an incentive to hide a task.
43Decoy Tasks
Decoy tasks, however, can be beneficial even with
all-or-nothing deals
Sub-Additive   Hidden   Phantom   Decoy
Pure           L        L         L
A/N            T        T/P       L
Mix            L        T/P       L
44Concave TODs
- A TOD $\langle T, A, c \rangle$ is concave if for all finite sets of tasks $Y$ and $Z$ in $T$, and $X \subseteq Y$, we have
- $c(Y \cup Z) - c(Y) \le c(X \cup Z) - c(X)$
Concavity implies sub-additivity.
45Concavity
[Figure: Venn diagram of the task sets X, Y (a superset of X), and Z]
- The cost Z adds to X is at least the cost it adds to Y ($Z - X$ is a superset of $Z - Y$).
46Concave TODs
- The Database Domain and Fax Domain are concave
(not the Postmen Domain, unless restricted to
trees).
[Figure: postmen graph showing the task sets X, Y, and Z]
This example is not concave: Z adds 0 to X, but adds 2 to its superset Y (all blue nodes).
47Three-Dimensional Incentive Compatible Mechanism Table
Theorem: For all encounters in all concave TODs, when using a PMM over all-or-nothing deals, no agent has any incentive to lie.
Concave        Hidden   Phantom   Decoy
Pure           L        L         L
A/N            T        T         T
Mix            L        T         T
Sub-Additive   Hidden   Phantom   Decoy
Pure           L        L         L
A/N            T        T/P       L
Mix            L        T/P       L
48Modular TODs
- A TOD $\langle T, A, c \rangle$ is modular if for all finite sets of tasks $X, Y$ in $T$ we have
- $c(X \cup Y) = c(X) + c(Y) - c(X \cap Y)$
Modularity implies concavity.
49Modularity
[Figure: Venn diagram of the task sets X and Y]
- $c(X \cup Y) = c(X) + c(Y) - c(X \cap Y)$
50Modular TODs
- The Fax Domain is modular (not the Database
Domain nor the Postmen Domain, unless restricted
to a star topology).
Even in modular TODs, hiding tasks can be
beneficial in general mixed deals.
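The three cost-function properties (sub-additivity, concavity, modularity) can be verified by brute force on a small task universe. The sketch below is illustrative only: `fax_cost` (one unit per destination) and `setup_cost` (a fixed overhead plus one unit per task) are hypothetical cost functions, not taken from the slides.

```python
from itertools import combinations


def subsets(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_subadditive(cost, universe):
    s = subsets(universe)
    return all(cost(x | y) <= cost(x) + cost(y) for x in s for y in s)


def is_concave(cost, universe):
    s = subsets(universe)
    # For every X that is a subset of Y: Z adds no more to Y than to X.
    return all(cost(y | z) - cost(y) <= cost(x | z) - cost(x)
               for y in s for z in s for x in s if x <= y)


def is_modular(cost, universe):
    s = subsets(universe)
    return all(cost(x | y) == cost(x) + cost(y) - cost(x & y)
               for x in s for y in s)


def fax_cost(tasks):
    # Assumed toy Fax Domain: one connection per destination.
    return len(tasks)


def setup_cost(tasks):
    # Hypothetical cost: fixed overhead of 2 plus one unit per task.
    return 0 if not tasks else 2 + len(tasks)


universe = {"a", "b", "c"}
fax_properties = (is_subadditive(fax_cost, universe),
                  is_concave(fax_cost, universe),
                  is_modular(fax_cost, universe))
setup_properties = (is_subadditive(setup_cost, universe),
                    is_concave(setup_cost, universe),
                    is_modular(setup_cost, universe))
```

The second cost function is concave (and hence sub-additive) but not modular, which shows that the hierarchy of domain classes is strict at that level.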
51Three-Dimensional Incentive Compatible Mechanism Table
Modular        H        P         D
Pure           L        T         T
A/N            T        T         T
Mix            L        T         T
Concave        H        P         D
Pure           L        L         L
A/N            T        T         T
Mix            L        T         T
Sub-Additive   H        P         D
Pure           L        L         L
A/N            T        T/P       L
Mix            L        T/P       L
52Related Work
- Coalition formation: Shehory, Sandholm
- Mechanism design: Ephrati, Kraus, Tennenholtz
- Other models of negotiation: Sycara, Durfee, Lesser, Gasser, Gmytrasiewicz, Jennings
- Consensus mechanisms, voting techniques, economic models: Ephrati, Wellman, Sandholm
53Conclusions
- By appropriately adjusting the rules of encounter by which agents must interact, we can influence the private strategies that designers build into their machines.
- The interaction mechanism should ensure the efficiency of multi-agent systems.
Rules of Encounter
Efficiency
54Conclusions
- To maintain the efficiency of dynamic multi-agent systems over time, the rules must also be stable.
- The use of formal tools enables the design of efficient and stable mechanisms, and the precise characterization of their properties.
Stability
Formal Tools
55Strategic Negotiation
- Collaborators: Jon Wilkenfeld, Rina Schwartz-Azoulay, Orna Shechter, Esti Freitsis
56DAI Overview
- AI
  - DAI
    - DPS
    - MA
      - strategic negotiation
57Strategic Negotiation Model
- A model of alternating offers (Rubinstein) that takes negotiation time into consideration and thereby reduces negotiation time.
- During the strategic negotiation, agents communicate their respective desires in order to reach a mutually beneficial agreement.
- The model provides a unified approach to many problems.
58Structure of the Negotiation
- There are N self-motivated agents, randomly designated 1, 2, ...
- All the agents negotiate to reach an agreement.
- The negotiation process may include several equidistant iterations $0, 1, 2, \ldots \in Time$ and can continue forever. In each time period $t$, agent $j(t) = t \bmod N$ makes an offer.
59Structure of the Negotiation - cont.
- The other agents respond simultaneously: Yes, No, or Opt.
- If the offer was accepted by all the agents, the last offer is implemented.
- If at least one agent opts out, a conflict occurs.
- Otherwise (the offer was rejected by at least one agent), the negotiation proceeds to period $t + 1$.
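The protocol described on these two slides can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: the callback names `propose` and `respond` are hypothetical, agents are indexed from 0 so that $t \bmod N$ selects a valid proposer, and the `max_periods` cap is an added assumption (the protocol itself may continue forever).

```python
YES, NO, OPT = "yes", "no", "opt"


def negotiate(n_agents, propose, respond, max_periods=100):
    """Run the alternating-offers protocol with simultaneous responses.

    propose(agent, t) returns the offer made by the proposer in period t.
    respond(agent, offer, t) returns YES, NO, or OPT.
    Returns (outcome, offer, period).
    """
    for t in range(max_periods):
        proposer = t % n_agents
        offer = propose(proposer, t)
        # Responses are collected without showing any agent the others' answers.
        responses = [respond(j, offer, t)
                     for j in range(n_agents) if j != proposer]
        if OPT in responses:
            return ("conflict", None, t)
        if all(r == YES for r in responses):
            return ("agreement", offer, t)
        # At least one rejection and no opt-out: move on to period t + 1.
    return ("no_agreement", None, max_periods)


# Hypothetical behaviors: the offer is just the period number,
# and every responder accepts once the offer reaches 2.
agreement_run = negotiate(3, lambda agent, t: t,
                          lambda agent, offer, t: YES if offer >= 2 else NO)
# Here every responder opts out in period 1.
conflict_run = negotiate(3, lambda agent, t: t,
                         lambda agent, offer, t: OPT if t == 1 else NO)
```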
60Applications
- Information servers (large databases).
- Resources sharing.
- Tasks distribution.
- Computer assisted negotiation.
- Union/management negotiation.
61Negotiation on data allocation in multi-server
environment
62Environment Description
- There are several information servers. Each server is located in a different geographical area.
- Each server receives queries from the clients in its area, and sends documents as responses to queries. These documents can be stored locally or in another server.
63Environment Description
[Figure: clients in areas i and j send queries to their local servers; queries and documents are exchanged between server i and server j across the distance separating them]
64Environment Description - cont.
- The information is clustered in datasets (corresponding to file, fragment, etc.).
- Each new dataset has to be allocated to one of the servers by mutual agreement among the servers.
- Each server wants to store the datasets in a location which reduces its communication and storage costs.
- A negotiation session is initiated when a set of new datasets arrives.
65Motivation
- Cooperation among servers with similar areas of interest (e.g., Web servers).
- The Data and Information System component of NASA's Earth Observing System (EOSDIS): a distributed knowledge system which supports archival and distribution of data at multiple independent servers.
66Motivation - cont.
- Each data collection, or file, is called a dataset. The datasets are huge, so each dataset has only one copy.
- The current policy for data allocation in NASA is static: old datasets are not reallocated, and each new dataset is placed at the server with the nearest topics (defined according to the topics of the datasets stored by this server).
67Related Work -File Allocation Problem
- The original problem: how to distribute files among computers in order to optimize system performance.
- Our problem: how self-motivated servers can decide on the distribution of files when each server has its own objectives.
68Basic Definitions
- SERVERS: the set of the servers.
- DATASETS: the set of datasets (files) to be allocated.
- Allocation: a mapping of each dataset to one of the servers. The set of all possible allocations is denoted by Allocs.
- U: the utility function of each server.
69The Conflict Allocation
- If at least one server opts out of the negotiation, then the conflict allocation conflict_alloc is implemented.
- We consider the conflict allocation to be the static allocation (each dataset is stored in the server with the closest topics).
70Utility Function
- $U_{server}(alloc, t)$ specifies the utility of server from $alloc \in Allocs$ at time $t$.
- It consists of
  - The utility from the assignment of each dataset.
  - The cost of negotiation delay.
- $U_{server}(alloc, 0) = \sum_{x \in DATASETS} V_{server}(x, alloc(x))$.
71Parameters of utility
- query price: payment for retrieved documents.
- usage(ds, s): the expected number of documents of dataset ds requested by clients in the area of server s.
- storage costs, retrieve costs, answer costs.
72Cost over time
- Cost of communication and computation time of the negotiation.
- Loss of unused information: new documents cannot be used until the negotiation ends.
- Dataset usage and storage cost are assumed to decrease over time, with the same discount ratio (p-1).
- Thus, there is a constant discount ratio of the utility from an allocation: $U_{server}(alloc, t) = \delta^t \, U_{server}(alloc, 0) - tC$.
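The two utility formulas (the sum over datasets at time 0, and the discounted form at time t) can be evaluated with a short sketch. The valuation function `value` and all numbers below are hypothetical; the slides do not give concrete values for V, the discount factor, or C.

```python
def utility_at_zero(server, alloc, value):
    """U_server(alloc, 0): sum over datasets x of V_server(x, alloc(x))."""
    return sum(value(server, dataset, holder)
               for dataset, holder in alloc.items())


def utility_at_time(u0, t, delta, c):
    """U_server(alloc, t) = delta**t * U_server(alloc, 0) - t * C."""
    return delta ** t * u0 - t * c


def value(server, dataset, holder):
    # Hypothetical valuation: a locally stored dataset is worth 10,
    # a remotely stored one is worth 4.
    return 10 if holder == server else 4


alloc = {"ds1": "s1", "ds2": "s2", "ds3": "s1"}   # dataset -> storing server
u0 = utility_at_zero("s1", alloc, value)
u2 = utility_at_time(u0, t=2, delta=0.5, c=1)
```

Because both the discount factor and the per-period cost C reduce the utility as t grows, every server prefers reaching a given agreement sooner rather than later.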
73Assumptions
- Each server prefers any agreement over continuing the negotiation indefinitely.
- The utility of each server from the conflict allocation is always greater than or equal to 0.
- OFFERS: the set of allocations that are preferred by all the agents over opting out.
74Equilibrium
- Nash equilibrium: a strategy profile $p$ is a Nash equilibrium if no player $i$ has a different strategy yielding an outcome that it prefers to the one generated when it chooses $p_i$.
- Subgame perfect equilibrium: a strategy profile is a subgame perfect equilibrium if the strategy profile it induces in every subgame is a Nash equilibrium of that subgame.
75Negotiation Analysis - Simultaneous Responses
- Simultaneous responses: a server, when responding, is not informed of the other responses.
- Theorem: for each offer $x \in OFFERS$, there is a subgame-perfect equilibrium of the bargaining game with the outcome x offered and unanimously accepted in period 0.
76Choosing the Allocation
- The designers of the servers can agree in advance on a joint technique for choosing x:
  - giving each server at least its conflict utility,
  - maximizing a social welfare criterion:
    - the sum of the servers' utilities,
    - or the generalized Nash product of the servers' utilities, $\prod_s (U_s(x) - U_s(conflict))$.
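The two social welfare criteria can lead to different allocations. The sketch below enumerates every allocation of a tiny instance, keeps those that give each server at least its conflict utility (weak preference is an assumption here), and picks the maximizer. The utility values and all names are hypothetical.

```python
from itertools import product
from math import prod


def choose_allocation(servers, datasets, utility, conflict_utility,
                      criterion="sum"):
    """Exhaustively pick the allocation maximizing the chosen criterion."""
    best, best_score = None, None
    for holders in product(servers, repeat=len(datasets)):
        alloc = dict(zip(datasets, holders))
        utilities = {s: utility(s, alloc) for s in servers}
        gains = [utilities[s] - conflict_utility[s] for s in servers]
        if any(g < 0 for g in gains):
            continue  # some server would rather opt out
        if criterion == "sum":
            score = sum(utilities.values())
        else:
            score = prod(gains)  # generalized Nash product
        if best_score is None or score > best_score:
            best, best_score = alloc, score
    return best, best_score


# Hypothetical per-dataset values: worth OWN if stored locally, REMOTE otherwise.
OWN = {"s1": 10, "s2": 8}
REMOTE = {"s1": 0, "s2": 1}


def utility(server, alloc):
    return sum(OWN[server] if holder == server else REMOTE[server]
               for holder in alloc.values())


conflict = {"s1": 0, "s2": 0}
by_sum = choose_allocation(["s1", "s2"], ["d1", "d2"], utility, conflict, "sum")
by_product = choose_allocation(["s1", "s2"], ["d1", "d2"], utility, conflict,
                               "product")
```

In this toy instance the sum criterion gives both datasets to one server, while the Nash product splits them, which spreads the utility more evenly.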
77Choosing the Allocation - cont.
- The problem of finding an optimal allocation is NP-complete (by a reduction from multiprocessor scheduling).
- When finding x is intractable, we suggest the following protocol:
  - each server will search for an allocation;
  - the allocation which maximizes the predefined social welfare criterion will be chosen.
78Search Methods
- We have implemented the following algorithms:
  - A backtracking algorithm: searches the search space of the allocation problem.
  - A random-restart hill-climbing algorithm: starts with a random allocation and tries to improve it.
  - A genetic algorithm: searches by simulating an evolutionary process. Each individual represents an allocation. The algorithm involves reproduction, crossover, and mutation of individuals.
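Of the three search methods, random-restart hill climbing is the simplest to sketch. The version below is a generic illustration, not the authors' implementation: the neighborhood (move one dataset to another server), the scoring function, and the preference table are all assumptions.

```python
import random


def hill_climb(datasets, servers, score, restarts=5, seed=0):
    """Random-restart hill climbing over dataset-to-server allocations."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        # Start from a random allocation.
        alloc = {d: rng.choice(servers) for d in datasets}
        current = score(alloc)
        improved = True
        while improved:
            improved = False
            # Neighborhood: move a single dataset to a different server.
            for d in datasets:
                for s in servers:
                    if s == alloc[d]:
                        continue
                    candidate = {**alloc, d: s}
                    candidate_score = score(candidate)
                    if candidate_score > current:
                        alloc, current = candidate, candidate_score
                        improved = True
        if current > best_score:
            best, best_score = alloc, current
    return best, best_score


# Hypothetical social-welfare contribution of storing a dataset at a server.
PREFERENCE = {
    ("d1", "s1"): 3, ("d1", "s2"): 5,
    ("d2", "s1"): 4, ("d2", "s2"): 1,
    ("d3", "s1"): 2, ("d3", "s2"): 2,
}


def score(alloc):
    return sum(PREFERENCE[(d, s)] for d, s in alloc.items())


best_alloc, best_value = hill_climb(["d1", "d2", "d3"], ["s1", "s2"], score)
```

This toy objective is separable per dataset, so every local optimum is also a global one; on realistic utility functions the restarts are what give the method a chance to escape poor local optima.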
79Experimental Evaluation
- How do the parameters influence the results of the negotiation?
- vcost(alloc): the variable costs due to an allocation (excludes storage_cost and the gains due to queries).
- vcost_ratio: the ratio of the vcost when using negotiation to the vcost of the static allocation.
80Effect of Parameters on The Results
- As the number of servers grows, vcost_ratio increases (more complex computations).
- As the number of datasets grows, vcost_ratio decreases (negotiation is more beneficial).
- Changing the mean usage did not influence vcost_ratio significantly, but vcost_ratio decreases as the standard deviation of the usage increases.
81Influence of Parameters - cont.
- When the standard deviation of the distances between servers increases, vcost_ratio decreases.
- When the distance between servers increases, vcost_ratio decreases.
- In the domains tested:
  - as answer_cost increases, vcost_ratio increases;
  - as storage_cost increases, vcost_ratio increases;
  - as retrieve_cost increases, vcost_ratio decreases;
  - as query_price increases, vcost_ratio decreases.
82Social Criteria
- We studied the effect of the choice of the social welfare criterion on the results.
- We compare the following criteria:
  - Sum of the agents' utilities.
  - Product of the agents' utilities.
- Maximizing the sum achieves a lower vcost_ratio.
- Maximizing the product achieves lower dispersion of the agents' utilities.
83Incomplete Information
- Each server knows:
  - The usage frequency of all datasets by clients from its area.
  - The usage frequency of the datasets stored in it by all clients.
84Incomplete Information - cont.
- A revelation mechanism:
  - First, all the servers simultaneously report all their private information:
    - for each dataset, the past usage of the dataset by this server;
    - for each server, the past usage of each local dataset by this server.
  - Then, the negotiation proceeds as in the complete information case.
85Incomplete Information - cont.
- Lemma: there is a Nash equilibrium where each server tells the truth about its past usage of remote datasets and about the other servers' usage of its local datasets.
- Lies concerning details about local usage of local datasets are intractable.
86Summary negotiation on data allocation
- We have considered the data allocation problem in a distributed environment.
- We have presented the utility function of the servers, which expresses their preferences.
- We have proposed using a negotiation protocol for solving the problem.
- For incomplete information situations, a revelation process was added to the protocol.
87Negotiations in the pollution sharing problem
- Collaborator: Esti Freitsis
88Environment Description
- There are some closely grouped plants in an industrial region.
- Each plant can produce several types of products.
- Each plant has a utility function (profit).
- There are several types of polluting substances.
- Each plant has norms restricting the maximal emission of each polluting substance that it emits. The pollution always has to be below these norms. We refer to the situation in which only these norms have to be met as usual circumstances.
89Special circumstances
- Sometimes there is a need to reduce pollution for
some period because of external factors such as
weather (high humidity, wind towards residential
area). In this case plants receive new norms. We
refer to this situation as special circumstances.
90Current solution
- Current solution: each plant reduces pollution according to the new norms.
- Disadvantage: for one plant it is less costly to reduce one substance, while for another it is less costly to reduce a different substance.
91Negotiations
- Our solution: plants negotiate to reach beneficial agreements about which substances' emissions must be reduced and by what percentage.
- The conflict solution: following the new norms.
- We consider complete information situations.
92Negotiations Protocols
- Simultaneous responses: an agent responding to an offer is not informed of the other responses.
- Sequential responses: an agent responding to an offer is informed of the responses of the preceding agents (assuming that the agents are ordered).
93Negotiations strategies for simultaneous responses
- As in the data allocation case:
- For each possible agreement x that is better for all the plants than the conflict solution, there is a subgame-perfect equilibrium of the bargaining game with the outcome x offered and unanimously accepted in period 0.
94Negotiations strategies for sequential responses
- Assumption: there is a time period T at which negotiation cannot continue any longer. At T the conflict allocation is implemented.
- Perfect equilibrium by backward induction:
  - At $T-1$, if the negotiation hasn't ended, $A_{T-1}$ suggests the agreement that is best for itself among those that are better for all agents than the conflict solution (denoted by $O_{T-1}$); the other agents accept.
  - At $T-2$, $A_{T-2}$ suggests the agreement that is best for itself among those that are better for all agents than the conflict solution and $O_{T-1}$ (denoted by $O_{T-2}$). The other agents accept.
  - By induction, at the first time period $A_0$ suggests $O_0$ and the others accept.
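The backward-induction argument can be traced on a toy problem. The sketch below assumes a finite set of candidate agreements, proposers taking turns as $t \bmod N$, and weak preference ("at least as good as") when comparing an offer with what an agent can expect later; the utility function and all names are hypothetical.

```python
def backward_induction(agreements, utility, conflict_utility, n_agents, horizon):
    """Return the list [O_0, ..., O_{T-1}] of offers made in each period.

    utility(agent, agreement, t): utility of reaching the agreement in period t.
    conflict_utility(agent, t): utility of the conflict outcome at period t.
    """
    plan = [None] * horizon
    # What each agent gets if period T is reached: the conflict outcome.
    fallback = [conflict_utility(a, horizon) for a in range(n_agents)]
    for t in range(horizon - 1, -1, -1):
        proposer = t % n_agents
        # Agreements every agent weakly prefers to continuing the negotiation.
        acceptable = [x for x in agreements
                      if all(utility(a, x, t) >= fallback[a]
                             for a in range(n_agents))]
        if acceptable:
            plan[t] = max(acceptable, key=lambda x: utility(proposer, x, t))
            fallback = [utility(a, plan[t], t) for a in range(n_agents)]
    return plan


# Hypothetical example: split 4 units between two agents; an agreement x
# gives x units to agent 0 and 4 - x to agent 1, and each period of delay
# costs every agent 1 unit. The conflict outcome is worth 0.
def utility(agent, x, t):
    share = x if agent == 0 else 4 - x
    return share - t


plan = backward_induction(list(range(5)), utility,
                          lambda agent, t: 0, n_agents=2, horizon=2)
```

In this example the first proposer must leave the other agent at least what that agent could secure as proposer one period later, so the negotiation ends in period 0 with an even split.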
95 Assumptions about the environment
- Profit is a linear function of the number of items of each product produced by the plant.
- Pollution is a linear function of the number of items of each product produced.
96Techniques which were checked
- Strategic negotiations
  - Sequential responses: backtracking
  - Simultaneous responses: maximization of the sum with guarantees of default profit
    - Simplex method: a method for linear optimization
- Nash product
  - Praxis: a method for multi-variable nonlinear function minimization
- Hill climbing
97Simulation Parameters
- The number of plants is varied from 5 to 20.
- The number of pollution types is varied from 5 to 20. For each product, pollution of a given type is produced with probability 1/2.
- Each plant produces Max_prod different types of products. Max_prod is varied from 5 to 20. Pollution and profit per item of product, and the pollution constraints, are set randomly.
- Results: average of 25 simulation runs.
98Plants utility as the function of the number of
plants
99Standard Deviation as the function of the number
of plants
100Computation time as a function of number of plants
101Plants utility as the function of the number of
pollution substances
102Standard deviation as the function of the number
of pollution substances
103Computation time as a function of the number of
pollution substances
104Plants utility as a function of the number of
products
105Standard deviation as a function of the number
of products
106Computation time as the function of the number of
products
107Computation time as a function of the number of
products
108Conclusions
- Maximizing the sum yields the highest average utility, but also the highest standard deviation; it requires agreement between the designers on selecting a solution.
- Backward induction yields a reasonable average utility with low standard deviations and no need for the designers' agreement on a detailed protocol.
- Ongoing work: incomplete information.
109Sharing Resources Through Negotiation
- Joint resource: public communication system, satellite
- Agents: self-motivated.
- Environment: no central controller.
110Environment Description
- Two agents must share a joint resource; the resource can only be used by one agent at a time. No central controller.
- One agent (A) is using the resource, and the second (W) wants to use it too.
- The agents negotiate to reach an agreement: a schedule that divides the usage of the resource, $\langle s, t \rangle$.
111Environment Description -cont
- A continues to use the resource as the negotiation proceeds: A gains over time.
- W is not able to use the resource: W loses over time.
- Opting out causes damage to the resource: both agents wait q time steps.
- Additional option: an agent can leave the negotiation.
112Applying the strategic model
- We developed a detailed utility function for the agents (U_A, U_W). Parameters: type of goal, deadlines, costs of negotiation, gains from goal, etc.
- Main factor in the negotiation: the best agreement for A which is still better for W than opting out (O_n).
113Perfect equilibrium strategies
- O_n depends on the specific situation; we proved lemmas which specify the value of O_n as a function of the utility function parameters.
- Complete information: negotiation ends after at most one step with an agreement, or W leaves.
- The strategies are simple.
114Experiments Using MINUET
[Figure: MINUET screenshot with two agent panels (Agent 1, Agent 2) showing the messages "Send request <5,3>", "Working on goal 102", and "Receive request <5,3>", and a resources panel: 1001 free, 1002 busy]
115Experiments Results
[Table: comparison of Nego., EDF, and Nego./Alter. on the metrics "Utility score" and "Abandon goals"; extracted values: 91, 91, 9.6, 8.4, 21.2, 15.5]
116Summary
- A strategic model of negotiation, taking the passage of time into account.
- We consider a wide range of situations: complete/incomplete information; N > 2 agents; agents lose over time / some lose and some gain over time.
- The model was applied to different domains.
- We found simple and stable strategies.
- Negotiation ends without delay.