Title: Using Potential Games to Design Distributed Optimisation Systems
1. Using Potential Games to Design Distributed Optimisation Systems
- Archie Chapman
- Intelligence, Agents, Multimedia
- Electronics and Computer Science
2. Motivation
- Interested in building distributed optimisation techniques, using agents
- No need to collect the information at a central point
- Fault tolerant, robust to failures
- Reduced communication requirements
- Flexible response in dynamic environments
- Use game theory to design and analyse self-organising agent systems
3. Overview
- Game theory
- Nash equilibrium
- Correlated equilibrium
- Potential games
- Existence of Nash and correlated equilibria
- Convergence of simple adaptive procedures
- Potential games for distributed optimisation
- Alignment of global and private utilities
- Example: graph colouring problem
4. A very short introduction to game theory (i)
- A game is an interaction between two or more self-interested agents
- Each agent chooses from a set of strategies, $S_i$
- A (joint) strategy profile, $s$, is the set of chosen strategies, also called an outcome of the game
- Each agent has a utility function, $u_i(s)$, specifying their preference for each outcome in terms of a payoff
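These definitions map directly onto a small data structure. The Python sketch below is only illustrative. The names `strategies`, `utilities`, `profiles` and `utility` are hypothetical. The payoffs are those of the Stag Hunt game used later in the deck. They are inferred from the expected-payoff arithmetic on slides 8 and 9, assuming the game is symmetric.

```python
from itertools import product

# Strategy sets S_i, one tuple per agent (hypothetical encoding).
strategies = [("cooperate", "defect"), ("cooperate", "defect")]

# For every outcome s, the payoff tuple (u_0(s), u_1(s)) for (row, column).
# Assumed Stag Hunt values, inferred from the deck's arithmetic (symmetric game).
utilities = {
    ("cooperate", "cooperate"): (4, 4),
    ("cooperate", "defect"): (0, 3),
    ("defect", "cooperate"): (3, 0),
    ("defect", "defect"): (2, 2),
}


def profiles(strategy_sets):
    """Enumerate every joint strategy profile (outcome) in S_0 x S_1 x ..."""
    return list(product(*strategy_sets))


def utility(i, s):
    """u_i(s): agent i's payoff for the outcome s."""
    return utilities[s][i]


for s in profiles(strategies):
    print(s, [utility(i, s) for i in range(len(s))])
```

The dictionary is keyed by the whole profile $s$, so each $u_i(s)$ is just a lookup. This is practical only for small games.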
5. A very short introduction to game theory (ii)
- An agent's best response is the strategy with the highest payoff, given its opponents' choice of strategy
- A Nash equilibrium is a strategy profile such that every agent's strategy is a best response to the others' choice of strategy
- No agent has an incentive to change strategy, given that everyone else plays the equilibrium strategy
- A correlated equilibrium is a commonly known probability distribution over correlated signals recommending a joint strategy profile
- No agent has an incentive to go against their recommendation, given that everyone else follows their recommendation
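The best-response and Nash equilibrium definitions can be checked by brute force in a small game. This is a minimal sketch for pure strategies only. The helper names are hypothetical. The payoffs are the assumed symmetric Stag Hunt values inferred from the arithmetic on slides 8 and 9.

```python
from itertools import product

S = ("cooperate", "defect")  # each agent's strategy set S_i

# Assumed Stag Hunt payoffs (row, column), inferred from the deck's arithmetic.
U = {
    ("cooperate", "cooperate"): (4, 4),
    ("cooperate", "defect"): (0, 3),
    ("defect", "cooperate"): (3, 0),
    ("defect", "defect"): (2, 2),
}


def deviate(s, i, si):
    """Return profile s with agent i's strategy replaced by si."""
    return s[:i] + (si,) + s[i + 1:]


def best_responses(i, s):
    """Strategies of agent i with the highest payoff, given the others' strategies in s."""
    payoffs = {si: U[deviate(s, i, si)][i] for si in S}
    best = max(payoffs.values())
    return {si for si, p in payoffs.items() if p == best}


def is_nash(s):
    """Nash equilibrium: every agent's strategy is a best response to the others'."""
    return all(s[i] in best_responses(i, s) for i in range(len(s)))


pure_nash = [s for s in product(S, repeat=2) if is_nash(s)]
print(pure_nash)
```

Mixed equilibria would need the same test applied to expected payoffs. This sketch does not cover them.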
6. Example: Stag Hunt game
7. Nash equilibrium
Two pure-strategy Nash equilibria: (cooperate, cooperate) and (defect, defect)
8. Correlated equilibrium
Probability distribution over signals
Assume the row player is told by the signal to cooperate
Each agent receives a signal recommending only its own strategy (e.g. cooperate), but knows the joint probability distribution over the others' recommendations
Then the probability that the column player has been told to cooperate is $\frac{1/2}{1/2 + 1/6} = \frac{3}{4}$
The expected payoff for following the recommendation is $\frac{3}{4}(4) + \frac{1}{4}(0) = 3$, and for not following the recommendation is $\frac{3}{4}(3) + \frac{1}{4}(2) = 2\tfrac{3}{4}$
9. Correlated equilibrium
Now assume the row player is told by the signal to defect
Then the probability that the column player has been told to cooperate is $\frac{1/6}{1/6 + 1/6} = \frac{1}{2}$
The expected payoff for following the recommendation is $\frac{1}{2}(3) + \frac{1}{2}(2) = 2\tfrac{1}{2}$, and for not following the recommendation is $\frac{1}{2}(4) + \frac{1}{2}(0) = 2$
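The checks on slides 8 and 9 can be run mechanically for every agent, every recommendation and every possible deviation. In this sketch the signal distribution is reconstructed from the conditional probabilities on those slides: $\Pr(\text{cooperate}, \text{cooperate}) = 1/2$, and every other profile has probability $1/6$. The payoffs are the assumed symmetric Stag Hunt values. `conditional_payoffs` and `is_correlated_equilibrium` are hypothetical helper names. Exact fractions reproduce the slides' numbers.

```python
from fractions import Fraction as F

S = ("cooperate", "defect")

# Assumed Stag Hunt payoffs (row, column).
U = {
    ("cooperate", "cooperate"): (4, 4),
    ("cooperate", "defect"): (0, 3),
    ("defect", "cooperate"): (3, 0),
    ("defect", "defect"): (2, 2),
}

# Signal distribution reconstructed from the conditional probabilities on slides 8 and 9.
SIGNALS = {
    ("cooperate", "cooperate"): F(1, 2),
    ("cooperate", "defect"): F(1, 6),
    ("defect", "cooperate"): F(1, 6),
    ("defect", "defect"): F(1, 6),
}


def conditional_payoffs(dist, i, rec, dev):
    """Expected payoff to agent i, given it was told `rec`, of (following, playing `dev`).

    Returns None if `rec` is never recommended to agent i under `dist`.
    """
    told = {s: p for s, p in dist.items() if s[i] == rec}
    total = sum(told.values())
    if total == 0:
        return None
    follow = sum(p * U[s][i] for s, p in told.items()) / total
    switch = sum(p * U[s[:i] + (dev,) + s[i + 1:]][i] for s, p in told.items()) / total
    return follow, switch


def is_correlated_equilibrium(dist):
    """True if no agent gains, in expectation, by ignoring any recommendation it can receive."""
    for i in range(2):
        for rec in S:
            for dev in S:
                result = conditional_payoffs(dist, i, rec, dev)
                if result is not None and result[1] > result[0]:
                    return False
    return True


print(conditional_payoffs(SIGNALS, 0, "cooperate", "defect"))  # slide 8: 3 vs 11/4
print(conditional_payoffs(SIGNALS, 0, "defect", "cooperate"))  # slide 9: 5/2 vs 2
print(is_correlated_equilibrium(SIGNALS))
```

The same check accepts a point mass on (cooperate, cooperate) or on (defect, defect). This is the sense in which the pure Nash equilibria are correlated equilibria. It rejects a point mass on (cooperate, defect).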
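10. Correlated equilibrium
The set of correlated equilibria contains the set of Nash equilibria
Note that the Nash equilibria of the Stag Hunt game are correlated equilibria with the following signal probabilities:
- (cooperate, cooperate): probability 1 on (cooperate, cooperate), 0 on every other profile
- (defect, defect): probability 1 on (defect, defect), 0 on every other profile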
11. Distributed optimisation design questions
- How do we ensure that the equilibrium outcome is desirable/optimal? That is, how do we ensure that the equilibrium corresponds to a good global utility?
- How do we ensure that the agents converge to an equilibrium? What if there are many equilibria in the game? Which one will emerge?
- Often these problems can be addressed using results from the class of potential games
12. Potential Games
- An ordinal potential function, $P(s_i, s_{-i})$, for a game is a function such that, for every agent $i$, all $s_i, s_i' \in S_i$ and all $s_{-i}$,
- $u_i(s_i', s_{-i}) - u_i(s_i, s_{-i}) > 0 \iff P(s_i', s_{-i}) - P(s_i, s_{-i}) > 0$
- i.e. the sign of the change in private utility to a unilaterally deviating player is matched by the sign of the change in the potential function
- A game that admits a potential is called a potential game
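The ordinal-potential condition can be verified by enumerating every unilateral deviation from every profile. This is a minimal sketch. `is_potential` is a hypothetical helper. The payoffs are the assumed symmetric Stag Hunt values. The candidate potential is inferred from the differences worked on slides 15 and 17. Its value at (cooperate, defect) is assumed to be 0 by symmetry. An `exact` flag also tests the stronger condition that the two differences are equal.

```python
from itertools import product

S = ("cooperate", "defect")

# Assumed Stag Hunt payoffs (row, column).
U = {
    ("cooperate", "cooperate"): (4, 4),
    ("cooperate", "defect"): (0, 3),
    ("defect", "cooperate"): (3, 0),
    ("defect", "defect"): (2, 2),
}

# Candidate potential inferred from slides 15 and 17 ((cooperate, defect) assumed by symmetry).
P = {
    ("cooperate", "cooperate"): 1,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 0,
    ("defect", "defect"): 2,
}


def is_potential(U, P, exact=False):
    """Check the ordinal condition (or, with exact=True, equal differences)
    for every unilateral deviation from every profile."""
    for s in product(S, repeat=2):
        for i in range(2):
            for alt in S:
                t = s[:i] + (alt,) + s[i + 1:]
                du = U[t][i] - U[s][i]  # change in the deviator's utility
                dp = P[t] - P[s]        # change in the potential
                if exact and du != dp:
                    return False
                if (du > 0) != (dp > 0):
                    return False
    return True


print(is_potential(U, P), is_potential(U, P, exact=True))
```

Both directions of each deviation are visited. Checking `(du > 0) == (dp > 0)` on every ordered pair therefore enforces the two-sided "if and only if" on the slide.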
13. Example: Stag Hunt game
(Tables: Stag Hunt game; potential function)
14. Example: Stag Hunt game
Start in the defecting equilibrium, (defect, defect), and see what happens to the potential function if the column player changes strategy
15. Example: Stag Hunt game
The change in the deviator's utility, $0 - 2 = -2$, is matched by a change in the value of the potential function, $0 - 2 = -2$
16. Example: Stag Hunt game
Now suppose the row player responds by moving to the cooperative equilibrium, (cooperate, cooperate)
17. Example: Stag Hunt game
The change in the row player's utility, $4 - 3 = 1$, is matched by a change in the potential function, $1 - 0 = 1$
18. Potential games naturally arise as
- Network congestion games
- Oligopoly models, games of strategic complements or substitutes
- Coalition formation problems
- Organisation of the firm
- Principal-agent games
19. Potential games have been used for distributed optimisation in
- Automatic radio channel selection
- Power control problems in wireless networks
- Scheduling problems in communication networks
- Autonomous vehicle target tracking
20. Properties of potential games
- All finite potential games contain at least one pure-strategy Nash equilibrium
- Corollary: every local maximum of the potential function (over unilateral deviations) is a Nash equilibrium
- Potential games in which (i) each $S_i$ is convex and compact, and (ii) the potential is smooth and strictly concave, have a unique correlated equilibrium, which is pure and is therefore also the unique Nash equilibrium
- All finite potential games have the finite improvement property (every sequence of strict unilateral improvements is finite), a convergence property for learning processes in repeated games
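The finite improvement property can be seen on the Stag Hunt. Each strict improvement by the deviating agent strictly increases the potential. A finite game has finitely many profiles, so any improvement path must stop, and it can only stop at a pure-strategy Nash equilibrium. In this sketch the payoffs and potential are the assumed values inferred from slides 8, 9, 15 and 17. `improvement_path` is a hypothetical helper. The rule "first agent, first better strategy" is an arbitrary choice; any improving choice would do.

```python
S = ("cooperate", "defect")

# Assumed Stag Hunt payoffs (row, column) and inferred potential.
U = {
    ("cooperate", "cooperate"): (4, 4),
    ("cooperate", "defect"): (0, 3),
    ("defect", "cooperate"): (3, 0),
    ("defect", "defect"): (2, 2),
}
P = {
    ("cooperate", "cooperate"): 1,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 0,
    ("defect", "defect"): 2,
}


def improvement_path(start):
    """Repeatedly let the first agent with a strictly better unilateral deviation take it.

    Stops when no agent can improve, i.e. at a pure-strategy Nash equilibrium.
    """
    s = start
    path = [s]
    while True:
        for i in range(2):
            better = [alt for alt in S if U[s[:i] + (alt,) + s[i + 1:]][i] > U[s][i]]
            if better:
                s = s[:i] + (better[0],) + s[i + 1:]
                path.append(s)
                break
        else:  # no agent could improve
            return path


for start in [("cooperate", "defect"), ("defect", "cooperate")]:
    path = improvement_path(start)
    print(path, [P[s] for s in path])  # potential strictly increases along the path
```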
21. Learning in repeated games
- Typically refers to simple adaptive procedures played in a repeated game
- No known efficient learning procedure converges to Nash equilibrium in all classes of games, but Best Response (with agents updating one at a time) and Fictitious Play converge to Nash equilibrium in finite potential games (finite improvement property)
- Regret Matching converges to the set of correlated equilibria in all finite games (its empirical distribution of play converges to that set)
22. Best Response and Fictitious Play
- Both choose the strategy that maximises expected payoff
- Best Response: maximise expected payoff with beliefs given by the last round's play, $s_i^{t+1} = \arg\max_{s_i} u_i(s_i, s_{-i}^t)$
- Fictitious Play: maximise expected payoff with beliefs given by the empirical frequency of play, $s_i^{t+1} = \arg\max_{s_i} \sum_{s_{-i}} q_i^t(s_{-i}) \, u_i(s_i, s_{-i})$, where $q_i^t(s_{-i})$ is the observed frequency of the opponents' profile $s_{-i}$
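Both update rules fit in a few lines for a two-player game. This is a minimal sketch on the assumed Stag Hunt payoffs. The function names are hypothetical, and ties are broken in favour of the earlier entry of `S`, which the slides do not specify. Best response is run with agents taking turns. If both agents best-respond simultaneously from (cooperate, defect), they alternate forever between (defect, cooperate) and (cooperate, defect). Fictitious play uses raw counts instead of frequencies, which leaves the argmax unchanged.

```python
from collections import Counter

S = ("cooperate", "defect")

# Assumed Stag Hunt payoffs (row, column).
U = {
    ("cooperate", "cooperate"): (4, 4),
    ("cooperate", "defect"): (0, 3),
    ("defect", "cooperate"): (3, 0),
    ("defect", "defect"): (2, 2),
}


def payoff(i, own, other):
    """u_i when agent i plays `own` and its opponent plays `other`."""
    s = (own, other) if i == 0 else (other, own)
    return U[s][i]


def argmax(scores):
    """Highest-scoring strategy; ties go to the earlier entry of S (an assumption)."""
    return max(S, key=lambda si: scores[si])


def best_response_dynamics(start, rounds=20):
    """Agents take turns; the mover best-responds to its opponent's last play."""
    s = list(start)
    for t in range(rounds):
        i = t % 2
        s[i] = argmax({si: payoff(i, si, s[1 - i]) for si in S})
    return tuple(s)


def fictitious_play(start, rounds=50):
    """Both agents best-respond to the empirical frequency of the opponent's past play."""
    seen = [Counter([start[1]]), Counter([start[0]])]  # seen[i]: opponent plays observed by i
    s = start
    for _ in range(rounds):
        s = tuple(
            argmax({si: sum(n * payoff(i, si, o) for o, n in seen[i].items()) for si in S})
            for i in range(2)
        )
        for i in range(2):
            seen[i][s[1 - i]] += 1
    return s, seen


print(best_response_dynamics(("cooperate", "defect")))
print(fictitious_play(("cooperate", "defect"))[0])
```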
23. Regret Matching
- Regret: the difference between the payoff an agent would have received if it had chosen $s_i$ and the payoff it actually received, $u_i(s_i, s_{-i}^t) - u_i(s^t)$
- Average regret for not having selected $s_i$ in every previous period: $R^{t+1}(s_i) = \max\left\{ \frac{1}{t} \sum_{\tau=1}^{t} \left[ u_i(s_i, s_{-i}^{\tau}) - u_i(s^{\tau}) \right], 0 \right\}$
- Strategies are selected in proportion to their average level of regret, e.g. $\Pr(s_i) = R^{t+1}(s_i) / \sum_{s_i'} R^{t+1}(s_i')$
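The formulas above translate directly into a simulation of two regret-matching agents. This sketch follows the unconditional form written on the slide. The correlated-equilibrium convergence result quoted on slide 21 is usually stated for Hart and Mas-Colell's conditional variant, which measures regret for having played one strategy instead of another. The slide does not say what happens when no regret is positive. Repeating the last strategy is an assumption made here. Payoffs are the assumed Stag Hunt values, and the function names are hypothetical.

```python
import random

S = ("cooperate", "defect")

# Assumed Stag Hunt payoffs (row, column).
U = {
    ("cooperate", "cooperate"): (4, 4),
    ("cooperate", "defect"): (0, 3),
    ("defect", "cooperate"): (3, 0),
    ("defect", "defect"): (2, 2),
}


def payoff(i, own, other):
    """u_i when agent i plays `own` and its opponent plays `other`."""
    s = (own, other) if i == 0 else (other, own)
    return U[s][i]


def regret_probabilities(avg_regret):
    """Pr(s_i) = R(s_i) / sum of R over S_i; None if no strategy has positive regret."""
    total = sum(avg_regret.values())
    if total <= 0:
        return None
    return {si: r / total for si, r in avg_regret.items()}


def regret_matching(rounds=1000, seed=0):
    """Two regret-matching agents play each other; returns the history of profiles."""
    rng = random.Random(seed)
    cumulative = [{si: 0.0 for si in S} for _ in range(2)]  # sums of past regrets
    s = (rng.choice(S), rng.choice(S))
    history = [s]
    for t in range(1, rounds):
        # Add period t's regrets: what si would have earned minus what was earned.
        for i in range(2):
            realised = payoff(i, s[i], s[1 - i])
            for si in S:
                cumulative[i][si] += payoff(i, si, s[1 - i]) - realised
        nxt = []
        for i in range(2):
            avg = {si: max(cumulative[i][si] / t, 0.0) for si in S}  # R^{t+1}(s_i)
            probs = regret_probabilities(avg)
            if probs is None:
                nxt.append(s[i])  # assumption: keep the last strategy
            else:
                nxt.append(rng.choices(S, weights=[probs[si] for si in S])[0])
        s = tuple(nxt)
        history.append(s)
    return history
```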
24. Potential games for distributed optimisation
- Take an optimisation problem, and assign each variable to a separate agent
- Construct the agents' utility functions to align them with the global objectives, i.e. align the equilibria of the game with the optimal global state
- This results in a potential game, so by allowing the agents to adjust using some learning procedure, the system should converge to an equilibrium, which is a (local) optimum of the global utility
- How do we align the agents' private utilities with the global goals...?
25. Aligning agents' utilities with global goals
- Agents' private utility functions: $u_i(s_i, s_{-i})$
- System's global utility function: $u_g(s_i, s_{-i})$
- Aligned utilities: $u_i(s_i', s_{-i}) - u_i(s_i, s_{-i}) > 0 \iff u_g(s_i', s_{-i}) - u_g(s_i, s_{-i}) > 0$
- Construct agents' utility functions such that any change in strategy has the same effect on the global utility as it does on the agent's utility
26. Aligning agents' utilities with global goals
- Strategic situations where agents' utilities are aligned with the global utility are, by definition, potential games, where the global utility function is a potential function for the game
- Typically, to align private utilities, use an agent's marginal contribution to the global utility as its private utility function (there are other methods, but not today!)
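Marginal contribution is aligned by construction. The term subtracted from $u_g$ does not depend on agent $i$'s own strategy, so any change $i$ makes moves $u_i$ and $u_g$ by exactly the same amount. The sketch below demonstrates this on a hypothetical global utility, the number of distinct locations covered by the agents. It removes agent $i$ through an assumed "null" action. None of these names come from the slides.

```python
from itertools import product

NULL = None  # hypothetical "switched off" action used to remove agent i


def global_utility(s):
    """Hypothetical u_g: number of distinct locations covered by active agents."""
    return len({x for x in s if x is not NULL})


def marginal_contribution(i, s):
    """u_i(s) = u_g(s) - u_g(s with agent i replaced by the null action)."""
    without_i = s[:i] + (NULL,) + s[i + 1:]
    return global_utility(s) - global_utility(without_i)


def is_aligned(strategies, n):
    """Check that every unilateral deviation changes u_i and u_g by the same amount."""
    for s in product(strategies, repeat=n):
        for i in range(n):
            for alt in strategies:
                t = s[:i] + (alt,) + s[i + 1:]
                du_i = marginal_contribution(i, t) - marginal_contribution(i, s)
                du_g = global_utility(t) - global_utility(s)
                if du_i != du_g:
                    return False
    return True


print(marginal_contribution(0, ("a", "a", "b")))  # agent 0 duplicates agent 1's location
print(is_aligned(("a", "b", "c"), 3))
```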
27. Simple example of aligned utilities: graph colouring problem
The global utility is maximised by minimising the number of conflicts: $u_g = -$(total number of pairs of neighbouring nodes with the same colour)
So align the private utilities by setting $u_i = -$(number of $i$'s neighbours with the same colour)
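The claim that these two utilities are aligned can be checked exhaustively on a small graph. Recolouring node $i$ only changes the edges that touch $i$, so $u_i$ and $u_g$ move by the same amount. The graph below is hypothetical (a triangle with one pendant node), since the deck's example graph is not reproduced. The helper names are also hypothetical.

```python
from itertools import product

# Hypothetical example graph: a triangle 0-1-2 with a pendant node 3.
EDGES = [(0, 1), (1, 2), (2, 0), (2, 3)]
COLOURS = ("red", "green")
N = 4


def neighbours(i):
    """Nodes sharing an edge with node i."""
    return [b if a == i else a for a, b in EDGES if i in (a, b)]


def global_utility(s):
    """u_g = -(number of edges whose two endpoints have the same colour)."""
    return -sum(s[a] == s[b] for a, b in EDGES)


def private_utility(i, s):
    """u_i = -(number of i's neighbours with the same colour as i)."""
    return -sum(s[j] == s[i] for j in neighbours(i))


def changes_match(s, i, colour):
    """True if recolouring node i changes u_i and u_g by exactly the same amount."""
    t = s[:i] + (colour,) + s[i + 1:]
    return private_utility(i, t) - private_utility(i, s) == global_utility(t) - global_utility(s)


aligned = all(
    changes_match(s, i, c)
    for s in product(COLOURS, repeat=N)
    for i in range(N)
    for c in COLOURS
)
print(aligned)
```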
28. Graph colouring as a potential game
Pair-wise interaction between agents
i.e. each agent plays this game with each of its
neighbours.
Potential function
Each agent's full utility function, and the full potential function, are given by aggregating these pair-wise functions
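The aggregation can be written out as follows. The symbols $N(i)$ (the neighbours of $i$), $E$ (the edge set) and the pairwise payoff $v$ are notation introduced here. $v$ is assumed to be the conflict game implied by slide 27: $-1$ to both agents when they share a colour, and $0$ otherwise.

$$
\begin{aligned}
v(s_i, s_j) &= \begin{cases} -1 & \text{if } s_i = s_j \\ 0 & \text{otherwise} \end{cases} \\
u_i(s) &= \sum_{j \in N(i)} v(s_i, s_j) \\
P(s) &= \sum_{\{i,j\} \in E} v(s_i, s_j) = u_g(s)
\end{aligned}
$$

When agent $i$ deviates, only the edges touching $i$ change, so

$$
P(s_i', s_{-i}) - P(s_i, s_{-i}) = \sum_{j \in N(i)} \left[ v(s_i', s_j) - v(s_i, s_j) \right] = u_i(s_i', s_{-i}) - u_i(s_i, s_{-i})
$$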
29. Graph colouring as a potential game
- Now, simple learning dynamics will converge to a Nash equilibrium of the game
- Example graph
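A minimal sketch of such dynamics is sequential best response on a hypothetical graph, a 5-cycle with one chord, since the deck's example graph is not reproduced. Nodes are visited in a fixed order. Each node switches to a colour with the fewest clashes, but only when that strictly reduces its own conflicts. By the finite improvement property the sweep loop must terminate, and when it stops no node can reduce its conflicts, so the colouring is a Nash equilibrium. The "sweeps" counted here are this sketch's own measure and are not claimed to match the steps or cycles reported on slide 30.

```python
# Hypothetical example graph: a 5-cycle 0-1-2-3-4 with an extra chord 0-2.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
COLOURS = ("red", "green", "blue")
N = 5


def neighbours(i):
    """Nodes sharing an edge with node i."""
    return [b if a == i else a for a, b in EDGES if i in (a, b)]


def clashes(i, colour, s):
    """Number of i's neighbours that currently use `colour`."""
    return sum(s[j] == colour for j in neighbours(i))


def best_response_colouring(start, max_sweeps=100):
    """Sweep over the nodes in order, applying strict best-response improvements.

    Returns the final colouring and the number of sweeps in which some node changed colour.
    """
    s = list(start)
    for sweep in range(max_sweeps):
        changed = False
        for i in range(N):
            best = min(COLOURS, key=lambda c: clashes(i, c, s))  # ties: first in COLOURS
            if clashes(i, best, s) < clashes(i, s[i], s):
                s[i] = best
                changed = True
        if not changed:
            return tuple(s), sweep
    return tuple(s), max_sweeps


colouring, sweeps = best_response_colouring(("red",) * N)
print(colouring, sweeps)
```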
30. Graph colouring as a potential game
Average time to converge:
- Best response: 563 steps / 48 cycles
- Weighted F-play: 560 steps / 47 cycles
- Weighted R-match: 686 steps / 57 cycles
31. Future work
- Investigate which types of problems can be decomposed so as to align private and global utilities, generating potential games
- Find adaptive procedures that efficiently converge to preferred outcomes
- Examine the effects of dynamic environments on outcomes
- Examine the effects of network structure on convergence
32. Thank you