Title: Joint Strategy Fictitious Play
Adapted from
- J. R. Marden, G. Arslan, J. S. Shamma, Joint
strategy fictitious play with inertia for
potential games, in Proceedings of the 44th IEEE
Conference on Decision and Control, December
2005, pp. 6692-6697.
Review Game
- We then play the game repeatedly in stages,
starting at stage 0. Players can use learning
algorithms as discussed in lecture. Note that
players know the structural form of their own
payoff function, but do not know the form of the
other players' payoff functions.
Notation: Actions
- As in the lecture, we use the notation
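The lecture's notation did not survive extraction. The following is a conventional version. It is our assumption, not a recovery of the original slide:

```latex
\begin{aligned}
&\text{Players: } i \in \{1, \dots, n\}, \qquad \text{action set of player } i\text{: } A_i,\\
&\text{joint action (action profile): } a = (a_1, \dots, a_n) \in A = A_1 \times \cdots \times A_n,\\
&\text{actions of the other players: } a_{-i} = (a_1, \dots, a_{i-1}, a_{i+1}, \dots, a_n) \in A_{-i} = \textstyle\prod_{j \neq i} A_j,\\
&\text{utility of player } i\text{: } U_i : A \to \mathbb{R}, \quad U_i(a) = U_i(a_i, a_{-i}),\\
&\text{action of player } i \text{ at stage } t\text{: } a_i(t), \quad t = 0, 1, 2, \dots
\end{aligned}
```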
Review: Regret Matching
- The empirical frequency of play is guaranteed to converge to the set of Coarse
Correlated Equilibria (CCE) in all finite games (Hart & Mas-Colell, 2000).
- But CCE can be quite bad in some cases, as the set of CCE is a superset of the
set of Nash Equilibria (NE).
Review: Fictitious Play (FP)
- Observe the empirical frequencies of every player's actions
- Consider best response(s) under the (incorrect) assumption that the other players
play independently according to their empirical frequencies
- Randomly choose a best response and act accordingly
Empirical Frequency in FP
- The empirical frequency of a player's action is the fraction of stages, up to and
including the previous stage, in which the player chose that action
- Each player also has an empirical frequency
vector.
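The definitions can be written out as follows. Here a_j(τ) is player j's action at stage τ, 1{·} is the indicator function, and Δ(A_j) is the probability simplex over A_j; this notation is our assumption.

```latex
q_j^{a_j}(t) = \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbf{1}\{a_j(\tau) = a_j\},
\qquad
q_j(t) = \big(q_j^{a_j}(t)\big)_{a_j \in A_j} \in \Delta(A_j)
```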
Best Response in FP
- Each player assumes an expected payoff
- And each player chooses a best response from the
set
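Under FP, the expected payoff of a_i presumably has this form: U_i(a_i; q_{-i}(t)) = sum over all a_{-i} in A_{-i} of U_i(a_i, a_{-i}) times the product over j != i of q_j^{a_j}(t). The best-response set collects the maximizers. Below is a minimal Python sketch of that computation. It makes several assumptions: actions are indexed 0..k-1, the helper names are hypothetical, and `utility(i, joint_action)` is a callback we made up. The inner loop visits one term per joint action of the other players, i.e. the product of |A_j| over j != i.

```python
from itertools import product


def empirical_frequencies(history, num_actions):
    """q[j][a] = fraction of past stages 0..t-1 in which player j played a (needs t >= 1)."""
    t = len(history)
    counts = [[0] * k for k in num_actions]
    for joint in history:
        for j, a_j in enumerate(joint):
            counts[j][a_j] += 1
    return [[c / t for c in row] for row in counts]


def fp_expected_payoff(i, a_i, utility, q, num_actions):
    """Expected payoff of a_i if the others play independently according to q."""
    others = [j for j in range(len(num_actions)) if j != i]
    total = 0.0
    # One term per joint action of the other players (prod_{j != i} |A_j| terms).
    for a_others in product(*(range(num_actions[j]) for j in others)):
        joint = [0] * len(num_actions)
        joint[i] = a_i
        prob = 1.0
        for j, a_j in zip(others, a_others):
            joint[j] = a_j
            prob *= q[j][a_j]  # FP: product of the individual frequencies
        total += prob * utility(i, tuple(joint))
    return total


def fp_best_responses(i, utility, q, num_actions, tol=1e-12):
    """All actions of player i that maximize the FP expected payoff."""
    values = [fp_expected_payoff(i, a, utility, q, num_actions)
              for a in range(num_actions[i])]
    best = max(values)
    return [a for a, v in enumerate(values) if v >= best - tol]


# Usage: a made-up 3-player game paying 1 for each other player you match.
def matching_utility(i, joint):
    return sum(1 for j, a_j in enumerate(joint) if j != i and a_j == joint[i])


history = [(0, 0, 1), (0, 1, 1)]
q = empirical_frequencies(history, [2, 2, 2])
print(q)                                                      # [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
print(fp_best_responses(0, matching_utility, q, [2, 2, 2]))   # [1]
```

The final step of FP would pick uniformly at random from the returned list.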
The Good News!
- The empirical frequencies generated by FP
converge to a Nash equilibrium in potential
games (Monderer & Shapley, 1996).
The Bad News (if any)?
- What are some weaknesses of FP?
A Routing Example
- Consider a routing game with 100 players, all with the same source and sink
- There are 4 roads from the source to the sink
- Players want to minimize their cost.
- The cost of traveling on each road is a quadratic function of the number of players
choosing that road, with positive coefficients (which could be randomly generated)
- Can we use FP as a learning algorithm in this example?
- Formalizing the game, we have
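The formalization itself did not survive extraction. The version below is a natural reconstruction from the description of the game. The coefficient names κ_r, λ_r, μ_r and the load function σ_r are our own labels. Utility is taken as negative cost, so maximizing utility means minimizing cost. A game of this form is a congestion game, and congestion games are (exact) potential games.

```latex
\begin{aligned}
&\text{Players: } i \in \{1, \dots, 100\}, \qquad A_i = R = \{r_1, r_2, r_3, r_4\} \text{ for every } i,\\
&\text{load on road } r\text{: } \sigma_r(a) = \big|\{\, j : a_j = r \,\}\big|,\\
&\text{road cost: } c_r(k) = \kappa_r k^2 + \lambda_r k + \mu_r, \qquad \kappa_r, \lambda_r, \mu_r > 0,\\
&\text{utility: } U_i(a) = -\,c_{a_i}\big(\sigma_{a_i}(a)\big).
\end{aligned}
```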
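- Remember the FP expected payoff?
- This is not computationally feasible!
- For each player, the sum in that expected payoff is over 4^99 = 2^198 terms (one per
joint action of the other 99 players)!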
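A quick arithmetic check of that count: each of the other 99 players can be on any of the 4 roads. The evaluation rate of 10^9 terms per second below is a hypothetical figure, used only for scale.

```python
num_players = 100
num_roads = 4

# FP: player i's expected payoff sums over every joint action of the other 99 players.
fp_terms = num_roads ** (num_players - 1)
assert fp_terms == 2 ** 198           # 4^99 = (2^2)^99 = 2^198

print(f"{fp_terms:.3e} terms")        # roughly 4.0e+59
print(len(str(fp_terms)), "digits")   # 60 digits

# Even at a (hypothetical) rate of 10^9 terms per second:
years = fp_terms / 1e9 / (365.25 * 24 * 3600)
print(f"about {years:.1e} years for one expected-payoff evaluation")
```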
What do we do?
- The routing example (which is fairly realistic) shows that we need either a more
efficient way to compute this expected utility or a learning algorithm that is
computationally suitable for large games.
Joint Strategy Fictitious Play (JSFP)
- Observe empirical frequencies of joint actions
- Consider best response(s) under the (still incorrect) assumption that all other players
act collectively as a group according to their joint empirical frequency
- Randomly choose a best response and act accordingly
Does FP = JSFP?
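- In the case of two players, it is easy to see that FP and JSFP are the same: the joint
action of the "other players" is just the single opponent's action, so its joint
empirical frequency is that opponent's own empirical frequency
- But in the case of three or more players, this is not necessarily the case! FP uses the
product of the other players' individual frequencies, while JSFP uses the frequency of
their joint actions, which can capture correlation between them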
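To see the difference with three players, take player 1's view of players 2 and 3. In the made-up history below, those two always choose the same action. The sketch compares the joint empirical frequency (what JSFP uses) with the product of the individual frequencies (what FP uses).

```python
from collections import Counter

# Hypothetical history of the actions (a_2, a_3) of players 2 and 3 over four stages.
others_history = [(0, 0), (1, 1), (0, 0), (1, 1)]
t = len(others_history)

# JSFP: empirical frequency of each joint action of the other players.
joint = {profile: n / t for profile, n in Counter(others_history).items()}

# FP: product of each player's own (marginal) empirical frequency.
marg2 = Counter(a2 for a2, _ in others_history)
marg3 = Counter(a3 for _, a3 in others_history)
product_of_marginals = {
    (a2, a3): (marg2[a2] / t) * (marg3[a3] / t) for a2 in (0, 1) for a3 in (0, 1)
}

print(joint)                 # {(0, 0): 0.5, (1, 1): 0.5}
print(product_of_marginals)  # {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
```

FP gives probability 0.25 to the combinations (0, 1) and (1, 0), which never occurred. JSFP gives them probability 0. With only two players, "the others" is a single player, and the two dictionaries would coincide.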
Empirical Frequency in JSFP
- The empirical frequency for an action profile may
be calculated as follows
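Written out, the formulas would be as follows. This uses our assumed notation: a(τ) is the joint action at stage τ, a_{-i}(τ) is the others' part of it, and 1{·} is the indicator. The first formula is the frequency of a full action profile; the second is the frequency that player i uses for the joint action of the other players.

```latex
z^{a}(t) = \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbf{1}\{a(\tau) = a\}, \quad a \in A,
\qquad
z_{-i}^{a_{-i}}(t) = \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbf{1}\{a_{-i}(\tau) = a_{-i}\}, \quad a_{-i} \in A_{-i}
```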
Expected Payoff in JSFP
- Each player assumes an expected payoff
- But this looks about as bad as (maybe worse than) FP!
- So what can we do?
- We rewrite it in a more useful form!
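Here is a sketch of the rewrite. It assumes the JSFP expected payoff is the average of U_i(a_i, a_{-i}) under the joint empirical frequency z_{-i}(t). Written directly, the sum runs over all of A_{-i}, which is the same number of terms as FP; that is why it "looks as bad". Substitute the definition of z_{-i}(t) and swap the two sums. For each past stage, the indicator is nonzero for exactly one a_{-i}, namely the joint action the others actually played.

```latex
\begin{aligned}
U_i\big(a_i; z_{-i}(t)\big)
  &= \sum_{a_{-i} \in A_{-i}} U_i(a_i, a_{-i})\, z_{-i}^{a_{-i}}(t) \\
  &= \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{a_{-i} \in A_{-i}} U_i(a_i, a_{-i})\, \mathbf{1}\{a_{-i}(\tau) = a_{-i}\} \\
  &= \frac{1}{t} \sum_{\tau=0}^{t-1} U_i\big(a_i, a_{-i}(\tau)\big).
\end{aligned}
```

The result is the average payoff that a_i would have earned against what the others actually did. This is a sum over only the t past stages.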
The JSFP Payoff Recursion
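- So now we can rewrite the expected payoff as a simple recursion, and at every stage
choose an action that maximizes it (our best response)
- We are maximizing regret! (The regret for a_i subtracts the average payoff actually
received, which does not depend on a_i, so both have the same maximizers.)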
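Write V_i^{a_i}(t) for the average payoff that a_i would have earned over stages 0..t-1. The recursion is then V_i^{a_i}(t+1) = t/(t+1) * V_i^{a_i}(t) + 1/(t+1) * U_i(a_i, a_{-i}(t)). Below is a minimal Python sketch of one JSFP player built on it. The class name and the `payoff_of(a)` callback are hypothetical. The callback is assumed to return U_i(a, a_{-i}(t)) for the others' actual joint action at stage t.

```python
import random


class JSFPPlayer:
    """One JSFP player: stores one running average per own action, nothing else."""

    def __init__(self, num_actions):
        self.values = [0.0] * num_actions  # V[a]: average payoff a would have earned
        self.t = 0                          # number of stages observed so far

    def update(self, payoff_of):
        """payoff_of(a) = U_i(a, a_{-i}(t)) for the others' joint action at stage t."""
        t = self.t
        for a in range(len(self.values)):
            # V(t+1) = t/(t+1) * V(t) + 1/(t+1) * U_i(a, a_{-i}(t))
            self.values[a] = (t * self.values[a] + payoff_of(a)) / (t + 1)
        self.t += 1

    def best_responses(self, tol=1e-12):
        best = max(self.values)
        return [a for a, v in enumerate(self.values) if v >= best - tol]

    def act(self, rng):
        # Randomly choose among the best responses (all actions tie at t = 0).
        return rng.choice(self.best_responses())


# Usage: made-up payoffs that actions 0 and 1 would have earned in three stages.
player = JSFPPlayer(num_actions=2)
for stage_payoffs in ([1.0, 2.0], [3.0, 2.0], [5.0, 8.0]):
    player.update(lambda a: stage_payoffs[a])
print(player.values)                 # [3.0, 4.0]
print(player.best_responses())       # [1]
print(player.act(random.Random(0)))  # 1
```

Per stage, the player stores |A_i| numbers and makes |A_i| payoff evaluations. In the routing example, that is 4 of each, instead of a 4^99-term sum.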
Convergence Properties of JSFP
- Whether JSFP converges (in games with three or more players) remains an open
problem. But once the joint action generated by JSFP reaches a strict NE, it stays
there forever. To obtain convergence guarantees, we add inertia to the learning
algorithm.
JSFP with Inertia
- Assume that all NE are strict
- JSFP-1: If the action a player chose in the previous stage is a best response at the
current stage, choose that action again
- JSFP-2: Otherwise, choose an action according to the distribution
The JSFP-2 Distribution
- Here the alpha parameter represents the player's willingness to optimize at a given
stage, beta is a distribution whose support is contained in the set of best responses at
this stage, and the v term is the distribution that puts all of its probability on the
action taken in the previous stage.
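Taken together, JSFP-1 and JSFP-2 work as follows. The player repeats its previous action if that action is still a best response. Otherwise it samples from alpha * beta + (1 - alpha) * v: with probability alpha it draws from beta, and with probability 1 - alpha it repeats the previous action. Below is a minimal sketch. Three choices are ours: beta is uniform over the best-response set, alpha is a fixed number in (0, 1), and the function name is hypothetical.

```python
import random


def jsfp_inertia_action(values, prev_action, alpha, rng, tol=1e-12):
    """Choose player i's action at stage t from its current JSFP values V_i(t).

    values      -- V_i^a(t) for each own action a (e.g. from the JSFP recursion)
    prev_action -- a_i(t-1)
    alpha       -- willingness to optimize, assumed to lie in (0, 1)
    """
    best = max(values)
    best_set = [a for a, v in enumerate(values) if v >= best - tol]

    # JSFP-1: the previous action is still a best response, so keep it.
    if prev_action in best_set:
        return prev_action

    # JSFP-2: sample from alpha * beta + (1 - alpha) * v^{prev_action}.
    if rng.random() < alpha:
        return rng.choice(best_set)  # beta: assumed uniform over the best responses
    return prev_action               # v: point mass on the previous action (inertia)


rng = random.Random(0)
print(jsfp_inertia_action([1.0, 3.0, 2.0], prev_action=1, alpha=0.3, rng=rng))  # 1 (JSFP-1)
```

Called with a non-best previous action (e.g. `prev_action=0` above), the function switches to action 1 in roughly an alpha fraction of calls.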
JSFP w/ Inertia Converges!
- In particular, the joint action converges to some Nash equilibrium in generalized
ordinal potential games
- Of course, there is no equilibrium selection mechanism
- And not much is known regarding the convergence rate
- But we have shown that JSFP w/ Inertia is a good substitute for FP in large games
- If you want the proof, read the paper; the proof is not trivial!
The Fading Memory Variant
- We used the recursion
- But we could also use the recursion
- Here, rho is a constant (or a function of the stage) with 0 < rho <= 1, and it has
also been proven that this algorithm (with inertia) gives rise to a process converging
to some NE.
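Neither recursion survived extraction. Assuming the running average from the JSFP payoff recursion and the usual fading-memory (exponential smoothing) form, they would read:

```latex
\begin{aligned}
\text{standard JSFP:}\quad
  V_i^{a_i}(t+1) &= \frac{t}{t+1}\, V_i^{a_i}(t) + \frac{1}{t+1}\, U_i\big(a_i, a_{-i}(t)\big),\\
\text{fading memory:}\quad
  V_i^{a_i}(t+1) &= (1-\rho)\, V_i^{a_i}(t) + \rho\, U_i\big(a_i, a_{-i}(t)\big), \qquad 0 < \rho \le 1.
\end{aligned}
```

Unrolling the fading-memory form gives weight rho (1 - rho)^k to the payoff from k stages ago, so recent stages count more. With rho = 1, the player simply best-responds to the previous stage.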
A Routing Example, Revisited
- We can now apply JSFP w/ Inertia and fading memory to the routing problem, and we
should converge to some NE (the result covers generalized ordinal potential games,
which include routing games)
- Simulations suggest that JSFP without inertia also works in this case
- Try it!
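One way to try it is the self-contained Python sketch below. It makes the following assumptions:
- Utility is negative cost.
- Coefficients are drawn uniformly from [0.5, 1.5].
- beta is uniform over best responses.
- alpha = 0.05 and rho = 0.5.

The seed, parameter values, and all function names are hypothetical choices of ours. We do not claim any particular outcome for a given seed; the script simply reports the final loads and whether they form an NE.

```python
import random

NUM_PLAYERS, NUM_ROADS = 100, 4


def make_costs(rng):
    """Hypothetical random quadratic costs c_r(k) = k2*k^2 + k1*k + k0, all coefficients > 0."""
    return [tuple(rng.uniform(0.5, 1.5) for _ in range(3)) for _ in range(NUM_ROADS)]


def road_cost(coeffs, k):
    k2, k1, k0 = coeffs
    return k2 * k * k + k1 * k + k0


def counts_of(actions):
    counts = [0] * NUM_ROADS
    for r in actions:
        counts[r] += 1
    return counts


def is_nash(actions, costs):
    """True if no player can strictly lower its cost by switching roads alone."""
    counts = counts_of(actions)
    for r in set(actions):
        stay = road_cost(costs[r], counts[r])
        if any(road_cost(costs[s], counts[s] + 1) < stay
               for s in range(NUM_ROADS) if s != r):
            return False
    return True


def simulate(stages=1000, alpha=0.05, rho=0.5, seed=0):
    """JSFP with inertia and fading memory on the routing game."""
    rng = random.Random(seed)
    costs = make_costs(rng)
    values = [[0.0] * NUM_ROADS for _ in range(NUM_PLAYERS)]
    # Stage 0: all values tie, so every road is a best response; pick uniformly.
    actions = [rng.randrange(NUM_ROADS) for _ in range(NUM_PLAYERS)]
    for _ in range(stages):
        counts = counts_of(actions)
        # Fading memory: V <- (1 - rho) V + rho U_i(s, a_{-i}(t)), with U = -cost.
        for i, r_i in enumerate(actions):
            for s in range(NUM_ROADS):
                load = counts[s] if s == r_i else counts[s] + 1
                values[i][s] = (1 - rho) * values[i][s] - rho * road_cost(costs[s], load)
        # Inertia: keep a best-response road (JSFP-1); otherwise switch w.p. alpha (JSFP-2).
        new_actions = []
        for i, r_i in enumerate(actions):
            best = max(values[i])
            best_set = [s for s, v in enumerate(values[i]) if v >= best - 1e-12]
            if r_i in best_set or rng.random() >= alpha:
                new_actions.append(r_i)
            else:
                new_actions.append(rng.choice(best_set))
        actions = new_actions
    return actions, costs


final_actions, costs = simulate()
print("final loads:", counts_of(final_actions))
print("Nash equilibrium reached:", is_nash(final_actions, costs))
```

Setting alpha = 1 essentially removes the inertia, which allows a comparison with plain JSFP under fading memory.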
Example of Convergence
Conclusion
- We have demonstrated some weaknesses of FP (computational demands, observational
demands, etc.)
- We have developed JSFP, which seems to accommodate these computational limitations