Joint Strategy Fictitious Play - PowerPoint PPT Presentation

About This Presentation
Title:

Joint Strategy Fictitious Play

Description:

Joint Strategy Fictitious Play Sherwin Doroudi Adapted from J. R. Marden, G. Arslan, J. S. Shamma, Joint strategy fictitious play with inertia for potential ... – PowerPoint PPT presentation

Slides: 37
Provided by: Sherwin6

Transcript and Presenter's Notes



1
Joint Strategy Fictitious Play
  • Sherwin Doroudi

2
Adapted from
  • J. R. Marden, G. Arslan, J. S. Shamma, Joint
    strategy fictitious play with inertia for
    potential games, in Proceedings of the 44th IEEE
    Conference on Decision and Control, December
    2005, pp. 6692-6697.

3
Review Game
  • Players
  • Actions
  • Payoffs

4
Review Game
  • We then play the game repeatedly in stages,
    starting at stage 0. Players can use learning
    algorithms as discussed in lecture. Note that
    each player knows the structural form of its own
    payoff function but does not know the form of the
    other players' payoff functions.

5
Notation Actions
  • As in the lecture, we use the notation

6
Review Regret Matching
  • The empirical distribution of play is guaranteed
    to converge to the set of Coarse Correlated
    Equilibria (CCE) in all finite games (Hart &
    Mas-Colell, 2000).
  • But CCE can be quite bad in some cases, since the
    set of CCE is a superset of the set of Nash
    Equilibria (NE).

7
Review Fictitious Play (FP)
  • Observe the empirical frequencies of every
    player's actions
  • Consider best response(s) under the (incorrect)
    assumption that other players play according to
    their empirical frequencies
  • Randomly choose a best response and act
    accordingly

8
Empirical Frequency in FP
  • The empirical frequency of an action for a player
    is the fraction of past stages (up to and
    including the previous stage) in which the player
    chose that action

9
Empirical Frequency in FP
  • Each player also has an empirical frequency
    vector.
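A minimal Python sketch of these empirical frequencies, assuming the play history is stored as a list of joint action profiles (one tuple per completed stage). The function name `empirical_frequency` and the data layout are illustrative choices, not notation from the lecture or the paper.

```python
from collections import Counter


def empirical_frequency(history, player, actions):
    """Empirical frequency vector of `player` over its action set.

    history: list of joint action profiles (tuples), one per completed stage.
    Assumes at least one stage has already been played.
    """
    k = len(history)
    counts = Counter(profile[player] for profile in history)
    # Fraction of the k past stages in which `player` chose each action.
    return {a: counts[a] / k for a in actions}


# Hypothetical history of a 2-player game with actions "L" and "R".
history = [("L", "R"), ("L", "L"), ("R", "L")]
print(empirical_frequency(history, 0, ["L", "R"]))
```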

10
Best Response in FP
  • Each player assumes an expected payoff
  • And each player chooses a best response from the
    set
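The FP expected payoff and best-response set can be sketched in Python as follows, assuming the opponents' empirical frequencies are given as dictionaries and payoffs by a function `utility(i, profile)`. All names and the coordination-game payoffs are hypothetical. The loop over `itertools.product` makes the cost visible: one term for every joint action of the other players.

```python
import random
from itertools import product


def fp_expected_payoff(i, y_i, freqs, action_sets, utility):
    """Expected payoff of action y_i for player i under the FP belief that
    every other player j acts independently according to freqs[j]."""
    n = len(action_sets)
    others = [j for j in range(n) if j != i]
    total = 0.0
    # One term per joint action of the other players: prod_j |A_j| terms.
    for combo in product(*(action_sets[j] for j in others)):
        profile = [None] * n
        profile[i] = y_i
        prob = 1.0
        for j, a_j in zip(others, combo):
            profile[j] = a_j
            prob *= freqs[j][a_j]
        total += prob * utility(i, tuple(profile))
    return total


def fp_best_responses(i, freqs, action_sets, utility, tol=1e-12):
    """All actions of player i whose FP expected payoff is (near) maximal."""
    values = {y: fp_expected_payoff(i, y, freqs, action_sets, utility)
              for y in action_sets[i]}
    best = max(values.values())
    return [y for y, v in values.items() if v >= best - tol]


def coordination(i, profile):
    """Hypothetical 2-player game: payoff 1 if the actions match."""
    return 1.0 if profile[0] == profile[1] else 0.0


action_sets = [["L", "R"], ["L", "R"]]
freqs = {1: {"L": 0.75, "R": 0.25}}  # player 1's empirical frequencies
# Randomly choose among player 0's best responses, as FP prescribes.
choice = random.choice(fp_best_responses(0, freqs, action_sets, coordination))
```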

11
The Good News!
  • The empirical frequencies generated by FP
    converge to a Nash equilibrium in potential
    games (Monderer & Shapley, 1996).

12
The Bad News (if any)?
  • What are some weaknesses of FP?

13
A Routing Example
  • Consider a routing game with 100 players all with
    the same source and sink
  • There are 4 roads from the source to the sink
  • Players want to minimize their cost.

14
A Routing Example
  • The cost of traveling on each road is a quadratic
    function of the number of players choosing that
    road, with positive coefficients (which could be
    randomly generated)
  • Can we use FP as a learning algorithm in this
    example?
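A hypothetical Python model of this cost structure. The coefficient range [0.1, 1.0] and the helper names are assumptions for illustration. The sketch also includes Rosenthal's potential for congestion games, the standard construction showing that routing games of this kind are potential games.

```python
import random
from collections import Counter

NUM_ROADS = 4


def make_costs(seed=0):
    """Hypothetical positive coefficients (a, b, c) for each road."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(0.1, 1.0) for _ in range(3))
            for _ in range(NUM_ROADS)]


def road_cost(coeffs, load):
    """Quadratic cost a*x^2 + b*x + c of a road used by `load` players."""
    a, b, c = coeffs
    return a * load ** 2 + b * load + c


def player_cost(i, profile, costs):
    """Cost to player i; profile[j] is the road chosen by player j."""
    loads = Counter(profile)
    road = profile[i]
    return road_cost(costs[road], loads[road])


def rosenthal_potential(profile, costs):
    """Sum over roads r of c_r(1) + c_r(2) + ... + c_r(load_r)."""
    loads = Counter(profile)
    return sum(road_cost(costs[r], x)
               for r in range(NUM_ROADS) for x in range(1, loads[r] + 1))
```

For any unilateral switch of roads, the change in the switching player's cost equals the change in `rosenthal_potential`, which is the defining property of an exact potential.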

15
A Routing Example
  • Formalizing the game, we have

16
A Routing Example
  • Remember this?

17
A Routing Example
  • Remember this?

The sum above is over $4^{99} = 2^{198}$ terms!
18
A Routing Example
  • Remember this?
  • This is not computationally feasible!

The sum above is over $4^{99} = 2^{198}$ terms!
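A quick arithmetic check of that count, assuming (as in the routing example) that each of the other 99 players has 4 roads to choose from. `fp_sum_terms` is a hypothetical helper.

```python
def fp_sum_terms(num_players, num_actions):
    """Terms in one FP expected-payoff sum: joint actions of the other players."""
    return num_actions ** (num_players - 1)


terms = fp_sum_terms(100, 4)
print(terms == 2 ** 198)  # True, since 4^99 = (2^2)^99 = 2^198
print(f"{terms:.3e}")
```

Each player would have to evaluate a sum of this size for every one of its 4 roads, at every stage.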
19
What do we do?
  • The routing example (which is fairly realistic)
    shows that we either need to find a more
    efficient way to compute this expected payoff or
    need to develop an algorithm that is
    computationally suitable for large games.

20
Joint Strategy Fictitious Play (JSFP)
  • Observe empirical frequencies of joint actions
  • Consider best response(s) under the (still
    incorrect) assumption that all other players act
    collectively as a group according to their joint
    empirical frequency
  • Randomly choose a best response and act
    accordingly

21
Does FP = JSFP?
  • In the case of two players it is easy to see that
    FP and JSFP are the same.

22
Does FP = JSFP?
  • In the case of two players it is easy to see that
    FP and JSFP are the same
  • But in the case of three or more players this is
    not necessarily the case!
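With two players, the "other players" are a single opponent, so the joint empirical frequency of the others is just that opponent's marginal frequency and the two beliefs coincide. With three or more players, the joint frequency keeps correlations that the product of marginals used by FP throws away. A small hypothetical Python example (payoffs and history invented for illustration) where this changes player 0's best response:

```python
from collections import Counter
from itertools import product

# Hypothetical history of the joint actions of players 1 and 2.
others_history = [("L", "L"), ("R", "R")]


def u0(y, a1, a2):
    """Hypothetical payoff of player 0 for action y against (a1, a2)."""
    if y == "X":
        return 1.0 if a1 == a2 else 0.0
    return 0.6


def jsfp_value(y):
    """Expected payoff under the joint empirical frequency of (a1, a2)."""
    k = len(others_history)
    joint = Counter(others_history)
    return sum(count / k * u0(y, a1, a2) for (a1, a2), count in joint.items())


def fp_value(y):
    """Expected payoff when players 1 and 2 are treated as independent."""
    k = len(others_history)
    m1 = Counter(a1 for a1, _ in others_history)
    m2 = Counter(a2 for _, a2 in others_history)
    return sum((m1[a1] / k) * (m2[a2] / k) * u0(y, a1, a2)
               for a1, a2 in product(m1, m2))


print("JSFP best response:", max(("X", "Y"), key=jsfp_value))
print("FP best response:", max(("X", "Y"), key=fp_value))
```

Players 1 and 2 have always matched each other, so under JSFP player 0 expects `X` to pay 1.0, while under FP the independent marginals make a match only 50% likely and `Y` looks better.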

23
Empirical Frequency in JSFP
  • The empirical frequency for an action profile may
    be calculated as follows
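A minimal Python sketch of the joint empirical frequency, assuming the play history is a list of joint action profiles (one tuple per completed stage). `joint_empirical_frequency` and `others_joint_frequency` are hypothetical names.

```python
from collections import Counter


def joint_empirical_frequency(history):
    """z(a) = (number of past stages with joint action a) / k.

    Only profiles that actually occurred are stored, so the table has at
    most k entries even when the joint action space is astronomically large.
    """
    k = len(history)
    return {profile: count / k for profile, count in Counter(history).items()}


def others_joint_frequency(history, i):
    """Joint empirical frequency of the actions of all players except i."""
    return joint_empirical_frequency(
        [profile[:i] + profile[i + 1:] for profile in history])


history = [(0, 0, 1), (0, 0, 1), (1, 2, 3)]  # hypothetical 3-player history
z = joint_empirical_frequency(history)
```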

24
Expected Payoff in JSFP
  • Each player assumes an expected payoff

25
Expected Payoff in JSFP
  • Each player assumes an expected payoff
  • But this looks about as bad as FP (maybe
    worse)!
  • So what can we do?

26
Expected Payoff in JSFP
  • Each player assumes an expected payoff
  • We rewrite it in a more useful form!
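One way to carry out this rewrite, written with assumed notation: $V_i^{y_i}(k)$ is player $i$'s expected payoff for action $y_i$ at stage $k$, $U_i$ its payoff function, $a_{-i}(s)$ the other players' joint action at stage $s$, and $z_{-i}(k)$ their joint empirical frequency over stages $0, \dots, k-1$. Exchanging the two sums gives:

```latex
\begin{aligned}
V_i^{y_i}(k)
  &= \sum_{a_{-i} \in A_{-i}} U_i(y_i, a_{-i})\, z_{-i}^{a_{-i}}(k),
  \qquad
  z_{-i}^{a_{-i}}(k) = \frac{1}{k} \sum_{s=0}^{k-1} \mathbb{I}\{a_{-i}(s) = a_{-i}\} \\
  &= \frac{1}{k} \sum_{s=0}^{k-1} \sum_{a_{-i} \in A_{-i}}
     U_i(y_i, a_{-i})\, \mathbb{I}\{a_{-i}(s) = a_{-i}\} \\
  &= \frac{1}{k} \sum_{s=0}^{k-1} U_i\big(y_i, a_{-i}(s)\big).
\end{aligned}
```

The last line has only $k$ terms (the payoffs player $i$ would have received by playing $y_i$ against the others' actual past joint actions) instead of one term per joint action of the others.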

27
The JSFP Payoff Recursion
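  • So now we can rewrite the expected payoff as a
    simple recursion and, at every stage, choose an
    action that maximizes it (our best response)
  • We are maximizing regret! The player's own
    average realized payoff does not depend on the
    candidate action, so the action with the highest
    expected payoff is also the action with the
    highest regret.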
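A minimal Python sketch of that recursion, assuming the expected payoff is the running average $V_i^{y_i}(k) = \frac{1}{k}\sum_{s=0}^{k-1} U_i(y_i, a_{-i}(s))$, so that $V_i^{y_i}(k+1) = \frac{k}{k+1} V_i^{y_i}(k) + \frac{1}{k+1} U_i(y_i, a_{-i}(k))$. The class name `JSFPPlayer` and its methods are hypothetical. Each player stores one number per own action and, per stage, evaluates one hypothetical payoff per own action, regardless of how many players there are.

```python
class JSFPPlayer:
    """Hypothetical JSFP bookkeeping for player i (names are illustrative)."""

    def __init__(self, i, actions, utility):
        self.i = i
        self.actions = list(actions)
        self.utility = utility          # utility(i, profile) -> payoff
        self.k = 0                      # number of stages observed so far
        self.V = {y: 0.0 for y in self.actions}

    def update(self, profile):
        """Fold in the joint action played at stage k."""
        k = self.k
        for y in self.actions:
            hypothetical = list(profile)
            hypothetical[self.i] = y    # what if player i had played y?
            u = self.utility(self.i, tuple(hypothetical))
            self.V[y] = k / (k + 1) * self.V[y] + u / (k + 1)
        self.k += 1

    def best_responses(self, tol=1e-12):
        """Actions whose running-average payoff is (near) maximal."""
        best = max(self.V.values())
        return [y for y in self.actions if self.V[y] >= best - tol]


def coordination(i, profile):
    """Hypothetical 2-player game: payoff 1 if the actions match."""
    return 1.0 if profile[0] == profile[1] else 0.0


player = JSFPPlayer(0, ["L", "R"], coordination)
for profile in [("L", "L"), ("L", "R"), ("R", "R")]:
    player.update(profile)
print(player.V, player.best_responses())
```

After the three updates, `V` is about 1/3 for `L` and 2/3 for `R`, the same values as a direct average over the three past stages.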

28
Convergence Properties of JSFP
  • The convergence properties of JSFP for games of
    three or more players remain an open problem.
    But once a joint action generated by JSFP
    reaches a strict NE, it stays there forever. To
    obtain convergence guarantees, we add inertia to
    the learning algorithm.

29
JSFP with Inertia
  • Assume that all NE are strict
  • JSFP-1: If the action a player chose in the
    previous stage is a best response at the current
    stage, choose that action again
  • JSFP-2: Otherwise, choose an action according to
    the distribution

30
The JSFP-2 Distribution
  • Here the alpha parameter represents the player's
    willingness to optimize at a given stage, beta is
    a distribution whose support is contained in the
    set of best responses at this stage, and v is the
    distribution that puts all of its probability on
    the action taken in the previous stage.
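A sketch of one JSFP-with-inertia decision in Python, assuming the JSFP-2 distribution is the mixture $\alpha\,\beta + (1-\alpha)\,v$ of the three terms described above. Taking $\beta$ uniform over the best responses is an assumption (any distribution supported on the best-response set fits the description), and `jsfp_inertia_step` is a hypothetical name.

```python
import random


def jsfp_inertia_step(prev_action, V, alpha, rng, tol=1e-12):
    """One JSFP-with-inertia action choice for a single player.

    V maps each action to its current expected payoff; alpha is the
    willingness to optimize. Beta is taken to be uniform over the best
    responses (an assumption; any distribution supported on them fits).
    """
    best = max(V.values())
    best_responses = [y for y, v in V.items() if v >= best - tol]
    # JSFP-1: keep the previous action if it is still a best response.
    if prev_action in best_responses:
        return prev_action
    # JSFP-2: sample from alpha * beta + (1 - alpha) * v, where v puts
    # all of its probability on prev_action.
    if rng.random() < alpha:
        return rng.choice(best_responses)
    return prev_action


rng = random.Random(0)
action = jsfp_inertia_step("L", {"L": 0.2, "R": 0.9}, alpha=0.3, rng=rng)
```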

31
JSFP w/ Inertia Converges!
  • In particular, to a Nash equilibrium in
    generalized ordinal potential games
  • Of course, there is no equilibrium selection
    mechanism
  • And not much is known regarding the convergence
    rate
  • But we have shown that JSFP w/ Inertia is a good
    substitute for FP in large games

32
JSFP w/ Inertia Converges!
  • If you want the proof, read the paper; the proof
    is not trivial!

33
The Fading Memory Variant
  • We used the recursion
  • But we could also use the recursion
  • Here, rho is a constant or function less than or
    equal to 1, and it has also been proven that this
    variant generates a process converging to some
    NE.
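A minimal Python sketch, assuming the fading-memory recursion takes the standard exponentially weighted form $V_i^{y_i}(k+1) = (1-\rho)\,V_i^{y_i}(k) + \rho\,U_i(y_i, a_{-i}(k))$ with $0 < \rho \le 1$. With $\rho = \frac{1}{k+1}$ it reduces to the running-average recursion; with a constant $\rho$, older stages receive geometrically decaying weight. The function name is hypothetical.

```python
def fading_memory_update(V, hypothetical_payoffs, rho):
    """V(k+1) = (1 - rho) * V(k) + rho * U_i(y_i, a_{-i}(k)) for every y_i.

    V and hypothetical_payoffs map each action y_i to a number.
    """
    if not 0 < rho <= 1:
        raise ValueError("rho must lie in (0, 1]")
    return {y: (1 - rho) * V[y] + rho * hypothetical_payoffs[y] for y in V}


# Constant rho: recent stages dominate the estimate.
V = fading_memory_update({"a": 2.0, "b": 0.0}, {"a": 5.0, "b": 3.0}, rho=1 / 3)
```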

34
A Routing Example, Revisited
  • We can now apply JSFP w/ Inertia and fading
    memory to the routing problem, and we should
    converge to some NE (routing games are
    generalized ordinal potential games)
  • Simulations suggest that JSFP without inertia
    also works in this case
  • Try it!
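A self-contained Python sketch for trying it: JSFP with inertia and fading memory on a 100-player, 4-road routing game with quadratic road costs. All constants (coefficient range, `RHO`, `ALPHA`, stage count, seed) and the uniform choice among best responses are assumptions, not values from the paper; whether the final check reports an equilibrium within the chosen number of stages depends on them, and the sketch makes no claim about convergence speed. Each player's per-stage work is one cost evaluation per road, with no sum over joint actions.

```python
import random
from collections import Counter

NUM_PLAYERS, NUM_ROADS = 100, 4
RHO = 0.1    # fading-memory weight (assumed value)
ALPHA = 0.3  # willingness to optimize, kept strictly inside (0, 1) (assumed)


def road_cost(coeffs, load):
    a, b, c = coeffs
    return a * load ** 2 + b * load + c


def is_nash(actions, costs, tol=1e-9):
    """True if no player can lower its cost by switching roads alone."""
    loads = Counter(actions)
    for own in set(actions):  # players on the same road face the same choice
        current = road_cost(costs[own], loads[own])
        for r in range(NUM_ROADS):
            if r != own and road_cost(costs[r], loads[r] + 1) < current - tol:
                return False
    return True


def simulate(stages=500, seed=0):
    rng = random.Random(seed)
    # Hypothetical positive quadratic coefficients (a, b, c) for each road.
    costs = [tuple(rng.uniform(0.1, 1.0) for _ in range(3))
             for _ in range(NUM_ROADS)]
    actions = [rng.randrange(NUM_ROADS) for _ in range(NUM_PLAYERS)]
    # V[i][r]: fading-memory average of player i's hypothetical utility
    # (negative cost) for road r against the others' past joint actions.
    V = [[0.0] * NUM_ROADS for _ in range(NUM_PLAYERS)]
    for _ in range(stages):
        loads = Counter(actions)
        for i, own in enumerate(actions):
            for r in range(NUM_ROADS):
                # Load on r if player i used r while the others stayed put.
                load = loads[r] if r == own else loads[r] + 1
                V[i][r] = (1 - RHO) * V[i][r] - RHO * road_cost(costs[r], load)
        new_actions = []
        for i, prev in enumerate(actions):
            best = max(V[i])
            best_responses = [r for r in range(NUM_ROADS)
                              if V[i][r] >= best - 1e-12]
            if prev in best_responses:        # JSFP-1
                new_actions.append(prev)
            elif rng.random() < ALPHA:        # JSFP-2: optimize
                new_actions.append(rng.choice(best_responses))
            else:                             # JSFP-2: inertia
                new_actions.append(prev)
        actions = new_actions
    return actions, costs


actions, costs = simulate()
print("road loads:", sorted(Counter(actions).items()))
print("Nash equilibrium reached:", is_nash(actions, costs))
```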

35
Example of Convergence
36
Conclusion
  • We have demonstrated some weaknesses of FP
    (computational demands, observational demands,
    etc.)
  • We have developed JSFP, which seems to
    accommodate computational limitations