Joint Strategy Fictitious Play

1
Joint Strategy Fictitious Play

Sherwin Doroudi

2
Adapted from

J. R. Marden, G. Arslan, and J. S. Shamma, "Joint
strategy fictitious play with inertia for
potential games," in Proceedings of the 44th IEEE
Conference on Decision and Control, December
2005, pp. 6692-6697.

3
Review Game

Players
Actions
Payoffs

4
Review Game

We then play the game repeatedly in stages,
starting at stage 0. Players can use learning
algorithms as discussed in lecture. Note that
each player knows the structural form of its own
payoff function, but does not know the form of
the other players' payoff functions.

5
Notation Actions

As in the lecture, we use the notation

6
Review Regret Matching

The empirical distribution of play is guaranteed
to converge to the set of Coarse Correlated
Equilibria (CCE) in all games (Hart &
Mas-Colell, 2000).
But a CCE can be quite bad in some cases, since
the set of CCE is a superset of the Nash
Equilibria (NE).

7
Review Fictitious Play (FP)

Observe the empirical frequencies of every
player's actions
Consider best response(s) under the (incorrect)
assumption that the other players play
independently according to their empirical
frequencies
Randomly choose a best response and act
accordingly

8
Empirical Frequency in FP

The empirical frequency for a player and an
action is the fraction of stages, up to the
previous stage, in which the player chose that
action

9
Empirical Frequency in FP

Each player also has an empirical frequency
vector.
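
The definition above can be sketched in a few lines of Python. This is only an illustration: the function name, the list-based history layout, and the road labels are assumptions, not part of the original slides.

```python
from collections import Counter

def empirical_frequency(history, actions):
    """Fraction of past stages in which each action was played.

    history: one player's actions at stages 0..t-1 (hypothetical layout)
    actions: that player's available actions
    """
    t = len(history)
    if t == 0:
        raise ValueError("need at least one past stage")
    counts = Counter(history)
    # The returned dict plays the role of the empirical frequency vector.
    return {a: counts[a] / t for a in actions}

# A player who chose "A" in 3 of the first 4 stages.
q = empirical_frequency(["A", "B", "A", "A"], actions=["A", "B", "C"])
```

Here `q` assigns 0.75 to "A", 0.25 to "B", and 0 to the unused action "C".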

10
Best Response in FP

Each player assumes an expected payoff
And each player chooses a best response from the
set
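
As a rough sketch of the two steps above, the code below computes a player's expected payoff when every other player is assumed to randomize independently according to its own empirical frequency, then collects the maximizing actions. The names `freqs`, `utility`, and the toy payoff `matches` are assumptions made for illustration; the slide's own formula is not reproduced in this transcript.

```python
from itertools import product

def fp_expected_utility(i, own_action, freqs, utility):
    """Expected payoff of own_action for player i under the FP assumption."""
    others = [j for j in range(len(freqs)) if j != i]
    total = 0.0
    # Enumerate every joint action of the other players.
    for combo in product(*(freqs[j].keys() for j in others)):
        prob = 1.0
        for j, a_j in zip(others, combo):
            prob *= freqs[j][a_j]          # independence assumption of FP
        profile = list(combo)
        profile.insert(i, own_action)      # put player i's action back in place
        total += prob * utility(i, tuple(profile))
    return total

def fp_best_responses(i, freqs, utility):
    values = {a: fp_expected_utility(i, a, freqs, utility) for a in freqs[i]}
    best = max(values.values())
    return [a for a, v in values.items() if v == best]

def matches(i, profile):
    # Toy payoff (an assumption): number of other players choosing the same action.
    return sum(1 for j, a in enumerate(profile) if j != i and a == profile[i])

freqs = [{"A": 0.5, "B": 0.5}, {"A": 0.75, "B": 0.25}, {"A": 0.5, "B": 0.5}]
best = fp_best_responses(0, freqs, matches)
```

Note that the loop visits one term per joint action of the other players, so its cost grows exponentially with the number of players.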

11
The Good News!

The empirical frequencies generated by FP
converge to a Nash equilibrium in potential
games (Monderer & Shapley, 1996).

12
The Bad News (if any)?

What are some weaknesses of FP?

13
A Routing Example

Consider a routing game with 100 players all with
the same source and sink
There are 4 roads from the source to the sink
Players want to minimize their cost.

14
A Routing Example

The cost of traveling on each road is given by a
quadratic cost function with positive
coefficients (could be randomly generated)
depending on the number of players choosing that
road
Can we use FP as a learning algorithm in this
example?

15
A Routing Example

Formalizing the game, we have
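
The slide's formal definition is not preserved in this transcript. One plausible way to write the game down, assuming each player's payoff is minus the cost of its chosen road at that road's current load, is sketched here. The coefficient values and the names `ROADS`, `road_cost`, and `utility` are hypothetical.

```python
from collections import Counter

# Hypothetical coefficients (a, b, c) for the cost a*k^2 + b*k + c on each road,
# where k is the number of players on that road.
ROADS = {
    "r1": (1.0, 2.0, 3.0),
    "r2": (0.5, 1.0, 10.0),
    "r3": (2.0, 0.0, 1.0),
    "r4": (0.1, 5.0, 0.0),
}

def road_cost(road, load):
    a, b, c = ROADS[road]
    return a * load**2 + b * load + c

def utility(i, profile):
    """Player i's payoff: minus the cost of its road at the current load."""
    load = Counter(profile)[profile[i]]
    return -road_cost(profile[i], load)

# Three players: two share road r1, one is alone on road r3.
profile = ("r1", "r1", "r3")
u0 = utility(0, profile)   # player 0 shares r1 with one other player
u2 = utility(2, profile)   # player 2 is alone on r3
```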

16
A Routing Example

Remember this?

17
A Routing Example

Remember this?

The sum above is over 4^99 = 2^198 terms!

18
A Routing Example

Remember this?
This is not computationally feasible!

The sum above is over 4^99 = 2^198 terms!
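
The count comes from the fact that each of the other 99 players can be on any of 4 roads, so the FP expectation ranges over every joint action of those 99 players. A quick check of the arithmetic (the variable names are only illustrative):

```python
n_players, n_roads = 100, 4

# One term per joint action of the other n_players - 1 players.
terms = n_roads ** (n_players - 1)

same_as_power_of_two = terms == 2 ** 198
digits = len(str(terms))   # number of decimal digits in the count
```

The count has 60 decimal digits, i.e. it is on the order of 10^59.
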
19
What do we do?

The routing example (which is fairly realistic)
is motivation that we either need to find a more
effective way to compute this utility or we need
to develop an algorithm that is computationally
suitable for large games.

20
Joint Strategy Fictitious Play (JSFP)

Observe empirical frequencies of joint actions
Consider best response(s) under the (still
incorrect) assumption that all other players act
collectively as a group according to their joint
empirical frequency
Randomly choose a best response and act
accordingly

21
Does FP = JSFP?

In the case of two players it is easy to see that
FP and JSFP are the same.

22
Does FP = JSFP?

In the case of two players it is easy to see that
FP and JSFP are the same
But in the case of three or more players this is
not necessarily the case!
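
A small made-up example shows why the two can differ with three players. Suppose the two opponents of player 1 always match each other. The joint empirical frequency used by JSFP records that correlation, while the product of marginals used by FP does not. The history below is invented for illustration.

```python
from collections import Counter

# Hypothetical history of players 2 and 3: they always choose the same action.
history = [("A", "A"), ("B", "B"), ("A", "A"), ("B", "B")]
t = len(history)

# JSFP view: empirical frequency of the opponents' joint action.
joint = {pair: n / t for pair, n in Counter(history).items()}

# FP view: product of each opponent's marginal empirical frequency.
m2 = Counter(a2 for a2, _ in history)
m3 = Counter(a3 for _, a3 in history)
product_view = {
    (a2, a3): (m2[a2] / t) * (m3[a3] / t) for a2 in "AB" for a3 in "AB"
}
```

Under FP, player 1 believes the mismatched pair ("A", "B") occurs a quarter of the time; under JSFP it is assigned frequency zero, because it was never observed.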

23
Empirical Frequency in JSFP

The empirical frequency for an action profile may
be calculated as follows
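
The formula itself is missing from this transcript. A natural reading is that the frequency of an action profile is the fraction of past stages in which exactly that profile was played. A minimal sketch under that assumption (the function name and data are hypothetical):

```python
from collections import Counter

def joint_empirical_frequency(profile_history):
    """Fraction of past stages in which each full action profile occurred."""
    t = len(profile_history)
    if t == 0:
        raise ValueError("need at least one past stage")
    return {profile: n / t for profile, n in Counter(profile_history).items()}

# Four stages of a three-player game.
z = joint_empirical_frequency(
    [("A", "B", "A"), ("A", "B", "A"), ("B", "B", "A"), ("A", "A", "A")]
)
```

Only profiles that were actually played appear in `z`, so the table never has more entries than there were stages.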

24
Expected Payoff in JSFP

Each player assumes an expected payoff

25
Expected Payoff in JSFP

Each player assumes an expected payoff
But this looks about as bad as (maybe worse
than) FP!
So what can we do?

26
Expected Payoff in JSFP

Each player assumes an expected payoff
We rewrite it in a more useful form!

27
The JSFP Payoff Recursion

So now we can rewrite the expected payoff as a
simple recursion and, at every stage, choose an
action that maximizes it (our best response)
Equivalently, we are choosing the action with
maximal regret!
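
The recursion is not reproduced in this transcript. The sketch below assumes the running-average form from the cited paper: the value of an action at stage t + 1 is t / (t + 1) times its value at stage t, plus 1 / (t + 1) times the payoff the action would have earned against what the others did at stage t. The names and the toy payoff `matches` are assumptions.

```python
def jsfp_update(V, t, a_others, utility_i):
    """One step of the running average of hypothetical payoffs.

    V[a] is the average payoff action a would have earned over stages 0..t-1;
    a_others is what the other players did at stage t.
    """
    return {a: (t * v + utility_i(a, a_others)) / (t + 1) for a, v in V.items()}

def matches(a, a_others):
    # Toy payoff (an assumption): number of other players choosing action a.
    return sum(1 for b in a_others if b == a)

others_history = [("A", "A"), ("B", "A"), ("A", "A")]
V = {"A": 0.0, "B": 0.0}
for t, a_others in enumerate(others_history):
    V = jsfp_update(V, t, a_others, matches)

best_response = max(V, key=V.get)
```

Each update touches one number per own action, regardless of how many players or joint actions the game has.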

28
Convergence Properties of JSFP

The convergence properties of JSFP (for games of
three or more players) remain unknown, so this is
an open problem. But once the joint action
generated by JSFP is a strict NE, play stays
there forever. To get convergence properties, we
add inertia to our learning algorithm.

29
JSFP with Inertia

Assume that all NE are strict
JSFP-1: If the action a player chose in the
previous stage is still a best response at the
current stage, choose that action again
JSFP-2: Otherwise, choose an action according to
the distribution

30
The JSFP-2 Distribution

Here the alpha parameter represents the player's
willingness to optimize at a given stage, the
beta term is a distribution whose support is
contained in the set of best responses at this
stage, and the v term is the distribution that
puts all of its weight on the action taken in the
previous stage.
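
Putting JSFP-1 and JSFP-2 together, one stage of the rule might be sketched as follows. The mixture is taken to be alpha times beta plus (1 - alpha) times v, as in the cited paper. Choosing beta to be uniform over the best responses is an assumption made here for simplicity, and the function and argument names are hypothetical.

```python
import random

def jsfp_inertia_step(prev_action, V, alpha, rng=random):
    """Pick the next action from average payoffs V and the previous action."""
    best = max(V.values())
    best_responses = [a for a, v in V.items() if v == best]
    if prev_action in best_responses:
        return prev_action                  # JSFP-1: repeat a best response
    if rng.random() < alpha:
        # JSFP-2, beta part: optimize with probability alpha.
        return rng.choice(best_responses)
    # JSFP-2, v part: inertia, repeat the previous action.
    return prev_action

V = {"A": 1.0, "B": 2.0}
stay = jsfp_inertia_step("B", V, alpha=0.5)        # "B" is already optimal
always_switch = jsfp_inertia_step("A", V, alpha=1.0)
never_switch = jsfp_inertia_step("A", V, alpha=0.0)
```

With alpha strictly between 0 and 1, a player that is not currently best-responding switches only some of the time, which is the inertia that the convergence result relies on.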

31
JSFP w/ Inertia Converges!

Specifically, play converges to some Nash
equilibrium in generalized ordinal potential
games
Of course, there is no equilibrium selection
mechanism
And not much is known regarding the convergence
rate
But we have shown that JSFP w/ Inertia is a good
substitute for FP in large games

32
JSFP w/ Inertia Converges!

If you want the proof, read the paper as the
proof is not trivial!

33
The Fading Memory Variant

We used the recursion
But we could also use the recursion
Here, rho is a constant or function less than or
equal to 1, and it has also been proven that this
variant gives rise to a process converging to
some NE.
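
Neither recursion is reproduced in this transcript. The sketch below assumes the fading-memory form from the cited paper, in which the new value is (1 - rho) times the old value plus rho times the latest hypothetical payoff, with rho positive and at most 1. The names and the toy payoff `matches` are assumptions.

```python
def fading_memory_update(V, rho, a_others, utility_i):
    """Exponentially discounted average: recent stages weigh more."""
    return {a: (1 - rho) * v + rho * utility_i(a, a_others) for a, v in V.items()}

def matches(a, a_others):
    # Toy payoff (an assumption): number of other players choosing action a.
    return sum(1 for b in a_others if b == a)

V = {"A": 0.0, "B": 0.0}
for a_others in [("A", "A"), ("B", "B")]:
    V = fading_memory_update(V, 0.5, a_others, matches)
```

In this example an equal-weight average would rate "A" and "B" the same, while the fading-memory values favor "B" because the others chose it most recently.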

34
A Routing Example, Revisited

We can now apply JSFP w/ Inertia and fading
memory to the routing problem, and we should
converge to some NE (this holds in generalized
ordinal potential games, a class that includes
routing games)
Simulations show that JSFP without inertia should
also work in this case
Try it!
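
For readers who want to try it, here is a small simulation sketch of JSFP with inertia and fading memory on a routing game. It is an illustration rather than the presenter's code: the player count, coefficient ranges, alpha, rho, and all names are assumptions. It makes no claim about how many stages are needed, so the final profile should be checked with `is_pure_nash`.

```python
import random

def simulate(n_players=20, n_roads=4, stages=300, alpha=0.5, rho=0.1, seed=0):
    rng = random.Random(seed)
    # Hypothetical quadratic costs a*k^2 + b*k + c with positive coefficients.
    coef = [tuple(rng.uniform(0.1, 1.0) for _ in range(3)) for _ in range(n_roads)]

    def cost(r, k):
        a, b, c = coef[r]
        return a * k * k + b * k + c

    actions = [rng.randrange(n_roads) for _ in range(n_players)]
    V = [[0.0] * n_roads for _ in range(n_players)]   # fading-memory values

    for _ in range(stages):
        load = [actions.count(r) for r in range(n_roads)]
        new_actions = []
        for i, a_i in enumerate(actions):
            for r in range(n_roads):
                # Load player i would have seen on road r, others held fixed.
                k = load[r] + (0 if r == a_i else 1)
                V[i][r] = (1 - rho) * V[i][r] + rho * (-cost(r, k))
            best = max(V[i])
            best_responses = [r for r in range(n_roads) if V[i][r] == best]
            if a_i in best_responses or rng.random() >= alpha:
                new_actions.append(a_i)                 # JSFP-1 or inertia
            else:
                new_actions.append(rng.choice(best_responses))
        actions = new_actions
    return actions, cost

def is_pure_nash(actions, cost, n_roads):
    """True if no player can lower its cost by switching roads alone."""
    load = [actions.count(r) for r in range(n_roads)]
    return all(cost(a, load[a]) <= cost(r, load[r] + 1)
               for a in actions for r in range(n_roads) if r != a)

final_actions, cost_fn = simulate()
reached_nash = is_pure_nash(final_actions, cost_fn, 4)
```

Each player only needs the road loads from the previous stage, not the other players' individual histories, which is what makes the update cheap even with many players.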

35
Example of Convergence

36
Conclusion

We have demonstrated some weaknesses of FP
(computational demands, observational demands,
etc.)
We have developed JSFP, which seems to
accommodate computational limitations
