Title: Nash Equilibria and Reachability Games
1Nash Equilibria and Reachability Games
- Rupak Majumdar
- University of California, Los Angeles
2Systems and Models
[Diagram: a System (e.g., an aircraft) and its Model (mathematics). Labels on the diagram: Abstract, Build, Test, Calculate, Analyze, Predict.]
3(Qualitative) Systems Theory
- Trajectory: dynamic evolution of state (sequence of states)
- Model: generates a set of trajectories (transition graph)
- Property: assigns boolean values to trajectories (logical formula)
- Algorithm: computes the values of the trajectories generated by a model
Example property: red and green alternate
4Model: Colored Transition Graphs
[Figure: colored transition graph with states a, b, c.]
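5Property: Eventually red
[Figure: colored transition graph with states a, b, c.]
On graphs: $\exists \Diamond \mathit{red}$ holds if some trajectory has the property $\Diamond \mathit{red}$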
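On a finite graph, checking whether some trajectory eventually visits a red state is plain graph reachability. The sketch below is one way to do it with breadth-first search. The three-state graph and its colors are hypothetical stand-ins, since the graph drawn on the slide is not recoverable.

```python
from collections import deque

def exists_eventually(graph, colors, start, target_color):
    """Return True if some trajectory from `start` visits a state colored `target_color`."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if colors[state] == target_color:
            return True
        for succ in graph[state]:
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return False

# Hypothetical graph and coloring (assumed for illustration only).
graph = {"a": ["b"], "b": ["a", "c"], "c": ["c"]}
colors = {"a": "green", "b": "green", "c": "red"}

print(exists_eventually(graph, colors, "a", "red"))
```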
6For qualitative properties over discrete systems, there is a beautiful, robust theory [Büchi, Rabin, Emerson, Pnueli, et al.].
The $\omega$-Regular Properties
- logical characterization (S1S: monadic second-order theory)
- modal characterization (LTL: first-order fragment)
- nondeterministic characterization (Büchi automata)
- deterministic characterization (Rabin automata)
- topological characterization (2.5 Borel levels of the Cantor topology)
- fixpoint characterization ($\mu$-calculus)
- effectively closed under boolean operations
- decidable (S1S: nonelementary, Büchi: linear)
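7Richer Models: Games
[Diagram: starting from a graph, adding FAIRNESS gives an $\omega$-automaton, adding ADVERSARIAL CONCURRENCY gives a game graph, and adding both gives a parity game.]
Game graphs are used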
- for compositional modeling of systems
- for computing winning strategies (control)
8- Two players
- Finite set of states S
- Finite set of actions $\Sigma$
- Action assignments $\Gamma_1, \Gamma_2 : S \to 2^{\Sigma} \setminus \{\emptyset\}$
- Deterministic transition function $\delta(s, a_1, a_2) = t$
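[Figure: game graph with states a, b, c. Edges are labeled with move pairs such as 1,1; 1,2; 2,1; 2,2.]
On games: $\langle\langle \mathit{left} \rangle\rangle \Diamond \mathit{red}$ holds if player "left" has a strategy to enforce $\Diamond \mathit{red}$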
9Strategies
- Deterministic Strategies
- Functions from histories $S^{+}$ to enabled moves
- Given a play $s_0 s_1 \ldots s_k$, strategy $\sigma_i(s_0 s_1 \ldots s_k) = a$ for some $a \in \Gamma_i(s_k)$
10Winning Conditions
- Outcome: sequence of states
- Winning condition: a language $\Psi$ over outcomes
- Player 1's objective: ensure that the outcome is a member of $\Psi$, no matter what player 2 does
11Fundamental Questions
- Fundamental property: determinacy
- The set of states can be partitioned into states where player 1 wins and states where player 2 wins
- Fundamental algorithmic question
- Given a deterministic turn based game and a winning condition, find the set of states from which player 1 can win. Also find a (deterministic) winning strategy.
12One-Step Game
- Regions are sets of states
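- Let $U$ be a set of states
- From where can player 1 surely reach $U$ in one step?
- $\mathrm{CPre}_1(U) = \{\, s \mid \exists a \in \Gamma_1(s).\ \forall b \in \Gamma_2(s).\ \delta(s,a,b) \in U \,\}$
- $\mathrm{CPre}_1$ is a transformer on regions
- Similarly, we can define $\mathrm{CPre}_2$ for player 2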
13Multistep Reachability
- Winning condition: can player 1 eventually reach $P$?
- This is a least fixpoint: $\mu x.\ P \cup \mathrm{CPre}_1(x)$
[Figure: nested regions $P$, $\mathrm{CPre}(P)$, $\mathrm{CPre}^2(P)$, ...]
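The one-step operator and the least fixpoint can be sketched directly on a finite deterministic game. This is a minimal illustration, assuming the game is given as dictionaries. The names `cpre1`, `reach1`, and the four-state game are hypothetical and are not taken from the slides.

```python
def cpre1(states, moves1, moves2, delta, region):
    """States where player 1 has a move a such that, for every move b of
    player 2, the successor delta[(s, a, b)] lies in `region`."""
    return {
        s for s in states
        if any(all(delta[(s, a, b)] in region for b in moves2[s])
               for a in moves1[s])
    }

def reach1(states, moves1, moves2, delta, target):
    """Least fixpoint of x -> target union CPre1(x), iterated from the empty set."""
    x = set()
    while True:
        nxt = set(target) | cpre1(states, moves1, moves2, delta, x)
        if nxt == x:
            return x
        x = nxt

# Hypothetical game: the target is {"c"}; at "d" the players play matching pennies.
states = {"a", "b", "c", "d"}
moves1 = {"a": [1, 2], "b": [1], "c": [1], "d": [1, 2]}
moves2 = {"a": [1, 2], "b": [1, 2], "c": [1], "d": [1, 2]}
delta = {
    ("a", 1, 1): "b", ("a", 1, 2): "b", ("a", 2, 1): "a", ("a", 2, 2): "c",
    ("b", 1, 1): "c", ("b", 1, 2): "c",
    ("c", 1, 1): "c",
    ("d", 1, 1): "c", ("d", 1, 2): "d", ("d", 2, 1): "d", ("d", 2, 2): "c",
}

winning = reach1(states, moves1, moves2, delta, {"c"})
print(sorted(winning))
```

In this assumed game, player 1 can force a visit to c from a, b, and c, but not from d, where every deterministic move can be answered by player 2 so that the play stays in d.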
14Multistep Reachability
- The proof is not yet complete.
- To finish the proof, we must show that we cannot win from the complement
[Figure: nested regions $P$, $\mathrm{CPre}(P)$, $\mathrm{CPre}^2(P)$, ..., with the complement marked "?".]
15More Objectives
- $\omega$-regular objectives
- [Büchi Landweber 69, Gurevich Harrington 82, Emerson Jutla 91] Every two-player game with $\omega$-regular winning conditions is determined.
- [Emerson Jutla 91] Winning states for parity objectives can be computed in NP $\cap$ coNP
- Borel objectives
- [Martin 75] Every two-player game with Borel winning conditions is determined.
16Quantitative Systems Theory
- Trajectory: dynamic evolution of state (sequence of states)
- Model: generates a set of trajectories (game graph)
- Property: assigns real values to trajectories (quantitative logical formula)
- Algorithm: computes the real values of the trajectories generated by a model
Example question: what fraction of paths see red nodes?
17Models with Probability
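[Diagram: models built from a graph. FAIRNESS gives an $\omega$-automaton, ADVERSARIAL CONCURRENCY gives a game graph, and both together give a parity game. Adding PROBABILITIES gives Markov decision processes and stochastic games.]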
18Concurrent Games
- Two players
- Finite set of states S
- Finite set of actions $\Sigma$
- Action assignments $\Gamma_1, \Gamma_2 : S \to 2^{\Sigma} \setminus \{\emptyset\}$
- Probabilistic transition function
- $\delta(s, a_1, a_2)(t) = \Pr[\, t \mid s, a_1, a_2 \,]$
19Concurrent Games
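[Figure: concurrent game graph with states a, b, c. Players "left" and "right" each choose move 1 or 2, and each pair of moves determines a probability distribution over successor states. Distributions shown: (a 0.6, b 0.4), (a 0.5, b 0.5), (a 0.0, c 1.0), (a 0.0, c 1.0), (a 0.1, b 0.9), (a 0.2, b 0.8), (a 0.7, b 0.3), (a 0.0, b 1.0).]
What is the maximal probability with which player "left" can enforce $\Diamond \mathit{red}$ against all randomized strategies of player "right"?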
20Overview of Types of Games
- Turn based, deterministic: tic-tac-toe, control of $\omega$-automata
- Turn based, probabilistic: control of probabilistic I/O automata
- Concurrent, deterministic: matching pennies, rock-paper-scissors, control of synchronous components
- Concurrent, probabilistic: stochastic games, control of general competitive Markov processes
21Overview of Types of Games
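- Turn based, deterministic: $\forall s \in S.\ |\Gamma_1(s)| = 1$ or $|\Gamma_2(s)| = 1$, and $\forall a \in \Gamma_1(s)\ \forall b \in \Gamma_2(s).\ |\delta(s,a,b)| = 1$
- Turn based, probabilistic: $\forall s \in S.\ |\Gamma_1(s)| = 1$ or $|\Gamma_2(s)| = 1$
- Concurrent, deterministic: $\forall a \in \Gamma_1(s)\ \forall b \in \Gamma_2(s).\ |\delta(s,a,b)| = 1$
- Concurrent, probabilistic: (no restriction)
Here $|\delta(s,a,b)| = 1$ means that the move pair $(a,b)$ has a single possible successor state.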
22Concurrent Games Example
[Figure: concurrent game graph. Edges are labeled with move pairs 01, 10 and 00, 11.]
Probability to win with deterministic strategies is 0
Player 1 has a randomized strategy to win with
probability 1/2
Quantitative winning!
23Strategies
- Randomized strategies
- Functions from histories to lotteries over enabled moves
- Given a play $s_0 s_1 \ldots s_k$, strategy $\sigma_i(s_0 s_1 \ldots s_k) = D$ for some distribution $D$ over the enabled moves
- A strategy is memoryless if $\sigma_i(s_0 s_1 \ldots s_k) = \sigma_i(s_k)$
24Winning Conditions Concurrent Games
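- Language $\Psi$ over outcomes
- Value of a game is the maximal probability of ensuring the outcome is in $\Psi$
- $\langle\langle 1 \rangle\rangle \Psi(s) = \sup_{\xi_1} \inf_{\xi_2} \Pr_s^{\xi_1,\xi_2}[\Psi]$, where $\xi_1$ and $\xi_2$ range over the strategies of players 1 and 2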
- (where Y Index set for Y)
25Winning Conditions Concurrent Games
- Fundamental property: determinacy
- For each state $s$: $\langle\langle 1 \rangle\rangle \Psi(s) = 1 - \langle\langle 2 \rangle\rangle \neg\Psi(s)$
- Fundamental algorithmic question: given a concurrent game and a winning condition, find at each state the maximal probability with which player 1 can ensure the winning condition holds
26One-Step Game
- Regions are functions $f : S \to [0,1]$
- Suppose $f$ is a payoff function on states
- From state $s$, players choose actions $a_1$, $a_2$ (simultaneously and independently)
- The next state $\Theta$ is chosen according to the distribution $\delta(s, a_1, a_2)$, and player 1 gets payoff $f(\Theta)$
27One-Step Game
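- Player 1's value: maximal expectation of $f(\Theta)$
- Define the value $\mathrm{Ppre}(f)(s) = \sup_{\xi_1} \inf_{\xi_2} \mathrm{E}_s^{\xi_1,\xi_2}[f(\Theta)]$, where $\xi_1$ and $\xi_2$ range over distributions on the moves enabled at $s$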
28Fundamental Theorem of Zero Sum Games
- Equivalent to zero-sum matrix games
- Value and optimal randomized strategies exist for both players
- Minmax Theorem [von Neumann 28]
- Can be computed by linear programming
- Also shows value for finitely repeated games
- But we are interested in infinite games
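For a 2x2 payoff matrix the value of the zero-sum game has a closed form, so no linear program is needed. The sketch below covers only this special case and is not a general solver. The function name `value_2x2` is hypothetical, and the row player is assumed to maximize.

```python
def value_2x2(m):
    """Value of the zero-sum game with 2x2 payoff matrix m (row player maximizes)."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best pure guarantee for the row player
    minimax = min(max(a, c), max(b, d))  # best pure guarantee for the column player
    if maximin == minimax:
        # Saddle point: pure strategies are optimal.
        return maximin
    # No saddle point: both players randomize, and the value is given by
    # the standard closed form for 2x2 games.
    return (a * d - b * c) / (a + d - b - c)

# Matching pennies with payoffs in {0, 1}: the row player wins when moves match.
print(value_2x2([[1, 0], [0, 1]]))
```

For matching pennies, the best pure guarantee is 0 while the value under randomization is 1/2, which is the gap between deterministic and randomized strategies.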
29Reachability
- Maximal probability of reaching a set U of states
- Can be reduced to positive stochastic games
- Characterizing winning value
- $X_0 = 0$, $X_{n+1} = \max(U, \mathrm{Ppre}(X_n))$, where $U$ is identified with its indicator function
- $X = \lim_{n \to \infty} X_n$
- Correctness is by induction on the n-step game
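The iteration can be sketched on a tiny game. The game below is hypothetical and not taken from the slides: a single non-target state s where both players pick 0 or 1; if the moves match, the play moves to the target u, otherwise it stays in s. The one-step value is computed with the closed form for 2x2 matrix games, which is an assumption that holds only because each player has two moves.

```python
def value_2x2(m):
    """Value of the zero-sum game with 2x2 payoff matrix m (row player maximizes)."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        return maximin                        # saddle point in pure strategies
    return (a * d - b * c) / (a + d - b - c)  # mixed-strategy value

def reach_value(steps):
    """X_n(s) for the assumed game: X_0(s) = 0 and X_{n+1}(s) = Ppre(X_n)(s)."""
    x = 0.0
    for _ in range(steps):
        # Matching moves reach u (payoff 1); otherwise the play stays in s (payoff x).
        x = value_2x2([[1.0, x], [x, 1.0]])
    return x

for n in (1, 2, 3, 10):
    print(n, reach_value(n))
```

In this assumed game each step halves the distance to 1, so the iterates increase towards the limit value 1 without reaching it after finitely many steps.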
30Reachability Example
[Figure: reachability game with states S1, S2, S3, S4. Edges are labeled with move pairs 01, 10 and 00, 11.]
31No optimal strategy Example
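[Figure: concurrent game graph. Edges are labeled with move pairs 01, 10, 00, and 11.]
Probability of winning is 1 (as a supremum)
Player 1 has a randomized strategy to win with probability $1 - \varepsilon$ for every $\varepsilon > 0$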
32More Objectives
- $\omega$-regular objectives
- [de Alfaro M 01] Every two-player concurrent game with $\omega$-regular winning conditions is determined.
- [de Alfaro M 01] Algorithms to approximate the value in 3EXPTIME
- [Chatterjee M Jurdzinski 04] Algorithms to approximate the value of reachability games in NP $\cap$ coNP
- Borel objectives
- [Martin 98, Maitra Sudderth 98] Every two-player concurrent game with Borel winning conditions is determined.
33Reachability Game
[Figure: reachability game with states s, t, u. Edges are labeled a,b.]
$\mathrm{Reach}_u(t) = (-3 + 2\sqrt{5})/5$
34Non Zero Sum Games
- So far, our games had two players
- Player 1's goal was $\Psi$
- Player 2's goal was $\neg\Psi$
- Strictly competitive!
35Non Zero Sum Games
- But systems are not (always) malicious
- Usually player 1 has a goal $\Psi_1$, player 2 has a goal $\Psi_2$
- These goals are not necessarily contradictory
- Each is happy to ensure his own goal
- Such a game is non zero sum
36Simple Example: Ethernet
[Figure: game graph for the Ethernet example. Transitions are labeled with move pairs (s,s), (ns,ns), (n,s), (s,n).]
37History Non Zero Sum Games
- Every finite n-player game has an equilibrium [Nash50]
- Complexity of finding a Nash equilibrium is open [Pap94, Pap01]
- Discounted stochastic n-player games have a Nash equilibrium [Fink64, MertensParthasarathy86]
- 2-player nonzero sum stochastic games with limiting average payoff [Vieille00]
- Closed sets [SudderthSecchi02]
- Open sets (reachability) [ChatterjeeJurdzinskiM03] (this talk)
38One Shot Games
- Games in strategic form
- Bimatrix games
- A matrix of payoffs for each player
- If player 1 plays $a$ and player 2 plays $b$, then
- Player 1 gets $P_1[a,b]$
- Player 2 gets $P_2[a,b]$
39Examples
Chicken
40Nash Equilibrium
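- A pair of strategies $(\sigma_1, \sigma_2)$ is an $\varepsilon$-Nash equilibrium if, for all $\sigma_1'$, $\sigma_2'$:
- $\mathrm{Value}_2(\sigma_1, \sigma_2') \le \mathrm{Value}_2(\sigma_1, \sigma_2) + \varepsilon$
- $\mathrm{Value}_1(\sigma_1', \sigma_2) \le \mathrm{Value}_1(\sigma_1, \sigma_2) + \varepsilon$
- Neither player has an advantage of more than $\varepsilon$ in deviating from the equilibrium strategy
- A 0-Nash equilibrium is called a Nash equilibrium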
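For a bimatrix game the definition can be checked mechanically: against a fixed opponent strategy some pure deviation is always among the best, so it is enough to compare each player's expected payoff with the best pure deviation. The sketch below assumes mixed strategies are given as probability lists. The function names and the payoff numbers for Chicken are hypothetical, since the slide's matrix is not recoverable.

```python
def expected(payoff, x, y):
    """Expected payoff when the row player mixes with x and the column player with y."""
    return sum(x[i] * y[j] * payoff[i][j]
               for i in range(len(x)) for j in range(len(y)))

def is_eps_nash(p1, p2, x, y, eps=0.0, tol=1e-9):
    """Check that neither player gains more than eps by deviating from (x, y)."""
    v1, v2 = expected(p1, x, y), expected(p2, x, y)
    # Best pure deviation of player 1 (rows) and of player 2 (columns).
    best1 = max(sum(y[j] * p1[i][j] for j in range(len(y))) for i in range(len(x)))
    best2 = max(sum(x[i] * p2[i][j] for i in range(len(x))) for j in range(len(y)))
    return best1 <= v1 + eps + tol and best2 <= v2 + eps + tol

# Assumed payoffs for Chicken; action 0 = swerve, action 1 = go straight.
P1 = [[0, -1], [1, -10]]
P2 = [[0, 1], [-1, -10]]

print(is_eps_nash(P1, P2, [0, 1], [1, 0]))          # player 1 straight, player 2 swerves
print(is_eps_nash(P1, P2, [1, 0], [1, 0]))          # both swerve
print(is_eps_nash(P1, P2, [1, 0], [1, 0], eps=1))   # both swerve, tolerance 1
```

The small `tol` only absorbs floating-point rounding when the strategies are properly mixed. It is a design choice of this sketch and not part of the definition.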
41Nash's Theorem
- Theorem: Every bimatrix game has a Nash equilibrium in randomized strategies.
- Proof uses Kakutani's fixpoint theorem
42Nash's Theorem
- Theorem: Every bimatrix game has a Nash equilibrium in randomized strategies.
- Idea of proof: define a mapping
- By Kakutani's fixpoint theorem, there is a fixpoint for this map
- This is a Nash equilibrium point
43Nash's Theorem
- Theorem: Every bimatrix game has a Nash equilibrium in randomized strategies.
- This also shows Nash equilibria exist in finitely repeated games
44Algorithms?
- The proof is existential.
- No polynomial time algorithm to find Nash
equilibria is known for 2 person games!
45Reachability Games
- A non zero sum reachability game consists of
- A concurrent game G
- Two sets of states S1 and S2 of G
- Player 1's goal is to get to $S_1$
- Player 2's goal is to get to $S_2$
- Given strategies $\sigma_1$ and $\sigma_2$, $\mathrm{Value}_i(\sigma_1, \sigma_2)$ is the probability with which the stochastic process visits $S_i$
46Nash Equilibrium in Reachability Games
- Fundamental question: do $\varepsilon$-Nash equilibria exist in nonzero sum reachability games for every $\varepsilon > 0$?
- Does not follow from Nash's Theorem!
- For safety games, the answer is yes [SudderthSecchi02]
- In fact, Nash equilibria exist
- But the reachability case does not follow by duality
- For reachability games, the question was open
47No Nash Equilibrium Example
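[Figure: concurrent game graph. Edges are labeled with move pairs 01, 10, 00, and 11.]
Player 1 has a randomized strategy to win with probability $1 - \varepsilon$ for every $\varepsilon > 0$, but no optimal strategy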
48Main Theorem
- Theorem [ChatterjeeMJurdzinski04]: An n-player nonzero sum reachability game has an $\varepsilon$-Nash equilibrium in memoryless strategies for all $\varepsilon > 0$.
49Idea of proof
- Define $\beta$-discounted games, show memoryless Nash equilibria exist in such games.
- Consider a Nash equilibrium in the $\beta$-discounted reachability game. This equilibrium can be approximated by strategies of a simple form (k-uniform)
- This strategy profile is an $\varepsilon$-Nash equilibrium in the original game for suitable $\beta$.
- This is because if I fix the strategy of player 2, in the resulting MDP, the value is close to the discounted value
- Similarly for player 1
50Discounted Reachability Games
- A $\beta$-discounted reachability game is played as follows.
- At each stage, the game stops with probability $\beta$, and continues with probability $1 - \beta$.
- Theorem: A $\beta$-discounted reachability game has a Nash equilibrium in memoryless strategies.
- The proof is an application of Kakutani's fixpoint theorem
- This is related to Nash equilibria in discounted reward games [Fink64, Sobel71]
51Approximating Strategies
- Let $J$ be a bimatrix game with $n$ players
- Each player has $m$ actions
- A strategy is k-uniform if it is a uniform distribution over a multiset of size $k$
- Let $\sigma$ be a Nash equilibrium profile.
- [LiptonMarkakisMehta03] For every $\varepsilon > 0$ and every $k > 3 n^2 \ln(n^2 m) / \varepsilon^2$, there exists a k-uniform strategy profile $\sigma'$ such that for every action $a$,
- if $\sigma(a) = 0$, then $\sigma'(a) = 0$
- if $\sigma(a) > 0$, then $|\sigma'(a) - \sigma(a)| < \varepsilon$
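To make the notion concrete, the sketch below computes the smallest integer k above the stated bound and builds a k-uniform strategy by sampling a multiset of k actions from a given mixed strategy. The function names are hypothetical. Sampling only produces a candidate: the theorem states that a suitable k-uniform profile exists, not that every sample is within $\varepsilon$ of the original strategy.

```python
import math
import random
from collections import Counter

def lmm_bound(n, m, eps):
    """Smallest integer k with k > 3 n^2 ln(n^2 m) / eps^2."""
    return math.floor(3 * n ** 2 * math.log(n ** 2 * m) / eps ** 2) + 1

def k_uniform(strategy, k, rng):
    """Sample a multiset of k actions from the support of `strategy` and
    return the uniform distribution over that multiset."""
    support = [a for a in strategy if strategy[a] > 0]
    sample = rng.choices(support, weights=[strategy[a] for a in support], k=k)
    counts = Counter(sample)
    # Actions outside the support keep probability 0.
    return {a: counts[a] / k for a in strategy}

rng = random.Random(0)                      # fixed seed, for repeatability only
sigma = {"a": 0.5, "b": 0.5, "c": 0.0}      # assumed mixed strategy
k = lmm_bound(n=2, m=3, eps=0.5)
approx = k_uniform(sigma, k, rng)
print(k, approx)
```

Every probability in the result is a multiple of 1/k, and actions with probability 0 in the original strategy keep probability 0.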
52Markov Decision Processes
- A Markov decision process (MDP) is a one-player game.
- Reachability and discounted reachability are defined on MDPs by restriction from games.
- When we fix the strategies of all but one player $i$, we have an MDP $G_i$.
53Approximating Equilibria in Discounted Games
- For an n-player discounted reachability game $G_\beta$, for every $\varepsilon > 0$, there exists a memoryless strategy profile $\sigma$ such that
- $\sigma$ is an $\varepsilon$-Nash equilibrium profile of $G_\beta$ and
- for every player $i$, the minimum transition probability in the MDP $G_i$ is at least $f(\varepsilon, n, G)$.
54Approximating MDPs
- Let $G$ be an MDP reachability game
- [Condon90] For all $\varepsilon > 0$ there exists a discount factor $\beta$ such that for all states $s \in S$ of the $\beta$-discounted game $G_\beta$ we have
- $|v(s) - v_\beta(s)| < \varepsilon$
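A one-state example shows how the discounted value approaches the undiscounted one as the stopping probability shrinks. The sketch assumes a hypothetical MDP with a single action (a Markov chain): from state s the target is reached with probability p at each stage, otherwise the play stays in s. It also assumes that the stop with probability $\beta$ is drawn before each move. Both the model and the function name are illustrative assumptions.

```python
def discounted_reach(p, beta, iterations=10_000):
    """Fixed-point iteration for v = (1 - beta) * (p + (1 - p) * v)."""
    v = 0.0
    for _ in range(iterations):
        # With probability 1 - beta the game continues: the target is hit with
        # probability p, otherwise the play remains in s with value v.
        v = (1 - beta) * (p + (1 - p) * v)
    return v

p = 0.5
undiscounted = discounted_reach(p, beta=0.0)   # converges to 1 for p > 0
for beta in (0.1, 0.01, 0.001):
    print(beta, undiscounted - discounted_reach(p, beta))
```

In this chain the gap between the two values shrinks as $\beta$ decreases, so any target $\varepsilon$ can be met by choosing $\beta$ small enough.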
55Complexity
- Can approximate an $\varepsilon$-Nash equilibrium to within a given tolerance in NP, for constant $\varepsilon$ and constant tolerance
- Guess the memoryless (k-uniform) strategy profiles
- Solve the MDPs after fixing all but one player's strategies
- Payoffs can be irrational, so we can only hope to approximate
56More Objectives
- Fundamental open question: is there a nonzero sum version of Martin's Theorem for concurrent games?
- Don't know even for
- Mixed safety and reachability objectives
- Likely to be hard problems
57Turn Based Games
- Theorem [ChatterjeeMJurdzinski04]
- n-player turn based probabilistic games with Borel payoffs have $\varepsilon$-Nash equilibria in deterministic strategies.
- n-player turn based deterministic games with Borel payoffs have Nash equilibria in deterministic strategies.
58Trick with Deterministic Strategies
- For an n-player game where player $i$ has objective $\Psi_i$
- Consider the zero sum game of player $i$ with objective $\Psi_i$ against all other players with objective $\neg\Psi_i$
- Suppose this zero sum game has a deterministic winning strategy $\sigma_i$ for $i$ and $\tau_i$ for all the others
- Nash equilibrium
- Every player $i$ plays $\sigma_i$ from above.
- As soon as some player $i$ deviates, all the other players punish by switching to $\tau_i$
- Deterministic strategies are necessary to observe deviations
- Folk result? [ThuijsmanRaghavan97]
59Turn Based Games
- A careful study of Martin's determinacy proof shows that we can construct $\varepsilon$-optimal deterministic strategies for turn based probabilistic games
- And optimal pure strategies for deterministic turn based games
60Las Vegas Game
[Figure: Las Vegas game graph with labels Work, Go to Vegas, Play again (probability 1/2), Jackpot, and "Sorry you lose".]
61Las Vegas Game
- For every $\varepsilon > 0$, the Las Vegas game has a $(1-\varepsilon)$-optimal winning strategy
- For $\varepsilon = 1/2^n$, work for $n$ days before heading to Vegas
- But no optimal winning strategy
- The winning condition is not $\omega$-regular
- Number of times you are allowed to play is the number of days you have worked
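The arithmetic behind the $(1-\varepsilon)$-optimal strategies can be sketched as follows, assuming that each play in Vegas hits the jackpot independently with probability 1/2 and that n days of work buy exactly n plays. Both function names are hypothetical.

```python
def win_probability(days_worked):
    """Probability of at least one jackpot in `days_worked` independent plays,
    each won with probability 1/2 (an assumption read off the figure)."""
    return 1 - 0.5 ** days_worked

def days_needed(eps):
    """Smallest n such that working n days wins with probability at least 1 - eps."""
    n = 0
    while 0.5 ** n > eps:
        n += 1
    return n

for n in (1, 3, 10):
    print(n, win_probability(n))
print(days_needed(1 / 8))
```

Under these assumptions every finite number of working days leaves a positive losing probability, which is why the strategies are only $(1-\varepsilon)$-optimal.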
62$\omega$-Regular?
- The Las Vegas game is not $\omega$-regular
- For $\omega$-regular games, optimal deterministic winning strategies exist [ChatterjeeJurdzinskiHenzinger04]
- Thus, turn based nonzero sum games with $\omega$-regular objectives have pure Nash equilibria.
- For parity conditions, we can compute the value profile of some Nash equilibrium in NP
63Credits
- Work done in collaboration with
- Luca de Alfaro. Quantitative solution of concurrent games, STOC01
- Krishnendu Chatterjee and Marcin Jurdzinski. On Nash equilibria in stochastic games, CSL04
64Thank You!
- http://www.cs.ucla.edu/rupak