Title: Computing Nash Equilibrium
1Computing Nash Equilibrium
2Outline
- Problem Definition
- Notation
- Last week: Zero-Sum game
- This week
- Zero Sum: Online algorithm
- General Sum Games
- Multiple players: approximate Nash
- 2 players: exact Nash
3Model
- Multiple players $N = \{1, \dots, n\}$
- Strategy set
- Player $i$ has $m$ actions $S_i = \{s_{i1}, \dots, s_{im}\}$
- $S_i$ are the pure actions of player $i$
- $S = \times_i S_i$
- Payoff functions
- Player $i$: $u_i : S \to \mathbb{R}$
4Strategies
- Pure strategies = actions
- Mixed strategy
- Player $i$: $p_i$, a distribution over $S_i$
- Game: $P = \times_i p_i$
- Product distribution
- Modified distribution
- $P_{-i}$: the distribution $P$ without player $i$
- $(q, P_{-i})$: player $i$ plays $q$, every other player $j$ plays $p_j$
5Notations
- Average Payoff
- Player $i$: $u_i(P) = E_{s \sim P}[u_i(s)] = \sum_s P(s)\, u_i(s)$
- $P(s) = \prod_i p_i(s_i)$
- Nash Equilibrium
- $P$ is a Nash Eq. if for every player $i$
- and for any distribution $q_i$
- $u_i(q_i, P_{-i}) \le u_i(P)$
- Best Response
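The definitions above can be checked numerically on a small game. The sketch below is only an illustration: the helper names `expected_payoff` and `is_nash`, the callable `u(i, s)`, the tolerance, and the matching-pennies example are assumptions, not part of the slides. Because $u_i(q_i, P_{-i})$ is linear in $q_i$, it is enough to test deviations to pure actions.

```python
from itertools import product
from math import prod

def expected_payoff(u, P, i):
    """u_i(P) = sum_s P(s) * u_i(s), where P(s) = prod_j p_j(s_j).

    u(i, s) is the payoff of player i at the pure profile s (a tuple of
    action indices); P is a list with one distribution (a list) per player.
    """
    total = 0.0
    for s in product(*(range(len(p)) for p in P)):
        prob = prod(p[a] for p, a in zip(P, s))
        total += prob * u(i, s)
    return total

def is_nash(u, P, tol=1e-9):
    """True if no player gains more than tol by deviating to a pure action."""
    for i, p in enumerate(P):
        base = expected_payoff(u, P, i)
        for a in range(len(p)):
            pure = [1.0 if b == a else 0.0 for b in range(len(p))]
            deviation = P[:i] + [pure] + P[i + 1:]   # the profile (q, P_{-i})
            if expected_payoff(u, deviation, i) > base + tol:
                return False
    return True

def pennies(i, s):
    """Matching pennies (illustrative): player 0 wins on a match."""
    sign = 1 if s[0] == s[1] else -1
    return sign if i == 0 else -sign

uniform = [[0.5, 0.5], [0.5, 0.5]]
print(is_nash(pennies, uniform))                    # uniform play
print(is_nash(pennies, [[1.0, 0.0], [1.0, 0.0]]))   # a pure profile
```

The enumeration over all pure profiles is exponential in the number of players, so this is only meant for toy examples.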
6Two player games
- Payoff matrices (A,B)
- m rows and n columns
- player 1 has m action, player 2 has n actions
- strategies p and q
- Payoffs $u_1(p,q) = p A q^T$ and $u_2(p,q) = p B q^T$
- Zero sum game
- $A = -B$
7Online learning
- Playing with unknown payoff matrix
- Online algorithm
- at each step selects an action.
- can be stochastic or fractional
- Observes all possible payoffs
- Updates its parameters
- Goal: achieve the value of the game
- The payoff matrix of the game is defined only at the end
8Online learning - Algorithm
- Notations
- Opponent distribution Qt
- Our distribution Pt
- Observed cost $M(i, Q_t)$
- Strictly, $M(i, Q_t)$ is the $i$-th entry of $M Q_t$, and $M(P_t, Q_t) = P_t M Q_t$
- costs in $[0,1]$
- Goal: minimize cost
- Algorithm: Exponential weights
- Action $i$ has weight proportional to $b^{L(i,t)}$
- $L(i,t)$: loss of action $i$ until time $t$
9Online algorithm Notations
- Formally
- Number of total steps T is known
- parameter $b$, $0 < b < 1$
- $w_{t+1}(i) = w_t(i)\, b^{M(i,Q_t)}$
- $Z_t = \sum_i w_t(i)$
- $P_{t+1}(i) = w_{t+1}(i) / Z_{t+1}$
- Initially, $P_1(i) > 0$ for every $i$
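The update rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `exponential_weights`, the `opponent(t, P)` callback, and the uniform starting distribution are choices made here, not taken from the slides. The distribution is renormalized every round, which gives the same $P_t$ as the weight recursion while avoiding numerical underflow.

```python
def exponential_weights(M, opponent, T, b):
    """Run T rounds of the exponential-weights update.

    M[i][j] in [0, 1] is the cost of our action i against opponent action j.
    opponent(t, P) returns the opponent's distribution Q_t (hypothetical hook).
    Returns the final distribution and the average cost per round.
    """
    n = len(M)
    P = [1.0 / n] * n                     # P_1(i) > 0 for every i
    total = 0.0
    for t in range(T):
        Q = opponent(t, P)
        # cost[i] = M(i, Q_t), the i-th entry of M Q_t
        cost = [sum(M[i][j] * Q[j] for j in range(len(Q))) for i in range(n)]
        total += sum(P[i] * cost[i] for i in range(n))   # M(P_t, Q_t)
        w = [P[i] * b ** cost[i] for i in range(n)]      # multiply by b^{M(i,Q_t)}
        Z = sum(w)
        P = [x / Z for x in w]                           # normalize
    return P, total / T

# Illustrative run: the opponent always plays its first action.
M = [[0.0, 1.0],
     [1.0, 0.0]]
P, avg = exponential_weights(M, lambda t, P: [1.0, 0.0], T=50, b=0.5)
print(P, avg)
```

Against this fixed opponent the weight shifts to the action with zero cost, so the average cost falls as $T$ grows.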
10Online algorithm Theorem
- Theorem
- For any matrix $M$ with entries in $[0,1]$
- Any sequence of distributions $Q_1, \dots, Q_T$
- The algorithm generates $P_1, \dots, P_T$
- The bound uses the relative entropy $RE(A \,\|\, B) = E_{x \sim A}[\ln(A(x)/B(x))]$
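The inequality itself did not survive on this slide. For reference, and as an assumption about what was shown, the standard guarantee for this multiplicative-weights update (due to Freund and Schapire) has the following form, with $b$ the parameter of the algorithm and $P$ ranging over mixed strategies:

```latex
\sum_{t=1}^{T} M(P_t, Q_t)
  \;\le\; \min_{P} \left[ a_b \sum_{t=1}^{T} M(P, Q_t) + c_b \, RE(P \,\|\, P_1) \right],
\qquad
a_b = \frac{\ln(1/b)}{1-b}, \quad c_b = \frac{1}{1-b}.
```

With a uniform $P_1$ over $n$ actions, $RE(P \,\|\, P_1) \le \ln n$ for every $P$.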
11Relative Entropy
- For any two distributions A and B
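- $RE(A \,\|\, B) = E_{x \sim A}[\ln(A(x)/B(x))]$
- can be infinite
- $B(x) = 0$ and $A(x) \neq 0$
- Always non-negative
- log is concave
- $\sum_i a_i \log b_i \le \log \sum_i a_i b_i$ (weights $a_i \ge 0$ summing to 1)
- $\sum_x A(x) \ln \frac{B(x)}{A(x)} \le \ln \sum_x A(x) \frac{B(x)}{A(x)} = \ln \sum_x B(x) \le 0$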
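The two properties listed above (possibly infinite, never negative) are easy to see in code. The function below is a small sketch; the name `relative_entropy` and the convention that terms with $A(x) = 0$ contribute zero are assumptions made for the illustration.

```python
import math

def relative_entropy(A, B):
    """RE(A || B) = sum_x A(x) * ln(A(x) / B(x)) for distributions given as lists."""
    total = 0.0
    for a, b in zip(A, B):
        if a == 0:
            continue            # convention: 0 * ln(0 / b) = 0
        if b == 0:
            return math.inf     # B(x) = 0 while A(x) != 0
        total += a * math.log(a / b)
    return total

print(relative_entropy([0.5, 0.5], [0.5, 0.5]))   # identical distributions
print(relative_entropy([0.5, 0.5], [0.9, 0.1]))   # strictly positive
print(relative_entropy([0.5, 0.5], [1.0, 0.0]))   # infinite
```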
12Online algorithm Analysis
- Lemma
- For any mixed strategy P
- Corollary
13Online Algorithm Optimization
- $b = 1/(1 + \sqrt{2 (\ln n)/T})$
- additional loss
- $O(\sqrt{(\ln n)/T})$
- Zero sum game
- Average loss at most $v$ (the value of the game) plus the additional loss
- additional loss $O(\sqrt{(\ln n)/T})$
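To make the tuning concrete, the sketch below runs the update with this choice of $b$ against an opponent that best-responds every round, and compares the average loss with the value of the game. Everything named here is an assumption for illustration: the helper names, the rock-paper-scissors cost matrix (value $1/2$), and the explicit constant $\sqrt{2 (\ln n)/T} + (\ln n)/T$, which is the usual form of the $O(\sqrt{(\ln n)/T})$ term in the Freund-Schapire analysis.

```python
import math

def tuned_b(n, T):
    """b = 1 / (1 + sqrt(2 ln n / T)) for n actions and T rounds."""
    return 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / T))

def additional_loss_bound(n, T):
    """Assumed explicit form of the O(sqrt(ln n / T)) term."""
    return math.sqrt(2.0 * math.log(n) / T) + math.log(n) / T

def average_loss_vs_best_response(M, T):
    """Average cost when the opponent picks the worst column for P_t each round."""
    n, k = len(M), len(M[0])
    b = tuned_b(n, T)
    P = [1.0 / n] * n
    total = 0.0
    for _ in range(T):
        col_cost = [sum(P[i] * M[i][j] for i in range(n)) for j in range(k)]
        j = max(range(k), key=col_cost.__getitem__)   # adversarial column
        total += col_cost[j]
        w = [P[i] * b ** M[i][j] for i in range(n)]
        Z = sum(w)
        P = [x / Z for x in w]
    return total / T

# Rock-paper-scissors costs for the row player: win 0, tie 1/2, lose 1.
RPS = [[0.5, 1.0, 0.0],
       [0.0, 0.5, 1.0],
       [1.0, 0.0, 0.5]]
T = 1000
print(average_loss_vs_best_response(RPS, T), 0.5 + additional_loss_bound(3, T))
```

Since the opponent best-responds, the average loss cannot drop below the value $v = 1/2$; the guarantee says it also cannot exceed $v$ by more than the additional loss.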
14Example Zero Sum
15Two players General sum games
- Input matrices (A,B)
- No unique value
- Computational issues
- find some Nash,
- all Nash
- Can be exponentially many
- identity matrix
- Example 2xN
16Computational Complexity
- Complexity of finding a sample equilibrium is unknown
- no proof of NP-completeness seems possible (Papadimitriou, 94)
- Equilibria with certain properties are NP-Hard
- e.g., max-payoff, max-support
- (Even) for symmetric 2-player games
- ∃ NE with expected social welfare at least k?
- ∃ NE with least payoff at least k?
- ∃ Pareto-optimal NE?
- ∃ NE with player 1 EU of at least k?
- ∃ multiple NE?
- ∃ NE where player 1 plays (or not) a particular strategy?
- (Gilboa & Zemel; Conitzer & Sandholm)
17Two players General sum games
- player 1 best response
- Like for zero sum
- Fix strategy q of player 2
- maximize $p (A q^T)$ such that $\sum_j p_j = 1$ and $p_j \ge 0$
- dual LP: minimize $u$ such that $u \ge A q^T$ (the scalar $u$ bounds every entry)
- Strong Duality: $p (A q^T) = u = p\,u$
- $p (u - A q^T) = 0$
- complementary system
- Player 2: $(v - p B)\, q^T = 0$
18Nash Linear Complementary System
- Find distributions p and q and values u and v
- $u \ge A q^T$
- $v \ge p B$
- $p (u - A q^T) = 0$
- $(v - p B)\, q^T = 0$
- $\sum_j p_j = 1$ and $p_j \ge 0$
- $\sum_j q_j = 1$ and $q_j \ge 0$
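A candidate pair $(p, q)$ can be tested against this system directly. In the sketch below the function name `lcp_residuals` and the 3x2 payoff matrices are illustrative assumptions (the slides' own matrices did not survive). Since $p$ and $q$ are distributions, any feasible $u$ and $v$ that satisfy complementarity must equal the largest entries of $A q^T$ and $p B$, so those are the values used.

```python
from fractions import Fraction as F

def lcp_residuals(A, B, p, q):
    """Return (u, v, p(u - Aq^T), (v - pB)q^T) for the tightest feasible u, v."""
    m, n = len(A), len(A[0])
    Aq = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    pB = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    u, v = max(Aq), max(pB)          # smallest u >= Aq^T and v >= pB
    comp_p = sum(p[i] * (u - Aq[i]) for i in range(m))
    comp_q = sum(q[j] * (v - pB[j]) for j in range(n))
    return u, v, comp_p, comp_q

# Illustrative 3x2 game (assumed payoffs).
A = [[0, 6], [2, 5], [3, 3]]
B = [[1, 0], [0, 2], [4, 3]]

# Both residuals vanish exactly when (p, q) is a Nash equilibrium.
print(lcp_residuals(A, B, [F(0), F(1, 3), F(2, 3)], [F(2, 3), F(1, 3)]))
print(lcp_residuals(A, B, [F(1), F(0), F(0)], [F(1), F(0)]))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the complementarity conditions can be compared with zero without a tolerance.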
19Two players General sum games
- Assume the supports of the strategies are known.
- p has support Sp and q has support Sq
- Can then formulate the Nash conditions as an LP
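Once the supports are fixed, the remaining conditions are linear in $p$, $q$, $u$ and $v$. The sketch below is one way to exploit that, under stated assumptions: instead of calling an LP solver it solves the indifference equalities exactly by Gauss-Jordan elimination, it only tries supports of equal size (enough for non-degenerate games), and all function names and the 3x2 payoff matrices are illustrative rather than taken from the slides.

```python
from fractions import Fraction
from itertools import combinations

def solve(M, rhs):
    """Exact Gauss-Jordan elimination; returns None for a singular system."""
    n = len(M)
    T = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(M, rhs)]
    for c in range(n):
        r = next((r for r in range(c, n) if T[r][c] != 0), None)
        if r is None:
            return None
        T[c], T[r] = T[r], T[c]
        piv = T[c][c]
        T[c] = [v / piv for v in T[c]]
        for i in range(n):
            if i != c and T[i][c] != 0:
                f = T[i][c]
                T[i] = [a - f * b for a, b in zip(T[i], T[c])]
    return [row[-1] for row in T]

def nash_for_supports(A, B, Sp, Sq):
    """Solve for (p, q) with supports Sp, Sq (equal size); None if infeasible."""
    m, n, k = len(A), len(A[0]), len(Sp)
    rhs = [0] * k + [1]
    # Rows in Sp all pay u against q; columns in Sq all pay v against p.
    sol_q = solve([[A[i][j] for j in Sq] + [-1] for i in Sp] + [[1] * k + [0]], rhs)
    sol_p = solve([[B[i][j] for i in Sp] + [-1] for j in Sq] + [[1] * k + [0]], rhs)
    if sol_q is None or sol_p is None:
        return None
    p, q = [Fraction(0)] * m, [Fraction(0)] * n
    for i, val in zip(Sp, sol_p):
        p[i] = val
    for j, val in zip(Sq, sol_q):
        q[j] = val
    u, v = sol_q[-1], sol_p[-1]
    if any(p[i] <= 0 for i in Sp) or any(q[j] <= 0 for j in Sq):
        return None
    # No action outside the support may do better than u (resp. v).
    row_ok = all(sum(A[i][j] * q[j] for j in range(n)) <= u for i in range(m))
    col_ok = all(sum(p[i] * B[i][j] for i in range(m)) <= v for j in range(n))
    return (p, q) if row_ok and col_ok else None

def support_enumeration(A, B):
    """Try every pair of equal-size supports and collect the equilibria found."""
    m, n = len(A), len(A[0])
    found = []
    for k in range(1, min(m, n) + 1):
        for Sp in combinations(range(m), k):
            for Sq in combinations(range(n), k):
                eq = nash_for_supports(A, B, Sp, Sq)
                if eq is not None:
                    found.append(eq)
    return found

A = [[0, 6], [2, 5], [3, 3]]   # illustrative payoffs (assumed)
B = [[1, 0], [0, 2], [4, 3]]
for p, q in support_enumeration(A, B):
    print([str(x) for x in p], [str(x) for x in q])
```

Each support pair costs polynomial time, but the number of pairs is exponential, which is why knowing the supports is the hard part.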
20Approximate Nash
- Assume we are given Nash
- strategies (p,q)
- Show that there exists
- small support
- epsilon-Nash
- Brute force search
- enumerate all small supports!
- Each one requires only poly. time
- Proof!
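The brute-force search can be sketched as follows. The assumptions are made explicit: "small support" is modelled here by strategies that are uniform over a multiset of $k$ actions, $\epsilon$-Nash means no player gains more than $\epsilon$ by deviating, payoffs are taken in $[0,1]$, and the function names and the matching-pennies example are illustrative. The slides do not state how small $k$ can be taken, so it is left as a parameter.

```python
from itertools import combinations_with_replacement

def uniform_mixes(num_actions, k):
    """All strategies that are uniform over a multiset of k actions."""
    for multiset in combinations_with_replacement(range(num_actions), k):
        yield [multiset.count(a) / k for a in range(num_actions)]

def is_eps_nash(A, B, p, q, eps):
    """True if neither player can gain more than eps by deviating."""
    m, n = len(A), len(A[0])
    Aq = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    pB = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    u1 = sum(p[i] * Aq[i] for i in range(m))
    u2 = sum(pB[j] * q[j] for j in range(n))
    return max(Aq) <= u1 + eps and max(pB) <= u2 + eps

def small_support_eps_nash(A, B, k, eps):
    """Enumerate all pairs of k-uniform strategies and keep the eps-Nash ones."""
    m, n = len(A), len(A[0])
    return [(p, q)
            for p in uniform_mixes(m, k)
            for q in uniform_mixes(n, k)
            if is_eps_nash(A, B, p, q, eps)]

# Matching pennies with payoffs in [0, 1] (illustrative).
A = [[1, 0], [0, 1]]
B = [[0, 1], [1, 0]]
print(small_support_eps_nash(A, B, k=2, eps=0.01))
```

Each candidate pair is checked in polynomial time; the cost of the search comes from the number of multisets, which grows with $k$.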
21Nash Linear Complementary System
- Find distributions p and q and values u and v
- $u \ge A q^T$
- $v \ge p B$
- $p (u - A q^T) = 0$
- $(v - p B)\, q^T = 0$
- $\sum_j p_j = 1$ and $p_j \ge 0$
- $\sum_j q_j = 1$ and $q_j \ge 0$
22Lemke Howson
- Define labeling
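- For strategy $p$ (player 1)
- Label $i$ if $p_i = 0$, where $i$ is an action of player 1
- Label $j$ if action $j$ (player 2) is a best response to $p$: $b_j p \ge b_k p$ for every $k$ ($b_j$ is column $j$ of $B$)
- Similar for player 2
- Label $j$ if $q_j = 0$, where $j$ is an action of player 2
- Label $i$ if action $i$ (player 1) is a best response to $q$: $a_i q \ge a_k q$ for every $k$ ($a_i$ is row $i$ of $A$)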
23LM algo
- strategy (p,q) is Nash if and only if
- each label k is a label of p or of q (or both)
- Proof!
- Example
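The labeling and the completeness test can be written down directly. In this sketch the numbering convention (labels $1..m$ for player 1's actions, $m+1..m+n$ for player 2's), the function names, and the 3x2 payoff matrices are assumptions for illustration; the slides' own matrices did not survive.

```python
from fractions import Fraction as F

def labels_p(B, p):
    """Labels of p: unplayed actions of player 1, plus best responses of player 2."""
    m, n = len(B), len(B[0])
    pay = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    unplayed = {i + 1 for i in range(m) if p[i] == 0}
    best = {m + j + 1 for j in range(n) if pay[j] == max(pay)}
    return unplayed | best

def labels_q(A, q):
    """Labels of q: unplayed actions of player 2, plus best responses of player 1."""
    m, n = len(A), len(A[0])
    pay = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    unplayed = {m + j + 1 for j in range(n) if q[j] == 0}
    best = {i + 1 for i in range(m) if pay[i] == max(pay)}
    return unplayed | best

def completely_labeled(A, B, p, q):
    """True if every label 1..m+n appears on p or on q."""
    m, n = len(A), len(A[0])
    return labels_p(B, p) | labels_q(A, q) == set(range(1, m + n + 1))

A = [[0, 6], [2, 5], [3, 3]]   # illustrative payoffs (assumed)
B = [[1, 0], [0, 2], [4, 3]]
p, q = [F(0), F(1, 3), F(2, 3)], [F(2, 3), F(1, 3)]
print(sorted(labels_p(B, p)), sorted(labels_q(A, q)), completely_labeled(A, B, p, q))
```

Exact fractions are used so that ties between best responses are detected reliably. A missing label $k$ means action $k$ is played with positive probability without being a best response, which is exactly a profitable deviation.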
24Lemke-Howson Example
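[Figure: graphs G1 and G2 for the example. Three-action strategies shown: (1,0,0), (0,1,0), (0,0,1), (2/3,1/3,0), (0,1/3,2/3). Two-action strategies shown: (1,0), (0,1), (1/3,2/3), (2/3,1/3). Regions carry labels 1-5; further annotations a1-a5, U1, U2.]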
25Lemke-Howson Example
[Figure: same diagram as on the previous slide (graphs G1 and G2, labels 1-5, annotations a1-a5, U1, U2).]
26LM non-degenerate
- A two-player game is non-degenerate if
- given a strategy (p or q)
- with support of size k
- there are at most k pure best responses
- Many equivalent definitions
- Theorem: for a non-degenerate game
- finite number of p with m labels
- finite number of q with n labels
27LM Graphs
- Consider distributions where
- player 1 has m labels
- player 2 has n labels
- Graph (per player)
- join nodes that share all but 1 label
- Product graph
- nodes are pairs of nodes (p,q)
- edges: if (p,p') is an edge then (p,q)-(p',q) is an edge (similarly for q)
28LM
- completely labeled node
- node that has all $m+n$ labels
- Nash!
- node k-almost completely labeled
- all labels but label k.
- edge k-almost completely labeled
- all labels on both sides except label k
- artificial node (0,0)
29LM Paths
- Any Nash Eq.
- connected to exactly one vertex which is
- k-almost completely labeled
- Any k-almost completely labeled node
- has two neighbors in the graph
- Follows from the non-degeneracy!
30LM algo
- start at (0,0)
- drop label k
- follow a path
- end of the path is a Nash
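The path-following steps above can be sketched as a pair of tableaus with complementary pivoting. This is a minimal illustration, not a robust implementation, and several things are assumptions: labels are numbered from 0 ($0..m-1$ for player 1, $m..m+n-1$ for player 2), payoffs are shifted to be positive (which does not change the equilibria), the game is non-degenerate so the minimum-ratio test has no ties, and the function names and the 3x2 example matrices are made up for the illustration.

```python
from fractions import Fraction

def pivot(T, basis, col):
    """Bring column `col` into the basis (minimum-ratio rule); return the leaving label."""
    rows = [r for r in range(len(T)) if T[r][col] > 0]
    r = min(rows, key=lambda i: T[i][-1] / T[i][col])
    piv = T[r][col]
    T[r] = [x / piv for x in T[r]]
    for i in range(len(T)):
        if i != r and T[i][col] != 0:
            f = T[i][col]
            T[i] = [a - f * b for a, b in zip(T[i], T[r])]
    leaving, basis[r] = basis[r], col
    return leaving

def lemke_howson(A, B, k=0):
    """Start at the artificial node (0,0), drop label k, follow the path to a Nash eq."""
    m, n = len(A), len(A[0])
    shift = 1 - min(min(map(min, A)), min(map(min, B)))   # make all payoffs >= 1
    A = [[Fraction(a + shift) for a in row] for row in A]
    B = [[Fraction(b + shift) for b in row] for row in B]
    one, zero = Fraction(1), Fraction(0)
    # Player 1's polytope: B^T x + s = 1. Column index = label (x: 0..m-1, s: m..m+n-1).
    TP = [[B[i][j] for i in range(m)] + [one if c == j else zero for c in range(n)] + [one]
          for j in range(n)]
    # Player 2's polytope: r + A y = 1. Column index = label (r: 0..m-1, y: m..m+n-1).
    TQ = [[one if c == i else zero for c in range(m)] + [A[i][j] for j in range(n)] + [one]
          for i in range(m)]
    basis_P, basis_Q = list(range(m, m + n)), list(range(m))
    entering, in_P = k, k < m
    while True:
        leaving = pivot(TP, basis_P, entering) if in_P else pivot(TQ, basis_Q, entering)
        if leaving == k:            # the dropped label is picked up again: done
            break
        entering, in_P = leaving, not in_P   # duplicate label: drop it on the other side
    x, y = [zero] * m, [zero] * n
    for r, lab in enumerate(basis_P):
        if lab < m:
            x[lab] = TP[r][-1]
    for r, lab in enumerate(basis_Q):
        if lab >= m:
            y[lab - m] = TQ[r][-1]
    return [v / sum(x) for v in x], [v / sum(y) for v in y]

A = [[0, 6], [2, 5], [3, 3]]   # illustrative payoffs (assumed)
B = [[1, 0], [0, 2], [4, 3]]
for k in range(5):
    p, q = lemke_howson(A, B, k)
    print(k, [str(v) for v in p], [str(v) for v in q])
```

Different choices of the dropped label can end at different equilibria, but not every equilibrium is necessarily reachable from the artificial node.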
31Lemke-Howson Algorithm
[Figure: the example's G1/G2 diagram. Three-action strategies: (1,0,0), (0,1,0), (0,0,1), (2/3,1/3,0), (0,1/3,2/3). Two-action strategies: (1,0), (0,1), (1/3,2/3), (2/3,1/3). Labels 1-5, annotations a1-a5.]
32Lemke-Howson Algorithm
[Figure: the example's G1/G2 diagram. Three-action strategies: (1,0,0), (0,1,0), (0,0,1), (2/3,1/3,0), (0,1/3,2/3). Two-action strategies: (1,0), (0,1), (1/3,2/3), (2/3,1/3). Labels 1-5, annotations a1-a5.]
33Lemke-Howson Algorithm
[Figure: the example's G1/G2 diagram. Three-action strategies: (1,0,0), (0,1,0), (0,0,1), (2/3,1/3,0), (0,1/3,2/3). Two-action strategies: (1,0), (0,1), (1/3,2/3), (2/3,1/3). Labels 1-5, annotations a1-a5.]
34Lemke-Howson Other Equilibria
[Figure: the example's G1/G2 diagram. Three-action strategies: (1,0,0), (0,1,0), (0,0,1), (2/3,1/3,0), (0,1/3,2/3). Two-action strategies: (1,0), (0,1), (1/3,2/3), (2/3,1/3). Labels 1-5, annotations a1-a5.]
35LM Theorem
- Consider a non-degenerate game
- Graph consists of disjoint paths and cycles
- End points of paths are Nash
- or (0,0)
- Number of Nash is odd.
36LM Sketch of Proof
- Deleting a label k
- making support larger
- making BR smaller
- Smaller BR
- solve for the smaller BR
- subtract from dist. until one component is zero
- Larger support
- unique solution (since non-degenerate)