Title: Arguments for Recovering Cooperation
1. Arguments for Recovering Cooperation
- Conclusions that some have drawn from analysis of the prisoner's dilemma:
  - the game-theoretic notion of rational action is wrong!
  - somehow the dilemma is being formulated wrongly.
- "This isn't rational." We may not bother to defect for a few cents, but if the sucker's payoff really hurts, defection is more likely to be the rational choice.
2. Arguments to Recover Cooperation
- We are not all self-centered! But sometimes we are nice only because there is a punishment: if we don't give up our seat on the bus, we receive rude stares.
  - If this were true, places like Honor Copy would be exploited.
- The other prisoner is my twin! When I decide what to do, the other agent will do the same (but I can't force it, as the other agent then wouldn't be autonomous).
- Your mother would say, "What if everyone were to behave like that?" You say, "I would be a fool to act any other way."
- The shadow of the future: we will meet again.
3. The Iterated Prisoner's Dilemma
- One answer: play the game more than once.
- If you know you will be meeting your opponent again, then the incentive to defect appears to evaporate.
- Cooperation can be the rational choice in the infinitely repeated prisoner's dilemma, provided the players care enough about future rounds (hurrah!).
4. Backwards Induction
- But suppose you both know that you will play the game exactly n times.
  - On the last round there is no future to protect, so you have an incentive to defect, to gain that extra bit of payoff.
  - But this makes the round before it the last "real" round, and so you have an incentive to defect there, too.
  - Repeating the argument unravels cooperation all the way back to the first round. This is the backwards induction problem.
- When the prisoner's dilemma is played for a fixed, finite, predetermined, commonly known number of rounds, defection is the best strategy.
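The unraveling argument can be sketched in Python. This is a minimal illustration, not a general solver: it assumes the conventional stage payoffs T = 5, R = 3, P = 1, S = 0 (the slides give no numbers), and `best_stage_reply` and `backward_induction` are hypothetical helper names. Because rollback has already fixed what happens in later rounds, the continuation value is the same whichever move is made today, so it cancels and each round reduces to the one-shot game.

```python
# Stage-game payoff to the row player (assumed values with T > R > P > S).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def best_stage_reply(opponent_move: str, continuation: float) -> str:
    """Best move this round when the value of later rounds is already fixed.

    The continuation is added to both options equally, so it cannot change
    which move is better: only the one-shot payoff matters.
    """
    return max("CD", key=lambda m: PAYOFF[(m, opponent_move)] + continuation)


def backward_induction(n_rounds: int) -> list[str]:
    """Roll back from the last round; return the move played in rounds 1..n."""
    plan = []
    continuation = 0.0  # total payoff from the rounds after the current one
    for _ in range(n_rounds):  # visits rounds n, n-1, ..., 1
        # Defection must be the best reply whatever the opponent does now.
        replies = {best_stage_reply(opp, continuation) for opp in "CD"}
        assert replies == {"D"}, "defection should strictly dominate"
        plan.append("D")
        continuation += PAYOFF[("D", "D")]  # both defect in this round as well
    return list(reversed(plan))


print(backward_induction(10))
```

The sketch makes the key point explicit: nothing about the future can reward cooperation today, because the future is already determined.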
5. Axelrod's Tournament
- Suppose you play the iterated prisoner's dilemma against a range of opponents. What strategy should you choose so as to maximize your overall payoff?
- Axelrod (1984) investigated this problem with a computer tournament for programs playing the prisoner's dilemma.
6. Axelrod's Tournament (continued)
Axelrod's tournament invited political scientists, psychologists, economists, and game theoreticians to submit programs to play the iterated prisoner's dilemma. Strategies included:
- All-D: always defect.
- Random: randomly cooperate or defect.
- Tit-for-Tat: on the first round, cooperate; then do whatever your opponent did last.
- Tester: first defect. If the opponent ever retaliated, then use tit-for-tat. If the opponent did not defect, cooperate for two rounds, then defect.
- Joss: tit-for-tat, but 10% of the time, defect instead of cooperating.
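A small round-robin in the spirit of Axelrod's tournament can be sketched as follows. Assumptions: the payoff values (3, 0, 5, 1) are the conventional ones and are not given on the slide; Random is taken to cooperate or defect with probability 1/2 each round; each strategy also plays a copy of itself; and Tester is left out because its long-run behavior is not fully specified above. All function names are hypothetical.

```python
import random
from itertools import combinations_with_replacement

# (my payoff, opponent's payoff) for one round; assumed Axelrod-style values.
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def all_d(my_hist, opp_hist, rng):
    return "D"  # always defect


def random_strategy(my_hist, opp_hist, rng):
    return rng.choice("CD")  # cooperate or defect with probability 1/2


def tit_for_tat(my_hist, opp_hist, rng):
    # Cooperate first, then copy the opponent's previous move.
    return opp_hist[-1] if opp_hist else "C"


def joss(my_hist, opp_hist, rng):
    # Tit-for-tat, but defect 10% of the time instead of cooperating.
    move = tit_for_tat(my_hist, opp_hist, rng)
    return "D" if move == "C" and rng.random() < 0.1 else move


def play_match(strat_a, strat_b, rounds, rng):
    """Play one iterated match and return the two total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a = strat_a(hist_a, hist_b, rng)
        b = strat_b(hist_b, hist_a, rng)
        pa, pb = PAYOFFS[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b


def tournament(strategies, rounds=200, seed=0):
    """Round-robin including self-play; return each strategy's average score per match."""
    rng = random.Random(seed)  # seeded so results are reproducible
    totals = {name: 0 for name in strategies}
    matches = {name: 0 for name in strategies}
    for x, y in combinations_with_replacement(strategies, 2):
        sx, sy = play_match(strategies[x], strategies[y], rounds, rng)
        totals[x] += sx
        totals[y] += sy
        matches[x] += 1
        matches[y] += 1
    return {name: totals[name] / matches[name] for name in strategies}


STRATEGIES = {"All-D": all_d, "Random": random_strategy,
              "Tit-for-Tat": tit_for_tat, "Joss": joss}
print(tournament(STRATEGIES))
```

`play_match(tit_for_tat, all_d, 10, random.Random(0))` returns `(9, 14)`: Tit-for-Tat is exploited once and then both defect, so it loses head-to-head against All-D even though it can do well when scores are averaged over a mixed field.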
7. Tit-for-Tat
- Tit-for-Tat won. Why? Because scores were averaged over opponents using all types of strategy.
- If you played only against All-D, tit-for-tat would lose.
8. Two Trigger Strategies
- Grim trigger strategy
  - Cooperate until a rival deviates.
  - Once a deviation occurs, play non-cooperatively for the rest of the game.
- Tit-for-tat
  - Cooperate if your rival cooperated in the most recent period.
  - Cheat if your rival cheated in the most recent period.
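To contrast how the two trigger strategies react to a single deviation, here is a sketch that writes each strategy as a function of the rival's past moves ("C" = cooperate, "D" = cheat). The string encoding and the helper `responses` are illustrative assumptions.

```python
def grim_trigger(rival_history: str) -> str:
    """Cooperate until the rival has deviated once; then defect forever."""
    return "D" if "D" in rival_history else "C"


def tit_for_tat(rival_history: str) -> str:
    """Cooperate first; afterwards copy the rival's most recent move."""
    return rival_history[-1] if rival_history else "C"


def responses(strategy, rival_moves: str) -> str:
    """Moves `strategy` makes, round by round, against a fixed rival sequence."""
    return "".join(strategy(rival_moves[:t]) for t in range(len(rival_moves)))


# The rival cheats once, in round 3, and cooperates otherwise.
print(responses(tit_for_tat, "CCDCCC"))
print(responses(grim_trigger, "CCDCCC"))
```

Against the rival sequence `"CCDCCC"`, tit-for-tat answers `"CCCDCC"` (one period of punishment), while grim trigger answers `"CCCDDD"` (punishment for the rest of the game).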
9. Axelrod's Rules for Success
- Do not be envious: it is not necessary to beat your opponent in order to do well. This is not zero-sum.
- Do not be the first to defect. Be nice: start by cooperating.
- Retaliate appropriately: always punish defection immediately, but use measured force; don't overdo it.
- Don't hold grudges: always reciprocate cooperation immediately.
- Do not be too clever: when you try to learn from the other agent, don't forget he is trying to learn from you.
- Be forgiving: one defection doesn't mean you can never cooperate. The opponent may be acting randomly.
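10. The Centipede Game
[Figure: centipede game tree in which Jack and Jill alternately choose "Go on" or "Stop". Stopping ends the game; the payoff pairs shown include (5, 3), (4, 7), (2, 0), (1, 4), (98, 96), (97, 100), and (94, 97). If both players always go on, the game ends at (99, 99).]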
11. The Centipede Game
The solution to this game through rollback is for Jack to stop in the first round!
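Rollback on a centipede game can be sketched as below. Because the full tree cannot be recovered from the figure, the example uses a hypothetical four-node game rather than the slide's payoffs; it also assumes Jack moves first and that a player who is indifferent goes on.

```python
def rollback(stop_payoffs, end_payoff):
    """Solve an alternating 'Go on'/'Stop' centipede game by backward induction.

    stop_payoffs[k] is the (Jack, Jill) payoff if the mover at node k stops;
    Jack moves at even k and Jill at odd k (assumed). end_payoff is reached
    if nobody ever stops. Returns (first node where play stops or None, payoffs).
    """
    outcome, stop_node = end_payoff, None
    for k in reversed(range(len(stop_payoffs))):
        mover = k % 2  # 0 = Jack, 1 = Jill
        # Stop if stopping beats the rolled-back continuation; ties go on.
        if stop_payoffs[k][mover] > outcome[mover]:
            outcome, stop_node = stop_payoffs[k], k
    return stop_node, outcome


# Hypothetical 4-node game (NOT the payoffs in the figure).
GAME = [(1, 0), (0, 2), (3, 1), (2, 4)]
print(rollback(GAME, (5, 3)))
```

In this hypothetical game rollback stops at node 0 with (1, 0), even though going on at every node would give both players more, (5, 3): the same unraveling as in the slide's game.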
12. The Centipede Game
- What actually happens?
- In experiments the game usually continues for at least a few rounds and occasionally goes all the way to the end.
- But going all the way to the (99, 99) payoff almost never happens: at some stage of the game, cooperation breaks down.
- So we still do not get sustained cooperation, even if we move away from rollback as a solution.
13. Lessons from Finite Repeated Games
- Finite repetition often does not help players to reach better solutions.
- Often the outcome of the finitely repeated game is simply the one-shot Nash equilibrium repeated again and again.
- There are SOME repeated games where finite repetition can create new equilibrium outcomes, but these games tend to have special properties.
- For a large number of repetitions, there are some games where the Nash equilibrium logic breaks down in practice.
14. Threats
- Threatening retaliatory actions may help gain cooperation.
- The threat needs to be believable.
15. What Is Credibility?
- "The difference between genius and stupidity is that genius has its limits." (Albert Einstein)
- You are not credible if you propose to take suboptimal actions, that is, if a rational actor
  - proposes to play a strategy
  - which earns suboptimal profit.
- How can one be credible?
16. Non-Credible Threat
- A non-credible threat is a threat made by a player in a sequential game that it would not be in the player's best interest to carry out. The hope is that the threat is believed, in which case there is no need to carry it out. While Nash equilibria may depend on non-credible threats, backward induction eliminates them.
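The difference between Nash equilibrium and backward induction can be checked on a tiny entry game. The game and its payoffs are hypothetical, chosen only to illustrate the definition: an incumbent threatens to fight if an entrant comes in, but once entry has happened, fighting is worse for the incumbent than accommodating.

```python
from itertools import product

# Hypothetical entry game; payoffs are (entrant, incumbent).
STAY_OUT = (0, 10)
AFTER_ENTRY = {"fight": (-5, -5), "accommodate": (5, 5)}


def outcome(entrant: str, incumbent: str) -> tuple[int, int]:
    """Payoffs when the entrant plays 'in'/'out' and the incumbent's plan is given."""
    return STAY_OUT if entrant == "out" else AFTER_ENTRY[incumbent]


def nash_equilibria():
    """Profiles where neither player gains by a unilateral change of plan."""
    eqs = []
    for e, i in product(["in", "out"], AFTER_ENTRY):
        e_ok = all(outcome(e, i)[0] >= outcome(alt, i)[0] for alt in ["in", "out"])
        i_ok = all(outcome(e, i)[1] >= outcome(e, alt)[1] for alt in AFTER_ENTRY)
        if e_ok and i_ok:
            eqs.append((e, i))
    return eqs


def backward_induction():
    """Solve the incumbent's subgame first, then the entrant's choice."""
    reply = max(AFTER_ENTRY, key=lambda a: AFTER_ENTRY[a][1])
    entrant = max(["in", "out"], key=lambda e: outcome(e, reply)[0])
    return entrant, reply


print(nash_equilibria())
print(backward_induction())
```

`nash_equilibria()` returns both `('in', 'accommodate')` and `('out', 'fight')`. The second rests on the non-credible threat to fight (it costs the incumbent nothing only because entry never happens), and `backward_induction()` keeps only `('in', 'accommodate')`.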
17. Trigger Strategy Extremes
- Tit-for-Tat is
- most forgiving
- shortest memory
- proportional
- credible but lacks deterrence
- Tit-for-tat answers
- Is cooperation easy?
- Grim trigger is
- least forgiving
- longest memory
- MAD (mutually assured destruction)
- adequate deterrence but lacks credibility
- Grim trigger answers
- Is cooperation possible?
18. Concepts of Rationality: Doing the Rational Thing
- Undominated strategy
  - (problem: too weak) can't always find a single one
- (Weakly) dominating strategy (alias "duh?")
  - (problem: too strong, rarely exists)
- Nash equilibrium (or double best response)
  - (problem: a pure-strategy equilibrium may not exist)
- Randomized (mixed) Nash equilibrium: players choose various options based on some random number (assigned via a probability)
  - Theorem (Nash 1950): every finite game has a randomized Nash equilibrium.
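19. Mixed Strategy Equilibria
- $\sigma_i(s_j)$ is the probability that player $i$ selects strategy $s_j$.
- $(0, 0, 1, 0, 0)$ is a pure strategy.
- Strategy profile: $\sigma = (\sigma_1, \ldots, \sigma_n)$
- Expected utility: $u_i(\sigma) = \sum_{s \in S} \left( \prod_j \sigma_j(s_j) \right) u_i(s)$
  - (chance the combination occurs times utility)
- Nash equilibrium
  - $\sigma^*$ is a (mixed) Nash equilibrium if, for every player $i$, $\sigma_i^* \in \Sigma_i$ (a probability distribution over $S_i$) and $u_i(\sigma_i^*, \sigma_{-i}^*) \ge u_i(\sigma_i, \sigma_{-i}^*)$ for all $\sigma_i \in \Sigma_i$.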
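The expected-utility formula translates directly into code. This sketch assumes strategies are indexed from 0 and that `utility` is a caller-supplied function returning u_i(s) for a pure profile; the name `expected_utility` is illustrative.

```python
from itertools import product
from math import prod


def expected_utility(sigma, utility):
    """u_i(sigma) = sum over pure profiles s of (prod_j sigma_j(s_j)) * u_i(s).

    sigma: one probability list per player; sigma[j][k] is the probability
        that player j plays their k-th pure strategy (0-based).
    utility: function giving u_i(s) for a pure profile s (a tuple of indices).
    """
    total = 0.0
    for s in product(*(range(len(sigma_j)) for sigma_j in sigma)):
        chance = prod(sigma[j][k] for j, k in enumerate(s))  # chance s occurs
        total += chance * utility(s)                          # times utility
    return total


# Matching pennies, player 1's payoffs (rows: player 1, columns: player 2).
U1 = [[-1, 1], [1, -1]]
print(expected_utility([[0.5, 0.5], [0.5, 0.5]], lambda s: U1[s[0]][s[1]]))
```

With both players mixing 50/50 in matching pennies, each of the four combinations occurs with chance 1/4 and player 1's expected utility is 0.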
20. Example: Matching Pennies (No Pure-Strategy Nash Equilibrium)
Rows: player 1; columns: player 2; each cell lists (player 1, player 2) payoffs.
               H         T
     H      -1, 1     1, -1
     T       1, -1   -1, 1
So far we have talked only about pure-strategy equilibria: I make one choice. Not all games have pure-strategy equilibria; some equilibria are mixed-strategy equilibria.
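To check that a game has no pure-strategy equilibrium, one can test every cell for mutual best responses. A minimal sketch (the function name `pure_nash` is hypothetical), with the row player's payoffs in `A` and the column player's in `B`:

```python
def pure_nash(A, B):
    """Return all pure-strategy Nash equilibria (row, col) of a bimatrix game.

    A[r][c] is the row player's payoff and B[r][c] the column player's.
    A cell is an equilibrium if each payoff is a best response to the other's choice.
    """
    rows, cols = len(A), len(A[0])
    eqs = []
    for r in range(rows):
        for c in range(cols):
            row_best = all(A[r][c] >= A[r2][c] for r2 in range(rows))
            col_best = all(B[r][c] >= B[r][c2] for c2 in range(cols))
            if row_best and col_best:
                eqs.append((r, c))
    return eqs


# Matching pennies: rows and columns are (H, T).
print(pure_nash([[-1, 1], [1, -1]], [[1, -1], [-1, 1]]))
```

For matching pennies it returns `[]`; for Bach/Stravinsky (`A = [[2, 0], [0, 1]]`, `B = [[1, 0], [0, 2]]`) it returns `[(0, 0), (1, 1)]`.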
21. Example: Matching Pennies
               q: H      1-q: T
     p: H     -1, 1     1, -1
     1-p: T    1, -1   -1, 1
We want to play each strategy with a certain probability. If player 2 is optimally mixing strategies, player 1 is indifferent between his own choices! Compute the other player's expected utility for each of his pure choices.
22. I am player 2. What should I do? I pick a defensive strategy.
- Player 1's expected payoff if he picks heads:
  - $-q + (1-q)$
- If player 1 picks tails:
  - $q - (1-q)$
- I want my opponent NOT to care which option he picks. The idea is, if my opponent gets excited about what my strategy is, it means I have left open an opportunity for him. When he doesn't have to analyze what he should do, it says there is no way he wins big.
- So
  - $-q + (1-q) = q - (1-q)$
  - $1 - 2q = 2q - 1$, so $q = 1/2$
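The indifference calculation can be sketched with exact fractions. `indifference_q` is a hypothetical helper that, for a 2×2 game, solves for the column mix q that equalizes the row player's two expected payoffs; it assumes the row player's payoff difference between his rows is not the same in both columns (otherwise there is no unique solution).

```python
from fractions import Fraction

# Player 1's payoffs in matching pennies: rows = P1 plays H/T, cols = P2 plays H/T.
U1 = [[-1, 1], [1, -1]]


def row_payoffs(U, q):
    """Row player's expected payoff from each row when the column player
    plays column 0 with probability q (and column 1 with 1 - q)."""
    return tuple(row[0] * q + row[1] * (1 - q) for row in U)


def indifference_q(U):
    """Column mix q that makes the row player indifferent between rows.

    Solves U[0][0] q + U[0][1] (1 - q) = U[1][0] q + U[1][1] (1 - q).
    Raises ZeroDivisionError if no unique solution exists.
    """
    a = U[0][0] - U[1][0]  # row difference when column 0 is played
    b = U[0][1] - U[1][1]  # row difference when column 1 is played
    # a*q + b*(1 - q) = 0  =>  q = b / (b - a)
    return Fraction(b, b - a)


q = indifference_q(U1)
print(q, row_payoffs(U1, q))            # both rows pay the same
print(row_payoffs(U1, Fraction(3, 4)))  # tails pays more: exploitable
```

At q = 1/2 heads and tails both pay player 1 zero, so he has nothing to exploit; at q = 3/4, tails pays 1/2 against -1/2 for heads, which is exactly the opening the defensive strategy avoids.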
23. Example: Bach/Stravinsky
               q: B     1-q: S
     p: B      2, 1      0, 0
     1-p: S    0, 0      1, 2
We want to play each strategy with a certain probability. If player 2 is optimally mixing strategies, player 1 is indifferent between his own choices. Compute the expected utility of each of your pure choices, given the other player's mix.
- Player 2 is optimally mixing (player 1 is indifferent): $2q = 1 - q$, so $q = 1/3$.
- Player 1 is optimally mixing (player 2 is indifferent): $p = 2(1-p)$, so $p = 2/3$.
24.
- This is consistent with Dan's advice: look after yourself.
- "I used to think I was indecisive, but now I'm not so sure." (Anonymous)
25. Mixed Strategies
- Unreasonable predictors of one-time human interaction
- Reasonable predictors of long-term proportions
26. Employee Monitoring
- Employees can work hard or shirk.
  - Salary: 100K unless caught shirking
  - Cost of effort: 50K
  - (We are assuming that when he works he loses something. Think of him running a business of his own while getting paid at his day job: if he works, he can't do that and loses the money the business makes.)
- Managers can monitor or not.
  - Value of employee output: 200K
  - (We assume he must be worth more than we pay him, to cover profit, infrastructure, manager time, mistakes, etc.)
  - Profit if employee doesn't work: 0
  - Cost of monitoring: 10K
27. Employee Monitoring
Payoffs are (employee, manager), in thousands.
                     Manager: Monitor    Manager: No Monitor
Employee: Work          50, 90              50, 100
Employee: Shirk          0, -10            100, -100
- From the problem statement, VERIFY that the numbers in the table are correct.
- There is no equilibrium in pure strategies. SHOW IT.
- What do the players do in mixed strategies? DO AT SEATS.
- Please do not consider this instruction in how to cheat your boss. Rather, think of it as advice in how to deal with employees.
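The VERIFY and SHOW IT steps can be checked mechanically. This sketch rebuilds the table from the stated parameters (in thousands) and tests every cell for a profitable unilateral deviation. It assumes, as the table implies, that a shirker is caught only when monitored, is then paid nothing, and produces no output in any case; the names are illustrative.

```python
# Parameters from the problem statement, in thousands.
SALARY, EFFORT_COST, OUTPUT, MONITOR_COST = 100, 50, 200, 10


def payoffs(work: bool, monitor: bool) -> tuple[int, int]:
    """(employee, manager) payoffs for one pure-strategy profile.

    Assumption: a shirker is caught only when monitored, a caught shirker
    is paid nothing, and a shirker produces no output.
    """
    caught = monitor and not work
    salary = 0 if caught else SALARY
    employee = salary - (EFFORT_COST if work else 0)
    manager = (OUTPUT if work else 0) - salary - (MONITOR_COST if monitor else 0)
    return employee, manager


TABLE = {(w, m): payoffs(w, m) for w in (True, False) for m in (True, False)}


def is_pure_equilibrium(work: bool, monitor: bool) -> bool:
    """True if neither player gains by switching their own pure strategy."""
    employee, manager = TABLE[(work, monitor)]
    employee_ok = employee >= TABLE[(not work, monitor)][0]
    manager_ok = manager >= TABLE[(work, not monitor)][1]
    return employee_ok and manager_ok


for (w, m), (e, g) in TABLE.items():
    row = "work" if w else "shirk"
    col = "monitor" if m else "no monitor"
    verdict = "equilibrium" if is_pure_equilibrium(w, m) else "not an equilibrium"
    print(row, col, e, g, verdict)
```

Every cell fails: whenever the employee's choice is a best response, the manager wants to switch, and vice versa.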
28. Mixed Strategies
- Randomize: surprise the rival.
- Mixed strategy
  - Specifies that an actual move be chosen randomly from the set of pure strategies with some specific probabilities.
- Nash equilibrium in mixed strategies
  - A probability distribution for each player
  - The distributions are mutual best responses to one another in the sense of expectations.
29. Finding Mixed Strategies
- Suppose
  - the employee chooses (shirk, work) with probabilities (p, 1-p);
  - the manager chooses (monitor, no monitor) with probabilities (q, 1-q).
- Find expected payoffs for each player.
- Use these to calculate best responses.
30. Employee's Payoff
- First, find the employee's expected payoff from each pure strategy.
- If the employee works, he receives 50:
  - $\text{Profit(work)} = 50 \cdot q + 50 \cdot (1-q) = 50$
- If the employee shirks, he receives 0 or 100:
  - $\text{Profit(shirk)} = 0 \cdot q + 100 \cdot (1-q) = 100 - 100q$
31. Employee's Best Response
- Next, calculate the best strategy for each possible strategy of the opponent.
- For $q < 1/2$: SHIRK
  - $\text{Profit(shirk)} = 100 - 100q > 50 = \text{Profit(work)}$
- For $q > 1/2$: WORK
  - $\text{Profit(shirk)} = 100 - 100q < 50 = \text{Profit(work)}$
- For $q = 1/2$: INDIFFERENT
  - $\text{Profit(shirk)} = 100 - 100q = 50 = \text{Profit(work)}$, so any mix of working and shirking is a best response.
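32. Manager's Best Response
- $u_2(\text{monitor}) = 90 \cdot (1-p) - 10 \cdot p$
- $u_2(\text{no monitor}) = 100 \cdot (1-p) - 100 \cdot p$
- For $p < 1/10$: NO MONITOR
  - $u_2(\text{monitor}) = 90 - 100p < 100 - 200p = u_2(\text{no monitor})$
- For $p > 1/10$: MONITOR
  - $u_2(\text{monitor}) = 90 - 100p > 100 - 200p = u_2(\text{no monitor})$
- For $p = 1/10$: INDIFFERENT
  - $u_2(\text{monitor}) = 90 - 100p = 100 - 200p = u_2(\text{no monitor})$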
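33. Cycles
[Figure: best responses cycle in the (p, q) square. Axes: p = probability the employee shirks (0 = work, 1 = shirk; 1/10 marked) and q = probability the manager monitors (no monitor to monitor; 1/2 marked).]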
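34. Mixed Strategy Equilibrium
- Employees shirk with probability 1/10.
- Managers monitor with probability 1/2.
- Expected payoff to employee
  - chance of each of the four outcomes × payoff from each:
  - $\tfrac{9}{10}\cdot\tfrac{1}{2}\cdot 50 + \tfrac{9}{10}\cdot\tfrac{1}{2}\cdot 50 + \tfrac{1}{10}\cdot\tfrac{1}{2}\cdot 0 + \tfrac{1}{10}\cdot\tfrac{1}{2}\cdot 100 = 50$
- Expected payoff to manager
  - $\tfrac{9}{10}\cdot\tfrac{1}{2}\cdot 90 + \tfrac{9}{10}\cdot\tfrac{1}{2}\cdot 100 + \tfrac{1}{10}\cdot\tfrac{1}{2}\cdot(-10) + \tfrac{1}{10}\cdot\tfrac{1}{2}\cdot(-100) = 80$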
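The four-outcome expectation can be checked with exact fractions. The dictionaries below simply restate the monitoring table and the equilibrium mixes; the function name is illustrative, and the sketch assumes the two players randomize independently.

```python
from fractions import Fraction
from itertools import product

# (employee, manager) payoffs from the monitoring table.
TABLE = {("work", "monitor"): (50, 90), ("work", "no monitor"): (50, 100),
         ("shirk", "monitor"): (0, -10), ("shirk", "no monitor"): (100, -100)}
EMPLOYEE_MIX = {"work": Fraction(9, 10), "shirk": Fraction(1, 10)}
MANAGER_MIX = {"monitor": Fraction(1, 2), "no monitor": Fraction(1, 2)}


def expected_payoffs():
    """Sum over the four outcomes of (chance of outcome) x (payoff)."""
    employee = manager = Fraction(0)
    for e, m in product(EMPLOYEE_MIX, MANAGER_MIX):
        chance = EMPLOYEE_MIX[e] * MANAGER_MIX[m]  # independent mixing
        employee += chance * TABLE[(e, m)][0]
        manager += chance * TABLE[(e, m)][1]
    return employee, manager


print(expected_payoffs())
```

The result is 50 for the employee and 80 for the manager, the same values that appear in the indifference table.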
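35. Properties of Equilibrium
- Both players are indifferent between any mixture over their strategies.
- E.g., the employee (with the manager monitoring half the time):
  - If shirk: $0 \cdot \tfrac{1}{2} + 100 \cdot \tfrac{1}{2} = 50$
  - If work: $50 \cdot \tfrac{1}{2} + 50 \cdot \tfrac{1}{2} = 50$
- Regardless of what the employee does, the expected payoff is the same.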
36. Use Indifference to Solve I
                 q: Monitor    1-q: No Monitor    Employee's expected payoff
Work              50, 90         50, 100           $50q + 50(1-q)$
Shirk              0, -10       100, -100          $0q + 100(1-q)$
- $50q + 50(1-q) = 0q + 100(1-q)$
- $50 = 100 - 100q$
- $50 = 100q$
- $q = 1/2$
37. Use Indifference to Solve II
                 Monitor        No Monitor
1-p: Work        50, 90         50, 100
p: Shirk          0, -10       100, -100
Manager's expected payoff:  $90(1-p) - 10p$    $100(1-p) - 100p$
- $90(1-p) - 10p = 100(1-p) - 100p$
- $90 - 100p = 100 - 200p$
- $100p = 10$
- $p = 1/10$
38. Indifference
                 1/2: Monitor   1/2: No Monitor   Employee's expected payoff
9/10: Work        50, 90          50, 100           50
1/10: Shirk        0, -10        100, -100          50
Manager's expected payoff:  80    80
39. Upsetting?
- This example is upsetting, as it appears to tell you, as workers, to shirk.
- Think of it from the manager's point of view, assuming you have unmotivated (or unhappy) workers.
- A better option would be to hire dedicated workers, but if you have people who are trying to cheat you, this gives a reasonable response.
- Sometimes you are dealing with individuals who just want to beat the system. In that case, you need to play their game; for example, people who try to beat the IRS.
- On the positive side, even if you have dishonest workers, if you get too paranoid about monitoring their work, you lose! This theory tells you to lighten up!
- This theory might be applied to criticising your friend or setting up rules/punishments for your (future?) children.
40. Why Do We Mix?
- I don't want to give my opponent an advantage. When my opponent can't decide what to do based on my strategy, I win, as there is no way he is going to take advantage of me.
COMMANDMENT: Use the mixed strategy that keeps your opponent guessing.
41. Mixed Strategy Equilibria
- Anyone for tennis?
- Should you serve to the forehand or the backhand?
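42. Tennis Payoffs
43. Tennis: Fixed Sum
If you win (the points), I lose (the points). AKA strictly competitive.
Percentage of points won (receiver, server); each cell sums to 100:
                                      Server: q Forehand   Server: 1-q Backhand
Receiver: p anticipates Forehand           90, 10               20, 80
Receiver: 1-p anticipates Backhand         30, 70               60, 40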
44. Solving for Server's Optimal Mix
- What would happen if the server always served to the forehand?
  - A rational receiver would always anticipate forehand, and 90% of the serves would be successfully returned.
45. Solving for Server's Optimal Mix
- What would happen if the server aimed to the forehand 50% of the time and the backhand 50% of the time, and the receiver always guessed forehand?
  - $(0.5 \times 0.9) + (0.5 \times 0.2) = 0.55$ successful returns
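46. Solving for Server's Optimal Mix
- What is the best mix for each player?
- The receiver thinks (with $p$ = probability the receiver anticipates forehand, and each expression the server's chance of winning the point):
  - if the server serves forehand: $.10p + .70(1-p)$
  - if the server serves backhand: $.80p + .40(1-p)$
  - I want them to be the same:
  - $.10p + .70(1-p) = .80p + .40(1-p)$
  - $.10p + .70 - .70p = .80p + .40 - .40p$
  - $-.6p + .7 = .4p + .4$
  - $.3 = p$
- Use a similar argument to solve for q.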
47. Draw a Graph
Draw a graph which shows two lines: (1) the utility to the server of picking forehand, as a function of p; (2) the utility to the server of picking backhand, as a function of p.
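For the graph exercise, here is a short sketch that tabulates the server's two lines at p = 0, 0.1, …, 1 and computes where they cross. Using `Fraction` keeps the crossing point exact; the function names are illustrative.

```python
from fractions import Fraction


def server_forehand(p):
    """Server's chance of winning the point when serving forehand.

    p is the probability that the receiver anticipates forehand.
    """
    return Fraction(10, 100) * p + Fraction(70, 100) * (1 - p)


def server_backhand(p):
    """Server's chance of winning the point when serving backhand."""
    return Fraction(80, 100) * p + Fraction(40, 100) * (1 - p)


def crossing():
    """Solve server_forehand(p) == server_backhand(p); both lines are a + b*p."""
    a1, b1 = server_forehand(0), server_forehand(1) - server_forehand(0)
    a2, b2 = server_backhand(0), server_backhand(1) - server_backhand(0)
    return (a2 - a1) / (b1 - b2)


# Text version of the graph: one row per value of p.
for i in range(11):
    p = Fraction(i, 10)
    print(f"p={float(p):.1f}  forehand={float(server_forehand(p)):.2f}  "
          f"backhand={float(server_backhand(p)):.2f}")
print("lines cross at p =", crossing())
```

The forehand line falls (0.7 - 0.6p) while the backhand line rises (0.4 + 0.4p); they meet at p = 3/10, and above that point serving backhand wins.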
48. What can you learn from the graph?
49. Receiver's View of Opponent
Above 1/3, backhand wins. [Figure: plotted against p.]
50. Receiver's View of Opponent
Above p = .3, serving backhand wins. [Figure: plotted against p.]
51. Server's View of Opponent
Above q = .4, planning for the forehand wins. [Figure: plotted against q.]
52. Percentage of Successful Returns Given Server and Receiver Actions
Where would you shoot, knowing the other player will respond to your choices? In other words, you pick the row but will likely get the smaller value in that row.
53. Consider Bach or Stravinsky
No dominant-strategy equilibrium.
                        Yolanda: B (Y)   Yolanda: S (1-Y)
Xavier: B (X)               2, 1             0, 0
Xavier: S (1-X)             0, 0             1, 2
- If the other player is optimally mixing, my payoffs are the same, so
  - $2Y = 1(1-Y)$, giving $Y = 1/3$
  - $1X = 2(1-X)$, giving $X = 2/3$
54. Best Response Function
- If $0 \le Y < 1/3$, then player 1's best response is $X = 0$.
- If $Y = 1/3$, then ALL of player 1's responses are best responses.
- If $Y > 1/3$, then player 1's best response is $X = 1$.
- Using Excel, prove this to yourself!
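As an alternative to the Excel exercise, Xavier's best-response correspondence can be sketched as a function returning the interval of optimal X for a given Y. Exact fractions avoid rounding trouble at Y = 1/3; `xavier_best_response` is a hypothetical name.

```python
from fractions import Fraction


def xavier_best_response(y):
    """Interval (low, high) of optimal X, the probability Xavier plays B.

    y is the probability that Yolanda plays B. Xavier's expected payoff is
    2*y from B and 1*(1 - y) from S.
    """
    from_b, from_s = 2 * y, 1 - y
    if from_b < from_s:
        return (0, 0)  # S is strictly better: X = 0
    if from_b > from_s:
        return (1, 1)  # B is strictly better: X = 1
    return (0, 1)      # indifferent: every X in [0, 1] is a best response


for y in (Fraction(0), Fraction(1, 4), Fraction(1, 3), Fraction(1, 2), Fraction(1)):
    print(y, xavier_best_response(y))
```

The output steps from X = 0 to X = 1 at Y = 1/3, where the whole interval [0, 1] is optimal: the vertical segment of the best-response graph.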
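55. Best Response Function
(The dotted line is a function only if you mentally switch the axes.)
[Figure: best-response functions with X on one axis (2/3 and 1 marked) and Y on the other (1/3 and 1 marked); the best response of player 1 is shown as a dotted line.]
The fixed points where the best-response functions intersect are the Nash equilibria: the mixed equilibrium at $(X, Y) = (2/3, 1/3)$, plus the pure equilibria at $(0, 0)$ and $(1, 1)$.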
56. Spreadsheet Check (Bach/Stravinsky)
p = probability player 1 picks B; q = probability player 2 picks B; the player columns are expected payoffs.

p      q      player 1   player 2
0.1    0.1    0.83       1.63
0.1    0.2    0.76       1.46
0.1    0.3    0.69       1.29
0.1    0.4    0.62       1.12
0.1    0.5    0.55       0.95
0.1    0.6    0.48       0.78
0.1    0.7    0.41       0.61
0.1    0.8    0.34       0.44
0.1    0.9    0.27       0.27
0.1    1      0.2        0.1

p      q      player 1   player 2
0.1    0.33   0.669      1.24
0.2    0.33   0.668      1.14
0.3    0.33   0.667      1.04
0.4    0.33   0.666      0.94
0.5    0.33   0.665      0.84
0.6    0.33   0.664      0.73
0.7    0.33   0.663      0.63
0.8    0.33   0.662      0.53
0.9    0.33   0.661      0.43
1      0.33   0.66       0.33

p      q      player 1   player 2
0.67   0.1    0.4338     0.67
0.67   0.2    0.5336     0.67
0.67   0.3    0.6334     0.67
0.67   0.4    0.7332     0.67
0.67   0.5    0.833      0.67
0.67   0.6    0.9328     0.67
0.67   0.7    1.0326     0.67
0.67   0.8    1.1324     0.67
0.67   0.9    1.2322     0.67
0.67   1      1.332      0.67
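The spreadsheet values come from a simple formula: mismatches pay (0, 0), so only (B, B) and (S, S) contribute. A sketch that reproduces the rows (the function name `bos_payoffs` is hypothetical):

```python
def bos_payoffs(p, q):
    """Expected payoffs (player 1, player 2) in Bach/Stravinsky.

    p: probability player 1 (Xavier) picks B; q: probability player 2
    (Yolanda) picks B. Mismatches pay (0, 0), so only (B, B) and (S, S) count.
    """
    both_b = p * q
    both_s = (1 - p) * (1 - q)
    return 2 * both_b + 1 * both_s, 1 * both_b + 2 * both_s


# Rebuild the first table: p fixed at 0.1, q from 0.1 to 1.
for k in range(1, 11):
    q = k / 10
    u1, u2 = bos_payoffs(0.1, q)
    print(f"0.1  {q:.1f}  {u1:.2f}  {u2:.2f}")
```

For example, `bos_payoffs(0.1, 0.1)` gives approximately (0.83, 1.63). Evaluating the third table with p = 0.666 reproduces its player 1 column (e.g. 0.4338 at q = 0.1), which suggests those values were computed with 0.666 rather than exactly 2/3; with p = 2/3 exactly, player 2's payoff is 2/3 for every q.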
57. Hints to Understanding the Graph
- The solid line represents Yolanda's thinking. If Xavier is going to select B less than 2/3 of the time, Yolanda is best off selecting S (which happens when Y = 0).
- HOWEVER, if Xavier is going to select B more than 2/3 of the time, Yolanda should immediately start selecting B (which happens when Y = 1).
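58. Computing Mixed Strategies for Two Players (the Book's Way)
- Write the matrix game in bimatrix form $A = [a_{ij}]$, $B = [b_{ij}]$.
- Compute payoffs $\pi_1 = \sum_i \sum_j p_i q_j a_{ij}$ and $\pi_2 = \sum_i \sum_j p_i q_j b_{ij}$.
- Replace $p_m = 1 - \sum_{i<m} p_i$ and $q_n = 1 - \sum_{j<n} q_j$.
- Consider the partial derivatives of $\pi_1$ and $\pi_2$ with respect to all $p_i$ and all $q_j$, respectively.
- Solve the system of equations with all partials set to zero.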
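59. Example
$\pi_1 = 3p_1q_1 + p_2q_2 = 3p_1q_1 + (1-p_1)(1-q_1) = 1 - p_1 - q_1 + 4p_1q_1$
$\pi_2 = p_1q_1 + 4p_2q_2 = p_1q_1 + 4(1-p_1)(1-q_1) = 4 - 4p_1 - 4q_1 + 5p_1q_1$
$\partial\pi_1/\partial p_1 = -1 + 4q_1 = 0$, so $q_1 = 1/4$
$\partial\pi_2/\partial q_1 = -4 + 5p_1 = 0$, so $p_1 = 4/5$
So the strategies are $((4/5, 1/5), (1/4, 3/4))$.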
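60. Example 2
$\pi_1 = 3p_1q_1 - p_1q_2 - 2p_2q_1 + p_2q_2$
$= 3p_1q_1 - p_1(1-q_1) - 2(1-p_1)q_1 + (1-p_1)(1-q_1)$
$= 3p_1q_1 - p_1 + p_1q_1 - 2q_1 + 2p_1q_1 + 1 - p_1 - q_1 + p_1q_1$
$= 1 + 7p_1q_1 - 2p_1 - 3q_1$
$\pi_2 = p_1q_1 + 4p_2q_2 = p_1q_1 + 4(1-p_1)(1-q_1) = 4 - 4p_1 - 4q_1 + 5p_1q_1$
$\partial\pi_1/\partial p_1 = -2 + 7q_1 = 0$, so $q_1 = 2/7$
$\partial\pi_2/\partial q_1 = -4 + 5p_1 = 0$, so $p_1 = 4/5$
So the strategies are $((4/5, 1/5), (2/7, 5/7))$.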
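61. Tennis Example
$\pi_1 = 90p_1q_1 + 20p_1q_2 + 30p_2q_1 + 60p_2q_2$
$= 90pq + 20p(1-q) + 30(1-p)q + 60(1-p)(1-q)$ (writing $p = p_1$, $q = q_1$)
$= 90pq + 20p - 20pq + 30q - 30pq + 60 - 60p - 60q + 60pq$
$= 60 + 100pq - 40p - 30q$
$\pi_2 = 10pq + 80p(1-q) + 70(1-p)q + 40(1-p)(1-q)$
$= 10pq + 80p - 80pq + 70q - 70pq + 40 - 40p - 40q + 40pq$
$= -100pq + 40p + 30q + 40$
$\partial\pi_1/\partial p = 100q - 40 = 0$, so $q = .4$
$\partial\pi_2/\partial q = -100p + 30 = 0$, so $p = .3$
So the strategies are $((.3, .7), (.4, .6))$.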
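The book's method can be automated for 2×2 games. In this sketch (a hypothetical helper, `mixed_2x2`), `A` and `B` hold the two players' payoffs with rows for player 1 and columns for player 2. It only handles games with an interior mixed equilibrium, where the first-order conditions have a solution in [0, 1].

```python
from fractions import Fraction


def mixed_2x2(A, B):
    """Interior mixed equilibrium of a 2x2 bimatrix game via first-order conditions.

    pi1 = sum_ij p_i q_j A[i][j] with p2 = 1 - p1, q2 = 1 - q1 (likewise pi2 with B).
    d pi1/d p1 = q1*(A11 - A12 - A21 + A22) + (A12 - A22) = 0 gives q1;
    d pi2/d q1 = p1*(B11 - B12 - B21 + B22) + (B21 - B22) = 0 gives p1.
    Returns ((p1, 1 - p1), (q1, 1 - q1)); raises ValueError if no interior solution.
    """
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    da = a11 - a12 - a21 + a22
    db = b11 - b12 - b21 + b22
    if da == 0 or db == 0:
        raise ValueError("a derivative does not depend on the rival's mix")
    q1 = Fraction(a22 - a12, da)
    p1 = Fraction(b22 - b21, db)
    if not (0 <= p1 <= 1 and 0 <= q1 <= 1):
        raise ValueError("first-order conditions give probabilities outside [0, 1]")
    return (p1, 1 - p1), (q1, 1 - q1)


p, q = mixed_2x2([[3, 0], [0, 1]], [[1, 0], [0, 4]])
print(p, q)
```

With `A = [[3, 0], [0, 1]]` and `B = [[1, 0], [0, 4]]` it returns p = (4/5, 1/5) and q = (1/4, 3/4), matching the first example; the tennis matrices `A = [[90, 20], [30, 60]]` and `B = [[10, 80], [70, 40]]` give p = (3/10, 7/10) and q = (2/5, 3/5).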