Arguments for Recovering Cooperation

About This Presentation

Description:

Arguments for Recovering Cooperation. Conclusions that some have drawn from analysis of the prisoner's dilemma: the game-theoretic notion of rational action is wrong! (PowerPoint presentation)

Slides: 62

Provided by: klar162

Transcript and Presenter's Notes

1
Arguments for Recovering Cooperation

Conclusions that some have drawn from analysis of the prisoner's dilemma:
the game-theoretic notion of rational action is wrong!
somehow the dilemma is being formulated wrongly.
"This isn't rational." We may not defect for a few cents. If the sucker's payoff really hurts, defection is more likely to be rational.

2
Arguments to recover cooperation

We are not all self-centered! But sometimes we are nice because there is a punishment: if we don't give up our seat on the bus, we receive rude stares.
If this were true, places like Honor Copy would be exploited.
The other prisoner is my twin! When I decide what to do, the other agent will do the same (but I can't force it, or the other agent wouldn't be autonomous).
Your mother would say, "What if everyone were to behave like that?" You say, "I would be a fool to act any other way."
The shadow of the future: we will meet again.

3
The Iterated Prisoner's Dilemma

One answer: play the game more than once.
If you know you will be meeting your opponent again, then the incentive to defect appears to evaporate.
Cooperation is the rational choice in the infinitely repeated prisoner's dilemma (Hurrah!)

4
Backwards Induction

But suppose you both know that you will play the game exactly n times. On round n, the last round, you have an incentive to defect, to gain that extra bit of payoff. But this makes round n - 1 the last "real" round, and so you have an incentive to defect there, too. This is the backwards induction problem.
When playing the prisoner's dilemma with a fixed, finite, predetermined, commonly known number of rounds, defection is the best strategy.
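A minimal Python sketch of the backwards-induction argument. The one-shot payoffs (temptation 5, reward 3, punishment 1, sucker 0) are an illustrative assumption; the slide fixes no numbers, and the function names are hypothetical. Because the value of the later, already-solved rounds is the same whatever happens now, each round reduces to the one-shot game, where defection is dominant.

```python
# Assumed one-shot prisoner's dilemma payoffs for the player choosing `mine`
# (illustrative values, not taken from the slides).
PAYOFF = {
    ("C", "C"): 3,  # reward for mutual cooperation
    ("C", "D"): 0,  # sucker's payoff
    ("D", "C"): 5,  # temptation to defect
    ("D", "D"): 1,  # punishment for mutual defection
}


def defect_dominates(continuation: int) -> bool:
    """True if D strictly beats C against either opponent move this round.

    `continuation` is the value of the later rounds, which backward induction
    has already fixed; it is added to every outcome, so it cannot change the
    comparison.
    """
    return all(
        PAYOFF[("D", theirs)] + continuation > PAYOFF[("C", theirs)] + continuation
        for theirs in ("C", "D")
    )


def solve_finite_pd(n: int) -> list[str]:
    """Backward induction over n rounds whose number is common knowledge."""
    plan = []
    continuation = 0  # nothing follows the last round
    for _ in range(n):  # round n first, then n - 1, ..., down to round 1
        if not defect_dominates(continuation):
            raise ValueError("stage game is not a prisoner's dilemma")
        plan.append("D")
        continuation += PAYOFF[("D", "D")]  # both players defect in this round
    plan.reverse()  # report the plan in playing order, round 1 first
    return plan


print(solve_finite_pd(5))  # ['D', 'D', 'D', 'D', 'D']
```

The unravelling does not depend on the particular numbers, only on defection being dominant in the one-shot game.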

5
Axelrod's Tournament

Suppose you play the iterated prisoner's dilemma against a range of opponents. What strategy should you choose, so as to maximize your overall payoff?
Axelrod (1984) investigated this problem with a computer tournament for programs playing the prisoner's dilemma.

6
Axelrod's tournament invited political scientists, psychologists, economists, and game theoreticians to submit programs to play the iterated prisoner's dilemma.

All-D: always defect.
Random: randomly choose to cooperate or defect.
Tit-for-Tat: on the first round, cooperate. Then do whatever your opponent did last.
Tester: on the first round, defect. If the opponent ever retaliated, then use tit-for-tat. If the opponent did not defect, cooperate for two rounds, then defect.
Joss: tit-for-tat, but 10% of the time, defect instead of cooperating.

7
Tit-for-Tat

Tit-for-tat won the tournament. Why? Because each program's score was averaged over all types of strategy.
If you played only against All-D, tit-for-tat would lose.

8
Two Trigger Strategies

Grim trigger strategy
Cooperate until a rival deviates
Once a deviation occurs,
play non-cooperatively for the
rest of the game
Tit-for-tat
Cooperate if your rival cooperated
in the most recent period
Cheat if your rival cheated
in the most recent period
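The two trigger strategies can be written as functions of the rival's past moves. This Python sketch (function names are illustrative) replays a hypothetical rival who deviates once in period 3 and then cooperates again: tit-for-tat punishes for one period and returns to cooperation, while grim trigger never does.

```python
def tit_for_tat(rival_moves: list[str]) -> str:
    """Cooperate first; afterwards copy the rival's most recent move."""
    return rival_moves[-1] if rival_moves else "C"


def grim_trigger(rival_moves: list[str]) -> str:
    """Cooperate until the rival deviates once, then defect for the rest of the game."""
    return "D" if "D" in rival_moves else "C"


def respond(strategy, rival_moves: list[str]) -> str:
    """The moves `strategy` makes against a fixed sequence of rival moves."""
    return "".join(strategy(rival_moves[:t]) for t in range(len(rival_moves)))


rival = list("CCDCCCC")  # hypothetical rival: a single deviation in period 3
print(respond(tit_for_tat, rival))   # CCCDCCC
print(respond(grim_trigger, rival))  # CCCDDDD
```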

9
Axelrod's rules for success

Do not be envious: it is not necessary to beat your opponent in order to do well. This is not zero-sum.
Do not be the first to defect. Be nice. Start by cooperating.
Retaliate appropriately: always punish defection immediately, but use measured force; don't overdo it.
Don't hold grudges: always reciprocate cooperation immediately.
Do not be too clever: when you try to learn from the other agent, don't forget he is trying to learn from you.
Be forgiving: one defection doesn't mean you can never cooperate.
The opponent may be acting randomly.

10
The centipede game
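[Figure: centipede game tree. Jack and Jill take turns choosing "Go on" or "stop", Jack moving first; each "stop" ends the game with a (Jack, Jill) payoff. Stop payoffs shown include (1, 4), (2, 0), (4, 7), and (5, 3) early in the game and (94, 97), (97, 100), and (98, 96) late in the game; if both always go on, the payoff is (99, 99).]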
11
The centipede game
The solution to this game through rollback is for Jack to stop in the first round!
[Figure: the same centipede game tree, in which Jack and Jill alternate between "Go on" and "stop" and the game ends at (99, 99) if both always go on.]
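Rollback can be sketched in a few lines of Python. The payoffs below describe a hypothetical four-node centipede invented for illustration (the full tree on the slides is not reproduced), and ties are assumed to be broken in favor of going on. Working from the last node backward, each mover stops whenever stopping beats the already-solved continuation, which unravels the game back to the first node.

```python
def roll_back(stop_payoffs, final_payoff, first_mover=0):
    """Solve a centipede game by rollback (backward induction).

    stop_payoffs[k] is the (Jack, Jill) payoff if the mover at node k stops;
    final_payoff is the outcome if every mover chooses "Go on".
    Movers alternate, starting with first_mover (0 = Jack, 1 = Jill).
    Returns (node where play stops, or None if it never stops; payoff reached).
    """
    outcome, stop_at = final_payoff, None
    for k in range(len(stop_payoffs) - 1, -1, -1):  # last node first
        mover = (first_mover + k) % 2
        # Stop if that beats the already-solved rest of the game (ties go on).
        if stop_payoffs[k][mover] > outcome[mover]:
            outcome, stop_at = stop_payoffs[k], k
    return stop_at, outcome


# A made-up four-node centipede, not the slide's full tree.
stops = [(2, 0), (1, 4), (5, 3), (4, 7)]
print(roll_back(stops, final_payoff=(6, 6)))  # (0, (2, 0)): Jack stops at once
```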
12
The centipede game

What actually happens?
In experiments, the game usually continues for at least a few rounds and occasionally goes all the way to the end.
But going all the way to the (99, 99) payoff almost never happens: at some stage of the game, cooperation breaks down.
So we still do not get sustained cooperation, even if we move away from rollback as a solution.

13
Lessons from finite repeated games

Finite repetition often does not help players to
reach better solutions
Often the outcome of the finitely repeated game
is simply the one-shot Nash equilibrium repeated
again and again.
There are SOME repeated games where finite
repetition can create new equilibrium outcomes.
But these games tend to have special properties
For a large number of repetitions, there are some
games where the Nash equilibrium logic breaks
down in practice.

14
Threats

Threatening retaliatory actions may help gain
cooperation
Threat needs to be believable

15
What is Credibility?

"The difference between genius and stupidity is that genius has its limits."
(attributed to Albert Einstein)
You are not credible if you propose to take suboptimal actions,
for example if a rational actor proposes to play a strategy that earns suboptimal profit.
How can one be credible?

16
non-credible threat

A non-credible threat is a threat made by a player in a sequential game that it would not be in the player's best interest to carry out. The hope is that the threat is believed, in which case there is no need to carry it out. While Nash equilibria may depend on non-credible threats, backward induction eliminates them.
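To see a non-credible threat concretely, here is a Python sketch of a hypothetical entry game (the players, moves, and payoffs are invented for illustration): an entrant chooses Out or In, and after entry an incumbent chooses Fight or Accommodate. Checking every strategy pair finds two Nash equilibria, one of them propped up by the threat to fight; backward induction keeps only the other.

```python
from itertools import product

# Hypothetical payoffs, (entrant, incumbent); not from the slides.
OUT = (0, 4)
AFTER_ENTRY = {"Fight": (-1, 1), "Accommodate": (2, 2)}
ENTRANT_MOVES = ("Out", "In")


def outcome(entrant: str, incumbent: str) -> tuple[int, int]:
    """Payoffs when the entrant plays `entrant` and the incumbent plans `incumbent`."""
    return OUT if entrant == "Out" else AFTER_ENTRY[incumbent]


def pure_nash_equilibria() -> list[tuple[str, str]]:
    """Strategy pairs where neither player gains from a unilateral deviation."""
    eqs = []
    for e, i in product(ENTRANT_MOVES, AFTER_ENTRY):
        u_e, u_i = outcome(e, i)
        entrant_ok = all(outcome(e2, i)[0] <= u_e for e2 in ENTRANT_MOVES)
        incumbent_ok = all(outcome(e, i2)[1] <= u_i for i2 in AFTER_ENTRY)
        if entrant_ok and incumbent_ok:
            eqs.append((e, i))
    return eqs


def backward_induction() -> tuple[str, str]:
    """Solve the incumbent's move after entry first, then the entrant's choice."""
    reply = max(AFTER_ENTRY, key=lambda a: AFTER_ENTRY[a][1])
    entrant = "In" if AFTER_ENTRY[reply][0] > OUT[0] else "Out"
    return entrant, reply


print(pure_nash_equilibria())  # [('Out', 'Fight'), ('In', 'Accommodate')]
print(backward_induction())    # ('In', 'Accommodate')
```

In (Out, Fight) the threat is never tested, so it costs the incumbent nothing; once entry actually happens, fighting pays 1 instead of 2, so a rational incumbent would not carry it out.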

17
Trigger Strategy Extremes

Tit-for-Tat is
most forgiving
shortest memory
proportional
credible but lacks deterrence
Tit-for-tat answers
Is cooperation easy?

Grim trigger is
least forgiving
longest memory
MAD (mutually assured destruction)
adequate deterrence but lacks credibility
Grim trigger answers
Is cooperation possible?

18
Concepts of rationality: doing the rational thing

Undominated strategy
(problem: too weak; can't always find a single one)
(Weakly) dominating strategy (alias "duh?")
(problem: too strong; rarely exists)
Nash equilibrium (or double best response)
(problem: an equilibrium in pure strategies may not exist)
Randomized (mixed) Nash equilibrium: players choose among options based on some random number (assigned via a probability)
Theorem (Nash, 1950): every finite game has a randomized Nash equilibrium.

19
Mixed strategy equilibria

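$\sigma_i(s_j)$ is the probability that player $i$ selects strategy $s_j$
$(0, 0, 1, 0, 0)$ is a pure strategy
Strategy profile: $\sigma = (\sigma_1, \ldots, \sigma_n)$
Expected utility: $u_i(\sigma) = \sum_{s \in S} \left( \prod_j \sigma_j(s_j) \right) u_i(s)$
(chance the combination occurs times utility)
Nash equilibrium:
$\sigma^*$ is a (mixed) Nash equilibrium if
$\sigma_i \in \Sigma_i$ defines a probability distribution over $S_i$, and
$u_i(\sigma_i^*, \sigma_{-i}^*) \ge u_i(\sigma_i, \sigma_{-i}^*)$ for all $\sigma_i \in \Sigma_i$, for all $i$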
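The definitions above translate directly into code. This Python sketch computes $u_i(\sigma)$ by summing, over every pure profile, the chance it occurs times its utility, and then tests the Nash condition. It checks only deviations to pure strategies, which suffices because expected utility is linear in a player's own mixture. The function names are illustrative, and the Matching Pennies payoffs are those of the next example.

```python
from fractions import Fraction
from itertools import product


def expected_utility(payoffs, profile, player):
    """u_i(sigma) = sum over pure profiles s of (prod_j sigma_j(s_j)) * u_i(s).

    payoffs maps a pure profile (a tuple of strategy indices) to a tuple of
    utilities; profile[j] lists player j's probabilities over S_j.
    """
    total = Fraction(0)
    for s in product(*(range(len(sigma)) for sigma in profile)):
        chance = Fraction(1)
        for j, s_j in enumerate(s):
            chance *= profile[j][s_j]  # chance this combination occurs
        total += chance * payoffs[s][player]
    return total


def is_nash(payoffs, profile):
    """True if no player gains by switching to any pure strategy.

    Checking pure deviations is enough: u_i is linear in sigma_i, so no mixed
    deviation can beat the best pure one.
    """
    for i, sigma_i in enumerate(profile):
        current = expected_utility(payoffs, profile, i)
        for k in range(len(sigma_i)):
            deviation = list(profile)
            deviation[i] = [1 if m == k else 0 for m in range(len(sigma_i))]
            if expected_utility(payoffs, deviation, i) > current:
                return False
    return True


# Matching Pennies with strategy 0 = H, 1 = T; payoffs are (player 1, player 2).
PENNIES = {(0, 0): (-1, 1), (0, 1): (1, -1), (1, 0): (1, -1), (1, 1): (-1, 1)}
half = [Fraction(1, 2), Fraction(1, 2)]
print(is_nash(PENNIES, [half, half]))      # True
print(is_nash(PENNIES, [[1, 0], [1, 0]]))  # False: player 1 would switch to T
```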
20
Example: Matching Pennies (no pure-strategy Nash equilibrium)

                 Player 2
                 H         T
Player 1   H   -1, 1     1, -1
           T    1, -1   -1, 1

So far we have talked only about pure-strategy equilibria: I make one choice. Not all games have pure-strategy equilibria; some equilibria are mixed-strategy equilibria.
21
Example: Matching Pennies

                 q: H      1-q: T
p: H            -1, 1      1, -1
1-p: T           1, -1    -1, 1

Want to play each strategy with a certain probability. If player 2 is optimally mixing strategies, player 1 is indifferent between his own choices! Compute the expected utility given each pure possibility of the other player.
22
I am player 2. What should I do? I pick a defensive strategy.

If player 1 picks heads: $-q + (1-q)$
If player 1 picks tails: $q - (1-q)$
I want my opponent NOT to care which of his own options he picks. The idea is, if my opponent gets excited about what my strategy is, it means I have left open an opportunity for him. When he doesn't have to analyze what he should do, it says there is no way he wins big.
So
$-q + (1-q) = q - (1-q)$
$1 - 2q = 2q - 1$, so $q = 1/2$
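A quick Python check of this "defensive" reasoning, using player 1's payoffs from the Matching Pennies table (the helper names are illustrative): for each mix q that player 2 might choose, compute what player 1 earns by best-responding.

```python
from fractions import Fraction


def player1_payoffs(q):
    """Player 1's expected payoff from H and from T when player 2 plays H w.p. q."""
    from_heads = -1 * q + 1 * (1 - q)  # = 1 - 2q
    from_tails = 1 * q - 1 * (1 - q)   # = 2q - 1
    return from_heads, from_tails


def opponent_gain(q):
    """What player 1 earns by best-responding to player 2's mix q."""
    return max(player1_payoffs(q))


for q in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)):
    print(f"q = {q}: player 1's best reply earns {opponent_gain(q)}")
```

Only q = 1/2 holds player 1's best reply to 0; any other q leaves him a positive expected gain, which is the "opportunity" the slide warns about.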

23
Example: Bach/Stravinsky

                 q: B      1-q: S
p: B             2, 1      0, 0
1-p: S           0, 0      1, 2

Want to play each strategy with a certain probability. If player 2 is optimally mixing strategies, player 1 is indifferent among his own choices. Compute the expected utility given each pure possibility of yours.
Player 2 is optimally mixing (player 1 indifferent): $2q = 1(1-q)$, so $q = 1/3$
Player 1 is optimally mixing (player 2 indifferent): $p = 2(1-p)$, so $p = 2/3$
24

This is consistent with Dan's advice: look after yourself.
"I used to think I was indecisive, but now I'm not so sure."
(Anonymous)

25
Mixed Strategies

Unreasonable predictors of one-time human
interaction
Reasonable predictors of long-term proportions

26
Employee Monitoring

Employees can work hard or shirk
Salary: 100K unless caught shirking
Cost of effort: 50K
(We are assuming that when he works he loses something. Think of him running a business of his own while getting paid at his day job: if he works, he can't do that and loses the money the business makes.)
Managers can monitor or not
Value of employee output: 200K
(We assume he must be worth more than we pay him, to cover profit, infrastructure, manager time, mistakes, etc.)
Profit if employee doesn't work: 0
Cost of monitoring: 10K

27
Employee Monitoring

                            Manager
                      Monitor      No Monitor
Employee   Work       50, 90       50, 100
           Shirk       0, -10     100, -100

From the problem statement, VERIFY that the numbers in the table are correct.
There is no equilibrium in pure strategies: SHOW IT.
What do the players do in mixed strategies? DO AT SEATS.
Please do not take this as instructions on how to cheat your boss. Rather, think of it as advice on how to deal with employees.
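One way to do the VERIFY and SHOW IT steps in Python. The reading of the problem statement is an assumption: a caught shirker is paid nothing (so the manager pays no salary in that case), and output is worth 0 when the employee shirks. Payoffs are in thousands, and the names are illustrative.

```python
SALARY, EFFORT_COST = 100, 50        # thousands, from the problem statement
OUTPUT_VALUE, MONITOR_COST = 200, 10
EMPLOYEE = ("Work", "Shirk")
MANAGER = ("Monitor", "No Monitor")


def payoffs(employee: str, manager: str) -> tuple[int, int]:
    """(employee, manager) payoffs for one cell of the table."""
    works = employee == "Work"
    monitors = manager == "Monitor"
    salary = 0 if (monitors and not works) else SALARY  # caught shirking: no pay
    u_employee = salary - (EFFORT_COST if works else 0)
    u_manager = (OUTPUT_VALUE if works else 0) - salary - (MONITOR_COST if monitors else 0)
    return u_employee, u_manager


TABLE = {(e, m): payoffs(e, m) for e in EMPLOYEE for m in MANAGER}


def pure_nash(table):
    """Cells where neither player can gain by changing only their own choice."""
    eqs = []
    for e in EMPLOYEE:
        for m in MANAGER:
            u_e, u_m = table[(e, m)]
            if all(table[(e2, m)][0] <= u_e for e2 in EMPLOYEE) and all(
                table[(e, m2)][1] <= u_m for m2 in MANAGER
            ):
                eqs.append((e, m))
    return eqs


for cell, value in TABLE.items():
    print(cell, value)
print("pure-strategy equilibria:", pure_nash(TABLE))  # pure-strategy equilibria: []
```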

28
Mixed Strategies

Randomize: surprise the rival
Mixed Strategy
Specifies that an actual move be chosen randomly
from the set of pure strategies with some
specific probabilities.
Nash Equilibrium in Mixed Strategies
A probability distribution for each player
The distributions are mutual best responses to
one another in the sense of expectations

29
Finding Mixed Strategies

Suppose
Employee chooses (shirk, work) with
probabilities (p,1-p)
Manager chooses (monitor, no monitor) with
probabilities (q,1-q)
Find expected payoffs for each player
Use these to calculate best responses

30
Employee's Payoff

First, find the employee's expected payoff from each pure strategy
If the employee works, he receives 50
$\text{Profit(work)} = 50 \cdot q + 50 \cdot (1-q) = 50$
If the employee shirks, he receives 0 or 100
$\text{Profit(shirk)} = 0 \cdot q + 100 \cdot (1-q) = 100 - 100q$

31
Employee's Best Response

Next, calculate the best strategy for possible strategies of the opponent
For $q < 1/2$: SHIRK
$\text{Profit(shirk)} = 100 - 100q > 50 = \text{Profit(work)}$
For $q > 1/2$: WORK
$\text{Profit(shirk)} = 100 - 100q < 50 = \text{Profit(work)}$
For $q = 1/2$: INDIFFERENT
$\text{Profit(shirk)} = 100 - 100q = 50 = \text{Profit(work)}$

32
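Manager's Best Response

$u_2(\text{mntr}) = 90 \cdot (1-p) - 10 \cdot p$
$u_2(\text{no mntr}) = 100 \cdot (1-p) - 100 \cdot p$
For $p < 1/10$: NO MONITOR
$u_2(\text{mntr}) = 90 - 100p < 100 - 200p = u_2(\text{no mntr})$
For $p > 1/10$: MONITOR
$u_2(\text{mntr}) = 90 - 100p > 100 - 200p = u_2(\text{no mntr})$
For $p = 1/10$: INDIFFERENT
$u_2(\text{mntr}) = 90 - 100p = 100 - 200p = u_2(\text{no mntr})$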
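The two best-response rules can be sketched in Python with exact fractions, so that the indifference points q = 1/2 and p = 1/10 come out exactly (the function names are illustrative). Feeding each rule's answer into the other traces the cycle on the next slide: with little monitoring the employee shirks, which makes monitoring pay, which in turn makes working pay.

```python
from fractions import Fraction


def employee_best_response(q):
    """Employee's best reply when the manager monitors with probability q."""
    work = 50 * q + 50 * (1 - q)   # = 50
    shirk = 0 * q + 100 * (1 - q)  # = 100 - 100q
    if shirk > work:
        return "Shirk"
    return "Work" if work > shirk else "Indifferent"


def manager_best_response(p):
    """Manager's best reply when the employee shirks with probability p."""
    monitor = 90 * (1 - p) - 10 * p       # = 90 - 100p
    no_monitor = 100 * (1 - p) - 100 * p  # = 100 - 200p
    if monitor > no_monitor:
        return "Monitor"
    return "No Monitor" if no_monitor > monitor else "Indifferent"


for q in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    print(f"q = {q}: employee -> {employee_best_response(q)}")
for p in (Fraction(1, 20), Fraction(1, 10), Fraction(1, 5)):
    print(f"p = {p}: manager -> {manager_best_response(p)}")
```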

33
Cycles
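[Figure: best-response cycle. Axes: p, the probability the employee shirks (0 = work, 1 = shirk), marked at 1/10; and q, the probability the manager monitors (0 = no monitor, 1 = monitor), marked at 1/2.]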
34
Mixed Strategy Equilibrium

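Employees shirk with probability 1/10
Managers monitor with probability 1/2
Expected payoff to employee (chance of each of the four outcomes times the payoff from each):
$\frac{1}{2}\cdot\frac{9}{10}\cdot 50 + \frac{1}{2}\cdot\frac{9}{10}\cdot 50 + \frac{1}{2}\cdot\frac{1}{10}\cdot 0 + \frac{1}{2}\cdot\frac{1}{10}\cdot 100 = 50$
Expected payoff to manager:
$\frac{1}{2}\cdot\frac{9}{10}\cdot 90 + \frac{1}{2}\cdot\frac{9}{10}\cdot 100 + \frac{1}{2}\cdot\frac{1}{10}\cdot(-10) + \frac{1}{2}\cdot\frac{1}{10}\cdot(-100) = 80$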
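The "chance of each outcome times its payoff" sum can be checked in Python with exact fractions (the function name is illustrative; the table is the one from the slides, in thousands). The last line also illustrates the indifference property: once the manager monitors half the time, the employee's expected payoff is the same whether he always works or always shirks.

```python
from fractions import Fraction

# Payoff table from the slides: (employee, manager), in thousands.
TABLE = {("Work", "Monitor"): (50, 90), ("Work", "No Monitor"): (50, 100),
         ("Shirk", "Monitor"): (0, -10), ("Shirk", "No Monitor"): (100, -100)}


def expected_payoffs(p_shirk, q_monitor):
    """Sum over the four outcomes of (chance of outcome) times (payoff)."""
    prob_employee = {"Shirk": p_shirk, "Work": 1 - p_shirk}
    prob_manager = {"Monitor": q_monitor, "No Monitor": 1 - q_monitor}
    employee = manager = Fraction(0)
    for (e, m), (u_e, u_m) in TABLE.items():
        chance = prob_employee[e] * prob_manager[m]
        employee += chance * u_e
        manager += chance * u_m
    return employee, manager


P, Q = Fraction(1, 10), Fraction(1, 2)
print(expected_payoffs(P, Q))  # (Fraction(50, 1), Fraction(80, 1))
# With q = 1/2 the employee gets 50 whether he always works or always shirks.
print(expected_payoffs(Fraction(0), Q)[0], expected_payoffs(Fraction(1), Q)[0])  # 50 50
```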

35
Properties of Equilibrium

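Both players are indifferent between any mixture over their strategies
E.g. employee (with $q = 1/2$):
If shirk: $0 \cdot \frac{1}{2} + 100 \cdot \frac{1}{2} = 50$
If work: $50 \cdot \frac{1}{2} + 50 \cdot \frac{1}{2} = 50$
Regardless of what the employee does, the expected payoff is the same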

36
Use Indifference to Solve I
                 q              1-q
                 Monitor        No Monitor     Employee's expected payoff
Work             50, 90         50, 100        50q + 50(1-q)
Shirk             0, -10       100, -100        0q + 100(1-q)

$50q + 50(1-q) = 0q + 100(1-q)$
$50 = 100 - 100q$
$50 = 100q$
$q = 1/2$

37
Use Indifference to Solve II
                 Monitor           No Monitor
1-p  Work        50, 90            50, 100
p    Shirk        0, -10          100, -100
Manager's expected payoff:
                 90(1-p) - 10p     100(1-p) - 100p

$90(1-p) - 10p = 100(1-p) - 100p$
$90 - 100p = 100 - 200p$
$100p = 10$
$p = 1/10$

38
Indifference
                 1/2            1/2
                 Monitor        No Monitor     Employee's expected payoff
9/10  Work       50, 90         50, 100        50
1/10  Shirk       0, -10       100, -100       50
Manager's expected payoff:
                 80             80
39
Upsetting?

This example is upsetting as it appears to tell
you, as workers, to shirk.
Think of it from the manager's point of view, assuming you have unmotivated (or unhappy) workers.
A better option would be to hire dedicated
workers, but if you have people who are trying to
cheat you, this gives a reasonable response.
Sometimes you are dealing with individuals who
just want to beat the system. In that case, you
need to play their game. For example, people who
try to beat the IRS.
On the positive side, even if you have dishonest
workers, if you get too paranoid about monitoring
their work, you lose! This theory tells you to
lighten up!
This theory might be applied to criticising your
friend or setting up rules/punishment for your
(future?) children.

40
Why Do We Mix?

I don't want to give my opponent an advantage.
When my opponent can't decide what to do based on my strategy, I win, as there is no way he is going to take advantage of me.

COMMANDMENT: Use the mixed strategy that keeps your opponent guessing.
41
Mixed Strategy Equilibriums

Anyone for tennis?
Should you serve to the forehand or the backhand?

42
Tennis Payoffs
43
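Tennis: Fixed Sum. If you win (the points), I lose (the points). AKA strictly competitive.

                                     Server
                             Forehand (q)   Backhand (1-q)
Receiver  Forehand (p)          90, 10         20, 80
          Backhand (1-p)        30, 70         60, 40
(payoffs: % of serves the receiver returns, % of points the server wins)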
44
Solving for Servers Optimal Mix

What would happen if the server always served to the forehand?
A rational receiver would always anticipate the forehand, and 90% of the serves would be successfully returned.

45
Solving for Servers Optimal Mix

What would happen if the server aimed to the forehand 50% of the time and the backhand 50% of the time, and the receiver always guessed forehand?
$(0.5 \cdot 0.9) + (0.5 \cdot 0.2) = 0.55$: 55% successful returns

46
Solving for Servers Optimal Mix

What is the best mix for each player?
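Receiver thinks (server's chance of winning the point, with $p$ = probability the receiver anticipates forehand):
if the server serves forehand: $.10p + .70(1-p)$
if the server serves backhand: $.80p + .40(1-p)$
I want them to be the same:
$.10p + .70(1-p) = .80p + .40(1-p)$
$.10p + .70 - .70p = .80p + .40 - .40p$
$-.6p + .7 = .4p + .4$
$.3 = p$
Use a similar argument to solve for q.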
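A Python sketch of both indifference calculations with exact fractions (the helper names are illustrative). The server's winning chances are the ones on this slide; the receiver's return rates are taken as 1 minus those chances (90%, 30%, 20%, 60%), and q is read as the chance of serving to the forehand. The helper finds where two straight lines cross by pinning each line down at 0 and 1.

```python
from fractions import Fraction as F


def crossing(f, g):
    """The probability x at which two linear functions f(x), g(x) are equal.

    Each line is pinned down by its values at x = 0 and x = 1; this assumes
    the lines are not parallel.
    """
    f0, f1, g0, g1 = f(0), f(1), g(0), g(1)
    return (g0 - f0) / ((f1 - f0) - (g1 - g0))


# Server's chance of winning the point; p = chance the receiver anticipates forehand.
def serve_forehand(p):
    return F(10, 100) * p + F(70, 100) * (1 - p)


def serve_backhand(p):
    return F(80, 100) * p + F(40, 100) * (1 - p)


# Receiver's chance of a return; q = chance the server serves to the forehand.
def anticipate_forehand(q):
    return F(90, 100) * q + F(20, 100) * (1 - q)


def anticipate_backhand(q):
    return F(30, 100) * q + F(60, 100) * (1 - q)


print(crossing(serve_forehand, serve_backhand))            # 3/10
print(crossing(anticipate_forehand, anticipate_backhand))  # 2/5
```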

47
Draw a graph which shows two lines: (1) the server's utility from picking forehand as a function of p; (2) the server's utility from picking backhand as a function of p.
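If a spreadsheet is not at hand, the points for the two lines can be tabulated in Python (exact fractions, so the tie at p = 0.3 is detected exactly; the function name is illustrative). Plotting the forehand and backhand columns against p gives the requested graph.

```python
from fractions import Fraction


def server_lines(p):
    """Server's winning chance for each serve when the receiver anticipates forehand w.p. p."""
    forehand = Fraction(10, 100) * p + Fraction(70, 100) * (1 - p)
    backhand = Fraction(80, 100) * p + Fraction(40, 100) * (1 - p)
    return forehand, backhand


print("  p   forehand  backhand  better serve")
for step in range(11):
    p = Fraction(step, 10)
    fh, bh = server_lines(p)
    better = "forehand" if fh > bh else "backhand" if bh > fh else "either"
    print(f"{float(p):.1f}  {float(fh):8.2f}  {float(bh):8.2f}  {better}")
```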
48
What can you learn from the graph?
49
Receiver's view of opponent: for p above 0.3, backhand wins.
50
Receiver's view of opponent: for p above 0.3, serving backhand wins.
51
Server's view of opponent: for q above 0.4, planning for the forehand wins.
52
% of Successful Returns Given Server and Receiver Actions
Where would you shoot, knowing the other player will respond to your choices? In other words, you pick the row but will likely get the smaller value in that row.
53
Consider Bach or Stravinsky

If the other player is optimally mixing, my payoffs are the same, so $2Y = 1(1-Y)$, $Y = 1/3$
$1X = 2(1-X)$, $X = 2/3$

                          Yolanda
                     B (Y)      S (1-Y)
Xavier   B (X)       2, 1       0, 0
         S (1-X)     0, 0       1, 2

No dominant-strategy equilibrium.
54
Best Response Function

If $0 \le Y < 1/3$, then player 1's best response is $X = 0$.
If $Y = 1/3$, then ALL of player 1's responses are best responses.
If $Y > 1/3$, then player 1's best response is $X = 1$.
Using Excel, prove this to yourself!
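In place of Excel, here is a Python sketch of both best-response functions for this game (the names are illustrative). Each function returns the interval of best replies, so at the indifference point it returns the whole range [0, 1]; a pair of mixes is an equilibrium when each lies inside the other player's best-response interval, which is where the two graphs intersect.

```python
from fractions import Fraction


def best_x(y):
    """Xavier's best replies (as an interval of X) when Yolanda plays B w.p. y.

    Xavier gets 2 from (B, B), 1 from (S, S), and 0 otherwise.
    """
    u_b, u_s = 2 * y, 1 * (1 - y)
    return (1, 1) if u_b > u_s else (0, 0) if u_b < u_s else (0, 1)


def best_y(x):
    """Yolanda's best replies when Xavier plays B w.p. x (she gets 1 and 2)."""
    u_b, u_s = 1 * x, 2 * (1 - x)
    return (1, 1) if u_b > u_s else (0, 0) if u_b < u_s else (0, 1)


def is_equilibrium(x, y):
    """Each mix must lie inside the other player's best-response interval."""
    lo_x, hi_x = best_x(y)
    lo_y, hi_y = best_y(x)
    return lo_x <= x <= hi_x and lo_y <= y <= hi_y


for y in (Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)):
    print(f"Y = {y}: best X in {best_x(y)}")
print([is_equilibrium(x, y) for x, y in [(0, 0), (Fraction(2, 3), Fraction(1, 3)), (1, 1)]])  # [True, True, True]
```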

55
Best Response Function (The dotted line is a function only if you mentally switch the axes.)
The fixed points where the best-response functions intersect are the Nash equilibria. The best response of player 1 is shown as a dotted line.
[Graph: best-response functions plotted against X (marked at 2/3 and 1) and Y (marked at 1/3 and 1).]
56
Expected payoffs in Bach/Stravinsky (p = probability player 1 plays B, q = probability player 2 plays B)

p     q     player 1   player 2
0.1   0.1   0.83       1.63
0.1   0.2   0.76       1.46
0.1   0.3   0.69       1.29
0.1   0.4   0.62       1.12
0.1   0.5   0.55       0.95
0.1   0.6   0.48       0.78
0.1   0.7   0.41       0.61
0.1   0.8   0.34       0.44
0.1   0.9   0.27       0.27
0.1   1     0.2        0.1

p     q     player 1   player 2
0.1   0.33  0.669      1.24
0.2   0.33  0.668      1.14
0.3   0.33  0.667      1.04
0.4   0.33  0.666      0.94
0.5   0.33  0.665      0.84
0.6   0.33  0.664      0.73
0.7   0.33  0.663      0.63
0.8   0.33  0.662      0.53
0.9   0.33  0.661      0.43
1     0.33  0.66       0.33

p     q     player 1   player 2
0.67  0.1   0.4338     0.67
0.67  0.2   0.5336     0.67
0.67  0.3   0.6334     0.67
0.67  0.4   0.7332     0.67
0.67  0.5   0.833      0.67
0.67  0.6   0.9328     0.67
0.67  0.7   1.0326     0.67
0.67  0.8   1.1324     0.67
0.67  0.9   1.2322     0.67
0.67  1     1.332      0.67
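The spreadsheet rows can be regenerated in Python (the function name is illustrative), assuming p and q are the probabilities that players 1 and 2 choose B; this reproduces the values above. Using exact fractions also shows why the middle table is nearly flat for player 1: at q = 1/3 exactly, player 1 gets 2/3 whatever p is, and the small drift from 0.669 to 0.66 comes from rounding 1/3 to 0.33.

```python
from fractions import Fraction


def bos_expected(p, q):
    """Expected payoffs (player 1, player 2) in Bach/Stravinsky.

    p and q are the probabilities that players 1 and 2 choose B.
    Player 1 gets 2 at (B, B) and 1 at (S, S); player 2 gets 1 and 2.
    """
    both_b = p * q
    both_s = (1 - p) * (1 - q)
    return 2 * both_b + 1 * both_s, 1 * both_b + 2 * both_s


# First table: p fixed at 0.1, q from 0.1 to 1.
print("p    q    player 1  player 2")
for step in range(1, 11):
    u1, u2 = bos_expected(0.1, step / 10)
    print(f"0.1  {step / 10:.1f}  {u1:8.2f}  {u2:8.2f}")

# With q exactly 1/3, player 1's payoff no longer depends on p.
print({bos_expected(Fraction(k, 10), Fraction(1, 3))[0] for k in range(11)})  # {Fraction(2, 3)}
```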
57
Hints to understanding graph

The solid line represents Yolanda's thinking. If Xavier is going to select B less than 2/3 of the time, Yolanda is best off selecting S (which happens when Y = 0).
HOWEVER, if Xavier is going to select B more than 2/3 of the time, Yolanda should immediately start selecting B (which happens when Y = 1).

58
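Computing mixed strategies for two players (the book's way)

Write the matrix game in bimatrix form: $A = [a_{ij}]$, $B = [b_{ij}]$
Compute the payoffs $\pi_1 = \sum_i \sum_j p_i q_j a_{ij}$ and $\pi_2 = \sum_i \sum_j p_i q_j b_{ij}$
Replace $p_m = 1 - \sum_{i=1}^{m-1} p_i$ and $q_n = 1 - \sum_{j=1}^{n-1} q_j$
Consider the partial derivatives of $\pi_1$ and $\pi_2$ with respect to all $p_i$ and all $q_j$, respectively.
Solve the system of equations with all partials set to zero.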
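For the 2x2 case the steps above have a closed form, sketched here in Python (the function name is illustrative). Setting the derivative of π1 with respect to p to zero pins down q, and setting the derivative of π2 with respect to q to zero pins down p. The sketch does not check that the results are valid probabilities, and it does not handle games without an interior mixed equilibrium.

```python
from fractions import Fraction


def mixed_2x2(A, B):
    """Interior mixed equilibrium of a 2x2 bimatrix game, following the steps above.

    With p = p1, p2 = 1 - p and q = q1, q2 = 1 - q:
      d(pi1)/dp = q(a11 - a12 - a21 + a22) + (a12 - a22) = 0  gives q,
      d(pi2)/dq = p(b11 - b12 - b21 + b22) + (b21 - b22) = 0  gives p.
    Assumes both denominators are non-zero and the answers lie in [0, 1].
    """
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    q = Fraction(a22 - a12, a11 - a12 - a21 + a22)
    p = Fraction(b22 - b21, b11 - b12 - b21 + b22)
    return (p, 1 - p), (q, 1 - q)


# A and B read off the payoff formulas in the examples that follow.
print(mixed_2x2([[3, 0], [0, 1]], [[1, 0], [0, 4]]))          # Example
print(mixed_2x2([[3, -1], [-2, 1]], [[1, 0], [0, 4]]))        # Example 2
print(mixed_2x2([[90, 20], [30, 60]], [[10, 80], [70, 40]]))  # Tennis
```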

59
Example
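$\pi_1 = 3p_1q_1 + p_2q_2 = 3p_1q_1 + (1-p_1)(1-q_1) = 1 - p_1 - q_1 + 4p_1q_1$
$\pi_2 = p_1q_1 + 4p_2q_2 = p_1q_1 + 4(1-p_1)(1-q_1) = 4 - 4p_1 - 4q_1 + 5p_1q_1$
$\partial\pi_1/\partial p_1 = -1 + 4q_1 = 0$, so $q_1 = 1/4$
$\partial\pi_2/\partial q_1 = -4 + 5p_1 = 0$, so $p_1 = 4/5$
So the strategies are $((4/5, 1/5), (1/4, 3/4))$.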
60
Example 2
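$\pi_1 = 3p_1q_1 - p_1q_2 - 2p_2q_1 + p_2q_2 = 3p_1q_1 - p_1(1-q_1) - 2(1-p_1)q_1 + (1-p_1)(1-q_1)$
$= 3p_1q_1 - p_1 + p_1q_1 - 2q_1 + 2p_1q_1 + 1 - p_1 - q_1 + p_1q_1 = 1 + 7p_1q_1 - 2p_1 - 3q_1$
$\pi_2 = p_1q_1 + 4p_2q_2 = p_1q_1 + 4(1-p_1)(1-q_1) = 4 - 4p_1 - 4q_1 + 5p_1q_1$
$\partial\pi_1/\partial p_1 = -2 + 7q_1 = 0$, so $q_1 = 2/7$
$\partial\pi_2/\partial q_1 = -4 + 5p_1 = 0$, so $p_1 = 4/5$
So the strategies are $((4/5, 1/5), (2/7, 5/7))$.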
61
Tennis Example
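$\pi_1 = 90p_1q_1 + 20p_1q_2 + 30p_2q_1 + 60p_2q_2$
$= 90pq + 20p(1-q) + 30(1-p)q + 60(1-p)(1-q)$
$= 90pq + 20p - 20pq + 30q - 30pq + 60 - 60p - 60q + 60pq$
$= 60 + 100pq - 40p - 30q$
$\pi_2 = 10pq + 80p(1-q) + 70(1-p)q + 40(1-p)(1-q)$
$= 10pq + 80p - 80pq + 70q - 70pq + 40 - 40p - 40q + 40pq$
$= -100pq + 40p + 30q + 40$
$\partial\pi_1/\partial p = 100q - 40 = 0$, so $q = .4$
$\partial\pi_2/\partial q = -100p + 30 = 0$, so $p = .3$
So the strategies are $((.3, .7), (.4, .6))$.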
