Arguments for Recovering Cooperation - PowerPoint PPT Presentation

Title:

Arguments for Recovering Cooperation

Slides: 62
Provided by: klar162

1
Arguments for Recovering Cooperation
  • Conclusions that some have drawn from analysis of
    the prisoner's dilemma:
  • The game theory notion of rational action is
    wrong!
  • Somehow the dilemma is being formulated wrongly.
  • This isn't rational. We may not defect for a few
    cents. If the sucker's payoff really hurts, we are
    more likely to be rational.

2
Arguments to recover cooperation
  • We are not all self-centered! But sometimes we
    are nice because there is a punishment. If we
    don't give up our seat on the bus, we receive rude
    stares.
  • If this were true, places like Honor Copy would
    be exploited.
  • The other prisoner is my twin! When I decide
    what to do, the other agent will do the same
    (but I can't force it, as he wouldn't be autonomous).
  • Your mother would say, "What if everyone were to
    behave like that?" You say, "I would be a fool
    to act any other way."
  • The shadow of the future: we will meet again.

3
The Iterated Prisoner's Dilemma
  • One answer: play the game more than once.
  • If you know you will be meeting your opponent
    again, then the incentive to defect appears to
    evaporate.
  • Cooperation is the rational choice in the
    infinitely repeated prisoner's dilemma. (Hurrah!)

4
Backwards Induction
  • But suppose you both know that you will play the
    game exactly n times. On round n - 1 (the final
    round, counting from 0), you have an incentive to
    defect, to gain that extra bit of payoff. But this
    makes round n - 2 the last "real" round, and so you
    have an incentive to defect there, too. This is the
    backwards induction problem.
  • When playing the prisoner's dilemma with a fixed,
    finite, pre-determined, commonly known number of
    rounds, defection is the best strategy.

5
Axelrod's Tournament
  • Suppose you play the iterated prisoner's dilemma
    against a range of opponents. What strategy
    should you choose, so as to maximize your overall
    payoff?
  • Axelrod (1984) investigated this problem, with a
    computer tournament for programs playing the
    prisoner's dilemma.

6
Axelrod's tournament invited political
scientists, psychologists, economists, and game
theoreticians to play the iterated prisoner's dilemma
  • All-D: always defect.
  • Random: pick cooperate or defect at random.
  • Tit-for-Tat: on the first round, cooperate. Then do
    whatever your opponent did last.
  • Tester: first defect. If the opponent ever
    retaliated, then use tit-for-tat. If the opponent
    did not defect, cooperate for two rounds, then
    defect.
  • Joss: tit-for-tat, but 10% of the time, defect
    instead of cooperating.

7
Tit-for-Tat
  • Why did tit-for-tat win? Because you were averaging
    over all types of strategy.
  • If you played only against All-D, tit-for-tat would lose.

8
Two Trigger Strategies
  • Grim trigger strategy:
      • Cooperate until a rival deviates
      • Once a deviation occurs,
        play non-cooperatively for the
        rest of the game
  • Tit-for-tat:
      • Cooperate if your rival cooperated
        in the most recent period
      • Cheat if your rival cheated
        in the most recent period
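
The strategies described so far are simple enough to play against each other in a few lines. The following is a minimal sketch, not Axelrod's actual tournament code. The payoff values (temptation 5, reward 3, punishment 1, sucker's payoff 0) are the conventional ones and are an assumption here, since the slides do not state them; the function names are made up for illustration.

```python
# Payoffs to (first, second) player. These values are assumed, not from the slides.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def all_d(mine, theirs):
    """All-D: always defect."""
    return "D"

def tit_for_tat(mine, theirs):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not theirs else theirs[-1]

def grim_trigger(mine, theirs):
    """Cooperate until the opponent defects once, then defect forever."""
    return "D" if "D" in theirs else "C"

def play(strategy_a, strategy_b, rounds):
    """Return the total payoffs (a, b) after a fixed number of rounds."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

def round_robin(strategies, rounds=10):
    """Total score of each strategy against every entrant, itself included."""
    totals = {name: 0 for name in strategies}
    for name_a, strat_a in strategies.items():
        for strat_b in strategies.values():
            totals[name_a] += play(strat_a, strat_b, rounds)[0]
    return totals

totals = round_robin(
    {"All-D": all_d, "Tit-for-Tat": tit_for_tat, "Grim": grim_trigger}
)
print(play(tit_for_tat, all_d, 10))
print(totals)
```

Head to head over ten rounds, tit-for-tat scores less than All-D, yet its total over this small field is higher. That is the averaging point made on the Tit-for-Tat slide, under the assumed payoffs.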

9
Axelrod's rules for success
  • Do not be envious: it is not necessary to beat your
    opponent in order to do well. This is not zero
    sum.
  • Do not be the first to defect. Be nice. Start by
    cooperating.
  • Retaliate appropriately: always punish defection
    immediately, but use measured force. Don't
    overdo it.
  • Don't hold grudges: always reciprocate
    cooperation immediately.
  • Do not be too clever: when you try to learn from
    the other agent, don't forget he is trying to
    learn from you.
  • Be forgiving: one defection doesn't mean you can
    never cooperate. The opponent may be acting
    randomly.

10
The centipede game
[Figure: centipede game tree. Jack and Jill alternate, each choosing
"Go on" or "stop". Payoffs shown at the stop branches: (5, 3), (4, 7),
(2, 0), (1, 4), ..., (98, 96), (97, 100), (94, 97). If both always go
on, the payoff is (99, 99).]
11
The centipede game
The solution to this game through roll back is
for Jack to stop in the first round!
[Figure: the centipede game tree from the previous slide, repeated.]
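
The roll back argument can be sketched in code. The node order and the intermediate payoffs are an assumption: the figure's layout did not survive in this transcript, so the sketch assumes Jack's k-th stop pays (2 + 3k, 3k) and Jill's k-th stop pays (1 + 3k, 4 + 3k), a pattern that reproduces every payoff listed on the slide. The function names are illustrative.

```python
def centipede_nodes(rounds=33):
    """List (mover, stop_payoff) in play order; payoffs are (Jack, Jill).

    The payoff pattern is an assumed reconstruction of the figure.
    """
    nodes = []
    for k in range(rounds):
        nodes.append(("Jack", (2 + 3 * k, 3 * k)))
        nodes.append(("Jill", (1 + 3 * k, 4 + 3 * k)))
    return nodes

def roll_back(nodes, final_payoff=(99, 99)):
    """Work from the last decision to the first.

    At each node the mover compares stopping now with the value of
    continuing, which is whatever roll back already found for the
    rest of the game.
    """
    value = final_payoff
    plan = []
    for mover, stop_payoff in reversed(nodes):
        idx = 0 if mover == "Jack" else 1
        if stop_payoff[idx] >= value[idx]:
            value = stop_payoff
            plan.append((mover, "stop"))
        else:
            plan.append((mover, "go on"))
    plan.reverse()
    return value, plan

outcome, plan = roll_back(centipede_nodes())
print(outcome)   # payoff reached when both follow the rolled-back plan
print(plan[0])   # what the first mover does
```

Under this reconstruction every node's mover prefers to stop, so play ends at the very first node, which matches the slide's conclusion.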
12
The centipede game
  • What actually happens?
  • In experiments the game usually continues for at
    least a few rounds and occasionally goes all the
    way to the end.
  • But going all the way to the (99, 99) payoff
    almost never happens: at some stage of the game
    cooperation breaks down.
  • So we still do not get sustained cooperation, even if
    we move away from roll back as a solution.

13
Lessons from finite repeated games
  • Finite repetition often does not help players to
    reach better solutions
  • Often the outcome of the finitely repeated game
    is simply the one-shot Nash equilibrium repeated
    again and again.
  • There are SOME repeated games where finite
    repetition can create new equilibrium outcomes.
    But these games tend to have special properties
  • For a large number of repetitions, there are some
    games where the Nash equilibrium logic breaks
    down in practice.

14
Threats
  • Threatening retaliatory actions may help gain
    cooperation
  • Threat needs to be believable

15
What is Credibility?
  • "The difference between genius and stupidity
    is that genius has its limits."
    (Albert Einstein)
  • You are not credible if you propose to take
    suboptimal actions, that is, if a rational actor
    proposes to play a strategy which earns
    suboptimal profit.
  • How can one be credible?

16
Non-credible threat
  • A non-credible threat is a threat made by a
    player in a sequential game that it would not be in
    the player's best interest to carry out.
    The hope is that the threat is believed, in which
    case there is no need to carry it out. While Nash
    equilibria may depend on non-credible threats,
    backward induction eliminates them.

17
Trigger Strategy Extremes
  • Tit-for-tat is:
      • most forgiving
      • shortest memory
      • proportional
      • credible but lacks deterrence
  • Tit-for-tat answers:
      • "Is cooperation easy?"
  • Grim trigger is:
      • least forgiving
      • longest memory
      • MAD (mutually assured destruction)
      • adequate deterrence but lacks credibility
  • Grim trigger answers:
      • "Is cooperation possible?"

18
Concepts of rationality: doing the rational thing
  • Undominated strategy
    (problem: too weak; can't always find a
    single one)
  • (Weakly) dominating strategy (alias "duh?")
    (problem: too strong; rarely exists)
  • Nash equilibrium (or double best response)
    (problem: a pure-strategy equilibrium may not exist)
  • Randomized (mixed) Nash equilibrium: players
    choose among their options at random, each
    with an assigned probability
  • Theorem (Nash, 1950): a randomized Nash equilibrium
    always exists in a finite game.

. . .
19
Mixed strategy equilibria
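  • $\sigma_i(s_j)$ is the probability that player $i$ selects
    strategy $s_j$
  • $(0, 0, 1, 0, 0)$ is a pure strategy
  • Strategy profile: $\sigma = (\sigma_1, \ldots, \sigma_n)$
  • Expected utility: $u_i(\sigma) = \sum_{s \in S} \big( \prod_j \sigma_j(s_j) \big) \, u_i(s)$
    (chance the combination occurs times utility)
  • Nash equilibrium: $\sigma^*$ is a (mixed) Nash equilibrium if
    each $\sigma^*_i \in \Sigma_i$ defines a probability distribution over $S_i$, and
    $u_i(\sigma^*_i, \sigma^*_{-i}) \ge u_i(\sigma_i, \sigma^*_{-i})$ for all $\sigma_i \in \Sigma_i$, for all $i$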
20
Example: Matching Pennies (no pure strategy Nash
equilibrium)

                     Player 2
                   H         T
  Player 1   H   -1, 1     1, -1
             T    1, -1   -1, 1

So far we have talked only about pure strategy
equilibria (I make one choice). Not all games
have pure strategy equilibria. Some equilibria
are mixed strategy equilibria.
21
Example: Matching Pennies

                        Player 2
                    q: H      1-q: T
  Player 1    p: H  -1, 1     1, -1
            1-p: T   1, -1   -1, 1

We want to play each strategy with a certain
probability. If player 2 is optimally mixing
strategies, player 1 is indifferent between his
own choices! Compute the expected utility given
each pure possibility of the other player.
22
I am player 2. What should I do? I pick a
defensive strategy
  • If player 1 picks heads, his expected payoff is
    -q + (1 - q)
  • If player 1 picks tails, his expected payoff is
    q - (1 - q)
  • I want my opponent NOT to care what I pick. The
    idea is, if my opponent gets excited about what
    my strategy is, it means I have left open an
    opportunity for him. When he doesn't have to
    analyze what he should do, it says there is no
    way he wins big.
  • So
    -q + (1 - q) = q - 1 + q
    1 - 2q = 2q - 1, so q = 1/2
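
As a quick check of the defensive-strategy idea, the sketch below evaluates player 1's two expected payoffs for a few values of q, using the Matching Pennies payoffs from the slide. The function names are illustrative, not from the slides.

```python
from fractions import Fraction

def player1_expected(q):
    """Player 1's expected payoff from H and from T when player 2 plays H with probability q."""
    heads = -q + (1 - q)
    tails = q - (1 - q)
    return heads, tails

def best_exploit(q):
    """The most player 1 can earn against a fixed q by choosing the better pure move."""
    return max(player1_expected(q))

for q in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    print(q, player1_expected(q), best_exploit(q))
```

Only q = 1/2 leaves player 1 with nothing to exploit; any other q gives him a pure choice with a positive expected payoff.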

23
Example: Bach/Stravinsky

                        Player 2
                    q: B      1-q: S
  Player 1    p: B  2, 1      0, 0
            1-p: S  0, 0      1, 2

We want to play each strategy with a certain
probability. If player 2 is optimally mixing
strategies, player 1 is indifferent between his
own choices. Compute the expected utility given
each pure possibility of yours.
  • Player 1 is optimally mixing when player 2 is
    indifferent: 1p = 2(1 - p), so p = 2/3
  • Player 2 is optimally mixing when player 1 is
    indifferent: 2q = 1(1 - q), so q = 1/3
24
  • This is consistent with Dan's advice: look after
    yourself.
  • "I used to think I was indecisive,
    but now I'm not so sure."
    (Anonymous)


25
Mixed Strategies
  • Unreasonable predictors of one-time human
    interaction
  • Reasonable predictors of long-term proportions

26
Employee Monitoring
  • Employees can work hard or shirk
  • Salary: 100K unless caught shirking
  • Cost of effort: 50K
  • (We are assuming that when he works he loses
    something. Think of him running a business of
    his own while getting paid at his day job, so if
    he works, he can't do that and loses the money
    the business makes.)
  • Managers can monitor or not
  • Value of employee output: 200K
  • (We assume he must be worth more than we pay
    him to cover profit, infrastructure, manager
    time, mistakes, etc.)
  • Profit if employee doesn't work: 0
  • Cost of monitoring: 10K

27
Employee Monitoring
                         Manager
                    Monitor     No Monitor
  Employee  Work    50, 90      50, 100
            Shirk    0, -10    100, -100
  (Payoffs in K: employee, manager)
  • From the problem statement, VERIFY the numbers in
    the table are correct.
  • No equilibrium in pure strategies - SHOW IT
  • What do the players do in mixed strategies? DO AT
    SEATS
  • Please do not consider this instruction for how
    to cheat your boss. Rather, think of it as
    advice in how to deal with employees.
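
One way to do the two checks asked for above is to rebuild the table from the problem statement and then test each cell for profitable deviations. This is a sketch under one assumption that the slides imply but do not spell out: an employee who shirks and is monitored is caught and paid nothing. The names are illustrative.

```python
# All amounts in thousands, from the problem statement.
SALARY, EFFORT_COST, OUTPUT, MONITOR_COST = 100, 50, 200, 10

EMPLOYEE = ("work", "shirk")
MANAGER = ("monitor", "no monitor")

def payoffs(employee, manager):
    """Return (employee payoff, manager payoff) for one pure-strategy pair."""
    # Assumption: shirking while monitored means being caught and unpaid.
    caught = employee == "shirk" and manager == "monitor"
    paid = 0 if caught else SALARY
    effort = EFFORT_COST if employee == "work" else 0
    output = OUTPUT if employee == "work" else 0
    monitoring = MONITOR_COST if manager == "monitor" else 0
    return paid - effort, output - paid - monitoring

table = {(e, m): payoffs(e, m) for e in EMPLOYEE for m in MANAGER}

def pure_nash(table):
    """List the cells where neither player gains by deviating alone."""
    equilibria = []
    for (e, m), (u_e, u_m) in table.items():
        employee_ok = all(table[(e2, m)][0] <= u_e for e2 in EMPLOYEE)
        manager_ok = all(table[(e, m2)][1] <= u_m for m2 in MANAGER)
        if employee_ok and manager_ok:
            equilibria.append((e, m))
    return equilibria

print(table)
print(pure_nash(table))
```

The rebuilt table agrees with the one on the slide, and the list of pure-strategy equilibria comes back empty: in every cell one of the two players would rather switch.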

28
Mixed Strategies
  • Randomize: surprise the rival
  • Mixed strategy:
      • Specifies that an actual move be chosen randomly
        from the set of pure strategies with some
        specific probabilities.
  • Nash equilibrium in mixed strategies:
      • A probability distribution for each player
      • The distributions are mutual best responses to
        one another in the sense of expectations
29
Finding Mixed Strategies
  • Suppose
  • Employee chooses (shirk, work) with
    probabilities (p,1-p)
  • Manager chooses (monitor, no monitor) with
    probabilities (q,1-q)
  • Find expected payoffs for each player
  • Use these to calculate best responses

30
Employee's Payoff
  • First, find the employee's expected payoff from each
    pure strategy
  • If the employee works, he receives 50:
    Profit(work) = 50 × q + 50 × (1 - q)
                 = 50
  • If the employee shirks, he receives 0 or 100:
    Profit(shirk) = 0 × q + 100 × (1 - q)
                  = 100 - 100q

31
Employee's Best Response
  • Next, calculate the best strategy for each possible
    strategy of the opponent
  • For q < 1/2: SHIRK
    Profit(shirk) = 100 - 100q > 50 = Profit(work)
  • For q > 1/2: WORK
    Profit(shirk) = 100 - 100q < 50 = Profit(work)
  • For q = 1/2: INDIFFERENT
    Profit(shirk) = 100 - 100q = 50 = Profit(work)

32
Manager's Best Response
  • u2(mntr) = 90 × (1 - p) - 10 × p = 90 - 100p
  • u2(no mntr) = 100 × (1 - p) - 100 × p = 100 - 200p
  • For p < 1/10: NO MONITOR
    u2(mntr) = 90 - 100p < 100 - 200p = u2(no mntr)
  • For p > 1/10: MONITOR
    u2(mntr) = 90 - 100p > 100 - 200p = u2(no mntr)
  • For p = 1/10: INDIFFERENT
    u2(mntr) = 90 - 100p = 100 - 200p = u2(no mntr)

33
Cycles
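[Figure: best-response diagram. One axis shows p from 0 (work) to
1 (shirk), marked at 1/10; the other shows q from 0 to 1, marked at
1/2 and labeled monitor / no monitor.]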
34
Mixed Strategy Equilibrium
  • Employees shirk with probability 1/10
  • Managers monitor with probability ½
  • Expected payoff to employee:
    sum over the four outcomes of
    (chance of the outcome) × (payoff from it)
  • Expected payoff to manager: computed the same way
35
Properties of Equilibrium
  • Both players are indifferent between any mixture
    over their strategies
  • E.g. employee
  • If shirk
  • If work
  • Regardless of what employee does, expected payoff
    is the same
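
The expected payoffs at the mixed equilibrium, and the indifference property, can be checked numerically. The sketch below copies the payoff table from the slides and uses exact fractions; the function and variable names are illustrative.

```python
from fractions import Fraction

# (employee, manager) payoffs in thousands, copied from the slide's table.
TABLE = {
    ("work", "monitor"): (50, 90),
    ("work", "no monitor"): (50, 100),
    ("shirk", "monitor"): (0, -10),
    ("shirk", "no monitor"): (100, -100),
}

def expected_payoffs(p_shirk, q_monitor):
    """Weight each of the four outcomes by its probability and sum."""
    employee_mix = {"shirk": p_shirk, "work": 1 - p_shirk}
    manager_mix = {"monitor": q_monitor, "no monitor": 1 - q_monitor}
    total_e = total_m = 0
    for (e, m), (u_e, u_m) in TABLE.items():
        chance = employee_mix[e] * manager_mix[m]
        total_e += chance * u_e
        total_m += chance * u_m
    return total_e, total_m

p, q = Fraction(1, 10), Fraction(1, 2)
print(expected_payoffs(p, q))      # both players at the equilibrium mix
print(expected_payoffs(1, q)[0])   # employee always shirks
print(expected_payoffs(0, q)[0])   # employee always works
print(expected_payoffs(p, 1)[1])   # manager always monitors
print(expected_payoffs(p, 0)[1])   # manager never monitors
```

With the manager monitoring half the time, the employee's payoff does not depend on his own choice, and the same holds for the manager when the employee shirks one time in ten.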

36
Use Indifference to Solve I
                  q           1-q
               Monitor     No Monitor    Employee's expected payoff
  Work         50, 90      50, 100       50q + 50(1-q)
  Shirk         0, -10    100, -100      0q + 100(1-q)

  • 50q + 50(1-q) = 0q + 100(1-q)
  • 50 = 100 - 100q
  • 100q = 50
  • q = 1/2

37
Use Indifference to Solve II
                     Monitor           No Monitor
  1-p   Work         50, 90            50, 100
  p     Shirk         0, -10          100, -100
  Manager's expected payoff:
                     90(1-p) - 10p     100(1-p) - 100p

  • 90(1-p) - 10p = 100(1-p) - 100p
  • 90 - 100p = 100 - 200p
  • 100p = 10
  • p = 1/10

38
Indifference
                     1/2          1/2
                   Monitor     No Monitor    Employee's expected payoff
  9/10  Work       50, 90      50, 100       50
  1/10  Shirk       0, -10    100, -100      50
  Manager's
  expected payoff    80          80
39
Upsetting?
  • This example is upsetting as it appears to tell
    you, as workers, to shirk.
  • Think of it from the manager's point of view,
    assuming you have unmotivated (or unhappy)
    workers.
  • A better option would be to hire dedicated
    workers, but if you have people who are trying to
    cheat you, this gives a reasonable response.
  • Sometimes you are dealing with individuals who
    just want to beat the system. In that case, you
    need to play their game. For example, people who
    try to beat the IRS.
  • On the positive side, even if you have dishonest
    workers, if you get too paranoid about monitoring
    their work, you lose! This theory tells you to
    lighten up!
  • This theory might be applied to criticising your
    friend or setting up rules/punishment for your
    (future?) children.

40
Why Do We Mix?
  • I don't want to give my opponent an advantage.
    When my opponent can't decide what to do based on
    my strategy, I win, as there is no way he is
    going to take advantage of me.

COMMANDMENT: Use the mixed strategy that keeps
your opponent guessing.
41
Mixed Strategy Equilibriums
  • Anyone for tennis?
  • Should you serve to the forehand or the backhand?

42
Tennis Payoffs
43
Tennis: Fixed Sum. If you win (the points), I lose
(the points). AKA strictly competitive.
[Payoff table: columns weighted q and 1-q, rows weighted p and 1-p.]
44
Solving for Server's Optimal Mix
  • What would happen if the server always served
    to the forehand?
  • A rational receiver would always anticipate
    forehand and 90% of the serves would be
    successfully returned.

45
Solving for Server's Optimal Mix
  • What would happen if the server aimed to the
    forehand 50% of the time and the backhand 50% of
    the time and the receiver always guessed
    forehand?
  • (0.5 × 0.9) + (0.5 × 0.2) = 0.55 successful returns

46
Solving for Server's Optimal Mix
  • What is the best mix for each player?
  • Receiver thinks (p is the chance the receiver
    anticipates forehand; the payoffs are the server's):
  • if server serves forehand: .10p + .70(1-p)
  • if server serves backhand: .80p + .40(1-p)
  • I want them to be the same:
  • .10p + .70(1-p) = .80p + .40(1-p)
  • .10p + .70 - .70p = .80p + .40 - .40p
  • -.6p + .7 = .4p + .4
  • .3 = p
  • Use a similar argument to solve for q.

47
Draw a graph which shows two lines: (1) the
utility to the server of picking forehand as a
function of p; (2) the utility to the server of
picking backhand as a function of p.
48
What can you learn from the graph?
49
Receiver's view of opponent: above .3, backhand
wins.
[Graph; axis label: p]
50
Receiver's view of opponent: above .3, serving
backhand wins.
[Graph; axis label: p]
51
Server's view of opponent: above .4, planning for
forehand wins.
[Graph; axis label: q]
52
% of Successful Returns Given Server and Receiver
Actions
Where would you shoot, knowing the other
player will respond to your choices? In other
words, you pick the row but will likely get the
smaller value in that row.
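
The "assume the other player responds to your choice" reasoning can be sketched as a worst-case search from the server's side. The four return percentages are the ones used in the Tennis Example at the end of the deck; the grid of candidate mixes and all names are assumptions made for illustration.

```python
from fractions import Fraction

# Percentage of serves returned: RETURNS[receiver's guess][server's aim].
RETURNS = {
    "forehand": {"forehand": 90, "backhand": 20},
    "backhand": {"forehand": 30, "backhand": 60},
}

def return_rate(guess, q_forehand):
    """Expected return percentage if the server aims forehand with probability q."""
    row = RETURNS[guess]
    return row["forehand"] * q_forehand + row["backhand"] * (1 - q_forehand)

def worst_case(q_forehand):
    """Return rate when the receiver picks the best guess against this mix."""
    return max(return_rate(guess, q_forehand) for guess in RETURNS)

# Try every mix in steps of 1% and keep the one with the lowest worst case.
grid = [Fraction(k, 100) for k in range(101)]
best_q = min(grid, key=worst_case)
print(best_q, worst_case(best_q))
```

Because the game is fixed sum, the server wants the fewest returns. The mix that minimizes the receiver's best return rate is the same q that makes the receiver indifferent between the two guesses.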
53
Consider Bach or Stravinsky
  • If the other player is optimally mixing, my
    payoffs are the same, so 2(Y) = 1(1-Y), giving Y = 1/3
  • 1(X) = 2(1-X), giving X = 2/3
  • No dominant strategy equilibrium.

                         Yolanda
                     B (Y)     S (1-Y)
  Xavier  B (X)      2, 1      0, 0
          S (1-X)    0, 0      1, 2
54
Best Response Function
  • If 0 < Y < 1/3, then player 1's best response is
    X = 0.
  • If Y = 1/3, then ALL of player 1's responses are
    best responses.
  • If Y > 1/3, then player 1's best response is X = 1.
  • Using Excel, prove this to yourself!
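
For readers without a spreadsheet at hand, the same check can be sketched in a few lines. It uses the Bach or Stravinsky payoffs from the previous slide, with player 1 as Xavier; the function names are illustrative.

```python
from fractions import Fraction

def xavier_payoffs(y):
    """Xavier's expected payoff from B and from S when Yolanda plays B with probability y."""
    return 2 * y, 1 * (1 - y)

def best_response_x(y):
    """Best X (probability Xavier plays B), or None when every X is equally good."""
    from_b, from_s = xavier_payoffs(y)
    if from_b > from_s:
        return 1
    if from_b < from_s:
        return 0
    return None

for y in (Fraction(1, 10), Fraction(1, 3), Fraction(9, 10)):
    print(y, xavier_payoffs(y), best_response_x(y))
```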

55
Best Response Function (The dotted line is a
function only if you mentally switch the axes.)
The fixed point where the best response
functions intersect is the Nash equilibrium. The
best response of player 1 is shown as a dotted
line.
[Figure: best response functions. X axis marked at 2/3 and 1;
Y axis marked at 1/3 and 1.]
56
p     q     player 1  player 2
0.1   0.1   0.83      1.63
0.1   0.2   0.76      1.46
0.1   0.3   0.69      1.29
0.1   0.4   0.62      1.12
0.1   0.5   0.55      0.95
0.1   0.6   0.48      0.78
0.1   0.7   0.41      0.61
0.1   0.8   0.34      0.44
0.1   0.9   0.27      0.27
0.1   1     0.2       0.1

p     q     player 1  player 2
0.1   0.33  0.669     1.24
0.2   0.33  0.668     1.14
0.3   0.33  0.667     1.04
0.4   0.33  0.666     0.94
0.5   0.33  0.665     0.84
0.6   0.33  0.664     0.73
0.7   0.33  0.663     0.63
0.8   0.33  0.662     0.53
0.9   0.33  0.661     0.43
1     0.33  0.66      0.33

p     q     player 1  player 2
0.67  0.1   0.4338    0.67
0.67  0.2   0.5336    0.67
0.67  0.3   0.6334    0.67
0.67  0.4   0.7332    0.67
0.67  0.5   0.833     0.67
0.67  0.6   0.9328    0.67
0.67  0.7   1.0326    0.67
0.67  0.8   1.1324    0.67
0.67  0.9   1.2322    0.67
0.67  1     1.332     0.67
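
Tables like these can be regenerated from the Bach or Stravinsky payoffs. The sketch below assumes p is the probability that player 1 picks B and q the probability that player 2 does, which matches the first rows of the tables; the function names are illustrative.

```python
def payoffs(p, q):
    """Expected payoffs (player 1, player 2) in Bach or Stravinsky.

    p is the probability that player 1 picks B, q the probability
    that player 2 picks B. Both-B pays (2, 1), both-S pays (1, 2),
    and a mismatch pays (0, 0).
    """
    both_b = p * q
    both_s = (1 - p) * (1 - q)
    return 2 * both_b + both_s, both_b + 2 * both_s

def table(pairs):
    """Format one row per (p, q) pair, payoffs rounded to two decimals."""
    rows = ["p     q     player 1  player 2"]
    for p, q in pairs:
        u1, u2 = payoffs(p, q)
        rows.append(f"{p:<5} {q:<5} {u1:<9.2f} {u2:.2f}")
    return "\n".join(rows)

# Hold p at 0.1 and sweep q, as in the first table.
print(table([(0.1, k / 10) for k in range(1, 11)]))
```

Sweeping one probability while holding the other at its equilibrium value shows the indifference directly: the other player's column stops changing.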
57
Hints to understanding graph
  • The solid line represents Yolanda's thinking. If
    Xavier is going to select B less than 2/3 of
    the time, Yolanda is best off selecting S (which
    happens when Y = 0).
  • HOWEVER, if Xavier is going to select B more than
    2/3 of the time, Yolanda should immediately
    start selecting B (which happens when Y = 1).

58
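Computing mixed strategies for two players (the
book's way)
  • Write the matrix game in bimatrix form: $A = [a_{ij}]$,
    $B = [b_{ij}]$
  • Compute the payoffs: $\pi_1 = \sum_i \sum_j p_i a_{ij} q_j$,
    $\pi_2 = \sum_i \sum_j p_i b_{ij} q_j$
  • Replace $p_m = 1 - \sum_{i<m} p_i$ and $q_n = 1 - \sum_{j<n} q_j$
  • Consider the partial derivatives of $\pi_1$ and $\pi_2$
    with respect to all $p_i$ and all $q_j$ respectively.
  • Solve the system of equations with all partials set
    to zero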

59
Example
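$\pi_1 = 3 p_1 q_1 + p_2 q_2 = 3 p_1 q_1 + (1 - p_1)(1 - q_1) = 1 - p_1 - q_1 + 4 p_1 q_1$
$\pi_2 = p_1 q_1 + 4 p_2 q_2 = p_1 q_1 + 4 (1 - p_1)(1 - q_1) = 4 - 4 p_1 - 4 q_1 + 5 p_1 q_1$
$\partial \pi_1 / \partial p_1 = -1 + 4 q_1 = 0$, so $q_1 = 1/4$
$\partial \pi_2 / \partial q_1 = -4 + 5 p_1 = 0$, so $p_1 = 4/5$
So the strategies are $((4/5, 1/5), (1/4, 3/4))$.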
60
Example 2
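$\pi_1 = 3 p_1 q_1 - p_1 q_2 - 2 p_2 q_1 + p_2 q_2$
$\quad = 3 p_1 q_1 - p_1 (1 - q_1) - 2 (1 - p_1) q_1 + (1 - p_1)(1 - q_1)$
$\quad = 3 p_1 q_1 - p_1 + p_1 q_1 - 2 q_1 + 2 p_1 q_1 + 1 - p_1 - q_1 + p_1 q_1$
$\quad = 1 + 7 p_1 q_1 - 2 p_1 - 3 q_1$
$\pi_2 = p_1 q_1 + 4 p_2 q_2 = p_1 q_1 + 4 (1 - p_1)(1 - q_1) = 4 - 4 p_1 - 4 q_1 + 5 p_1 q_1$
$\partial \pi_1 / \partial p_1 = -2 + 7 q_1 = 0$, so $q_1 = 2/7$
$\partial \pi_2 / \partial q_1 = -4 + 5 p_1 = 0$, so $p_1 = 4/5$
So the strategies are $((4/5, 1/5), (2/7, 5/7))$.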
61
Tennis Example
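With $p = p_1$ and $q = q_1$:
$\pi_1 = 90 p_1 q_1 + 20 p_1 q_2 + 30 p_2 q_1 + 60 p_2 q_2$
$\quad = 90 p q + 20 p (1 - q) + 30 (1 - p) q + 60 (1 - p)(1 - q)$
$\quad = 90 p q + 20 p - 20 p q + 30 q - 30 p q + 60 - 60 p - 60 q + 60 p q$
$\quad = 60 + 100 p q - 40 p - 30 q$
$\pi_2 = 10 p q + 80 p (1 - q) + 70 (1 - p) q + 40 (1 - p)(1 - q)$
$\quad = 10 p q + 80 p - 80 p q + 70 q - 70 p q + 40 - 40 p - 40 q + 40 p q$
$\quad = -100 p q + 40 p + 30 q + 40$
$\partial \pi_1 / \partial p = 100 q - 40 = 0$, so $q = .4$
$\partial \pi_2 / \partial q = -100 p + 30 = 0$, so $p = .3$
So the strategies are $((.3, .7), (.4, .6))$.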
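
For 2x2 games the partial-derivative recipe reduces to two one-line formulas, which the sketch below applies to the three examples. It is an illustration, not the book's code: `interior_mix` is a made-up name, it assumes integer payoffs, and it does not check that the answer lies between 0 and 1 or that a denominator is nonzero (a zero denominator raises `ZeroDivisionError`).

```python
from fractions import Fraction

def interior_mix(A, B):
    """Solve d(pi1)/d(p1) = 0 and d(pi2)/d(q1) = 0 for a 2x2 bimatrix game.

    With p2 = 1 - p1 and q2 = 1 - q1, pi1 is linear in p1 and pi2 is
    linear in q1, so each derivative gives one linear equation in the
    other player's probability.
    """
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    # d(pi1)/d(p1) = (a11 - a12 - a21 + a22) * q1 + (a12 - a22)
    q1 = Fraction(a22 - a12, a11 - a12 - a21 + a22)
    # d(pi2)/d(q1) = (b11 - b12 - b21 + b22) * p1 + (b21 - b22)
    p1 = Fraction(b22 - b21, b11 - b12 - b21 + b22)
    return (p1, 1 - p1), (q1, 1 - q1)

example_1 = interior_mix([[3, 0], [0, 1]], [[1, 0], [0, 4]])
example_2 = interior_mix([[3, -1], [-2, 1]], [[1, 0], [0, 4]])
tennis = interior_mix([[90, 20], [30, 60]], [[10, 80], [70, 40]])
print(example_1)
print(example_2)
print(tennis)
```

Setting a player's own derivative to zero pins down the opponent's probability, which is the indifference condition used earlier in the deck, written in calculus form.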