1. Algorithmic Game Theory (Uri Feige, Robi Krauthgamer, Moni Naor)
Lecture 8: Regret Minimization
Lecturer: Moni Naor
2. Announcements
- Next week (Dec 24th): Israeli Seminar on Computational Game Theory
  - 10:00-16:30, at the Microsoft Israel R&D Center, 13 Shenkar St., Herzliya.
- The January course meetings will be 13:00-15:00, on Jan 7th, 14th and 21st, 2009.
3. The Peanuts Game
- There are n bins.
- At each round nature throws a peanut into one of the bins:
  - If you (the player) are at the chosen bin, you catch the peanut.
  - Otherwise you miss it.
- You may move to any bin before any round.
- The game ends when d peanuts have been thrown at some bin, independent of whether they were caught or not.
- Goal: catch as many peanuts as possible.
- Hopeless if the opponent tosses the peanuts knowing where you stand. So say the sequence of bins is predetermined (but unknown).
- How well can we do, as a function of d and n?
4. Basic setting
- View learning as a sequence of trials.
- In each trial, the algorithm is given x, asked to predict f(x), and then told the correct value.
- Make no assumptions about how the examples are chosen.
- The goal is to minimize the number of mistakes; we need to learn from our mistakes.
5. Using expert advice
- Want to predict the stock market.
- We solicit n experts for their advice: will the market go up or down tomorrow?
- Want to use their advice to make our prediction.
- What is a good strategy for using their opinions, given no a priori knowledge of which expert is the best?
- "Expert" = someone with an opinion.
6. Some expert is perfect
- We have n experts.
- One of these is perfect (never makes a mistake); we just don't know which one.
- Can we find a strategy that makes no more than log n mistakes?
- Simple algorithm: take the majority vote over all experts that have been completely correct so far. Each mistake at least halves the set of surviving experts, so there are at most log2 n mistakes. (A sketch follows below.)
- What if we have a prior p over the experts, and want no more than log(1/pi) mistakes, where expert i is the perfect one? Take a weighted vote according to p.
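A minimal sketch of this majority-vote ("halving") strategy; the list-based interface is illustrative, not from the lecture:

```python
# Halving algorithm: predict with the majority of the experts that have
# never been wrong.  Each mistake at least halves the set of surviving
# experts, so with a perfect expert there are at most log2(n) mistakes.

def halving(advice_per_round, outcomes):
    """advice_per_round[t][i] is expert i's 0/1 prediction at round t;
    outcomes[t] is the true 0/1 label.  Returns the number of mistakes."""
    n = len(advice_per_round[0])
    alive = set(range(n))              # experts with no mistakes so far
    mistakes = 0
    for advice, truth in zip(advice_per_round, outcomes):
        votes_for_1 = sum(advice[i] for i in alive)
        prediction = 1 if 2 * votes_for_1 >= len(alive) else 0
        if prediction != truth:
            mistakes += 1
        alive = {i for i in alive if advice[i] == truth}
    return mistakes
```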
7. Relation to concept learning
- If computation time is no object, we can have one expert per concept in C.
- If the target is in C, then the number of mistakes is at most log|C|.
- More generally, for any description language, the number of mistakes is at most the number of bits needed to write down f.
8. What if no expert is perfect?
- Goal: do nearly as well as the best one in hindsight.
- Strategy 1: the iterated halving algorithm. Same as before, but once we've crossed off all the experts, restart from the beginning.
- Makes at most log(n)·(OPT+1) mistakes, where OPT is the number of mistakes of the best expert in hindsight: between consecutive restarts every expert, in particular the best one, errs at least once.
- Seems wasteful: we constantly forget what we've learned.
[Figure: the mistake sequence split into epochs, with at most log n mistakes per epoch.]
9. Weighted Majority Algorithm
- Intuition: making a mistake doesn't completely disqualify an expert. So instead of crossing experts off, just lower their weight.
- Weighted Majority Algorithm:
  - Start with all experts having weight 1.
  - Predict based on a weighted majority vote.
  - Penalize mistakes by cutting the weight in half.
10. Weighted Majority Algorithm
- Weighted Majority Algorithm:
  - Start with all experts having weight 1.
  - Predict based on a weighted majority vote.
  - Penalize mistakes by cutting the weight in half.
- [Example: a table tracking the experts' weights over Days 1-4. A sketch of the algorithm follows below.]
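A sketch of the deterministic Weighted Majority rule, in the same illustrative interface as the halving sketch above:

```python
# Weighted Majority: halve an expert's weight after each of its mistakes
# instead of eliminating it outright.

def weighted_majority(advice_per_round, outcomes):
    n = len(advice_per_round[0])
    w = [1.0] * n                      # all experts start with weight 1
    mistakes = 0
    for advice, truth in zip(advice_per_round, outcomes):
        weight_for_1 = sum(w[i] for i in range(n) if advice[i] == 1)
        weight_for_0 = sum(w) - weight_for_1
        prediction = 1 if weight_for_1 >= weight_for_0 else 0
        if prediction != truth:
            mistakes += 1
        for i in range(n):
            if advice[i] != truth:     # penalize mistaken experts
                w[i] *= 0.5
    return mistakes
```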
11. Analysis: not far from the best expert in hindsight
- M = number of mistakes we've made so far.
- b = number of mistakes the best expert has made so far.
- W = total weight. Initially W = n.
- After each of our mistakes, W drops by at least 25% (at least half the weight voted wrongly and gets halved).
- So after M mistakes, W is at most n(3/4)^M.
- The weight of the best expert is (1/2)^b. So
  (1/2)^b ≤ n(3/4)^M,
  and taking logarithms,
  M ≤ (b + log2 n) / log2(4/3) ≈ 2.41(b + log2 n) = O(b + log n):
  a constant competitive ratio.
12. Randomized Weighted Majority
- If the best expert makes a mistake 20% of the time, then O(b + log n) with a constant factor of about 2.4 is not so good. Can we do better?
- Instead of a majority vote, use the weights as probabilities: if 70% of the weight is on "up" and 30% on "down", predict up with probability 0.7 and down with probability 0.3.
- Idea: smooth out the worst case.
- Also, generalize the penalty factor 1/2 to 1-ε.
13. Randomized Weighted Majority
- /* Initialization */
- wi ← 1 for i ∈ {1..n}
- /* Main loop */
- For t = 1..T:
  - Let Pt(i) = wi / Σj=1..n wj
  - Choose i according to Pt
  - /* Update scores */
  - Observe the losses
  - for i ∈ {1..n}: wi ← wi · (1-ε)^loss(i)
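A runnable version of this loop (a sketch; the loss matrix and the tuning example at the end are illustrative):

```python
import math
import random

def rwm(loss_per_round, eps):
    """Randomized Weighted Majority.  loss_per_round[t][i] is expert i's
    loss in [0,1] at round t.  Returns (expected loss, actions played)."""
    n = len(loss_per_round[0])
    w = [1.0] * n                                  # w_i <- 1
    expected_loss, actions = 0.0, []
    for losses in loss_per_round:
        total = sum(w)
        p = [wi / total for wi in w]               # P_t(i) = w_i / sum_j w_j
        actions.append(random.choices(range(n), weights=p)[0])
        expected_loss += sum(pi * li for pi, li in zip(p, losses))
        for i in range(n):
            w[i] *= (1 - eps) ** losses[i]         # w_i <- w_i (1-eps)^loss(i)
    return expected_loss, actions

# The tuning from the "Summarizing RWM" slide: eps = sqrt(ln(n)/T).
T, n = 1000, 10
eps = math.sqrt(math.log(n) / T)
```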
14. Analysis
- Say at time t a fraction Ft of the weight is on experts that made a mistake.
- Then we make a mistake with probability Ft, and we remove an εFt fraction of the total weight:
  Wfinal = n(1 - εF1)(1 - εF2)···
- ln(Wfinal) = ln(n) + Σt ln(1 - εFt) ≤ ln(n) - ε Σt Ft   (using ln(1-x) < -x)
  = ln(n) - εM,   since Σt Ft = E[#mistakes] = M.
- If the best expert makes b mistakes, then ln(Wfinal) ≥ ln((1-ε)^b).
- Now solve ln(n) - εM ≥ b ln(1-ε):
  M ≤ b ln(1-ε)/(-ε) + ln(n)/ε ≤ b(1+ε) + ln(n)/ε,
  using -ln(1-x) ≤ x + x² for 0 ≤ x ≤ 1/2.
- Here M = expected number of mistakes, b = mistakes of the best expert, W = total weight.
15. Summarizing RWM
- RWM is (1+ε)-competitive with the best expert in hindsight, up to an additive (ln n)/ε loss: M ≤ b(1+ε) + ln(n)/ε.
- If running for T time steps, set ε = (ln n / T)^(1/2) to get
  M ≤ b(1 + (ln n / T)^(1/2)) + ln(n) / (ln n / T)^(1/2)
    = b + (b² ln n / T)^(1/2) + (ln(n)·T)^(1/2)
    ≤ b + 2(ln(n)·T)^(1/2)
- M = expected mistakes made; b = mistakes of the best expert; the second term is the additive loss.
16. Questions
- Isn't it better to sometimes take the majority? The best expert may have a hard time on the easy questions, and we would be better off using the wisdom of the crowds.
- Answer: if it is a good idea, make the majority vote one of the experts! (A sketch follows below.)
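One way to realize this answer, as a sketch: wrap the n opinions so that their majority vote becomes expert n+1, and run the same algorithm on n+1 experts.

```python
def with_majority_expert(advice):
    """Given the n experts' 0/1 opinions for one round, append their
    (unweighted) majority vote as an extra, (n+1)-st expert."""
    majority = 1 if 2 * sum(advice) >= len(advice) else 0
    return advice + [majority]
```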
17. Lower Bounds
- Cannot hope to do better than:
  - an additive log n term, and
  - an additive T^(1/2) term.
18. What can we use this for?
- Combining multiple algorithms so as to do nearly as well as the best one in hindsight. E.g., in online auctions: one expert per price level.
- Playing a repeated game so as to do nearly as well as the best strategy in hindsight: Regret Minimization.
- Extensions: the bandit problem, movement costs.
19. No-regret algorithms for repeated games
- Repeated play of a matrix game with N rows. (The algorithm is the row player; rows represent its possible actions.)
- At each step t the algorithm picks a row it; life picks a column.
- The algorithm pays the cost of the action chosen, Ct(it).
- The algorithm gets the whole column Ct as feedback (or just its own cost, in the bandit model).
- Assume a bound on the maximum cost: all costs are between 0 and 1.
[Figure: a cost matrix; the algorithm picks row it, life picks column Ct.]
20. No-regret algorithms for repeated games
- At each time step, the algorithm picks a row, life picks a column.
- The algorithm pays the cost of the action chosen, Ct(i), and gets the column as feedback; all costs are between 0 and 1.
- Define the average regret in T time steps as
  (avg cost of alg) - (avg cost of the best fixed row in hindsight)
  = 1/T · [ Σt=1..T Ct(it) - mini Σt=1..T Ct(i) ].
- We want this to go to 0 (or better) as T gets large: a no-regret algorithm. (A helper computing this quantity appears below.)
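A direct computation of this quantity (a small illustrative helper, not from the lecture):

```python
def average_regret(costs, actions):
    """costs[t][i] = C_t(i); actions[t] = i_t, the row played at step t.
    Returns 1/T * (sum_t C_t(i_t) - min_i sum_t C_t(i))."""
    T, n = len(costs), len(costs[0])
    alg_cost = sum(costs[t][actions[t]] for t in range(T))
    best_fixed = min(sum(costs[t][i] for t in range(T)) for i in range(n))
    return (alg_cost - best_fixed) / T
```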
21. Randomized Weighted Majority
- /* Initialization */
- wi ← 1 for i ∈ {1..n}
- /* Main loop */
- For t = 1..T:
  - Let Pt(i) = wi / Σj=1..n wj
  - Choose i according to Pt
  - /* Update scores */
  - Observe the column Ct
  - for i ∈ {1..n}: wi ← wi · (1-ε)^Ct(i)
22. Analysis
- Similar to the {0,1} case:
  E[cost of RWM] ≤ (mini Σt=1..T Ct(i)) / (1-ε) + ln(n)/ε
               ≤ (mini Σt=1..T Ct(i)) · (1+2ε) + ln(n)/ε   (for ε ≤ 1/2)
- No regret: as T grows, the average difference goes to 0.
23. Properties of no-regret algorithms
- Time-average performance is guaranteed to approach the minimax value V of the game, or better, if life isn't adversarial.
- Two no-regret algorithms playing against each other have empirical distributions of play approaching minimax optimal ones.
- The existence of no-regret algorithms yields a proof of the minimax theorem.
24. von Neumann's Minimax Theorem
- Zero-sum game: u2(a1,a2) = -u1(a1,a2).
- Theorem: for any two-player zero-sum game with finite strategy sets A1, A2 there is a value v ∈ R, the game value, s.t.
  v = maxp∈Δ(A1) minq∈Δ(A2) u1(p,q)
    = minq∈Δ(A2) maxp∈Δ(A1) u1(p,q).
- For all mixed Nash equilibria (p,q): u1(p,q) = v.
- (Δ(A) = the mixed strategies over A.)
25. Convergence to Minimax
- Suppose we know
  v = maxp∈Δ(A1) minq∈Δ(A2) u1(p,q) = minq∈Δ(A2) maxp∈Δ(A1) u1(p,q).
- Consider the distribution q for player 2 given by the observed frequencies of player 2's play over T steps.
- There is a best response x ∈ A1 to q, so that u1(x,q) ≥ v.
- If player 1 had always played x, the expected gain would be at least vT.
- So if player 1 follows a no-regret procedure, its gain is at least vT - R, where R/T → 0.
- Using RWM, the average gain is at least v - O((log n / T)^(1/2)). (An illustrative self-play experiment follows below.)
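An illustrative self-play experiment (assumptions: matching-pennies costs scaled to [0,1], with horizon and ε chosen as on the RWM slides). Both players run RWM; the row player's empirical frequencies approach the minimax strategy (1/2, 1/2):

```python
import math
import random

# Matching pennies, row player's costs in [0,1]; the game value is 1/2
# and the minimax strategy of both players is (1/2, 1/2).
COST = [[0.0, 1.0],
        [1.0, 0.0]]        # column player's cost is 1 - COST[i][j]

def rwm_player(n, eps):
    w = [1.0] * n
    def act():
        total = sum(w)
        return random.choices(range(n), weights=[x / total for x in w])[0]
    def update(losses):                # full-information column feedback
        for i in range(n):
            w[i] *= (1 - eps) ** losses[i]
    return act, update

T = 20000
eps = math.sqrt(math.log(2) / T)
row_act, row_upd = rwm_player(2, eps)
col_act, col_upd = rwm_player(2, eps)
counts = [0, 0]
for _ in range(T):
    i, j = row_act(), col_act()
    counts[i] += 1
    row_upd([COST[k][j] for k in range(2)])
    col_upd([1.0 - COST[i][k] for k in range(2)])
print("row empirical frequencies:", [c / T for c in counts])  # ~[0.5, 0.5]
```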
26. Proof of the Minimax
- Want to show:
  v = maxp∈Δ(A1) minq∈Δ(A2) u1(p,q) = minq∈Δ(A2) maxp∈Δ(A1) u1(p,q).
- Consider, for player 1:
  - v1max = maxx∈A1 minq∈Δ(A2) u1(x,q)   (best choice given player 2's distribution)
  - v1min = miny∈A2 maxp∈Δ(A1) u1(p,y)   (best distribution, not given player 2's distribution)
- For player 2:
  - v2max = maxy∈A2 minp∈Δ(A1) (-u1(p,y))
  - v2min = minx∈A1 maxq∈Δ(A2) (-u1(x,q))
- Need to prove v1max ≥ v1min; the direction v1max ≤ v1min is easy.
- Suppose v1max ≤ v1min - δ for some δ > 0.
- Let players 1 and 2 follow a no-regret procedure for T steps with regret R, where T is large enough that R/T < δ/2.
27. Proof of the Minimax
- Recall, for player 1:
  - v1max = maxx∈A1 minq∈Δ(A2) u1(x,q)
  - v1min = miny∈A2 maxp∈Δ(A1) u1(p,y)
- and for player 2:
  - v2max = maxy∈A2 minp∈Δ(A1) (-u1(p,y))
  - v2min = minx∈A1 maxq∈Δ(A2) (-u1(x,q))
- Suppose v1max ≤ v1min - δ for δ > 0, and both players follow a no-regret procedure for T steps with regret R, where R/T < δ/2.
- The players' realized losses are LT and -LT.
- Let L1 and L2 be the best-response losses against the empirical distributions of the opponents' play. Then L1/T ≤ v1max and L2/T ≤ v2max.
- But by the no-regret guarantee L1 ≥ LT - R and L2 ≥ -LT - R. Summing, L1 + L2 ≥ -2R, so v1max + v2max ≥ -2R/T > -δ; since v2max = -v1min, this contradicts v1max ≤ v1min - δ.
28. History and development
- [Hannan '57, Blackwell '56]: algorithms with regret O((N/T)^(1/2)).
  - Need T = O(N/δ²) steps to get time-average regret δ; call this quantity Tδ.
  - Optimal dependence on T (or δ). Views N as constant.
- Learning theory, '80s-'90s: combining expert advice; perform (nearly) as well as the best f ∈ C. Views N as large.
- [Littlestone-Warmuth '89]: the Weighted Majority algorithm.
  - E[cost] ≤ OPT·(1+ε) + (log N)/ε ≤ OPT + εT + (log N)/ε.
  - Regret O(((log N)/T)^(1/2)); Tδ = O((log N)/δ²).
29. Why Regret Minimization?
- Finding Nash equilibria can be computationally difficult.
- It is not clear that agents would converge to an equilibrium, or remain in one if there are several.
- Regret minimization is realistic:
  - there are efficient algorithms that minimize regret,
  - it is locally computed, and
  - players improve by lowering their regret.
30. Efficient implicit implementation for large n
- The bounds have only a logarithmic dependence on n.
- So conceivably we can do well even when n is exponential in the natural problem size, if only we could implement the algorithm efficiently. E.g., the case of paths.
- Recent years: a series of results giving efficient implementations/alternatives in various settings.
31. The Eavesdropping Game
- Let G = (V,E).
- Player 1 chooses an edge e of E.
- Player 2 chooses a spanning tree T.
- Payoff: u1(e,T) = 1 if e ∈ T, and 0 otherwise.
- The number of moves is exponential in the size of G.
- But the best response for Player 2 given a distribution on the edges is easy: solve a minimum spanning tree problem with the probabilities as edge weights. (A sketch follows below.)
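A sketch of this best-response computation (assumes the networkx library is available; the graph and probabilities are illustrative):

```python
import networkx as nx

def best_response_tree(edges, edge_prob):
    """Player 2's best response to a distribution over edges: the spanning
    tree minimizing the expected number of eavesdropped edges, i.e. a
    minimum spanning tree with weights w(e) = Pr[player 1 picks e]."""
    G = nx.Graph()
    for (u, v) in edges:
        G.add_edge(u, v, weight=edge_prob.get((u, v), 0.0))
    return list(nx.minimum_spanning_tree(G, weight="weight").edges())

# Example: a triangle where player 1 eavesdrops on edge (0, 1) most often.
edges = [(0, 1), (1, 2), (0, 2)]
prob = {(0, 1): 0.6, (1, 2): 0.2, (0, 2): 0.2}
print(best_response_tree(edges, prob))   # the tree avoids the heavy edge
```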
32. Correlated Equilibrium and Swap Regret
- What about Nash equilibria?
- For correlated equilibria: if every player's algorithm has low swap regret, then the empirical distribution of play converges to a correlated equilibrium.
33. Back to the Peanuts Game
- There are n bins.
- At each round nature throws a peanut into one of the bins:
  - If you (the player) are at the chosen bin, you catch the peanut.
  - Otherwise you miss it.
- You may move to any bin before any round.
- The game ends when d peanuts have been thrown at some bin.
- Homework: what guarantees can you provide?