1
No-Regret Algorithms for Online Convex Programs
  • Geoffrey J. Gordon
  • Carnegie Mellon University
  • Presented by Nicolas Chapados
  • 21 February 2007

2
Outline
  • Online learning setting
  • Definition of Regret
  • Safe Set
  • Lagrangian Hedging (gradient form)
  • Lagrangian Hedging (optimization form)
  • Mention of Theoretical Results
  • Application: One-Card Poker

3
Online Learning
  • Sequence of trials t = 1, 2, ...
  • At each trial we must pick a hypothesis y_t
  • The correct answer is revealed in the form of a convex
    loss function \ell_t(y_t)
  • Just before seeing the t-th example, the total loss is
    L_t = \sum_{i=1}^{t-1} \ell_i(y_i)  (a runnable sketch of this
    protocol follows below)
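A minimal runnable sketch of this protocol, assuming the linear losses introduced on a later slide; the uniform play and the random cost vectors are placeholders of mine, not part of the paper:

```python
import numpy as np

# Online learning protocol: the learner commits to y_t before the
# loss for trial t is revealed (placeholder data, illustrative only).
rng = np.random.default_rng(0)
d, T = 3, 100                     # number of actions, number of trials
total_loss = 0.0
for t in range(T):
    y_t = np.full(d, 1.0 / d)     # hypothesis chosen before seeing the loss
    c_t = rng.uniform(size=d)     # revealed linear loss: l_t(y) = c_t . y
    total_loss += c_t @ y_t       # accumulate the total loss L_t
print(total_loss)
```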

4
Goal of Paper
  • Introduce the Lagrangian Hedging algorithm
  • It generalizes several existing algorithms:
  • Hedge (Freund and Schapire)
  • Weighted Majority (Littlestone and Warmuth)
  • External-regret matching (Hart and Mas-Colell)
  • (The CMU technical report is much clearer than the NIPS
    paper)

5
Regret
  • If we had used a fixed hypothesis y, the loss would have been
    L_t(y) = \sum_{i=1}^{t-1} \ell_i(y)
  • The regret is the difference between the total losses of the
    adaptive and fixed hypotheses:
    \rho_t(y) = \sum_{i=1}^{t-1} \ell_i(y_i) - \sum_{i=1}^{t-1} \ell_i(y)
  • Positive regret means that we should have preferred the fixed
    hypothesis

6
Hypothesis Set
  • Assume that the hypothesis set Y is a convex subset of R^d
  • For example, the simplex of probability distributions
    Y = \{ y \in R^d : y \ge 0,\ \sum_i y_i = 1 \}
  • The corners of Y represent pure actions, and points in the
    interior represent probability distributions over actions

7
Loss Function
  • Minimize a linear loss \ell_t(y) = c_t \cdot y, where c_t is the
    cost vector revealed at trial t

8
Regret Vector
  • Keeps the state of the learning algorithm
  • A vector that accumulates information about the actual losses
    and the gradients of the loss functions
  • Define the regret vector s_t by the recursion
    s_t = s_{t-1} + \ell_t(y_t)\, u - c_t, with s_0 = 0
  • Here u is an arbitrary vector which satisfies u \cdot y = 1 for
    all y \in Y
  • Example: if y is a probability distribution, then u can be the
    vector of all ones (see the code sketch after this list)
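A small sketch of one step of this recursion for the simplex case; the function name and the numeric values are illustrative, not from the paper:

```python
import numpy as np

# Regret-vector recursion s_t = s_{t-1} + l_t(y_t) u - c_t for the
# simplex case, where u is the all-ones vector (u . y = 1 for every
# probability vector y).
def update_regret_vector(s, y, c, u):
    loss = c @ y                  # linear loss l_t(y_t) = c_t . y_t
    return s + loss * u - c       # the recursion from this slide

d = 3
s = np.zeros(d)                   # s_0 = 0
u = np.ones(d)                    # u . y = 1 on the probability simplex
y = np.array([0.5, 0.3, 0.2])     # hypothesis played at this trial
c = np.array([1.0, 0.2, 0.4])     # revealed cost vector
s = update_regret_vector(s, y, c, u)
# Regret against the second pure action is s . e_2 (next slide):
print(s @ np.array([0.0, 1.0, 0.0]))
```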

9
Use of Regret Vector
  • Given any hypothesis y, we can use the regret vector to compute
    the regret with respect to y: \rho_t(y) = s_t \cdot y

10
Safe Set
  • The region of regret space in which the regret is guaranteed to
    be nonpositive for all hypotheses:
    S = \{ s : s \cdot y \le 0 \text{ for all } y \in Y \}
  • The goal of the Lagrangian Hedging algorithm is to keep its
    regret vector "near" the safe set

11
Safe Set (continued)
[Diagram: the hypothesis set Y and the corresponding safe set S]
12
Unnormalized Hypotheses
  • Consider the cone of unnormalized hypotheses, i.e. all
    nonnegative multiples of hypotheses in Y (defined below)
  • The safe set is the cone polar to this cone of unnormalized
    hypotheses
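A reconstruction of the definitions this slide refers to; the symbol \hat{Y} for the cone is my notation, chosen to match the surrounding slides:

```latex
% Reconstructed definitions (notation \hat{Y} is mine).
\[
\hat{Y} = \{ \lambda y \mid y \in Y,\ \lambda \ge 0 \},
\qquad
S = \hat{Y}^{\circ} = \{ s \mid s \cdot w \le 0 \ \text{for all } w \in \hat{Y} \}.
\]
% Since the regret against y is s_t . y, having s_t in S makes the
% regret nonpositive against every hypothesis simultaneously.
```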

13
Lagrangian Hedging (Setting)
  • At each step, the algorithm chooses its play according to the
    current regret vector and a closed convex potential function F(s)
  • Define the (sub)gradient of F(s) as f(s)
  • The potential function is what defines the problem to be solved
  • E.g. Hedge / Weighted Majority arise from an exponential
    potential (sketched below)
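A plausible form of that example. This is my reconstruction, since the slide's formula did not survive the transcript; the paper's exact normalization may differ, and \eta > 0 is a learning rate:

```latex
% Assumed form of the exponential potential that recovers Hedge.
\[
F(s) = \frac{1}{\eta} \ln \sum_{i=1}^{d} e^{\eta s_i},
\qquad
f(s)_i = \frac{e^{\eta s_i}}{\sum_{j=1}^{d} e^{\eta s_j}},
\]
% so playing the normalized gradient reproduces the exponentially
% weighted distribution used by Hedge / Weighted Majority.
```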

14
Lagrangian Hedging (Gradient)
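The slide's pseudocode did not survive the transcript. Below is a minimal runnable sketch of the gradient form under the assumptions above (simplex hypothesis set, linear losses); the function names and the random test data are mine:

```python
import numpy as np

# Gradient form of Lagrangian Hedging (reconstruction): play the
# normalized (sub)gradient of the potential, then update the regret
# vector with the revealed cost.
def lagrangian_hedging(f, costs, d):
    u = np.ones(d)                    # u . y = 1 on the probability simplex
    s = np.zeros(d)                   # regret vector, s_0 = 0
    for c in costs:
        g = f(s)                      # (sub)gradient of the potential at s
        y = g / (u @ g) if g.sum() > 0 else np.full(d, 1.0 / d)
        loss = c @ y                  # linear loss l_t(y_t) = c_t . y_t
        s = s + loss * u - c          # regret-vector recursion (slide 8)
        yield y

# With the exponential potential of the previous slide, f is the
# softmax of the scaled regrets, so each play is a Hedge distribution.
def softmax_gradient(s, eta=0.5):
    w = np.exp(eta * (s - s.max()))   # shift for numerical stability
    return w / w.sum()

rng = np.random.default_rng(1)
plays = list(lagrangian_hedging(softmax_gradient, rng.uniform(size=(50, 3)), 3))
```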
15
Optimization Form
  • In practice, it may be difficult to define, evaluate, and
    differentiate an appropriate potential function
  • Optimization form: same pseudo-code as before, but F is defined
    in terms of a simpler hedging function W
  • Example corresponding to the previous potential F

16
Optimization Form (continued)
  • We may then obtain F as
    F(s) = \max_{w \in \hat{Y}} \left[ s \cdot w - W(w) \right]
  • and the (sub)gradient as
    f(s) = \arg\max_{w \in \hat{Y}} \left[ s \cdot w - W(w) \right]
  • which we may plug into the previous pseudo-code (a worked
    instance follows)
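As a concrete instance, with a hedging function of my own choosing rather than one taken from the slides: the quadratic W(w) = ||w||^2 / 2 over the nonnegative orthant gives F and f in closed form and, after normalization, recovers the external-regret matching rule mentioned on slide 4.

```python
import numpy as np

# Quadratic hedging function W(w) = ||w||^2 / 2 over the nonnegative
# orthant (the cone of unnormalized hypotheses when Y is the simplex).
# The inner problem  F(s) = max_{w >= 0} [ s . w - ||w||^2 / 2 ]
# is solved coordinate-wise by w* = max(s, 0), so f(s) = [s]_+.
def f_quadratic(s):
    return np.maximum(s, 0.0)

# Normalizing f(s) as in the gradient-form pseudo-code plays each
# action with probability proportional to its positive regret:
# exactly external-regret matching (Hart and Mas-Colell).
s = np.array([2.0, -1.0, 0.5])
g = f_quadratic(s)
y = g / g.sum() if g.sum() > 0 else np.full(len(s), 1.0 / len(s))
print(y)                              # [0.8  0.   0.2]
```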

17
Theoretical Results (in a nutshell: it all works)
  • Under conditions on the potential function, the paper bounds the
    regret of Lagrangian Hedging so that the average regret per trial
    vanishes as the number of trials grows
18
One-Card Poker
  • The hypothesis space is the set of sequence weight vectors
  • These encode when it is a player's turn to move and which
    actions are available at that time
  • Two players: gambler and dealer
  • Each antes 1 chip and is dealt 1 card from a 13-card deck
  • Betting proceeds: gambler bets, then dealer, then gambler
  • A player may fold instead of betting
  • If neither folds, the player with the highest card wins the
    pot

19
Why is it interesting?
  • Contains elements of more complicated games:
  • Incomplete information
  • Chance events
  • Multiple stages
  • Optimal play requires randomization and bluffing

20
Results in Self-Play
21
Results Against Fixed Opponent