Title: Efficient learning algorithms for changing environments
1. Efficient learning algorithms for changing environments
- Elad Hazan and C. Seshadhri
- (IBM Almaden)
2. The online learning setting
[Figure: a sequence of games G_1, G_2, ..., G_T]
3. The online setting
[Figure: in round t the player picks x_t, the adversary reveals f_t, and the player incurs loss f_t(x_t); shown for rounds 1, 2, ..., T]
- Convex bounded functions
- Total loss ∑_t f_t(x_t)
- Adversary chooses any function from family
4. Regret
[Figure: losses f_1, ..., f_T and the point x minimizing ∑_t f_t(x), the fixed optimum in hindsight]
- Loss of our algorithm: ∑_t f_t(x_t)
- Regret = ∑_t f_t(x_t) − min_x ∑_t f_t(x) (standard notion of performance)
- Continuum of experts
- Online learning problem: design efficient algorithms that attain low regret
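To make the setting concrete, here is a minimal sketch (not from the talk) of the protocol and the regret computation; the square losses, the grid search for the hindsight optimum, and the follow-the-leader style learner are illustrative assumptions.

import numpy as np

def run_protocol(learner, losses):
    """Online convex optimization protocol: in each round the player
    commits to x_t, then the adversary's loss f_t is revealed and the
    player pays f_t(x_t)."""
    total, played = 0.0, []
    for t, f in enumerate(losses):
        x_t = learner(t, played)      # decision chosen before seeing f_t
        total += f(x_t)
        played.append(x_t)

    # Best fixed decision in hindsight, found by a crude grid search.
    grid = np.linspace(0.0, 1.0, 1001)
    best_fixed = min(sum(f(x) for f in losses) for x in grid)
    return total, total - best_fixed  # (total loss, regret)

# Illustrative instance: square losses f_t(x) = (x - y_t)^2; the learner
# plays the mean of past targets (follow-the-leader for square loss).
rng = np.random.default_rng(0)
targets = rng.uniform(size=100)
losses = [lambda x, y=y: (x - y) ** 2 for y in targets]
learner = lambda t, played: 0.5 if t == 0 else float(np.mean(targets[:t]))
total, regret = run_protocol(learner, losses)
print(f"total loss {total:.2f}, regret {regret:.2f}")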
5. Sublinear Regret
- Loss per round converges to optimal
- Obviously, we can't compete with the best set of points
6. Portfolio Management
- [HKKA] Efficient algorithms that give O(log T) regret
- (Much smaller than the usual O(√T) regret)
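For concreteness, the per-round loss in portfolio selection is the log-loss f_t(x) = −log(r_t · x), where r_t is the vector of price relatives. The snippet below is only a sketch with an exponentiated-gradient style update; it is not the HKKA algorithm, which uses a second-order (Newton-style) step to get O(log T) regret.

import numpy as np

def portfolio_round(x, r, eta=0.1):
    """One round of online portfolio selection.

    x: current portfolio (non-negative, sums to 1)
    r: price-relative vector for the round (new price / old price)
    Loss is the log-loss f_t(x) = -log(r . x).  The update is a simple
    exponentiated-gradient step, shown only for illustration.
    """
    loss = -np.log(r @ x)
    grad = -r / (r @ x)                 # gradient of the log-loss at x
    x_new = x * np.exp(-eta * grad)     # multiplicative update
    return x_new / x_new.sum(), loss    # re-normalize onto the simplex

x = np.array([0.5, 0.5])
for r in [np.array([1.0, 0.5]), np.array([0.5, 1.0])]:
    x, loss = portfolio_round(x, r)
    print(x, loss)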
7. Convergence behaviour
[Figure: the iterates x_1, ..., x_T converging to a single point x]
- As t increases, the distance between consecutive iterates ||x_t − x_{t+1}|| decreases
- As t increases, learning decreases?
- Does not adapt to environment
8. Adapting with time
[Example: two stocks, with price relatives (1, ½) for the first half of the rounds and (½, 1) for the second half]
- Optimal fixed portfolio is (½, ½): put equal money on both stocks
- Low-regret algorithms will converge to this
- But this is terrible!
- We want algorithm to make a switch!
- Cannot happen with convergence behaviour
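A quick numeric check of this example (a sketch, under the assumed price relatives above): the constantly rebalanced (½, ½) portfolio loses wealth every round, while switching once keeps all the wealth.

import numpy as np

# Assumed reading of the example: price relatives are (1, 1/2) for the
# first T/2 rounds and (1/2, 1) for the last T/2 rounds.
T = 20
relatives = [np.array([1.0, 0.5])] * (T // 2) + [np.array([0.5, 1.0])] * (T // 2)

def wealth(portfolios, relatives):
    """Wealth of a rebalanced portfolio sequence, starting from 1."""
    w = 1.0
    for x, r in zip(portfolios, relatives):
        w *= r @ x
    return w

fixed = [np.array([0.5, 0.5])] * T                        # best fixed portfolio
switching = [np.array([1.0, 0.0])] * (T // 2) + \
            [np.array([0.0, 1.0])] * (T // 2)              # switch at T/2

print(wealth(fixed, relatives))      # (3/4)^T: wealth decays exponentially
print(wealth(switching, relatives))  # 1.0: switching loses nothing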
9. Something better than regret?
- Littlestone-Warmuth, Herbster-Warmuth, Bousquet-Warmuth study k-shifting optima
- Finite expert setting
- Freund-Schapire-Singer-Warmuth: Sleeping experts
- Lehrer, Blum-Mansour: Time selection functions
10. Adaptive Regret
[Figure: an interval J inside the rounds 1..T, with decisions x_1, ..., x_T and losses f_1, ..., f_T]
11. Adaptive Regret
[Figure: as before, an interval J within rounds 1..T]
Adaptive Regret:
- Max regret over all intervals
- Different optimum xJ for every interval J
- Captures movement of optimum as time progresses
- We want Adaptive Regret o(T)
- In any interval of size Ω(AR), the algorithm converges to the optimum
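Spelled out (consistent with the bullets above), the quantity being bounded is the worst-case regret over all contiguous intervals:

\[
\mathrm{AdaptiveRegret}(T) \;=\;
\max_{J = [r, s] \subseteq [T]}
\Big( \sum_{t=r}^{s} f_t(x_t) \;-\; \min_{x_J} \sum_{t=r}^{s} f_t(x_J) \Big)
\]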
12. Results
- We want efficient algorithms that get low Adaptive-Regret for Portfolio Management
- Normal regret can be as low as O(log T)
- Can we get Adaptive-Regret close to that?
- We will deal with a larger class of problems and give general results
13. FLH
- We will describe the algorithm Follow-the-Leading-History (FLH)
- It uses standard low-regret algorithms as a black box
- Bootstrapping procedure: convert low regret into low adaptive regret efficiently
- Done by a streaming technique
14. And now for something completely different
- For the exp-concave setting (e.g. square loss, portfolio management), the black-box low-regret algorithm is [HKKA]
15. Other work
- Auer-Cesa Bianchi-Freund-Schapire, Zinkevich, Y. Singer
- Kozat-A. Singer: independent work in the DSP community
- k-shifting results for portfolio management
- We give a different, more general technique
16. Study your history!
[Figure: a "room of experts" over rounds 1..T; expert i is a copy of HKKA run on the history starting at f_i, and at time t their predictions are combined into x_t]
17. Who to choose?
[Figure: experts "HKKA from f_1", ..., "HKKA from f_t"; a multiplicative update based on Herbster-Warmuth combines them using the losses of all experts]
- Weight w_i for each expert (a probability distribution)
- Choose according to these weights
- After f_t is revealed:
- w_i is updated with a multiplicative factor, and then mixed with the uniform distribution
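A schematic of this step (a sketch only: the expert interface, the learning rate eta, and the mixing weight alpha are placeholders, not the exact choices from the paper).

import numpy as np

class FLHSketch:
    """Sketch of Follow-the-Leading-History (FLH).

    Expert i is a fresh copy of a black-box low-regret algorithm started
    at round i.  A Herbster-Warmuth style multiplicative update over the
    expert weights, mixed with the uniform distribution, tracks whichever
    expert currently has the best recent history."""

    def __init__(self, make_expert, eta=1.0, alpha=0.01):
        self.make_expert, self.eta, self.alpha = make_expert, eta, alpha
        self.experts, self.weights = [], np.array([])

    def predict(self):
        # a new expert joins every round, starting its history now
        self.experts.append(self.make_expert())
        self.weights = np.append(self.weights, 1.0 / len(self.experts))
        self.weights /= self.weights.sum()
        preds = np.array([e.predict() for e in self.experts])
        return float(self.weights @ preds)

    def update(self, loss_fn):
        losses = np.array([loss_fn(e.predict()) for e in self.experts])
        for e in self.experts:
            e.update(loss_fn)                          # every expert sees f_t
        w = self.weights * np.exp(-self.eta * losses)  # multiplicative factor
        w /= w.sum()
        self.weights = (1 - self.alpha) * w + self.alpha / len(w)  # mix with uniform

class MeanExpert:
    """Toy black-box learner: follow-the-leader for square losses
    f_t(x) = (x - y_t)^2, i.e. play the mean of the targets seen so far."""
    def __init__(self): self.targets = []
    def predict(self): return float(np.mean(self.targets)) if self.targets else 0.5
    def update(self, loss_fn): self.targets.append(loss_fn.target)

# Targets switch abruptly halfway through; FLH tracks the switch because a
# young expert (started near the switch) quickly earns most of the weight.
alg = FLHSketch(MeanExpert)
for t in range(200):
    y = 0.1 if t < 100 else 0.9
    x = alg.predict()
    f = lambda x, y=y: (x - y) ** 2
    f.target = y                      # lets the toy expert recover y_t
    alg.update(f)
print(round(x, 2))                    # close to 0.9 near the end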
18. Running time problem
[Figure: experts "FTL from f_1", ..., "FTL from f_t" and an interval J]
- Regret in J is O(log T)
- Adaptive Regret O(log T)
- But Ω(T) experts needed
- Running time O(RT), since we run Ω(T) copies of FTL!
19. Removing experts
Working set
- Stream through experts
- We remove experts
- Once removed, they are banished forever
- Working set is very dynamic
20. Working set
- S_t: the working set at time t
- A subset of {1, ..., t}
- Properties:
  - S_{t+1} \ S_t = {t+1} (only the newest index is ever added)
  - |S_t| = O(log t)
  - Well spread out over {1, ..., t}
- Woodruff: an elegant deterministic construction
- A rule for who to throw out of S_t to get S_{t+1} (a possible rule is sketched below)
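One concrete way to realize such a working set (a sketch; this lifetime rule is an illustration, not necessarily the exact Woodruff construction cited above) is to keep index i alive for a number of rounds proportional to the largest power of two dividing i.

def lifetime(i, m=4):
    """Index i stays alive for m * 2^k rounds, where 2^k is the largest
    power of two dividing i.  (m is an arbitrary constant here.)"""
    k = 0
    while i % (2 ** (k + 1)) == 0:
        k += 1
    return m * (2 ** k)

def working_set(t, m=4):
    """S_t = indices that are still alive at time t."""
    return [i for i in range(1, t + 1) if i + lifetime(i, m) >= t]

# S_{t+1} gains only the index t+1 and possibly drops a few old indices;
# the set stays logarithmically small and its members sit at roughly
# geometrically spaced distances from t.
for t in [16, 64, 256, 1024]:
    print(t, len(working_set(t)), working_set(t))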
21. And therefore
- Working set always of size O(log T)
- Running time for each step is only O(R log T)
- We get O(log² T) Adaptive Regret with O(log T) copies of the original low-regret algorithm
22. To summarize
- Defined Adaptive-Regret, a generalization of regret that captures moving solutions
- Low Adaptive-Regret means we converge to the fixed optimum in every interval
- Gave a bootstrapping algorithm that converts low regret into low Adaptive-Regret (almost optimal)
- For (say) portfolio management, what is the right history to look at?
23. Further directions
- Can streaming/sublinear ideas be used for efficiency?
- Applications to learning scenarios with a cost of shifting
- Maybe this technique can be used for online algorithms
- Competitive ratio instead of regret
- What kind of competitive ratio can these learning techniques give?
24. Thanks!
- No, we didn't make/lose any money playing the stock market with this algorithm... yet.
25. Tree update problem
- Universe [n]; at each step an element a_t is accessed
- Binary search tree B_t on [n]
- Loss = cost of accessing a_t in B_t
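For concreteness, the access cost is the length of the search path in B_t (a sketch; the nested-dict tree representation here is just an assumption for illustration).

def access_cost(tree, key):
    """Cost of accessing `key` in a binary search tree: the number of
    nodes on the search path.  `tree` is a nested dict with fields
    'key', 'left', 'right'."""
    cost, node = 0, tree
    while node is not None:
        cost += 1
        if key == node['key']:
            return cost
        node = node['left'] if key < node['key'] else node['right']
    raise KeyError(key)

# B_t on universe [3]: root 2 with children 1 and 3
B = {'key': 2,
     'left':  {'key': 1, 'left': None, 'right': None},
     'right': {'key': 3, 'left': None, 'right': None}}
print(access_cost(B, 2), access_cost(B, 3))   # 1 2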
26. Tree update problem
[Figure: universe [n] and a binary search tree B_t on [n]]
27. Tree update problem
[Figure: rotations transform the tree into B_{t+1}]
- Total cost = total access cost + total rotation cost
- Sleator-Tarjan: Splay trees are O(1)-competitive
- (Conjecture)
28. Tree update problem
Given sequence a_1, a_2, ..., a_T
Binary search tree B
- Total cost = total access cost + total rotation cost
- Regret = Total cost − Total cost of B = o(T)
- Regret = o(cost of B)
- Static optimality
29. For tree update
- Given query sequence a_1, a_2, ..., a_T, let OPT be the cost of the best tree
- [KV]: an FTL-based approach gives
  - Total cost ≤ (1 + 1/√T) · OPT
- Given a contiguous sequence J of queries, OPT_J is the cost of the best tree for J
- We get
  - Cost for J ≤ (1 + 1/T^{1/4}) · OPT_J + T^{3/4}
30. Square Loss
[Figure: square loss between the decision x_t and the target y_t; the decision then moves to x_{t+1}]
- Have to pay ||x_t − x_{t+1}|| for moving the decision
- Get competitive ratio bounds?
31. Being lazy
- Do we have to update the decision every round?
- Could be expensive, e.g. for the tree update problem
- We can be lazy, and only do a total of m updates
- But pay regret T/m (a sketch follows after this list)
- Used to get low Adaptive-Regret for the tree update problem
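A minimal sketch of the laziness idea (the fixed block schedule below is an assumption, not the rule from the paper): switch decisions only at block boundaries, so there are at most m switches over T rounds, and each round played with a stale decision adds at most the per-round loss gap, for extra regret on the order of T/m.

def lazy_play(intended, T, m):
    """Play the intended decisions lazily: switch at most m times by
    updating only every ceil(T/m) rounds."""
    block = -(-T // m)                 # ceil(T / m)
    played, current = [], None
    for t in range(T):
        if t % block == 0:             # update only at block boundaries
            current = intended(t)
        played.append(current)
    return played

# Example: the intended decision drifts each round, but we switch only m times.
xs = lazy_play(lambda t: t / 100.0, T=100, m=5)
print(len(set(xs)))                    # 5 distinct decisions were played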
32. Study your history!
[Figure: a "room of experts" over rounds 1..T; expert i is a copy of FTL run on the history starting at f_i, and at time t their predictions are combined into x_t]
33. Running time
[Figure: experts "FTL from f_1", ..., "FTL from f_t"]
- Adaptive Regret O(log T)
- But Ω(T) experts needed
- Running time O(RT), since we run Ω(T) copies of FTL!
34. Removing experts
Working set
- Stream through experts
- We remove experts
- Once removed, they are banished forever
- Working set is very dynamic
35. Working set
- S_t: the working set at time t
- A subset of {1, ..., t}
- Properties:
  - S_{t+1} \ S_t = {t+1}
  - |S_t| = O(log t)
  - Well spread out over {1, ..., t}
36. Maintaining experts
- Woodruff: an elegant deterministic construction
- A rule for who to throw out of S_t to get S_{t+1}
- Completely combinatorial working set
37. And therefore
- We get O(log² T) Adaptive Regret with O(log T) copies of the original low-regret algorithm
- Same ideas for general convex functions
- Different math though!
- regret with O(log T) copies