Title: Efficient learning algorithms for changing environments
1. Efficient learning algorithms for changing environments
- Elad Hazan and C. Seshadhri
- (IBM Almaden)
2. The online learning setting
[Figure: a sequence of games G_1, G_2, ..., G_T]
3. The online setting
[Figure: in round t the player picks x_t, the adversary reveals f_t, and the player incurs loss f_t(x_t); shown for rounds 1, 2, ..., T]
- Convex bounded functions
- Total loss ∑_t f_t(x_t)
- Adversary chooses any function from family
4. Regret
[Figure: losses f_1, ..., f_T and the point x minimizing ∑_t f_t(x), the fixed optimum in hindsight]
- Loss of our algorithm: ∑_t f_t(x_t)
- Regret = ∑_t f_t(x_t) − min_x ∑_t f_t(x) (standard notion of performance)
- Continuum of experts
- Online learning problem: design efficient algorithms that attain low regret
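To make the setting concrete, here is a minimal sketch (not from the talk) of the protocol and the regret computation; the square losses, the grid search for the hindsight optimum, and the follow-the-leader style learner are illustrative assumptions.

import numpy as np

def run_protocol(learner, losses):
    """Online convex optimization protocol: in each round the player
    commits to x_t, then the adversary's loss f_t is revealed and the
    player pays f_t(x_t)."""
    total, played = 0.0, []
    for t, f in enumerate(losses):
        x_t = learner(t, played)      # decision chosen before seeing f_t
        total += f(x_t)
        played.append(x_t)

    # Best fixed decision in hindsight, found by a crude grid search.
    grid = np.linspace(0.0, 1.0, 1001)
    best_fixed = min(sum(f(x) for f in losses) for x in grid)
    return total, total - best_fixed  # (total loss, regret)

# Illustrative instance: square losses f_t(x) = (x - y_t)^2; the learner
# plays the mean of past targets (follow-the-leader for square loss).
rng = np.random.default_rng(0)
targets = rng.uniform(size=100)
losses = [lambda x, y=y: (x - y) ** 2 for y in targets]
learner = lambda t, played: 0.5 if t == 0 else float(np.mean(targets[:t]))
total, regret = run_protocol(learner, losses)
print(f"total loss {total:.2f}, regret {regret:.2f}")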
5. Sublinear Regret
- Loss per round converges to optimal
- Obviously, we can't compete with the best set of points
6. Portfolio Management
- [HKKA] Efficient algorithms that give O(log T) regret
- (Much smaller than the usual O(√T) regret)
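For concreteness, the per-round loss in portfolio selection is the log-loss f_t(x) = −log(r_t · x), where r_t is the vector of price relatives. The snippet below is only a sketch with an exponentiated-gradient style update; it is not the HKKA algorithm, which uses a second-order (Newton-style) step to get O(log T) regret.

import numpy as np

def portfolio_round(x, r, eta=0.1):
    """One round of online portfolio selection.

    x: current portfolio (non-negative, sums to 1)
    r: price-relative vector for the round (new price / old price)
    Loss is the log-loss f_t(x) = -log(r . x).  The update is a simple
    exponentiated-gradient step, shown only for illustration.
    """
    loss = -np.log(r @ x)
    grad = -r / (r @ x)                 # gradient of the log-loss at x
    x_new = x * np.exp(-eta * grad)     # multiplicative update
    return x_new / x_new.sum(), loss    # re-normalize onto the simplex

x = np.array([0.5, 0.5])
for r in [np.array([1.0, 0.5]), np.array([0.5, 1.0])]:
    x, loss = portfolio_round(x, r)
    print(x, loss)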
7. Convergence behaviour
[Figure: the iterates x_1, ..., x_T converging to a single point x]
- As t increases, the distance between consecutive iterates ||x_t − x_{t+1}|| decreases
- As t increases, learning decreases?
- Does not adapt to environment
8. Adapting with time
[Example: two stocks, with price relatives (1, ½) for the first half of the rounds and (½, 1) for the second half]
- Optimal fixed portfolio is (½, ½): put equal money on both stocks
- Low-regret algorithms will converge to this
- But this is terrible!
- We want algorithm to make a switch!
- Cannot happen with convergence behaviour
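A quick numeric check of this example (a sketch, under the assumed price relatives above): the constantly rebalanced (½, ½) portfolio loses wealth every round, while switching once keeps all the wealth.

import numpy as np

# Assumed reading of the example: price relatives are (1, 1/2) for the
# first T/2 rounds and (1/2, 1) for the last T/2 rounds.
T = 20
relatives = [np.array([1.0, 0.5])] * (T // 2) + [np.array([0.5, 1.0])] * (T // 2)

def wealth(portfolios, relatives):
    """Wealth of a rebalanced portfolio sequence, starting from 1."""
    w = 1.0
    for x, r in zip(portfolios, relatives):
        w *= r @ x
    return w

fixed = [np.array([0.5, 0.5])] * T                        # best fixed portfolio
switching = [np.array([1.0, 0.0])] * (T // 2) + \
            [np.array([0.0, 1.0])] * (T // 2)              # switch at T/2

print(wealth(fixed, relatives))      # (3/4)^T: wealth decays exponentially
print(wealth(switching, relatives))  # 1.0: switching loses nothing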
9. Something better than regret?
- Littlestone-Warmuth, Herbster-Warmuth, Bousquet-Warmuth study k-shifting optima
- Finite expert setting
- Freund-Schapire-Singer-Warmuth: Sleeping experts
- Lehrer, Blum-Mansour: Time selection functions
10. Adaptive Regret
[Figure: an interval J inside the rounds 1..T, with decisions x_1, ..., x_T and losses f_1, ..., f_T]
11. Adaptive Regret
[Figure: as before, an interval J within rounds 1..T]
Adaptive Regret:
- Max regret over all intervals
- Different optimum xJ for every interval J
- Captures movement of optimum as time progresses
- We want Adaptive Regret o(T)
- In any interval of size Ω(AR), the algorithm converges to the optimum
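Spelled out (consistent with the bullets above), the quantity being bounded is the worst-case regret over all contiguous intervals:

\[
\mathrm{AdaptiveRegret}(T) \;=\;
\max_{J = [r, s] \subseteq [T]}
\Big( \sum_{t=r}^{s} f_t(x_t) \;-\; \min_{x_J} \sum_{t=r}^{s} f_t(x_J) \Big)
\]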
12. Results
- We want efficient algorithms that get low Adaptive-Regret for Portfolio Management
- Normal regret can be as low as O(log T)
- Can we get Adaptive-Regret close to that?
- We will deal with a larger class of problems and give general results
13. FLH
- We will describe the algorithm Follow-the-Leading-History (FLH)
- It uses standard low-regret algorithms as a black box
- Bootstrapping procedure: convert low regret into low adaptive regret efficiently
- Done by a streaming technique
14. And now for something completely different
- For the exp-concave setting (e.g. square loss, portfolio management), the black-box low-regret algorithm is [HKKA]
15. Other work
- Auer-Cesa Bianchi-Freund-Schapire, Zinkevich, Y. Singer
- Kozat-A. Singer: independent work in the DSP community
- k-shifting results for portfolio management
- We give a different, more general technique
16. Study your history!
[Figure: a "room of experts" over rounds 1..T; expert i is a copy of HKKA run on the history starting at f_i, and at time t their predictions are combined into x_t]
17. Who to choose?
[Figure: experts "HKKA from f_1", ..., "HKKA from f_t"; a multiplicative update based on Herbster-Warmuth combines them using the losses of all experts]
- Weight w_i for each expert (a probability distribution)
- Choose according to these weights
- After f_t is revealed:
- w_i is updated with a multiplicative factor, and then mixed with the uniform distribution
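A schematic of this step (a sketch only: the expert interface, the learning rate eta, and the mixing weight alpha are placeholders, not the exact choices from the paper).

import numpy as np

class FLHSketch:
    """Sketch of Follow-the-Leading-History (FLH).

    Expert i is a fresh copy of a black-box low-regret algorithm started
    at round i.  A Herbster-Warmuth style multiplicative update over the
    expert weights, mixed with the uniform distribution, tracks whichever
    expert currently has the best recent history."""

    def __init__(self, make_expert, eta=1.0, alpha=0.01):
        self.make_expert, self.eta, self.alpha = make_expert, eta, alpha
        self.experts, self.weights = [], np.array([])

    def predict(self):
        # a new expert joins every round, starting its history now
        self.experts.append(self.make_expert())
        self.weights = np.append(self.weights, 1.0 / len(self.experts))
        self.weights /= self.weights.sum()
        preds = np.array([e.predict() for e in self.experts])
        return float(self.weights @ preds)

    def update(self, loss_fn):
        losses = np.array([loss_fn(e.predict()) for e in self.experts])
        for e in self.experts:
            e.update(loss_fn)                          # every expert sees f_t
        w = self.weights * np.exp(-self.eta * losses)  # multiplicative factor
        w /= w.sum()
        self.weights = (1 - self.alpha) * w + self.alpha / len(w)  # mix with uniform

class MeanExpert:
    """Toy black-box learner: follow-the-leader for square losses
    f_t(x) = (x - y_t)^2, i.e. play the mean of the targets seen so far."""
    def __init__(self): self.targets = []
    def predict(self): return float(np.mean(self.targets)) if self.targets else 0.5
    def update(self, loss_fn): self.targets.append(loss_fn.target)

# Targets switch abruptly halfway through; FLH tracks the switch because a
# young expert (started near the switch) quickly earns most of the weight.
alg = FLHSketch(MeanExpert)
for t in range(200):
    y = 0.1 if t < 100 else 0.9
    x = alg.predict()
    f = lambda x, y=y: (x - y) ** 2
    f.target = y                      # lets the toy expert recover y_t
    alg.update(f)
print(round(x, 2))                    # close to 0.9 near the end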
18. Running time problem
[Figure: experts "FTL from f_1", ..., "FTL from f_t" and an interval J]
- Regret in J is O(log T)
- Adaptive Regret O(log T)
- But Ω(T) experts needed
- Running time O(RT), since we run Ω(T) copies of FTL!
19. Removing experts
Working set
- Stream through experts
- We remove experts
- Once removed, they are banished forever
- Working set is very dynamic
20. Working set
- S_t: the working set at time t
- A subset of {1, ..., t}
- Properties:
  - S_{t+1} \ S_t = {t+1} (only the newest index is ever added)
  - |S_t| = O(log t)
  - Well spread out over {1, ..., t}
- Woodruff: an elegant deterministic construction
- A rule for who to throw out of S_t to get S_{t+1} (a possible rule is sketched below)
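One concrete way to realize such a working set (a sketch; this lifetime rule is an illustration, not necessarily the exact Woodruff construction cited above) is to keep index i alive for a number of rounds proportional to the largest power of two dividing i.

def lifetime(i, m=4):
    """Index i stays alive for m * 2^k rounds, where 2^k is the largest
    power of two dividing i.  (m is an arbitrary constant here.)"""
    k = 0
    while i % (2 ** (k + 1)) == 0:
        k += 1
    return m * (2 ** k)

def working_set(t, m=4):
    """S_t = indices that are still alive at time t."""
    return [i for i in range(1, t + 1) if i + lifetime(i, m) >= t]

# S_{t+1} gains only the index t+1 and possibly drops a few old indices;
# the set stays logarithmically small and its members sit at roughly
# geometrically spaced distances from t.
for t in [16, 64, 256, 1024]:
    print(t, len(working_set(t)), working_set(t))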
21. And therefore
- Working set always of size O(log T)
- Running time for each step is only O(R log T)
- We get O(log² T) Adaptive Regret with O(log T) copies of the original low-regret algorithm
22. To summarize
- Defined Adaptive-Regret, a generalization of regret that captures moving solutions
- Low Adaptive-Regret means we converge to the fixed optimum in every interval
- Gave a bootstrapping algorithm that converts low regret into low Adaptive-Regret (almost optimal)
- For (say) portfolio management, what is the right history to look at?
23. Further directions
- Can streaming/sublinear ideas be used for efficiency?
- Applications to learning scenarios with a cost of shifting
- Maybe this technique can be used for online algorithms
- Competitive ratio instead of regret
- What kind of competitive ratio can these learning techniques give?
24. Thanks!
- No, we didn't make/lose any money playing the stock market with this algorithm... yet.
25. Tree update problem
- Universe [n]; at each step an element a_t is accessed
- Binary search tree B_t on [n]
- Loss = cost of accessing a_t in B_t
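For concreteness, the access cost is the length of the search path in B_t (a sketch; the nested-dict tree representation here is just an assumption for illustration).

def access_cost(tree, key):
    """Cost of accessing `key` in a binary search tree: the number of
    nodes on the search path.  `tree` is a nested dict with fields
    'key', 'left', 'right'."""
    cost, node = 0, tree
    while node is not None:
        cost += 1
        if key == node['key']:
            return cost
        node = node['left'] if key < node['key'] else node['right']
    raise KeyError(key)

# B_t on universe [3]: root 2 with children 1 and 3
B = {'key': 2,
     'left':  {'key': 1, 'left': None, 'right': None},
     'right': {'key': 3, 'left': None, 'right': None}}
print(access_cost(B, 2), access_cost(B, 3))   # 1 2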
26. Tree update problem
[Figure: universe [n] and a binary search tree B_t on [n]]
27. Tree update problem
[Figure: rotations transform the tree into B_{t+1}]
- Total cost = total access cost + total rotation cost
- Sleator-Tarjan: Splay trees are O(1)-competitive
- (Conjecture)
28. Tree update problem
Given sequence a_1, a_2, ..., a_T
Binary search tree B
- Total cost = total access cost + total rotation cost
- Regret = Total cost − Total cost of B = o(T)
- Regret = o(cost of B)
- Static optimality
29. For tree update
- Given query sequence a_1, a_2, ..., a_T, let OPT be the cost of the best tree
- [KV]: an FTL-based approach gives
  - Total cost ≤ (1 + 1/√T) · OPT
- Given a contiguous sequence J of queries, OPT_J is the cost of the best tree for J
- We get
  - Cost for J ≤ (1 + 1/T^{1/4}) · OPT_J + T^{3/4}
30. Square Loss
[Figure: square loss between the decision x_t and the target y_t; the decision then moves to x_{t+1}]
- Have to pay ||x_t − x_{t+1}|| for moving the decision
- Get competitive ratio bounds?
31. Being lazy
- Do we have to update the decision every round?
- Could be expensive, e.g. for the tree update problem
- We can be lazy, and only do a total of m updates
- But pay regret T/m (a sketch follows after this list)
- Used to get low Adaptive-Regret for the tree update problem
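A minimal sketch of the laziness idea (the fixed block schedule below is an assumption, not the rule from the paper): switch decisions only at block boundaries, so there are at most m switches over T rounds, and each round played with a stale decision adds at most the per-round loss gap, for extra regret on the order of T/m.

def lazy_play(intended, T, m):
    """Play the intended decisions lazily: switch at most m times by
    updating only every ceil(T/m) rounds."""
    block = -(-T // m)                 # ceil(T / m)
    played, current = [], None
    for t in range(T):
        if t % block == 0:             # update only at block boundaries
            current = intended(t)
        played.append(current)
    return played

# Example: the intended decision drifts each round, but we switch only m times.
xs = lazy_play(lambda t: t / 100.0, T=100, m=5)
print(len(set(xs)))                    # 5 distinct decisions were played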
32. Study your history!
[Figure: a "room of experts" over rounds 1..T; expert i is a copy of FTL run on the history starting at f_i, and at time t their predictions are combined into x_t]
33. Running time
[Figure: experts "FTL from f_1", ..., "FTL from f_t"]
- Adaptive Regret O(log T)
- But Ω(T) experts needed
- Running time O(RT), since we run Ω(T) copies of FTL!
34. Removing experts
Working set
- Stream through experts
- We remove experts
- Once removed, they are banished forever
- Working set is very dynamic
35. Working set
- S_t: the working set at time t
- A subset of {1, ..., t}
- Properties:
  - S_{t+1} \ S_t = {t+1}
  - |S_t| = O(log t)
  - Well spread out over {1, ..., t}
36. Maintaining experts
- Woodruff: an elegant deterministic construction
- A rule for who to throw out of S_t to get S_{t+1}
- Completely combinatorial working set
37. And therefore
- We get O(log² T) Adaptive Regret with O(log T) copies of the original low-regret algorithm
- Same ideas for general convex functions
- Different math though!
- regret with O(log T) copies