1
Smooth ε-Insensitive Regression by Loss Symmetrization
  • Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer
  • School of Computer Science and Engineering
  • The Hebrew University
  • {oferd, shais, singer}@cs.huji.ac.il
  • COLT 2003, The Sixteenth Annual Conference on Learning Theory

2
Before We Begin
Linear Regression: given a training set $S = \{(x_i, y_i)\}_{i=1}^m \subset \mathbb{R}^n \times \mathbb{R}$,
find $\lambda \in \mathbb{R}^n$ such that $\lambda \cdot x_i \approx y_i$
Least Squares: minimize $\sum_{i=1}^m (\lambda \cdot x_i - y_i)^2$
Support Vector Regression: minimize $\tfrac{1}{2}\|\lambda\|^2$ s.t. $|\lambda \cdot x_i - y_i| \le \epsilon$
(with slack variables in the soft-margin version)
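For reference (not on the slide itself): the ε-insensitive loss that SVR charges each example, and which the rest of the talk smooths and symmetrizes, is
$$|\delta|_{\epsilon} = \max(0,\, |\delta| - \epsilon), \qquad \delta = \lambda \cdot x - y .$$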
3
Loss Symmetrization
Loss functions used in classification and Boosting: the log-loss and the exp-loss
Symmetric versions of these losses can be used for regression
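As a hedged reconstruction of the formulas that accompanied this slide (the discrepancy notation $\delta = \lambda \cdot x - y$ and the placement of $\epsilon$ follow the accompanying paper, not the transcript):
$$L_{\log}(\delta) = \log\!\big(1 + e^{\delta - \epsilon}\big) + \log\!\big(1 + e^{-\delta - \epsilon}\big), \qquad L_{\exp}(\delta) = e^{\delta - \epsilon} + e^{-\delta - \epsilon}$$
Each loss penalizes over- and under-estimation symmetrically; for $\epsilon = 0$ the symmetric log-loss behaves like a smooth version of $|\delta|$ at large discrepancies.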
4
A General Reduction
  • Begin with a regression training set $S = \{(x_i, y_i)\}_{i=1}^m$,
    where $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$
  • Generate 2m classification training examples of
    dimension n+1
  • Learn a weight vector in $\mathbb{R}^{n+1}$, while maintaining its
    last coordinate fixed, by minimizing a margin-based
    classification loss (a sketch follows below)
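A minimal sketch of one way to realize this reduction, assuming the convention that the augmented weight vector's last coordinate is held fixed at -1; the function name and the label signs are illustrative, not taken from the slides:

```python
import numpy as np

def symmetrize(X, y, eps):
    """Turn m regression examples into 2m classification examples
    of dimension n+1 (hedged reconstruction of the reduction)."""
    m = X.shape[0]
    # Append the shifted target as an extra (n+1)-th coordinate.
    X_over  = np.hstack([X, (y + eps).reshape(-1, 1)])   # labeled -1
    X_under = np.hstack([X, (y - eps).reshape(-1, 1)])   # labeled +1
    X_cls = np.vstack([X_over, X_under])
    y_cls = np.concatenate([-np.ones(m), np.ones(m)])
    return X_cls, y_cls
```

With the last weight fixed at -1, summing the classification log-loss over the two generated examples recovers $\log(1+e^{\delta_i-\epsilon}) + \log(1+e^{-\delta_i-\epsilon})$, the symmetric ε-insensitive log-loss of the original regression example.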

5
A Batch Algorithm
  • An illustration of a single batch iteration
  • Simplifying assumptions (just for the demo)
  • Instances are in
  • Set
  • Use the Symmetric Log-loss

6
A Batch Algorithm
Calculate discrepancies and weights
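The slide's illustration (training points plotted against their discrepancies) does not survive the transcript. As a hedged reconstruction of the quantities it shows, with $\epsilon$ dropped for readability, each example $i$ gets a discrepancy and a pair of weights:
$$\delta_i = \lambda \cdot x_i - y_i, \qquad q_i^{+} = \frac{e^{\delta_i}}{1 + e^{\delta_i}}, \qquad q_i^{-} = \frac{e^{-\delta_i}}{1 + e^{-\delta_i}},$$
so that $q_i^{+} - q_i^{-}$ is the derivative of the symmetric log-loss with respect to $\delta_i$.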
7
A Batch Algorithm
Cumulative weights
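Again as a hedged sketch of what the (omitted) figure denotes by cumulative weights: the per-coordinate aggregates
$$W_j^{+} = \sum_{i=1}^{m} q_i^{+}\, x_{i,j}, \qquad W_j^{-} = \sum_{i=1}^{m} q_i^{-}\, x_{i,j},$$
so that the partial derivative of the total loss with respect to $\lambda_j$ is $W_j^{+} - W_j^{-}$.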
8
Two Batch Algorithms
Update the regressor
Log-Additive update
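Putting the pieces together, a minimal sketch of a single batch iteration with the Log-Additive update, under a boosting-style assumption that the features are non-negative and each row of X sums to at most 1; the exact constant and edge-case handling follow my reading of the talk, not the slides verbatim:

```python
import numpy as np

def batch_iteration(lmbda, X, y, eps=0.0):
    """One batch iteration for the symmetric log-loss (sketch).
    Assumes X >= 0 with rows summing to at most 1."""
    delta = X @ lmbda - y                               # discrepancies
    q_plus  = 1.0 / (1.0 + np.exp(-(delta - eps)))      # over-estimation weights
    q_minus = 1.0 / (1.0 + np.exp(delta + eps))         # under-estimation weights
    W_plus, W_minus = q_plus @ X, q_minus @ X           # cumulative weights
    # Log-Additive update; the Additive update instead moves lmbda by a
    # suitably scaled multiple of (W_minus - W_plus).
    return lmbda + 0.5 * np.log(W_minus / W_plus)
```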
9
Progress Bounds
  • Theorem (Log-Additive update): a lower bound on the progress made on each iteration
  • Theorem (Additive update): an analogous progress bound
  • Lemma: both bounds are non-negative and equal zero only at the optimum
10
Boosting Regularization
  • A new form of regularization for regression and
    classification Boosting
  • Can be implemented by adding pseudo-examples
  • Communicated by Rob Schapire

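The regularized objective itself did not survive the transcript; a hedged reconstruction of the kind of term the slide refers to, for the symmetric log-loss and with a regularization coefficient here written ν (my notation):
$$\mathrm{Loss}(\lambda) \;+\; \nu \sum_{j=1}^{n} \Big( \log\big(1 + e^{\lambda_j}\big) + \log\big(1 + e^{-\lambda_j}\big) \Big)$$
Each regularization term is exactly the symmetric log-loss (with ε = 0) of the pseudo-example $(e_j, 0)$, the j-th unit vector with target 0, which is one way to read "can be implemented by adding pseudo-examples": append n such pseudo-examples, each weighted by ν, to the training set.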
11
Regularization Contd.
  • Regularization ⇒ compactness of the feasible set
    for $\lambda$
  • Regularization ⇒ a unique attainable optimizer of
    the loss function

Proof of Convergence
Progress + compactness + uniqueness ⇒ asymptotic
convergence to the optimum
12
Exp-loss vs. Log-loss
  • Two synthetic datasets

[Figure: regression fits on the two synthetic datasets, comparing the Log-loss and the Exp-loss]
13
Extensions
  • Parallel vs. Sequential updates
  • Parallel - update all elements of $\lambda$ in parallel
  • Sequential - update the weight of a single weak
    regressor on each round (like classic boosting)
  • Another loss function: the Combined Loss

[Figure: the Log-loss, the Exp-loss, and the Comb-loss]
14
On-line Algorithms
  • GD and EG online algorithms for the Log-loss (a GD sketch follows below)
  • Relative loss bounds
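A minimal sketch of what an online gradient-descent (GD) step on the symmetric log-loss could look like in the notation used above; the learning-rate name eta and the per-round structure are assumptions, and the EG variant would update multiplicatively instead:

```python
import numpy as np

def gd_step(lmbda, x, y, eta, eps=0.0):
    """One online GD step on the symmetric log-loss for example (x, y)."""
    delta = lmbda @ x - y
    q_plus  = 1.0 / (1.0 + np.exp(-(delta - eps)))
    q_minus = 1.0 / (1.0 + np.exp(delta + eps))
    grad = (q_plus - q_minus) * x      # gradient of the loss w.r.t. lmbda
    return lmbda - eta * grad
```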

Future Directions
  • Regression tree learning
  • Solving one-class and various ranking problems
    using similar constructions
  • Regression generalization bounds based on natural
    regularization