Title: Introduction to Boosting
1. Introduction to Boosting
- Hojung Cho
- Topics for Bioinformatics
- Oct 10, 2006
2. Boosting
- Underlying principle: while building a highly accurate prediction rule is not an easy task, it is not hard to come up with very rough rules of thumb (weak learners) that are only moderately accurate, and to combine these into a highly accurate classifier.
- Outline
  - The boosting framework
  - Choice of α
  - AdaBoost
  - LogitBoost
  - References
3. The Rules for Boosting
- 1) Set all weights of the training examples equal.
- 2) Train a weak learner on the weighted examples.
- 3) See how well the weak learner performs on the data and give it a weight based on how well it did.
- 4) Re-weight the training examples and repeat.
- 5) When done, predict by a majority vote of the weak learners (see the sketch below).
- Weak learner: a rough and moderately inaccurate predictor, but one that can predict better than chance (1/2) -> boosting shows the strength of weak learnability.
- Two fundamental questions for designing a boosting algorithm:
  - How should each distribution (weighting of the examples) be chosen on each round?
    - Place the most weight on the examples most often misclassified by the preceding weak rules, forcing the weak learner to focus on the hardest examples.
  - How should the weak learners be combined into a single rule?
    - Take a weighted majority vote of their predictions; the weight α of each learner is chosen analytically or numerically.
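A minimal Python sketch of these five steps, using one-level decision stumps as the weak learners. The stump base learner and the particular formula for the learner weight α (AdaBoost's choice, derived on the following slides) are illustrative assumptions rather than part of this slide.

    import numpy as np

    def fit_weak_stump(X, y, w):
        # Weak learner: the single (feature, threshold, sign) rule with the lowest
        # weighted error on the current weighting w; y must take values in {-1, +1}.
        best = (np.inf, None)
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = np.where(X[:, j] <= thr, sign, -sign)
                    err = np.sum(w * (pred != y))
                    if err < best[0]:
                        best = (err, (j, thr, sign))
        err, (j, thr, sign) = best
        return err, lambda A: np.where(A[:, j] <= thr, sign, -sign)

    def boost(X, y, n_rounds=20):
        n = len(y)
        w = np.full(n, 1.0 / n)                        # 1) start with equal weights
        learners, alphas = [], []
        for _ in range(n_rounds):
            err, h = fit_weak_stump(X, y, w)           # 2) train a weak learner on weighted data
            err = np.clip(err, 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)      # 3) weight the learner by how well it did
            w *= np.exp(-alpha * y * h(X))             # 4) up-weight misclassified examples,
            w /= w.sum()                               #    renormalize, and repeat
            learners.append(h)
            alphas.append(alpha)
        def predict(A):                                # 5) weighted majority vote
            return np.where(sum(a * h(A) for a, h in zip(alphas, learners)) >= 0, 1, -1)
        return predict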
4. A Boosting Approach: AdaBoost
5. Simple Example
6. Choice of α
- Schapire and Singer proved that the training error of the combined classifier is bounded by the product of the per-round normalizers Z_t.
- From this bound we can derive the choice of α_t that minimizes it on each round.
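The equations shown on the original slide are not preserved in this text export; a standard reconstruction of the bound and the resulting α, in Schapire and Singer's notation (weak hypotheses h_t with values in {-1, +1}, example weights D_t(i), normalizer Z_t, weighted error ε_t), is:

    \[
    \frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{H(x_i)\neq y_i\} \;\le\; \prod_{t=1}^{T} Z_t,
    \qquad
    Z_t = \sum_{i=1}^{m} D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}.
    \]

Minimizing each Z_t over α_t (setting the derivative to zero) gives

    \[
    \alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t},
    \qquad
    Z_t = 2\sqrt{\epsilon_t(1-\epsilon_t)}.
    \]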
7. Proof
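The proof itself is not preserved on this slide; a reconstruction of the standard argument, with f(x) = Σ_t α_t h_t(x) and H(x) = sign(f(x)), is as follows. Unravelling the weight update D_{t+1}(i) = D_t(i) e^{-α_t y_i h_t(x_i)} / Z_t from D_1(i) = 1/m gives

    \[
    D_{T+1}(i) = \frac{1}{m}\,\frac{e^{-y_i f(x_i)}}{\prod_{t=1}^{T} Z_t}.
    \]

A mistake on (x_i, y_i) means y_i f(x_i) ≤ 0, so 1{H(x_i) ≠ y_i} ≤ e^{-y_i f(x_i)}, and therefore

    \[
    \frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{H(x_i)\neq y_i\}
    \;\le\; \frac{1}{m}\sum_{i=1}^{m} e^{-y_i f(x_i)}
    \;=\; \Big(\prod_{t=1}^{T} Z_t\Big)\sum_{i=1}^{m} D_{T+1}(i)
    \;=\; \prod_{t=1}^{T} Z_t.
    \]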
8. AdaBoost
9. Boosting and Additive Logistic Regression (Friedman et al., 2000)
- Boosting is an approximation to additive modeling on the logistic scale, using maximum Bernoulli likelihood (binomial in the multiclass case) as a criterion.
- The paper proposes more direct approximations that exhibit nearly identical results to boosting (AdaBoost).
- These approximations reduce computation.
10. Boosting and Additive Logistic Regression (continued)
- When f(x) is the weighted average of the basic classifiers in AdaBoost, the probability that y = 1 is represented by p(x).
- Note the close connection between the log loss (negative log likelihood) of this model and the exponential loss that AdaBoost attempts to minimize.
- For any distribution over pairs (x, y), both expectations are minimized by the same function f.
- Rather than minimizing the exponential loss, we can attempt to directly minimize the logistic loss (the negative log likelihood): LogitBoost.
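The formulas on the original slide are not preserved; a reconstruction following Friedman et al. (2000), with y ∈ {-1, +1}:

    \[
    p(x) = \frac{e^{f(x)}}{e^{f(x)} + e^{-f(x)}},
    \qquad
    \text{log loss: } \log\bigl(1 + e^{-2 y f(x)}\bigr),
    \qquad
    \text{exponential loss: } e^{-y f(x)}.
    \]

For any distribution over pairs (x, y), both E[e^{-y f(x)}] and E[log(1 + e^{-2 y f(x)})] are minimized by

    \[
    f(x) = \frac{1}{2}\log\frac{P(y=1\mid x)}{P(y=-1\mid x)}.
    \]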
11. LogitBoost
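The LogitBoost algorithm on the original slide is not preserved in this text export; below is a minimal two-class sketch following Friedman et al. (2000), fitting a weighted least-squares regression stump to the working response on each round. The names fit_stump and logitboost, the stump base learner, and the clipping constants are illustrative assumptions, not taken from the slides.

    import numpy as np

    def fit_stump(X, z, w):
        # Weighted least-squares regression stump: choose the (feature, threshold)
        # whose two constant leaves best fit the working response z under weights w.
        best = (np.inf, None)
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j])[:-1]:    # drop the max so the right leaf is non-empty
                left = X[:, j] <= thr
                c_l = np.average(z[left], weights=w[left])
                c_r = np.average(z[~left], weights=w[~left])
                err = np.sum(w * (z - np.where(left, c_l, c_r)) ** 2)
                if err < best[0]:
                    best = (err, (j, thr, c_l, c_r))
        j, thr, c_l, c_r = best[1]
        return lambda A: np.where(A[:, j] <= thr, c_l, c_r)

    def logitboost(X, y, n_rounds=50):
        # Two-class LogitBoost sketch; y must take values in {-1, +1}.
        y01 = (y + 1) / 2                          # recode labels to {0, 1}
        F = np.zeros(len(y))                       # additive model F(x) on the training set
        stumps = []
        for _ in range(n_rounds):
            p = 1.0 / (1.0 + np.exp(-2.0 * F))     # p(x) = e^F / (e^F + e^-F)
            w = np.clip(p * (1 - p), 1e-10, None)  # example weights
            z = np.clip((y01 - p) / w, -4, 4)      # working response, clipped for stability
            stump = fit_stump(X, z, w)
            stumps.append(stump)
            F += 0.5 * stump(X)                    # F <- F + (1/2) f_m
        def predict(A):
            return np.where(sum(s(A) for s in stumps) >= 0, 1, -1)
        return predict

For example, logitboost(X_train, y_train)(X_test) would return predicted labels in {-1, +1}.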
12. References
- Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, August 1997.
- Ron Meir and Gunnar Rätsch. An introduction to boosting and leveraging. In Advanced Lectures on Machine Learning (LNAI 2600), 2003.
- Robert E. Schapire. The boosting approach to machine learning: An overview. In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, and B. Yu, editors, Nonlinear Estimation and Classification. Springer, 2003.