Title: Introduction to Boosting
1. Introduction to Boosting
- Hojung Cho
- Topics for Bioinformatics
- Oct 10, 2006
2. Boosting
- Underlying principle: while building a highly accurate prediction rule is not an easy task, it is not hard to come up with very rough rules of thumb (weak learners) that are only moderately accurate, and to combine these into a highly accurate classifier.
- Outline
  - The boosting framework
  - Choice of α
  - AdaBoost
  - LogitBoost
  - References
3. The Rules for Boosting
- 1) Set all weights of the training examples equal.
- 2) Train a weak learner on the weighted examples.
- 3) See how well the weak learner performs on the data and give it a weight based on how well it did.
- 4) Re-weight the training examples and repeat.
- 5) When done, predict by a majority vote of the weak learners (see the sketch below).
- Weak learner: a rough and moderately inaccurate predictor, but one that can predict better than chance (1/2) -> boosting shows the strength of weak learnability.
- Two fundamental questions for designing a boosting algorithm:
  - How should each distribution (weighting of the examples) be chosen on each round?
    - Place the most weight on the examples most often misclassified by the preceding weak rules, forcing the weak learner to focus on the hardest examples.
  - How should the weak learners be combined into a single rule?
    - Take a weighted majority vote of their predictions; the weight α of each learner is chosen analytically or numerically.
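A minimal Python sketch of these five steps, using one-level decision stumps as the weak learners. The stump base learner and the particular formula for the learner weight α (AdaBoost's choice, derived on the following slides) are illustrative assumptions rather than part of this slide.

    import numpy as np

    def fit_weak_stump(X, y, w):
        # Weak learner: the single (feature, threshold, sign) rule with the lowest
        # weighted error on the current weighting w; y must take values in {-1, +1}.
        best = (np.inf, None)
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = np.where(X[:, j] <= thr, sign, -sign)
                    err = np.sum(w * (pred != y))
                    if err < best[0]:
                        best = (err, (j, thr, sign))
        err, (j, thr, sign) = best
        return err, lambda A: np.where(A[:, j] <= thr, sign, -sign)

    def boost(X, y, n_rounds=20):
        n = len(y)
        w = np.full(n, 1.0 / n)                        # 1) start with equal weights
        learners, alphas = [], []
        for _ in range(n_rounds):
            err, h = fit_weak_stump(X, y, w)           # 2) train a weak learner on weighted data
            err = np.clip(err, 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)      # 3) weight the learner by how well it did
            w *= np.exp(-alpha * y * h(X))             # 4) up-weight misclassified examples,
            w /= w.sum()                               #    renormalize, and repeat
            learners.append(h)
            alphas.append(alpha)
        def predict(A):                                # 5) weighted majority vote
            return np.where(sum(a * h(A) for a, h in zip(alphas, learners)) >= 0, 1, -1)
        return predict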
4. A Boosting Approach: AdaBoost
5. Simple Example
6. Choice of α
- Schapire and Singer proved that the training error of the combined classifier is bounded by the product of the per-round normalizers Z_t.
- From this bound we can derive the choice of α_t that minimizes it on each round.
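The equations shown on the original slide are not preserved in this text export; a standard reconstruction of the bound and the resulting α, in Schapire and Singer's notation (weak hypotheses h_t with values in {-1, +1}, example weights D_t(i), normalizer Z_t, weighted error ε_t), is:

    \[
    \frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{H(x_i)\neq y_i\} \;\le\; \prod_{t=1}^{T} Z_t,
    \qquad
    Z_t = \sum_{i=1}^{m} D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}.
    \]

Minimizing each Z_t over α_t (setting the derivative to zero) gives

    \[
    \alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t},
    \qquad
    Z_t = 2\sqrt{\epsilon_t(1-\epsilon_t)}.
    \]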
7. Proof
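The proof itself is not preserved on this slide; a reconstruction of the standard argument, with f(x) = Σ_t α_t h_t(x) and H(x) = sign(f(x)), is as follows. Unravelling the weight update D_{t+1}(i) = D_t(i) e^{-α_t y_i h_t(x_i)} / Z_t from D_1(i) = 1/m gives

    \[
    D_{T+1}(i) = \frac{1}{m}\,\frac{e^{-y_i f(x_i)}}{\prod_{t=1}^{T} Z_t}.
    \]

A mistake on (x_i, y_i) means y_i f(x_i) ≤ 0, so 1{H(x_i) ≠ y_i} ≤ e^{-y_i f(x_i)}, and therefore

    \[
    \frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{H(x_i)\neq y_i\}
    \;\le\; \frac{1}{m}\sum_{i=1}^{m} e^{-y_i f(x_i)}
    \;=\; \Big(\prod_{t=1}^{T} Z_t\Big)\sum_{i=1}^{m} D_{T+1}(i)
    \;=\; \prod_{t=1}^{T} Z_t.
    \]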
8. AdaBoost
9. Boosting and Additive Logistic Regression (Friedman et al., 2000)
- Boosting is an approximation to additive modeling on the logistic scale, using maximum Bernoulli likelihood (binomial in the multiclass case) as a criterion.
- The paper proposes more direct approximations that exhibit nearly identical results to boosting (AdaBoost).
- These approximations reduce computation.
10. Boosting and Additive Logistic Regression (continued)
- When f(x) is the weighted average of the basic classifiers in AdaBoost, the probability that y = 1 is represented by p(x).
- Note the close connection between the log loss (negative log likelihood) of this model and the exponential loss that AdaBoost attempts to minimize.
- For any distribution over pairs (x, y), both expectations are minimized by the same function f.
- Rather than minimizing the exponential loss, we can attempt to directly minimize the logistic loss (the negative log likelihood): LogitBoost.
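The formulas on the original slide are not preserved; a reconstruction following Friedman et al. (2000), with y ∈ {-1, +1}:

    \[
    p(x) = \frac{e^{f(x)}}{e^{f(x)} + e^{-f(x)}},
    \qquad
    \text{log loss: } \log\bigl(1 + e^{-2 y f(x)}\bigr),
    \qquad
    \text{exponential loss: } e^{-y f(x)}.
    \]

For any distribution over pairs (x, y), both E[e^{-y f(x)}] and E[log(1 + e^{-2 y f(x)})] are minimized by

    \[
    f(x) = \frac{1}{2}\log\frac{P(y=1\mid x)}{P(y=-1\mid x)}.
    \]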
11. LogitBoost
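The LogitBoost algorithm on the original slide is not preserved in this text export; below is a minimal two-class sketch following Friedman et al. (2000), fitting a weighted least-squares regression stump to the working response on each round. The names fit_stump and logitboost, the stump base learner, and the clipping constants are illustrative assumptions, not taken from the slides.

    import numpy as np

    def fit_stump(X, z, w):
        # Weighted least-squares regression stump: choose the (feature, threshold)
        # whose two constant leaves best fit the working response z under weights w.
        best = (np.inf, None)
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j])[:-1]:    # drop the max so the right leaf is non-empty
                left = X[:, j] <= thr
                c_l = np.average(z[left], weights=w[left])
                c_r = np.average(z[~left], weights=w[~left])
                err = np.sum(w * (z - np.where(left, c_l, c_r)) ** 2)
                if err < best[0]:
                    best = (err, (j, thr, c_l, c_r))
        j, thr, c_l, c_r = best[1]
        return lambda A: np.where(A[:, j] <= thr, c_l, c_r)

    def logitboost(X, y, n_rounds=50):
        # Two-class LogitBoost sketch; y must take values in {-1, +1}.
        y01 = (y + 1) / 2                          # recode labels to {0, 1}
        F = np.zeros(len(y))                       # additive model F(x) on the training set
        stumps = []
        for _ in range(n_rounds):
            p = 1.0 / (1.0 + np.exp(-2.0 * F))     # p(x) = e^F / (e^F + e^-F)
            w = np.clip(p * (1 - p), 1e-10, None)  # example weights
            z = np.clip((y01 - p) / w, -4, 4)      # working response, clipped for stability
            stump = fit_stump(X, z, w)
            stumps.append(stump)
            F += 0.5 * stump(X)                    # F <- F + (1/2) f_m
        def predict(A):
            return np.where(sum(s(A) for s in stumps) >= 0, 1, -1)
        return predict

For example, logitboost(X_train, y_train)(X_test) would return predicted labels in {-1, +1}.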
12. References
- Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, August 1997.
- Ron Meir and Gunnar Rätsch. An introduction to boosting and leveraging. In Advanced Lectures on Machine Learning (LNAI 2600), 2003.
- Robert E. Schapire. The boosting approach to machine learning: An overview. In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, and B. Yu, editors, Nonlinear Estimation and Classification. Springer, 2003.