Title: A Brief Introduction to AdaBoost
1. A Brief Introduction to AdaBoost
Some of the slides are borrowed from Derek Hoiem and Jan Šochman.
2. Outline
- Background
- Adaboost Algorithm
- Theory/Interpretations
3. What's So Good About AdaBoost
- Can be used with many different classifiers
- Improves classification accuracy
- Commonly used in many areas
- Simple to implement
- Not prone to overfitting
4. A Brief History
Resampling for estimating a statistic
- Bootstrapping
Resampling for classifier design
- Bagging
- Boosting (Schapire 1989)
- AdaBoost (Freund and Schapire 1995)
5. Bootstrap Estimation
- Repeatedly draw n samples from D with replacement
- For each set of samples, estimate a statistic
- The bootstrap estimate is the mean of the individual estimates
- Used to estimate a statistic (parameter) and its variance
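The procedure above maps directly to a few lines of code. A minimal sketch in Python (the function name `bootstrap_estimate`, the `statistic` callable, and the choice of B = 1000 resamples are illustrative assumptions, not from the slides):

```python
import numpy as np

def bootstrap_estimate(data, statistic, B=1000, seed=0):
    """Resample n points with replacement B times, apply the
    statistic to each resample, and aggregate the results."""
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = np.array([
        statistic(data[rng.integers(0, n, size=n)]) for _ in range(B)
    ])
    # Bootstrap estimate of the statistic and of its variance
    return estimates.mean(), estimates.var(ddof=1)

# Example: est, var = bootstrap_estimate(np.random.randn(100), np.median)
```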
6. Bagging (Bootstrap Aggregating)
- For i = 1 .. M
- Draw n' < n samples from D with replacement
- Learn classifier Ci
- Final classifier is a vote of C1 .. CM (see the sketch after the diagram)
- Increases classifier stability / reduces variance
[Diagram: bootstrap training sets D1, D2, D3 drawn from D]
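A minimal sketch of the bagging loop, assuming NumPy, scikit-learn decision trees as the base classifier Ci, and labels in {-1, +1} (the base learner and the 80% sample size are assumptions; the slide does not fix either):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=25, frac=0.8, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(M):                                 # for i = 1 .. M
        idx = rng.integers(0, n, size=int(frac * n))   # n' < n, with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])
    return np.sign(votes.sum(axis=0))                  # majority vote of C1 .. CM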
7. Boosting (Schapire 1989)
- Consider creating three component classifiers for a two-category problem through boosting
- Randomly select n1 < n samples from D without replacement to obtain D1
- Train weak learner C1
- Select n2 < n samples from D, half of them misclassified by C1, to obtain D2
- Train weak learner C2
- Select all remaining samples from D that C1 and C2 disagree on
- Train weak learner C3
- Final classifier is a vote of the weak learners (a code sketch follows the diagram)
[Diagram: training sets D1, D2, D3 drawn from D]
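A sketch of the three-classifier scheme, assuming labels in {-1, +1} and a user-supplied weak learner `train_weak` (a hypothetical name); it omits the checks needed when D2 or D3 would be too small:

```python
import numpy as np

def boost_three(X, y, train_weak, n1, n2, seed=0):
    rng = np.random.default_rng(seed)
    # D1: n1 samples drawn from D without replacement
    idx1 = rng.choice(len(X), size=n1, replace=False)
    C1 = train_weak(X[idx1], y[idx1])

    # D2: half the samples misclassified by C1, half classified correctly
    p1 = C1.predict(X)
    wrong, right = np.where(p1 != y)[0], np.where(p1 == y)[0]
    idx2 = np.concatenate([rng.choice(wrong, n2 // 2, replace=False),
                           rng.choice(right, n2 - n2 // 2, replace=False)])
    C2 = train_weak(X[idx2], y[idx2])

    # D3: all remaining samples on which C1 and C2 disagree
    idx3 = np.where(p1 != C2.predict(X))[0]
    C3 = train_weak(X[idx3], y[idx3])

    # Final classifier: majority vote of the three weak learners
    return lambda Xq: np.sign(sum(C.predict(Xq) for C in (C1, C2, C3)))
```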
8. AdaBoost: Adaptive Boosting
- Instead of resampling, uses training-set re-weighting
- Each training sample has a weight that determines its probability of being selected for a training set
- AdaBoost constructs a strong classifier as a linear combination of simple weak classifiers
- Final classification is based on a weighted vote of the weak classifiers, as the formula below shows
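In standard AdaBoost notation (consistent with the terminology on the next slide), the strong classifier is the sign of the weighted vote:

```latex
H(x) = \operatorname{sign}\!\left( \sum_{t=1}^{T} \alpha_t\, h_t(x) \right)
```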
9. AdaBoost Terminology
- h_t(x): weak or basis classifier (classifier = learner = hypothesis)
- H(x): strong or final classifier
- Weak classifier: < 50% error over any distribution
- Strong classifier: thresholded linear combination of the weak classifier outputs
10. Discrete AdaBoost Algorithm
Each training sample has a weight, which determines the probability of being selected for training the component classifier.
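The algorithm listing on this slide is shown only as an image in the original. The following is a minimal sketch of standard discrete AdaBoost, assuming NumPy, labels in {-1, +1}, and scikit-learn decision stumps as the weak learner (the stump choice and the use of sample weights instead of resampling are assumptions, not from the slide):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    n = len(X)
    w = np.full(n, 1.0 / n)                      # start with uniform weights
    learners, alphas = [], []
    for _ in range(T):
        # Train the weak classifier on the weighted training set
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        eps = max(w[pred != y].sum(), 1e-12)     # weighted error (guard the log)
        if eps >= 0.5:                           # no longer a weak learner; stop
            break
        alpha = 0.5 * np.log((1 - eps) / eps)    # vote weight of this classifier
        w *= np.exp(-alpha * y * pred)           # raise misclassified, lower correct
        w /= w.sum()                             # renormalize to a distribution
        learners.append(h)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(learners, alphas, X):
    # Strong classifier: sign of the weighted vote of the weak classifiers
    F = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.sign(F)
```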
11-12. Find the Weak Classifier
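The selection criterion on these slides is shown only as images in the original; in the standard formulation, round t picks the weak classifier with the smallest weighted error, which must stay below 1/2:

```latex
\epsilon_t = \sum_{i=1}^{n} w_t(i)\, \mathbf{1}\{ h(x_i) \neq y_i \},
\qquad
h_t = \arg\min_{h \in \mathcal{H}} \epsilon_t,
\qquad
\epsilon_t < \tfrac{1}{2}
```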
13. The algorithm core
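The core step assigns each weak classifier its vote weight; in discrete AdaBoost this is

```latex
\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}
```

which is positive exactly when epsilon_t < 1/2 and grows as the weak classifier improves.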
14. Reweighting
Correctly classified samples (y · h_t(x) = +1) get their weights decreased; misclassified samples (y · h_t(x) = -1) get their weights increased.
15. Reweighting
In this way, AdaBoost focuses on the informative or difficult examples.
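The update rule itself, in its standard form, with Z_t the normalizer that keeps the weights a probability distribution:

```latex
w_{t+1}(i) = \frac{w_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t},
\qquad
Z_t = \sum_{j=1}^{n} w_t(j)\, e^{-\alpha_t y_j h_t(x_j)}
```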
17-24. Algorithm recapitulation
[Step-by-step illustration of the algorithm, starting at t = 1]
25. Pros and Cons of AdaBoost
- Advantages
- Very simple to implement
- Does feature selection, resulting in a relatively simple classifier
- Fairly good generalization
- Disadvantages
- Suboptimal solution
- Sensitive to noisy data and outliers
26. References
- Duda, Hart, and Stork, Pattern Classification
- Freund, An adaptive version of the boost by majority algorithm
- Freund and Schapire, Experiments with a new boosting algorithm
- Freund and Schapire, A decision-theoretic generalization of on-line learning and an application to boosting
- Friedman, Hastie, and Tibshirani, Additive Logistic Regression: A Statistical View of Boosting
- Jin, Liu, et al. (CMU), A New Boosting Algorithm Using Input-Dependent Regularizer
- Li, Zhang, et al., FloatBoost Learning for Classification
- Opitz and Maclin, Popular Ensemble Methods: An Empirical Study
- Rätsch and Warmuth, Efficient Margin Maximization with Boosting
- Schapire, Freund, et al., Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods
27. Appendix
- Bound on training error
- AdaBoost variants
28. Bound on Training Error (Schapire)
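The bound on this slide is shown only as an image in the original; Schapire's result, in its usual form with gamma_t = 1/2 - epsilon_t the edge of the t-th weak classifier over random guessing, is:

```latex
\frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{ H(x_i) \neq y_i \}
\;\le\; \prod_{t=1}^{T} Z_t
\;=\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t (1 - \epsilon_t)}
\;\le\; \exp\!\left( -2 \sum_{t=1}^{T} \gamma_t^{2} \right)
```

So the training error drops exponentially as long as every weak classifier beats chance by some fixed margin.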
29. Discrete AdaBoost (DiscreteAB), Friedman's wording
30. Discrete AdaBoost (DiscreteAB), Freund and Schapire's wording
31. AdaBoost with Confidence-Weighted Predictions (RealAB)
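In Friedman's formulation of Real AdaBoost, each round fits a class-probability estimate under the current weights and uses half its log-odds as the confidence-rated weak hypothesis:

```latex
p_t(x) = P_{w_t}(y = 1 \mid x),
\qquad
f_t(x) = \frac{1}{2} \ln \frac{p_t(x)}{1 - p_t(x)}
```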
32. AdaBoost Variants Proposed by Friedman
- LogitBoost
- Solves a weighted least-squares problem in each round (Newton steps for the logistic loss); see the working response below
- Requires care to avoid numerical problems
- GentleBoost
- Update is f_m(x) = P_w(y = 1 | x) - P_w(y = 0 | x) instead of the half-log-odds ratio
- Bounded in [-1, 1]
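For reference, the LogitBoost step as given in Friedman, Hastie, and Tibshirani: with y* in {0, 1} and p(x) the current probability estimate, each round fits f_m by weighted least squares to the working response

```latex
z_i = \frac{y_i^{*} - p(x_i)}{p(x_i)\,\left(1 - p(x_i)\right)},
\qquad
w_i = p(x_i)\,\left(1 - p(x_i)\right)
```

The division by p(1 - p) is exactly where the numerical care mentioned above is needed: the response blows up as p(x_i) approaches 0 or 1.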
33-34. AdaBoost Variants Proposed by Friedman (continued)
35. Thanks! Any comments or questions?