Boosting and Additive Trees (Part 1) - PowerPoint PPT Presentation

Provided by: Schoolo202
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes


1
Boosting and Additive Trees (Part 1)
  • Ch. 10
  • Presented by Tal Blum

2
Overview
  • Ensemble methods and motivations
  • Describing the AdaBoost.M1 algorithm
  • Showing that AdaBoost minimizes the exponential loss
  • Other loss functions for classification and
    regression

3
Ensemble Learning: Additive Models
  • Intuition: combining the predictions of an
    ensemble is often more accurate than a single
    classifier.
  • Justification (several reasons):
  • It is easy to find fairly accurate rules of
    thumb, but hard to find a single highly accurate
    prediction rule.
  • If the training examples are few and the
    hypothesis space is large, then there are several
    equally accurate classifiers (model uncertainty).
  • The hypothesis space may not contain the true
    function, but a linear combination of hypotheses
    might.
  • Exhaustive global search in the hypothesis space
    is expensive, so we can combine the predictions of
    several locally accurate classifiers.
  • Examples: Bagging, HME, Splines

4
Boosting (explaining)
5
Example
Learning curve for $Y = 1$ if $\sum_j X_j^2 > \chi^2_{10}(0.5)$, $0$ otherwise
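
The data behind such a learning curve can be simulated along the following lines. This is only a sketch: it assumes ten independent standard Gaussian features (the transcript does not say how X is drawn), hard-codes 9.34 as the approximate median of a chi-squared variable with 10 degrees of freedom, and uses the hypothetical helper name `make_example`. Labels are 1 and 0, as on the slide.

```python
import numpy as np

# Approximate median of a chi-squared distribution with 10 degrees of freedom.
CHI2_10_MEDIAN = 9.34

def make_example(n, rng):
    """Draw n points with 10 standard Gaussian features (an assumption)
    and label them 1 when the squared norm exceeds the chi-squared median."""
    X = rng.standard_normal((n, 10))
    y = (np.sum(X ** 2, axis=1) > CHI2_10_MEDIAN).astype(int)
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = make_example(2000, rng)
print(X_train.shape, y_train.mean())  # classes are roughly balanced
```

Because the threshold is the median of the squared norm, about half of the points fall in each class.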
6
Adaboost.M1 Algorithm
  • $W(x)$ is the distribution of weights over the N
    training points, $\sum_i W(x_i) = 1$
  • Initially assign uniform weights $W_0(x) = 1/N$ for
    all x.
  • At each iteration k:
  • Find the best weak classifier $C_k(x)$ using weights
    $W_k(x)$
  • Compute $e_k$, the error rate, as
  • $e_k = \sum_i W_k(x_i) I(y_i \neq C_k(x_i)) / \sum_i W_k(x_i)$
  • Compute $\alpha_k$, the classifier $C_k$'s weight in the final
    hypothesis: set $\alpha_k = \log((1 - e_k)/e_k)$
  • For each $x_i$, $W_{k+1}(x_i) = W_k(x_i) \exp[\alpha_k I(y_i \neq C_k(x_i))]$
  • $C_{FINAL}(x) = \mathrm{sign}[\sum_k \alpha_k C_k(x)]$
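
The steps above can be sketched in Python roughly as follows. This is an illustration, not the presenter's code: it assumes labels in {-1, +1}, uses decision stumps as the weak classifier, renormalizes the weights after each round so that they keep summing to 1, and clips the error rate away from 0 and 1 to avoid a division by zero. The function names (`fit_stump`, `adaboost_m1`, `predict`) are hypothetical.

```python
import numpy as np

def fit_stump(X, y, w):
    """Return the (feature, threshold, sign) stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, s, -s)
                err = np.sum(w * (pred != y))
                if err < best[0]:
                    best = (err, j, t, s)
    return best[1:]

def stump_predict(stump, X):
    j, t, s = stump
    return np.where(X[:, j] <= t, s, -s)

def adaboost_m1(X, y, n_rounds=20):
    n = len(y)
    w = np.full(n, 1.0 / n)                # uniform initial weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)         # best weak classifier under w
        miss = stump_predict(stump, X) != y
        e = np.sum(w * miss) / np.sum(w)   # weighted error rate
        e = np.clip(e, 1e-12, 1 - 1e-12)   # guard added here, not on the slide
        alpha = np.log((1 - e) / e)        # classifier weight
        w = w * np.exp(alpha * miss)       # up-weight misclassified points
        w = w / w.sum()                    # keep the weights a distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def predict(stumps, alphas, X):
    score = sum(a * stump_predict(s, X) for s, a in zip(stumps, alphas))
    return np.sign(score)

# Tiny one-dimensional problem that no single stump can solve.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1, 1, -1, -1, 1, 1])
stumps, alphas = adaboost_m1(X, y, n_rounds=50)
print(predict(stumps, alphas, X))
```

The exhaustive stump search is chosen for readability; it is quadratic in the number of points per feature and would be replaced by a sorted scan on larger data.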

7
Boosting as an Additive Model
  • The final prediction in boosting f(x) can be
    expressed as an additive expansion of individual
    classifiers
  • The process is iterative and can be expressed as
    follows.
  • Typically we would try to minimize a loss
    function on the training examples
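
The additive expansion and the training criterion referred to above are usually written as follows. The notation is an assumption borrowed from the standard textbook treatment and is not given in the transcript: $b(x; \gamma)$ is a basis function (here, an individual classifier) with parameters $\gamma$, $\beta_m$ is its coefficient, and $L$ is the loss.

```latex
\begin{align}
f(x) &= \sum_{m=1}^{M} \beta_m \, b(x; \gamma_m) \\
\min_{\{\beta_m, \gamma_m\}_{1}^{M}} \; & \sum_{i=1}^{N} L\Bigl(y_i, \sum_{m=1}^{M} \beta_m \, b(x_i; \gamma_m)\Bigr)
\end{align}
```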

8
Forward Stepwise Additive Modeling - algorithm
  • Initialize $f_0(x) = 0$
  • For m = 1 to M:
  • Compute
  • Set
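
A common way to write the two steps of the loop is shown below. It is a sketch under assumed notation, with $b(x; \gamma)$ a basis function, $\beta$ its coefficient and $L$ the loss; these symbols do not appear in the transcript.

```latex
\begin{align}
(\beta_m, \gamma_m) &= \arg\min_{\beta, \gamma} \sum_{i=1}^{N} L\bigl(y_i, \, f_{m-1}(x_i) + \beta \, b(x_i; \gamma)\bigr) \\
f_m(x) &= f_{m-1}(x) + \beta_m \, b(x; \gamma_m)
\end{align}
```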

9
Forward Stepwise Additive Modeling
  • Sequentially adding new basis functions without
    adjusting the parameters of the previously chosen
    functions
  • Simple case: squared-error loss
  • Forward stage-wise modeling amounts to just
    fitting the residuals from the previous iteration.
  • Squared-error loss is not robust for classification
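
For squared-error loss the residual-fitting view can be made concrete with a short sketch. It assumes a one-dimensional input and uses least-squares regression stumps as the basis functions; both choices, and the names `fit_regression_stump` and `stagewise_fit`, are illustrative assumptions rather than anything stated on the slide.

```python
import numpy as np

def fit_regression_stump(x, r):
    """Least-squares stump on 1-D input x fitted to targets r.
    Returns (threshold, left value, right value)."""
    best = None
    for t in np.unique(x)[:-1]:            # keep both sides non-empty
        left, right = r[x <= t], r[x > t]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]

def stagewise_fit(x, y, n_steps=50):
    f = np.zeros_like(y, dtype=float)      # start from the zero function
    stumps = []
    for _ in range(n_steps):
        residual = y - f                   # what the current fit still misses
        t, c_left, c_right = fit_regression_stump(x, residual)
        f += np.where(x <= t, c_left, c_right)   # earlier stumps are not refit
        stumps.append((t, c_left, c_right))
    return stumps, f

x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x)
for steps in (1, 10, 50):
    _, fit = stagewise_fit(x, y, n_steps=steps)
    print(steps, np.mean((y - fit) ** 2))  # training error shrinks with more steps
```

Each stump is fitted only to the current residual, so the training error cannot increase from one step to the next.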

10
Exponential Loss and AdaBoost
  • AdaBoost for Classification
  • $L(y, f(x)) = \exp(-y f(x))$ - the exponential
    loss function
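
Plugging the exponential loss into a forward stagewise step shows where the instance weights come from. The notation below is assumed from the standard derivation and is not in the transcript: $G$ is a classifier with values in $\{-1, +1\}$, $\beta$ its coefficient, and $f_{m-1}$ the fit after $m-1$ steps.

```latex
\begin{align}
(\beta_m, G_m) &= \arg\min_{\beta, G} \sum_{i=1}^{N} \exp\bigl[-y_i \bigl(f_{m-1}(x_i) + \beta \, G(x_i)\bigr)\bigr] \\
               &= \arg\min_{\beta, G} \sum_{i=1}^{N} w_i^{(m)} \exp\bigl(-\beta \, y_i \, G(x_i)\bigr),
\qquad w_i^{(m)} = \exp\bigl(-y_i f_{m-1}(x_i)\bigr)
\end{align}
```

The weights $w_i^{(m)}$ depend only on the previous fit, so they act as fixed per-example weights while the new classifier is chosen.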

11
Exponential Loss and AdaBoost
  • Assuming ? ? 0

12
Finding the best ?
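
The content of this slide is missing from the transcript. Assuming the symbol in the title is the coefficient $\beta$ of the new classifier $G_m$ (values in $\{-1, +1\}$) and that $w_i$ are the current instance weights, the usual derivation goes as follows.

```latex
\begin{align}
\sum_{i=1}^{N} w_i \, e^{-\beta y_i G_m(x_i)}
  &= e^{-\beta} \sum_{y_i = G_m(x_i)} w_i \; + \; e^{\beta} \sum_{y_i \neq G_m(x_i)} w_i \\
\frac{\partial}{\partial \beta} = 0 \;\Rightarrow\;
e^{2\beta} &= \frac{\sum_{y_i = G_m(x_i)} w_i}{\sum_{y_i \neq G_m(x_i)} w_i}
            = \frac{1 - \mathrm{err}_m}{\mathrm{err}_m} \\
\beta_m &= \frac{1}{2} \log \frac{1 - \mathrm{err}_m}{\mathrm{err}_m},
\qquad \mathrm{err}_m = \frac{\sum_{i} w_i \, I\bigl(y_i \neq G_m(x_i)\bigr)}{\sum_{i} w_i}
\end{align}
```

Under these assumptions the AdaBoost.M1 classifier weight $\log((1 - e_k)/e_k)$ equals $2\beta_m$, which leaves the sign of the final vote unchanged.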
13
(No Transcript)
14
Historical Notes
  • AdaBoost was first presented in ML theory as a
    way to boost a weak classifier
  • At first people thought it defied the no free
    lunch theorem and did not overfit.
  • The connection between AdaBoost and stepwise additive
    modeling was only recently discovered.

15
Why Exponential Loss?
  • Mainly Computational
  • Derivatives are easy to compute
  • The optimal classifier minimizes the weighted sample
    error
  • Under mild assumptions the instance weights
    decrease exponentially fast.
  • Statistical
  • Exponential loss is not necessary for the success of
    boosting: "On Boosting and exponential loss"
    (Wyner)
  • We will see in the next slides

16
Why Exponential Loss?
  • Population minimizer (Friedman 2000)
  • This justifies using its sign as a classification
    rule.
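
The population minimizer is not reproduced in the transcript. For labels $Y \in \{-1, +1\}$ (an assumption consistent with the exponential loss above) it can be derived directly, writing $p = \Pr(Y = 1 \mid x)$:

```latex
\begin{align}
E_{Y \mid x}\bigl[e^{-Y f(x)}\bigr] &= p \, e^{-f(x)} + (1 - p) \, e^{f(x)} \\
\frac{\partial}{\partial f(x)} = 0 \;\Rightarrow\; e^{2 f(x)} &= \frac{p}{1 - p} \\
f^{*}(x) &= \frac{1}{2} \log \frac{\Pr(Y = 1 \mid x)}{\Pr(Y = -1 \mid x)}
\end{align}
```

$f^{*}(x)$ is positive exactly when $\Pr(Y = 1 \mid x) > 1/2$, which is why its sign is a sensible classification rule.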

17
Why Exponential Loss?
  • For exponential loss
  • Interpreting f as a logit transform
  • The population maximizer of the binomial log-likelihood
    and the population minimizer of the exponential loss
    are the same
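
The logit interpretation can be spelled out as below. This is a sketch in assumed notation: $p(x) = \Pr(Y = 1 \mid x)$, labels $Y \in \{-1, +1\}$, and $Y' = (Y + 1)/2 \in \{0, 1\}$; the formulas themselves are not in the transcript.

```latex
\begin{align}
p(x) &= \frac{1}{1 + e^{-2 f(x)}} \\
l\bigl(Y, p(x)\bigr) &= Y' \log p(x) + (1 - Y') \log\bigl(1 - p(x)\bigr) \\
-\,l\bigl(Y, f(x)\bigr) &= \log\bigl(1 + e^{-2 Y f(x)}\bigr)
\end{align}
```

The last line is the binomial deviance expressed as a function of the margin $Y f(x)$, which makes it directly comparable with the exponential loss $e^{-Y f(x)}$.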

18
Loss Functions and Robustness
  • For a finite dataset, exponential loss and binomial
    deviance are not the same.
  • Both criteria are monotonically decreasing functions
    of the margin.
  • Examples with negative margin, $y f(x) < 0$, are
    classified incorrectly.

19
Loss Functions and Robustness
  • The problem: classification error is not
    differentiable everywhere, and its derivative is 0
    where it is differentiable.
  • We want a criterion which is efficient and as
    close as possible to the true classification
    loss.
  • Any loss criterion used for classification should
    give higher weights to misclassified examples.
  • Squared-error loss also penalizes confidently correct
    examples (margin greater than 1), so it is not
    appropriate for classification.
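
The behaviour of these criteria is easy to check numerically as functions of the margin m = y f(x). The sketch below assumes labels in {-1, +1}, so that squared error (y - f)^2 equals (1 - m)^2, and writes the binomial deviance as log(1 + exp(-2m)); the function names are illustrative.

```python
import numpy as np

def misclassification(m):
    return (m < 0).astype(float)          # 1 for negative margin, else 0

def exponential(m):
    return np.exp(-m)

def binomial_deviance(m):
    return np.log1p(np.exp(-2.0 * m))

def squared_error(m):
    return (1.0 - m) ** 2                 # (y - f)^2 with y in {-1, +1}

margins = np.linspace(-2.0, 3.0, 11)
for name, loss in [("misclassification", misclassification),
                   ("exponential", exponential),
                   ("binomial deviance", binomial_deviance),
                   ("squared error", squared_error)]:
    print(f"{name:>18}:", np.round(loss(margins), 3))
```

Exponential loss and deviance keep decreasing as the margin grows, while squared error starts rising again once the margin exceeds 1.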

20
Loss Functions and Robustness
  • Both functions can be thought of as continuous
    approximations to the misclassification loss
  • Exponential loss grows exponentially fast for
    instances with large negative margin
  • The weight of such instances increases exponentially
  • This makes AdaBoost very sensitive to mislabeled
    examples
  • Deviance generalizes to K classes; exponential loss does not.

21
Robust Loss Functions For Regression
  • The relationship between square loss and absolute
    loss is analogous to that of exp. loss and
    deviance.
  • The solutions are the mean and median.
  • Absolute loss is more robust.
  • For regression, MSE leads to AdaBoost for
    regression
  • For efficiency under Gaussian errors combined with
    robustness to outliers: Huber loss
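
The three regression losses can be compared on a small sample containing one outlier. This is a sketch under stated assumptions: the Huber loss is written in the form that equals the squared residual below a threshold `delta` and grows linearly above it, `delta = 1.0` is an arbitrary choice, and the data values are made up for illustration.

```python
import numpy as np

def squared_loss(r):
    return r ** 2

def absolute_loss(r):
    return np.abs(r)

def huber_loss(r, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond it (continuous at delta)."""
    a = np.abs(r)
    return np.where(a <= delta, r ** 2, 2.0 * delta * a - delta ** 2)

# Illustrative sample with a single outlier.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
grid = np.linspace(0.0, 100.0, 10001)     # candidate constant predictions

def best_constant(loss):
    totals = [np.sum(loss(y - c)) for c in grid]
    return grid[int(np.argmin(totals))]

print("squared :", best_constant(squared_loss))   # close to the mean
print("absolute:", best_constant(absolute_loss))  # close to the median
print("huber   :", best_constant(huber_loss))
```

The constant that minimizes squared loss is pulled toward the outlier, while the absolute and Huber solutions stay near the bulk of the data.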

22
Sample of UCI datasets Comparison
Dataset Name     J48    J48 bagging(10)  AdaBoost w/ decision stumps  SVM SMO  B Net  NB     NN 1   LBMA   LBMA DEVIANCE
colic            85.1   82.8             81.08                        78.38    78.37  79.73  77     85.1   82.43
anneal(70)       96.6   97.4             84.07                        97.04    92.22  91.8   94.07  97     97.04
credit-a(x10)    84.49  86.67            85.94                        85.65    85.07  85.36  80     86.22  84.06
iris-(disc5)x10  93.3   94               87.3                         94       93.3   93.3   93.3   94.67  94
soybean-9x2      84.87  79.83            27.73                        86.83    83.19  84.59  80.11  87.36  87.68
soybean-37       90.51  85.4             24.09                        93.43    90.51  88.32  82.48  92.7   94.16
labor-(disc5)    70.18  78.95            87.82                        87.72    94.74  91.23  85.96  94.74  94.74
autos-(disc5)x2  70.73  64.39            44.88                        73.17    61.95  61.46  77.07  65.35  76.1
credit-g(70)     74.33  73.67            74.33                        74.67    77     76.67  67.67  74.33  76.67
glassx5          57.94  56.54            42.06                        57.94    56.54  54.67  55.14  58.41  57.48
diabetes         68.36  68.49            71.61                        70.18    70.31  69.92  64.45  68     69.4
audiology        76.55  76.55            46.46                        80.97    75.22  71.24  73.45  79.6   80.09
breast-cancer    74.13  68.18            72.38                        69.93    72.03  72.73  68.18  75.52  76.22
heart-c-disc     77.56  81.19            84.49                        83.17    84.16  83.83  76.57  80.21  84.16
vowel x 5        71.92  71.92            17.97                        86.46    63.94  63.94  90.7   94.04  93.84
Average          78.44  77.732           62.1473                      81.3     79     77.9   77.74  82.22  83.205
23
Next Presentation