1
Introduction to Boosting
  • Slides adapted from Che Wanxiang at HIT and
    Robin Dhamankar
  • Many thanks!

2
Ideas
  • Boosting is considered to be one of the most
    significant developments in machine learning
  • Finding many weak rules of thumb is easier than
    finding a single, highly accurate prediction rule
  • The key is in how to combine the weak rules

15
Boosting (Algorithm)
  • W(x) is the distribution of weights over the N
    training points, with Σ W(xi) = 1
  • Initially assign uniform weights W0(xi) = 1/N for
    all xi; set step k = 0
  • At each iteration k:
  • Find the best weak classifier Ck(x) using the
    weights Wk(x)
  • With error rate ek and, based on a loss function,
    weight ak: the classifier Ck's weight in the
    final hypothesis
  • For each xi, update the weights based on ek to
    get Wk+1(xi)
  • CFINAL(x) = sign[ Σ ak Ck(x) ]
    (a short code sketch of this loop follows below)
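A minimal sketch of this generic loop in Python (not from the slides). The three callables fit_weak, classifier_weight, and reweight are placeholders for the weak learner and the loss-specific rules; AdaBoost's particular choices appear on the later slides.

```python
import numpy as np

def boost(X, y, fit_weak, classifier_weight, reweight, num_rounds):
    """Generic boosting loop; labels y are assumed to be in {-1, +1}.

    fit_weak(X, y, w)      -> weak classifier C, with C(X) returning {-1, +1}
    classifier_weight(e)   -> the weight ak for error rate ek
    reweight(w, a, miss)   -> the unnormalized weights for the next round
    """
    n = len(y)
    w = np.full(n, 1.0 / n)                  # W0(xi) = 1/N, so the weights sum to 1
    ensemble = []
    for k in range(num_rounds):
        C = fit_weak(X, y, w)                # best weak classifier Ck under Wk
        miss = (C(X) != y).astype(float)     # indicator I(yi != Ck(xi))
        e = np.sum(w * miss) / np.sum(w)     # weighted error rate ek
        a = classifier_weight(e)             # ak, from the chosen loss function
        w = reweight(w, a, miss)             # update weights based on ek and ak
        w = w / np.sum(w)                    # renormalize so Σ W(xi) = 1
        ensemble.append((a, C))
    # CFINAL(x) = sign( Σk ak Ck(x) )
    return lambda X_new: np.sign(sum(a * C(X_new) for a, C in ensemble))
```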

16
Boosting (Algorithm)
19
Boosting As Additive Model
  • The final boosting prediction f(x) can be
    expressed as an additive expansion of the
    individual classifiers (written out below)
  • The process is iterative, adding one classifier
    per stage
  • Typically we would try to minimize a loss
    function on the training examples
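The equations this slide alludes to are the standard ones for boosting as an additive model (e.g., Hastie, Tibshirani & Friedman); a reconstruction:

```latex
% Additive expansion: M weak classifiers b(x; gamma_m) with coefficients beta_m
f(x) = \sum_{m=1}^{M} \beta_m \, b(x;\, \gamma_m)

% Forward stage-wise fitting: at step m, keep f_{m-1} fixed and add one new term
(\beta_m, \gamma_m) = \arg\min_{\beta,\, \gamma} \sum_{i=1}^{N}
    L\big(y_i,\; f_{m-1}(x_i) + \beta\, b(x_i;\, \gamma)\big),
\qquad
f_m(x) = f_{m-1}(x) + \beta_m\, b(x;\, \gamma_m)
```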

20
Boosting As Additive Model
  • Simple case: squared-error loss
  • Forward stage-wise modeling amounts to just
    fitting the residuals from the previous iteration
    (shown below)
  • Squared-error loss is not robust for
    classification
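The one-step calculation behind the residual-fitting claim (a reconstruction of the missing slide equation; r_im denotes the residual of the previous fit):

```latex
% With squared-error loss L(y, f) = (y - f)^2, the stage-wise criterion becomes
L\big(y_i,\; f_{m-1}(x_i) + \beta\, b(x_i;\, \gamma)\big)
  = \big(y_i - f_{m-1}(x_i) - \beta\, b(x_i;\, \gamma)\big)^2
  = \big(r_{im} - \beta\, b(x_i;\, \gamma)\big)^2,
\qquad r_{im} = y_i - f_{m-1}(x_i)
% so each new classifier is fit directly to the residuals of the previous iteration.
```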

21
Boosting As Additive Model
  • AdaBoost for Classification
  • L(y, f(x)) = exp(-y f(x)), the exponential
    loss function

22
Boosting As Additive Model
First assume that β is constant, and minimize
w.r.t. G (a reconstruction of the missing steps
follows below).
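With the exponential loss, the stage-wise objective can be written with per-example weights, and for fixed β the optimal G minimizes the weighted training error. A reconstruction of the equations:

```latex
% Stage-wise objective with weights w_i^{(m)} = exp(-y_i f_{m-1}(x_i)):
(\beta_m, G_m) = \arg\min_{\beta,\, G} \sum_{i=1}^{N} w_i^{(m)}
                 \exp\!\big(-\beta\, y_i\, G(x_i)\big)

% For fixed beta > 0, split the sum into correctly and incorrectly classified points:
\sum_i w_i^{(m)} e^{-\beta y_i G(x_i)}
  = e^{-\beta} \sum_i w_i^{(m)}
  + \big(e^{\beta} - e^{-\beta}\big) \sum_i w_i^{(m)}\, I\big(y_i \ne G(x_i)\big)

% Hence the optimal G_m is the weak classifier with the smallest weighted training error:
G_m = \arg\min_{G} \sum_{i=1}^{N} w_i^{(m)}\, I\big(y_i \ne G(x_i)\big)
```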
23
Boosting As Additive Model
errm is the training error on the weighted
samples. The last equation tells us that in each
iteration we must find the classifier that
minimizes the training error on the weighted
samples.
24
Boosting As Additive Model
Now that we have found G, we minimize w.r.t. β
(the resulting update is spelled out below).
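Carrying out that minimization gives the standard result, presumably what the missing equations showed:

```latex
% Plugging G_m in and setting the derivative w.r.t. beta to zero gives
\beta_m = \frac{1}{2} \log \frac{1 - \mathrm{err}_m}{\mathrm{err}_m},
\qquad
\mathrm{err}_m = \frac{\sum_i w_i^{(m)}\, I\big(y_i \ne G_m(x_i)\big)}{\sum_i w_i^{(m)}}

% The next iteration's weights follow from f_m = f_{m-1} + beta_m G_m:
w_i^{(m+1)} = w_i^{(m)}\, e^{-\beta_m y_i G_m(x_i)}
            = w_i^{(m)}\, e^{\,2\beta_m I(y_i \ne G_m(x_i))}\, e^{-\beta_m}
% The common factor e^{-beta_m} cancels after normalization, giving the AdaBoost
% update with alpha_m = 2*beta_m = log((1 - err_m)/err_m), as on the next slide.
```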
25
AdaBoost (Algorithm)
  • W(x) is the distribution of weights over the N
    training points, with Σ W(xi) = 1
  • Initially assign uniform weights W0(xi) = 1/N for
    all xi
  • At each iteration k:
  • Find the best weak classifier Ck(x) using the
    weights Wk(x)
  • Compute ek, the error rate, as
    ek = Σ Wk(xi) I(yi ≠ Ck(xi)) / Σ Wk(xi)
  • Weight ak, the classifier Ck's weight in the
    final hypothesis: set ak = log((1 - ek)/ek)
  • For each xi, set
    Wk+1(xi) = Wk(xi) exp(ak I(yi ≠ Ck(xi)))
  • CFINAL(x) = sign[ Σ ak Ck(x) ]
    (a runnable sketch follows below)
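A runnable sketch of the slide's pseudocode (not from the original slides), using a hand-rolled decision stump as the weak classifier; the function and variable names are my own.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weak classifier: a decision stump, i.e. a threshold test on one feature.
    Returns (feature, threshold, polarity) minimizing the weighted error."""
    n, d = X.shape
    best, best_err = None, np.inf
    for j in range(d):
        for thresh in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - thresh) >= 0, 1, -1)
                err = np.sum(w * (pred != y))
                if err < best_err:
                    best_err, best = err, (j, thresh, polarity)
    return best

def stump_predict(stump, X):
    j, thresh, polarity = stump
    return np.where(polarity * (X[:, j] - thresh) >= 0, 1, -1)

def adaboost(X, y, num_rounds=10):
    """AdaBoost as on the slide; labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # W0(xi) = 1/N
    stumps, alphas = [], []
    for k in range(num_rounds):
        stump = fit_stump(X, y, w)               # best weak classifier Ck under Wk
        miss = (stump_predict(stump, X) != y).astype(float)
        e = np.sum(w * miss) / np.sum(w)         # weighted error rate ek
        e = np.clip(e, 1e-10, 1 - 1e-10)         # guard against ek = 0 or 1
        a = np.log((1.0 - e) / e)                # ak = log((1 - ek)/ek)
        w = w * np.exp(a * miss)                 # up-weight misclassified points
        w = w / np.sum(w)                        # keep Σ W(xi) = 1
        stumps.append(stump)
        alphas.append(a)
    return stumps, alphas

def predict(stumps, alphas, X):
    """CFINAL(x) = sign( Σ ak Ck(x) )."""
    votes = sum(a * stump_predict(s, X) for a, s in zip(alphas, stumps))
    return np.sign(votes)
```

With y in {-1, +1}, calling stumps, alphas = adaboost(X, y) and then predict(stumps, alphas, X) reproduces the weighted-vote rule CFINAL.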

26
AdaBoost (Example)
Original training set: equal weights for all
training samples.
Taken from "A Tutorial on Boosting" by Yoav
Freund and Rob Schapire
27
AdaBoost (Example)
ROUND 1
28
AdaBoost (Example)
ROUND 2
29
AdaBoost (Example)
ROUND 3
30
AdaBoost (Example)
31
AdaBoost (Characteristics)
  • Why the exponential loss function?
  • Computational
  • Simple modular re-weighting
  • The derivative is easy, so determining the
    optimal parameters is relatively easy
  • Statistical
  • In the two-label case it determines one half the
    log-odds of P(Y = 1 | x), so we can use the sign
    as the classification rule (see below)
  • Accuracy depends upon the number of iterations
    (how sensitive, we will see soon)
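In symbols, the statistical point is the standard population-minimizer result for the exponential loss:

```latex
f^{*}(x) = \arg\min_{f(x)} \; \mathbb{E}\big[\, e^{-Y f(x)} \mid x \,\big]
         = \frac{1}{2} \log \frac{P(Y = 1 \mid x)}{P(Y = -1 \mid x)}
% so sign(f^*(x)) picks exactly the more probable label.
```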

32
Boosting performance
Decision stumps are very simple rules of thumb
that test a condition on a single attribute.
Decision stumps formed the individual classifiers
whose predictions were combined to generate the
final prediction. The misclassification rate of
the boosting algorithm was plotted against the
number of iterations performed.
33
Boosting performance
Steep decrease in error
34
Boosting performance
  • How many iterations would be sufficient?
  • Observations
  • The first few (about 50) iterations increase the
    accuracy substantially, as seen by the steep
    decrease in the misclassification rate.
  • As the number of iterations increases, the
    training error decreases, and the generalization
    error decreases as well.

35
Can Boosting do well if ...?
  • Limited training data?
  • Probably not
  • Many missing values?
  • Noise in the data?
  • Individual classifiers are not very accurate?
  • It could, if the individual classifiers have
    considerable mutual disagreement

36
Application: Data Mining
  • Challenges in real-world data mining problems
  • Data has a large number of observations and a
    large number of variables per observation
  • Inputs are a mixture of different kinds of
    variables
  • Missing values, outliers, and variables with
    skewed distributions
  • Results need to be obtained fast, and they
    should be interpretable
  • So off-the-shelf techniques are difficult to
    come up with
  • Boosted decision trees (AdaBoost or MART) come
    close to an off-the-shelf technique for data
    mining

40
AT&T: "How may I help you?"