Title: A speech about Boosting
1. A speech about Boosting
- Presenter: Roberto Valenti
2. The Paper
R. Schapire. The Boosting Approach to Machine Learning: An Overview, 2001
3. I want YOU
TO UNDERSTAND
4. Overview
- Introduction
- Adaboost
- How does it work?
- Why does it work?
- Demo
- Extensions
- Performance and Applications
- Summary and Conclusions
- Questions
5. Introduction to Boosting
6. Introduction
- An example of Machine Learning: a spam classifier
- A highly accurate rule is difficult to find
- An inaccurate rule is easy, e.g. flag messages containing "BUY NOW"
- Introducing Boosting
- An effective method of producing an accurate prediction rule from many inaccurate rules
7. Introduction
- History of boosting
- 1989, Schapire
- First provably polynomial-time boosting algorithm
- 1990, Freund
- Much more efficient, but with practical drawbacks
- 1995, Freund and Schapire
- Adaboost, the focus of this presentation
8. Introduction
- The Boosting Approach
- Lots of weak classifiers → one strong classifier
- Boosting key points
- Give more importance to misclassified data
- Find a way to combine the weak classifiers into a general rule (see the weighted vote below)
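Both key points can be written compactly: the importance of misclassified data is expressed through a distribution over the training examples (detailed on the next slides), and the general rule is a weighted majority vote of the weak classifiers h_1, ..., h_T with weights α_t chosen by the booster:

\[
H(x) \;=\; \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t\, h_t(x)\right)
\]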
9. Adaboost
10. Adaboost: How does it work?
11. Adaboost: How does it work?
- Base learner's job
- Find a base hypothesis h_t
- Minimize the weighted error ε_t = Pr_{i∼D_t}[h_t(x_i) ≠ y_i]
- Choose α_t = (1/2) ln((1 − ε_t) / ε_t)
- Re-weight the data: D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t (see the sketch below)
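A minimal sketch of this loop in Python, assuming scikit-learn decision stumps as the base learner; the function names and the default number of rounds are illustrative:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Adaboost with depth-1 trees (stumps) as weak learners; y must be in {-1, +1}."""
    X, y = np.asarray(X), np.asarray(y)
    n = len(y)
    D = np.full(n, 1.0 / n)                    # D_1: uniform distribution over examples
    stumps, alphas = [], []
    for t in range(n_rounds):
        h = DecisionTreeClassifier(max_depth=1)
        h.fit(X, y, sample_weight=D)           # base learner minimizes the weighted error
        pred = h.predict(X)
        eps = np.clip(D[pred != y].sum(), 1e-10, 1 - 1e-10)   # eps_t
        alpha = 0.5 * np.log((1 - eps) / eps)  # alpha_t = (1/2) ln((1 - eps_t) / eps_t)
        D *= np.exp(-alpha * y * pred)         # up-weight misclassified examples
        D /= D.sum()                           # normalize by Z_t
        stumps.append(h)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Combined hypothesis H(x): sign of the weighted vote."""
    F = sum(a * h.predict(np.asarray(X)) for h, a in zip(stumps, alphas))
    return np.sign(F)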
12. Adaboost: How does it work?
13. Adaboost
14. Adaboost: Why does it work?
- Basic property: it reduces the training error
- On binary problems, the weighted error of round t can be written as ε_t = 1/2 − γ_t
- Training error of H is bounded by ∏_t 2√(ε_t(1 − ε_t)) (full chain below)
- This is at most e^(−2Tγ²) if every γ_t ≥ γ → drops exponentially!
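For reference, the full chain behind that bound as given in the paper (the last steps use 1 + x ≤ e^x and a uniform edge γ_t ≥ γ):

\[
\widehat{\Pr}\big[H(x)\neq y\big]
\;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)}
\;=\; \prod_{t=1}^{T} \sqrt{1-4\gamma_t^2}
\;\le\; \exp\!\Big(-2\sum_{t=1}^{T}\gamma_t^2\Big)
\;\le\; e^{-2T\gamma^2}.
\]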
15. Adaboost: Why does it work?
- Generalization error is bounded by the empirical error plus a complexity term (written out below), where
- T: number of iterations
- m: sample size
- d: Vapnik-Chervonenkis dimension of the base classifier space
- P̂r[.]: empirical probability
- Õ(.): hides logarithmic and constant factors
- Predicts overfitting in T!
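The bound referred to above, written out (true error on the left, empirical error on the right):

\[
\Pr\big[H(x)\neq y\big] \;\le\; \widehat{\Pr}\big[H(x)\neq y\big] \;+\; \tilde{O}\!\left(\sqrt{\frac{T\,d}{m}}\right)
\]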
16. Adaboost: Why does it work?
- Margins of the training examples
- margin(x, y): the normalized weighted vote for the correct label (written out below)
- Positive only if x is correctly classified by H
- Its magnitude: confidence in the prediction
- A qualitative explanation of effectiveness
- Not a quantitative one.
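The margin and the corresponding bound from the paper: for any θ > 0,

\[
\mathrm{margin}(x,y) \;=\; \frac{y\sum_{t}\alpha_t h_t(x)}{\sum_{t}\alpha_t} \;\in\; [-1,1],
\qquad
\Pr\big[H(x)\neq y\big] \;\le\; \widehat{\Pr}\big[\mathrm{margin}(x,y)\le\theta\big] \;+\; \tilde{O}\!\left(\sqrt{\frac{d}{m\,\theta^2}}\right).
\]

This bound no longer depends on T, which is the qualitative explanation of why test error can keep dropping even after the training error reaches zero.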
17. Adaboost: Other View
- Adaboost as a zero-sum game
- Game matrix M (one possible form below)
- Row player: Adaboost
- Column player: the base learner
- Row player plays rows with distribution P
- Column player plays columns with distribution Q
- Expected loss: PᵀMQ
- Play the matrix game repeatedly
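One standard choice of M in Freund and Schapire's game formulation of boosting (an assumption here, since the slide only names M) has one row per training example and one column per base hypothesis:

\[
M(i,j) \;=\; \begin{cases} 1 & \text{if } h_j(x_i) = y_i,\\ 0 & \text{otherwise,} \end{cases}
\]

so PᵀMQ is the probability that a base hypothesis drawn from Q correctly classifies an example drawn from P.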
18. Adaboost: Other View
- Von Neumann's minimax theorem
- If, for every distribution, there exists a base classifier with error ε < 1/2 − γ
- Then there exists a combination of base classifiers with margin > 2γ
- Adaboost has the potential to succeed
- Relations with Linear Programming and Online Learning
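A compact way to see this claim, under the game matrix assumed above: von Neumann's minimax theorem states

\[
\min_{P}\max_{Q} P^{\mathsf T} M Q \;=\; \max_{Q}\min_{P} P^{\mathsf T} M Q .
\]

The weak-learning assumption makes the left side at least 1/2 + γ (whatever distribution P is placed on the examples, some base classifier is correct with probability at least 1/2 + γ). The right side then guarantees one fixed mixture Q of base classifiers whose correct-vote weight is at least 1/2 + γ on every single example; its margin there is at least (1/2 + γ) − (1/2 − γ) = 2γ.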
19. Adaboost
20. Demo
21. Adaboost
22. Adaboost - Extensions
- History of Boosting
- 1997, Freund and Schapire
- Adaboost.M1
- First multiclass generalization
- Fails if the weak learner achieves less than 50% accuracy
- Adaboost.M2
- Creates a set of binary problems
- For x, is label l1 better or label l2?
- 1999, Schapire and Singer
- Adaboost.MH
- For x, is label l1 better or one of the others?
23. Adaboost - Extensions
- 2001, Rochery, Schapire et al.
- Incorporating human knowledge
- Adaboost is data-driven
- Human knowledge can compensate for a lack of data
- A human expert
- Chooses a rule p mapping each x to p(x) ∈ [0, 1]
- Difficult!
- Simple rules should work...
24. Adaboost - Extensions
- To incorporate human knowledge, a relative-entropy penalty between the expert's rule p(x) and the model's predicted probability is added to the objective (sketch below)
- Where
- RE(p ‖ q) = p ln(p/q) + (1 − p) ln((1 − p)/(1 − q))
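A small Python sketch of that term (the function name and clipping constant are illustrative, not from the paper); in the extension it measures how far the model's predicted probability q strays from the expert's rule p:

import numpy as np

def relative_entropy(p, q, eps=1e-12):
    """Binary relative entropy RE(p || q) = p ln(p/q) + (1 - p) ln((1 - p)/(1 - q))."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    q = np.clip(np.asarray(q, dtype=float), eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

# The penalty is zero when the predicted probability agrees with the expert's
# rule, and grows as the two diverge.
print(relative_entropy(0.9, 0.9))   # ~0.0
print(relative_entropy(0.9, 0.5))   # > 0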
25. Adaboost
- Performance and Applications
26. Adaboost - Performance and Applications
[Figure: error rates on text categorization, for Reuters newswire articles and AP newswire headlines]
27. Adaboost - Performance and Applications
[Figure: six-class text classification (TREC), showing training error and test error]
28. Adaboost - Performance and Applications
[Figure: spoken-language classification results on the "How may I help you" and "Help desk" tasks]
29. Adaboost - Performance and Applications
[Figure: OCR outliers after rounds 4, 12, and 25; each example annotated as class, label1/weight1, label2/weight2]
30. Adaboost - Applications
- Text filtering
- Schapire, Singer, Singhal. Boosting and Rocchio applied to text filtering. 1998
- Routing
- Iyer, Lewis, Schapire, Singer, Singhal. Boosting for document routing. 2000
- Ranking problems
- Freund, Iyer, Schapire, Singer. An efficient boosting algorithm for combining preferences. 1998
- Image retrieval
- Tieu, Viola. Boosting image retrieval. 2000
- Medical diagnosis
- Merler, Furlanello, Larcher, Sboner. Tuning cost-sensitive boosting and its application to melanoma diagnosis. 2001
31. Adaboost - Applications
- Learning problems in natural language processing
- Abney, Schapire, Singer. Boosting applied to tagging and PP attachment. 1999
- Collins. Discriminative reranking for natural language parsing. 2000
- Escudero, Marquez, Rigau. Boosting applied to word sense disambiguation. 2000
- Haruno, Shirai, Ooyama. Using decision trees to construct a practical parser. 1999
- Moreno, Logan, Raj. A boosting approach for confidence scoring. 2001
- Walker, Rambow, Rogati. SPoT: A trainable sentence planner. 2001
32. Summary and Conclusions
33. Summary
- Boosting takes a weak learner and converts it into a strong one
- Works by asymptotically minimizing the training error
- Effectively maximizes the margin of the combined hypothesis
- Adaboost is related to many other topics
- It works!
34. Conclusions
- Adaboost advantages
- Fast, simple, and easy to program
- No parameters to tune (apart from the number of rounds T)
- Performance depends on
- (Skurichina, 2001) Boosting is only useful for large sample sizes
- The choice of weak classifier
- The incorporation of the classifier weights
- The data distribution
35. Questions
(don't be mean)