Title: MCS 2005 Round Table
- In the context of MCS, what do you believe to be
true, even if you cannot yet prove it?
Plan of Attack
- Three rough categories of response
- Performance Claims (8)
- Design and Data Principles (6)
- Predictions for the Field (3)
- Plan for each category
- Quickly present a hypothesis. Show of hands to see how many also believe that claim.
- Discuss the most contentious. Why the disparity? What experiments or evidence would help?
- Discuss the most supported. Why still unproved? What experiments or evidence would help?
- (Disclaimers: we re-wrote a couple of entries to make them into hypotheses. Categories are indeed rough, and we may have misrepresented your claim. I know we're in America, but please don't sue us.)
Performance Claims
Performance Claims/1
- Combiners will generally perform better than
dimensionality reduction
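A minimal sketch of the contrast behind this claim, assuming scikit-learn, a synthetic dataset, and particular base classifiers (all our illustrative choices, not the round table's): a soft-vote combiner over several classifiers on the full feature space versus a single classifier trained after dimensionality reduction.

```python
# Illustrative only: "combiner" vs. "dimensionality reduction" on the same data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=50, n_informative=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Approach 1: combine several classifiers trained on the full feature space.
combiner = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier(random_state=0)),
                ("nb", GaussianNB())],
    voting="soft").fit(Xtr, ytr)

# Approach 2: reduce dimensionality first, then train a single classifier.
reduced = make_pipeline(PCA(n_components=10),
                        LogisticRegression(max_iter=1000)).fit(Xtr, ytr)

print("combiner                :", combiner.score(Xte, yte))
print("PCA + single classifier :", reduced.score(Xte, yte))
```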
Performance Claims/2
- One can always find an ensemble of classifiers which is more accurate than a single classifier.
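A minimal sketch of the comparison this claim invites, assuming scikit-learn, a synthetic dataset, and bagging as the ensemble (our choices, not the round table's):

```python
# Illustrative only: a single classifier vs. a bagged ensemble of the same base learner.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

single = DecisionTreeClassifier(random_state=0)
ensemble = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                             n_estimators=50, random_state=0)

print("single tree :", cross_val_score(single, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(ensemble, X, y, cv=5).mean())
```

The claim says "always", which a single comparison like this cannot establish; it only shows the kind of evidence usually offered.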
Performance Claims/3
- Multiple types of classifiers (e.g. DT, NN, SVM) can be used in an ensemble/expert-systems approach to achieve lower overall error. The different approaches would hopefully make errors on different examples than any one approach would, due to their different biases.
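A minimal sketch of the idea, assuming scikit-learn and a synthetic dataset (our choices): a decision tree, a neural network, and an SVM combined by majority vote, so that their differently biased errors can cancel.

```python
# Illustrative only: a heterogeneous ensemble (DT, NN, SVM) combined by majority vote.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier

X, y = make_classification(n_samples=1500, n_features=20, random_state=1)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=1)

members = [("dt", DecisionTreeClassifier(random_state=1)),
           ("nn", MLPClassifier(max_iter=2000, random_state=1)),
           ("svm", SVC(kernel="rbf", random_state=1))]

ensemble = VotingClassifier(estimators=members, voting="hard").fit(Xtr, ytr)

for name, clf in members:                       # the members' error patterns differ
    print(name, clf.fit(Xtr, ytr).score(Xte, yte))
print("majority vote", ensemble.score(Xte, yte))
```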
Performance Claims/4
- In complex real applications, the combination of a small set of carefully designed and engineered classifiers can always outperform any coverage-optimization-based MCS, such as bagging and boosting.
Performance Claims/5
- Classifier selection can improve bagging
performance, that is, a small subset of bagged
classifiers can perform better than a large one.
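A minimal sketch of this kind of ensemble pruning, assuming scikit-learn, a synthetic dataset, and the simplest possible selection rule (rank bagged members by held-out accuracy and keep the best few); real selection methods are more careful.

```python
# Illustrative only: keep a small subset of a bagged ensemble and compare with the full one.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=2)
Xtr, Xsel, ytr, ysel = train_test_split(X, y, test_size=0.5, random_state=2)

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        random_state=2).fit(Xtr, ytr)

def vote_accuracy(members, X, y):
    preds = np.array([m.predict(X) for m in members])   # shape: (n_members, n_samples)
    majority = (preds.mean(axis=0) > 0.5).astype(int)   # majority vote for 0/1 labels
    return np.mean(majority == y)

# Rank members by individual held-out accuracy and keep only the ten best.
# (A sketch: selecting and evaluating on the same split flatters the pruned ensemble.)
ranked = np.argsort([m.score(Xsel, ysel) for m in bag.estimators_])[::-1]
best10 = [bag.estimators_[i] for i in ranked[:10]]

print("all 100 members:", vote_accuracy(bag.estimators_, Xsel, ysel))
print("best 10 members:", vote_accuracy(best10, Xsel, ysel))
```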
Performance Claims/6
- The output of an ensemble is more stable than that of a single classifier. Although there is quite a bit of literature on this, the notion of stability may be context-specific, so it may not be possible to make a general statement about it. If an ensemble is indeed more stable, and hence more trustworthy, that provides a good argument for designing an ensemble rather than a single well-trained classifier, even if the single classifier already provides good accuracy and the ensemble does not improve accuracy significantly.
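Since "stability" is the slippery term here, one concrete (and assumed, not the authors') reading is agreement of predictions when the training set is perturbed and the model retrained. A minimal sketch under that reading, using scikit-learn and synthetic data:

```python
# Illustrative only: stability = mean pairwise agreement of predictions across retrainings
# on bootstrap-perturbed training sets.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=1500, n_features=20, random_state=3)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=3)

def stability(model, n_repeats=10, seed=0):
    rng = np.random.RandomState(seed)
    preds = []
    for _ in range(n_repeats):
        idx = rng.randint(0, len(Xtr), len(Xtr))    # bootstrap-resample the training set
        preds.append(clone(model).fit(Xtr[idx], ytr[idx]).predict(Xte))
    preds = np.array(preds)
    agree = [np.mean(preds[i] == preds[j])          # pairwise agreement between retrainings
             for i in range(n_repeats) for j in range(i + 1, n_repeats)]
    return np.mean(agree)

print("single tree :", stability(DecisionTreeClassifier()))
print("bagged trees:", stability(BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)))
```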
Performance Claims/7
- Fixed combiners perform well compared to trainable combiners. But:
- Fixed combiners are based on a number of a priori assumptions.
- Trainable combiners should be able to learn real patterns in the outputs of base classifiers.
- Still, fixed combiners do surprisingly well. (Why?)
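A minimal sketch of the two combiner families being compared, assuming scikit-learn, synthetic data, and particular base classifiers (our choices): a fixed rule (averaging the members' posterior estimates) versus a trainable combiner (a logistic regression stacked on the members' outputs).

```python
# Illustrative only: fixed combiner (soft vote / mean rule) vs. trainable combiner (stacking).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier, StackingClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=4)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=4)

base = [("dt", DecisionTreeClassifier(random_state=4)),
        ("nb", GaussianNB()),
        ("lr", LogisticRegression(max_iter=1000))]

fixed = VotingClassifier(estimators=base, voting="soft")              # fixed rule: average
trained = StackingClassifier(estimators=base,                         # learned combination
                             final_estimator=LogisticRegression(max_iter=1000))

print("fixed combiner  :", fixed.fit(Xtr, ytr).score(Xte, yte))
print("trained combiner:", trained.fit(Xtr, ytr).score(Xte, yte))
```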
Performance Claims/8
- We have shown experimentally, but cannot yet prove theoretically, that ensemble systems can be used for incremental learning. The difficulty lies with the fact that the data distribution changes, particularly if new classes are introduced.
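A heavily simplified sketch of the Learn++-style idea behind this claim (scikit-learn, a synthetic "stream", and the one-member-per-batch rule are all our assumptions): grow the ensemble as batches arrive and combine by plurality vote, keeping earlier members as the distribution changes. The hard part the claim points to, newly introduced classes, is not handled here.

```python
# Illustrative only: incremental learning by adding one ensemble member per data batch.
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, n_features=20, n_classes=3,
                           n_informative=10, random_state=5)
batches = [(X[i:i + 1000], y[i:i + 1000]) for i in range(0, 3000, 1000)]

ensemble = []
for Xb, yb in batches:                               # data arrive batch by batch
    ensemble.append(DecisionTreeClassifier(random_state=5).fit(Xb, yb))

def predict(ensemble, X):
    votes = np.array([m.predict(X) for m in ensemble])        # (n_members, n_samples)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])

Xte, yte = batches[-1]
print("plurality-vote accuracy on the latest batch:", np.mean(predict(ensemble, Xte) == yte))
```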
Performance Claims
- Contentious
- Combiners > dim. reduction 9/10
- MCS more accurate 17/13
- Small engineered better 12/7
- Not contentious
- Small ensembles better 16/5
- Diverse MCS more accurate 17/3
- MCS more stable 30/3
- Fixed combiners > trained 0/large
- MCS good for incremental 24/0
Design and Data Principles
Design and Data Principles/1
- It is possible to map data complexity to the success of a particular multiple classifier system. That is, we can deduce the type of MCS (boosting, bagging, random forests, bites, etc.) to use based on a pre-analysis of the properties of the data.
Design and Data Principles/2
- The performance gain of ensembles can be quantified as a function of the classifiers' diversity, the single classifiers' base performance, etc.
- Bonus question: will we ever be able to quantify this, even with knowledge of the data?
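A minimal sketch of the quantities this hypothesis relates, assuming scikit-learn, synthetic data, a bagged ensemble, and pairwise disagreement as the diversity measure (all our choices): base accuracies, a diversity score, and the ensemble's gain over the mean base accuracy. The hypothesis is that the last of these could be predicted from the first two.

```python
# Illustrative only: measure base accuracy, diversity (pairwise disagreement), and ensemble gain.
import numpy as np
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=6)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=6)

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                        random_state=6).fit(Xtr, ytr)
preds = np.array([m.predict(Xte) for m in bag.estimators_])

base_acc = np.mean(preds == yte, axis=1)                      # accuracy of each member
diversity = np.mean([np.mean(preds[i] != preds[j])            # mean pairwise disagreement
                     for i, j in combinations(range(len(preds)), 2)])
gain = bag.score(Xte, yte) - base_acc.mean()

print(f"mean base accuracy {base_acc.mean():.3f}  diversity {diversity:.3f}  gain {gain:+.3f}")
```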
Design and Data Principles/3
- The optimal level of diversity for an ensemble
can be directly determined from the input data,
independent of the classification model used.
Design and Data Principles/4
- (Not just for MCS, but for pattern classification algorithms in general: the question of generalization / scaling.) Conjecture: the error probability of an algorithm goes up as the logarithm of the data size, with clever MCS.
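A minimal sketch of the kind of measurement that would test this conjecture, assuming scikit-learn, synthetic data, and a random forest standing in for "clever MCS" (our choices): record test error at several training-set sizes alongside log(n), so the conjectured relationship can be inspected or fitted.

```python
# Illustrative only: test error vs. log of training-set size for one MCS.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=20000, n_features=30, n_informative=15, random_state=7)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=5000, random_state=7)

for n in [250, 500, 1000, 2000, 4000, 8000, 15000]:
    clf = RandomForestClassifier(n_estimators=100, random_state=7).fit(Xtr[:n], ytr[:n])
    err = 1.0 - clf.score(Xte, yte)
    print(f"n={n:6d}  log(n)={np.log(n):5.2f}  test error={err:.3f}")
```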
Design and Data Principles/5
- Consider Wolpert's stacked generalizer. It is an
empirical observation, seen many times, that the
crispness (that is, the fraction of samples
that are classified with high confidence in one
of the classes) of the level 1 classifier is
invariably greater than that of the individual
level 0 classifiers. The accuracy of the level 1
classifier may or may not be greater than the
accuracy of the best level 0 classifier.
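A minimal sketch of how this observation could be checked, assuming scikit-learn's StackingClassifier as the stacked generalizer, synthetic data, and crispness defined as the fraction of test samples whose top posterior exceeds 0.9 (the threshold is our assumption):

```python
# Illustrative only: compare "crispness" and accuracy of level 0 classifiers and the level 1 stack.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier

X, y = make_classification(n_samples=3000, n_features=20, random_state=8)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=8)

level0 = [("dt", DecisionTreeClassifier(max_depth=5, random_state=8)),
          ("nb", GaussianNB()),
          ("lr", LogisticRegression(max_iter=1000))]
level1 = StackingClassifier(estimators=level0,
                            final_estimator=LogisticRegression(max_iter=1000)).fit(Xtr, ytr)

def crispness(clf, threshold=0.9):                  # fraction classified with high confidence
    return np.mean(clf.predict_proba(Xte).max(axis=1) >= threshold)

for name, clf in level0:
    clf.fit(Xtr, ytr)
    print(f"level 0 {name}:  crispness={crispness(clf):.3f}  accuracy={clf.score(Xte, yte):.3f}")
print(f"level 1 stack: crispness={crispness(level1):.3f}  accuracy={level1.score(Xte, yte):.3f}")
```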
Design and Data Principles/6
- Good ECOC codes are not random
- Random feature selection is good
- Both are linked.
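A minimal sketch of one way to probe the "random codes" side of this claim (the minimum pairwise Hamming distance between code rows as the quality criterion, and the matrix sizes, are our assumptions): draw random ECOC code matrices and see how much row separation they typically achieve, the property a deliberately designed code would guarantee.

```python
# Illustrative only: row separation (minimum pairwise Hamming distance) of random ECOC codes.
import numpy as np
from itertools import combinations

rng = np.random.RandomState(0)
n_classes, code_length, n_draws = 8, 15, 1000

def min_row_distance(code):
    return min(np.sum(code[i] != code[j]) for i, j in combinations(range(len(code)), 2))

dists = np.array([min_row_distance(rng.randint(0, 2, size=(n_classes, code_length)))
                  for _ in range(n_draws)])
print("largest minimum row distance seen  :", dists.max())
print("median minimum row distance        :", int(np.median(dists)))
print("fraction of draws with distance < 4:", np.mean(dists < 4))
```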
Design and Data Principles
- Use complexity to pick MCS 13/5
- Predict perf. from base prop. 12/13
- Predict best diversity from data 0/lots
- Error is logarithm of data size 1/6
- Crisp classifier not the accurate one 1/0
- Good ECOC is not random 0/6
Predictions for the Field
Predictions for the Field/1
- There is still substantial room for improvement
in Machine Learning for supervised learning
problems.
Predictions for the Field/2
- The time will come when any pattern classification system will be designed as an MCS, as people will realize that MCS is the best solution even for tasks where MCS cannot outperform single classifiers.
Predictions for the Field/3
- A unifying theory of MCS is less than five years away.
- Bonus questions: What is a unifying theory? How will we know if and when we get it?
Predictions for the Field
- Substantial room for improvement 11/1?
- All PR systems will be MCS 6/5
- GUT in < 5 years 1/17