Title: Part II: Practical Implementations.
Slide 1: Part II: Practical Implementations
Slide 2: Modeling the Classes
Stochastic Discrimination
Slide 3: Algorithm for Training an SD Classifier
- Generate a projectable weak model
- Evaluate the model w.r.t. the training set; check enrichment
- Check uniformity w.r.t. the existing collection
- Add the model to the discriminant (see the sketch below)
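A minimal sketch of this loop in Python follows. It is an illustration under assumptions, not the tutorial's own code: weak models are taken to be random axis-aligned boxes in a 2D feature space, and the names (train_sd, min_enrichment) and thresholds are invented for the example.

import numpy as np

rng = np.random.default_rng(0)

def random_box(lo=0.0, hi=20.0):
    # Step 1: generate a projectable weak model (a random box keeps
    # neighboring points together, unlike an arbitrary point set).
    x0, x1 = np.sort(rng.uniform(lo, hi, 2))
    y0, y1 = np.sort(rng.uniform(lo, hi, 2))
    return x0, x1, y0, y1

def covers(model, pts):
    x0, x1, y0, y1 = model
    return ((pts[:, 0] >= x0) & (pts[:, 0] <= x1) &
            (pts[:, 1] >= y0) & (pts[:, 1] <= y1))

def train_sd(X1, X2, n_models=500, min_enrichment=0.05):
    collection = []
    counts = np.zeros(len(X1))          # class-1 coverage so far
    while len(collection) < n_models:
        m = random_box()
        in1, in2 = covers(m, X1), covers(m, X2)
        # Step 2: enrichment check against the training set.
        if in1.mean() - in2.mean() < min_enrichment:
            continue
        # Step 3: uniformity check against the existing collection:
        # the model must reach at least one under-covered point.
        if collection and not in1[counts <= counts.mean()].any():
            continue
        # Step 4: add the model to the discriminant.
        counts += in1
        collection.append(m)
    return collection

# Example use with two synthetic Gaussian classes.
models = train_sd(rng.normal(6, 2, (100, 2)), rng.normal(14, 2, (100, 2)))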
Slide 4: Dealing with Data Geometry (SD in Practice)
Slide 5: 2D Example
- Adapted from Kleinberg, PAMI, May 2000
Slide 6: An r = 1/2 random subset of the feature space, i.e., one that covers half of all the points
Slide 7: Watch how many such subsets cover a particular point, say, (2,17)
Slides 8-9: [Figure: a sequence of r = 1/2 random subsets, each marked "In" or "Out" for the tracked point, with the running fraction of covering models after each draw, e.g. 1/2 = 0.5, 2/3 ≈ 0.67, 3/4 = 0.75, 4/5 = 0.8, 5/6 ≈ 0.83, ..., 8/12 ≈ 0.67]
Slide 10: Fraction of r = 1/2 random subsets covering the point (2,17) as more such subsets are generated
Slide 11: Fractions of r = 1/2 random subsets covering several selected points as more such subsets are generated
Slide 12: Distribution of model coverage for all points in space, with 100 models
Slide 13: Distribution of model coverage for all points in space, with 200 models
Slide 14: Distribution of model coverage for all points in space, with 300 models
Slide 15: Distribution of model coverage for all points in space, with 400 models
Slide 16: Distribution of model coverage for all points in space, with 500 models
Slide 17: Distribution of model coverage for all points in space, with 1000 models
Slide 18: Distribution of model coverage for all points in space, with 2000 models
Slide 19: Distribution of model coverage for all points in space, with 5000 models
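The behavior plotted on these slides is easy to reproduce. The following simulation sketch is my own construction (the slides only show the plots): the space is a 20x20 grid, each weak model is a random subset covering exactly half of the points, and we print the coverage fraction of (2,17) together with its spread over all points.

import numpy as np

rng = np.random.default_rng(1)
pts = [(x, y) for x in range(20) for y in range(20)]
n = len(pts)
target = pts.index((2, 17))

counts = np.zeros(n)
for t in range(1, 5001):
    member = rng.permutation(n) < n // 2   # an r = 1/2 random subset
    counts += member
    if t in (100, 500, 1000, 5000):
        frac = counts / t
        print(f"{t:5d} models: coverage of (2,17) = {frac[target]:.3f}, "
              f"std over all points = {frac.std():.3f}")
# The coverage fraction approaches 0.5 and the spread shrinks: every
# point ends up in about half of the models as more models are drawn.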
Slide 20: Introducing Enrichment
- For any discrimination to happen, the models must have some difference in coverage for the different classes.
Slide 21: Enforcing enrichment (adding in a bias) requires each subset to cover more points of one class than of the other (see the sketch below)
[Figure: the class distribution and a biased (enriched) weak model]
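Here is a toy demonstration of the biasing step, assuming Gaussian classes and half-plane weak models (both my assumptions, chosen for brevity): candidate subsets are simply rejected until they cover a larger fraction of class 1 than of class 2.

import numpy as np

rng = np.random.default_rng(2)
X1 = rng.normal(5.0, 2.0, (200, 2))     # class 1 training points
X2 = rng.normal(12.0, 2.0, (200, 2))    # class 2 training points

def enriched_model():
    # Rejection sampling: keep only half-planes biased toward class 1.
    while True:
        w, b = rng.normal(size=2), rng.uniform(-20.0, 20.0)
        if (X1 @ w + b > 0).mean() > (X2 @ w + b > 0).mean():
            return w, b

models = [enriched_model() for _ in range(500)]
X = np.vstack([X1, X2])
# Y = fraction of enriched models covering each point.
Y = np.mean([(X @ w + b > 0) for w, b in models], axis=0)
print("mean Y, class 1:", Y[:200].mean())   # systematically higher
print("mean Y, class 2:", Y[200:].mean())   # systematically lower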
Slide 22: Distribution of model coverage for points in each class, with 100 enriched weak models
Slide 23: Distribution of model coverage for points in each class, with 200 enriched weak models
Slide 24: Distribution of model coverage for points in each class, with 300 enriched weak models
Slide 25: Distribution of model coverage for points in each class, with 400 enriched weak models
Slide 26: Distribution of model coverage for points in each class, with 500 enriched weak models
Slide 27: Distribution of model coverage for points in each class, with 1000 enriched weak models
Slide 28: Distribution of model coverage for points in each class, with 2000 enriched weak models
Slide 29: Distribution of model coverage for points in each class, with 5000 enriched weak models
Slide 30: Error rate decreases as the number of models increases
- Decision rule: if Y < 0.5 then class 2, else class 1 (see the sketch below)
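Continuing the enrichment sketch above (it reuses the coverage fractions Y computed there), the slide's decision rule is a single comparison:

import numpy as np

# Assign class 2 where Y < 0.5, class 1 otherwise, then measure the
# training error (labels follow the stacking order of X above).
labels = np.array([1] * 200 + [2] * 200)
predicted = np.where(Y < 0.5, 2, 1)
print("training error:", (predicted != labels).mean())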
Slide 31: Sparse Training Data
- Incomplete knowledge about class distributions
[Figure: training set vs. test set]
Slide 32: Distribution of model coverage for points in each class, with 100 enriched weak models (training and test sets)
Slide 33: Distribution of model coverage for points in each class, with 200 enriched weak models (training and test sets)
Slide 34: Distribution of model coverage for points in each class, with 300 enriched weak models (training and test sets)
Slide 35: Distribution of model coverage for points in each class, with 400 enriched weak models (training and test sets)
Slide 36: Distribution of model coverage for points in each class, with 500 enriched weak models (training and test sets)
Slide 37: Distribution of model coverage for points in each class, with 1000 enriched weak models (training and test sets)
Slide 38: Distribution of model coverage for points in each class, with 2000 enriched weak models (training and test sets)
Slide 39: Distribution of model coverage for points in each class, with 5000 enriched weak models (training and test sets). No discrimination on the test set!
Slide 40: Models of this type, when enriched for the training set, are not necessarily enriched for the test set
[Figure: training set vs. test set for a random model with 50% coverage of the space]
Slide 41: Introducing Projectability
- Maintain local continuity of class interpretations.
- Neighboring points of the same class should share similar model coverage.
Slide 42: Allow some local continuity in model membership, so that the interpretation of a training point can generalize to its immediate neighborhood (see the sketch below)
[Figure: the class distribution and a projectable model]
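To make the effect concrete, this small experiment (my own construction, not from the slides) contrasts a non-projectable model, which assigns membership to each grid point independently, with a projectable box-shaped model, by measuring how often a point and its neighbor receive the same membership:

import numpy as np

rng = np.random.default_rng(3)
# 20x20 grid, ordered so index i and i+20 are horizontal neighbors.
grid = np.array([(x, y) for x in range(20) for y in range(20)], float)

def pointwise_model():
    # Membership decided per point: no local continuity at all.
    return rng.random(len(grid)) < 0.5

def box_model():
    # Membership decided by a region: neighbors usually agree.
    x0, x1 = np.sort(rng.uniform(0, 20, 2))
    y0, y1 = np.sort(rng.uniform(0, 20, 2))
    return ((grid[:, 0] >= x0) & (grid[:, 0] <= x1) &
            (grid[:, 1] >= y0) & (grid[:, 1] <= y1))

def neighbor_agreement(make_model, trials=200):
    return np.mean([np.mean(m[:-20] == m[20:])
                    for m in (make_model() for _ in range(trials))])

print("point-set models:", neighbor_agreement(pointwise_model))  # about 0.5
print("box models:      ", neighbor_agreement(box_model))        # near 1.0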
Slide 43: Distribution of model coverage for points in each class, with 100 enriched, projectable weak models (training and test sets)
Slide 44: Distribution of model coverage for points in each class, with 300 enriched, projectable weak models (training and test sets)
Slide 45: Distribution of model coverage for points in each class, with 400 enriched, projectable weak models (training and test sets)
Slide 46: Distribution of model coverage for points in each class, with 500 enriched, projectable weak models (training and test sets)
Slide 47: Distribution of model coverage for points in each class, with 1000 enriched, projectable weak models (training and test sets)
Slide 48: Distribution of model coverage for points in each class, with 2000 enriched, projectable weak models (training and test sets)
Slide 49: Distribution of model coverage for points in each class, with 5000 enriched, projectable weak models (training and test sets)
Slide 50: Promoting Uniformity
- All points in the same class should have an equal likelihood of being covered by a model of each particular rating.
- Retain models that cover the points that are covered less by the current collection (see the sketch below).
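One simple acceptance rule that implements this idea is sketched below. It is an illustrative heuristic, not the exact algorithm from the tutorial: a candidate model is kept only if the points it covers are, on average, covered less often by the current collection than the class as a whole.

import numpy as np

rng = np.random.default_rng(4)
X1 = rng.uniform(0, 20, (300, 2))   # class-1 training points
counts = np.zeros(len(X1))          # coverage so far, per point
collection = []

while len(collection) < 1000:
    x0, x1 = np.sort(rng.uniform(0, 20, 2))
    y0, y1 = np.sort(rng.uniform(0, 20, 2))
    inside = ((X1[:, 0] >= x0) & (X1[:, 0] <= x1) &
              (X1[:, 1] >= y0) & (X1[:, 1] <= y1))
    if not inside.any():
        continue
    # Keep the model only if it targets under-covered points.
    if counts[inside].mean() <= counts.mean():
        counts += inside
        collection.append((x0, x1, y0, y1))

# The relative spread of per-point coverage stays small: all class-1
# points end up covered by roughly the same fraction of models.
print("coverage spread (std / mean):", counts.std() / counts.mean())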
Slide 51: Distribution of model coverage for points in each class, with 100 enriched, projectable, uniform weak models (training and test sets)
Slide 52: Distribution of model coverage for points in each class, with 1000 enriched, projectable, uniform weak models (training and test sets)
Slide 53: Distribution of model coverage for points in each class, with 5000 enriched, projectable, uniform weak models (training and test sets)
Slide 54: Distribution of model coverage for points in each class, with 10000 enriched, projectable, uniform weak models (training and test sets)
Slide 55: Distribution of model coverage for points in each class, with 50000 enriched, projectable, uniform weak models (training and test sets)
Slide 56: The 3 Necessary Conditions
- Enrichment → Discriminating Power
- Uniformity → Complementary Information
- Projectability → Generalization Power
Slide 57: Extensions and Comparisons
Slide 58: Alternative Discriminants
- Berlind, 1994
- Different discriminants for N-class problems
- Additional condition on symmetry
- Approximate uniformity
- Hierarchy of indiscernibility
Slide 59: Estimates of Classification Accuracies
- Chen, 1997
- Statistical estimates of classification accuracy under weaker conditions:
- Approximate uniformity
- Approximate indiscernibility
Slide 60: Multi-class Problems
- For n classes, define n discriminants Y_i, one for each class i vs. the others
- Classify an unknown point to the class i for which the computed Y_i is the largest (see the sketch below)
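As a sketch, the one-vs-rest rule on this slide reduces to an argmax over per-class coverage fractions. The names here (classify, covers) are placeholders of mine; models[i] is assumed to hold the weak models enriched for class i.

import numpy as np

def classify(q, models, covers):
    # models: one list of weak models per class, each enriched for
    # "class i vs. the rest"; covers(m, q) tests whether m contains q.
    Y = [np.mean([covers(m, q) for m in ms]) for ms in models]
    return int(np.argmax(Y))   # the class i with the largest Y_i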
Slide 61: Ho & Kleinberg, ICPR 1996
Slide 65: Open Problems
- Algorithms for uniformity enforcement
- Deterministic methods?
- Desirable form of weak models
- Fewer, more sophisticated classifiers?
- Other ways to address the 3-way trade-off
- Enrichment / Uniformity / Projectability
Slide 66: Random Decision Forest
- Ho, 1995, 1998
- A structured way to create models: fully split a tree, use the leaves as models
- Perfect enrichment and uniformity on the training set
- Promote projectability by subspace projection (see the sketch below)
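A compact re-implementation of the idea, under my own simplifications (scikit-learn's stock tree learner as the base model, binary 0/1 labels, plurality vote): each tree is fully split on a random subspace of the features.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

class RandomSubspaceForest:
    def __init__(self, n_trees=100, subspace_dim=None, seed=0):
        self.n_trees = n_trees
        self.subspace_dim = subspace_dim
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        d = self.subspace_dim or max(1, X.shape[1] // 2)
        self.trees = []
        for _ in range(self.n_trees):
            feats = self.rng.choice(X.shape[1], d, replace=False)
            # A fully split tree: its leaves are pure training regions,
            # i.e., perfectly enriched and uniform on the training set;
            # the random subspace projection promotes projectability.
            tree = DecisionTreeClassifier().fit(X[:, feats], y)
            self.trees.append((feats, tree))
        return self

    def predict(self, X):
        votes = np.array([t.predict(X[:, f]) for f, t in self.trees])
        # Plurality vote, assuming 0/1 labels for brevity.
        return (votes.mean(axis=0) > 0.5).astype(int)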
Slide 67: Compact Distribution Maps
- Ho & Baird, 1993, 1997
- Another structured way to create models
- Start with projectable models by coarse quantization of the feature value range
- Seek enrichment and uniformity
Slide 68: SD & Other Ensemble Methods
- Ensemble learning via boosting: a sequential way to promote uniformity of ensemble element coverage
- XCS (a genetic algorithm): a way to create, filter, and use stochastic models that are regions in feature space
Slide 69: XCS Classifier System
- Wilson, 1995
- Recent focus of the GA community
- Good performance
- Reinforcement Learning + Genetic Algorithms
- Model: a set of rules, e.g.
if (shape = square and number > 10) then class = red
if (shape = circle and number < 5) then class = yellow
[Diagram: input → set of rules → class; the rule set is updated by reinforcement learning from the environment's reward and searched by genetic algorithms]
Slide 70: Multiple Classifier Systems: Examples in Word Image Recognition
Slide 71: Complementary Strengths of Classifiers
[Table: rank of the true class out of a lexicon of 1091 words, by 10 classifiers, for 20 word images]
- The case for classifier combination:
- decision fusion
- mixture of experts
- committee decision making
Slide 72: Classifier Combination Methods
- Decision Optimization: find consensus among a given set of classifiers
- Coverage Optimization: create a set of classifiers that work best with a given decision combination function
Slide 73: Decision Optimization
- Develop classifiers with expert knowledge
- Try to make the best use of their decisions via majority/plurality vote, sum/product rule, probabilistic methods, Bayesian methods, or rank/confidence score combination (see the sketch below)
- The joint capability of the classifiers sets an intrinsic limit on the combined accuracy
- There is no way to handle the blind spots
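Two of the fixed combination functions named above, sketched for classifiers that each output a score per class (the input layout is my assumption):

import numpy as np

def sum_rule(scores):
    # scores: (n_classifiers, n_classes); add the scores, take argmax.
    return int(np.argmax(scores.sum(axis=0)))

def plurality_vote(scores):
    # Each classifier votes for its top class; the most-voted class wins.
    votes = np.argmax(scores, axis=1)
    return int(np.bincount(votes).argmax())

scores = np.array([[0.6, 0.4],
                   [0.2, 0.8],
                   [0.8, 0.2]])
print(sum_rule(scores), plurality_vote(scores))   # both pick class 0 here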
Slide 74: Difficulties in Decision Optimization
- Reliability versus overall accuracy
- Fixed or trainable combination function?
- Simple models or combinatorial estimates?
- How to model complementary behavior
Slide 75: Coverage Optimization
- Fix a decision combination function
- Generate classifiers automatically and systematically via:
- training-set sub-sampling (stacking, bagging, boosting; see the sketch below)
- subspace projection (RSM)
- superclass/subclass decomposition (ECOC)
- random perturbation of training processes, noise injection
- Need enough classifiers to cover all blind spots (how many are enough?)
- What else is critical?
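For concreteness, here is a minimal bagging sketch under a fixed plurality vote, with scikit-learn's decision tree standing in for an arbitrary base learner (my choice of learner and of 0/1 labels, not the slide's):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging(X, y, n_classifiers=50, seed=0):
    # Coverage optimization by training-set sub-sampling: each classifier
    # sees a different bootstrap sample of the same training set.
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_classifiers):
        idx = rng.integers(0, len(X), len(X))
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble

def vote(ensemble, X):
    # The fixed decision combination function: plurality vote
    # (0/1 labels assumed for brevity).
    preds = np.array([c.predict(X) for c in ensemble])
    return (preds.mean(axis=0) > 0.5).astype(int)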
Slide 76: Difficulties in Coverage Optimization
- What kind of differences to introduce:
- Subsamples? Subspaces? Super/subclasses?
- Training parameters?
- Model geometry?
- The 3-way trade-off: discrimination / diversity / generalization
- Effects of the form of the component classifiers
Slide 77: Dilemmas and Paradoxes in Classifier Combination
- Weaken individuals for a stronger whole?
- Sacrifice known samples for unseen cases?
- Seek agreements or differences?
Slide 78: Stochastic Discrimination
- A mathematical theory that relates several key concepts in pattern recognition:
- Discriminative power ↔ enrichment
- Complementary information ↔ uniformity
- Generalization power ↔ projectability
- It offers a way to describe the complementary behavior of classifiers
- It offers guidelines for designing multiple classifier systems (classifier ensembles)