Title: Lecture 6 Classifiers and Pattern Recognition Systems
1 Lecture 6: Classifiers and Pattern Recognition Systems
2 Outline
- Comparison of Classifiers
- Error Estimation for Classifiers
- ROC (Receiver Operating Characteristic)
- Bayes framework for image classification: an application case study
5 Error Estimation of Classifiers
- Use independent data sets for training and testing.
- Both data sets must be large enough for the complexity of the problem; hundreds or thousands of samples are typically needed.
- When there are not enough samples, use one of the following methods (a cross-validation sketch follows this list):
  - Re-substitution
  - Holdout
  - Leave-one-out
  - N-fold cross-validation
  - Bootstrap
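As a minimal illustration of N-fold cross-validation (a sketch; the scikit-learn API, the k-NN classifier, and the synthetic data are assumptions, not from the slides):

```python
# Minimal sketch of N-fold cross-validation for error estimation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

n_folds = 5
kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
errors = []
for train_idx, test_idx in kf.split(X):
    clf = KNeighborsClassifier(n_neighbors=3)
    clf.fit(X[train_idx], y[train_idx])        # train on N-1 folds
    acc = clf.score(X[test_idx], y[test_idx])  # test on the held-out fold
    errors.append(1.0 - acc)

# The error estimate averages the per-fold error rates.
print(f"{n_folds}-fold error estimate: {np.mean(errors):.3f}")
```

Holdout and leave-one-out follow the same pattern: holdout uses a single split, and leave-one-out is the special case where N equals the number of samples.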
7 ROC (Receiver Operating Characteristic)
- The ROC curve plots the proportion of correct responses (hits) against the false positives as the decision boundary changes.
- The ROC curve gives information that is independent of the observer's loss function.
- The ROC presents a fuller picture than a single error rate:
  - A set of error rates under different confidence levels
  - Users can select proper threshold values according to their needs
8 - For class ω2
  - Hit rate: correct classification of ω2
  - Miss rate: missed detection of ω2
- For class ω1
  - False alarm rate: falsely accepted as ω2
  - Rejection rate: correct rejection of ω1 (a threshold-sweep sketch follows)
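A minimal sketch of how these rates trace out an ROC curve as the decision threshold sweeps over the classifier scores (the Gaussian score distributions and all variable names are assumptions, not from the slides):

```python
# Sketch: trace an ROC curve by sweeping a threshold over class scores.
import numpy as np

rng = np.random.default_rng(0)
scores_w1 = rng.normal(0.0, 1.0, 500)  # scores for class w1 (reject class)
scores_w2 = rng.normal(1.5, 1.0, 500)  # scores for class w2 (target class)

for t in np.linspace(-3.0, 5.0, 9):
    hit_rate = np.mean(scores_w2 >= t)     # w2 correctly classified
    false_alarm = np.mean(scores_w1 >= t)  # w1 falsely accepted as w2
    # miss rate = 1 - hit_rate; rejection rate = 1 - false_alarm
    print(f"t={t:+.1f}  hit={hit_rate:.2f}  false_alarm={false_alarm:.2f}")
```

Plotting hit rate against false-alarm rate over all thresholds yields the ROC curve.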
9 ROC and rejection rate
- Rejection is used to reduce recognition error rates when samples are close to the decision boundary.
- A higher rejection rate improves the recognition rate, as shown in Fig. 3-13; a reject-option sketch follows.
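A minimal sketch of the reject option: classify only when the top posterior exceeds a confidence threshold (the posterior values and the 0.8 threshold are assumptions for illustration):

```python
# Sketch: reject samples whose maximum posterior falls below a threshold.
import numpy as np

def classify_with_reject(posteriors, threshold=0.8):
    """posteriors: (n_samples, n_classes) array; returns the class index,
    or -1 for samples that are rejected as too close to the boundary."""
    best = np.argmax(posteriors, axis=1)
    confidence = np.max(posteriors, axis=1)
    return np.where(confidence >= threshold, best, -1)

# The middle sample lies near the decision boundary and is rejected.
p = np.array([[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]])
print(classify_with_reject(p))  # -> [ 0 -1  1]
```

Raising the threshold rejects more borderline samples, which lowers the error rate on the samples that are actually classified.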
10 Classification Error vs. Number of Training Samples per Class
11 Face recognition performance comparison: computer vs. human (Andy Adler et al., 2006)
12 Advances in Pattern Recognition
- Multimodality in features and classifiers
  - Image, audio, text
  - Different modalities of medical images, such as CT, MRI, X-ray, ultrasound, etc.
- Combination of classifiers
- Open databases for performance benchmarking of pattern detection and classification in specific application domains such as face, fingerprint, handwritten characters, object recognition, content-based retrieval, etc.
13 Neural networks
- Based on massively parallel simple processing units
- Perform nonlinear mapping from input features to output category nodes
- Black box: hard to interpret after learning
- Good performance if training samples are large enough
- Matlab NN toolbox (a Python sketch follows)
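As a Python analogue of the Matlab NN toolbox workflow, a minimal sketch of training a small multilayer perceptron (scikit-learn, the single hidden layer of 32 units, and the synthetic data are assumptions):

```python
# Sketch: a small feedforward network mapping features to category nodes.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# One hidden layer performs the nonlinear mapping from inputs to outputs.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
net.fit(X_tr, y_tr)
print(f"test accuracy: {net.score(X_te, y_te):.3f}")
```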
14 Application Case: Bayes Image Classification
- "A Bayesian Framework for Semantic Classification of Outdoor Vacation Images," by Vailaya et al., 2001
17 Problem and approach
- Classify city vs. landscape (sunset, forest, mountains)
- Database: 2,716 images
- Classification accuracy: 94-95%
- Features used: color histograms, color coherence vectors, edge direction histograms, and edge direction coherence vectors (a feature-extraction sketch follows this list)
- Representation: VQ codebook
- Bayes framework
  - VQ as conditional density estimator
  - MAP criterion (Maximum A Posteriori)
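A minimal sketch of two of the listed features, a 64-bin color histogram and a 72-bin edge direction histogram (the numpy-only implementation and the edge-strength cutoff are assumptions; the paper's exact extraction details may differ):

```python
# Sketch: color histogram and edge direction histogram features.
import numpy as np

def color_histogram(img, bins_per_channel=4):
    """64-bin histogram: 4 bins per RGB channel. img: (H, W, 3) uint8."""
    q = (img // (256 // bins_per_channel)).reshape(-1, 3)
    idx = (q[:, 0] * bins_per_channel + q[:, 1]) * bins_per_channel + q[:, 2]
    hist = np.bincount(idx, minlength=bins_per_channel ** 3)
    return hist / hist.sum()

def edge_direction_histogram(gray, n_bins=72):
    """72-bin histogram of gradient directions (5-degree bins)."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0
    strong = mag > mag.mean()              # keep only stronger edges (assumed)
    hist, _ = np.histogram(ang[strong], bins=n_bins, range=(0.0, 360.0))
    return hist / max(hist.sum(), 1)

img = (np.random.default_rng(0).random((150, 150, 3)) * 255).astype(np.uint8)
feat = np.concatenate([color_histogram(img),
                       edge_direction_histogram(img.mean(axis=2))])
print(feat.shape)  # (136,) = 64 color bins + 72 edge direction bins
```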
18 Bayes Framework
- Given: image x belongs to one of K classes
- A priori knowledge
- 0/1 loss function
- Bayes law (see the formulation below)
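Under the 0/1 loss, the Bayes rule reduces to the MAP decision; a sketch of the standard formulation (the notation here is assumed, since the slide's equations were images):

```latex
% Bayes law: posterior of class \omega_k given image features x
P(\omega_k \mid x) = \frac{p(x \mid \omega_k)\, P(\omega_k)}
                          {\sum_{j=1}^{K} p(x \mid \omega_j)\, P(\omega_j)}

% MAP decision under 0/1 loss (the denominator is common to all classes)
k^{*} = \arg\max_{k \in \{1, \dots, K\}} \; p(x \mid \omega_k)\, P(\omega_k)
```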
19 Feature extraction and VQ representation
- Feature extraction: color- and edge-related features
- Assume the features are independent
- Class-conditional density estimation based on VQ (vector quantization); see the sketch below
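A minimal sketch of VQ-based class-conditional density estimation: learn a codebook per class with k-means and estimate the density from the proportion of training samples in each Voronoi cell (scikit-learn's KMeans, the add-one smoothing, and the synthetic data are assumptions; the codebook size of 40 follows the system architecture slide):

```python
# Sketch: a VQ codebook as a class-conditional density estimator.
import numpy as np
from sklearn.cluster import KMeans

def fit_vq_density(X, q=40):
    """Learn a q-vector codebook and per-cell probabilities for one class."""
    km = KMeans(n_clusters=q, n_init=10, random_state=0).fit(X)
    counts = np.bincount(km.labels_, minlength=q)
    probs = (counts + 1) / (counts.sum() + q)  # add-one smoothing (assumed)
    return km, probs

def log_likelihood(km, probs, x):
    """Score a sample by the probability mass of its Voronoi cell."""
    cell = km.predict(x.reshape(1, -1))[0]
    return np.log(probs[cell])

# MAP classification: pick the class maximizing likelihood times prior.
rng = np.random.default_rng(0)
X_city, X_land = rng.normal(0, 1, (500, 8)), rng.normal(2, 1, (500, 8))
models = [fit_vq_density(Xc) for Xc in (X_city, X_land)]
log_priors = np.log([0.5, 0.5])
x = rng.normal(2, 1, 8)
scores = [log_likelihood(km, p, x) + lp
          for (km, p), lp in zip(models, log_priors)]
print("predicted class:", int(np.argmax(scores)))  # 1 = landscape here
```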
20 Nearest Neighbor and Voronoi Tessellation
- Parzen window: approximate the density by the proportion of sample data falling in each cell
- Different from the standard Parzen windows? Why?
- Increasing the codebook size q may cause over-fitting
- Consider the total description length of both the data and the model: Minimum Description Length (MDL); a standard form is sketched below
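A sketch of one standard two-part MDL criterion for choosing the codebook size (this generic form is an assumption; the paper's exact penalty term may differ):

```latex
\hat{q} = \arg\min_{q}
  \Big[ \underbrace{-\sum_{i=1}^{n} \log p(x_i \mid \theta_q)}_{\text{code length of the data}}
      + \underbrace{\tfrac{k_q}{2} \log n}_{\text{code length of the model}} \Big]
```

where $k_q$ is the number of free parameters of a codebook of size $q$ and $n$ is the number of samples.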
21 System Architecture
[Architecture diagram: input images from 150x150 up to 750x750 pixels, 24 bits/pixel; color histogram (64 bins) and edge direction (72 bins) features; codebook of size 40]
22 Experimental Results
- Edge information is important for detecting city images
- Color information is important for discriminating landscape subclasses
23 Wrongly classified images
24 Summary
- Choose the right classifier based on your needs in terms of accuracy, speed, memory constraints, and available samples.
- Classifier performance evaluation is critical for the design of pattern recognition systems.
- ROC is a powerful tool for the evaluation of classifiers.
- Designing, implementing, and evaluating pattern recognition systems requires many iterative steps to achieve good performance and meet application needs.
25 Reading
- Chapter 2, Pattern Classification by Duda, Hart, and Stork, 2001, Section 2.8.3, pp. 48-51
- Chapter 9, Pattern Classification by Duda, Hart, and Stork, 2001, Section 9.6, pp. 482-485
- A.K. Jain, R.P.W. Duin, and J. Mao, "Statistical pattern recognition: a review," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, 2000, Section 7, pp. 24-27
- A. Vailaya, et al., "A Bayesian Framework for Semantic Classification of Outdoor Vacation Images," SPIE Conference on Electronic Imaging, 1999, San Jose, California, USA.
26 Backup slides
27 Generative vs. Discriminative
- Generative methods
  - Determine models of how patterns are formed.
  - Use these models to perform discrimination.
  - Pattern Theory (Grenander, 1996).
- Discriminative methods
  - Don't model pattern formation.
  - Instead, extract features from patterns and make decisions using these features.
- Both generative and discriminative methods require training data to learn the models/features/decision rules.
- Machine learning concentrates on learning discrimination rules.
- Key issue: do we have enough training data to learn?
28 - The generative approach will attempt to estimate the Gaussian distributions from the data and then derive the decision rule.
- The discriminative approach will seek to estimate the decision rule directly by learning the discriminant plane (both approaches are sketched below).
- In practice, we will not know the form of the distributions or the form of the discriminant.
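A minimal sketch contrasting the two approaches on the same data: estimate per-class Gaussians and derive the decision rule (generative) vs. learn the discriminant plane directly with logistic regression (discriminative). The scikit-learn usage, the equal-prior/shared-covariance simplification, and the synthetic data are all assumptions:

```python
# Sketch: generative (Gaussian + Bayes rule) vs. discriminative (logistic).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], 1.0, (200, 2))  # class 0 samples
X1 = rng.normal([2.0, 2.0], 1.0, (200, 2))  # class 1 samples
X, y = np.vstack([X0, X1]), np.r_[np.zeros(200), np.ones(200)]

# Generative: estimate each class's Gaussian mean; with equal priors and
# shared spherical covariance, Bayes rule picks the nearer class mean.
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
def generative_predict(x):
    return int(np.sum((x - mu1) ** 2) < np.sum((x - mu0) ** 2))

# Discriminative: learn the separating plane directly from the labels.
disc = LogisticRegression().fit(X, y)

x = np.array([1.5, 1.0])
print("generative:", generative_predict(x),
      " discriminative:", int(disc.predict(x.reshape(1, -1))[0]))
```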
29 - Bayes decision theory gives a framework for both generative and discriminative approaches.
- Current wisdom:
  - (i) Discriminative methods are simpler, computationally faster, and easier to apply.
  - (ii) Generative methods are needed for the most complex problems.
- Hybrid methods are increasingly popular.
30 SVM (Support Vector Machine)
- SVMs perform structural risk minimization to achieve good generalization:
  - a function that (1) minimizes the empirical risk and (2) has low VC dimension
- The optimization criterion is the width of the margin between the classes.
- Primarily two-class classifiers, but they can be extended to multiple classes.
- Linear SVM vs. nonlinear SVM
  - Map samples to a higher-dimensional space where the different classes can be separated by a hyperplane (the kernel trick)
- The performance of SVMs depends on the choice of the kernel and its parameters (a sketch follows this list).
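A minimal sketch of a linear vs. a nonlinear (RBF-kernel) SVM on a problem that is not linearly separable; scikit-learn's SVC and the synthetic data are assumptions, not from the slides:

```python
# Sketch: linear vs. kernel SVM on a non-linearly-separable problem.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not separable by a hyperplane in the input space.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)

linear = SVC(kernel="linear", C=1.0).fit(X, y)
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)  # kernel trick

print(f"linear SVM training accuracy: {linear.score(X, y):.2f}")
print(f"RBF SVM training accuracy:    {rbf.score(X, y):.2f}")
# The margin is defined by the support vectors (see the next two slides):
print("support vectors (RBF):", rbf.support_vectors_.shape[0])
```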
31 Margin of Separation for SVM
32 Support Vectors
- The empty area around the decision boundary is defined by the distance to the nearest training patterns (i.e., the support vectors).
- These are the most difficult patterns to classify.