1
Lecture 6: Classifiers and Pattern Recognition Systems

2
Outline
  • Comparison of Classifiers
  • Error Estimation for Classifiers
  • ROC (Receiver Operating Characteristic)
  • Bayes Framework for Image Classification: an
    application case study

5
Error Estimation of Classifiers
  • Ensure that independent data sets are used for
    training and testing.
  • Both sets must be large enough for the complexity
    of the problem; hundreds or thousands of samples
    are typically needed.
  • When samples are scarce, use one of the following
    methods (a cross-validation sketch follows this
    list):
  • Re-substitution
  • Holdout
  • Leave-one-out
  • N-fold cross-validation
  • Bootstrap
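A minimal sketch of one of these methods, N-fold cross-validation, assuming scikit-learn and a k-NN classifier (the lecture does not prescribe a toolkit):

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data standing in for real samples.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = KNeighborsClassifier(n_neighbors=5)
# Each of the 5 rounds trains on 4 folds and tests on the held-out fold,
# so training and testing sets stay independent in every split.
scores = cross_val_score(clf, X, y, cv=5)
print(f"estimated error rate: {1 - scores.mean():.3f} (+/- {scores.std():.3f})")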

7
ROC (Receiver Operating Characteristic)
  • The ROC curve plots the proportion of correct
    responses (hits) against the false positives as
    the decision boundary changes.
  • The ROC curve gives information that is
    independent of the observer's loss function.
  • The ROC presents a fuller picture than a single
    error rate: a set of error rates under different
    confidence levels.
  • Users can select threshold values appropriate to
    their needs (see the sketch below).
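A minimal NumPy sketch of how an ROC curve is traced: sweep the decision threshold over classifier scores and record (false alarm rate, hit rate) pairs. The scores and labels here are made-up illustrations:

import numpy as np

def roc_points(scores, labels):
    # Return (false_alarm_rate, hit_rate) arrays as the threshold sweeps
    # from high to low. labels are 1 for the target class, 0 otherwise.
    order = np.argsort(-scores)          # descending score = loosening threshold
    labels = labels[order]
    hits = np.cumsum(labels)             # true positives accepted so far
    false_alarms = np.cumsum(1 - labels)
    hit_rate = hits / labels.sum()
    fa_rate = false_alarms / (1 - labels).sum()
    return fa_rate, hit_rate

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3])
labels = np.array([1, 1, 0, 1, 0, 0, 0])
print(roc_points(scores, labels))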

8
  • For class ω2:
  • Hit rate: correct classification of ω2
  • Miss rate: missed detection of ω2
  • For class ω1:
  • False alarm rate: ω1 falsely accepted as ω2
  • Rejection rate: correct rejection of ω1

9
ROC and rejection rate
  • Rejection is used to reduce recognition error
    rates when samples are close to the decision
    boundary.
  • A higher rejection rate improves the recognition
    rate, as shown in Fig. 3-13 (a reject-option
    sketch follows below).
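A minimal sketch of the reject option described above: samples whose maximum posterior falls below a threshold are rejected rather than classified, trading rejection rate for error rate (the threshold value here is illustrative):

import numpy as np

def classify_with_reject(posteriors, threshold=0.8):
    # posteriors: (n_samples, n_classes) array of P(omega_k | x).
    # Returns class indices, or -1 where the sample is rejected.
    decisions = posteriors.argmax(axis=1)
    decisions[posteriors.max(axis=1) < threshold] = -1  # near the boundary
    return decisions

posteriors = np.array([[0.95, 0.05], [0.55, 0.45], [0.20, 0.80]])
print(classify_with_reject(posteriors))   # [0, -1, 1]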

10
Classification Error vs. No. of Training
Samples/Class
11
Face recognition performance comparison: computer
vs. human (Andy Adler et al., 2006)
12
Advances in Pattern Recognition
  • Multimodality in features and classifiers
  • Image, audio, text
  • Different modalities of medical images, such as
    CT, MRI, X-ray, and ultrasound
  • Combination of classifiers
  • Open databases for performance benchmarking of
    pattern detection and classification in specific
    application domains such as faces, fingerprints,
    handwritten characters, object recognition,
    content-based retrieval, etc.

13
Neural networks
  • Based on massively parallel, simple processing
    units
  • Perform a nonlinear mapping from input features
    to output category nodes
  • Black box: hard to interpret after learning
  • Good performance if training samples are large
    enough
  • MATLAB NN toolbox

14
Application Case- Bayes Image Classification
  • A Bayes Framework for Semantic Classification of
    Outdoor Vacation Images, by Vailaya et al., 2001

17
Problem and approach
  • Classify city vs. landscape (sunset, forest,
    mountain) images
  • Database: 2,716 images
  • Classification accuracy: 94-95%
  • Features used: color histograms, color coherence
    vectors, edge direction histograms, and edge
    direction coherence vectors
  • Representation: VQ codebook
  • Bayes framework
  • VQ as a conditional density estimator
  • MAP (maximum a posteriori) criterion

18
Bayes Framework
  • Given: an image x belonging to one of K classes
  • A priori knowledge (class priors)
  • 0/1 loss function
  • Bayes law (reconstructed below)
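The slide's equations did not survive transcription. A standard reconstruction consistent with the bullets, in LaTeX: with priors P(\omega_k) and class-conditional densities p(x \mid \omega_k), Bayes law gives the posterior, and under 0/1 loss the optimal decision is the MAP class:

P(\omega_k \mid x) = \frac{p(x \mid \omega_k)\, P(\omega_k)}{\sum_{j=1}^{K} p(x \mid \omega_j)\, P(\omega_j)},
\qquad
\hat{\omega}(x) = \arg\max_k P(\omega_k \mid x) = \arg\max_k p(x \mid \omega_k)\, P(\omega_k)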

19
Feature Extraction and VQ Representation
  • Feature extraction: color- and edge-related
    features
  • Features are assumed to be independent
  • Class-conditional density estimation based on VQ
    (vector quantization); a sketch follows below
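A sketch of the idea as described here: per class, fit a VQ codebook to that class's training features and approximate p(x | omega) by the fraction of training samples in the Voronoi cell that x falls into. The plain k-means codebook training below is an assumption, not necessarily the paper's exact procedure:

import numpy as np

def train_vq_density(X, q=40, iters=20, seed=0):
    # Fit a q-codeword codebook to one class's samples X (n, d);
    # return the codebook and the probability mass of each cell.
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    codebook = X[rng.choice(len(X), q, replace=False)]
    for _ in range(iters):                    # plain k-means updates
        dist = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for j in range(q):
            if (assign == j).any():
                codebook[j] = X[assign == j].mean(0)
    counts = np.bincount(assign, minlength=q)
    return codebook, counts / counts.sum()    # Parzen-like cell proportions

def vq_likelihood(x, codebook, cell_probs):
    # p(x | omega) approximated by the mass of x's nearest Voronoi cell.
    return cell_probs[((codebook - x) ** 2).sum(1).argmin()]

At test time, the MAP rule from slide 18 combines these per-class likelihoods with the priors.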

20
Nearest Neighbor and Voronoi Tessellation
  • Parzen-window-like: approximate the density by
    the proportion of sample data in each cell
  • How does this differ from standard Parzen
    windows, and why?
  • Increasing the codebook size q may cause
    over-fitting
  • Consider the total description length of the data
    and the model: Minimum Description Length (MDL),
    sketched below
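The slide names MDL without a formula. In its usual two-part form (an assumption about what was shown), the codebook size q is chosen to minimize the description length of the model plus that of the data given the model:

\hat{q} = \arg\min_q \left[ L(\mathrm{model}_q) + L(\mathrm{data} \mid \mathrm{model}_q) \right]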

21
System Architecture
(Figure: processing pipeline. Input images 150x150
to 750x750 pixels, 24 bits/pixel; color histogram,
64 bins; edge direction histogram, 72 bins;
codebook size 40.)
22
Experimental Results
  • Edge information is important for detecting city
    images
  • Color information is important for discriminating
    landscape subclasses

23
Wrongly classified images
24
Summary
  • Choose the right classifier based on your needs
    in terms of accuracy, speed, memory constraints,
    and available samples.
  • Classifier performance evaluation is critical for
    the design of pattern recognition systems.
  • ROC is a powerful tool for evaluating
    classifiers.
  • Designing, implementing, and evaluating pattern
    recognition systems requires many iterative steps
    to achieve good performance and meet application
    needs.

25
Reading
  • Chapter 2, Pattern Classification by Duda, Hart,
    and Stork, 2001, Section 2.8.3, pp. 48-51
  • Chapter 9, Pattern Classification by Duda, Hart,
    and Stork, 2001, Section 9.6, pp. 482-485
  • A.K. Jain, R.P.W. Duin, and J. Mao, "Statistical
    pattern recognition: a review," IEEE Transactions
    on Pattern Analysis and Machine Intelligence,
    vol. 22, no. 1, 2000, Section 7, pp. 24-27
  • A. Vailaya et al., "A Bayesian Framework for
    Semantic Classification of Outdoor Vacation
    Images," SPIE Conference on Electronic Imaging,
    1999, San Jose, California, USA

26
Backup slides
27
Generative vs. Discriminative
  • Generative methods
  • Determine models of how patterns are formed
  • Use these models to perform discrimination
  • Pattern Theory (Grenander, 1996)
  • Discriminative methods
  • Don't model pattern formation
  • Instead, extract features from patterns and make
    decisions using these features
  • Both generative and discriminative methods
    require training data to learn the
    models/features/decision rules.
  • Machine learning concentrates on learning
    discrimination rules.
  • Key issue: do we have enough training data to
    learn?

28
  • The generative approach attempts to estimate the
    Gaussian distributions from data and then derive
    the decision rule.
  • The discriminative approach seeks to estimate the
    decision rule directly by learning the
    discriminant plane (see the sketch below).
  • In practice, we will not know the form of the
    distributions or the form of the discriminant.
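A minimal illustration of the contrast, assuming scikit-learn, with Gaussian naive Bayes standing in for the generative route and logistic regression for the discriminative one (the slides name no specific estimators):

from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB            # generative
from sklearn.linear_model import LogisticRegression   # discriminative

X, y = make_classification(n_samples=400, n_features=5, random_state=1)

# Generative: estimate per-class Gaussians p(x | class), then derive the
# decision rule through Bayes law.
gen = GaussianNB().fit(X, y)
# Discriminative: learn the decision boundary directly.
disc = LogisticRegression().fit(X, y)
print(gen.score(X, y), disc.score(X, y))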

29
  • Bayes decision theory gives a framework for both
    generative and discriminative approaches.
  • Current wisdom:
  • (i) Discriminative methods are simpler,
    computationally faster, and easier to apply.
  • (ii) Generative methods are needed for the most
    complex problems.
  • Hybrid methods are increasingly popular.

30
SVM (Support Vector Machine)
  • SVMs perform structural risk minimization to
    achieve good generalization: find a function that
    (1) minimizes the empirical risk and (2) has low
    VC dimension.
  • The optimization criterion is the width of the
    margin between the classes.
  • SVMs are primarily two-class classifiers but can
    be extended to multiple classes.
  • Linear SVM vs. nonlinear SVM: map samples to a
    higher-dimensional space where the classes can be
    separated by a hyperplane (the kernel trick).
  • The performance of an SVM depends on the choice
    of kernel and its parameters (a sketch follows
    below).
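A minimal sketch of linear vs. nonlinear SVM, assuming scikit-learn; the concentric-circles data is chosen because no separating hyperplane exists in the original 2-D space:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)

linear = SVC(kernel="linear").fit(X, y)  # fails: not linearly separable
# The RBF kernel implicitly maps samples to a higher-dimensional space
# where a separating hyperplane exists (the kernel trick).
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print("linear:", linear.score(X, y), " rbf:", rbf.score(X, y))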

31
Margin of Separation for SVM
32
Support Vectors
  • The margin is the empty area around the decision
    boundary, defined by the distance to the nearest
    training patterns (the support vectors).
  • These are the most difficult patterns to
    classify.