1
Advanced Pattern Recognition Lecture 1
  • Spring 2007

2
  • [1] J. Shawe-Taylor, N. Cristianini, Kernel
    Methods for Pattern Analysis, Cambridge
    University Press, 2004.
  • [2] B. Schölkopf, A. Smola, Learning with
    Kernels, MIT Press, 2002.
  • [3] www.kernel-methods.net
  • [4] Journal papers, tutorials.

3
  • Pattern Recognition is a field of Computational
    Intelligence in which a predefined form of an
    input signal is searched for, or similarities
    between signal forms are studied.
  • The input signal can come from an electrical
    measurement or, for example, from text documents.
  • Computational Intelligence includes method
    groups such as Artificial Intelligence, Pattern
    Recognition, Fuzzy Logic, Genetic Algorithms,
    Neural Networks, etc.
  • Application fields include image analysis,
    speech recognition, and medical signal and DNA
    sequence analysis.

4
Watanabe
5
  • Traditionally, Pattern Recognition (PR) is
    divided into statistical pattern recognition and
    structural pattern recognition.
  • In statistical PR, signal statistics are needed
    for recognition.
  • In structural (syntactic) PR, a pattern is
    described by a grammar over structural elements.

6
PR applications
7
PR applications (cont'd)
8
  • Examples
  • Handwriting recognition on a PDA.
  • Structural PR: recognition of hanzi (Chinese
    characters).
  • Cluster analysis (area, length, etc.).
  • How many clusters?
  • Forest photo segmentation.

9
  • Example: Face detection

10
Example: Segmentation for multiple sclerosis
11
(No Transcript)
12
  • Salmon or Sea bass?

13
  • Overfitting problem.

14
  • Non-linear decision boundary

15
  • Planetary data

T is the period of revolution (in years) and R is
the radius of the orbit. The quantity R³/T²
remains the same for all planets (Kepler's third
law); a numerical check follows below.
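As a quick illustration (not part of the original slides), the claim can be checked numerically; the orbital radii and periods below are standard approximate values in astronomical units and years:

```python
# Check of the slide's claim: with R in astronomical units and T in
# years, R**3 / T**2 is (approximately) the same for every planet.
# The orbital values below are standard approximate figures.
planets = {
    "Mercury": (0.387, 0.241),
    "Venus":   (0.723, 0.615),
    "Earth":   (1.000, 1.000),
    "Mars":    (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn":  (9.537, 29.447),
}

for name, (R, T) in planets.items():
    print(f"{name:8s}  R^3/T^2 = {R**3 / T**2:.4f}")
# Every line prints a value close to 1.0: the pattern R^3/T^2 = const.
```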
16
  • Example: Planetary data (1)

[Figure: the planetary data plotted on logarithmic
axes, log(R) vs. log(T)]
17
Example: Planetary data (2)

[Figure: an ellipse x²/a² + y²/b² = 1 in the
(x, y) plane]

The artificial planetary data lying on an ellipse
in two dimensions, and the same data represented
using the features x² and y², showing a linear
relation.
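A minimal sketch of the same observation (the values a = 3, b = 2 are arbitrary assumptions): points generated on an ellipse obey an exact linear relation in the squared features, recoverable by ordinary least squares:

```python
import numpy as np

# Points on the ellipse x^2/a^2 + y^2/b^2 = 1 are non-linear in (x, y),
# but the same points satisfy v = b^2 - (b^2/a^2) * u with u = x^2 and
# v = y^2, which is a straight line. a and b are assumed values.
a, b = 3.0, 2.0
theta = np.linspace(0, 2 * np.pi, 100)
x, y = a * np.cos(theta), b * np.sin(theta)

u, v = x**2, y**2                      # squared features
# Fit v = c0 + c1 * u by least squares and compare with the exact line
# implied by the ellipse equation.
c1, c0 = np.polyfit(u, v, 1)
print(f"fitted:   v = {c0:.3f} + {c1:.3f} * u")
print(f"expected: v = {b**2:.3f} + {-(b/a)**2:.3f} * u")
```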
18
  • We will define pattern recognition via a
    function (a pattern function) for which
  • f(x) = 0 (1)
  • For example, for the planetary data
  • f(R,T) = R³ - T² = 0
  • This holds exactly only in the ideal case; in
    practical cases f(x) is not exactly zero:
  • f(R,T) = R³ - T² ≈ 0

19
  • Let's assume that we have a function g(x) which
    predicts the output (i.e. the class a pattern
    belongs to) for input x.
  • Let's also assume that we have a training set
    for which we know the correct output y (i.e. the
    classification).
  • The training set is formed by pairs (x, y). The
    function f can then be defined as
  • f(x, y) = L(g(x), y) = 0 (2)
  • where g is the prediction function and
    L: X × Y → R is a loss function whose value is 0
    when the predicted class g(x) and the correct
    class y are equal (see the sketch below).
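A minimal sketch of equation (2), assuming a zero-one loss and a hypothetical threshold classifier g (neither is specified on the slides):

```python
# f(x, y) is 0 exactly when the prediction g(x) matches the correct
# class y. The threshold classifier g below is a toy stand-in.
def g(x):
    """Toy prediction function: class 1 if x > 0.5, else class 0."""
    return 1 if x > 0.5 else 0

def L(predicted, correct):
    """Zero-one loss: 0 on a correct prediction, 1 otherwise."""
    return 0 if predicted == correct else 1

def f(x, y):
    """Pattern function f(x, y) = L(g(x), y), as in equation (2)."""
    return L(g(x), y)

training_set = [(0.9, 1), (0.1, 0), (0.7, 1), (0.4, 0)]
print([f(x, y) for x, y in training_set])  # [0, 0, 0, 0]: pattern holds
```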

20
  • In practical cases the relation f(x, y) = 0 is
    not exact, and we have to accept the
    approximation f(x, y) ≈ 0.
  • Note 1. If (2) holds exactly for a training set,
    there is a risk of overfitting. This means that
    we tune the recognition to that set, and it does
    not work well in general.
  • In a statistical sense, this means that
    E[f(x)] ≠ 0,
  • where E is the expectation (expected value).
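A minimal sketch of this note (the 1-nearest-neighbour rule and the random labels are assumptions chosen for illustration): the pattern function is exactly zero on the memorised training set, yet its estimated expectation on fresh data is far from zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 1-nearest-neighbour rule memorises the training set, so f(x, y) = 0
# there, yet the expectation of f on fresh data stays near 0.5 because
# the labels are pure noise: a textbook case of overfitting.
X_train = rng.uniform(size=50)
y_train = rng.integers(0, 2, size=50)      # random labels: no pattern

def g(x):
    """1-NN prediction: label of the nearest training point."""
    return y_train[np.argmin(np.abs(X_train - x))]

def f(x, y):
    return 0 if g(x) == y else 1           # zero-one loss, as in (2)

train_mean = np.mean([f(x, y) for x, y in zip(X_train, y_train)])
X_test = rng.uniform(size=1000)
y_test = rng.integers(0, 2, size=1000)
test_mean = np.mean([f(x, y) for x, y in zip(X_test, y_test)])
print(train_mean, test_mean)  # 0.0 on training data, about 0.5 on new data
```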

21
  • Definition A. A Pattern Analysis Algorithm
  • takes as input a finite set of examples from the
    source of data to be analyzed. Its output is
    either an indication that no patterns are
    detectable in the data, or a positive pattern
    function f that the algorithm asserts satisfies
  • E[f(x)] ≈ 0 (5)

22
  • A Pattern Analysis Algorithm should be
  • Computationally efficient, so that it is
    possible to use large data sets
  • Robust, i.e. able to handle noisy data and to
    identify approximate patterns
  • Statistically stable, i.e. the output of the
    algorithm should not depend on the particular
    data set used.

23
  • It would be desirable to use linear functions.
    In practice, however, the data is often not
    separable by linear functions.
  • In kernel methods we use linear functions in a
    feature space via kernels, without performing
    the actual transform into the feature space
    (see the sketch below).
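A minimal sketch of this idea, using the standard degree-2 polynomial kernel (a textbook construction, not taken from these slides): the kernel evaluates the feature-space inner product while staying entirely in input space:

```python
import numpy as np

# The homogeneous polynomial kernel k(x, z) = (x . z)**2 equals the
# inner product of the explicit feature maps
# phi(x) = (x1^2, x2^2, sqrt(2) x1 x2), so linear methods can operate
# "in feature space" without ever computing phi.
def phi(x):
    """Explicit degree-2 feature map for 2-D input."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

def k(x, z):
    """Polynomial kernel of degree 2: evaluated in input space only."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
print(np.dot(phi(x), phi(z)))   # 16.0: inner product in feature space
print(k(x, z))                  # 16.0: same value, no transform needed
```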

24
(No Transcript)
25
  • Example

26
Inner product is
27
(No Transcript)