1
Advanced Pattern Recognition Lecture 1
  • Spring 2007

2
  • [1] J. Shawe-Taylor, N. Cristianini, Kernel
    Methods for Pattern Analysis, Cambridge
    University Press, 2004.
  • [2] B. Schölkopf, A. Smola, Learning with
    Kernels, MIT Press, 2002.
  • [3] www.kernel-methods.net
  • [4] Journal papers, tutorials.

3
  • Pattern Recognition is a field of Computational
    Intelligence in which a predefined form of an
    input signal is searched for, or similarities
    between signal forms are studied.
  • The input signal can come from an electrical
    measurement or, for example, from text documents.
  • Computational Intelligence includes method
    groups such as Artificial Intelligence, Pattern
    Recognition, Fuzzy Logic, Genetic Algorithms,
    Neural Networks, etc.
  • Application fields include image analysis,
    speech recognition, and medical signal and DNA
    sequence analysis.

4
Watanabe
5
  • Traditionally, Pattern Recognition (PR) is
    divided into statistical pattern recognition and
    structural pattern recognition.
  • In statistical PR, signal statistics are needed
    for recognition.
  • In structural (syntactic) PR, a pattern is
    described by a grammar over structural elements.

6
PR applications
7
PR applications (cont'd)
8
  • Examples
  • Handwriting recognition on a PDA.
  • Structural PR: recognition of hanzi (Chinese
    characters).
  • Cluster analysis (area, length, etc.).
  • How many clusters?
  • Forest photo segmentation.

9
  • Example: Face detection

10
Example: Segmentation for multiple sclerosis
11
(No Transcript)
12
  • Salmon or Sea bass?

13
  • Overfitting problem.

14
  • Non-linear decision boundary

15
  • Planetary data

T is the period of revolution (in years) and R is
the radius of the orbit. The quantity R³/T²
remains the same for all planets (Kepler's third
law); a numerical check follows below.
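As a quick illustration (not part of the original slides), the claim can be checked numerically; the orbital radii and periods below are standard approximate values in astronomical units and years:

```python
# Check of the slide's claim: with R in astronomical units and T in
# years, R**3 / T**2 is (approximately) the same for every planet.
# The orbital values below are standard approximate figures.
planets = {
    "Mercury": (0.387, 0.241),
    "Venus":   (0.723, 0.615),
    "Earth":   (1.000, 1.000),
    "Mars":    (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn":  (9.537, 29.447),
}

for name, (R, T) in planets.items():
    print(f"{name:8s}  R^3/T^2 = {R**3 / T**2:.4f}")
# Every line prints a value close to 1.0: the pattern R^3/T^2 = const.
```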
16
  • Example: Planetary data (1)

[Figure: the planetary data plotted on logarithmic
axes, log(R) vs. log(T)]
17
Example: Planetary data (2)

[Figure: an ellipse x²/a² + y²/b² = 1 in the
(x, y) plane]

The artificial planetary data lying on an ellipse
in two dimensions, and the same data represented
using the features x² and y², showing a linear
relation.
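A minimal sketch of the same observation (the values a = 3, b = 2 are arbitrary assumptions): points generated on an ellipse obey an exact linear relation in the squared features, recoverable by ordinary least squares:

```python
import numpy as np

# Points on the ellipse x^2/a^2 + y^2/b^2 = 1 are non-linear in (x, y),
# but the same points satisfy v = b^2 - (b^2/a^2) * u with u = x^2 and
# v = y^2, which is a straight line. a and b are assumed values.
a, b = 3.0, 2.0
theta = np.linspace(0, 2 * np.pi, 100)
x, y = a * np.cos(theta), b * np.sin(theta)

u, v = x**2, y**2                      # squared features
# Fit v = c0 + c1 * u by least squares and compare with the exact line
# implied by the ellipse equation.
c1, c0 = np.polyfit(u, v, 1)
print(f"fitted:   v = {c0:.3f} + {c1:.3f} * u")
print(f"expected: v = {b**2:.3f} + {-(b/a)**2:.3f} * u")
```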
18
  • We will define pattern recognition via a
    function (a pattern function) for which
  • f(x) = 0 (1)
  • For example, for the planetary data
  • f(R,T) = R³ - T² = 0
  • This holds exactly only in the ideal case; in
    practical cases f(x) is not exactly zero:
  • f(R,T) = R³ - T² ≈ 0

19
  • Let's assume that we have a function g(x) which
    predicts the output (i.e. the class a pattern
    belongs to) for input x.
  • Let's also assume that we have a training set
    for which we know the correct output y (i.e. the
    classification).
  • The training set is formed by pairs (x, y). The
    function f can then be defined as
  • f(x, y) = L(g(x), y) = 0 (2)
  • where g is the prediction function and
    L: X × Y → R is a loss function whose value is 0
    when the predicted class g(x) and the correct
    class y are equal (see the sketch below).
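A minimal sketch of equation (2), assuming a zero-one loss and a hypothetical threshold classifier g (neither is specified on the slides):

```python
# f(x, y) is 0 exactly when the prediction g(x) matches the correct
# class y. The threshold classifier g below is a toy stand-in.
def g(x):
    """Toy prediction function: class 1 if x > 0.5, else class 0."""
    return 1 if x > 0.5 else 0

def L(predicted, correct):
    """Zero-one loss: 0 on a correct prediction, 1 otherwise."""
    return 0 if predicted == correct else 1

def f(x, y):
    """Pattern function f(x, y) = L(g(x), y), as in equation (2)."""
    return L(g(x), y)

training_set = [(0.9, 1), (0.1, 0), (0.7, 1), (0.4, 0)]
print([f(x, y) for x, y in training_set])  # [0, 0, 0, 0]: pattern holds
```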

20
  • In practical cases the relation f(x, y) = 0 is
    not exact, and we have to accept the
    approximation f(x, y) ≈ 0.
  • Note 1. If (2) holds exactly for a training set,
    there is a risk of overfitting. This means that
    we tune the recognition to that set, and it does
    not work well in general.
  • In a statistical sense, this means that
    E[f(x)] ≠ 0,
  • where E is the expectation (expected value).
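A minimal sketch of this note (the 1-nearest-neighbour rule and the random labels are assumptions chosen for illustration): the pattern function is exactly zero on the memorised training set, yet its estimated expectation on fresh data is far from zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 1-nearest-neighbour rule memorises the training set, so f(x, y) = 0
# there, yet the expectation of f on fresh data stays near 0.5 because
# the labels are pure noise: a textbook case of overfitting.
X_train = rng.uniform(size=50)
y_train = rng.integers(0, 2, size=50)      # random labels: no pattern

def g(x):
    """1-NN prediction: label of the nearest training point."""
    return y_train[np.argmin(np.abs(X_train - x))]

def f(x, y):
    return 0 if g(x) == y else 1           # zero-one loss, as in (2)

train_mean = np.mean([f(x, y) for x, y in zip(X_train, y_train)])
X_test = rng.uniform(size=1000)
y_test = rng.integers(0, 2, size=1000)
test_mean = np.mean([f(x, y) for x, y in zip(X_test, y_test)])
print(train_mean, test_mean)  # 0.0 on training data, about 0.5 on new data
```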

21
  • Definition A. A Pattern Analysis Algorithm
  • takes as input a finite set of examples from the
    source of data to be analyzed. Its output is
    either an indication that no patterns are
    detectable in the data, or a positive pattern
    function f that the algorithm asserts satisfies
  • E[f(x)] ≈ 0 (5)

22
  • A Pattern Analysis Algorithm should be
  • Computationally efficient, so that it is
    possible to use large data sets
  • Robust, i.e. able to handle noisy data and to
    identify approximate patterns
  • Statistically stable, i.e. the output of the
    algorithm should not depend on the particular
    data set used.

23
  • It would be desirable to use linear functions.
    In practice, however, the data is often not
    separable by linear functions.
  • In kernel methods we use linear functions in a
    feature space via kernels, without performing
    the actual transform into the feature space
    (see the sketch below).
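A minimal sketch of this idea, using the standard degree-2 polynomial kernel (a textbook construction, not taken from these slides): the kernel evaluates the feature-space inner product while staying entirely in input space:

```python
import numpy as np

# The homogeneous polynomial kernel k(x, z) = (x . z)**2 equals the
# inner product of the explicit feature maps
# phi(x) = (x1^2, x2^2, sqrt(2) x1 x2), so linear methods can operate
# "in feature space" without ever computing phi.
def phi(x):
    """Explicit degree-2 feature map for 2-D input."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

def k(x, z):
    """Polynomial kernel of degree 2: evaluated in input space only."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
print(np.dot(phi(x), phi(z)))   # 16.0: inner product in feature space
print(k(x, z))                  # 16.0: same value, no transform needed
```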

24
(No Transcript)
25
  • Example

26
Inner product is
27
(No Transcript)