Title: Outline
1. Outline
- Linear Discriminant Functions (continued)
2. Linear Discriminant Functions
- A linear discriminant function is a linear combination of its components, written as g(x) = w^T x + w0
- Here w is the weight vector and w0 is the bias or threshold weight
3. Linear Discriminant Functions cont.
- Two-category case
- Decide w1 if g(x) > 0 and w2 if g(x) < 0
- If g(x) = 0, then x is on the decision boundary and can be assigned to either class
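A minimal sketch of this two-category rule (the weight vector, bias, and test point below are assumed values, purely for illustration):

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant g(x) = w^T x + w0."""
    return w @ x + w0

def decide(x, w, w0):
    """Decide w1 if g(x) > 0, w2 if g(x) < 0; the boundary case can go either way."""
    value = g(x, w, w0)
    if value > 0:
        return "w1"
    if value < 0:
        return "w2"
    return "boundary"  # g(x) = 0: x may be assigned to either class

# Assumed 2-D example
w = np.array([1.0, -2.0])
w0 = 0.5
print(decide(np.array([3.0, 1.0]), w, w0))  # g = 3 - 2 + 0.5 = 1.5 > 0 -> w1
```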
4. Linear Discriminant Functions cont.
5. Augmented Vector
6. Two-Category Linearly Separable Case
- Given the forms of linear discriminant functions, we want to learn the weights using a set of n labeled samples
- Linearly separable
- If there is a weight vector that can classify all samples correctly, the samples are said to be linearly separable.
7. Two-Category Linearly Separable Case cont.
- Weight space
- Solution region
8. Two-Category Linearly Separable Case cont.
- Margin
9. Perceptron Criterion
- Perceptron criterion function
- Here Y(a) is the set of samples misclassified by a
- Note that Jp(a) ≥ 0
- Jp(a) = 0 if and only if no sample is misclassified
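The criterion function itself is an image on the slide; in its standard form (assuming samples from the second class have been negated, i.e. "normalized") it is
\[ J_p(\mathbf{a}) = \sum_{\mathbf{y} \in \mathcal{Y}(\mathbf{a})} \left(-\mathbf{a}^T \mathbf{y}\right), \]
so every misclassified sample contributes a positive term, which is why Jp(a) ≥ 0 with equality exactly when nothing is misclassified.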
10. Taylor Series Expansion cont.
\[
F(\mathbf{x}) = F(\mathbf{x}^*) + \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*} (\mathbf{x}-\mathbf{x}^*)
+ \tfrac{1}{2} (\mathbf{x}-\mathbf{x}^*)^T \, \nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}^*} (\mathbf{x}-\mathbf{x}^*) + \cdots
\]
- \(\nabla F\): Gradient
- \(\nabla^2 F\): Hessian
11. Iterative Optimization
- pk: search direction
- ak: learning rate
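The update formulas on this slide are images; in the usual notation a generic iterative optimization step is
\[ \mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{p}_k \qquad\text{or, equivalently,}\qquad \Delta\mathbf{x}_k = \mathbf{x}_{k+1} - \mathbf{x}_k = \alpha_k \mathbf{p}_k, \]
with \(\mathbf{p}_k\) the search direction and \(\alpha_k\) the learning rate (the slide's ak).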
12. Gradient Descent
Choose the next step so that the function decreases. For small changes in x we can approximate F(x) by its first-order expansion about the current point. If we want the function to decrease, this puts a condition on the search direction, and we can maximize the decrease by choosing the direction opposite to the gradient (the equations are reconstructed below).
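The equations on this slide are images; the standard first-order argument runs as follows (a reconstruction, not a transcription):
\[ F(\mathbf{x}_{k+1}) \approx F(\mathbf{x}_k) + \mathbf{g}_k^T \Delta\mathbf{x}_k, \qquad \mathbf{g}_k \equiv \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}. \]
For the function to decrease we need \( \mathbf{g}_k^T \Delta\mathbf{x}_k = \alpha_k\, \mathbf{g}_k^T \mathbf{p}_k < 0 \), and for a fixed step length the decrease is largest when \( \mathbf{p}_k = -\mathbf{g}_k \), giving the steepest-descent update \( \mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k\, \mathbf{g}_k \).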
13. Example
14. Example cont.
15. Learning Rates Cannot Be Too Large
16. Gradient Descent
17. Gradient Descent cont.
- How to choose the learning rate η(k)
- Newton's algorithm
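The formulas on this slide are also images; a standard reconstruction, following the usual second-order treatment: expanding J to second order and minimizing over the step size gives the learning rate
\[ \eta(k) = \frac{\lVert \nabla J \rVert^2}{\nabla J^T \mathbf{H}\, \nabla J}, \]
where H is the Hessian of J, while Newton's algorithm instead takes the full second-order step
\[ \mathbf{a}(k+1) = \mathbf{a}(k) - \mathbf{H}^{-1} \nabla J. \]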
18. Example
19. Newton's Solution
20. Two-Category Linearly Separable Case cont.
21. Comparison of Gradient Descent and Newton's Method
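A tiny numerical sketch of this comparison (the quadratic function, the matrix A, and the starting point are assumptions chosen for illustration): on an ill-conditioned quadratic, gradient descent needs many small steps, while Newton's method reaches the minimum in one step.

```python
import numpy as np

# Quadratic test function F(x) = 0.5 * x^T A x (A and the start point are assumed)
A = np.array([[2.0, 0.0],
              [0.0, 50.0]])          # ill-conditioned Hessian
grad = lambda x: A @ x               # gradient of F
x_gd = x_nt = np.array([1.0, 1.0])

# Gradient descent: many small steps, stable rate limited by the largest eigenvalue of A
eta = 1.0 / np.max(np.linalg.eigvalsh(A))
for _ in range(100):
    x_gd = x_gd - eta * grad(x_gd)

# Newton's method: uses the Hessian A, reaches the minimum of a quadratic in one step
x_nt = x_nt - np.linalg.solve(A, grad(x_nt))

print(x_gd)   # close to [0, 0] only after many iterations
print(x_nt)   # exactly [0, 0]
```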
22. Two-Category Linearly Separable Case cont.
- Perceptron criterion function
- Here Y(a) is the set of samples misclassified by a
- The gradient of Jp(a) and the resulting update rule are given below
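The gradient and the update rule are images on the slide; in the standard form they are
\[ \nabla J_p(\mathbf{a}) = \sum_{\mathbf{y} \in \mathcal{Y}(\mathbf{a})} (-\mathbf{y}), \qquad \mathbf{a}(k+1) = \mathbf{a}(k) + \eta(k) \sum_{\mathbf{y} \in \mathcal{Y}(\mathbf{a}(k))} \mathbf{y}, \]
i.e. the batch perceptron adds a multiple of the sum of the misclassified samples to the current weight vector.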
23. Two-Category Linearly Separable Case cont.
24. Two-Category Linearly Separable Case cont.
25. Two-Category Linearly Separable Case cont.
Perceptron Convergence Theorem: If the training samples are linearly separable, then the sequence of weight vectors generated by the above algorithm terminates at a solution vector in a finite number of steps.
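A minimal sketch of a fixed-increment single-sample perceptron of the kind the theorem refers to (assuming the samples are augmented and that samples from the second class have been negated, i.e. "normalized"; the tiny data set is made up for illustration):

```python
import numpy as np

def fixed_increment_perceptron(Y, max_epochs=1000):
    """Fixed-increment single-sample perceptron.

    Y: (n, d) array of augmented, 'normalized' samples (class-2 samples already
       multiplied by -1), so a solution vector a satisfies a^T y > 0 for every row y.
    """
    a = np.zeros(Y.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for y in Y:                 # present samples one at a time
            if a @ y <= 0:          # y is misclassified
                a = a + y           # fixed increment: eta = 1
                errors += 1
        if errors == 0:             # every sample now satisfies a^T y > 0
            return a
    return a                        # may not have converged if not linearly separable

# Assumed data: class 1 = {(1,2), (2,1)}, class 2 = {(-1,-1), (-2,0)}
X1 = np.array([[1.0, 2.0], [2.0, 1.0]])
X2 = np.array([[-1.0, -1.0], [-2.0, 0.0]])
Y = np.vstack([np.hstack([np.ones((2, 1)), X1]),      # augment with a leading 1
               -np.hstack([np.ones((2, 1)), X2])])    # negate the class-2 samples
a = fixed_increment_perceptron(Y)
print(a, (Y @ a > 0).all())        # a solution vector; all samples on the positive side
```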
27. Two-Category Linearly Separable Case cont.
- Some Direct Generalizations
- Variable increment and a margin
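The rule itself is shown as an image; the usual variable-increment rule with a margin b updates
\[ \mathbf{a}(k+1) = \mathbf{a}(k) + \eta(k)\,\mathbf{y}^k \quad\text{whenever}\quad \mathbf{a}(k)^T \mathbf{y}^k \le b, \]
so a sample triggers a correction not only when it is misclassified but also when it falls inside the margin.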
28. Two-Category Linearly Separable Case cont.
29. Two-Category Linearly Separable Case cont.
30. Relaxation Procedures
- Note that different criterion functions exist
- One possible choice is Jq(a), given below
- Where Y is again the set of the training samples that are misclassified by a
- However, there are two problems with this criterion
- The function is too smooth and can converge to a = 0
- Jq is dominated by training samples with large magnitude
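The criterion is an image in the slides; the choice referred to here is, in its usual form, the squared-error criterion
\[ J_q(\mathbf{a}) = \sum_{\mathbf{y} \in \mathcal{Y}} (\mathbf{a}^T \mathbf{y})^2 , \]
whose gradient is continuous (unlike Jp's), but which suffers from the two problems listed above.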
31. Relaxation Procedures
- A modified version that avoids the above two problems is the relaxation criterion Jr(a), given below
- Here Y is the set of samples for which a^T y ≤ b
- Its gradient is also given below
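In the usual formulation (a reconstruction, since the slide shows images), the modified criterion and its gradient are
\[ J_r(\mathbf{a}) = \frac{1}{2} \sum_{\mathbf{y} \in \mathcal{Y}} \frac{(\mathbf{a}^T \mathbf{y} - b)^2}{\lVert \mathbf{y} \rVert^2}, \qquad \nabla J_r(\mathbf{a}) = \sum_{\mathbf{y} \in \mathcal{Y}} \frac{\mathbf{a}^T \mathbf{y} - b}{\lVert \mathbf{y} \rVert^2}\, \mathbf{y}. \]
The division by \(\lVert \mathbf{y} \rVert^2\) keeps large sample vectors from dominating, and the margin b keeps the solution away from a = 0.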
32. Two-Category Linearly Separable Case cont.
33. Two-Category Linearly Separable Case cont.
34. Two-Category Linearly Separable Case cont.
35. Two-Category Linearly Separable Case cont.
36. Linear Discriminant Functions cont.
- The multi-category case
- There is more than one way to devise multi-category classifiers using linear discriminant functions
- One against the rest (c linear discriminant functions)
- One against another (c(c-1)/2 linear discriminant functions)
37. Multi-category Case
38. Multi-category Case cont.
39. Multi-category Case cont.
- To avoid the problem of ambiguous regions, we define c linear discriminant functions and assign x to wi if gi(x) > gj(x) for all j ≠ i
- The resulting classifier is called a linear machine (sketched in code below)
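A minimal sketch of a linear machine classifier (the weight matrix, biases, and test point are assumed values for illustration):

```python
import numpy as np

def linear_machine_classify(x, W, w0):
    """Assign x to the class whose discriminant g_i(x) = W[i] @ x + w0[i] is largest.

    W:  (c, d) array with one weight vector per class
    w0: (c,) array of bias terms
    """
    g = W @ x + w0            # all c discriminant values at once
    return int(np.argmax(g))  # index i with g_i(x) > g_j(x) for all j != i

# Assumed 3-class, 2-D example
W = np.array([[ 1.0,  0.0],
              [-1.0,  1.0],
              [ 0.0, -1.0]])
w0 = np.array([0.0, 0.5, -0.5])
print(linear_machine_classify(np.array([2.0, 0.5]), W, w0))  # g = [2.0, -1.0, -1.0] -> class 0
```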
40. Multi-category Case cont.
41. Training a Linear Machine
- Suppose there are c classes and we have c discriminant functions
- If a training sample yk is from class i, then to classify it correctly, we need ai^T yk > aj^T yk for all j ≠ i
42. Training a Linear Machine cont.
- That is equivalent to classifying all of the following sample sets correctly in a two-class sense
- This is known as Kesler's construction (sketched below)
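A sketch of the construction in its standard formulation (the slide's own equations are images): stack the c weight vectors into one long vector \(\boldsymbol{\alpha} = [\mathbf{a}_1^T, \ldots, \mathbf{a}_c^T]^T\). For a sample \(\mathbf{y}_k\) from class i and each \(j \ne i\), build a cd-dimensional vector \(\boldsymbol{\eta}_{ij}\) (d being the dimension of the augmented samples) that has \(\mathbf{y}_k\) in block i, \(-\mathbf{y}_k\) in block j, and zeros elsewhere. Then
\[ \mathbf{a}_i^T \mathbf{y}_k > \mathbf{a}_j^T \mathbf{y}_k \;\Longleftrightarrow\; \boldsymbol{\alpha}^T \boldsymbol{\eta}_{ij} > 0, \]
so the c-class problem becomes a two-class problem on the n(c-1) constructed vectors.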
43. Training a Linear Machine cont.
- If a training sample yk is from class i and is misclassified, then there must be at least one j ≠ i such that aj^T yk ≥ ai^T yk
- Based on Kesler's construction, the fixed-increment perceptron rule is given below
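In its usual statement (reconstructed; the slide shows an image), if \(\mathbf{y}^k\) from class i is misclassified because \(\mathbf{a}_j^T \mathbf{y}^k \ge \mathbf{a}_i^T \mathbf{y}^k\), the rule corrects only the two weight vectors involved:
\[ \mathbf{a}_i(k+1) = \mathbf{a}_i(k) + \mathbf{y}^k, \qquad \mathbf{a}_j(k+1) = \mathbf{a}_j(k) - \mathbf{y}^k, \qquad \mathbf{a}_l(k+1) = \mathbf{a}_l(k) \ \text{ for } l \ne i, j. \]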
44. Two-Category Linearly Separable Case cont.
- What happens if the problem is not linearly separable?
- What can we do?
45. Minimum Squared Error Procedures
- Minimum squared error and pseudoinverse
- The problem is to find a weight vector a satisfying Ya = b
- If we have more equations than unknowns, the system for a is over-determined
- We want to choose the a that minimizes the sum-of-squared-error criterion function
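The criterion and its closed-form solution appear as images on the following slides; in the usual formulation the criterion is \( J_s(\mathbf{a}) = \lVert \mathbf{Y}\mathbf{a} - \mathbf{b} \rVert^2 \), minimized by \( \mathbf{a} = \mathbf{Y}^{\dagger}\mathbf{b} \), where \( \mathbf{Y}^{\dagger} = (\mathbf{Y}^T\mathbf{Y})^{-1}\mathbf{Y}^T \) is the pseudoinverse when \(\mathbf{Y}^T\mathbf{Y}\) is nonsingular. A minimal sketch (the data reuse the small assumed example above, and b = 1 is just one common choice of margin vector):

```python
import numpy as np

def mse_weights(Y, b):
    """Minimize Js(a) = ||Y a - b||^2 via the pseudoinverse: a = pinv(Y) @ b."""
    return np.linalg.pinv(Y) @ b     # least-squares solution, works even if Y^T Y is singular

# Assumed augmented, 'normalized' samples, as in the perceptron sketch above
Y = np.array([[ 1.0,  1.0,  2.0],
              [ 1.0,  2.0,  1.0],
              [-1.0,  1.0,  1.0],
              [-1.0,  2.0,  0.0]])
b = np.ones(len(Y))                  # common choice of margin vector
a = mse_weights(Y, b)
print(a, Y @ a)                      # Y a should be close to b if the data allow it
```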
46. Minimum Squared Error Procedures cont.
47. Minimum Squared Error Procedures cont.