1
Outline
  • Linear Discriminant Functions - continued

2
Linear Discriminant Functions
  • A linear discriminant function is a linear
    combination of the components of x, written as
    g(x) = w^T x + w0
  • Here w is the weight vector and w0 is the bias or
    threshold weight

3
Linear Discriminant Functions cont.
  • Two-category case
  • Decide w1 if g(x) > 0 and w2 if g(x) < 0
  • If g(x) = 0, then x is on the decision boundary
    and can be assigned to either class
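A minimal sketch of this two-category rule in NumPy (the weights and sample below are illustrative, not from the slides):

    import numpy as np

    def g(x, w, w0):
        """Linear discriminant: g(x) = w^T x + w0."""
        return w @ x + w0

    # Illustrative 2-D weights and sample
    w, w0 = np.array([1.0, -2.0]), 0.5
    x = np.array([3.0, 1.0])
    label = "w1" if g(x, w, w0) > 0 else "w2"  # g(x) = 0 lies on the boundary
    print(g(x, w, w0), label)                  # 1.5 w1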

4
Linear Discriminant Functions cont.
5
Augmented Vector
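The slide's equations are not in the transcript; a minimal sketch of the standard augmentation, assuming the convention y = (1, x1, ..., xd)^T and a = (w0, w)^T so that g(x) = a^T y:

    import numpy as np

    def augment(x):
        """Augmented feature vector y = (1, x1, ..., xd)."""
        return np.concatenate(([1.0], x))

    # With a = (w0, w), the discriminant becomes g(x) = a^T y
    x = np.array([3.0, 1.0])
    a = np.array([0.5, 1.0, -2.0])  # same illustrative w0 and w as above
    print(a @ augment(x))           # 1.5, identical to w^T x + w0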
6
Two-Category Linearly Separable Case
  • Given the forms of linear discriminant functions,
    we want to learn the weights using a set of n
    labeled samples
  • Linearly separable
  • If there is a weight vector that can classify all
    samples correctly, the samples are said to be
    linearly separable.

7
Two-Category Linearly Separable Case cont.
  • Weight space
  • Solution region

8
Two-Category Linearly Separable Case cont.
  • Margin
9
Perceptron Criterion
  • Perceptron criterion function
    Jp(a) = Σ_{y in Y(a)} (-a^T y)
  • Here Y(a) is the set of samples misclassified by
    a
  • Note that Jp(a) ≥ 0
  • Jp(a) = 0 if and only if no sample is
    misclassified
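A minimal sketch of this criterion, assuming (as in the usual normalization) that samples from w2 have been negated so every sample should satisfy a^T y > 0; the data are illustrative:

    import numpy as np

    def perceptron_criterion(a, Y):
        """Jp(a): sum of -a^T y over samples with a^T y <= 0 (misclassified)."""
        scores = Y @ a                 # one a^T y per row of Y
        return -scores[scores <= 0].sum()

    # Illustrative augmented samples; the w2 row is already negated
    Y = np.array([[ 1.0,  2.0,  1.0],
                  [ 1.0, -1.0,  3.0],
                  [-1.0,  0.5, -2.0]])
    a = np.array([0.0, 1.0, 1.0])
    print(perceptron_criterion(a, Y))  # 1.5: only the third sample is misclassified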

10
Taylor Series Expansion cont.


    F(x) ≈ F(x*) + ∇F(x*)^T (x - x*)
           + (1/2) (x - x*)^T ∇²F(x*) (x - x*)
Here ∇F(x*) is the gradient and ∇²F(x*) is the
Hessian of F evaluated at x = x*.
11
Iterative Optimization
Update the estimate step by step:
x_{k+1} = x_k + a_k p_k, or
Δx_k = x_{k+1} - x_k = a_k p_k
p_k - search direction
a_k - learning rate
12
Gradient Descent
Choose the next step so that the function
decreases.
For small changes in x we can approximate F(x):
F(x_{k+1}) = F(x_k + Δx_k) ≈ F(x_k) + g_k^T Δx_k,
where g_k = ∇F(x)|_{x = x_k}
If we want the function to decrease:
g_k^T Δx_k = a_k g_k^T p_k < 0
We can maximize the decrease by choosing
p_k = -g_k (steepest descent; see the sketch below)
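A minimal sketch of steepest descent on a simple quadratic (the function and step size are illustrative):

    import numpy as np

    def gradient_descent(grad, x0, lr=0.1, steps=100):
        """Steepest descent: x_{k+1} = x_k + lr * p_k with p_k = -grad(x_k)."""
        x = x0.astype(float)
        for _ in range(steps):
            x = x - lr * grad(x)
        return x

    # F(x) = (1/2) x^T A x has gradient A x and its minimum at the origin
    A = np.array([[2.0, 0.0],
                  [0.0, 10.0]])
    x_min = gradient_descent(lambda x: A @ x, np.array([2.0, 2.0]))
    print(x_min)  # close to [0, 0]; a larger lr can diverge (next slide)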
13
Example
14
Example cont.
15
Learning Rates Cannot Be Too Large
16
Gradient Descent
17
Gradient Descent cont.
  • How to choose the learning rate η(k)
  • Newton's algorithm (a sketch follows)
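The slide's update formula is not in the transcript; a minimal sketch of the standard Newton step x_{k+1} = x_k - H^{-1} g_k on the same illustrative quadratic:

    import numpy as np

    def newton_step(x, grad, hess):
        """Newton's update: x_{k+1} = x_k - H^{-1} g_k."""
        return x - np.linalg.solve(hess(x), grad(x))

    # On the quadratic F(x) = (1/2) x^T A x, one Newton step reaches the minimum
    A = np.array([[2.0, 0.0],
                  [0.0, 10.0]])
    x = newton_step(np.array([2.0, 2.0]), lambda x: A @ x, lambda x: A)
    print(x)  # [0, 0]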

18
Example
19
Newton's Solution
20
Two-Category Linearly Separable Case cont.
21
Comparison of Gradient Descent and Newton's Method
22
Two-Category Linearly Separable Case cont.
  • Perceptron criterion function
    Jp(a) = Σ_{y in Y(a)} (-a^T y)
  • Here Y(a) is the set of samples misclassified by
    a.
  • The gradient of Jp(a) is
    ∇Jp(a) = Σ_{y in Y(a)} (-y)
  • The update rule is
    a(k+1) = a(k) + η(k) Σ_{y in Y(a(k))} y
    (a sketch follows)
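A minimal sketch of this batch perceptron update on normalized augmented samples (data and η are illustrative):

    import numpy as np

    def batch_perceptron(Y, eta=1.0, max_iters=1000):
        """Repeat a <- a + eta * (sum of misclassified y) until Jp(a) = 0."""
        a = np.zeros(Y.shape[1])
        for _ in range(max_iters):
            mis = Y[Y @ a <= 0]        # samples with a^T y <= 0
            if len(mis) == 0:
                break                  # every sample classified correctly
            a = a + eta * mis.sum(axis=0)
        return a

    Y = np.array([[ 1.0,  2.0, 1.0],   # class w1 samples, augmented
                  [ 1.0, -1.0, 3.0],
                  [-1.0, -0.5, 2.0]])  # class w2 sample, negated
    a = batch_perceptron(Y)
    print(a, Y @ a)                    # all scores positive if separable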

23
Two-Category Linearly Separable Case cont.
24
Two-Category Linearly Separable Case cont.
25
Two-Category Linearly Separable Case cont.
Perceptron Convergence Theorem: if the training
samples are linearly separable, then the sequence
of weight vectors generated by the above algorithm
will terminate at a solution vector in a finite
number of steps.
27
Two-Category Linearly Separable Case cont.
  • Some Direct Generalizations
  • Variable increment and a margin (see the sketch
    below)
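A minimal sketch of the variable-increment rule with margin b, which updates on any sample with a^T y ≤ b (the margin, increment schedule, and data are illustrative):

    import numpy as np

    def margin_perceptron(Y, b=1.0, max_epochs=100):
        """Single-sample variable-increment rule:
        a <- a + eta_k * y whenever a^T y <= b."""
        a = np.zeros(Y.shape[1])
        k = 0
        for _ in range(max_epochs):
            updated = False
            for y in Y:
                if a @ y <= b:              # margin violated
                    k += 1
                    a = a + (1.0 / k) * y   # one admissible schedule for eta_k
                    updated = True
            if not updated:
                break
        return a

    # Same illustrative normalized samples as in the perceptron sketch
    Y = np.array([[ 1.0,  2.0, 1.0],
                  [ 1.0, -1.0, 3.0],
                  [-1.0, -0.5, 2.0]])
    print(margin_perceptron(Y))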

28
Two-Category Linearly Separable Case cont.
29
Two-Category Linearly Separable Case cont.
30
Relaxation Procedures
  • Note that different criterion functions exist
  • One possible choice is
    Jq(a) = Σ_{y in Y} (a^T y)²
  • Where Y is again the set of the training samples
    that are misclassified by a
  • However, there are two problems with this
    criterion:
  • The function is too smooth and can converge to
    a = 0
  • Jq is dominated by the training samples of
    largest magnitude

31
Relaxation Procedures
  • A modified version that avoids the above two
    problems is
    Jr(a) = (1/2) Σ_{y in Y} (a^T y - b)² / ||y||²
  • Here Y is the set of samples for which
    a^T y ≤ b
  • Its gradient is given by
    ∇Jr(a) = Σ_{y in Y} ((a^T y - b) / ||y||²) y
    (a sketch of the resulting update follows)
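A minimal sketch of the single-sample relaxation rule with margin, a <- a + η (b - a^T y) y / ||y||² (data and η are illustrative):

    import numpy as np

    def relaxation_update(a, y, b=1.0, eta=1.5):
        """Move a toward the hyperplane a^T y = b when the margin is violated."""
        if a @ y <= b:
            a = a + eta * (b - a @ y) / (y @ y) * y
        return a

    # One illustrative update from a = 0
    a = np.zeros(3)
    y = np.array([1.0, 2.0, 1.0])
    a = relaxation_update(a, y)
    print(a @ y)  # 1.5: eta = 1.5 over-relaxes past the margin b = 1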

32
Two-Category Linearly Separable Case cont.
33
Two-Category Linearly Separable Case cont.
34
Two-Category Linearly Separable Case cont.
35
Two-Category Linearly Separable Case cont.
36
Linear Discriminant Functions cont.
  • The multi-category case
  • There is more than one way to devise
    multi-category classifiers using linear
    discriminant functions:
  • One against the rest (c linear discriminant
    functions)
  • One against another (c(c-1)/2 linear
    discriminant functions)

37
Multi-category Case
38
Multi-category Case cont.
39
Multi-category Case cont.
  • To avoid the problem of ambiguous regions, we
    define c linear discriminant functions and
    assign x to wi if gi(x) > gj(x) for all j ≠ i.
  • The resulting classifier is called a linear
    machine (see the sketch below)
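A minimal sketch of a linear machine's decision rule (the weights and sample are illustrative):

    import numpy as np

    def linear_machine_predict(A, y):
        """Assign y to the class i maximizing g_i(y) = a_i^T y
        (the rows of A are the weight vectors a_i)."""
        return int(np.argmax(A @ y))

    A = np.array([[0.0,  1.0,  0.0],   # a_1, a_2, a_3 for a 3-class machine
                  [0.0, -1.0,  1.0],
                  [0.5,  0.0, -1.0]])
    y = np.array([1.0, 2.0, -1.0])     # augmented sample
    print(linear_machine_predict(A, y))   # 0: g_1(y) = 2 is the largest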

40
Multi-category Case cont.
41
Training a Linear Machine
  • Suppose there are c classes and we have c
    discriminant functions
  • If a training sample yk is from class i, then to
    classify it correctly, we need
    gi(yk) > gj(yk) for all j ≠ i

42
Training a Linear Machine cont.
  • That is equivalent to classifying all of the
    following samples correctly in a two-class
    sense: for each j ≠ i, build the cd-dimensional
    vector η_ij with yk in block i, -yk in block j,
    and zeros elsewhere; with α = (a1^T, ..., ac^T)^T,
    correct classification requires α^T η_ij > 0
  • This is known as Kesler's construction (see the
    sketch below)
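A minimal sketch of the construction (dimensions and sample are illustrative):

    import numpy as np

    def kesler_vectors(y, i, c):
        """For a sample y from class i, build the c-1 vectors eta_ij (j != i):
        y in block i, -y in block j, zeros elsewhere."""
        d = len(y)
        vecs = []
        for j in range(c):
            if j == i:
                continue
            eta = np.zeros(c * d)
            eta[i * d:(i + 1) * d] = y
            eta[j * d:(j + 1) * d] = -y
            vecs.append(eta)
        return vecs

    y = np.array([1.0, 2.0, -1.0])     # augmented sample from class i = 0
    for eta in kesler_vectors(y, i=0, c=3):
        print(eta)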

43
Training a Linear Machine cont.
  • If a training sample yk is from class i and is
    misclassified, then there must be at least one
    j ≠ i such that gj(yk) ≥ gi(yk)
  • Based on Kesler's construction, the fixed
    increment Perceptron rule is
    a_i(k+1) = a_i(k) + yk, a_j(k+1) = a_j(k) - yk
    (the other weight vectors are unchanged; a
    sketch follows)
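A minimal sketch of this multi-category fixed-increment update (the data are illustrative):

    import numpy as np

    def fixed_increment_update(A, y, i):
        """If y (from class i) is misclassified, add y to a_i and
        subtract it from the most offending a_j, j != i."""
        scores = A @ y
        rivals = [j for j in range(len(A)) if j != i]
        j = max(rivals, key=lambda r: scores[r])
        if scores[j] >= scores[i]:     # misclassified (ties count)
            A[i] += y
            A[j] -= y
        return A

    A = np.zeros((3, 3))               # three classes, augmented dimension 3
    y = np.array([1.0, 2.0, -1.0])     # illustrative sample from class 0
    A = fixed_increment_update(A, y, i=0)
    print(A @ y)                       # [6, -6, 0]: g_0 now wins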

44
Two-Category Linearly Separable Case cont.
  • What happens if the problem is not linearly
    separable?
  • What can we do?

45
Minimum Squared Error Procedures
  • Minimum squared error and pseudoinverse
  • The problem is to find a weight vector a
    satisfying Ya = b
  • If we have more equations than unknowns, the
    system is over-determined and an exact solution
    usually does not exist.
  • We want to choose the a that minimizes the
    sum-of-squared-error criterion function
    Js(a) = ||Ya - b||²
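A minimal sketch of the MSE solution a = Y⁺b (the samples and margin vector b are illustrative):

    import numpy as np

    # Illustrative normalized, augmented samples (rows of Y) and margins b
    Y = np.array([[ 1.0,  2.0,  1.0],
                  [ 1.0, -1.0,  3.0],
                  [-1.0, -0.5,  2.0],
                  [-1.0,  1.0,  1.5]])
    b = np.ones(4)

    a = np.linalg.pinv(Y) @ b  # equals (Y^T Y)^{-1} Y^T b when Y^T Y is nonsingular
    print(a, np.linalg.norm(Y @ a - b) ** 2)  # a minimizes Js(a) = ||Ya - b||²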

46
Minimum Squared Error Procedures cont.
  • Pseudoinverse: Y⁺ = (Y^T Y)^{-1} Y^T when Y^T Y
    is nonsingular, giving the MSE solution a = Y⁺b

47
Minimum Squared Error Procedures cont.