Title: Outline
1. Outline
- Linear Discriminant Functions (continued)
2. Linear Discriminant Functions
- A linear discriminant function is a linear combination of its components, written as g(x) = w^T x + w0
- Here w is the weight vector and w0 is the bias or threshold weight
3. Linear Discriminant Functions cont.
- Two-category case
- Decide w1 if g(x) > 0 and w2 if g(x) < 0
- If g(x) = 0, then x is on the decision boundary and can be assigned to either class
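A minimal sketch of this two-category rule (the weight vector, bias, and test point below are assumed values, purely for illustration):

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant g(x) = w^T x + w0."""
    return w @ x + w0

def decide(x, w, w0):
    """Decide w1 if g(x) > 0, w2 if g(x) < 0; the boundary case can go either way."""
    value = g(x, w, w0)
    if value > 0:
        return "w1"
    if value < 0:
        return "w2"
    return "boundary"  # g(x) = 0: x may be assigned to either class

# Assumed 2-D example
w = np.array([1.0, -2.0])
w0 = 0.5
print(decide(np.array([3.0, 1.0]), w, w0))  # g = 3 - 2 + 0.5 = 1.5 > 0 -> w1
```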
4. Linear Discriminant Functions cont.
5. Augmented Vector
6. Two-Category Linearly Separable Case
- Given the forms of linear discriminant functions, we want to learn the weights using a set of n labeled samples
- Linearly separable
- If there is a weight vector that can classify all samples correctly, the samples are said to be linearly separable.
7. Two-Category Linearly Separable Case cont.
- Weight space
- Solution region
8. Two-Category Linearly Separable Case cont.
- Margin
9. Perceptron Criterion
- Perceptron criterion function
- Here Y(a) is the set of samples misclassified by a
- Note that Jp(a) ≥ 0
- Jp(a) = 0 if and only if no sample is misclassified
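The criterion function itself is an image on the slide; in its standard form (assuming samples from the second class have been negated, i.e. "normalized") it is
\[ J_p(\mathbf{a}) = \sum_{\mathbf{y} \in \mathcal{Y}(\mathbf{a})} \left(-\mathbf{a}^T \mathbf{y}\right), \]
so every misclassified sample contributes a positive term, which is why Jp(a) ≥ 0 with equality exactly when nothing is misclassified.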
10. Taylor Series Expansion cont.
\[
F(\mathbf{x}) = F(\mathbf{x}^*) + \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*} (\mathbf{x}-\mathbf{x}^*)
+ \tfrac{1}{2} (\mathbf{x}-\mathbf{x}^*)^T \, \nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}^*} (\mathbf{x}-\mathbf{x}^*) + \cdots
\]
- \(\nabla F\): Gradient
- \(\nabla^2 F\): Hessian
11. Iterative Optimization
- pk: search direction
- ak: learning rate
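The update formulas on this slide are images; in the usual notation a generic iterative optimization step is
\[ \mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{p}_k \qquad\text{or, equivalently,}\qquad \Delta\mathbf{x}_k = \mathbf{x}_{k+1} - \mathbf{x}_k = \alpha_k \mathbf{p}_k, \]
with \(\mathbf{p}_k\) the search direction and \(\alpha_k\) the learning rate (the slide's ak).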
12. Gradient Descent
Choose the next step so that the function decreases. For small changes in x we can approximate F(x) by its first-order expansion about the current point. If we want the function to decrease, this puts a condition on the search direction, and we can maximize the decrease by choosing the direction opposite to the gradient (the equations are reconstructed below).
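The equations on this slide are images; the standard first-order argument runs as follows (a reconstruction, not a transcription):
\[ F(\mathbf{x}_{k+1}) \approx F(\mathbf{x}_k) + \mathbf{g}_k^T \Delta\mathbf{x}_k, \qquad \mathbf{g}_k \equiv \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}. \]
For the function to decrease we need \( \mathbf{g}_k^T \Delta\mathbf{x}_k = \alpha_k\, \mathbf{g}_k^T \mathbf{p}_k < 0 \), and for a fixed step length the decrease is largest when \( \mathbf{p}_k = -\mathbf{g}_k \), giving the steepest-descent update \( \mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k\, \mathbf{g}_k \).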
13. Example
14. Example cont.
15. Learning Rates Cannot Be Too Large
16. Gradient Descent
17. Gradient Descent cont.
- How to choose the learning rate η(k)
- Newton's algorithm
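The formulas on this slide are also images; a standard reconstruction, following the usual second-order treatment: expanding J to second order and minimizing over the step size gives the learning rate
\[ \eta(k) = \frac{\lVert \nabla J \rVert^2}{\nabla J^T \mathbf{H}\, \nabla J}, \]
where H is the Hessian of J, while Newton's algorithm instead takes the full second-order step
\[ \mathbf{a}(k+1) = \mathbf{a}(k) - \mathbf{H}^{-1} \nabla J. \]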
18. Example
19. Newton's Solution
20. Two-Category Linearly Separable Case cont.
21. Comparison of Gradient Descent and Newton's Method
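A tiny numerical sketch of this comparison (the quadratic function, the matrix A, and the starting point are assumptions chosen for illustration): on an ill-conditioned quadratic, gradient descent needs many small steps, while Newton's method reaches the minimum in one step.

```python
import numpy as np

# Quadratic test function F(x) = 0.5 * x^T A x (A and the start point are assumed)
A = np.array([[2.0, 0.0],
              [0.0, 50.0]])          # ill-conditioned Hessian
grad = lambda x: A @ x               # gradient of F
x_gd = x_nt = np.array([1.0, 1.0])

# Gradient descent: many small steps, stable rate limited by the largest eigenvalue of A
eta = 1.0 / np.max(np.linalg.eigvalsh(A))
for _ in range(100):
    x_gd = x_gd - eta * grad(x_gd)

# Newton's method: uses the Hessian A, reaches the minimum of a quadratic in one step
x_nt = x_nt - np.linalg.solve(A, grad(x_nt))

print(x_gd)   # close to [0, 0] only after many iterations
print(x_nt)   # exactly [0, 0]
```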
22. Two-Category Linearly Separable Case cont.
- Perceptron criterion function
- Here Y(a) is the set of samples misclassified by a
- The gradient of Jp(a) and the resulting update rule are given below
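The gradient and the update rule are images on the slide; in the standard form they are
\[ \nabla J_p(\mathbf{a}) = \sum_{\mathbf{y} \in \mathcal{Y}(\mathbf{a})} (-\mathbf{y}), \qquad \mathbf{a}(k+1) = \mathbf{a}(k) + \eta(k) \sum_{\mathbf{y} \in \mathcal{Y}(\mathbf{a}(k))} \mathbf{y}, \]
i.e. the batch perceptron adds a multiple of the sum of the misclassified samples to the current weight vector.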
23. Two-Category Linearly Separable Case cont.
24. Two-Category Linearly Separable Case cont.
25. Two-Category Linearly Separable Case cont.
Perceptron Convergence Theorem: If the training samples are linearly separable, then the sequence of weight vectors generated by the above algorithm terminates at a solution vector in a finite number of steps.
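A minimal sketch of a fixed-increment single-sample perceptron of the kind the theorem refers to (assuming the samples are augmented and that samples from the second class have been negated, i.e. "normalized"; the tiny data set is made up for illustration):

```python
import numpy as np

def fixed_increment_perceptron(Y, max_epochs=1000):
    """Fixed-increment single-sample perceptron.

    Y: (n, d) array of augmented, 'normalized' samples (class-2 samples already
       multiplied by -1), so a solution vector a satisfies a^T y > 0 for every row y.
    """
    a = np.zeros(Y.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for y in Y:                 # present samples one at a time
            if a @ y <= 0:          # y is misclassified
                a = a + y           # fixed increment: eta = 1
                errors += 1
        if errors == 0:             # every sample now satisfies a^T y > 0
            return a
    return a                        # may not have converged if not linearly separable

# Assumed data: class 1 = {(1,2), (2,1)}, class 2 = {(-1,-1), (-2,0)}
X1 = np.array([[1.0, 2.0], [2.0, 1.0]])
X2 = np.array([[-1.0, -1.0], [-2.0, 0.0]])
Y = np.vstack([np.hstack([np.ones((2, 1)), X1]),      # augment with a leading 1
               -np.hstack([np.ones((2, 1)), X2])])    # negate the class-2 samples
a = fixed_increment_perceptron(Y)
print(a, (Y @ a > 0).all())        # a solution vector; all samples on the positive side
```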
27. Two-Category Linearly Separable Case cont.
- Some Direct Generalizations
- Variable increment and a margin
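The rule itself is shown as an image; the usual variable-increment rule with a margin b updates
\[ \mathbf{a}(k+1) = \mathbf{a}(k) + \eta(k)\,\mathbf{y}^k \quad\text{whenever}\quad \mathbf{a}(k)^T \mathbf{y}^k \le b, \]
so a sample triggers a correction not only when it is misclassified but also when it falls inside the margin.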
28. Two-Category Linearly Separable Case cont.
29. Two-Category Linearly Separable Case cont.
30. Relaxation Procedures
- Note that different criterion functions exist
- One possible choice is Jq(a), given below
- Where Y is again the set of the training samples that are misclassified by a
- However, there are two problems with this criterion
- The function is too smooth and can converge to a = 0
- Jq is dominated by training samples with large magnitude
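The criterion is an image in the slides; the choice referred to here is, in its usual form, the squared-error criterion
\[ J_q(\mathbf{a}) = \sum_{\mathbf{y} \in \mathcal{Y}} (\mathbf{a}^T \mathbf{y})^2 , \]
whose gradient is continuous (unlike Jp's), but which suffers from the two problems listed above.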
31. Relaxation Procedures
- A modified version that avoids the above two problems is the relaxation criterion Jr(a), given below
- Here Y is the set of samples for which a^T y ≤ b
- Its gradient is also given below
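In the usual formulation (a reconstruction, since the slide shows images), the modified criterion and its gradient are
\[ J_r(\mathbf{a}) = \frac{1}{2} \sum_{\mathbf{y} \in \mathcal{Y}} \frac{(\mathbf{a}^T \mathbf{y} - b)^2}{\lVert \mathbf{y} \rVert^2}, \qquad \nabla J_r(\mathbf{a}) = \sum_{\mathbf{y} \in \mathcal{Y}} \frac{\mathbf{a}^T \mathbf{y} - b}{\lVert \mathbf{y} \rVert^2}\, \mathbf{y}. \]
The division by \(\lVert \mathbf{y} \rVert^2\) keeps large sample vectors from dominating, and the margin b keeps the solution away from a = 0.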
32. Two-Category Linearly Separable Case cont.
33. Two-Category Linearly Separable Case cont.
34. Two-Category Linearly Separable Case cont.
35. Two-Category Linearly Separable Case cont.
36. Linear Discriminant Functions cont.
- The multi-category case
- There is more than one way to devise multi-category classifiers using linear discriminant functions
- One against the rest (c linear discriminant functions)
- One against another (c(c-1)/2 linear discriminant functions)
37. Multi-category Case
38. Multi-category Case cont.
39. Multi-category Case cont.
- To avoid the problem of ambiguous regions, we define c linear discriminant functions and assign x to wi if gi(x) > gj(x) for all j ≠ i
- The resulting classifier is called a linear machine (sketched in code below)
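A minimal sketch of a linear machine classifier (the weight matrix, biases, and test point are assumed values for illustration):

```python
import numpy as np

def linear_machine_classify(x, W, w0):
    """Assign x to the class whose discriminant g_i(x) = W[i] @ x + w0[i] is largest.

    W:  (c, d) array with one weight vector per class
    w0: (c,) array of bias terms
    """
    g = W @ x + w0            # all c discriminant values at once
    return int(np.argmax(g))  # index i with g_i(x) > g_j(x) for all j != i

# Assumed 3-class, 2-D example
W = np.array([[ 1.0,  0.0],
              [-1.0,  1.0],
              [ 0.0, -1.0]])
w0 = np.array([0.0, 0.5, -0.5])
print(linear_machine_classify(np.array([2.0, 0.5]), W, w0))  # g = [2.0, -1.0, -1.0] -> class 0
```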
40. Multi-category Case cont.
41. Training a Linear Machine
- Suppose there are c classes and we have c discriminant functions
- If a training sample yk is from class i, then to classify it correctly, we need ai^T yk > aj^T yk for all j ≠ i
42. Training a Linear Machine cont.
- That is equivalent to classifying all of the following sample sets correctly in a two-class sense
- This is known as Kesler's construction (sketched below)
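A sketch of the construction in its standard formulation (the slide's own equations are images): stack the c weight vectors into one long vector \(\boldsymbol{\alpha} = [\mathbf{a}_1^T, \ldots, \mathbf{a}_c^T]^T\). For a sample \(\mathbf{y}_k\) from class i and each \(j \ne i\), build a cd-dimensional vector \(\boldsymbol{\eta}_{ij}\) (d being the dimension of the augmented samples) that has \(\mathbf{y}_k\) in block i, \(-\mathbf{y}_k\) in block j, and zeros elsewhere. Then
\[ \mathbf{a}_i^T \mathbf{y}_k > \mathbf{a}_j^T \mathbf{y}_k \;\Longleftrightarrow\; \boldsymbol{\alpha}^T \boldsymbol{\eta}_{ij} > 0, \]
so the c-class problem becomes a two-class problem on the n(c-1) constructed vectors.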
43. Training a Linear Machine cont.
- If a training sample yk is from class i and is misclassified, then there must be at least one j ≠ i such that aj^T yk ≥ ai^T yk
- Based on Kesler's construction, the fixed-increment perceptron rule is given below
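In its usual statement (reconstructed; the slide shows an image), if \(\mathbf{y}^k\) from class i is misclassified because \(\mathbf{a}_j^T \mathbf{y}^k \ge \mathbf{a}_i^T \mathbf{y}^k\), the rule corrects only the two weight vectors involved:
\[ \mathbf{a}_i(k+1) = \mathbf{a}_i(k) + \mathbf{y}^k, \qquad \mathbf{a}_j(k+1) = \mathbf{a}_j(k) - \mathbf{y}^k, \qquad \mathbf{a}_l(k+1) = \mathbf{a}_l(k) \ \text{ for } l \ne i, j. \]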
44. Two-Category Linearly Separable Case cont.
- What happens if the problem is not linearly separable?
- What can we do?
45. Minimum Squared Error Procedures
- Minimum squared error and pseudoinverse
- The problem is to find a weight vector a satisfying Ya = b
- If we have more equations than unknowns, the system for a is over-determined
- We want to choose the a that minimizes the sum-of-squared-error criterion function
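The criterion and its closed-form solution appear as images on the following slides; in the usual formulation the criterion is \( J_s(\mathbf{a}) = \lVert \mathbf{Y}\mathbf{a} - \mathbf{b} \rVert^2 \), minimized by \( \mathbf{a} = \mathbf{Y}^{\dagger}\mathbf{b} \), where \( \mathbf{Y}^{\dagger} = (\mathbf{Y}^T\mathbf{Y})^{-1}\mathbf{Y}^T \) is the pseudoinverse when \(\mathbf{Y}^T\mathbf{Y}\) is nonsingular. A minimal sketch (the data reuse the small assumed example above, and b = 1 is just one common choice of margin vector):

```python
import numpy as np

def mse_weights(Y, b):
    """Minimize Js(a) = ||Y a - b||^2 via the pseudoinverse: a = pinv(Y) @ b."""
    return np.linalg.pinv(Y) @ b     # least-squares solution, works even if Y^T Y is singular

# Assumed augmented, 'normalized' samples, as in the perceptron sketch above
Y = np.array([[ 1.0,  1.0,  2.0],
              [ 1.0,  2.0,  1.0],
              [-1.0,  1.0,  1.0],
              [-1.0,  2.0,  0.0]])
b = np.ones(len(Y))                  # common choice of margin vector
a = mse_weights(Y, b)
print(a, Y @ a)                      # Y a should be close to b if the data allow it
```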
46. Minimum Squared Error Procedures cont.
47. Minimum Squared Error Procedures cont.