Title: Outline
- Linear Discriminant Functions - continued
2Linear Discriminant Functions
- A linear discriminant function is a linear
combination of the components of x, written as $g(x) = w^T x + w_0$
- Here w is the weight vector and w0 is the bias or
threshold weight
3Linear Discriminant Functions cont.
- Two-category case
- Decide ω1 if g(x) > 0 and ω2 if g(x) < 0
- If g(x) = 0, then x is on the decision boundary
and can be assigned to either class
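The two-category rule can be sketched in a few lines of Python. The weights and test points are hypothetical values chosen for illustration, and the function names `g` and `decide` are not from the slides.

```python
def g(x, w, w0):
    """Linear discriminant g(x) = w^T x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def decide(x, w, w0):
    """Return 1 for class omega_1, 2 for class omega_2, 0 on the boundary."""
    value = g(x, w, w0)
    if value > 0:
        return 1
    if value < 0:
        return 2
    return 0  # on the decision boundary: either class may be assigned


# Hypothetical weight vector and bias, for illustration only.
w, w0 = [1.0, -1.0], 0.5
labels = [decide(x, w, w0) for x in ([2.0, 1.0], [0.0, 2.0], [0.5, 1.0])]
print(labels)
```

The third point satisfies g(x) = 0, so the sketch reports it as lying on the boundary rather than forcing a class.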
4Linear Discriminant Functions cont.
5Augmented Vector
6Two-Category Linearly Separable Case
- Given the forms of linear discriminant functions,
we want to learn the weights using a set of n
labeled samples
- Linearly separable
- If there is a weight vector that classifies all
samples correctly, the samples are said to be
linearly separable.
7Two-Category Linearly Separable Case cont.
- Weight space
- Solution region
8Two-Category Linearly Separable Case cont.
- Margin
9Perceptron Criterion
- Perceptron criterion function: $J_p(a) = \sum_{y \in Y(a)} (-a^T y)$
- Here Y(a) is the set of samples misclassified by a
- Note that $J_p(a) \ge 0$
- $J_p(a) = 0$ if and only if no sample is misclassified
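A minimal sketch of the perceptron criterion, assuming the usual convention that every sample is an augmented vector y = (1, x) and that samples from the second class have been negated, so a sample counts as misclassified when a^T y <= 0. The data below are hypothetical.

```python
def dot(a, y):
    """Inner product a^T y."""
    return sum(ai * yi for ai, yi in zip(a, y))


def perceptron_criterion(a, samples):
    """J_p(a): sum of -a^T y over the samples with a^T y <= 0."""
    return sum(-dot(a, y) for y in samples if dot(a, y) <= 0)


# Hypothetical normalized samples: two from class 1, one (negated) from class 2.
samples = [[1.0, 2.0, 1.0], [1.0, 1.0, 3.0], [-1.0, 1.0, -1.0]]

print(perceptron_criterion([0.0, 1.0, 0.0], samples))   # no sample misclassified
print(perceptron_criterion([0.0, -1.0, 0.0], samples))  # every sample misclassified
```

The criterion is never negative, and it is zero exactly when the misclassified set is empty.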
10Taylor Series Expansion cont.
11Iterative Optimization
$x_{k+1} = x_k + \alpha_k p_k$
$p_k$ - Search Direction
$\alpha_k$ - Learning Rate
12Gradient Descent
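Choose the next step so that the function decreases: $F(x_{k+1}) < F(x_k)$
For small changes in x we can approximate F(x) to first order: $F(x_{k+1}) = F(x_k + \Delta x_k) \approx F(x_k) + \nabla F(x_k)^T \Delta x_k$, with $\Delta x_k = \alpha_k p_k$
If we want the function to decrease, we need $\alpha_k \nabla F(x_k)^T p_k < 0$
We can maximize the decrease by choosing $p_k = -\nabla F(x_k)$, which gives $x_{k+1} = x_k - \alpha_k \nabla F(x_k)$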
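The steepest-descent iteration can be sketched as follows. The quadratic F(x) = x1^2 + 25 x2^2 is a hypothetical test function, not necessarily the one used in the slides. Its Hessian has largest eigenvalue 50, so a fixed learning rate must stay below 2/50 = 0.04 for the iteration to remain stable.

```python
def F(x):
    """Hypothetical test function F(x) = x1^2 + 25 * x2^2."""
    return x[0] ** 2 + 25.0 * x[1] ** 2


def grad_F(x):
    """Gradient of F."""
    return [2.0 * x[0], 50.0 * x[1]]


def gradient_descent(x0, alpha, steps):
    """Iterate x_{k+1} = x_k - alpha * grad F(x_k) with a fixed learning rate."""
    x = list(x0)
    for _ in range(steps):
        g = grad_F(x)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x


start = [0.5, 0.5]
x_small = gradient_descent(start, alpha=0.01, steps=1000)  # below the 0.04 limit
x_large = gradient_descent(start, alpha=0.05, steps=20)    # above the 0.04 limit

print(F(start), F(x_small), F(x_large))
```

With the smaller rate the function value shrinks toward zero. With the larger rate the x2 component is multiplied by -1.5 at every step, so the iterates diverge.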
14Example cont.
15Learning Rates Cannot Be Too Large
16Gradient Descent
17Gradient Descent cont.
- How to choose the learning rate η(k)
- Newton's algorithm
19Newton's Solution
20Two-Category Linearly Separable Case cont.
21Comparison of Gradient Descent and Newton's Method
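To make the comparison concrete, here is a small sketch on a hypothetical quadratic, F(x) = x1^2 + 2 x1 x2 + 2 x2^2 + x1, whose minimum is at (-1, 0.5). Newton's method uses the standard update x_{k+1} = x_k - H^{-1} grad F(x_k), where H is the Hessian. On a quadratic it reaches the minimum in one step, while gradient descent approaches it gradually. The function and the helper names are assumptions made for illustration.

```python
HESSIAN = [[2.0, 2.0], [2.0, 4.0]]  # constant Hessian of the quadratic F


def grad(x):
    """Gradient of F(x) = x1^2 + 2*x1*x2 + 2*x2^2 + x1."""
    return [2.0 * x[0] + 2.0 * x[1] + 1.0, 2.0 * x[0] + 4.0 * x[1]]


def solve_2x2(A, b):
    """Solve the 2x2 system A s = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def newton_step(x):
    """One Newton update: x - H^{-1} grad F(x)."""
    step = solve_2x2(HESSIAN, grad(x))
    return [xi - si for xi, si in zip(x, step)]


def gradient_step(x, alpha):
    """One gradient-descent update: x - alpha * grad F(x)."""
    return [xi - alpha * gi for xi, gi in zip(x, grad(x))]


def distance_to_minimum(x):
    return ((x[0] + 1.0) ** 2 + (x[1] - 0.5) ** 2) ** 0.5


x_newton = newton_step([0.5, 0.5])
x_gd = [0.5, 0.5]
for _ in range(10):
    x_gd = gradient_step(x_gd, alpha=0.1)

print(distance_to_minimum(x_newton), distance_to_minimum(x_gd))
```

The trade-off is cost per step: Newton's method needs the Hessian and a linear solve, whereas gradient descent needs only the gradient.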
22Two-Category Linearly Separable Case cont.
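- Perceptron criterion function: $J_p(a) = \sum_{y \in Y(a)} (-a^T y)$
- Here Y(a) is the set of samples misclassified by a.
- The gradient of Jp(a) is $\nabla J_p(a) = \sum_{y \in Y(a)} (-y)$
- The update rule is $a(k+1) = a(k) + \eta(k) \sum_{y \in Y_k} y$, where $Y_k$ is the set of samples misclassified by a(k)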
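A sketch of the batch update, assuming a constant learning rate and normalized augmented samples (class-2 samples negated), so that a sample is misclassified when a^T y <= 0. Each iteration adds the learning rate times the sum of the currently misclassified samples to a. The data and the iteration cap are hypothetical.

```python
def dot(a, y):
    """Inner product a^T y."""
    return sum(ai * yi for ai, yi in zip(a, y))


def batch_perceptron(samples, eta=1.0, max_iters=1000):
    """Batch perceptron: a <- a + eta * (sum of misclassified samples)."""
    a = [0.0] * len(samples[0])
    for k in range(max_iters):
        misclassified = [y for y in samples if dot(a, y) <= 0]
        if not misclassified:
            return a, k  # every sample satisfies a^T y > 0
        for i in range(len(a)):
            a[i] += eta * sum(y[i] for y in misclassified)
    return a, max_iters  # cap reached without finding a solution


# Hypothetical normalized samples: two from class 1, one (negated) from class 2.
samples = [[1.0, 2.0, 1.0], [1.0, 1.0, 3.0], [-1.0, 1.0, -1.0]]
a, iterations = batch_perceptron(samples)
print(a, iterations)
```

The iteration cap is a safeguard added for the sketch. On data that are not linearly separable the loop would otherwise never stop.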
23Two-Category Linearly Separable Case cont.
24Two-Category Linearly Separable Case cont.
25Two-Category Linearly Separable Case cont.
Perceptron Convergence Theorem: If the training
samples are linearly separable, then the sequence
of weight vectors generated by the above algorithm
will terminate at a solution vector after a finite
number of steps.
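The theorem can be illustrated with a sketch of the fixed-increment single-sample rule, assuming this is the algorithm the slide refers to. It cycles through normalized augmented samples and adds a sample to a whenever a^T y <= 0. The data are hypothetical and linearly separable, so the loop stops after a finite number of corrections.

```python
def dot(a, y):
    """Inner product a^T y."""
    return sum(ai * yi for ai, yi in zip(a, y))


def fixed_increment_perceptron(samples, max_passes=1000):
    """Cycle through the samples; correct a with each misclassified sample."""
    a = [0.0] * len(samples[0])
    updates = 0
    for _ in range(max_passes):
        changed = False
        for y in samples:
            if dot(a, y) <= 0:  # misclassified (or on the boundary)
                a = [ai + yi for ai, yi in zip(a, y)]
                updates += 1
                changed = True
        if not changed:  # a full pass without corrections: solution found
            return a, updates
    return a, updates


# Hypothetical normalized samples: two from class 1, two (negated) from class 2.
samples = [[1.0, 2.0, 1.0], [1.0, 1.0, 3.0],
           [-1.0, 1.0, -1.0], [-1.0, 0.0, 1.0]]
a, updates = fixed_increment_perceptron(samples)
print(a, updates)
```

The theorem bounds the number of corrections, not the number of passes. The `max_passes` cap only matters for non-separable data.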
27Two-Category Linearly Separable Case cont.
- Some Direct Generalizations
- Variable increment and a margin
28Two-Category Linearly Separable Case cont.
29Two-Category Linearly Separable Case cont.
30Relaxation Procedures
- Note that different criterion functions exist
- One possible choice is $J_q(a) = \sum_{y \in Y} (a^T y)^2$
- Where Y is again the set of the training samples
that are misclassified by a
- However, there are two problems with this
criterion
- The function is so smooth near the boundary of the
solution region that the iteration can converge to a
boundary point, in particular a = 0
- Jq is dominated by training samples with large norms
31Relaxation Procedures
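- A modified version that avoids the above two
problems is $J_r(a) = \frac{1}{2} \sum_{y \in Y} \frac{(a^T y - b)^2}{\|y\|^2}$
- Here Y is the set of samples for which $a^T y \le b$
- Its gradient is given by $\nabla J_r(a) = \sum_{y \in Y} \frac{a^T y - b}{\|y\|^2} \, y$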
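A sketch of the relaxation criterion with margin and its gradient, assuming the standard form J_r(a) = 1/2 * sum over Y of (a^T y - b)^2 / ||y||^2, where Y holds the samples with a^T y <= b. The samples, the margin b = 1, and the step size are hypothetical.

```python
def dot(a, y):
    """Inner product a^T y."""
    return sum(ai * yi for ai, yi in zip(a, y))


def margin_violators(a, samples, b):
    """Samples that fail to clear the margin: a^T y <= b."""
    return [y for y in samples if dot(a, y) <= b]


def relaxation_criterion(a, samples, b):
    """J_r(a) = 1/2 * sum_{y in Y} (a^T y - b)^2 / ||y||^2."""
    return 0.5 * sum((dot(a, y) - b) ** 2 / dot(y, y)
                     for y in margin_violators(a, samples, b))


def relaxation_gradient(a, samples, b):
    """grad J_r(a) = sum_{y in Y} (a^T y - b) / ||y||^2 * y."""
    grad = [0.0] * len(a)
    for y in margin_violators(a, samples, b):
        scale = (dot(a, y) - b) / dot(y, y)
        for i, yi in enumerate(y):
            grad[i] += scale * yi
    return grad


def relaxation_step(a, samples, b, eta):
    """One gradient-descent step on J_r."""
    grad = relaxation_gradient(a, samples, b)
    return [ai - eta * gi for ai, gi in zip(a, grad)]


samples = [[1.0, 2.0, 1.0], [1.0, 1.0, 3.0], [-1.0, 1.0, -1.0]]
a0 = [0.0, 0.0, 0.0]
a1 = relaxation_step(a0, samples, b=1.0, eta=0.1)
print(relaxation_criterion(a0, samples, 1.0), relaxation_criterion(a1, samples, 1.0))
```

Dividing by ||y||^2 keeps long sample vectors from dominating the sum, and the margin b keeps the descent away from the boundary point a = 0.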
32Two-Category Linearly Separable Case cont.
33Two-Category Linearly Separable Case cont.
34Two-Category Linearly Separable Case cont.
35Two-Category Linearly Separable Case cont.
36Linear Discriminant Functions cont.
- The multi-category case
- There is more than one way to devise
multi-category classifiers using linear
discriminant functions
- One against the rest (c linear discriminant
functions)
- One against another (c(c-1)/2 linear
discriminant functions)
37Multi-category Case
38Multi-category Case cont.
39Multi-category Case cont.
- To avoid the problem of ambiguous regions, we
define c linear discriminant functions and
assign x to ωi if gi(x) > gj(x) for all j ≠ i.
- The resulting classifier is called a linear machine.
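A linear machine can be sketched as an argmax over c linear discriminants g_i(x) = w_i^T x + w_i0. The three weight vectors below are hypothetical. Ties, which the slide's rule leaves undefined, are broken here in favor of the lowest class index.

```python
def linear_machine(x, weights, biases):
    """Return the index i that maximizes g_i(x) = w_i^T x + w_i0."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + w0
              for w, w0 in zip(weights, biases)]
    # max() returns the first maximal index, so ties go to the lowest class.
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical three-class machine in two dimensions.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]

predictions = [linear_machine(x, weights, biases)
               for x in ([2.0, 1.0], [0.0, 3.0], [-2.0, -2.0])]
print(predictions)
```

Because every point gets exactly one maximal score (up to ties), the ambiguous regions of the one-against-the-rest and pairwise schemes do not arise.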
40Multi-category Case cont.
41Training a Linear Machine
- Suppose there are c classes and we have c
discriminant functions $g_i(y) = a_i^T y$, i = 1, ..., c
- If a training sample yk is from class i, then to
classify it correctly, we need $a_i^T y_k > a_j^T y_k$ for all j ≠ i
42Training a Linear Machine cont.
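- That is equivalent to classifying all of the
following c - 1 samples correctly in the two-class
sense: for each j ≠ i, the vector $\eta_{ij}$ with $y_k$ in block i, $-y_k$ in block j, and zeros elsewhere, tested against the stacked weight vector $\hat{a} = (a_1^T, \ldots, a_c^T)^T$
- This is known as Kesler's construction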
43Training a Linear Machine cont.
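- If a training sample yk is from class i and is
misclassified, then there must be at least one
j ≠ i such that $a_i^T y_k \le a_j^T y_k$
- Based on Kesler's construction, the fixed-increment
Perceptron rule is $a_i(k+1) = a_i(k) + y_k$, $a_j(k+1) = a_j(k) - y_k$, and $a_l(k+1) = a_l(k)$ for l ≠ i, j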
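A sketch of the multi-category fixed-increment rule: when a sample y from class i scores no higher under a_i than under some other a_j, add y to a_i and subtract it from a_j. Choosing the highest-scoring rival j is one valid choice among the violated pairs, and it is an assumption of this sketch. The data are hypothetical augmented vectors (1, x1, x2) for three separable classes.

```python
def dot(a, y):
    """Inner product a^T y."""
    return sum(ai * yi for ai, yi in zip(a, y))


def train_linear_machine(samples, labels, num_classes, max_epochs=100):
    """Fixed-increment perceptron for a linear machine with c weight vectors."""
    a = [[0.0] * len(samples[0]) for _ in range(num_classes)]
    for _ in range(max_epochs):
        errors = 0
        for y, i in zip(samples, labels):
            scores = [dot(a_c, y) for a_c in a]
            rivals = [j for j in range(num_classes)
                      if j != i and scores[j] >= scores[i]]
            if rivals:
                j = max(rivals, key=lambda c: scores[c])  # strongest rival
                a[i] = [v + yv for v, yv in zip(a[i], y)]  # a_i <- a_i + y
                a[j] = [v - yv for v, yv in zip(a[j], y)]  # a_j <- a_j - y
                errors += 1
        if errors == 0:
            break
    return a


def classify(a, y):
    """Assign y to the class with the largest a_i^T y."""
    scores = [dot(a_c, y) for a_c in a]
    return max(range(len(scores)), key=lambda c: scores[c])


samples = [[1.0, 2.0, 0.0], [1.0, 3.0, 1.0],     # class 0
           [1.0, 0.0, 2.0], [1.0, 1.0, 3.0],     # class 1
           [1.0, -2.0, -2.0], [1.0, -3.0, -1.0]]  # class 2
labels = [0, 0, 1, 1, 2, 2]
a = train_linear_machine(samples, labels, num_classes=3)
print([classify(a, y) for y in samples])
```

All other weight vectors are left unchanged by a correction, which matches a single perceptron step on one Kesler sample.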
44Two-Category Linearly Separable Case cont.
- What happens if the problem is not linearly
separable?
- What can we do?
45Minimum Squared Error Procedures
- Minimum squared error and pseudoinverse
- The problem is to find a weight vector a
satisfying Ya = b
- If we have more equations than unknowns, a is
over-determined, and an exact solution usually does not exist.
- We want to choose the a that minimizes the
sum-of-squared-error criterion function $J_s(a) = \|Ya - b\|^2$
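A sketch of the minimum-squared-error solution, assuming Y^T Y is nonsingular so that the minimizer of ||Ya - b||^2 is a = (Y^T Y)^{-1} Y^T b, which is the pseudoinverse of Y applied to b. The normal equations are solved with a small Gaussian elimination routine to keep the example free of dependencies. The matrix Y and the margin vector b are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def mse_solution(Y, b):
    """Minimize ||Y a - b||^2 via the normal equations (Y^T Y) a = Y^T b."""
    Yt = [list(col) for col in zip(*Y)]
    YtY = [[sum(u * v for u, v in zip(ri, rj)) for rj in Yt] for ri in Yt]
    Ytb = [sum(u * v for u, v in zip(ri, b)) for ri in Yt]
    return solve(YtY, Ytb)


# Hypothetical normalized augmented samples (rows) and margin vector.
Y = [[1.0, 1.0, 2.0], [1.0, 2.0, 0.0], [-1.0, -3.0, -1.0], [-1.0, -2.0, -3.0]]
b = [1.0, 1.0, 1.0, 1.0]
a = mse_solution(Y, b)
print(a)
```

For larger or ill-conditioned problems a library least-squares routine would be a safer choice than forming Y^T Y explicitly.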
46Minimum Squared Error Procedures cont.
47Minimum Squared Error Procedures cont.