Title: Closed book
1. Exam Notice
- Coverage: Chapters 1-5
- Date: 4/26, 4/29-31
- Closed book
2. Ch. 5. Linear Discriminant Functions
3. Linear Discriminant Functions
- Decision functions (discriminant functions) that are linear functions of x
- Easy to compute: attractive candidates for initial, trial classifiers
- Finding a linear discriminant function is posed as a training problem: minimizing a criterion function
4. Linear Discriminant Functions
- A discriminant function that is a linear combination of the components of x: g(x) = w^t x + w_0
- where w is the weight vector and w_0 is the bias or threshold weight
- A two-category linear classifier has the decision rule: decide ω_1 if g(x) > 0 and ω_2 if g(x) < 0 (see the sketch below)
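A minimal NumPy sketch of the two-category rule above; the weight vector and bias are made-up illustrative values, not taken from the slides.

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant g(x) = w^t x + w0."""
    return w @ x + w0

def classify(x, w, w0):
    """Decide omega_1 if g(x) > 0, otherwise omega_2 (ties broken arbitrarily)."""
    return "omega_1" if g(x, w, w0) > 0 else "omega_2"

# Made-up 2-D example: w and w0 are arbitrary illustrative values.
w = np.array([1.0, 2.0])
w0 = -3.0
print(classify(np.array([2.0, 1.0]), w, w0))   # g = 1*2 + 2*1 - 3 =  1 > 0 -> omega_1
print(classify(np.array([0.0, 1.0]), w, w0))   # g = 0 + 2 - 3     = -1 < 0 -> omega_2
```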
5. Linear Classifier
- A simple linear classifier has d input units, each of which corresponds to the value of one component of the input vector x
- The output unit sums all these weighted products and emits +1 if w_0 + w^t x > 0, and -1 otherwise
6. Decision Surface
- The decision surface separates points assigned to ω_1 from points assigned to ω_2
- When g(x) is linear, this decision surface is a hyperplane H
- If x_1 and x_2 are both on the decision surface, then w^t x_1 + w_0 = w^t x_2 + w_0, i.e., w^t (x_1 - x_2) = 0
- Hence w is normal to any vector lying in the hyperplane (see the sketch below)
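A small NumPy check of the geometry above, using an illustrative hyperplane: w is orthogonal to the difference of any two points on H, and g(x)/||w|| gives the signed distance from x to H (the standard result pictured on the following hyperplane slides).

```python
import numpy as np

w, w0 = np.array([1.0, 2.0]), -3.0           # illustrative hyperplane w^t x + w0 = 0

# Two points on H (they satisfy w^t x + w0 = 0): (3, 0) and (1, 1).
x1, x2 = np.array([3.0, 0.0]), np.array([1.0, 1.0])
print(w @ x1 + w0, w @ x2 + w0)              # both 0.0 -> both lie on the decision surface
print(w @ (x1 - x2))                         # 0.0 -> w is normal to x1 - x2

# Signed distance from an arbitrary point to H: r = g(x) / ||w||
x = np.array([2.0, 1.0])
r = (w @ x + w0) / np.linalg.norm(w)
print(r)                                     # positive -> x lies on the omega_1 side
```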
7-9. Decision Surface: Hyperplane (figures)
10. Multicategory Linear Discriminant Function
- ω_i / not-ω_i dichotomies (c two-class problems)
- ω_i / ω_j dichotomies (c(c-1)/2 two-class problems)
- Both approaches can leave ambiguous regions
11. Multicategory Linear Discriminant Function: Linear Machine
- The c linear discriminant functions are given by g_i(x) = w_i^t x + w_{i0}, i = 1, ..., c
- The decision rule for a multicategory linear machine: assign x to ω_i if g_i(x) > g_j(x) for all j ≠ i (see the sketch below)
- This divides the feature space into c decision regions
- The boundary between R_i and R_j is a portion of the hyperplane H_ij defined by g_i(x) = g_j(x)
- The weight vector w = w_i - w_j is normal to H_ij
- The signed distance from x to H_ij is (g_i(x) - g_j(x)) / ||w_i - w_j||
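A sketch of a linear machine as described above; the three weight vectors and biases are made-up numbers for illustration.

```python
import numpy as np

def linear_machine(x, W, w0):
    """Assign x to the class with the largest g_i(x) = w_i^t x + w_i0.

    W  : (c, d) array whose rows are the weight vectors w_i
    w0 : (c,)  array of bias weights w_i0
    """
    g = W @ x + w0                 # all c discriminant values at once
    return int(np.argmax(g)), g

# Illustrative 3-class, 2-D machine (weights are made-up numbers).
W  = np.array([[ 1.0,  0.0],
               [ 0.0,  1.0],
               [-1.0, -1.0]])
w0 = np.array([0.0, 0.0, 0.5])
label, g = linear_machine(np.array([2.0, 1.0]), W, w0)
print(label, g)                    # class 0, since g = [2.0, 1.0, -2.5]
```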
12. Multicategory Linear Discriminant Function: Ambiguous Region (figure)
13. Multicategory Linear Discriminant Function: Case I
14. Multicategory Linear Discriminant Function: Case II
15. Multicategory Linear Discriminant Function: Case III
16. Multilayered Machine
- Piecewise linear classifiers are capable of realizing complex decision surfaces
- without using the expensive polynomial transform network required by a Φ-machine
17. Generalization of Linear Discriminant Functions: Quadratic Discriminant Functions
- The quadratic discriminant function is obtained by adding terms w_ij x_i x_j involving the products of pairs of components of x: g(x) = w_0 + Σ_{i=1}^{d} w_i x_i + Σ_{i=1}^{d} Σ_{j=1}^{d} w_ij x_i x_j
- Since x_i x_j = x_j x_i, we may take w_ij = w_ji with no loss of generality
- The quadratic discriminant function has an additional d(d+1)/2 coefficients → a more complicated separating surface
- The separating surface defined by g(x) = 0 is a second-degree or hyperquadric surface (see the sketch below)
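A sketch of a quadratic discriminant with illustrative coefficients chosen so that the separating surface g(x) = 0 is a circle in two dimensions.

```python
import numpy as np

def quadratic_g(x, w0, w, Wq):
    """g(x) = w0 + sum_i w_i x_i + sum_i sum_j w_ij x_i x_j (Wq symmetric)."""
    return w0 + w @ x + x @ Wq @ x

# Illustrative coefficients: the separating surface g(x) = 0 is a unit circle here.
w0 = -1.0
w  = np.zeros(2)
Wq = np.eye(2)                      # w_ij = w_ji, so only d(d+1)/2 = 3 free terms in 2-D
print(quadratic_g(np.array([0.0, 0.0]), w0, w, Wq))   # -1  (inside,  g < 0)
print(quadratic_g(np.array([2.0, 0.0]), w0, w, Wq))   #  3  (outside, g > 0)
```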
18. Generalized Linear Discriminant Functions
- The generalized linear discriminant function is given by g(x) = Σ_{i=1}^{d̂} a_i y_i(x) = a^t y
- a is a d̂-dimensional weight vector
- the d̂ functions y_i(x) are arbitrary functions of x (φ-functions)
- The y_i(x) merely map points in the d-dimensional x-space to points in the d̂-dimensional y-space
- The homogeneous discriminant function a^t y separates points in this transformed space by a hyperplane passing through the origin
- The mapping from x to y thus reduces the problem to one of finding a homogeneous linear discriminant function
19. Example of Nonlinear Mapping
- The mapping y = (1, x, x^2)^t takes a line and transforms it to a parabola in three dimensions
- A plane splits the resulting y-space into regions corresponding to two categories, and this in turn gives a non-simply connected decision region in the one-dimensional x-space (see the sketch below)
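A sketch of the mapping y = (1, x, x^2)^t; the weight vector a is an illustrative choice that makes the decision region in x-space the union of two disjoint intervals, i.e., not simply connected.

```python
import numpy as np

def phi(x):
    """Map a scalar x to y = (1, x, x^2)^t."""
    return np.array([1.0, x, x * x])

# Illustrative weight vector a in y-space; a^t y = 0 is a plane through the origin.
# Here a^t phi(x) = -1 + x^2, so g > 0 exactly when |x| > 1:
# the decision region in x is (-inf, -1) U (1, inf), which is not simply connected.
a = np.array([-1.0, 0.0, 1.0])
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, np.sign(a @ phi(x)))   # +1, -1, -1, -1, +1
```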
20. Example of Nonlinear Mapping (figure)
21. Augmented Vectors: Writing g(x) as a^t y
- With the augmented feature vector y = (1, x_1, ..., x_d)^t and the augmented weight vector a = (w_0, w_1, ..., w_d)^t, g(x) = w^t x + w_0 becomes simply a^t y (see the sketch below)
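A sketch of the augmentation step, assuming the usual convention of prepending a constant 1 to x and w_0 to w; it just verifies that a^t y equals w^t x + w_0.

```python
import numpy as np

def augment(x):
    """Augmented feature vector y = (1, x_1, ..., x_d)^t."""
    return np.concatenate(([1.0], x))

# With a = (w0, w1, ..., wd)^t, g(x) = w^t x + w0 becomes simply a^t y.
w, w0 = np.array([1.0, 2.0]), -3.0
a = np.concatenate(([w0], w))
x = np.array([2.0, 1.0])
print(w @ x + w0, a @ augment(x))   # identical values: 1.0 and 1.0
```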
22. Two-Category Linearly Separable Case
- In the two-category case, suppose that
- y_1, ..., y_n is a set of n samples
- ω_1, ω_2 are the two categories
- g(x) = a^t y is a linear discriminant function
- If there exists a weight vector a that classifies all of the samples correctly, the samples are said to be linearly separable
- A sample y_i is classified correctly if a^t y_i > 0 and y_i is labelled ω_1, or if a^t y_i < 0 and y_i is labelled ω_2
- Such a weight vector a is called a separating vector or, more generally, a solution vector (see the sketch below)
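A sketch that tests whether a given weight vector is a solution (separating) vector in the sense above; the samples are a made-up, linearly separable 1-D set in augmented form.

```python
import numpy as np

def is_solution_vector(a, ys, labels):
    """True if a classifies every augmented sample correctly:
    a^t y > 0 for omega_1 samples and a^t y < 0 for omega_2 samples."""
    scores = ys @ a
    return all((s > 0) if lab == 1 else (s < 0)
               for s, lab in zip(scores, labels))

# Made-up 1-D samples, augmented to y = (1, x): omega_1 at x = 2, 3 and omega_2 at x = -1, -2.
ys     = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, -1.0], [1.0, -2.0]])
labels = [1, 1, 2, 2]
print(is_solution_vector(np.array([0.0,  1.0]), ys, labels))   # True: a = (0, 1) separates them
print(is_solution_vector(np.array([0.0, -1.0]), ys, labels))   # False
```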
23. Geometry and Terminology
- The weight vector a can be thought of as specifying a point in weight space
- Each sample y_i places a constraint on the possible location of a solution vector
- The equation a^t y_i = 0 defines a hyperplane through the origin of weight space having y_i as a normal vector
- The solution vector, if it exists, must be on the positive side of every such hyperplane
- Thus a solution vector must lie in the intersection of n half-spaces; indeed, any vector in this region is a solution vector
- The corresponding region is called the solution region
24. Solution Region in Feature Space
- Each solution vector leads to a plane that separates the patterns from the two categories
25. Effect of Margin on Solution Region
- The solution vector is not unique; two ways of imposing additional requirements:
- 1. Find the unit-length weight vector that maximizes the minimum distance from the samples to the separating hyperplane
- 2. Find the minimum-length weight vector satisfying a^t y_i ≥ b for all i, where b (a positive constant) is called the margin
- The solution region resulting from the intersection of the half-spaces for which a^t y_i ≥ b > 0 lies within the previous solution region, insulated from the old boundaries by the distance b / ||y_i||
26. Gradient Descent Procedure
- To find a solution to the set of linear inequalities a^t y_i > 0, define a criterion function J(a) that is minimized when a is a solution vector
- The minimization can be carried out by a gradient descent procedure:
- 1. Choose an arbitrary weight vector a(1) and compute the gradient vector ∇J(a(1))
- 2. Obtain the next value a(2) by moving some distance from a(1) in the direction of steepest descent, i.e., along the negative of the gradient
- In general, a(k+1) is obtained from a(k) by a(k+1) = a(k) - η(k) ∇J(a(k)), where η(k) is a positive scale factor (learning rate); see the sketch below
27. Basic Gradient Descent Procedure
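A sketch of the basic gradient descent procedure, stopping when the update η(k)∇J(a(k)) falls below a threshold θ; the criterion here is an illustrative quadratic with a known minimum, and the constant learning rate is an arbitrary choice.

```python
import numpy as np

def gradient_descent(grad, a, eta, theta=1e-6, max_iter=1000):
    """Basic gradient descent: a(k+1) = a(k) - eta(k) * grad J(a(k)).

    grad : function returning the gradient of the criterion J at a
    eta  : function eta(k) giving the learning rate at step k
    Stops when the update eta(k) * grad J(a(k)) is smaller than the threshold theta.
    """
    for k in range(1, max_iter + 1):
        step = eta(k) * grad(a)
        a = a - step
        if np.linalg.norm(step) < theta:
            break
    return a

# Illustrative quadratic criterion J(a) = ||a - a*||^2 with known minimum a*.
a_star = np.array([1.0, -2.0])
grad = lambda a: 2.0 * (a - a_star)
print(gradient_descent(grad, np.zeros(2), eta=lambda k: 0.1))   # converges to ~(1, -2)
```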
28. Gradient Descent: Newton's Algorithm (see p. 608)
- Newton's rule replaces the learning rate by the inverse Hessian: a(k+1) = a(k) - H^{-1} ∇J(a(k)), where H is the matrix of second partial derivatives ∂²J / ∂a_i ∂a_j
29. Gradient Descent: Newton's Algorithm
30. Gradient Descent Algorithms
- Comparison of the sequences of weight vectors produced by the two methods
- Newton's method gives a greater improvement in convergence per step, even when using optimal learning rates for both methods
- However, the added computational burden of inverting the Hessian matrix is not always justified, and simple gradient descent may suffice (see the sketch below)
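A sketch of Newton's rule for comparison; on a quadratic criterion (again an illustrative one) it reaches the minimum in a single step, at the cost of solving a linear system with the Hessian at each iteration.

```python
import numpy as np

def newton_descent(grad, hess, a, theta=1e-6, max_iter=100):
    """Newton's rule: a(k+1) = a(k) - H^{-1} grad J(a(k))."""
    for _ in range(max_iter):
        step = np.linalg.solve(hess(a), grad(a))   # solve H step = grad instead of inverting H
        a = a - step
        if np.linalg.norm(step) < theta:
            break
    return a

# Illustrative quadratic criterion J(a) = (a - a*)^t Q (a - a*).
a_star = np.array([1.0, -2.0])
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
grad = lambda a: 2.0 * Q @ (a - a_star)
hess = lambda a: 2.0 * Q
print(newton_descent(grad, hess, np.zeros(2)))   # reaches (1, -2) in a single Newton step
```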
31. Perceptron Criterion Function
- The perceptron criterion function: J_p(a) = Σ_{y ∈ Y} (-a^t y), where Y(a) is the set of samples misclassified by a
32. Perceptron Learning Algorithm
- The next weight vector is obtained by adding a scalar multiple of the sum of the misclassified samples to the current weight vector: a(k+1) = a(k) + η(k) Σ_{y ∈ Y_k} y, where Y_k is the set of samples misclassified by a(k) (see the sketch below)
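A sketch of the batch perceptron rule above, assuming the usual "normalization" trick of negating the ω_2 samples so that a solution vector satisfies a^t y > 0 for every sample; the data are made up and the fixed learning rate η = 1 is an arbitrary choice.

```python
import numpy as np

def batch_perceptron(ys, eta=1.0, max_iter=1000):
    """Batch perceptron: a(k+1) = a(k) + eta * sum of misclassified samples.

    ys : (n, d) array of augmented samples, with omega_2 samples already negated
         (so every sample should satisfy a^t y > 0).
    """
    a = np.zeros(ys.shape[1])
    for _ in range(max_iter):
        misclassified = ys[ys @ a <= 0]
        if len(misclassified) == 0:          # J_p(a) = 0: a is a solution vector
            break
        a = a + eta * misclassified.sum(axis=0)
    return a

# Made-up linearly separable 1-D problem, augmented and sign-normalized.
ys = np.array([[ 1.0,  2.0],    # omega_1 sample x =  2
               [ 1.0,  1.0],    # omega_1 sample x =  1
               [-1.0,  1.0],    # omega_2 sample x = -1, negated
               [-1.0,  2.0]])   # omega_2 sample x = -2, negated
a = batch_perceptron(ys)
print(a, ys @ a)                # all entries of ys @ a are positive
```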
33. Perceptron Criterion Function (figure)
34-35. Squared Error Criterion Function (figures)
36. Minimum Squared Error Procedure
- Whereas the previous criterion functions considered only the misclassified samples, the MSE criterion involves all of the samples
- Try to make a^t y_i = b_i for every sample, i.e., solve the linear equations Ya = b, where the rows of Y are the augmented samples and b is a vector of arbitrarily chosen positive margins
- Minimize the sum-of-squared-error criterion J_s(a) = ||Ya - b||^2 = Σ_{i=1}^{n} (a^t y_i - b_i)^2
37. Minimum Squared Error Procedure
- A simple closed-form solution can be found by forming the gradient ∇J_s(a) = 2 Y^t (Ya - b)
- Setting ∇J_s(a) = 0 gives Y^t Y a = Y^t b, so a = (Y^t Y)^{-1} Y^t b = Y^+ b
- where the d̂-by-n matrix Y^+ = (Y^t Y)^{-1} Y^t is called the pseudo-inverse of Y
- Note that Y^+ Y = I, but Y Y^+ ≠ I in general
- If Y^+ is defined more generally by Y^+ = lim_{ε→0} (Y^t Y + εI)^{-1} Y^t, it can be shown that this limit always exists and that a = Y^+ b is an MSE solution to Ya = b (see the sketch below)
38. Example by Matrix Pseudo-inverse
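A sketch of the pseudo-inverse computation with a small hypothetical data matrix (not necessarily the slide's own example): the ω_2 rows are negated and the margin vector is taken as b = (1, 1, 1, 1)^t. np.linalg.pinv gives the same matrix as the explicit formula when Y^t Y is nonsingular.

```python
import numpy as np

# Hypothetical augmented samples (omega_2 rows negated) and margins b = 1.
Y = np.array([[ 1.0,  1.0,  2.0],
              [ 1.0,  2.0,  0.0],
              [-1.0, -3.0, -1.0],
              [-1.0, -2.0, -3.0]])
b = np.ones(4)

# Pseudo-inverse Y+ = (Y^t Y)^{-1} Y^t; np.linalg.pinv computes the same matrix here
# (via SVD, which also covers the singular case handled by the limit definition).
Y_pinv = np.linalg.inv(Y.T @ Y) @ Y.T
a = Y_pinv @ b                                  # MSE solution to Ya ~= b
print(np.allclose(Y_pinv, np.linalg.pinv(Y)))   # True
print(a, Y @ a)                                 # Y a is the least-squares fit to b
```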
39. Widrow-Hoff Rule / LMS Procedure
- The Widrow-Hoff or LMS rule minimizes J_s(a) one sample at a time: a(k+1) = a(k) + η(k) (b_k - a^t(k) y^k) y^k
40. LMS Procedure
41. Example for LMS Algorithm
- Find the decision surface using the LMS algorithm (see the sketch below)
- The augmented patterns are
- ω_1: (0,0,0,1)', (1,0,0,1)', (1,0,1,1)', (1,1,0,1)'
- ω_2: (0,0,1,1)', (0,1,0,1)', (0,1,1,1)', (1,1,1,1)'
- Letting a(1) = 0 and η(k) = 1/k
- Solution: a = (0.135, -0.238, -0.305, 0.721)'
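A sketch of the single-sample LMS rule applied to the augmented patterns above, with a(1) = 0 and η(k) = 1/k as stated. The margin vector (taken here as b = 1 for every sample), the negation of the ω_2 patterns, the cyclic presentation order, and the stopping rule are all assumptions not given on the slide, so the weight vector it returns need not match the quoted solution exactly.

```python
import numpy as np

def lms(ys, b, eta=lambda k: 1.0 / k, a0=None, n_passes=100):
    """Widrow-Hoff / LMS rule: a(k+1) = a(k) + eta(k) * (b_k - a(k)^t y_k) * y_k,
    cycling repeatedly through the samples."""
    a = np.zeros(ys.shape[1]) if a0 is None else a0.astype(float)
    k = 1
    for _ in range(n_passes):
        for y, bk in zip(ys, b):
            a = a + eta(k) * (bk - a @ y) * y
            k += 1
    return a

# Augmented patterns from the slide; the omega_2 patterns are negated here and the
# margins are set to b = 1 (assumptions -- the slide does not state them).
w1 = np.array([[0, 0, 0, 1], [1, 0, 0, 1], [1, 0, 1, 1], [1, 1, 0, 1]], float)
w2 = np.array([[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 1], [1, 1, 1, 1]], float)
ys = np.vstack([w1, -w2])
a = lms(ys, b=np.ones(len(ys)))     # a(1) = 0, eta(k) = 1/k as in the slide
print(a)
```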