Title: Closed book
1. Exam Notice
- Coverage: Chapters 1-5
- Date: 4/26, 4/29-31
- Closed book
2. Ch. 5. Linear Discriminant Functions
3. Linear Discriminant Functions
- Decision functions (discriminant functions) that are linear functions of x
- Easy to compute: attractive candidates for initial, trial classifiers
- Finding a linear discriminant function is posed as a training problem: minimizing a criterion function
4. Linear Discriminant Functions
- A discriminant function that is a linear combination of the components of x: g(x) = w^t x + w_0
- where w is the weight vector and w_0 is the bias or threshold weight
- A two-category linear classifier has the decision rule: decide ω_1 if g(x) > 0 and ω_2 if g(x) < 0 (see the sketch below)
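A minimal NumPy sketch of the two-category rule above; the weight vector and bias are made-up illustrative values, not taken from the slides.

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant g(x) = w^t x + w0."""
    return w @ x + w0

def classify(x, w, w0):
    """Decide omega_1 if g(x) > 0, otherwise omega_2 (ties broken arbitrarily)."""
    return "omega_1" if g(x, w, w0) > 0 else "omega_2"

# Made-up 2-D example: w and w0 are arbitrary illustrative values.
w = np.array([1.0, 2.0])
w0 = -3.0
print(classify(np.array([2.0, 1.0]), w, w0))   # g = 1*2 + 2*1 - 3 =  1 > 0 -> omega_1
print(classify(np.array([0.0, 1.0]), w, w0))   # g = 0 + 2 - 3     = -1 < 0 -> omega_2
```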
5. Linear Classifier
- A simple linear classifier has d input units, each of which corresponds to the value of one component of the input vector x
- The output unit sums all these weighted products and emits +1 if w_0 + w^t x > 0, and -1 otherwise
6. Decision Surface
- The decision surface separates points assigned to ω_1 from points assigned to ω_2
- When g(x) is linear, this decision surface is a hyperplane H
- If x_1 and x_2 are both on the decision surface, then w^t x_1 + w_0 = w^t x_2 + w_0, i.e., w^t (x_1 - x_2) = 0
- Hence w is normal to any vector lying in the hyperplane (see the sketch below)
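A small NumPy check of the geometry above, using an illustrative hyperplane: w is orthogonal to the difference of any two points on H, and g(x)/||w|| gives the signed distance from x to H (the standard result pictured on the following hyperplane slides).

```python
import numpy as np

w, w0 = np.array([1.0, 2.0]), -3.0           # illustrative hyperplane w^t x + w0 = 0

# Two points on H (they satisfy w^t x + w0 = 0): (3, 0) and (1, 1).
x1, x2 = np.array([3.0, 0.0]), np.array([1.0, 1.0])
print(w @ x1 + w0, w @ x2 + w0)              # both 0.0 -> both lie on the decision surface
print(w @ (x1 - x2))                         # 0.0 -> w is normal to x1 - x2

# Signed distance from an arbitrary point to H: r = g(x) / ||w||
x = np.array([2.0, 1.0])
r = (w @ x + w0) / np.linalg.norm(w)
print(r)                                     # positive -> x lies on the omega_1 side
```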
7-9. Decision Surface: Hyperplane (figures)
10. Multicategory Linear Discriminant Function
- ω_i / not-ω_i dichotomies (c two-class problems)
- ω_i / ω_j dichotomies (c(c-1)/2 two-class problems)
- Both approaches can leave ambiguous regions
11. Multicategory Linear Discriminant Function: Linear Machine
- The c linear discriminant functions are given by g_i(x) = w_i^t x + w_{i0}, i = 1, ..., c
- The decision rule for a multicategory linear machine: assign x to ω_i if g_i(x) > g_j(x) for all j ≠ i (see the sketch below)
- This divides the feature space into c decision regions
- The boundary between R_i and R_j is a portion of the hyperplane H_ij defined by g_i(x) = g_j(x)
- The weight vector w = w_i - w_j is normal to H_ij
- The signed distance from x to H_ij is (g_i(x) - g_j(x)) / ||w_i - w_j||
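A sketch of a linear machine as described above; the three weight vectors and biases are made-up numbers for illustration.

```python
import numpy as np

def linear_machine(x, W, w0):
    """Assign x to the class with the largest g_i(x) = w_i^t x + w_i0.

    W  : (c, d) array whose rows are the weight vectors w_i
    w0 : (c,)  array of bias weights w_i0
    """
    g = W @ x + w0                 # all c discriminant values at once
    return int(np.argmax(g)), g

# Illustrative 3-class, 2-D machine (weights are made-up numbers).
W  = np.array([[ 1.0,  0.0],
               [ 0.0,  1.0],
               [-1.0, -1.0]])
w0 = np.array([0.0, 0.0, 0.5])
label, g = linear_machine(np.array([2.0, 1.0]), W, w0)
print(label, g)                    # class 0, since g = [2.0, 1.0, -2.5]
```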
12. Multicategory Linear Discriminant Function: Ambiguous Region (figure)
13. Multicategory Linear Discriminant Function: Case I
14. Multicategory Linear Discriminant Function: Case II
15. Multicategory Linear Discriminant Function: Case III
16. Multilayered Machine
- Piecewise linear classifiers are capable of realizing complex decision surfaces
- without using the expensive polynomial transform network required by a Φ-machine
17. Generalization of Linear Discriminant Functions: Quadratic Discriminant Functions
- The quadratic discriminant function is obtained by adding terms w_ij x_i x_j involving the products of pairs of components of x: g(x) = w_0 + Σ_{i=1}^{d} w_i x_i + Σ_{i=1}^{d} Σ_{j=1}^{d} w_ij x_i x_j
- Since x_i x_j = x_j x_i, we may take w_ij = w_ji with no loss of generality
- The quadratic discriminant function has an additional d(d+1)/2 coefficients → a more complicated separating surface
- The separating surface defined by g(x) = 0 is a second-degree or hyperquadric surface (see the sketch below)
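A sketch of a quadratic discriminant with illustrative coefficients chosen so that the separating surface g(x) = 0 is a circle in two dimensions.

```python
import numpy as np

def quadratic_g(x, w0, w, Wq):
    """g(x) = w0 + sum_i w_i x_i + sum_i sum_j w_ij x_i x_j (Wq symmetric)."""
    return w0 + w @ x + x @ Wq @ x

# Illustrative coefficients: the separating surface g(x) = 0 is a unit circle here.
w0 = -1.0
w  = np.zeros(2)
Wq = np.eye(2)                      # w_ij = w_ji, so only d(d+1)/2 = 3 free terms in 2-D
print(quadratic_g(np.array([0.0, 0.0]), w0, w, Wq))   # -1  (inside,  g < 0)
print(quadratic_g(np.array([2.0, 0.0]), w0, w, Wq))   #  3  (outside, g > 0)
```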
18. Generalized Linear Discriminant Functions
- The generalized linear discriminant function is given by g(x) = Σ_{i=1}^{d̂} a_i y_i(x) = a^t y
- a is a d̂-dimensional weight vector
- the d̂ functions y_i(x) are arbitrary functions of x (φ-functions)
- The y_i(x) merely map points in the d-dimensional x-space to points in the d̂-dimensional y-space
- The homogeneous discriminant function a^t y separates points in this transformed space by a hyperplane passing through the origin
- The mapping from x to y thus reduces the problem to one of finding a homogeneous linear discriminant function
19. Example of Nonlinear Mapping
- The mapping y = (1, x, x^2)^t takes a line and transforms it to a parabola in three dimensions
- A plane splits the resulting y-space into regions corresponding to two categories, and this in turn gives a non-simply connected decision region in the one-dimensional x-space (see the sketch below)
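A sketch of the mapping y = (1, x, x^2)^t; the weight vector a is an illustrative choice that makes the decision region in x-space the union of two disjoint intervals, i.e., not simply connected.

```python
import numpy as np

def phi(x):
    """Map a scalar x to y = (1, x, x^2)^t."""
    return np.array([1.0, x, x * x])

# Illustrative weight vector a in y-space; a^t y = 0 is a plane through the origin.
# Here a^t phi(x) = -1 + x^2, so g > 0 exactly when |x| > 1:
# the decision region in x is (-inf, -1) U (1, inf), which is not simply connected.
a = np.array([-1.0, 0.0, 1.0])
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, np.sign(a @ phi(x)))   # +1, -1, -1, -1, +1
```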
20. Example of Nonlinear Mapping (figure)
21. Augmented Vectors: Writing g(x) as a^t y
- With the augmented feature vector y = (1, x_1, ..., x_d)^t and the augmented weight vector a = (w_0, w_1, ..., w_d)^t, g(x) = w^t x + w_0 becomes simply a^t y (see the sketch below)
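A sketch of the augmentation step, assuming the usual convention of prepending a constant 1 to x and w_0 to w; it just verifies that a^t y equals w^t x + w_0.

```python
import numpy as np

def augment(x):
    """Augmented feature vector y = (1, x_1, ..., x_d)^t."""
    return np.concatenate(([1.0], x))

# With a = (w0, w1, ..., wd)^t, g(x) = w^t x + w0 becomes simply a^t y.
w, w0 = np.array([1.0, 2.0]), -3.0
a = np.concatenate(([w0], w))
x = np.array([2.0, 1.0])
print(w @ x + w0, a @ augment(x))   # identical values: 1.0 and 1.0
```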
22. Two-Category Linearly Separable Case
- In the two-category case, suppose that
- y_1, ..., y_n is a set of n samples
- ω_1, ω_2 are the two categories
- g(x) = a^t y is a linear discriminant function
- If there exists a weight vector a that classifies all of the samples correctly, the samples are said to be linearly separable
- A sample y_i is classified correctly if a^t y_i > 0 and y_i is labelled ω_1, or if a^t y_i < 0 and y_i is labelled ω_2
- Such a weight vector a is called a separating vector or, more generally, a solution vector (see the sketch below)
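A sketch that tests whether a given weight vector is a solution (separating) vector in the sense above; the samples are a made-up, linearly separable 1-D set in augmented form.

```python
import numpy as np

def is_solution_vector(a, ys, labels):
    """True if a classifies every augmented sample correctly:
    a^t y > 0 for omega_1 samples and a^t y < 0 for omega_2 samples."""
    scores = ys @ a
    return all((s > 0) if lab == 1 else (s < 0)
               for s, lab in zip(scores, labels))

# Made-up 1-D samples, augmented to y = (1, x): omega_1 at x = 2, 3 and omega_2 at x = -1, -2.
ys     = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, -1.0], [1.0, -2.0]])
labels = [1, 1, 2, 2]
print(is_solution_vector(np.array([0.0,  1.0]), ys, labels))   # True: a = (0, 1) separates them
print(is_solution_vector(np.array([0.0, -1.0]), ys, labels))   # False
```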
23. Geometry and Terminology
- The weight vector a can be thought of as specifying a point in weight space
- Each sample y_i places a constraint on the possible location of a solution vector
- The equation a^t y_i = 0 defines a hyperplane through the origin of weight space having y_i as a normal vector
- The solution vector, if it exists, must be on the positive side of every such hyperplane
- Thus a solution vector must lie in the intersection of n half-spaces; indeed, any vector in this region is a solution vector
- The corresponding region is called the solution region
24. Solution Region in Feature Space
- Each solution vector leads to a plane that separates the patterns from the two categories
25. Effect of Margin on Solution Region
- The solution vector is not unique; two ways of imposing additional requirements:
- 1. Find the unit-length weight vector that maximizes the minimum distance from the samples to the separating hyperplane
- 2. Find the minimum-length weight vector satisfying a^t y_i ≥ b for all i, where b (a positive constant) is called the margin
- The solution region resulting from the intersection of the half-spaces for which a^t y_i ≥ b > 0 lies within the previous solution region, insulated from the old boundaries by the distance b / ||y_i||
26. Gradient Descent Procedure
- To find a solution to the set of linear inequalities a^t y_i > 0, define a criterion function J(a) that is minimized when a is a solution vector
- The minimization can be carried out by a gradient descent procedure:
- 1. Choose an arbitrary weight vector a(1) and compute the gradient vector ∇J(a(1))
- 2. Obtain the next value a(2) by moving some distance from a(1) in the direction of steepest descent, i.e., along the negative of the gradient
- In general, a(k+1) is obtained from a(k) by a(k+1) = a(k) - η(k) ∇J(a(k)), where η(k) is a positive scale factor (learning rate); see the sketch below
27. Basic Gradient Descent Procedure
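A sketch of the basic gradient descent procedure, stopping when the update η(k)∇J(a(k)) falls below a threshold θ; the criterion here is an illustrative quadratic with a known minimum, and the constant learning rate is an arbitrary choice.

```python
import numpy as np

def gradient_descent(grad, a, eta, theta=1e-6, max_iter=1000):
    """Basic gradient descent: a(k+1) = a(k) - eta(k) * grad J(a(k)).

    grad : function returning the gradient of the criterion J at a
    eta  : function eta(k) giving the learning rate at step k
    Stops when the update eta(k) * grad J(a(k)) is smaller than the threshold theta.
    """
    for k in range(1, max_iter + 1):
        step = eta(k) * grad(a)
        a = a - step
        if np.linalg.norm(step) < theta:
            break
    return a

# Illustrative quadratic criterion J(a) = ||a - a*||^2 with known minimum a*.
a_star = np.array([1.0, -2.0])
grad = lambda a: 2.0 * (a - a_star)
print(gradient_descent(grad, np.zeros(2), eta=lambda k: 0.1))   # converges to ~(1, -2)
```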
28. Gradient Descent: Newton's Algorithm (see p. 608)
- Newton's rule replaces the learning rate by the inverse Hessian: a(k+1) = a(k) - H^{-1} ∇J(a(k)), where H is the matrix of second partial derivatives ∂²J / ∂a_i ∂a_j
29. Gradient Descent: Newton's Algorithm
30. Gradient Descent Algorithms
- Comparison of the sequences of weight vectors produced by the two methods
- Newton's method gives a greater improvement in convergence per step, even when using optimal learning rates for both methods
- However, the added computational burden of inverting the Hessian matrix is not always justified, and simple gradient descent may suffice (see the sketch below)
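A sketch of Newton's rule for comparison; on a quadratic criterion (again an illustrative one) it reaches the minimum in a single step, at the cost of solving a linear system with the Hessian at each iteration.

```python
import numpy as np

def newton_descent(grad, hess, a, theta=1e-6, max_iter=100):
    """Newton's rule: a(k+1) = a(k) - H^{-1} grad J(a(k))."""
    for _ in range(max_iter):
        step = np.linalg.solve(hess(a), grad(a))   # solve H step = grad instead of inverting H
        a = a - step
        if np.linalg.norm(step) < theta:
            break
    return a

# Illustrative quadratic criterion J(a) = (a - a*)^t Q (a - a*).
a_star = np.array([1.0, -2.0])
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
grad = lambda a: 2.0 * Q @ (a - a_star)
hess = lambda a: 2.0 * Q
print(newton_descent(grad, hess, np.zeros(2)))   # reaches (1, -2) in a single Newton step
```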
31. Perceptron Criterion Function
- The perceptron criterion function: J_p(a) = Σ_{y ∈ Y} (-a^t y), where Y(a) is the set of samples misclassified by a
32. Perceptron Learning Algorithm
- The next weight vector is obtained by adding a scalar multiple of the sum of the misclassified samples to the current weight vector: a(k+1) = a(k) + η(k) Σ_{y ∈ Y_k} y, where Y_k is the set of samples misclassified by a(k) (see the sketch below)
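A sketch of the batch perceptron rule above, assuming the usual "normalization" trick of negating the ω_2 samples so that a solution vector satisfies a^t y > 0 for every sample; the data are made up and the fixed learning rate η = 1 is an arbitrary choice.

```python
import numpy as np

def batch_perceptron(ys, eta=1.0, max_iter=1000):
    """Batch perceptron: a(k+1) = a(k) + eta * sum of misclassified samples.

    ys : (n, d) array of augmented samples, with omega_2 samples already negated
         (so every sample should satisfy a^t y > 0).
    """
    a = np.zeros(ys.shape[1])
    for _ in range(max_iter):
        misclassified = ys[ys @ a <= 0]
        if len(misclassified) == 0:          # J_p(a) = 0: a is a solution vector
            break
        a = a + eta * misclassified.sum(axis=0)
    return a

# Made-up linearly separable 1-D problem, augmented and sign-normalized.
ys = np.array([[ 1.0,  2.0],    # omega_1 sample x =  2
               [ 1.0,  1.0],    # omega_1 sample x =  1
               [-1.0,  1.0],    # omega_2 sample x = -1, negated
               [-1.0,  2.0]])   # omega_2 sample x = -2, negated
a = batch_perceptron(ys)
print(a, ys @ a)                # all entries of ys @ a are positive
```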
33. Perceptron Criterion Function (figure)
34-35. Squared Error Criterion Function (figures)
36. Minimum Squared Error Procedure
- Whereas the previous criterion functions considered only the misclassified samples, the MSE criterion involves all of the samples
- Try to make a^t y_i = b_i for every sample, i.e., solve the linear equations Ya = b, where the rows of Y are the augmented samples and b is a vector of arbitrarily chosen positive margins
- Minimize the sum-of-squared-error criterion J_s(a) = ||Ya - b||^2 = Σ_{i=1}^{n} (a^t y_i - b_i)^2
37. Minimum Squared Error Procedure
- A simple closed-form solution can be found by forming the gradient ∇J_s(a) = 2 Y^t (Ya - b)
- Setting ∇J_s(a) = 0 gives Y^t Y a = Y^t b, so a = (Y^t Y)^{-1} Y^t b = Y^+ b
- where the d̂-by-n matrix Y^+ = (Y^t Y)^{-1} Y^t is called the pseudo-inverse of Y
- Note that Y^+ Y = I, but Y Y^+ ≠ I in general
- If Y^+ is defined more generally by Y^+ = lim_{ε→0} (Y^t Y + εI)^{-1} Y^t, it can be shown that this limit always exists and that a = Y^+ b is an MSE solution to Ya = b (see the sketch below)
38. Example by Matrix Pseudo-inverse
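A sketch of the pseudo-inverse computation with a small hypothetical data matrix (not necessarily the slide's own example): the ω_2 rows are negated and the margin vector is taken as b = (1, 1, 1, 1)^t. np.linalg.pinv gives the same matrix as the explicit formula when Y^t Y is nonsingular.

```python
import numpy as np

# Hypothetical augmented samples (omega_2 rows negated) and margins b = 1.
Y = np.array([[ 1.0,  1.0,  2.0],
              [ 1.0,  2.0,  0.0],
              [-1.0, -3.0, -1.0],
              [-1.0, -2.0, -3.0]])
b = np.ones(4)

# Pseudo-inverse Y+ = (Y^t Y)^{-1} Y^t; np.linalg.pinv computes the same matrix here
# (via SVD, which also covers the singular case handled by the limit definition).
Y_pinv = np.linalg.inv(Y.T @ Y) @ Y.T
a = Y_pinv @ b                                  # MSE solution to Ya ~= b
print(np.allclose(Y_pinv, np.linalg.pinv(Y)))   # True
print(a, Y @ a)                                 # Y a is the least-squares fit to b
```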
39. Widrow-Hoff Rule / LMS Procedure
- The Widrow-Hoff or LMS rule minimizes J_s(a) one sample at a time: a(k+1) = a(k) + η(k) (b_k - a^t(k) y^k) y^k
40. LMS Procedure
41. Example for LMS Algorithm
- Find the decision surface using the LMS algorithm (see the sketch below)
- The augmented patterns are
- ω_1: (0,0,0,1)', (1,0,0,1)', (1,0,1,1)', (1,1,0,1)'
- ω_2: (0,0,1,1)', (0,1,0,1)', (0,1,1,1)', (1,1,1,1)'
- Letting a(1) = 0 and η(k) = 1/k
- Solution: a = (0.135, -0.238, -0.305, 0.721)'
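A sketch of the single-sample LMS rule applied to the augmented patterns above, with a(1) = 0 and η(k) = 1/k as stated. The margin vector (taken here as b = 1 for every sample), the negation of the ω_2 patterns, the cyclic presentation order, and the stopping rule are all assumptions not given on the slide, so the weight vector it returns need not match the quoted solution exactly.

```python
import numpy as np

def lms(ys, b, eta=lambda k: 1.0 / k, a0=None, n_passes=100):
    """Widrow-Hoff / LMS rule: a(k+1) = a(k) + eta(k) * (b_k - a(k)^t y_k) * y_k,
    cycling repeatedly through the samples."""
    a = np.zeros(ys.shape[1]) if a0 is None else a0.astype(float)
    k = 1
    for _ in range(n_passes):
        for y, bk in zip(ys, b):
            a = a + eta(k) * (bk - a @ y) * y
            k += 1
    return a

# Augmented patterns from the slide; the omega_2 patterns are negated here and the
# margins are set to b = 1 (assumptions -- the slide does not state them).
w1 = np.array([[0, 0, 0, 1], [1, 0, 0, 1], [1, 0, 1, 1], [1, 1, 0, 1]], float)
w2 = np.array([[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 1], [1, 1, 1, 1]], float)
ys = np.vstack([w1, -w2])
a = lms(ys, b=np.ones(len(ys)))     # a(1) = 0, eta(k) = 1/k as in the slide
print(a)
```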