Transcript and Presenter's Notes

Title: 4. Closed book


1
Exam Notice
  • 1. Covers Chapters 1-5
  • 2. Dates: 4/26, 4/29-31
  • 3. … 2 …
  • 4. Closed book

2
Ch. 5. Linear Discriminant Functions
3
Linear Discriminant Functions
  • Decision functions (discriminant functions) are
    linear functions of x
  • Easy to compute; attractive candidates for
    initial, trial classifiers
  • Finding a linear discriminant function (training)
    amounts to minimizing a criterion function

4
Linear Discriminant Functions
  • Discriminant function: a linear combination of the
    components of x,  g(x) = w^t x + w0
  • where w is the weight vector and w0 is the
    bias or threshold weight
  • A two-category linear classifier has the
    decision rule: decide ω1 if g(x) > 0, and ω2
    otherwise (see the sketch below)
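
As a minimal sketch (not from the slides; the weight values are
arbitrary placeholders), the discriminant and the two-category
decision rule can be written in a few lines of Python:

    import numpy as np

    def g(x, w, w0):
        """Linear discriminant g(x) = w^t x + w0."""
        return np.dot(w, x) + w0

    def classify(x, w, w0):
        """Decide omega_1 if g(x) > 0, omega_2 otherwise."""
        return 1 if g(x, w, w0) > 0 else 2

    # placeholder weights for illustration
    w, w0 = np.array([1.0, -2.0]), 0.5
    print(classify(np.array([3.0, 1.0]), w, w0))  # g = 1.5 > 0 -> class 1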

5
Linear Classifier
  • A simple linear classifier has d input units,
    each of which corresponds to the value of one
    component of the input vector
  • The output unit sums all these products and
    emits +1 if w0 + w^t x > 0, and -1 otherwise

6
Decision Surface
  • The decision surface separates points assigned to
    ω1 from points assigned to ω2
  • When g(x) is linear, this decision surface is a
    hyperplane H
  • If x1 and x2 are both on the decision surface,
    then w^t x1 + w0 = w^t x2 + w0, i.e., w^t (x1 - x2) = 0

w is normal to any vector lying in the
hyperplane
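
As a small illustrative computation (the hyperplane below is an
assumed example, not from the slides), the signed distance from a
point x to H is g(x)/||w||:

    import numpy as np

    w, w0 = np.array([3.0, 4.0]), -5.0   # assumed hyperplane 3x1 + 4x2 - 5 = 0

    def signed_distance(x):
        # r = g(x) / ||w||, positive on the side that w points toward
        return (np.dot(w, x) + w0) / np.linalg.norm(w)

    print(signed_distance(np.array([1.0, 1.0])))  # (3 + 4 - 5) / 5 = 0.4
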
7
Decision Surface: Hyperplane
8
Decision Surface: Hyperplane
9
Decision Surface: Hyperplane
10
Multicategory Linear Discriminant Function
  • Four-class problem
  • ωi / not-ωi dichotomies
  • (c two-class problems)
  • ωi / ωj dichotomies
  • c(c-1)/2 two-class problems

Ambiguous region
11
Multicategory Linear Discriminant Function:
Linear Machine
  • The c linear discriminant functions are given by
    gi(x) = wi^t x + wi0,  i = 1, ..., c
  • The decision rule for a multicategory linear
    machine assigns x to ωi if gi(x) > gj(x) for all
    j ≠ i (see the sketch below)
  • Divides the feature space into c decision regions
  • The boundary between Ri and Rj is a portion of
    the hyperplane Hij, defined by gi(x) = gj(x)
  • The weight vector wi - wj is normal to Hij
  • The signed distance from x to Hij is given by
    (gi(x) - gj(x)) / ||wi - wj||
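
A minimal sketch of a linear machine in Python (the weights below
are arbitrary placeholders, chosen only to illustrate the argmax
rule):

    import numpy as np

    # one weight vector wi (row of W) and one bias wi0 per category
    W = np.array([[ 1.0,  0.0],
                  [ 0.0,  1.0],
                  [-1.0, -1.0]])
    w0 = np.array([0.0, 0.1, 0.2])

    def linear_machine(x):
        """Assign x to the category with the largest gi(x) = wi^t x + wi0."""
        g = W @ x + w0
        return int(np.argmax(g))

    print(linear_machine(np.array([2.0, 0.5])))  # g = [2.0, 0.6, -2.3] -> 0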

12
Multicategory Linear Discriminant Function
  • Three-class problem
  • Four-class problem

Ambiguous region
13
Multicategory Linear Discriminant Function: Case I
14
Multicategory Linear Discriminant Function: Case II
15
Multicategory Linear Discriminant Function: Case III
16
Multilayered Machine
  • Piecewise linear classifiers are capable of
    realizing complex decision surfaces
  • without using the expensive polynomial
    transform network required by a Φ-machine

17
Generalization of Linear Discriminant Functions:
Quadratic Discriminant Functions
  • The quadratic discriminant function is obtained
    by adding additional terms, wij xi xj, involving
    the products of pairs of components of x
  • Since xi xj = xj xi, we can take wij = wji with no
    loss of generality
  • The quadratic discriminant function has an
    additional d(d+1)/2 coefficients -> a more
    complicated separating surface
  • The separating surface defined by g(x) = 0 is a
    second-degree or hyperquadric surface
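
A minimal sketch of the quadratic feature mapping (assumptions:
plain NumPy, and the ordering of the product terms is arbitrary);
after this mapping the quadratic discriminant is again linear in
the expanded features:

    import numpy as np
    from itertools import combinations_with_replacement

    def quadratic_features(x):
        """Map x to (1, x1..xd, xi*xj for i <= j): d(d+1)/2 extra terms."""
        pairs = [x[i] * x[j]
                 for i, j in combinations_with_replacement(range(len(x)), 2)]
        return np.concatenate(([1.0], x, pairs))

    print(quadratic_features(np.array([2.0, 3.0])))  # [1. 2. 3. 4. 6. 9.]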

18
Generalized Linear Discriminant Functions
  • The generalized linear discriminant function is
    given as g(x) = a^t y, where
  • a is a d̂-dimensional weight vector
  • the d̂ functions yi(x) are arbitrary functions of x
    (Φ-functions)
  • the yi(x) merely map points in the d-dimensional
    x-space to points in the d̂-dimensional y-space
  • The homogeneous discriminant function a^t y
    separates points in this transformed space by a
    hyperplane passing through the origin
  • The mapping from x to y reduces the problem to
    one of finding a homogeneous linear discriminant
    function

19
Example of Nonlinear Mapping
  • The mapping y = (1, x, x^2)^t takes a line and
    transforms it to a parabola in three dimensions
  • A plane splits the resulting y-space into
    regions corresponding to two categories, and this
    in turn gives a non-simply connected decision
    region in the one-dimensional x-space
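
A minimal sketch of this example (the weight vector a is an assumed
placeholder giving g(x) = x^2 - 1, so the positive region in x is
the two disjoint intervals |x| > 1):

    import numpy as np

    def phi(x):
        """The slide's mapping y = (1, x, x^2)^t from 1-D x to 3-D y."""
        return np.array([1.0, x, x * x])

    a = np.array([-1.0, 0.0, 1.0])   # assumed weights: g(x) = a^t y = x^2 - 1

    for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
        print(x, a @ phi(x) > 0)      # True only for x < -1 or x > 1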

20
Example of Nonlinear Mapping
21
Augmented Vector: Writing g(x) as a^t y
  • Augment the feature and weight vectors as
    y = (1, x1, ..., xd)^t and a = (w0, w1, ..., wd)^t,
    so that g(x) = w^t x + w0 = a^t y
22
Two-Category Linearly Separable Case
  • In the two-category case, suppose that we have
  • y1, ..., yn: a set of n samples
  • ω1, ω2: the categories
  • g(x) = a^t y: a linear discriminant function
  • If there exists a weight vector a that
    classifies all of the samples correctly, the
    samples are said to be linearly separable
  • A sample yi is classified correctly if a^t yi > 0
    and yi is labelled ω1, or if a^t yi < 0 and yi is
    labelled ω2
  • Such a weight vector a is called a separating
    vector or, more generally, a solution vector

23
Geometry and Terminology
  • The weight vector a can be viewed as a point in
    weight space
  • Each sample yi places a constraint on the
    possible location of a solution vector
  • The equation a^t yi = 0 defines a hyperplane through
    the origin of weight space having yi as a normal
    vector
  • The solution vector must be on the positive side
    of every such hyperplane
  • A solution vector must lie in the intersection of
    n half-spaces; indeed, any vector in this region
    is a solution vector
  • The corresponding region is called the solution
    region

24
Solution Region in Feature Space
  • A solution vector leads to a plane that
    separates the patterns from the two categories

25
Effect of Margin on Solution Region
  • The solution vector is not unique ->
  • 1. Find a unit-length weight vector that
    maximizes the minimum distance from the samples
    to the separating hyperplane
  • 2. Find a unit-length weight vector satisfying
    a^t yi > b for all i, where b (a positive constant)
    is called the margin
  • The solution region resulting from the
    intersection of the half-spaces for which a^t yi >
    b > 0 lies within the previous solution region,
    inset by the distance b/||yi||
  • Margin b > 0
  • Margin = 0

26
Gradient Descent Procedure
  • To find a solution to the set of linear
    inequalities a^t yi > 0, define a criterion function
    J(a) that is minimized when a is a solution vector
  • This can be solved by a gradient descent procedure
    (see the sketch below)
  • 1. Choose an arbitrary weight vector a(1) and
    compute the gradient vector ∇J(a(1))
  • 2. Obtain the next value a(2) by moving some
    distance from a(1) in the direction of steepest
    descent, i.e., along the negative of the gradient
  • In general, a(k+1) is obtained from a(k) by the
    equation a(k+1) = a(k) - η(k) ∇J(a(k))
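
A minimal sketch of this loop in Python (the criterion J and its
gradient below are assumed placeholders; any suitable J(a) can be
plugged in):

    import numpy as np

    def gradient_descent(grad, a, eta=0.1, theta=1e-6, max_iter=1000):
        """a(k+1) = a(k) - eta * grad J(a(k)); stop when the step is tiny."""
        for _ in range(max_iter):
            step = eta * grad(a)
            a = a - step
            if np.linalg.norm(step) < theta:
                break
        return a

    # placeholder criterion J(a) = ||a - c||^2, with grad J(a) = 2 (a - c)
    c = np.array([1.0, -2.0])
    print(gradient_descent(lambda a: 2 * (a - c), np.zeros(2)))  # -> c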

27
Basic Gradient Descent Procedure
28
Gradient Descent: Newton's Algorithm
See p. 608
29
Gradient Descent: Newton's Algorithm
30
Gradient Descent Algorithms
  • Sequence of weight vectors
  • Newton's method gives a greater improvement in
    convergence per step, even when using optimal
    learning rates for both methods
  • However, the added computational burden of
    inverting the Hessian matrix is not always
    justified, and simple gradient descent may suffice
    (see the sketch below)
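
A minimal sketch of the Newton update a(k+1) = a(k) - H^{-1} ∇J(a(k))
(the quadratic criterion below is an assumed placeholder, chosen so
the Hessian is constant and a single step reaches the minimum):

    import numpy as np

    def newton_descent(grad, hessian, a, theta=1e-6, max_iter=100):
        """Newton's rule: solve H * step = grad J instead of inverting H."""
        for _ in range(max_iter):
            step = np.linalg.solve(hessian(a), grad(a))
            a = a - step
            if np.linalg.norm(step) < theta:
                break
        return a

    # placeholder quadratic J(a) = 0.5 a^t Q a - b^t a: grad = Qa - b, H = Q
    Q = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    print(newton_descent(lambda a: Q @ a - b, lambda a: Q, np.zeros(2)))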

31
Perceptron Criterion Function
  • The perceptron criterion function is
    Jp(a) = Σ_{y in Y} (-a^t y),
    where Y is the set of samples misclassified by a

32
Perceptron Learning Algorithm
  • The next weight vector is obtained by adding a
    scalar multiple of the sum of the misclassified
    samples (see the sketch below):
    a(k+1) = a(k) + η(k) Σ_{y in Yk} y
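
A minimal sketch of the batch perceptron in Python (the sample
matrix below is an assumed toy set, with the ω2 samples already
negated so that correct classification means a^t y > 0 for every
row):

    import numpy as np

    def perceptron(Y, eta=1.0, max_iter=1000):
        """Batch perceptron: add eta times the sum of misclassified samples."""
        a = np.zeros(Y.shape[1])
        for _ in range(max_iter):
            mis = Y[Y @ a <= 0]            # currently misclassified samples
            if len(mis) == 0:
                return a                    # all samples correct: done
            a = a + eta * mis.sum(axis=0)
        return a

    # assumed toy data: augmented 1-D samples, omega_2 rows negated
    Y = np.array([[1.0, 2.0], [1.0, 1.5], [-1.0, 0.5]])
    print(perceptron(Y))                    # converges to e.g. a = [1. 4.]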

33
Perceptron Criterion Function
34
Squared Error Criterion Function
35
Squared Error Criterion Function
36
Minimum Squared Error Procedure
  • So far the criterion function has been defined
    only over the misclassified samples; here the
    criterion is defined over all of the samples
37
Minimum Squared Error Procedure
  • A simple closed-form solution can be found by
    forming the gradient of Js(a) = ||Ya - b||^2
  • Setting grad(Js(a)) = 0 gives a = (Y^t Y)^{-1} Y^t b = Y† b
  • where the d̂-by-n matrix Y† = (Y^t Y)^{-1} Y^t is
    called the pseudo-inverse of Y
  • Note that Y†Y = I, but YY† ≠ I in general
  • If Y† is defined more generally by
    Y† = lim_{ε->0} (Y^t Y + εI)^{-1} Y^t,
  • it can be shown that this limit always exists
    and that a = Y†b is an MSE solution to Ya = b
    (see the sketch below)
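
A minimal sketch using NumPy's SVD-based pseudo-inverse (the data
and the all-ones margin vector b are assumed placeholders):

    import numpy as np

    # assumed toy data: rows of Y are augmented samples, omega_2 rows negated
    Y = np.array([[ 1.0, 2.0],
                  [ 1.0, 1.5],
                  [-1.0, 0.5]])
    b = np.ones(3)                 # a common choice of margin vector

    a = np.linalg.pinv(Y) @ b      # MSE solution a = Y† b
    print(a, Y @ a)                # Ya approximates b in the least-squares sense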

38
Example by Matrix Pseudo-inverse
39
Widrow-Hoff Rule / LMS Procedure
40
LMS Procedure
41
Example for LMS Algorithm
  • Find the decision surface using the LMS algorithm
    (see the sketch below)
  • The augmented patterns are
  • ω1: (0,0,0,0)', (1,0,0,1)', (1,0,1,1)', (1,1,0,1)'
  • ω2: (0,0,1,1)', (0,1,0,1)', (0,1,1,1)', (1,1,1,1)'
  • Letting a(1) = 0, η(k) = 1/k
  • Solution: a = (0.135, -0.238, -0.305, 0.721)'
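
A minimal sketch of the single-sample Widrow-Hoff (LMS) update
a(k+1) = a(k) + η(k) (b_k - a^t y_k) y_k applied to these patterns.
The margin b = 1 and the negation of the ω2 patterns are assumptions
not stated on the slide; the result depends on them and on the
sample ordering, so it need not match the slide's numbers exactly:

    import numpy as np

    w1 = [(0,0,0,0), (1,0,0,1), (1,0,1,1), (1,1,0,1)]
    w2 = [(0,0,1,1), (0,1,0,1), (0,1,1,1), (1,1,1,1)]
    # normalize: negate the omega_2 patterns so we want a^t y > 0 for all
    Y = np.array(w1 + [tuple(-v for v in p) for p in w2], dtype=float)

    a = np.zeros(4)               # a(1) = 0
    b, k = 1.0, 0                 # assumed margin b = 1
    for sweep in range(500):      # cycle repeatedly through the samples
        for y in Y:
            k += 1
            a = a + (1.0 / k) * (b - a @ y) * y   # eta(k) = 1/k
    print(a)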