Transcript and Presenter's Notes

Title: Linear Discriminators


1
Linear Discriminators
  • Chapter 20
  • From Data to Knowledge

2
Concerns
  • Generalization Accuracy
  • Efficiency
  • Noise
  • Irrelevant features
  • Generality: when does this work?

3
Linear Model
  • Let f1, ..., fn be the feature values of an example.
  • Let the class be denoted +1 or -1.
  • Define f0 = -1 (the input paired with the bias weight).
  • The linear model defines weights w0, w1, ..., wn.
  • w0 is the threshold.
  • Classification rule:
  • If w0·f0 + w1·f1 + ... + wn·fn > 0, predict class +; else
    predict class -.
  • Briefly: W·F > 0, where · is the inner product of the
    weight vector and the feature vector, and F has been
    augmented with the extra f0 = -1 (see the sketch below).
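As a concrete illustration of the rule above, here is a minimal sketch in Python (NumPy is an assumption; the weights and feature values are made up for illustration):

    import numpy as np

    def predict(w, f):
        # Linear classification rule: predict + if W.F > 0, else -.
        # w and f are the augmented vectors (w0 and f0 = -1 included).
        return '+' if np.dot(w, f) > 0 else '-'

    w = np.array([4.0, 2.0, 3.0])   # hypothetical: threshold w0 = 4, weights 2 and 3
    f = np.array([-1.0, 1.5, 1.0])  # f0 = -1, f1 = 1.5, f2 = 1.0
    print(predict(w, f))            # 2*1.5 + 3*1.0 = 6 > 4, so '+'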

4
Augmentation Trick
  • Suppose the data has features f1 and f2.
  • 2·f1 + 3·f2 > 4 is a classifier.
  • Equivalently, <4, 2, 3> · <-1, f1, f2> > 0.
  • Mapping the data <f1, f2> to <-1, f1, f2> allows
    learning/representing the threshold as just another
    feature (a quick check follows below).
  • Mapping data into higher dimensions is the key idea
    behind SVMs.
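A quick check that the two forms above agree, as a minimal sketch (the random test points are made up for illustration):

    import numpy as np

    def threshold_form(f1, f2):
        return 2*f1 + 3*f2 > 4

    def augmented_form(f1, f2):
        w = np.array([4.0, 2.0, 3.0])   # w0 = threshold, then w1, w2
        f = np.array([-1.0, f1, f2])    # example augmented with -1
        return np.dot(w, f) > 0

    # The two forms agree on arbitrary points.
    for f1, f2 in np.random.uniform(-5, 5, size=(100, 2)):
        assert threshold_form(f1, f2) == augmented_form(f1, f2)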

5
Mapping to enable Linear Separation
  • Let x1, ..., xm be m vectors in R^N.
  • Map xi into R^(N+m) by xi -> <xi, 0, ..., 1, ..., 0>, where
    the 1 is in position N+i.
  • For any labelling of the xi by classes +/-, this
    embedding makes the data linearly separable (see the
    sketch below).
  • Define wi = 0 for i ≤ N.
  • w(N+i) = +1 if xi is positive.
  • w(N+i) = -1 if xi is negative.
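A minimal sketch of this construction (NumPy is an assumption; the data and labels are made up for illustration):

    import numpy as np

    def embed(X):
        # Append an m-dimensional indicator block: example i gets a 1
        # in position N+i, zeros elsewhere.
        return np.hstack([X, np.eye(len(X))])

    def separating_weights(y, N):
        # Zero weights on the original N features; +1/-1 on each
        # example's own indicator feature, matching its label.
        return np.concatenate([np.zeros(N), y.astype(float)])

    X = np.random.randn(5, 3)             # m = 5 vectors in R^3
    y = np.array([1, -1, 1, 1, -1])       # any labelling
    W = separating_weights(y, X.shape[1])
    print(np.sign(embed(X) @ W) == y)     # all True: the labelling is separated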

6
Representational Power
  • OR of n features:
  • wi = 1, threshold 0
  • AND of n features:
  • wi = 1, threshold n - 1
  • k of n features (prototype):
  • wi = 1, threshold k - 1
  • Can't do XOR.
  • Combining linear threshold units yields any boolean
    function (a sketch of these units follows).
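A minimal sketch of these threshold units, assuming boolean features are encoded as 0/1 (the encoding is an assumption, not stated on the slide):

    def threshold_unit(features, threshold):
        # All weights are 1; the unit fires when the sum exceeds
        # the threshold (strict inequality, matching W.F > 0).
        return sum(features) > threshold

    f = [1, 0, 1, 0]                       # n = 4 boolean features
    print(threshold_unit(f, 0))            # OR:     threshold 0      -> True
    print(threshold_unit(f, len(f) - 1))   # AND:    threshold n - 1  -> False
    print(threshold_unit(f, 2 - 1))        # 2 of 4: threshold k - 1  -> True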

7
Classical Perceptron
  • Goal: any W which separates the data.
  • Algorithm (each X is augmented as in the linear model;
    a sketch follows below):
  • W = 0
  • Repeat:
  • If X is positive and W·X is wrong, W = W + X
  • Else if X is negative and W·X is wrong, W = W - X.
  • Until no errors, or a very large number of passes.
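A minimal sketch of this algorithm (NumPy is an assumption; X is assumed to be already augmented as in the earlier slides):

    import numpy as np

    def perceptron(X, y, max_passes=1000):
        # X: examples augmented with the -1 bias feature; y: labels +1/-1.
        W = np.zeros(X.shape[1])
        for _ in range(max_passes):
            errors = 0
            for xi, yi in zip(X, y):
                if yi * np.dot(W, xi) <= 0:   # W.X has the wrong sign
                    W += yi * xi              # W+X for positives, W-X for negatives
                    errors += 1
            if errors == 0:                   # a full pass with no mistakes
                break
        return W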

8
Classical Perceptron
  • Theorem: if the concept is linearly separable, then the
    algorithm finds a solution.
  • Training time can be exponential in the number of
    features.
  • An epoch is a single pass through the entire data.
  • Convergence can take exponentially many epochs.
  • If ||xi|| ≤ R and the margin is m, then the number of
    mistakes is ≤ R²/m².

9
Neural Net view
  • Goal: minimize the squared error Err².
  • Let class Yi be +1 or -1.
  • For the ith example Xi, let Err = W·Xi - Yi; the total
    squared error sums Err² over the examples.
  • This is a function only of the weights.
  • Use calculus: take partial derivatives with respect to
    each Wj.
  • To move to a lower value, move in the direction of the
    negative gradient, i.e.
  • the change in Wj is proportional to -2·Err·Xij, where Xij
    is the jth feature of Xi (a sketch follows below).
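A minimal sketch of one pass of these updates (NumPy is an assumption; the step size eta is also an assumption, since, as the next slide notes, the gradient does not say how far to move):

    import numpy as np

    def gradient_pass(W, X, Y, eta=0.01):
        # X: augmented examples, Y: labels +1/-1, eta: assumed step size.
        for xi, yi in zip(X, Y):
            err = np.dot(W, xi) - yi       # Err = W.Xi - Yi
            W = W - eta * 2 * err * xi     # move against the gradient 2*Err*Xij
        return W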

10
Neural Net View
  • This is an optimization problem.
  • The solution is found by hill-climbing (gradient
    descent), so there is no guarantee of finding the
    optimal solution.
  • While the derivatives tell you the direction (the
    negative gradient), they do not tell you how much
    to change each Wj.
  • On the plus side, it is fast.
  • On the negative side, there is no guarantee of
    separation.

11
Support Vector Machine
  • Goal: maximize the margin.
  • Assuming the line separates the data, the margin is the
    smaller of the distances from the line to the closest
    positive and the closest negative example.
  • Good news: this can be solved by quadratic programming.
  • Implemented in Weka as SMO.
  • If the data is not linearly separable, the SVM will add
    more features (a sketch follows below).
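The slides point to Weka's SMO; as a rough equivalent in Python, a linear SVM can be fit with scikit-learn (scikit-learn and the toy data are assumptions, not mentioned in the slides):

    import numpy as np
    from sklearn.svm import SVC

    # Toy separable data: two shifted clusters (made up for illustration).
    X = np.vstack([np.random.randn(20, 2) + 3, np.random.randn(20, 2) - 3])
    y = np.array([1] * 20 + [-1] * 20)

    clf = SVC(kernel="linear", C=1e6)   # a large C approximates a hard margin
    clf.fit(X, y)
    print(clf.coef_, clf.intercept_)    # the maximum-margin line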

12
If not Linearly Separable
  • Add more nodes (neural nets):
  • Can represent any boolean function (why?)
  • No guarantees about learning
  • Slow
  • Incomprehensible
  • Add more features (SVM):
  • Can represent any boolean function
  • Learning guarantees
  • Fast
  • Semi-comprehensible

13
Adding features
  • Suppose a point (x, y) is positive if it lies in the
    unit disk, else negative.
  • Clearly this is far from linearly separable.
  • Map (x, y) -> (x, y, x² + y²).
  • Now, in 3-space, it is easily separable: the new feature
    x² + y² just needs to be compared against 1.
  • This works for any learning algorithm, but an SVM will
    almost do it for you (by setting its kernel parameters);
    a sketch follows below.
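A minimal sketch of this map (NumPy is an assumption; the sample points are made up for illustration):

    import numpy as np

    def lift(points):
        # Map (x, y) -> (x, y, x^2 + y^2).
        x, y = points[:, 0], points[:, 1]
        return np.column_stack([x, y, x**2 + y**2])

    pts = np.random.uniform(-2, 2, size=(200, 2))
    labels = np.where((pts**2).sum(axis=1) <= 1, 1, -1)  # +1 inside the unit disk

    # In 3-space the plane  (third coordinate) = 1  separates the classes.
    lifted = lift(pts)
    print(np.all(np.sign(1 - lifted[:, 2]) == labels))   # True (boundary ties aside)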