Chapter 4: Artificial Neural Networks - PowerPoint PPT Presentation

Provided by: csSungsh
1
Chapter 4 Artificial Neural Networks
2
  • Artificial neural network(ANN)
  • General, practical method for learning
    real-valued, discrete-valued, vector-valued
    functions from examples
  • BACKPROPAGATION algorithm
  • Use gradient descent to tune network parameters
    to best fit a training set of input-output pairs
  • ANN learning
  • Robust to errors in the training examples
  • Interpreting visual scenes, speech recognition,
    learning robot control strategy

3
Biological motivation
  • ????? ???? ???
  • Parallel computing
  • Distributed representation
  • ????? ???? ???
  • ?? ??(??)? ??

4
ALVINN system
5
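Appropriate problems for neural network learning
  • Instances are represented by many attribute-value pairs
  • The target function output may be discrete-valued, real-valued, or a vector of such values
  • The training examples may contain errors (noise)
  • Long training times are acceptable
  • Fast evaluation of the learned function may be required
  • Human understanding of the learned function is not important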

6
Perceptrons
  • vector of real-valued inputs
  • weights and threshold
  • learning: choosing values for the weights
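For reference, the perceptron output is usually written as below. The slide does not show the formula, so the notation is an assumption taken from the standard textbook formulation: w_0 is a bias weight (so that -w_0 plays the role of the threshold) and the outputs are +1 and -1.

```latex
o(x_1, \dots, x_n) =
\begin{cases}
  1  & \text{if } w_0 + w_1 x_1 + \cdots + w_n x_n > 0 \\
  -1 & \text{otherwise}
\end{cases}
```

Learning then amounts to choosing values for w_0, ..., w_n.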

7
  • Hypothesis space of perceptron learning
  • n input vector? ??

8
Representational power of perceptrons
  • hyperplane decision surface for linearly separable
    examples
  • many boolean functions (except XOR)
  • m-of-n function
  • disjunctive normal form ??? unit
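A minimal sketch of how a single perceptron can represent m-of-n functions, with AND and OR as special cases. It assumes inputs encoded as +1 (true) and -1 (false) and equal input weights of 0.5; the helper names are hypothetical and not from the slides.

```python
from itertools import product


def perceptron_output(weights, inputs):
    """Thresholded output: +1 if w0 + sum(w_i * x_i) > 0, else -1."""
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if net > 0 else -1


def m_of_n_weights(m, n):
    """Weights for 'at least m of the n inputs are true' (inputs are +1/-1).

    All input weights are 0.5; the bias w0 is chosen so that the net input
    is positive exactly when at least m inputs equal +1.
    """
    return [n / 2 - m + 0.5] + [0.5] * n


AND_WEIGHTS = m_of_n_weights(2, 2)  # AND is 2-of-2
OR_WEIGHTS = m_of_n_weights(1, 2)   # OR is 1-of-2


def truth_table(weights, n):
    """Map every +1/-1 input combination to the perceptron output."""
    return {x: perceptron_output(weights, x) for x in product((-1, 1), repeat=n)}
```

XOR has no such weight vector, because its positive and negative examples cannot be separated by a single hyperplane.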

9
Perceptron rule
  • ???? ?? ? ??? ???? ????? ????? ? ??
  • training examples are linearly separable
  • sufficiently small learning rate
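The perceptron rule can be sketched as follows. This is an illustrative version that assumes +1/-1 targets, zero initial weights, and an epoch limit as a safety stop; none of these details are given on the slide.

```python
def train_perceptron(examples, eta=0.1, max_epochs=100):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i.

    examples: list of (inputs, target) pairs with target in {-1, +1}.
    Returns the weight list [w0, w1, ..., wn], where w0 is the bias weight.
    """
    n = len(examples[0][0])
    weights = [0.0] * (n + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            x = [1.0] + list(inputs)  # x0 = 1 pairs with the bias weight
            net = sum(w * xi for w, xi in zip(weights, x))
            output = 1 if net > 0 else -1
            if output != target:
                mistakes += 1
                weights = [w + eta * (target - output) * xi
                           for w, xi in zip(weights, x)]
        if mistakes == 0:  # every example classified correctly
            break
    return weights
```

On linearly separable data such as AND the loop stops once a full pass makes no mistakes; on data that is not linearly separable it simply runs until max_epochs.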

10
Gradient descent and the delta rule
  • for training examples that are not linearly separable
  • unthresholded linear unit
  • od ? w? ?? ???

11
Hypothesis space
12
Gradient descent
  • gradient: direction of steepest increase in E

14
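Gradient descent (contd)
  • Whether or not the training examples are linearly separable, gradient
    descent converges to the global minimum (given a sufficiently small learning rate).
  • A learning rate that is too large risks overstepping the minimum -> gradually
    reduce the learning rate as the number of gradient descent steps grows.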
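A small sketch of batch gradient descent for an unthresholded linear unit, assuming the usual squared error E = 1/2 * sum over examples of (t - o)^2. The function names, zero initial weights, and default learning rate are illustrative assumptions.

```python
def linear_output(w, inputs):
    """Unthresholded linear unit: o = w0 + w1*x1 + ... + wn*xn."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], inputs))


def squared_error(w, examples):
    """E(w) = 1/2 * sum over examples of (t - o)^2."""
    return 0.5 * sum((t - linear_output(w, x)) ** 2 for x, t in examples)


def batch_gradient_descent(examples, eta=0.05, epochs=500):
    """Sum the weight changes over all examples, then update once per epoch."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        delta = [0.0] * (n + 1)
        for inputs, t in examples:
            x = [1.0] + list(inputs)
            o = linear_output(w, inputs)
            for i, xi in enumerate(x):
                delta[i] += eta * (t - o) * xi  # minus eta times dE/dw_i
        w = [wi + di for wi, di in zip(w, delta)]
    return w
```

If eta is too large for the data, the updates overstep the minimum and the weights diverge, which is why the learning rate is kept small or reduced over time.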

15
Stochastic approximation to gradient descent
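  • Conditions for applying gradient descent
  • hypothesis space is continuously parameterized
  • the error is differentiable with respect to the hypothesis parameters
  • Difficulties of gradient descent
  • convergence can be slow
  • with multiple local minima, finding the global minimum is not guaranteed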

16
Stochastic approximation to gradient descent (contd)
  • ??? training example? ???? E? ??? ?? weight?
    ????.
  • ??? descent gradient? ??
  • ?? ?? learning rate? ??
  • can sometimes avoid falling into multiple local minima
  • Delta rule
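The stochastic (incremental) version of the delta rule updates the weights after every single training example. A minimal sketch, assuming a linear unit, shuffled presentation order, and a fixed seed for repeatability (all of these are illustrative choices):

```python
import random


def stochastic_gradient_descent(examples, eta=0.02, epochs=1000, seed=0):
    """Delta rule applied per example: w_i <- w_i + eta * (t - o) * x_i."""
    rng = random.Random(seed)
    n = len(examples[0][0])
    w = [0.0] * (n + 1)
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a new random order each epoch
        for inputs, t in data:
            x = [1.0] + list(inputs)
            o = sum(wi * xi for wi, xi in zip(w, x))  # unthresholded output
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
    return w
```

Each update follows the gradient of the error on one example only, so the path is noisier than batch gradient descent, but each step is cheaper.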

17
Remark
  • Perceptron rule
  • thresholded output
  • ??? weight
  • linearly separable
  • Delta rule
  • unthresholded output
  • ????? ??? ????? weight
  • non-linearly separable

18
Multilayer networks
  • Nonlinear decision surface

19
Differentiable threshold unit
  • Sigmoid function
  • nonlinear, differentiable
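A short sketch of the sigmoid function and its derivative. The derivative identity sigma'(y) = sigma(y) * (1 - sigma(y)) is what makes the unit convenient for gradient descent; the function names are illustrative.

```python
import math


def sigmoid(y):
    """Logistic function 1 / (1 + e^(-y)); output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-y))


def sigmoid_derivative(y):
    """d sigmoid / dy, expressed through the sigmoid output itself."""
    s = sigmoid(y)
    return s * (1.0 - s)
```

Note that math.exp overflows for very large negative y, so a production version would need to guard against that case.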

20
[Figure: layered feedforward network with input units i, hidden units j (h), and output units k; each unit is labeled with its inputs x, weights w, net input net, and output o]
22
BACKPROPAGATION algorithm
  • ??? error? ??

23
BACKPROPAGATION algorithm (contd)
  • Multiple local minima
  • Termination
  • fixed number of iterations
  • error threshold
  • error on a separate validation set
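The three termination criteria can be combined in one training loop. The sketch below is only illustrative: train_step and validation_error are hypothetical callbacks supplied by the caller, and the patience parameter (how long to wait for the validation error to improve) is an assumption, not something stated on the slide.

```python
def train_with_termination(train_step, validation_error, max_iterations=1000,
                           error_threshold=0.01, patience=20):
    """Run training until one of three stopping criteria fires.

    train_step():       performs one training iteration, returns the training error
    validation_error(): returns the current error on a separate validation set
    Returns (iteration to use, reason for stopping).
    """
    best_error = float("inf")
    best_iteration = 0
    for iteration in range(1, max_iterations + 1):
        training_error = train_step()
        val_error = validation_error()
        if val_error < best_error:
            best_error, best_iteration = val_error, iteration
        if training_error < error_threshold:
            return iteration, "error threshold"
        if iteration - best_iteration >= patience:
            # validation error has not improved for `patience` iterations
            return best_iteration, "validation error stopped improving"
    return max_iterations, "iteration limit"
```

In the validation case the function reports the iteration with the lowest validation error, so the caller would keep a copy of the weights from that iteration.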

24
BACKPROPAGATION algorithm (contd)
  • Adding momentum
  • the weight update in each loop partly depends on the update in the previous loop
  • Learning in arbitrary acyclic networks
  • Downstream(r)
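A sketch of the momentum idea on a generic error function: each step is the usual gradient step plus a fraction alpha of the previous step. The grad callback, the parameter names, and the defaults are assumptions made for illustration.

```python
def gradient_descent_with_momentum(grad, w0, eta=0.1, alpha=0.9, steps=100):
    """Update rule: step(n) = -eta * grad(w) + alpha * step(n - 1).

    grad: function returning the gradient of the error at w (a list of floats)
    alpha: momentum constant, 0 <= alpha < 1 (alpha = 0 gives plain gradient descent)
    """
    w = list(w0)
    previous_step = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        step = [-eta * gi + alpha * pi for gi, pi in zip(g, previous_step)]
        w = [wi + si for wi, si in zip(w, step)]
        previous_step = step
    return w
```

With alpha > 0 the search keeps moving in the direction of recent steps, which can carry it across flat regions and through small local minima.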

25
BACKPROPAGATION rule
26
BACKPROPAGATION rule(contd)
  • Training rule for output unit

27
[Figure: the layered network diagram shown earlier, repeated (input units i, hidden units j (h), output units k)]
28
BACKPROPAGATION rule(contd)
  • Training rule for hidden unit
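The output-unit and hidden-unit training rules can be sketched together for a network with one hidden layer of sigmoid units. This is a minimal stochastic-gradient version under stated assumptions: squared error on a single example, small random initial weights, and a dictionary layout for the network that is purely illustrative.

```python
import math
import random


def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))


def init_network(n_in, n_hidden, n_out, seed=0):
    """Each unit is a weight list [w0, w1, ...]; w0 is the bias weight."""
    rng = random.Random(seed)
    def layer(n_units, n_inputs):
        return [[rng.uniform(-0.05, 0.05) for _ in range(n_inputs + 1)]
                for _ in range(n_units)]
    return {"hidden": layer(n_hidden, n_in), "output": layer(n_out, n_hidden)}


def forward(net, x):
    """Return (hidden unit outputs, output unit outputs) for input x."""
    xh = [1.0] + list(x)
    h = [sigmoid(sum(w * v for w, v in zip(ws, xh))) for ws in net["hidden"]]
    xo = [1.0] + h
    o = [sigmoid(sum(w * v for w, v in zip(ws, xo))) for ws in net["output"]]
    return h, o


def example_error(net, x, t):
    """Squared error on one example: 1/2 * sum_k (t_k - o_k)^2."""
    _, o = forward(net, x)
    return 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, o))


def backprop_step(net, x, t, eta=0.3):
    """One stochastic gradient step; updates net in place."""
    h, o = forward(net, x)
    # Output units: delta_k = o_k * (1 - o_k) * (t_k - o_k)
    delta_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
    # Hidden units: delta_h = o_h * (1 - o_h) * sum_k w_kh * delta_k
    delta_h = [hj * (1 - hj) * sum(net["output"][k][j + 1] * delta_o[k]
                                   for k in range(len(o)))
               for j, hj in enumerate(h)]
    xo = [1.0] + h
    xh = [1.0] + list(x)
    # Weight update for every unit: w_ji <- w_ji + eta * delta_j * x_ji
    for k, ws in enumerate(net["output"]):
        for i in range(len(ws)):
            ws[i] += eta * delta_o[k] * xo[i]
    for j, ws in enumerate(net["hidden"]):
        for i in range(len(ws)):
            ws[i] += eta * delta_h[j] * xh[i]
```

The hidden-unit deltas are computed from the output weights before those weights are changed, so both layers use the gradient at the same point.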

30
Convergence and local minima
  • Convergence is guaranteed only to a local minimum
  • In practice this problem is not severe
  • Algorithm is highly effective
  • the more weights, the less severe the local minima problem
  • weights are initialized to values near 0
  • Remedies
  • momentum, stochastic gradient descent, multiple networks

31
Representational power of feedforward networks
  • Boolean functions
  • with two layers
  • disjunctive normal form
  • ??? ??? ??? hidden unit
  • Continuous functions(bounded)
  • with two layers
  • Arbitrary functions
  • with three layers
  • linear combination of small functions

32
  • Hypothesis space search
  • continuous -> distinct?? ??
  • Inductive bias
  • difficult to characterize
  • smooth interpolation

33
Hidden layer representation
  • ??? ?? ??? ??? ???? hidden layer? ???? ??? ??.
  • ??? ?? ?? ? feature?? ???? ???? ???? ?? ? ? ??
    ??? ????? ????.

38
Generalization, overfitting, stopping criterion
  • Terminating condition
  • error threshold? ??
  • Generalization accuracy? ??
  • Weight decay
  • Validation data
  • Cross-validation approach
  • K-fold cross-validation
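One way k-fold cross-validation can be used to pick a stopping point: train k times, each time holding out a different fold as validation data, record the number of iterations that gave the lowest validation error, and average those numbers. The sketch below covers only the fold splitting and the averaging; the function names and the interleaved fold assignment are assumptions.

```python
def k_fold_splits(n_examples, k):
    """Return k (train_indices, validation_indices) pairs.

    Every example index appears in exactly one validation fold.
    """
    indices = list(range(n_examples))
    folds = [indices[i::k] for i in range(k)]  # interleaved, nearly equal sizes
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


def cross_validated_iterations(best_iterations_per_fold):
    """Average the best iteration counts found on the k folds."""
    return round(sum(best_iterations_per_fold) / len(best_iterations_per_fold))
```

A final network would then be trained on all available examples for the averaged number of iterations, with no separate validation set.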

40
Face recognition

41
  • Input image 120x128 -> 30x32
  • ???? ??? ??
  • mean value(cf, ALVINN)
  • 1-of-n output encoding
  • many weights
  • ??? ??? ??
  • <0.9, 0.1, 0.1, 0.1>
  • 2 layers, 3 hidden units -> 90% success
  • learned hidden units
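The input and output encodings on this slide can be sketched as follows. The code assumes the image is a list of 120 rows of 128 pixel intensities, that each coarse pixel is the mean of a 4x4 block, and that there are four output classes; the function names are hypothetical.

```python
def downsample_mean(image, factor=4):
    """Reduce resolution by replacing each factor x factor block with its mean.

    Assumes both image dimensions are divisible by factor (120x128 -> 30x32).
    """
    rows, cols = len(image), len(image[0])
    small = []
    for r in range(0, rows, factor):
        small_row = []
        for c in range(0, cols, factor):
            block = [image[rr][cc]
                     for rr in range(r, r + factor)
                     for cc in range(c, c + factor)]
            small_row.append(sum(block) / len(block))
        small.append(small_row)
    return small


def one_of_n_target(class_index, n_classes, high=0.9, low=0.1):
    """Target vector for 1-of-n output encoding, e.g. <0.9, 0.1, 0.1, 0.1>."""
    return [high if i == class_index else low for i in range(n_classes)]


def predicted_class(outputs):
    """The predicted class is the output unit with the highest value."""
    return max(range(len(outputs)), key=lambda i: outputs[i])
```

Targets of 0.9 and 0.1 are used instead of 1 and 0 because a sigmoid unit cannot output exactly 0 or 1 with finite weights, so those targets would push the weights to grow without bound.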

42
Alternative error functions
  • Weight-tuning rule? ??? ????? ???? ?? ??
  • Penalty term for weight magnitude
  • reducing the risk of overfitting
  • Derivative of target function
  • Minimizing cross-entropy
  • for probabilistic function
  • Weight sharing
  • speech recognition
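Two of the alternatives above are usually written as follows. The notation is an assumption based on the standard textbook forms (D is the training set, t and o are target and output values, gamma is a penalty constant), since the slide does not show the formulas.

```latex
% Squared error with a penalty term for weight magnitude
E(\vec{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in \mathit{outputs}} (t_{kd} - o_{kd})^2
             + \gamma \sum_{i,j} w_{ji}^2

% Cross-entropy, for learning a probabilistic function
-\sum_{d \in D} \left[ t_d \log o_d + (1 - t_d) \log (1 - o_d) \right]
```

The penalty term biases the search toward small weights, which reduces the risk of overfitting.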

44
Alternative error minimization procedures
  • Line search
  • direction: same as backpropagation
  • distance: to the minimum of the error function along this
    line
  • the resulting step may be very large or very small
  • Conjugate gradient
  • new direction: chosen so that the component of the error gradient
    that has just been made zero remains zero
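A sketch of a single line-search step: the direction is the negative gradient, and the distance is found by minimizing the error along that line. Golden-section search is used here as one possible one-dimensional minimizer; that choice, the max_step bound, and the function names are assumptions rather than the method named on the slide.

```python
import math


def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function of one variable on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


def line_search_step(error, grad, w, max_step=10.0):
    """Move from w along the negative gradient to the minimum on that line.

    Returns (new weights, step length). Assumes the error is unimodal along
    the line for step lengths in [0, max_step].
    """
    direction = [-g for g in grad(w)]
    def along_line(s):
        return error([wi + s * di for wi, di in zip(w, direction)])
    s = golden_section_minimize(along_line, 0.0, max_step)
    return [wi + s * di for wi, di in zip(w, direction)], s
```

Unlike a fixed learning rate, the step length here is whatever the one-dimensional minimization returns, so it can be very large or very small.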

45
Recurrent networks
46
Dynamically modifying network structure
  • ?? ???? ???? ?? ??? ??
  • ??(without hidden unit)
  • CASCADE-CORRELATION
  • ?? ?? ??, overfitting ??
  • ??
  • optimal brain damage
  • ?? ?? ??