Title: Artificial Neural Networks
1. Artificial Neural Networks
- Learning real-valued, discrete-valued, and vector-valued functions from examples
- Robust to errors in the training data
- Applications: interpreting visual scenes, speech recognition, robot control strategies
2. Biological Motivation
- Human brain: about 10^11 neurons
- Each connected to about 10^4 others
- Neuron switching time: about 10^-3 seconds
- (Computer switching speed: about 10^-10 seconds)
- Yet only about 10^-1 seconds are needed to recognize a human face
- → highly parallel and distributed processing
3. Biological Motivation
- The ANN model is not the same as that of biological neural systems
- Using ANNs to study and model biological learning processes
- Obtaining highly effective machine learning algorithms
4. ANN Representation
(Figure: an example artificial neural network — omitted)
5. Appropriate Problems for ANNs
- Instances are represented by many attribute-value pairs
- The target function output may be discrete-valued, real-valued, or vector-valued
- Training examples can contain errors
- Long training times are acceptable
- Fast evaluation of the learned target function may be required
- Understanding the learned target concept is not important
6. Perceptrons
(Diagram: a perceptron unit with inputs x1, ..., xn, weights w1, ..., wn, and a bias input x0 = 1 with weight w0)
- net = Σi wi xi
- o = 1 if Σi wi xi > 0, −1 otherwise
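Below is a minimal NumPy sketch of the thresholded unit defined above; the example weights implementing Boolean AND are illustrative choices, not taken from the slides.

    import numpy as np

    def perceptron_output(w, x):
        """Thresholded perceptron output: o = 1 if sum_i w_i * x_i > 0, else -1."""
        x = np.concatenate(([1.0], np.asarray(x, dtype=float)))  # prepend the bias input x0 = 1
        return 1 if np.dot(w, x) > 0 else -1

    # Illustrative weights: w0 = -0.8, w1 = w2 = 0.5 implement Boolean AND
    # over two inputs encoded as -1 (false) / +1 (true).
    w = np.array([-0.8, 0.5, 0.5])
    print(perceptron_output(w, [1, 1]))    # 1
    print(perceptron_output(w, [1, -1]))   # -1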
7. Perceptrons
(Diagram: perceptron units representing a Boolean function of two inputs, e.g., A ∧ ¬B)
8. Perceptron Training Rule
- wi ← wi + Δwi
- Δwi = η (t − o) xi
- t: target output of the current training example
- o: the thresholded output generated by the perceptron
- η: learning rate (positive constant)
9. Perceptron Training Rule
wi ← wi + Δwi,   Δwi = η (t − o) xi
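A short Python sketch of this training rule, assuming targets in {−1, +1} and a fixed number of passes over the data (the function name, learning rate, and example data are illustrative):

    import numpy as np

    def train_perceptron(examples, eta=0.1, epochs=50):
        """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i.

        examples: list of (x, t) pairs with t in {-1, +1}; x excludes the bias input.
        Returns the learned weight vector (w0 is the bias weight).
        """
        n = len(examples[0][0])
        w = np.zeros(n + 1)                        # w0, w1, ..., wn
        for _ in range(epochs):
            for x, t in examples:
                x = np.concatenate(([1.0], x))     # x0 = 1
                o = 1 if np.dot(w, x) > 0 else -1  # thresholded output
                w += eta * (t - o) * x             # update only when o != t
        return w

    # Example: learning the OR function over inputs in {-1, +1} (illustrative data)
    data = [(np.array([-1., -1.]), -1), (np.array([-1., 1.]), 1),
            (np.array([ 1., -1.]),  1), (np.array([ 1., 1.]), 1)]
    print(train_perceptron(data))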
10. Perceptron Training Rule
(Figure: two sets of training examples plotted in the (x1, x2) plane — one set is linearly separable, the other is not linearly separable)
11Perceptron Training Rule
- The learning procedure converges to a weight
vector that correctly classifies all linearly
separable training examples -
12. Perceptron Training Rule
- Minsky, M. & Papert, S. (1969). Perceptrons. MIT Press.
13. Gradient Descent Rule
(Diagram: a linear unit with inputs x1, ..., xn, bias input x0 = 1, and weights w0, ..., wn; gradient descent is applied to the unthresholded output)
- o = Σi wi xi  (the unthresholded output w · x)
14. Gradient Descent Rule
- Training error
- E(w) = (1/2) Σd∈D (td − od)²
- td: target output of training example d
- od: the unthresholded output for d (= w · x)
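As a small illustration, this training error can be computed directly for a linear unit; the helper name and bias handling below are assumptions, not the slides' notation:

    import numpy as np

    def training_error(w, X, t):
        """E(w) = 1/2 * sum_{d in D} (t_d - o_d)^2 for a linear unit o_d = w . x_d.

        X: (D, n) matrix of training inputs (bias column added here), t: (D,) targets.
        """
        X = np.hstack([np.ones((X.shape[0], 1)), X])   # x0 = 1
        o = X @ w                                      # unthresholded outputs
        return 0.5 * np.sum((t - o) ** 2)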
15. Gradient Descent Rule
16. Gradient Descent Rule
- Gradient of E (the direction of steepest increase)
- ∇E(w) = [∂E/∂w0, ∂E/∂w1, ..., ∂E/∂wn]
- w ← w + Δw
- w ← w − η ∇E(w)
17. Gradient Descent Rule
- wi ← wi + Δwi
- Δwi = −η ∂E/∂wi
- ∂E/∂wi = −Σd∈D (td − od) xid
- Δwi = η Σd∈D (td − od) xid
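A sketch of the resulting batch gradient-descent procedure for a linear unit, assuming a small fixed learning rate and a fixed number of epochs (names and defaults are illustrative):

    import numpy as np

    def gradient_descent(X, t, eta=0.05, epochs=500):
        """Batch gradient descent for a linear unit o = w . x.

        Minimizes E(w) = 1/2 * sum_d (t_d - o_d)^2 using
        delta w_i = eta * sum_d (t_d - o_d) * x_id.
        X: (D, n) matrix of training inputs (bias column added here), t: (D,) targets.
        eta must be small enough for the updates to converge.
        """
        X = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x0 = 1
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            o = X @ w                    # unthresholded outputs for all examples
            w += eta * X.T @ (t - o)     # one batch update over the whole training set
        return w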
18Gradient Descent Rule
- Converging to a local minimum can be quite slow
- No guarantee to converge to the global minimum
-
19. Stochastic Approximation
- Delta rule
- Δwi = η (td − od) xid
- Ed(w) = (td − od)² / 2
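A sketch of this incremental (stochastic) version, which applies the delta rule after each training example rather than after a full pass over the data (names and defaults are illustrative):

    import numpy as np

    def stochastic_delta_rule(X, t, eta=0.05, epochs=200):
        """Incremental (stochastic) gradient descent for a linear unit.

        The weights are updated after each training example d:
            delta w_i = eta * (t_d - o_d) * x_id,  with E_d(w) = (t_d - o_d)^2 / 2.
        """
        X = np.hstack([np.ones((X.shape[0], 1)), X])   # x0 = 1
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for x_d, t_d in zip(X, t):
                o_d = np.dot(w, x_d)                   # unthresholded output for example d
                w += eta * (t_d - o_d) * x_d           # per-example weight update
        return w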
20Stochastic Approximation
- Weights are updated upon examining each training
example - Less computation per weight update step is
required - Falling into local minima can be avoided
-
21Stochastic Approximation
- The delta rule converges towards a best-fit
approximation to the target concept, regardless
of whether the training data are linearly
separable -
22. Multilayer Networks
- Single perceptrons can express only linear decision surfaces
- A multilayer network can represent highly nonlinear decision surfaces
23. Multilayer Networks
(Figure: decision regions of a multilayer network trained to distinguish vowel sounds such as "head", "hid", "who'd", "hood" from two input features F1 and F2)
24. Multilayer Networks
25Multilayer Networks
- What type of unit ?
- Perceptrons non-differentiable
- Linear units only linear functions
- ....
-
26. Multilayer Networks
(Diagram: a sigmoid unit with inputs x1, ..., xn, bias input x0 = 1, and weights w0, ..., wn)
- net = Σi wi xi
- o = σ(net) = 1 / (1 + e^(−net))
27. Multilayer Networks
- Sigmoid unit
- ∂σ(y)/∂y = σ(y) · (1 − σ(y))
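The sigmoid and its derivative in Python, as used by backpropagation below (a sketch; function names are illustrative):

    import numpy as np

    def sigmoid(net):
        """Sigmoid squashing function: sigma(net) = 1 / (1 + e^(-net))."""
        return 1.0 / (1.0 + np.exp(-net))

    def sigmoid_derivative(net):
        """d sigma / d net = sigma(net) * (1 - sigma(net))."""
        s = sigmoid(net)
        return s * (1.0 - s)

    # Expressing the derivative in terms of the unit's own output keeps backpropagation cheap:
    # once o = sigmoid(net) is known, the slope is simply o * (1 - o).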
28. Backpropagation Algorithm
- Training error
- E(w) = (1/2) Σd∈D Σk∈outputs (tkd − okd)²
29. Backpropagation Algorithm
(Diagram: input unit i, hidden unit h with output oh, output unit k with output ok; weights whi into the hidden layer and wkh into the output layer)
- δk = ok (1 − ok)(tk − ok)  for each output unit k
- δh = oh (1 − oh) Σk wkh δk  for each hidden unit h
- wji ← wji + η δj xji
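A sketch of one stochastic backpropagation step for a two-layer network of sigmoid units, following the δ and weight-update equations above; the weight-matrix layout and names are assumptions for illustration, not the slides' notation:

    import numpy as np

    def backprop_step(x, t, W_hid, W_out, eta=0.3):
        """One stochastic-gradient backpropagation step for a two-layer sigmoid network.

        x: input vector (without bias), t: target output vector in (0, 1),
        W_hid: (n_hidden, n_in + 1) hidden-layer weights,
        W_out: (n_out, n_hidden + 1) output-layer weights.
        """
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

        # Forward pass
        x = np.concatenate(([1.0], x))             # x0 = 1
        o_h = sigmoid(W_hid @ x)                   # hidden-unit outputs
        h = np.concatenate(([1.0], o_h))           # bias input for the output layer
        o_k = sigmoid(W_out @ h)                   # output-unit outputs

        # Error terms
        delta_k = o_k * (1 - o_k) * (t - o_k)                     # output units
        delta_h = o_h * (1 - o_h) * (W_out[:, 1:].T @ delta_k)    # hidden units

        # Weight updates: w_ji <- w_ji + eta * delta_j * x_ji
        W_out += eta * np.outer(delta_k, h)
        W_hid += eta * np.outer(delta_h, x)
        return W_hid, W_out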
30Backpropagation Algorithm
xji
j
i
wji
?j
wji ? wji ??jxji
31Backpropagation Algorithm
- Adding momentum
- ?wji(n) ??jxji a?wji(n - 1)
- iteration momentum
- Keeping the search direction ? passing small
local minima - Increasing the search step size ? speeding
convergence -
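A minimal sketch of the momentum update, assuming the caller keeps the previous weight change between iterations (names are illustrative):

    import numpy as np

    def momentum_update(W, grad_step, prev_delta, alpha=0.9):
        """Weight update with momentum: Delta w(n) = grad_step + alpha * Delta w(n-1).

        grad_step: the plain backpropagation step (eta * delta_j * x_ji) for this iteration.
        prev_delta: the weight change Delta w(n-1) applied on the previous iteration.
        alpha: momentum constant, 0 <= alpha < 1.
        Returns the updated weights and Delta w(n), to be passed in on the next call.
        """
        delta = grad_step + alpha * prev_delta
        return W + delta, delta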
32. Backpropagation Algorithm
- Learning in arbitrary acyclic networks
(Diagram: unit r in layer m feeds units s in layer m+1 through weights wsr)
- δr = or (1 − or) Σs∈layer m+1 wsr δs
33Backpropagation Algorithm
- Convergence and local minima
- Not guaranteed to converge towards the global
minimum error, but highly effective in practice - Approximately linear when the weights are close
to 0, hence passing local minima of non-linear
functions -
34Backpropagation Algorithm
- Heuristics to alleviate the local minima problem
- Add a momentum term to the weight-update rule
- Use stochastic gradient descent rather than true
gradient descent - Train multiple networks using the same data, but
initializing each network with different random
weights
35Backpropagation Algorithm
- Representation power of feedforward networks
- Boolean functions any one, using 2-layer (1
hidden 1 output) networks - Continuous functions any bounded one with
approximation, using 2-layer networks - Arbitrary functions any one with approximation,
using 3-layer networks
36Backpropagation Algorithm
- Hypothesis space and inductive bias
- Hypothesis every possible assignment of network
weights - Inductive bias smooth interpolation between data
points
37. Backpropagation Algorithm
- Hidden layer representations
(Figure: hidden-layer encoding learned for the identity function)
38. Backpropagation Algorithm
39. Backpropagation Algorithm
- Stopping criterion and overfitting
- A fixed number of iterations
- The training error falling below some threshold
40. Backpropagation Algorithm
41. Applications
- Recognizing face pose
- 30 × 32 resolution input images
- 4 pose directions: left, straight, right, up
- → a 960 × 3 × 4 network (see the sketch below)
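A minimal NumPy sketch of such a 960 × 3 × 4 network; the random weight-initialization range and the pose-index ordering are assumptions for illustration:

    import numpy as np

    # 30 x 32 = 960 grey-level inputs, 3 hidden sigmoid units, 4 outputs
    # (one per pose: left, straight, right, up). Weight values are illustrative.
    rng = np.random.default_rng(0)
    W_hid = rng.uniform(-0.05, 0.05, size=(3, 960 + 1))   # hidden weights (+1 for bias)
    W_out = rng.uniform(-0.05, 0.05, size=(4, 3 + 1))     # output weights (+1 for bias)

    def classify(image):
        """image: flattened 30x32 array scaled to [0, 1]; returns the predicted pose index."""
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
        h = sigmoid(W_hid @ np.concatenate(([1.0], image)))
        o = sigmoid(W_out @ np.concatenate(([1.0], h)))
        return int(np.argmax(o))            # assumed ordering: 0=left, 1=straight, 2=right, 3=up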
42. Exercises
- In Mitchell's Machine Learning (Chapter 4): Exercises 4.1 to 4.10