1
Introduction to Neural Networks
  • John Paxton
  • Montana State University
  • Summer 2003

2
Chapter 3 Pattern Association
  • Aristotle observed that human memory associates
  • similar items
  • contrary items
  • items close in proximity
  • items close in succession (as in a song)

3
Terminology and Issues
  • Autoassociative Networks
  • Heteroassociative Networks
  • Feedforward Networks
  • Recurrent Networks
  • How many patterns can be stored?

4
Hebb Rule for Pattern Association
  • Architecture

[Architecture diagram: input units x1 ... xn fully connected to output units y1 ... ym, with weight wij on the link from xi to yj (w11 ... wnm).]
5
Algorithm
  • 1. set wij = 0, 1 <= i <= n, 1 <= j <= m
  • 2. for each training pair s:t
  • 3. xi = si
  • 4. yj = tj
  • 5. wij(new) = wij(old) + xi yj
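In Python with numpy, steps 1-5 reduce to a sum of outer products. A minimal sketch, assuming bipolar training pairs; the name hebb_train and the pair format are illustrative, not from the slides:

    import numpy as np

    def hebb_train(pairs):
        """Hebb rule: start at zero, add the outer product of each (s, t) pair."""
        n, m = len(pairs[0][0]), len(pairs[0][1])
        W = np.zeros((n, m))              # step 1: wij = 0
        for s, t in pairs:                # step 2: each training pair
            W += np.outer(s, t)           # step 5: wij += xi * yj
        return W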

6
Example
  • s1 = (1 -1 -1), s2 = (-1 1 1)
  • t1 = (1 -1), t2 = (-1 1)
  • w11 = (1)(1) + (-1)(-1) = 2
  • w12 = (1)(-1) + (-1)(1) = -2
  • w21 = (-1)(1) + (1)(-1) = -2
  • w22 = (-1)(-1) + (1)(1) = 2
  • w31 = (-1)(1) + (1)(-1) = -2
  • w32 = (-1)(-1) + (1)(1) = 2

7
Matrix Alternative
  • s1 = (1 -1 -1), s2 = (-1 1 1)
  • t1 = (1 -1), t2 = (-1 1)
  • W = S^T T:

        [ 1 -1]   [ 1 -1]   [ 2 -2]
        [-1  1] x [-1  1] = [-2  2]
        [-1  1]             [-2  2]
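A quick numpy check of the slide's numbers as a single matrix product (a sketch, not part of the original deck):

    import numpy as np

    S = np.array([[ 1, -1, -1],    # s1
                  [-1,  1,  1]])   # s2
    T = np.array([[ 1, -1],        # t1
                  [-1,  1]])       # t2
    print(S.T @ T)                 # [[ 2 -2] [-2  2] [-2  2]]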

8
Final Network
  • f(yin) = 1 if yin > 0, 0 if yin = 0, else -1

[Network diagram: x1, x2, x3 connected to y1, y2 with the learned weights w11 = 2, w21 = -2, w31 = -2, w12 = -2, w22 = 2, w32 = 2.]
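Recall with this network, sketched in numpy using the activation defined above (the helper names are illustrative):

    import numpy as np

    def f(y_in):
        """1 if y_in > 0, 0 if y_in == 0, else -1."""
        return np.where(y_in > 0, 1, np.where(y_in == 0, 0, -1))

    W = np.array([[ 2, -2],
                  [-2,  2],
                  [-2,  2]])
    print(f(np.array([1, -1, -1]) @ W))   # [ 1 -1], i.e. t1 is recalled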
9
Properties
  • Weights exist if input vectors are linearly
    independent
  • Orthogonal vectors can be learned perfectly
  • High weights imply strong correlations

10
Exercises
  • What happens if (-1 -1 -1) is tested? This
    vector has one mistake.
  • What happens if (0 -1 -1) is tested? This vector
    has one piece of missing data.
  • Show an example of training data that is not
    learnable. Show the learned network.

11
Delta Rule for Pattern Association
  • Works when patterns are linearly independent but
    not orthogonal
  • Introduced in the 1960s for ADALINE
  • Produces a least squares solution

12
Activation Functions
  • Delta Rule (identity activation, derivative = 1): wij(new) = wij(old) + a(tj - yj) xi
  • Extended Delta Rule (general activation f): wij(new) = wij(old) + a(tj - yj) xi f'(yin.j)
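A sketch of one delta-rule update under the identity activation, so the derivative factor is 1 (the name delta_step and the default learning rate are illustrative assumptions):

    import numpy as np

    def delta_step(W, s, t, alpha=0.1):
        """wij(new) = wij(old) + a * (tj - yj) * xi."""
        x, t = np.asarray(s, float), np.asarray(t, float)
        y = x @ W                       # identity activation: y = y_in
        return W + alpha * np.outer(x, t - y)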

13
Heteroassociative Memory Net
  • Application: associate characters, e.g. A <-> a, B <-> b

14
Autoassociative Net
  • Architecture

[Architecture diagram: input units x1 ... xn fully connected to output units y1 ... yn, with weights w11 ... wnn.]
15
Training Algorithm
  • Assuming that the training vectors are
    orthogonal, we can use the Hebb rule algorithm
    mentioned earlier.
  • Application: find out whether an input vector is
    familiar or unfamiliar. For example, voice input
    as part of a security system.
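A minimal sketch of this training step in numpy (auto_hebb is an illustrative name; zeroing the diagonal anticipates the question two slides ahead):

    import numpy as np

    def auto_hebb(vectors):
        """Autoassociative Hebb rule: W = sum of s^T s, diagonal zeroed."""
        vs = np.asarray(vectors, dtype=float)
        W = vs.T @ vs                   # outer product of each stored vector
        np.fill_diagonal(W, 0)          # wii = 0
        return W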

16
Autoassociate Example
  • s = (1 1 1)
  • W = s^T s with the diagonal zeroed:

        [1]             [1 1 1]    [0 1 1]
        [1] [1 1 1]  =  [1 1 1] -> [1 0 1]
        [1]             [1 1 1]    [1 1 0]

17
Evaluation
  • What happens if (1 1 1) is presented?
  • What happens if (0 1 1) is presented?
  • What happens if (0 0 1) is presented?
  • What happens if (-1 1 1) is presented?
  • What happens if (-1 -1 1) is presented?
  • Why are the diagonals set to 0?

18
Storage Capacity
  • 2 vectors: (1 1 1), (-1 -1 -1)
  • Recall is perfect
  • W = S^T S with the diagonal zeroed:

        [1 -1]   [ 1  1  1]   [0 2 2]
        [1 -1] x [-1 -1 -1] = [2 0 2]
        [1 -1]                [2 2 0]

19
Storage Capacity
  • 3 vectors: (1 1 1), (-1 -1 -1), (1 -1 1)
  • Recall is no longer perfect
  • W = S^T S with the diagonal zeroed:

        [1 -1  1]   [ 1  1  1]   [0 1 3]
        [1 -1 -1] x [-1 -1 -1] = [1 0 1]
        [1 -1  1]   [ 1 -1  1]   [3 1 0]
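A quick numpy check of this case; recalling the stored vector (1 -1 1) now returns (1 1 1), so recall really is imperfect (a sketch, not from the original deck):

    import numpy as np

    vs = np.array([[1, 1, 1], [-1, -1, -1], [1, -1, 1]], dtype=float)
    W = vs.T @ vs
    np.fill_diagonal(W, 0)              # [[0 1 3] [1 0 1] [3 1 0]]
    print(np.sign(vs[2] @ W))           # [1. 1. 1.], not (1 -1 1)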

20
Theorem
  • Up to n-1 bipolar vectors of n dimensions can be
    stored in an autoassociative net.

21
Iterative Autoassociative Net
  • 1 vector: s = (1 1 -1)
  • W = s^T s with the diagonal zeroed:

        [ 0  1 -1]
        [ 1  0 -1]
        [-1 -1  0]

  • (1 0 0) -> (0 1 -1)
  • (0 1 -1) -> (2 1 -1) -> (1 1 -1)
  • (1 1 -1) -> (2 2 -2) -> (1 1 -1)

22
Testing Procedure
  • 1. initialize weights using Hebb learning
  • 2. for each test vector do
  • 3. set xi = si
  • 4. calculate ti
  • 5. set si = ti
  • 6. go to step 4 if the s vector is new
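Steps 2-6 sketched in numpy; np.sign matches the activation used here in that a net input of 0 stays 0 (iterative_recall is an illustrative name):

    import numpy as np

    def iterative_recall(W, s, max_iters=100):
        """Feed the output back as input until the vector stops changing."""
        s = np.asarray(s, dtype=float)
        for _ in range(max_iters):
            t = np.sign(s @ W)          # steps 4-5: compute t, set s = t
            if np.array_equal(t, s):    # step 6: stop when s is not new
                return t
            s = t
        return s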

23
Exercises
  • 1 piece of missing data (0 1 -1)
  • 2 pieces of missing data (0 0 -1)
  • 3 pieces of missing data (0 0 0)
  • 1 mistake (-1 1 -1)
  • 2 mistakes (-1 -1 -1)

24
Discrete Hopfield Net
  • content addressable problems
  • pattern association problems
  • constrained optimization problems
  • wij = wji
  • wii = 0

25
Characteristics
  • Only 1 unit updates its activation at a time
  • Each unit continues to receive the external
    signal
  • An energy (Lyapunov) function can be found that
    allows the net to converge, unlike the previous
    system
  • Autoassociative

26
Architecture
[Architecture diagram: units y1, y2, y3 fully interconnected, each also receiving an external input x1, x2, x3.]
27
Algorithm
  • 1. initialize weights using the Hebb rule
  • 2. for each input vector do
  • 3. yi = xi
  • 4. do steps 5-6 in random order for each yi
  • 5. yin.i = xi + Σj yj wji
  • 6. yi = f(yin.i)
  • 7. go to step 2 if the net hasn't converged
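One pass of steps 4-6 in numpy. Treating yin.i = 0 as "leave the unit unchanged" is an assumption here, and hopfield_step is an illustrative name:

    import numpy as np

    def hopfield_step(W, x, y):
        """Update each unit once, in random order, holding the external input x."""
        y = y.copy()
        for i in np.random.permutation(len(y)):
            y_in = x[i] + y @ W[:, i]   # step 5: yin.i = xi + sum_j yj*wji (wii = 0)
            if y_in > 0:
                y[i] = 1
            elif y_in < 0:
                y[i] = -1               # y_in == 0: y[i] is left as is
        return y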

28
Example
  • training vector (1 -1)

[Network diagram: units y1 and y2 joined by weight -1, with external inputs x1 and x2.]
29
Example
  • input (0 -1): update y1 = 0 + (-1)(-1) = 1; update y2 = -1 + (1)(-1) = -2 -> -1
  • input (1 -1): update y2 = -1 + (1)(-1) = -2 -> -1; update y1 = 1 + (-1)(-1) = 2 -> 1

30
Hopfield Theorems
  • Convergence is guaranteed.
  • The number of storable patterns is approximately
    n / (2 log n), where n is the dimension of a
    vector.

31
Bidirectional Associative Memory (BAM)
  • Heteroassociative Recurrent Net
  • Kosko, 1988
  • Architecture

[Architecture diagram: x layer (x1 ... xn) bidirectionally connected to y layer (y1 ... ym).]
32
Activation Function
  • f(yin) = 1, if yin > 0
  • f(yin) = 0, if yin = 0
  • f(yin) = -1, otherwise

33
Algorithm
  • 1. initialize weights using Hebb rule
  • 2. for each test vector do
  • 3. present s to x layer
  • 4. present t to y layer
  • 5. while equilibrium is not reached
  • 6. compute f(yin.j)
  • 7. compute f(xin.i)
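This loop sketched in numpy; equilibrium is detected when neither layer changes between passes (bam_recall is an illustrative name):

    import numpy as np

    def bam_recall(W, x, max_iters=100):
        """Bounce activations between the x and y layers until both settle."""
        x = np.sign(np.asarray(x, dtype=float))
        y = np.sign(x @ W)              # step 6: y layer from x layer
        for _ in range(max_iters):
            x_new = np.sign(y @ W.T)    # step 7: x layer from y layer
            y_new = np.sign(x_new @ W)
            if np.array_equal(x_new, x) and np.array_equal(y_new, y):
                break                   # equilibrium reached
            x, y = x_new, y_new
        return x, y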

34
Example
  • s1 = (1 1), t1 = (1 -1)
  • s2 = (-1 -1), t2 = (-1 1)
  • W = S^T T:

        [1 -1]   [ 1 -1]   [2 -2]
        [1 -1] x [-1  1] = [2 -2]

35
Example
  • Architecture

[Network diagram: x1 and x2 connected to y1 and y2 with weights w11 = 2, w12 = -2, w21 = 2, w22 = -2.]
present (1 1) to x -> y = (1 -1); present (1 -1) to y -> x = (1 1)
36
Hamming Distance
  • Definition: the number of corresponding bits
    that differ between two vectors
  • For example, H((1 -1), (1 1)) = 1
  • The average Hamming distance between two random
    vectors is ½ of the bits.
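A one-function numpy sketch of this definition (hamming is an illustrative name):

    import numpy as np

    def hamming(u, v):
        """Count the corresponding positions where two vectors differ."""
        return int(np.sum(np.asarray(u) != np.asarray(v)))

    print(hamming((1, -1), (1, 1)))     # 1, matching the slide's example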

37
About BAMs
  • Observation: encoding is better when the average
    Hamming distance of the inputs is similar to the
    average Hamming distance of the outputs.
  • The memory capacity of a BAM is min(n-1, m-1).