1
Introduction to Neural Networks
  • John Paxton
  • Montana State University
  • Summer 2003

2
Chapter 7: A Sampler of Other Neural Nets
  • Optimization Problems
  • Common Extensions
  • Adaptive Architectures
  • Neocognitron

3
I. Optimization Problems
  • Travelling Salesperson Problem.
  • Map coloring.
  • Job shop scheduling.
  • RNA secondary structure.

4
Advantages of Neural Nets
  • Can find near optimal solutions.
  • Can handle weak (desirable, but not required)
    constraints.

5
TSP Topology
  • Each row has 1 unit that is on
  • Each column has 1 unit that is on

[Diagram: a 3 x 3 grid of units, one per (city, position) pair, with cities A, B, C and tour positions 1st, 2nd, 3rd]
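Concretely, a valid tour turns on exactly one unit per row (each city visited once) and one per column (one city per tour position), i.e. a permutation matrix over the grid. A minimal illustration (plain numpy; the example tour is arbitrary):

```python
import numpy as np

# One unit per (city, position) pair; u[i, j] = 1 means city i is visited at step j.
cities = ["A", "B", "C"]
n = len(cities)

# The tour A -> C -> B expressed as activations over the grid:
u = np.zeros((n, n), dtype=int)
for position, city in enumerate([0, 2, 1]):   # city indices of A, C, B
    u[city, position] = 1

print(u)
print("one unit on per row:   ", (u.sum(axis=1) == 1).all())
print("one unit on per column:", (u.sum(axis=0) == 1).all())
```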
6
Boltzmann Machine
  • Hinton, Sejnowski (1983)
  • Can be modelled using Markov chains
  • Uses simulated annealing
  • Each row is fully interconnected
  • Each column is fully interconnected

7
Architecture
  • u(i,j) is connected to u(k,j+1) with weight -d(i,k)
  • u(i,1) is connected to u(k,n) with weight -d(i,k)

[Diagram: n x n grid of units U11 ... U1n, Un1 ... Unn; each unit has a self-connection with weight b, and units in the same row or in the same column are connected with weight -p]
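A minimal sketch of this weight structure (plain numpy; the function name, the explicit four-index weight array, and the example distances are illustrative):

```python
import numpy as np

def tsp_weights(dist, b, p):
    """Weight between units u(i,j) and u(k,m) for an n-city tour.

    b       : weight on each unit's self-connection
    -p      : penalty between units in the same row (city) or column (position)
    -d(i,k) : distance penalty between units in adjacent columns (positions wrap around)
    """
    n = len(dist)
    w = np.zeros((n, n, n, n))
    for i in range(n):
        for j in range(n):
            w[i, j, i, j] = b                          # self-connection
            for k in range(n):
                for m in range(n):
                    if (i, j) == (k, m):
                        continue
                    if i == k or j == m:               # same row or same column
                        w[i, j, k, m] = -p
                    elif m in ((j + 1) % n, (j - 1) % n):
                        w[i, j, k, m] = -dist[i][k]    # adjacent tour positions
    return w

dist = [[0, 2, 9], [2, 0, 4], [9, 4, 0]]               # toy distance matrix
w = tsp_weights(dist, b=60, p=70)
print(w[0, 0, 0, 0], w[0, 0, 0, 1], w[0, 0, 1, 1])     # b, -p, -d(0,1)
```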
8
Algorithm
  • 1. Initialize weights b and p, where p > b and p > the
    greatest distance between cities. Initialize the temperature T.
  • Initialize activations of units to random binary values.

9
Algorithm
  • 2. While the stopping condition is false, do steps 3-8.
  • 3. Do steps 4-7 n² times (1 epoch).
  • 4. Choose i and j randomly, 1 ≤ i, j ≤ n.
    u(i,j) is the candidate to change state.

10
Algorithm
  • 5. Compute the change in consensus
    c = (1 - 2 u(i,j)) [ b + Σ Σ u(k,m) (-p) ],
    where k ≠ i, m ≠ j.
  • 6. Compute the probability of accepting the change:
    a = 1 / (1 + e^(-c/T))
  • 7. Accept the change if a random number in [0, 1) is less
    than a. If the change is accepted, set u(i,j) = 1 - u(i,j).
  • 8. Adjust the temperature: T = 0.95 T.
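A runnable sketch of steps 1-8 (plain numpy; it computes the consensus change c directly from the b, -p and -d(i,k) structure of slides 6-7 rather than storing a weight matrix, and all names and defaults are illustrative):

```python
import numpy as np

def boltzmann_tsp(dist, b, p, T, cooling=0.95, max_epochs=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(dist)
    u = rng.integers(0, 2, size=(n, n))              # step 1: random binary activations
    for _ in range(max_epochs):                      # step 2: outer loop
        changed = False
        for _ in range(n * n):                       # step 3: one epoch = n^2 trials
            i, j = rng.integers(0, n, size=2)        # step 4: candidate unit u(i, j)
            # step 5: change in consensus if u(i, j) flips
            penalty = -p * (u[i, :].sum() + u[:, j].sum() - 2 * u[i, j])
            length = -sum(dist[i][k] * (u[k, (j + 1) % n] + u[k, (j - 1) % n])
                          for k in range(n) if k != i)
            c = (1 - 2 * u[i, j]) * (b + penalty + length)
            # step 6: acceptance probability (argument clipped to avoid overflow)
            a = 1.0 / (1.0 + np.exp(-np.clip(c / T, -50.0, 50.0)))
            if rng.random() < a:                     # step 7: accept with probability a
                u[i, j] = 1 - u[i, j]
                changed = True
        T *= cooling                                 # step 8: lower the temperature
        if not changed:                              # stopping condition (slide 11, simplified)
            break
    return u, T
```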

11
Stopping Condition
  • No state change for a specified number of epochs.
  • Temperature reaches a certain value.

12
Example
  • T(0) = 20
  • Half of the units are on initially
  • b = 60
  • p = 70
  • 10 cities, all distances less than 1
  • 200 or fewer epochs to find a stable configuration
    in 100 random trials
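Using the boltzmann_tsp sketch above, this example might look roughly as follows (the random, symmetric distance matrix is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
d = rng.random((n, n)) * 0.9          # 10 cities, all distances less than 1
d = (d + d.T) / 2                     # symmetric distances
np.fill_diagonal(d, 0.0)

u, T = boltzmann_tsp(d, b=60, p=70, T=20, max_epochs=200)
print("one unit on per row:   ", (u.sum(axis=1) == 1).all())
print("one unit on per column:", (u.sum(axis=0) == 1).all())
```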

13
Other Optimization Architectures
  • Continuous Hopfield Net
  • Gaussian Machine
  • Cauchy Machine
  • Adds noise to the input in an attempt to escape from
    local minima
  • A faster annealing schedule can be used as a
    consequence

14
II. Extensions
  • Modified Hebbian Learning
  • Find parameters for optimal surface fit of
    training patterns

15
Boltzmann Machine With Learning
  • Add hidden units
  • 2-1-2 net below could be used for simple
    encoding/decoding (data compression)

[Diagram: 2-1-2 net with input units x1, x2, one hidden unit z1, and output units y1, y2]
16
Simple Recurrent Net
  • Learn sequential or time-varying patterns
  • Doesn't necessarily have a steady-state output
  • input units
  • context units
  • hidden units
  • output units

17
Architecture
[Diagram: input units x1 ... xn and context units c1 ... cp feed hidden units z1 ... zp, which feed output units y1 ... ym]
18
Simple Recurrent Net
  • f(ci(t)) = f(zi(t-1))
  • f(ci(0)) = 0.5
  • Can use backpropagation
  • Can learn strings of characters
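A minimal sketch of a forward pass through such a net (plain numpy; the logistic activation, random weights and toy input sequence are illustrative assumptions):

```python
import numpy as np

def f(x):
    """Logistic activation (an illustrative choice)."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n, p, m = 4, 2, 4                      # input, hidden/context, output unit counts (slide 19)
W_in = rng.normal(size=(p, n))         # input   -> hidden weights
W_ctx = rng.normal(size=(p, p))        # context -> hidden weights
W_out = rng.normal(size=(m, p))        # hidden  -> output weights

c = np.full(p, 0.5)                    # f(ci(0)) = 0.5
for x in rng.integers(0, 2, size=(5, n)).astype(float):   # a toy input sequence
    z = f(W_in @ x + W_ctx @ c)        # hidden activations
    y = f(W_out @ z)                   # output activations
    c = z                              # context copies the hidden state: f(ci(t)) = f(zi(t-1))
print("last output:", np.round(y, 3))
```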

19
Example Finite State Automaton
  • 4 xi
  • 4 yi
  • 2 zi
  • 2 ci

[Diagram: finite state automaton with states BEGIN and END and transitions labelled A and B]
20
Backpropagation In Time
  • Rumelhart, Williams, Hinton (1986)
  • Application: a simple shift register

[Diagram: the shift-register net unrolled in time, with input units x1, x2 (shown at two time steps), hidden unit z1, output units y1, y2, and fixed inputs of 1]
21
Backpropagation Training for Fully Recurrent Nets
  • Adapts backpropagation to arbitrary connection
    patterns.

22
III. Adaptive Architectures
  • Probabilistic Neural Net (Specht 1988)
  • Cascade Correlation (Fahlman, Lebiere 1990)

23
Probabilistic Neural Net
  • Builds its own architecture as training
    progresses
  • Chooses class A over class B if
    hA cA fA(x) > hB cB fB(x)
  • cA is the cost of misclassifying an example that
    belongs to class A (i.e., classifying it as B)
  • hA is the a priori probability of an example
    belonging to class A

24
Probabilistic Neural Net
  • fA(x) is the probability density function for
    class A; fA(x) is learned by the net
  • zA1 ... zAj are pattern units; fA is a summation unit

[Diagram: input units x1 ... xn feed pattern units zA1 ... zAj and zB1 ... zBk, which feed summation units fA and fB; these feed the output unit y]
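A minimal sketch of the decision rule (plain numpy; the Gaussian pattern units, the bandwidth sigma, and the toy data are illustrative assumptions about how fA and fB are estimated):

```python
import numpy as np

def density(x, patterns, sigma=0.5):
    """fA(x) or fB(x): one Gaussian pattern unit per stored example, averaged by a summation unit."""
    d2 = ((patterns - x) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * sigma ** 2)).mean()

# illustrative training patterns for the two classes
A = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]])
B = np.array([[1.0, 1.0], [0.9, 0.8], [1.1, 1.2]])

h_A, h_B = 0.5, 0.5                  # a priori class probabilities
c_A, c_B = 1.0, 1.0                  # misclassification costs

x = np.array([0.2, 0.2])
score_A = h_A * c_A * density(x, A)  # hA cA fA(x)
score_B = h_B * c_B * density(x, B)  # hB cB fB(x)
print("classify as", "A" if score_A > score_B else "B")
```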
25
Cascade Correlation
  • Builds own architecture while training progresses
  • Tries to overcome the slow rate of convergence of
    other neural nets
  • Dynamically adds hidden units (as few as
    possible)
  • Trains one layer at a time

26
Cascade Correlation
  • Stage 1

[Diagram: stage 1, with inputs x0, x1, x2 connected directly to outputs y1, y2]
27
Cascade Correlation
  • Stage 2 (fix weights into z1)

[Diagram: stage 2, with hidden unit z1 added; the inputs x0, x1, x2 feed z1 and the outputs y1, y2, and z1 also feeds the outputs]
28
Cascade Correlation
  • Stage 3 (fix weights into z2)

[Diagram: stage 3, with a second hidden unit z2 added; z2 receives the inputs and z1, and feeds the outputs y1, y2]
29
Algorithm
  • 1. Train stage 1. If the error is not acceptable, proceed.
  • 2. Train stage 2. If the error is not acceptable, proceed.
  • 3. Etc.
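A rough, runnable sketch of this staged training on a toy problem (plain numpy; the linear output unit, tanh candidate units, gradient-based correlation training, learning rates, and XOR data are illustrative assumptions rather than the original formulation):

```python
import numpy as np

def train_output(H, t, lr=0.05, steps=3000):
    """Stage k: train weights from the current features H (inputs plus frozen hidden units) to the output."""
    w = np.zeros(H.shape[1])
    for _ in range(steps):
        e = H @ w - t                           # residual error on the training set
        w -= lr * (H.T @ e) / len(t)            # gradient step on mean squared error
    return w

def train_candidate(H, e, lr=0.5, steps=3000, seed=0):
    """Train a candidate hidden unit to maximize the correlation of its output with the residual error."""
    v = np.random.default_rng(seed).normal(scale=0.5, size=H.shape[1])
    for _ in range(steps):
        z = np.tanh(H @ v)
        cov = (z - z.mean()) @ (e - e.mean())   # covariance of candidate output and error
        grad = np.sign(cov) * (H.T @ ((e - e.mean()) * (1 - z ** 2)))
        v += lr * grad / len(e)                 # ascend the magnitude of the covariance
    return v

# Toy task (XOR) with a constant x0 = 1 column, as in the stage diagrams above.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([0.0, 1.0, 1.0, 0.0])

H = X.copy()                                    # stage 1: inputs only
for stage in range(1, 4):
    w = train_output(H, t)
    e = H @ w - t
    print(f"stage {stage}: mse = {np.mean(e ** 2):.4f}")
    if np.mean(e ** 2) < 1e-3:                  # error acceptable: stop adding units
        break
    v = train_candidate(H, e)                   # new hidden unit; its incoming weights are now frozen
    H = np.hstack([H, np.tanh(H @ v)[:, None]])
```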

30
IV. Neocognitron
  • Fukushima, Miyake, Ito (1983)
  • Many layers, hierarchical
  • Very sparse and localized connections
  • Self organizing
  • Supervised learning, layer by layer
  • Recognizes handwritten digits 0, 1, 2, 3, ..., 9,
    regardless of position and style

31
Architecture
Layer     Number of Arrays   Array Size
Input     1                  19 x 19
S1 / C1   12 / 8             19 x 19 / 11 x 11
S2 / C2   38 / 22            11 x 11 / 7 x 7
S3 / C3   32 / 30            7 x 7 / 7 x 7
S4 / C4   16 / 10            3 x 3 / 1 x 1
32
Architecture
  • S layers respond to patterns
  • C layers combine results, use larger field of
    view
  • For example, S11 responds to the 3 x 3 pattern
      0 0 0
      1 1 1
      0 0 0
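A minimal sketch of that idea (plain numpy; exact template matching for the S-like cell and a max over its responses for the C-like cell are illustrative stand-ins for the actual neocognitron computations):

```python
import numpy as np

# An S-like cell responds wherever a 3 x 3 window of the input matches its template.
template = np.array([[0, 0, 0],
                     [1, 1, 1],
                     [0, 0, 0]])               # the horizontal-stroke pattern from the slide

image = np.zeros((7, 7), dtype=int)
image[3, 1:6] = 1                              # a horizontal stroke somewhere in the image

s_map = np.zeros((5, 5))
for r in range(5):
    for c in range(5):
        window = image[r:r + 3, c:c + 3]
        s_map[r, c] = float(np.array_equal(window, template))   # 1 where the pattern appears

# A C-like cell combines S responses over a larger field of view (here a max over the map),
# so its response is tolerant to where the stroke sits.
print("S response map:\n", s_map)
print("C response:", s_map.max())
```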

33
Training
  • Progresses layer by layer
  • S1 connections to C1 are fixed
  • C1 connections to S2 are adaptable
  • A V2 layer is introduced between C1 and S2; V2 is
    inhibitory
  • C1 to V2 connections are fixed
  • V2 to S2 connections are adaptable