1
Announcements
  • Homework 5 due today, October 30
  • Book Review due today, October 30
  • Lab 3 due Thursday, November 1
  • Homework 6 due Tuesday, November 6
  • Current Event
  • Kay - today
  • Chelsea - Thursday, November 1

2
Neural Networks
  • Lecture 12

3
Artificial Neural Networks
  • Artificial neural networks (ANNs) provide a
    practical method for learning
  • real-valued functions
  • discrete-valued functions
  • vector-valued functions
  • Robust to errors in training data
  • Successfully applied to such problems as
  • interpreting visual scenes
  • speech recognition
  • learning robot control strategies

4
Biological Neurons
  • The human brain is made up of billions of simple
    processing units called neurons.
  • Inputs are received on dendrites, and if the
    input levels are over a threshold, the neuron
    fires, passing a signal through the axon to the
    synapse which then connects to another neuron.

5
Neural Network Representation
  • ALVINN uses a learned ANN to steer an autonomous
    vehicle driving at normal speeds on public
    highways
  • Input to the network: a 30x32 grid of pixel
    intensities obtained from a forward-pointing
    camera mounted on the vehicle
  • Output: the direction in which the vehicle is
    steered
  • Trained to mimic observed steering commands of a
    human driving the vehicle for approximately 5
    minutes

6
ALVINN
7
Appropriate problems
  • ANN learning is well suited to problems in which
    the training data is noisy and complex (e.g.,
    inputs from cameras or microphones)
  • Can also be used for problems with symbolic
    representations
  • Most appropriate for problems where
  • Instances have many attribute-value pairs
  • Target function output may be discrete-valued,
    real-valued, or a vector of several real- or
    discrete-valued attributes
  • Training examples may contain errors
  • Long training times are acceptable
  • Fast evaluation of the learned target function
    may be required
  • The ability for humans to understand the learned
    target function is not important

8
Artificial Neurons (1)
  • Artificial neurons are based on biological
    neurons.
  • Each neuron in the network receives one or more
    inputs.
  • An activation function is applied to the inputs,
    which determines the output of the neuron (its
    activation level).
  • The charts on the right show three typical
    activation functions.

9
Artificial Neurons (2)
  • A typical activation function works as follows
  • Each node i has a weight w_i associated with it.
    The input to node i is x_i.
  • t is the threshold.
  • If the weighted sum of the inputs to the neuron
    is above the threshold, the neuron fires.
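
The formula itself did not survive this transcript; the threshold
activation just described is usually written as

  o = 1  if  \sum_i w_i x_i > t,   otherwise  o = 0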

10
Perceptrons
  • A perceptron is a single neuron that classifies a
    set of inputs into one of two categories (usually
    1 or -1).
  • If the inputs are in the form of a grid, a
    perceptron can be used to recognize visual images
    of shapes.
  • The perceptron usually uses a step function,
    which returns 1 if the weighted sum of inputs
    exceeds a threshold, and 0 otherwise.
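
A minimal sketch of such a perceptron in Python (illustrative only;
the OR weights and threshold below are my own choices, not values
from the slides):

  # Step-function perceptron: fire (1) if the weighted sum of the
  # inputs exceeds the threshold, otherwise output 0.
  def perceptron(inputs, weights, threshold):
      weighted_sum = sum(w * x for w, x in zip(weights, inputs))
      return 1 if weighted_sum > threshold else 0

  # Example: with these weights a two-input perceptron computes OR.
  or_weights = [1.0, 1.0]
  print(perceptron([0, 1], or_weights, threshold=0.5))  # 1
  print(perceptron([0, 0], or_weights, threshold=0.5))  # 0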

11
Training Perceptrons
  • Learning involves choosing values for the weights
  • The perceptron is trained as follows
  • First, inputs are given random weights (usually
    between 0.5 and 0.5).
  • An item of training data is presented. If the
    perceptron mis-classifies it, the weights are
    modified according to the following
  • where t is the target output for the training
    example, o is the output generated by the
    preceptron and a is the learning rate, between 0
    and 1 (usually small such as 0.1)
  • Cycle through training examples until
    successfully classify all examples
  • Each cycle known as an epoch
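
A sketch of this training procedure in Python (illustrative; the
threshold is folded into a bias weight here, and the OR training set
and epoch limit are my own choices):

  import random

  def train_perceptron(examples, n_inputs, alpha=0.1, max_epochs=100):
      # Random initial weights in [-0.5, 0.5]; weights[0] is the bias.
      weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
      for _ in range(max_epochs):
          all_correct = True
          for inputs, target in examples:
              x = [1] + list(inputs)                    # bias input of 1
              s = sum(w * xi for w, xi in zip(weights, x))
              output = 1 if s > 0 else 0                # step activation
              if output != target:                      # misclassified
                  all_correct = False
                  # w_i <- w_i + alpha * (t - o) * x_i
                  weights = [w + alpha * (target - output) * xi
                             for w, xi in zip(weights, x)]
          if all_correct:                               # a clean epoch
              break
      return weights

  # Example: learn the (linearly separable) OR function.
  or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
  print(train_perceptron(or_data, n_inputs=2))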

12
Bias of Perceptrons
  • Perceptrons can only classify linearly separable
    functions.
  • The first of the following graphs shows a
    linearly separable function (OR).
  • The second is not linearly separable
    (Exclusive-OR).

13
Convergence
  • Perceptron training rule only converges when
    training examples are linearly separable and a
    has a small learning constant
  • Another approach uses the delta rule and gradient
    descent
  • Same basic rule for finding update value
  • Changes
  • Do not incorporate the threshold in the output
    value (unthresholded perceptron)
  • Wait to update the weights until a full cycle
    through the examples is complete
  • Converges asymptotically toward the minimum error
    hypothesis, possibly requiring unbounded time,
    but converges regardless of whether the training
    data are linearly separable
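
The formulas for this slide are missing from the transcript; the
usual statement of the delta rule for an unthresholded linear unit
o = w · x, which matches the description above, is

  E(w) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2
  \Delta w_i = a \sum_{d \in D} (t_d - o_d) x_{id}

applied once per full pass through the training examples D.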

14
Multilayer Neural Networks
  • Multilayer neural networks can classify a range
    of functions, including ones that are not
    linearly separable.
  • Each input layer neuron connects to all neurons
    in the hidden layer.
  • The neurons in the hidden layer connect to all
    neurons in the output layer.
  • A feed-forward network

15
Speech Recognition ANN
16
Sigmoid Unit
  • σ(x) is the sigmoid function
  • Nice property: it is differentiable
  • Derive gradient descent rules to train
  • One sigmoid unit (a single node)
  • Multilayer networks of sigmoid units
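
The sigmoid formula itself does not appear in the transcript; the
standard definition and the differentiability property the slide
refers to are

  σ(x) = 1 / (1 + e^{-x}),    dσ(x)/dx = σ(x) (1 - σ(x))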

17
Backpropagation
  • Multilayer neural networks learn in the same way
    as perceptrons.
  • However, there are many more weights, and it is
    important to assign credit (or blame) correctly
    when changing weights.
  • E sums the errors over all of the network output
    units
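
The definition of E did not survive the transcript; the usual
sum-of-squared-errors over all output units and training examples,
matching the description above, is

  E(w) = \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2

where t_{kd} and o_{kd} are the target and actual output of unit k
for training example d.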

18
Backpropagation Algorithm
  • Create a feed-forward network with n_in inputs,
    n_hidden hidden units, and n_out output units.
  • Initialize all network weights to small random
    numbers
  • Until the termination condition is met, Do
  • For each ⟨x, t⟩ in the training examples, Do
  • Propagate the input forward through the network
  • Input the instance x to the network and compute
    the output o_u of every unit u in the network
  • Propagate the errors backward through the
    network
  • For each network output unit k, calculate its
    error term δ_k
  • For each hidden unit h, calculate its error term
    δ_h
  • Update each network weight w_ji (the error terms
    and the weight update are written out below)
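
The formulas themselves are missing from the transcript; for sigmoid
units the standard error terms and weight update used by this
algorithm are

  δ_k ← o_k (1 - o_k)(t_k - o_k)
  δ_h ← o_h (1 - o_h) \sum_{k \in outputs} w_{kh} δ_k
  w_{ji} ← w_{ji} + Δw_{ji},   where  Δw_{ji} = a δ_j x_{ji}

with x_{ji} the i-th input to unit j and a the learning rate.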

19
Example Learning AND
  • Network: input units a, b, c; hidden units d, e;
    output unit f (network diagram not reproduced)
  • Initial weights: w_da = .2, w_db = .1, w_dc = -.1,
    w_d0 = .1, w_ea = -.5, w_eb = .3, w_ec = -.2,
    w_e0 = 0, w_fd = .4, w_fe = -.2, w_f0 = -.1
  • Training data: AND(1,0,1) = 0, AND(1,1,1) = 1
  • Alpha = 0.1
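
A sketch of one forward pass through this 3-2-1 network with the
initial weights above (Python; sigmoid units are assumed, since that
is what the surrounding slides use, and the *_0 weights are treated
as bias weights):

  import math

  def sigmoid(x):
      return 1.0 / (1.0 + math.exp(-x))

  # Initial weights from the slide; the *_0 entries are biases.
  w_d = {'a': 0.2, 'b': 0.1, 'c': -0.1, '0': 0.1}
  w_e = {'a': -0.5, 'b': 0.3, 'c': -0.2, '0': 0.0}
  w_f = {'d': 0.4, 'e': -0.2, '0': -0.1}

  def forward(a, b, c):
      d = sigmoid(w_d['a'] * a + w_d['b'] * b + w_d['c'] * c + w_d['0'])
      e = sigmoid(w_e['a'] * a + w_e['b'] * b + w_e['c'] * c + w_e['0'])
      return sigmoid(w_f['d'] * d + w_f['e'] * e + w_f['0'])

  # Output of the untrained network on the first example, AND(1,0,1).
  print(forward(1, 0, 1))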
20
Hidden Layer representation
Target Function
Can this be learned?
21
Yes
22
Plots of Squared Error
23
Hidden Unit
(.15 .99 .99)
24
Evolving weights
25
Momentum
  • One of many variations
  • Modify the update rule by making the weight
    update on the nth iteration depend partially on
    the update that occurred in the (n-1)th iteration
  • Minimizes error over training examples
  • Speeds up training, since training can take
    thousands of iterations
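
The update rule referred to here is missing from the transcript; with
momentum it is commonly written as

  Δw_{ji}(n) = a δ_j x_{ji} + μ Δw_{ji}(n-1),   0 ≤ μ < 1

where μ is the momentum constant and n indexes the iteration.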

26
When to stop training
  • Continue until error falls below some predefined
    threshold
  • Bad choice because Backpropagation is susceptible
    to overfitting
  • Won't be able to generalize as well over unseen
    data

27
Cross Validation
  • Common approach to avoid overfitting
  • Reserve part of the training data for testing
  • m examples are partitioned into k disjoint
    subsets
  • Run the procedure k times
  • Each time a different one of these subsets is
    used as validation
  • Determine the number of iterations that yields
    the best performance on the validation set
  • The mean of these iteration counts is then used
    to train on all m examples
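
A minimal sketch of the k-fold partitioning described above (Python;
the training procedure itself is not shown, so any learner could be
plugged in for each run):

  def k_fold_splits(examples, k):
      # Partition the m examples into k disjoint subsets (folds).
      folds = [examples[i::k] for i in range(k)]
      for i in range(k):
          validation = folds[i]
          training = [ex for j, fold in enumerate(folds) if j != i
                      for ex in fold]
          yield training, validation

  # Example: 12 examples, k = 3 -> each run holds out 4 for validation.
  for train, val in k_fold_splits(list(range(12)), k=3):
      print(len(train), len(val))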

28
Neural Nets for Face Recognition
29
Hidden Unit Weights
(output units: left, straight, right, up; the hidden unit weight
images from this slide are not reproduced)
30
Error gradient for the sigmoid function
31
Error gradient for the sigmoid function
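
The derivation on these two slides did not survive the transcript;
the standard computation of the error gradient for a single sigmoid
unit, with o_d = σ(w · x_d), is

  ∂E/∂w_i = ∂/∂w_i [ \frac{1}{2} \sum_{d} (t_d - o_d)^2 ]
          = \sum_{d} (t_d - o_d) (-∂o_d/∂w_i)
          = - \sum_{d} (t_d - o_d) o_d (1 - o_d) x_{id}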