1
Unsupervised Learning
  • i.e., learning even when there is no right answer

2
Unsupervised Learning
  • Subject: What does unsupervised learning learn?
  • Unsupervised learning allegedly involves no
    target values.
  • In fact, for most varieties of unsupervised
    learning, the targets are the same as the inputs
    (Sarle, 1994).
  • In other words, unsupervised learning usually
    performs the same task as an auto-associative
    network, compressing the information from the
    inputs (Deco and Obradovic, 1996).

3
Examples of unsupervised learning
  • Cluster analysis
  • Identifying the relational structure of the
    world.
  • Accomplished via competitive learning
  • Correlational analysis
  • Identifying the correlations among features.
  • Accomplished via Hebbian learning

4
Cluster analysis
  • Competitive learning
  • Unsupervised competitive learning is used in a
    wide variety of fields under a wide variety of
    names, the most common of which is "cluster
    analysis"
  • Analogous to k-means clustering
  • But, competitive learning is iterative and can be
    non-linear
  • The goal in these cases is to classify the input
    vectors into groups, thus providing a summary of
    the inputs: an efficient categorization.

5
Competitive learning
  • Create a number of arbitrarily defined cluster
    vectors (the k-clusters of k-means clustering)
  • Change the location of these vectors to match the
    distribution of the input vectors

6
Moving cluster vectors to cluster centers
7
Graphical depiction
From Dr. Nigel Allinson's web page
8
Computational details
  • Simple competitive learning
  • Single layer of output units, each fully
    connected to the input units
  • The number of output units is determined by the
    designer and critically important to network
    behavior.
  • Activity of each output unit is determined in the
    traditional way: output = W · input
  • The activity of the output units is a measure of
    the similarity between their weight vectors and
    the input vector.
  • Similarity traditionally measured using the dot
    product.

9
How is learning accomplished?
  • Winner-take-all (WTA)
  • The output unit with the highest activation level
    changes its connections to the input layer
  • Learning consists of making the weight vector
    more similar to the input vector: the difference
    (input − w_i) should decrease.
  • Equation: Δw_i = η (input − w_i), where w_i is the
    weight vector for the winner (see the sketch
    below).
  • Geometric analogy
  • Draw on board
  • Finds the middle of clusters in the input
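
A minimal sketch of winner-take-all competitive learning in Python/NumPy, using the dot-product activation and update rule above; the function name, learning rate, and epoch count are illustrative assumptions, not from the slides.

    import numpy as np

    def competitive_learning(inputs, n_units=3, eta=0.1, epochs=20, seed=0):
        """Winner-take-all competitive learning; inputs is an (N, d) array."""
        rng = np.random.default_rng(seed)
        # Initialize the weight vectors to samples drawn from the input itself
        W = inputs[rng.choice(len(inputs), size=n_units, replace=False)].copy()
        for _ in range(epochs):
            for x in inputs:
                activations = W @ x                   # output = W · input
                winner = int(np.argmax(activations))  # highest activation wins
                W[winner] += eta * (x - W[winner])    # Δw_i = η (input - w_i)
        return W                                      # rows end up near cluster centers

Calling, e.g., W = competitive_learning(data, n_units=4) on two-dimensional data leaves each row of W near the middle of one cluster, matching the geometric analogy above.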

10
Number of clusters (i.e. of output units)
  • The number of units determines how finely the
    input representations are divided
  • In k-means cluster analysis, k specifies number
    of clusters.
  • Dead units
  • Some units may be perennial losers; how can this
    be rectified?
  • Initialize the weight vectors to samples from the
    input itself to ensure they are where the action
    is
  • Update the weights of losers as well as winners,
    but losers change weights much more slowly
  • Leaky learning
  • Dead units gravitate toward where the action is
  • But, in some cases we may want dead units
  • Reserving dead units as resources helps prepare
    the network for handling retroactive interference.
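
A sketch of the leaky-learning variant described above: losing units also move toward the input, but at a much smaller rate, so dead units slowly drift toward where the action is. The specific rate values are illustrative.

    import numpy as np

    def leaky_update(W, x, eta_win=0.1, eta_lose=0.001):
        """One leaky-learning step: the winner moves quickly, losers slowly."""
        winner = int(np.argmax(W @ x))
        for i in range(len(W)):
            eta = eta_win if i == winner else eta_lose
            W[i] += eta * (x - W[i])   # even "dead" units gravitate toward the data
        return W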

11
Variations on simple competitive learning
  • Vector quantization
  • Used for data compression, for both storage and
    transmission of speech and image data.
  • Idea: Categorize input vectors into M classes
    (for M output units) and then represent each
    vector by the category in which it falls.
  • This basic technique uses standard competitive
    learning.
  • Learning vector quantization (Kohonen, 1989) aka
    LVQ
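
A sketch of the compression step, assuming W holds the M codebook vectors learned by standard competitive learning: each input is stored or transmitted as the index of its nearest codebook entry and reconstructed from that entry.

    import numpy as np

    def vq_encode(W, x):
        """Represent x by the index of the closest codebook vector."""
        return int(np.argmin(np.linalg.norm(W - x, axis=1)))

    def vq_decode(W, index):
        """Reconstruct an approximation of the original vector."""
        return W[index]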

12
Supervised version of vector quantization
  • We have a body of labeled sample data (labeled
    with its correct, predefined class).
  • If the winner's class matches the sample's label,
    the learning rule is the standard competitive
    learning rule; if it does not match, the weight
    vector is moved in the OPPOSITE direction, away
    from the input vector
  • Minimizes the number of misclassifications
  • LVQ2 (Kohonen, 1989)
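
A sketch of one LVQ update following the two cases above (labeling it LVQ1 is my assumption; the slide only contrasts the matching and non-matching cases). LVQ2 refines this for samples near the decision boundary.

    import numpy as np

    def lvq1_step(W, labels, x, y, eta=0.05):
        """One LVQ update for a labeled sample (x, y)."""
        winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))
        if labels[winner] == y:
            W[winner] += eta * (x - W[winner])   # standard competitive step
        else:
            W[winner] -= eta * (x - W[winner])   # move in the OPPOSITE direction
        return W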

13
Multi-layer networks
  • Can do hierarchical clustering
  • Results of the first level of cluster analysis
    feed into the next layer.
  • Each layer should extract higher orders of
    clustering information

14
Kohonen's self-organizing (feature) map
  • The idea behind these networks is to preserve the
    topology (spatial arrangement) of the input
    vector.
  • Each output unit is no longer independent of the
    others
  • Neighboring output units should represent similar
    input vectors
  • Δw_i = η f(i, i*) (input − w_i)
  • where f is a neighborhood function and i* is the
    winning unit
  • f is 1 when i = i* and falls off toward 0.0 as the
    distance between i's and i*'s weight vectors
    increases.
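
A sketch of one Kohonen update, assuming the output units are indexed along a grid and that neighborhood(i, winner) implements the f above (a Gaussian version is sketched after the next slide); names and values are illustrative.

    import numpy as np

    def som_step(W, x, neighborhood, eta=0.1):
        """One self-organizing-map update: every unit moves toward x,
        weighted by its neighborhood value relative to the winner i*."""
        winner = int(np.argmax(W @ x))
        for i in range(len(W)):
            W[i] += eta * neighborhood(i, winner) * (x - W[i])  # Δw_i = η f(i, i*) (x - w_i)
        return W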

15
(No Transcript)
16
Typical neighborhood function
  • Low values of σ produce a small neighborhood,
    large values produce a large neighborhood
  • σ usually starts out high and decreases during
    training to optimize learning.
  • The end product can be thought of as an elastic
    net that covers the input space
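
A common Gaussian choice of neighborhood function, with σ shrinking across training; the Gaussian form and the decay schedule are assumptions for illustration, since the slide only states that f is 1 at the winner and falls toward 0 with distance.

    import numpy as np

    def gaussian_neighborhood(sigma):
        """f(i, i*) = exp(-(i - i*)^2 / (2 sigma^2)); 1 at the winner,
        falling toward 0 as grid distance from the winner grows."""
        return lambda i, winner: np.exp(-((i - winner) ** 2) / (2 * sigma ** 2))

    # σ usually starts high and decays, shrinking the neighborhood over training:
    # for epoch in range(100):
    #     f = gaussian_neighborhood(sigma=5.0 * 0.95 ** epoch)
    #     ... call som_step(W, x, f) for each input x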

17
Uses of Kohonen nets
  • These types of nets are most often used as
    accounts of the development of topological maps
    in the brain.

18
Uses of competitive learning for cluster analysis
  • Obviously, classification
  • Used to model unsupervised learning in people.
  • Unsupervised learning most commonly studied as a
    function of development.
  • Preprocessing of inputs before supervised learning
    (hybrid learning)
  • Removes redundancy in inputs
  • Produces outputs that are less correlated (even
    orthogonal), thus reducing future interference.
  • Makes for improved efficiency in learning.

19
Finding correlational structure
  • Hebbian learning
  • Hebbian learning is the other most common variety
    of unsupervised learning.
  • The goal is to identify the regularities or
    correlations in the input patterns to identify
    redundancy.
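
A minimal sketch of the plain Hebbian update for a single linear unit y = w·x; this basic form is my assumption (the slides give no equation here), and its weights grow without bound unless normalized, which is what Oja's rule later addresses.

    import numpy as np

    def hebb_step(w, x, eta=0.01):
        """Plain Hebbian learning: Δw = η y x, with y = w · x.
        The weights grow along directions in which the inputs covary."""
        y = w @ x
        return w + eta * y * x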

20
Applications
  • Some possible applications of a system that could
    detect correlations/redundancies
  • Familiarity: a single continuous-valued output
    could tell us how similar a new pattern is to
    typical or average patterns seen in the past.
    Network learns what is typical.
  • Principal component analysis: extends the
    familiarity case to several units that together
    produce a multi-component measure of similarity
    to past patterns.
  • Encoding: reduction of a pattern to a
    less-redundant code. Helpful if information must
    be transmitted over a limited-capacity channel.

21
Error function
  • Hebbian learning minimizes the same error
    function as an auto-associative network with a
    linear hidden layer.
  • It is therefore a form of dimensionality
    reduction.
  • This error function is minimized by identifying
    the leading principal components.
  • There are variations of Hebbian learning that
    explicitly produce the principal components.
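
A small worked check of the claim above, assuming zero-mean data: the reconstruction error of the best rank-m linear code is achieved by projecting onto the top-m principal components (computed here with an SVD for comparison). The data and the value of m are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # correlated inputs
    X -= X.mean(axis=0)                                        # zero-mean

    m = 2
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:m]                        # leading m principal directions
    X_hat = (X @ P.T) @ P             # compress to m dimensions and reconstruct
    print(np.mean((X - X_hat) ** 2))  # the minimum achievable rank-2 error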

22
Computational details
  • 1-unit (first principal component)
  • Oja's rule
  • This is just the autoassociation delta rule.
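
A sketch of Oja's one-unit rule in its usual statement, Δw = η y (x − y w) with y = w·x; the exact form is my reconstruction, since the slide names the rule but the transcript omits the formula. On zero-mean data the weight vector converges toward the first principal component.

    import numpy as np

    def oja_rule(X, eta=0.01, epochs=50, seed=0):
        """Oja's 1-unit rule: Δw = η y (x - y w), with y = w · x."""
        rng = np.random.default_rng(seed)
        w = rng.normal(size=X.shape[1])
        for _ in range(epochs):
            for x in X:
                y = w @ x
                w += eta * y * (x - y * w)   # Hebbian term minus a decay that keeps |w| near 1
        return w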

23
M-unit (other principal components)
  • Sanger's learning rule
  • Oja's M-unit rule
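
A sketch of Sanger's rule (the generalized Hebbian algorithm), again a reconstruction since the transcript omits the formulas: Δw_i = η y_i (x − Σ_{k≤i} y_k w_k). Oja's M-unit rule differs only in running the sum over all M units.

    import numpy as np

    def sanger_rule(X, m=2, eta=0.01, epochs=50, seed=0):
        """Sanger's rule: rows of W converge to the first m principal
        components of zero-mean X, in order."""
        rng = np.random.default_rng(seed)
        W = 0.1 * rng.normal(size=(m, X.shape[1]))
        for _ in range(epochs):
            for x in X:
                y = W @ x
                for i in range(m):
                    residual = x - y[: i + 1] @ W[: i + 1]   # subtract components 1..i
                    W[i] += eta * y[i] * residual
        return W   # for Oja's M-unit rule, use residual = x - y @ W for every i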

24
Sanger's vs. Oja's rule
  • Rules differ only in the limits for the
    summation.
  • For Sanger's rule, the weight vectors converge on
    the M principal components, in order.
  • For Oja's M-unit rule, the weight vectors span
    the same subspace as those components but don't
    find the components themselves.
  • Sanger's is more useful for practical
    applications, but Oja's is more likely to be used
    by real brains.

25
Competition between networks, not just units
  • Jacobs, Jordan, Nowlan, and Hinton
  • For discussion on Thursday.
  • Also, we'll discuss Sarle through the unsupervised
    learning section on p. 7
  • Discussion of hybrid networks and the following
    material next week.