Title: Unsupervised Learning
1. Unsupervised Learning
- i.e., learning even when there is no right answer
2. Unsupervised Learning
- Subject: What does unsupervised learning learn?
- Unsupervised learning allegedly involves no target values.
- In fact, for most varieties of unsupervised learning, the targets are the same as the inputs (Sarle, 1994).
- In other words, unsupervised learning usually performs the same task as an auto-associative network, compressing the information from the inputs (Deco and Obradovic, 1996).
3. Examples of unsupervised learning
- Cluster analysis
  - Identifying the relational structure of the world.
  - Accomplished via competitive learning.
- Correlational analysis
  - Identifying the correlations among features.
  - Accomplished via Hebbian learning.
4. Cluster analysis
- Competitive learning
  - Unsupervised competitive learning is used in a wide variety of fields under a wide variety of names, the most common of which is "cluster analysis".
  - Analogous to k-means clustering
  - But competitive learning is iterative and can be non-linear.
  - The goal in these cases is to classify the input vectors into groups, thus providing a summary of the inputs (an efficient categorization).
5. Competitive learning
- Create a number of arbitrarily defined cluster vectors (the k clusters of k-means clustering)
- Change the location of these vectors to match the distribution of the input vectors
6. Moving cluster vectors to cluster centers
7. Graphical depiction
From Dr. Nigel Allinson's web page
8. Computational details
- Simple competitive learning
  - Single layer of output units, each fully connected to the input units.
  - The number of output units is determined by the designer and is critically important to network behavior.
  - Activity of each output unit is determined in the traditional way: output = W · input.
  - The activity of the output units is a measure of the similarity between their weight vectors and the input vector.
  - Similarity is traditionally measured using the dot product (a small sketch follows this slide).
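- As a concrete illustration of the activation step, here is a minimal NumPy sketch (the names W, x and the sizes are illustrative; the weight vectors are normalized so the dot product tracks similarity rather than magnitude):

```python
import numpy as np

# Minimal sketch of the activation step in simple competitive learning.
# W holds one weight vector per output unit (rows); x is one input pattern.
rng = np.random.default_rng(0)
n_inputs, n_outputs = 4, 3

W = rng.normal(size=(n_outputs, n_inputs))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # normalized weight vectors
x = rng.normal(size=n_inputs)

activations = W @ x                   # output = W . input (dot-product similarity)
winner = int(np.argmax(activations))  # most active unit = most similar weight vector
print(activations, winner)
```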
9. How is learning accomplished?
- Winner-take-all (WTA)
  - The output unit with the highest activation level changes its connections to the input layer.
  - Learning consists of making the weight vector more similar to the input vector: the difference (w_i − input) should decrease.
  - Update rule: Δw_i = η (input − w_i), where w_i is the weight vector for the winner (see the sketch below).
- Geometric analogy
  - Draw on board
  - Finds the middle of clusters in the input
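- A sketch of the winner-take-all update above, continuing the NumPy setup from the previous snippet (eta stands for the learning rate η):

```python
import numpy as np

def wta_update(W, x, eta=0.1):
    """Winner-take-all step: only the winning unit's weight vector moves,
    by a fraction eta of its distance to the current input,
    i.e. delta w_winner = eta * (input - w_winner)."""
    winner = int(np.argmax(W @ x))      # unit whose weight vector best matches x
    W[winner] += eta * (x - W[winner])  # pull the winner toward the input
    return W, winner
```

- Iterated over many inputs, each weight vector drifts toward the mean of the inputs it wins, i.e. the middle of a cluster.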
10. Number of clusters (i.e., number of output units)
- The number of units determines how finely the input representations are divided.
- In k-means cluster analysis, k specifies the number of clusters.
- Dead units
  - Some units may be perennial losers; how can this be rectified?
  - Initialize the weight vectors to samples from the input itself, to ensure they are where the action is.
  - Update the weights of losers as well as winners, but have losers change their weights much more slowly ("leaky learning"; a sketch follows this slide).
  - Dead units then gravitate toward where the action is.
- But in some cases we may want dead units
  - Reserving dead units as resources helps prepare the network for handling retroactive interference.
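- A hedged sketch of the leaky-learning variant just described; the two learning rates are illustrative:

```python
import numpy as np

def leaky_update(W, x, eta_win=0.1, eta_lose=0.001):
    """Leaky learning: the winner moves quickly toward the input, while every
    other unit creeps toward it slowly, so perennial losers drift toward
    regions of the input space where the action is."""
    winner = int(np.argmax(W @ x))
    for i in range(W.shape[0]):
        eta = eta_win if i == winner else eta_lose
        W[i] += eta * (x - W[i])
    return W
```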
11. Variations on simple competitive learning
- Vector quantization
  - Used for data compression, for both storage and transmission of speech and image data.
  - Idea: Categorize the input vectors into M classes (for M output units) and then represent each vector by the category in which it falls (see the sketch below).
  - This basic technique uses standard competitive learning.
- Learning vector quantization (Kohonen, 1989), aka LVQ
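- A sketch of how a trained codebook would be used for compression; W is the matrix of learned weight (codebook) vectors, and the function names are illustrative:

```python
import numpy as np

def vq_encode(W, X):
    """Replace each row of X by the index of its nearest codebook vector;
    only that index needs to be stored or transmitted."""
    dists = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def vq_decode(W, codes):
    """Reconstruct each input as its codebook vector."""
    return W[codes]
```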
12. Supervised version of vector quantization
- We have a body of labeled sample data (labeled with its correct, predefined class).
- When the winner's class matches the input's class, the learning rule is the standard competitive learning rule; when it does not match, the weight vector is moved in the OPPOSITE direction, away from the input vector (sketched below).
- Minimizes the number of misclassifications
- LVQ2 (Kohonen, 1989)
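- A minimal sketch of the LVQ1 update, assuming each output unit has been assigned a class label (unit_labels); the winner moves toward inputs of its own class and away from inputs of other classes:

```python
import numpy as np

def lvq1_update(W, unit_labels, x, label, eta=0.05):
    """LVQ1: find the nearest weight vector; pull it toward x if its class
    label matches x's label, push it away otherwise."""
    winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    sign = 1.0 if unit_labels[winner] == label else -1.0
    W[winner] += sign * eta * (x - W[winner])
    return W
```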
13. Multi-layer networks
- Can do hierarchical clustering
- Results of the first level of cluster analysis feed into the next layer.
- Each layer should extract higher orders of clustering information.
14. Kohonen's self-organizing (feature) map
- The idea behind these networks is to preserve the topology (spatial arrangement) of the input vectors.
- Each output unit is no longer independent of the others.
- Neighboring output units should represent similar input vectors.
- Δw_i = η f(i, i*) (input − w_i)
  - where f is a neighborhood function and i* is the winning unit (a code sketch follows this slide)
  - f is 1 when i = i* and falls off toward 0.0 as the distance between i's and i*'s weight vectors increases.
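- A sketch of one SOM update, assuming the common formulation in which the neighborhood distance is measured between unit positions on the output grid (grid_pos), with a Gaussian as the neighborhood function f:

```python
import numpy as np

def som_update(W, grid_pos, x, eta=0.1, sigma=1.0):
    """One Kohonen SOM step: every unit moves toward the input, scaled by a
    Gaussian neighborhood f(i, i*) centred on the winning unit i*,
    i.e. delta w_i = eta * f(i, i*) * (x - w_i)."""
    winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))   # i*
    d = np.linalg.norm(grid_pos - grid_pos[winner], axis=1)  # distance to i* on the grid
    f = np.exp(-d**2 / (2 * sigma**2))                       # 1 at i*, falls toward 0
    W += eta * f[:, None] * (x - W)
    return W
```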
16. Typical neighborhood function
- Low values of σ produce a small neighborhood; high values produce a large neighborhood (a common functional form is given below).
- σ usually starts out high (a wide neighborhood) and decreases during training to optimize learning.
- The result can be thought of as an elastic net that covers the input space.
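- The function itself does not survive in this transcript; a common choice (an assumption here, consistent with the role of σ above) is a Gaussian over the distance d(i, i*) between unit i and the winner i*:

```latex
f(i, i^{*}) = \exp\!\left(-\frac{d(i, i^{*})^{2}}{2\sigma^{2}}\right)
```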
17. Uses of Kohonen nets
- These types of nets are most often used as
accounts of the development of topological maps
in the brain.
18. Uses of competitive learning for cluster analysis
- Obviously, classification
- Used to model unsupervised learning in people.
  - Unsupervised learning is most commonly studied as a function of development.
- Preprocessing of inputs before supervised learning (hybrid learning)
  - Removes redundancy in inputs
  - Produces outputs that are less correlated (even orthogonal), thus reducing future interference.
  - Makes for improved efficiency in learning.
19. Finding correlational structure
- Hebbian learning
  - Hebbian learning is the other most common variety of unsupervised learning.
  - The goal is to identify the regularities or correlations in the input patterns, i.e., to identify redundancy (a standard form of the rule is given below).
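- For reference, a standard formulation of a linear Hebbian unit (an assumption here, but consistent with the rules on the following slides): the output is y = wᵀx, the weights grow in proportion to the input-output product, and on average the update is driven by the input correlation matrix C, which is why Hebbian learning picks up correlational structure.

```latex
y = \mathbf{w}^{\top}\mathbf{x}, \qquad
\Delta\mathbf{w} = \eta\, y\, \mathbf{x}, \qquad
\langle \Delta\mathbf{w} \rangle = \eta\, C\, \mathbf{w}, \quad
C = \langle \mathbf{x}\mathbf{x}^{\top} \rangle
```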
20. Applications
- Some possible applications of a system that could detect correlations/redundancies:
  - Familiarity: a single continuous-valued output could tell us how similar a new pattern is to typical or average patterns seen in the past. The network learns what is typical.
  - Principal component analysis: extends the familiarity case to several units that together produce a multi-component measure of similarity to past patterns.
  - Encoding: reduction of a pattern to a less-redundant code. Helpful if information must be transmitted over a limited-capacity channel.
21. Error function
- Hebbian learning minimizes the same error function as an auto-associative network with a linear hidden layer (written out below).
- It is therefore a form of dimensionality reduction.
- This error function is minimized by identifying the leading principal components.
- There are variations of Hebbian learning that explicitly produce the principal components.
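- Under the usual linear auto-associator reading (an assumption here), the error function in question is the squared reconstruction error over patterns p, with the rows of W being the hidden units' weight vectors; for M hidden units it is minimized when those rows span the space of the leading M principal components.

```latex
E = \sum_{p} \left\lVert \mathbf{x}_{p} - W^{\top} W\, \mathbf{x}_{p} \right\rVert^{2}
```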
22. Computational details
- 1-unit case (first principal component)
  - Oja's rule (its standard form is given below)
  - This is just the auto-association delta rule.
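- With y = wᵀx, the standard form of Oja's rule (the slide's own equation did not survive conversion) is given below; it has the shape of the delta rule for reconstructing x from the single linear unit (target x, prediction y·w), and w converges to the first principal component of the inputs.

```latex
y = \mathbf{w}^{\top}\mathbf{x}, \qquad
\Delta\mathbf{w} = \eta\, y\,(\mathbf{x} - y\,\mathbf{w})
```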
23. M-unit case (other principal components)
- Sanger's learning rule
- Oja's M-unit rule
- (Standard forms of both rules follow.)
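- For output unit i and input j, with y_i = Σ_j w_ij x_j, the standard forms (the slides' own equations did not survive conversion) are given below; the only difference is the upper limit of the sum, which is the point made on the next slide.

```latex
\text{Sanger:}\quad
\Delta w_{ij} = \eta\, y_{i}\Big(x_{j} - \sum_{k \le i} y_{k} w_{kj}\Big)
\qquad
\text{Oja (M-unit):}\quad
\Delta w_{ij} = \eta\, y_{i}\Big(x_{j} - \sum_{k=1}^{M} y_{k} w_{kj}\Big)
```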
24. Sanger's vs. Oja's rule
- The rules differ only in the limits for the summation.
- For Sanger's rule, the weight vectors converge on the M principal components, in order.
- For Oja's M-unit rule, the weight vectors span the same subspace as those components but don't find the components themselves (a runnable sketch of Sanger's rule follows this slide).
- Sanger's is more useful for practical applications, but Oja's is more likely to be used by real brains.
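- A runnable sketch of Sanger's rule (the generalized Hebbian algorithm) on zero-mean data; after enough passes, the rows of W approximate the leading principal components, in order. The data and parameter values are illustrative:

```python
import numpy as np

def sanger_update(W, x, eta=0.01):
    """One Sanger's-rule step: output unit i learns from the residual of x
    after subtracting the reconstructions of units 1..i (triangular sum)."""
    y = W @ x
    for i in range(W.shape[0]):
        residual = x - W[: i + 1].T @ y[: i + 1]
        W[i] += eta * y[i] * residual
    return W

# Illustrative use on correlated, zero-mean data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))   # correlated inputs
X -= X.mean(axis=0)
W = rng.normal(scale=0.1, size=(2, 5))   # 2 output units -> first 2 components
for _ in range(20):
    for x in X:
        W = sanger_update(W, x)
```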
25. Competition between networks, not just units
- Jacobs, Jordan, Nowlan & Hinton
- For discussion on Thursday.
- Also, we'll discuss Sarle through the unsupervised learning section on p. 7.
- Discussion of hybrid networks and the following sections next week.