Data Mining CSE5230

1
Data Mining - CSE5230
CSE5230/DMS/2003/6
  • Neural Networks 2
  • Self-Organizing Maps (SOMs)

2
Lecture Outline
  • Motivation
  • unsupervised learning
  • the cortex
  • topographic feature maps
  • biological self-organizing maps
  • Artificial self-organizing maps
  • Kohonen's self-organizing network
  • learning algorithm
  • examples
  • Data mining examples
  • Text mining
  • Customer Understanding

3
Lecture Objectives
  • By the end of this lecture you should be able to
  • Explain the principal differences between MLPs
    and SOMs
  • Describe the properties of a topological feature
    map, with particular attention to the notion of
    similarity in feature space being mapped to
    proximity in the SOM
  • Describe how Kohonen networks are trained
  • Give examples of how SOMs can be used in data
    mining

4
Motivation - 1
  • The feed-forward back-propagation NNs discussed
    last week are an example of a supervised learning
    technique
  • In supervised learning, the aim is to discover a
    relationship between the inputs and outputs of a
    system
  • This relationship can be used for tasks such as
    prediction, estimation or classification
  • A known training set of input/output pairs is
    used to train the network

5
Motivation - Unsupervised Learning
  • Many data mining tasks are not suited to this
    approach
  • Often the data mining task is to discover
    structure in the data set, without any prior
    knowledge of what is there
  • This is an example of unsupervised learning (we
    have already seen the example of the K-means
    clustering algorithm)
  • A class of neural networks called Self-Organizing
    Maps (SOMs) can be used for this task

6
The Cortex - 1
  • SOM research was inspired by the observation of
    topologically correct sensory maps in the cortex
    (e.g. the retinotopic, somatotopic, tonotopic
    maps)
  • In humans, the cortex consists of a layer of
    nerve tissue about 0.2 m² in area and 2-3 mm in
    thickness
  • It is highly convoluted to save space, and forms
    the exterior of the brain: it's the folded,
    wrinkled stuff we see when we look at a brain

7
The Cortex - 2
Lateral (schematic) view of the human left-brain
hemisphere. Various cortical areas devoted to
specialized tasks can be distinguished [RMS1992, p. 18]
8
Sensory Surfaces
  • Most signals that the brain receives from the
    environment come from sensory surfaces covered
    with receptors
  • skin (touch and temperature)
  • retina (vision)
  • cochlea in the ear (1-D sound sensor)
  • It is usually found that the wiring of the
    nervous system exhibits topographic ordering
  • signals from adjacent receptors tend to be
    conducted to adjacent neurons in the cortex

9
Topographic Feature Maps - 1
  • This neighbourhood-preserving organization of the
    cortex is called a topographic feature map
  • For touch, maps of the body are found in the
    somatosensory cortex
  • In the primary visual cortex, neighbouring
    neurons tend to respond to stimulation of
    neighbouring regions of the retina
  • As well as these simple maps, the brain also
    constructs topographic maps of abstract features
  • In the auditory cortex of many higher brains, a
    tonotopic map is found, where the pitch of
    received sounds is mapped regularly

10
Topographic Feature Maps - 2
Map of part of the body surface in the
somatosensory cortex of a monkey
Direction map for sound signals in the so-called
optic tectum of an owl [RMS1992, p. 21]
11
Biological Self-Organizing Maps - 1
  • The subject of SOMs arose from the question of
    how such topology-preserving mappings might
    arise in neural networks
  • It is probable that in biological systems much of
    the organization of such maps is genetically
    determined, BUT
  • The brain is estimated to have 10¹³ synapses
    (connections), so it would be impossible to
    produce this organization by specifying each
    connection in detail; the genome does not
    contain that much information

12
Biological Self-Organizing Maps - 2
  • A more likely scenario is that there are
    genetically specified mechanisms of structure
    formation that result in the creation of the
    desired connectivity
  • These could operate before birth, or as part of
    later maturation, involving interaction with the
    environment
  • There is much evidence for such changes
  • the normal development of edge-detectors in the
    visual cortex of newborn kittens is suppressed in
    the absence of sufficient visual experience
  • the somatosensory maps of adult monkeys have been
    observed to adapt following the amputation of a
    finger

13
Biological Self-Organizing Maps - 3
Readaptation of the somatosensory map of the hand
region of an adult nocturnal ape due to the
amputation of one finger. Several weeks after the
amputation of the middle finger (3), the assigned
region has disappeared and the adjacent regions
have spread out [RMS1992, p. 117]
14
Artificial Self-Organizing Maps - 1
  • In the NN models we have seen so far, every
    neuron in a layer is connected to every neuron in
    the next layer of the network
  • The location of a neuron in a layer plays no role
    in determining its connectivity or weights
  • With SOMs, the ordering of neurons within a layer
    plays an important role
  • How should the neurons organize their
    connectivity to optimize the spatial distribution
    of their responses within the layer?

15
Artificial Self-Organizing Maps - 2
  • The purpose of this optimization is to achieve
    the mapping: similarity of features → proximity
    of excited neurons
  • Such a mapping allows neurons with similar tasks
    to communicate over especially short connection
    paths, which is important for a massively parallel system
  • Moreover, it results in the formation of
    topographic feature maps
  • the most important similarity relationships among the
    input signals are converted into spatial
    relationships between responding neurons

16
Kohonen's Self-Organizing Network - 1
  • Kohonen [Koh1982] studied a system consisting of
    a two-dimensional layer of neurons, with the
    following properties:
  • each neuron identified by its position vector r
    (i.e. its coordinates)
  • input signals to the layer represented by a
    feature vector x (usually normalized)
  • output of each neuron is a sigmoidal function of
    its total activation (as for MLPs last week)

17
Kohonen's Self-Organizing Network - 2
  • Each neuron r forms the weighted sum of the input
    signals. The external activation is
    $I_r^{\text{ext}} = \sum_l w_{rl} x_l = \mathbf{w}_r \cdot \mathbf{x}$
    (the magnitudes of the weight vectors are usually
    normalized)
  • In addition to the input connections, the neurons
    in the layer are connected to each other
  • the layer has internal feedback
  • The weight from neuron r' to neuron r is labelled
    $g_{rr'}$
  • These lateral inputs are superimposed on the
    external input signal
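A minimal Python sketch of the external activation (no code appears in the slides, so the function names and the three-neuron layer are hypothetical). It also illustrates the point behind the normalization remark: when all weight vectors have the same length, the neuron with the largest $\mathbf{w}_r \cdot \mathbf{x}$ is also the one whose weight vector is nearest to x, because $\|\mathbf{x} - \mathbf{w}_r\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{w}_r\|^2 - 2\,\mathbf{w}_r \cdot \mathbf{x}$.

```python
import math

def external_activation(w_r, x):
    """Weighted sum of the inputs seen by neuron r: sum_l w_rl * x_l."""
    return sum(w * xi for w, xi in zip(w_r, x))

def normalize(v):
    """Scale v to unit Euclidean length (v must not be the zero vector)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Hypothetical 1 x 3 layer: keys are grid positions r, values are weight vectors w_r.
weights = {
    (0, 0): normalize([1.0, 0.0]),
    (0, 1): normalize([1.0, 1.0]),
    (0, 2): normalize([0.0, 1.0]),
}
x = normalize([0.2, 1.0])

activations = {r: external_activation(w, x) for r, w in weights.items()}
winner = max(activations, key=activations.get)                  # largest I_r
nearest = min(weights, key=lambda r: distance(weights[r], x))   # closest w_r
print(winner, nearest)  # (0, 2) (0, 2)
```

This equivalence is why many implementations select the winner by minimum distance, which also works when the weights are not normalized.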

18
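Kohonen's Self-Organizing Network - 3
  • The output of neuron r is thus given by
    $y_r = f\left(\sum_l w_{rl} x_l + \sum_{r'} g_{rr'} y_{r'}\right)$,
    where f is the sigmoidal output function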
  • The neuron activities are the solutions of this
    system of non-linear equations

The feedback due to the lateral connections $g_{rr'}$
is usually arranged so that it is excitatory at
small distances and inhibitory at large
distances. This is often called a "Mexican hat"
response
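The non-linear system above can be explored numerically. The sketch below is an illustration under assumptions, not Kohonen's own procedure: it uses a hypothetical 1-D chain of 21 neurons, models the Mexican hat as a difference of two Gaussians, uses a sigmoid with a threshold for f, and solves $y_r = f\left(I_r^{\text{ext}} + \sum_{r'} g_{rr'} y_{r'}\right)$ by damped fixed-point iteration. All parameter values are hypothetical. They are chosen so that the slope of f is at most β/4 = 1 and each row of lateral weights sums, in absolute value, to less than 1; this makes the iteration a contraction with a single solution.

```python
import math

def mexican_hat(d, a_exc=0.3, s_exc=1.0, a_inh=0.15, s_inh=3.0):
    """Lateral weight g for grid distance d, as a difference of two Gaussians:
    positive (excitatory) near d = 0 and negative (inhibitory) further away."""
    return (a_exc * math.exp(-d * d / (2 * s_exc ** 2))
            - a_inh * math.exp(-d * d / (2 * s_inh ** 2)))

def sigmoid(u, beta=4.0, theta=0.5):
    """Output function f; the threshold theta keeps unstimulated neurons low."""
    return 1.0 / (1.0 + math.exp(-beta * (u - theta)))

def relax(external, steps=300, alpha=0.2):
    """Damped fixed-point iteration of y_r = f(I_r + sum_r' g_rr' y_r') on a 1-D chain."""
    n = len(external)
    g = [[mexican_hat(r - rp) for rp in range(n)] for r in range(n)]
    y = [0.0] * n
    for _ in range(steps):
        target = [sigmoid(external[r] + sum(g[r][rp] * y[rp] for rp in range(n)))
                  for r in range(n)]
        # Move only part of the way toward the new estimate to damp oscillations.
        y = [(1 - alpha) * yr + alpha * tr for yr, tr in zip(y, target)]
    return y

# Hypothetical external input: a smooth bump centred on neuron 10 of a 21-neuron chain.
external = [math.exp(-(r - 10) ** 2 / 8.0) for r in range(21)]
y = relax(external)
print([round(v, 2) for v in y])
```

The result is a bump of activity centred on the most strongly stimulated neuron. This is the behaviour that Kohonen's simplification takes for granted rather than computing it.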
19
Kohonen's Self-Organizing Network - 4
Kohonen's model showing the excitation zone around
the winning neuron [RMS1992, p. 64]
  • The solution of such systems of non-linear
    equations is tedious and time-consuming. Kohonen
    avoided this by introducing a simplification.

20
Kohonen's Self-Organizing Network - 5
  • The response of the network is assumed always to
    have the same shape
  • the response is 1 at the location of the neuron
    r' receiving maximal external excitation, and
    decreases to 0 as one moves away from r'
  • The excitation of neuron r is thus only a
    function of its distance from r'
  • The model then proposes a rule for changing the
    weights to each neuron so that a topologically
    ordered map is formed. The weight change is
    $\Delta \mathbf{w}_r = \epsilon \, h_{rr'} (\mathbf{x} - \mathbf{w}_r)$,
    where $h_{rr'}$ is the response of neuron r when
    r' is the winner
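One adaptation step can be sketched as follows, assuming the weight change takes the form $\Delta \mathbf{w}_r = \epsilon \, h_{rr'} (\mathbf{x} - \mathbf{w}_r)$, where $h_{rr'}$ is the fixed-shape response of neuron r when r' is the winner. The response function is passed in as a parameter. The linear "tent" used in the example is a hypothetical stand-in that is 1 at the winner and 0 from distance `radius` onwards.

```python
import math

def kohonen_step(weights, x, winner, response, eps):
    """One learning step: w_r <- w_r + eps * h(r, r') * (x - w_r) for every unit r.

    weights:  dict mapping grid position r -> weight vector (list of floats); updated in place
    winner:   grid position r' of the maximally excited unit
    response: function h(r, r') giving the fixed-shape network response
    """
    for r, w in list(weights.items()):
        h = response(r, winner)
        weights[r] = [wi + eps * h * (xi - wi) for wi, xi in zip(w, x)]
    return weights

def tent(r, r_win, radius=2.0):
    """Hypothetical response: 1 at the winner, falling linearly to 0 at distance `radius`."""
    return max(0.0, 1.0 - math.dist(r, r_win) / radius)

# Hypothetical 1-D chain of three units with scalar weights, all starting at 0.
weights = {(0,): [0.0], (1,): [0.0], (2,): [0.0]}
kohonen_step(weights, x=[1.0], winner=(0,), response=tent, eps=0.5)
print(weights)  # {(0,): [0.5], (1,): [0.25], (2,): [0.0]}
```

The winner moves furthest toward x, its neighbour moves half as far, and the unit outside the response does not move. This differential movement is what gradually orders the map.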

21
Kohonen's Self-Organizing Network - 6
  • Experiments have shown that the precise shape of
    the response is not critical
  • A suitable function is thus simply chosen. The
    Gaussian is a suitable choice:
    $h_{rr'} = \exp\left(-\frac{\|\mathbf{r} - \mathbf{r}'\|^2}{2\sigma^2}\right)$
  • The parameter σ determines the length scale on
    which input stimuli cause changes in the map
  • The map usually learns the coarse structure first and
    then the fine structure. This is done by letting σ
    decrease over time
  • ε on the previous slide, which specifies the size
    of each change, usually also decreases over time
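Here is a sketch of the Gaussian response together with decreasing σ and ε. The exponential interpolation between a start value and an end value is an assumed schedule (one common choice). The slides only say that both parameters decrease over time, and the numbers below are hypothetical.

```python
import math

def gaussian_response(r, r_win, sigma):
    """h_rr' = exp(-|r - r'|^2 / (2 sigma^2)): 1 at the winner, decaying with grid distance."""
    return math.exp(-math.dist(r, r_win) ** 2 / (2.0 * sigma ** 2))

def decay(start, end, t, t_max):
    """Exponential interpolation: equals `start` at t = 0 and `end` at t = t_max."""
    return start * (end / start) ** (t / t_max)

# Hypothetical schedule: sigma shrinks from 3 to 0.5 and eps from 0.5 to 0.01
# over 10,000 steps; h is shown for a unit two grid steps from the winner.
for t in (0, 5000, 10000):
    sigma = decay(3.0, 0.5, t, 10000)
    eps = decay(0.5, 0.01, t, 10000)
    print(t, round(sigma, 3), round(eps, 4), round(gaussian_response((0, 0), (0, 2), sigma), 3))
```

Early on, with σ = 3, a unit two grid steps from the winner still receives most of the update (h ≈ 0.8). By the end, with σ = 0.5, it receives almost none. This is how coarse ordering comes first and fine tuning later.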

22
Learning Algorithm
  • 0. Initialization: start with appropriate initial
    values for the weights w_r (usually just random)
  • 1. Choice of stimulus: choose an input vector x
    at random from the data set
  • 2. Response: determine the winning neuron r'
    most strongly activated by x
  • 3. Adaptation: carry out a learning step by
    modifying the weights,
    $\Delta \mathbf{w}_r = \epsilon \, h_{rr'} (\mathbf{x} - \mathbf{w}_r)$
    (normalize the weights if required)
  • 4. Continue with step 1 until the specified number of
    learning steps has been completed
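Putting steps 0-4 together gives the following pure-Python sketch. Several choices here are assumptions not taken from the slides: a rectangular grid, weights initialized uniformly inside the data's bounding box, a winner chosen by Euclidean distance, the Gaussian response, and exponential decay of σ and ε. The usage lines train a 5 x 5 map on hypothetical points drawn uniformly from the unit square, which is the setting of the first example below.

```python
import math
import random

def train_som(data, rows, cols, steps, sigma=(3.0, 0.5), eps=(0.5, 0.01), seed=0):
    """Train a rows x cols Kohonen map on `data` (a list of equal-length float lists).

    sigma and eps decay exponentially from their first to their second value
    (an assumed schedule). The winner is the unit whose weight vector is nearest
    to x, which picks the same unit as 'most strongly activated' when all weight
    vectors are normalized to the same length."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(v[k] for v in data) for k in range(dim)]
    hi = [max(v[k] for v in data) for k in range(dim)]
    # 0. Initialization: random weights inside the bounding box of the data.
    weights = {(i, j): [rng.uniform(lo[k], hi[k]) for k in range(dim)]
               for i in range(rows) for j in range(cols)}
    for t in range(steps):
        frac = t / max(1, steps - 1)
        s = sigma[0] * (sigma[1] / sigma[0]) ** frac
        e = eps[0] * (eps[1] / eps[0]) ** frac
        # 1. Choice of stimulus: a random input vector from the data set.
        x = rng.choice(data)
        # 2. Response: the winning unit r'.
        winner = min(weights, key=lambda r: math.dist(weights[r], x))
        # 3. Adaptation: pull every unit toward x, weighted by the Gaussian h_rr'.
        for r, w in weights.items():
            h = math.exp(-math.dist(r, winner) ** 2 / (2 * s * s))
            weights[r] = [wk + e * h * (xk - wk) for wk, xk in zip(w, x)]
    # 4. Stop after the requested number of learning steps.
    return weights

# Usage: 500 hypothetical points drawn uniformly from the unit square.
data_rng = random.Random(1)
data = [[data_rng.random(), data_rng.random()] for _ in range(500)]
som = train_som(data, rows=5, cols=5, steps=3000)
```

Each step moves a weight vector only part of the way toward x (since $\epsilon h_{rr'} \le 1$). As a result, the weights stay within the region covered by the data and their initial values.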

23
Examples - 1
SOM that has learnt data uniformly distributed on
a square
SOM that has learnt data on a rotated square,
where points are twice as likely to occur in a
circle at the centre of the square (relationship
to clustering)
24
Examples - 2
2-dimensional SOM that has learnt data uniformly
distributed in a 3-dimensional cube
25
Examples - 3
1-dimensional SOM that has learnt data uniformly
distributed in a 2-dimensional circle
26
Examples - 4
2-dimensional SOM that has learnt 2-dimensional
data containing 3 clusters
27
The SOM for Data Mining
  • The SOM is a good method for obtaining an initial
    understanding of a set of data about which the
    analyst has no prior opinion (e.g. there is no need
    to estimate the number of clusters)
  • The map can be used as an initial unbiased
    starting point for further analysis. Once the
    clusters are selected from the map, they are
    analyzed to find out the reasons for such
    clustering
  • It may be possible to determine which attributes
    were responsible for the clusters
  • It may also be possible to identify some
    attributes which do not contribute to the
    clustering
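A common way to look for the attributes behind a cluster is to display a component plane for each attribute. This is a standard SOM technique, though the slides do not name it. A component plane shows the value of that attribute's weight at every map unit. The sketch below extracts a plane from a trained map stored as a dict of grid position → weight vector. The 2 x 2 map is hypothetical.

```python
def component_plane(weights, k, rows, cols):
    """Return the k-th weight component of every map unit as a rows x cols grid.

    A plane whose high/low regions line up with a cluster points to an attribute
    that helps form it; a nearly flat plane suggests an attribute that contributes
    little to the clustering."""
    return [[weights[(i, j)][k] for j in range(cols)] for i in range(rows)]

def plane_range(plane):
    """Spread of one component plane (max - min); near 0 means a nearly flat plane."""
    values = [v for row in plane for v in row]
    return max(values) - min(values)

# Hypothetical 2 x 2 map over two attributes: attribute 0 separates the rows,
# attribute 1 is almost constant.
weights = {(0, 0): [0.1, 0.50], (0, 1): [0.2, 0.52],
           (1, 0): [0.9, 0.49], (1, 1): [0.8, 0.51]}
for k in range(2):
    plane = component_plane(weights, k, 2, 2)
    print(k, plane, round(plane_range(plane), 2))
```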

28
Example: Text Mining with a SOM - 1
  • This example comes from the WEBSOM project in
    Finland: http://websom.hut.fi/websom/
  • WEBSOM is a method for organizing miscellaneous
    text documents onto meaningful maps for
    exploration and search. WEBSOM automatically
    organizes the documents onto a two-dimensional
    grid so that related documents appear close to
    each other

29
Example: Text Mining with a SOM - 2
  • This map was constructed using more than one
    million documents from 83 USENET newsgroups
  • Color denotes the density or the clustering
    tendency of the documents
  • Light (yellow) areas are clusters and dark (red)
    areas are empty space between the clusters
  • This is a little difficult to read, but WEBSOM
    allows one to zoom in
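The slides do not say exactly how WEBSOM computes its coloring. A closely related and widely used display is the U-matrix, sketched below under that assumption. Each unit is scored by the average distance between its weight vector and those of its grid neighbours. Low values therefore mark units inside clusters, and high values mark the gaps between them. The 1 x 4 map is hypothetical, and the function needs a map of at least two units.

```python
import math

def u_matrix(weights, rows, cols):
    """Average weight-space distance from each unit to its 4-connected grid neighbours."""
    u = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            dists = [math.dist(weights[(i, j)], weights[(i + di, j + dj)])
                     for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                     if 0 <= i + di < rows and 0 <= j + dj < cols]
            u[i][j] = sum(dists) / len(dists)
    return u

# Hypothetical 1 x 4 map: two tight groups of units separated by a gap.
weights = {(0, 0): [0.0], (0, 1): [0.0], (0, 2): [10.0], (0, 3): [10.0]}
print(u_matrix(weights, 1, 4))  # [[0.0, 5.0, 5.0, 0.0]]
```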

30
Example: Text Mining with a SOM - 3
  • Zoomed view of the WEBSOM map; map labels and the
    newsgroups they correspond to:

blues - rec.music.bluenote
books - rec.arts.books
classical - rec.music.classical
humor - rec.humor
lang.dylan - comp.lang.dylan
music - music
shostakovich - alt.fan.shostakovich
31
Example: Customer Understanding with a SOM - 1
  • This example is from [YaZ2001], using KDD Cup 2000
    data
  • clickstream and purchase data from Gazelle.com, a
    retailer of legware and legcare products
  • On-line retailers are interested in understanding
    their customers, so that they can
  • Better organize the website
  • Better target marketing
  • Improve strategies for acquiring and retaining
    customers
  • Gazelle.com was interested in analysing the
    differences between light and heavy spenders
    (split at a spending threshold of $12)

32
Example: Customer Understanding with a SOM - 2
  • Data set and Feature Selection
  • Data set has more than 1700 records, each with
    426 features and a variable indicating light or
    heavy spending. Features include:
  • age (discrete)
  • income band (ordered), e.g.
  • < 15,000, 15,000-19,999, 20,000-29,999, etc.
  • percentage of discounted items in purchase
    (continuous)
  • [YaZ2001] compared a variety of methods for
    generating a reduced feature set. These were
    adapted from criteria used in other DM
    techniques
  • Discriminant analysis, decision tree, naïve
    Bayes, Principal Components Analysis (PCA)
  • The different methods highlighted a variety of
    features, e.g.
  • discount rate, average and total weight of items,
    minimum shipping order amount, geographic
    location, house value, vendor, main template
    views, etc.

33
Example: Customer Understanding with a SOM - 3
  • [YaZ2001] selected the eight variables indicated
    by discriminant analysis

Projection of these data onto the first two principal
components did not show a clear separation into two
clusters (x: heavy spender, o: light spender). This
could indicate the presence of a non-linear relationship
34
Example: Customer Understanding with a SOM - 4
  • They then applied a modified self-organizing map,
    called a Generative Topographic Mapping (GTM), to
    produce another 2-D visualization of the data

Separation of classes into seven clusters is now
much better:
Cluster 1: heavy 88%, light 12%
Cluster 2: heavy 93%, light 7%
Cluster 3: heavy 100%
Cluster 4: light 100%
Cluster 5: light 94%, heavy 6%
Cluster 6: light 93%, heavy 7%
Cluster 7: light 97%, heavy 3%
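The percentages above are per-cluster class proportions. The following sketch shows that arithmetic, assuming each customer has already been assigned to a cluster by the GTM or any other method. The toy input is made up for illustration and is not the Gazelle.com data.

```python
from collections import Counter, defaultdict

def cluster_composition(assignments):
    """Percentage of each class label within each cluster.

    assignments: iterable of (cluster_id, label) pairs, one per customer.
    Returns {cluster_id: {label: percentage}}."""
    counts = defaultdict(Counter)
    for cluster, label in assignments:
        counts[cluster][label] += 1
    return {c: {lab: 100.0 * n / sum(cnt.values()) for lab, n in cnt.items()}
            for c, cnt in counts.items()}

# Toy input (not the Gazelle.com data): 8 customers in two clusters.
toy = [(1, "heavy")] * 3 + [(1, "light")] + [(2, "light")] * 4
print(cluster_composition(toy))  # {1: {'heavy': 75.0, 'light': 25.0}, 2: {'light': 100.0}}
```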
35
Example: Customer Understanding with a SOM - 5
  • Analysis of the features corresponding to these
    clusters reveals facts such as:
  • Cluster 4 (100% light): customers with more than
    40% discounted items in their purchases
  • Clusters 1-3: those who heard about the company
    from friends/family are light spenders, but
    those who heard from a source other than news,
    e-mail, print ad, direct mail, or friends/family
    were heavy spenders
  • Clusters 6-7: people who frequently wear casual
    or athletic socks are light spenders
  • Insights such as these could be used for managing
    marketing, and also pricing policies (e.g.
    discounts)

36
References
  • [Koh1982] Teuvo Kohonen, Self-organized formation
    of topologically correct feature maps, Biological
    Cybernetics, 43:59-69, 1982
  • [RMS1992] Helge Ritter, Thomas Martinetz and
    Klaus Schulten, Neural computation and
    self-organizing maps: an introduction,
    Addison-Wesley, 1992
  • [YaZ2001] Jinsan Yang and Byoung-Tak Zhang,
    Customer Data Mining and Visualization by
    Generative Topographic Mapping Methods, in Simeon
    J. Simoff, Monique Noirhomme-Fraiture and Michael
    H. Böhlen (eds.), Proceedings of the International
    Workshop on Visual Data Mining (VDM@ECML/PKDD2001),
    Freiburg, Germany, pp. 55-66, 4 September 2001