Growing Hierarchical Self-Organizing Maps for Web Mining

Transcript and Presenter's Notes

Title: Growing Hierarchical Self-Organizing Maps for Web Mining


1
Growing Hierarchical Self-Organizing Maps for Web
Mining
  • Joseph P. Herbert and JingTao Yao
  • Department of Computer Science,
  • University of Regina
  • CANADA S4S 0A2
  • herbertj_at_cs.uregina.ca jtyao_at_cs.uregina.ca
  • http://www2.cs.uregina.ca/herbertj
  • http://www2.cs.uregina.ca/jtyao

2
Introduction
  • Many information retrieval and machine learning
    techniques have not evolved to survive the Web
    environment.
  • There are two major problems in applying some
    machine learning techniques to Web mining:
  • The dynamic and ever-changing nature of Web data.
  • The dimensionality and sheer size of Web data.

3
Introduction
  • Three domains of application: Web content mining,
    Web usage mining, and Web structure mining.
  • Self-Organizing Maps (SOMs) have been used for:
  • Web page clustering
  • Document retrieval
  • Recommendation systems

4
Growing Hierarchical SOMs
  • Growing Hierarchical SOMs are a hybridization of
    Growing SOMs and Hierarchical SOMs.
  • Growing SOMs have a dynamic topology of neurons
    to help cope with the dynamic nature of data on the
    Web.
  • Hierarchical SOMs are multi-level systems
    designed to mitigate the high-dimensionality
    problem of data.
  • Together, the hybrid system provides a logical
    solution when considering the combined problem of
    dynamic, high-dimensional data sources.

5
The Consistency Problem
  • The growing hierarchical SOM model suffers from a
    new problem:
  • Maintaining consistency of hierarchical
    relationships between levels.
  • Training is done locally, without consideration
    of how changes affect other SOMs that are
    connected to the local focus.
  • The Web Mining model for Self-Organizing Maps
    solves this problem through bidirectional update
    propagation.

6
The Web Mining Model for Self-Organizing Maps


  • Input Layer
  • Each vector is inserted into the SOM network for
    the first stage of competition.
  • An iteration is complete once all input vectors
    have been presented.
  • Hierarchy Layer
  • A suitable level within the hierarchy of SOMs is
    found by traversing the tree.
  • The SOM with the collectively maximum similarity
    to the input is marked and passed to the next
    layer.
  • Growth Layer
  • This layer determines whether neurons need to be
    added to or removed from the current SOM.
  • If the error is above an upper-bound threshold,
    neurons are added. If the error is below a
    lower-bound threshold, neurons are removed.
  • Update Layer
  • This layer updates the winning neuron and the
    neighborhood associated with it.
  • Bidirectional update propagation updates the
    parent neurons and children feature maps that are
    associated with the winning neuron.
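
The Growth Layer rule above can be sketched in a few lines of Python. This is only an illustration: the function name growth_action, the parameter names, and the threshold values in the usage lines are assumptions, since the slides do not say how the error is measured or which thresholds are used.

```python
def growth_action(error: float, lower: float, upper: float) -> str:
    """Decide how the current SOM should change size.

    Returns "add" when the error exceeds the upper threshold,
    "remove" when it falls below the lower threshold, and
    "keep" otherwise. All names here are hypothetical.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if error > upper:
        return "add"      # map is too coarse for the data it represents
    if error < lower:
        return "remove"   # map has more neurons than the data needs
    return "keep"


# Example thresholds (assumed values, not taken from the slides).
high = growth_action(0.9, lower=0.1, upper=0.5)
low = growth_action(0.05, lower=0.1, upper=0.5)
steady = growth_action(0.3, lower=0.1, upper=0.5)
```

Keeping a band between the two thresholds where nothing happens avoids adding and removing neurons on alternate iterations.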

7
Formal Definition
  • $A = \{A_1, \ldots, A_t\}$
  • A set of hierarchy levels.
  • $A_i = \{W_{i,1}, \ldots, W_{i,m}\}$
  • A set of individual SOMs.
  • $W_{i,j} = \{w_1, \ldots, w_n\}$
  • A SOM of n neurons.
  • Each neuron $w_k$ contains a storage unit $s_k$ and a
    weight vector $v_k$.

8
Three Basic Functions
  • Three functions are introduced for actions on the
    system:
  • Lev()
  • Returns the hierarchy level on which a SOM
    currently resides.
  • Chd()
  • Returns the set of SOMs that have a child
    relationship to a particular neuron.
  • Par()
  • Returns the parent SOM of a particular neuron.
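
One possible way to represent the hierarchy and the three functions is sketched below in Python. The class names Neuron and SOM, their fields, and the helper attach_child are assumptions made for illustration; only the behaviour of Lev(), Chd(), and Par() follows the descriptions above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Neuron:
    weights: List[float]                               # weight vector v_k
    storage: List[str] = field(default_factory=list)   # storage unit s_k
    home: Optional["SOM"] = field(default=None, repr=False)  # SOM holding this neuron
    children: List["SOM"] = field(default_factory=list)      # child SOMs of this neuron


@dataclass(eq=False)
class SOM:
    level: int                                          # hierarchy level (1 = top)
    neurons: List[Neuron] = field(default_factory=list)
    parent_neuron: Optional[Neuron] = field(default=None, repr=False)

    def add(self, neuron: Neuron) -> Neuron:
        neuron.home = self
        self.neurons.append(neuron)
        return neuron


def attach_child(neuron: Neuron, child: SOM) -> None:
    """Hypothetical helper: hang a child SOM under a neuron."""
    neuron.children.append(child)
    child.parent_neuron = neuron


def lev(som: SOM) -> int:
    """Lev(): the hierarchy level a SOM resides on."""
    return som.level


def chd(neuron: Neuron) -> List[SOM]:
    """Chd(): the SOMs that are children of a neuron."""
    return list(neuron.children)


def par(neuron: Neuron) -> Optional[SOM]:
    """Par(): the SOM one level up that the neuron's own SOM hangs under."""
    som = neuron.home
    if som is None or som.parent_neuron is None:
        return None  # the neuron sits in the top-most map
    return som.parent_neuron.home


# Usage: a two-level hierarchy with one neuron per map.
root = SOM(level=1)
top_neuron = root.add(Neuron([0.0, 0.0]))
child_map = SOM(level=2)
attach_child(top_neuron, child_map)
leaf_neuron = child_map.add(Neuron([1.0, 1.0]))
```

Back-references (home, parent_neuron) are excluded from the generated repr so that printing a map does not recurse endlessly.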

9
Process Flow for Training
  1. Input is inserted into the network.
  2. The neuron that is most similar is selected.
  3. Descend through the hierarchy until similarity is
    maximal.
  4. Determine whether the correct number of neurons
    represents the pattern.
  5. Add or subtract neurons accordingly.
  6. Update the neuron and its neighbourhood.
  7. Update the children SOMs.
  8. Update the parent SOM.
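
Steps 1 to 3 of the list above can be sketched as follows. The dictionary layout for a SOM, the use of Euclidean distance as the similarity measure, and the tolerance parameter that decides when a match is "similar enough" are all assumptions; the slides do not specify them.

```python
import math
from typing import Any, Dict, List, Tuple

# Hypothetical layout: {"weights": [[...], ...], "children": {neuron_index: SomNode}}
SomNode = Dict[str, Any]


def winner_index(weights: List[List[float]], x: List[float]) -> int:
    """Step 2: index of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def descend(root: SomNode, x: List[float], tolerance: float) -> Tuple[SomNode, int]:
    """Step 3: follow winning neurons downwards until the match is close
    enough (distance <= tolerance) or the winner has no child SOM."""
    som = root
    while True:
        i = winner_index(som["weights"], x)
        child = som["children"].get(i)
        if child is None or math.dist(som["weights"][i], x) <= tolerance:
            return som, i
        som = child


# Usage: a root map whose second neuron owns a more detailed child map.
detail = {"weights": [[9.0, 9.0], [11.0, 11.0]], "children": {}}
top = {"weights": [[0.0, 0.0], [10.0, 10.0]], "children": {1: detail}}
found_som, found_index = descend(top, [11.2, 10.9], tolerance=0.5)
```

In this example the root winner is not close enough to the input, so the search moves into the child map and stops at its second neuron.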

[Figure: flowchart of the training process. Boxes, labelled with the step numbers of the list above: Input (1); Determine winning neuron on current level (2); Is neuron and input similar enough? (3, Y/N); Proceed to next hierarchy level with closest neuron; Is map representing input enough? (4, Y/N); Add / Subtract Neuron (5); Update Winner Neuron and Update Neighborhood (6); Propagate Updates Downwards (7); Propagate Updates Upwards (8). Steps 7 and 8 together are labelled Bidirectional Propagation.]

10
Conceptual View
  • At the top-most hierarchy level ($A_1$), only one
    feature map exists.
  • This map contains the highest-level conceptual
    view of the entire hierarchical structure.
  • Additional SOMs on subsequent levels offer more
    precise pattern abstraction.
  • SOMs are denoted by the sequence of their
    parents.
  • $W_{3,6,4}$ denotes the fourth map on the third
    level, derived from the sixth map on the previous
    level.
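
The naming scheme can be illustrated with a small Python helper that builds a label from the chain of map indices leading to a SOM. The function names and the plain-text label form ("W3,6,4") are assumptions, as is the label "W1" for the single top-level map, which the slides do not name explicitly.

```python
from typing import List, Tuple


def som_label(path: List[int]) -> str:
    """Build a label from the indices of the ancestor maps and the map itself.

    The level is one more than the path length, because the single
    top-level map has an empty path (assumed convention).
    """
    level = len(path) + 1
    return "W" + ",".join(str(n) for n in [level, *path])


def parse_label(label: str) -> Tuple[int, List[int]]:
    """Split a label such as "W3,6,4" into (level, path)."""
    numbers = [int(part) for part in label[1:].split(",")]
    return numbers[0], numbers[1:]


print(som_label([6, 4]))       # W3,6,4
print(parse_label("W3,6,4"))   # (3, [6, 4])
```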

11
Learning of Features
  • Once a winning neuron $w_i^*$ has been identified
    (denoted by an asterisk), its weight vector $v_i^*$
    is updated according to a learning rate $\alpha$.
  • The value of $\alpha$ decays over time according to the
    current training iteration $q$.
  • $v_i^*(q) = v_i^*(q-1) + \alpha\,(p_k(q) - v_i^*(q-1))$
  • The neighbourhood must also be updated with a
    modified learning rate $\alpha'$.
  • $v_{N_i^*(d)}(q) = v_{N_i^*(d)}(q-1) + \alpha'\,(p_k(q) - v_{N_i^*(d)}(q-1))$
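
The update rule above moves a weight vector a fraction of the way towards the input. A minimal Python sketch follows; the decay schedule in decayed_rate and the halved neighbourhood rate are assumptions chosen for illustration, since the slides only state that the rate decays with the training iteration and that the neighbourhood uses a modified rate.

```python
from typing import List


def decayed_rate(alpha0: float, q: int, tau: float = 100.0) -> float:
    """Assumed decay schedule: the rate shrinks as iteration q grows."""
    return alpha0 / (1.0 + q / tau)


def move_towards(v: List[float], p: List[float], rate: float) -> List[float]:
    """Return v + rate * (p - v), the new weight vector."""
    return [vi + rate * (pi - vi) for vi, pi in zip(v, p)]


# Usage: update a winner and one neighbour for the same input.
alpha = decayed_rate(0.5, q=0)          # 0.5 at the first iteration
alpha_neighbour = 0.5 * alpha           # assumed modified rate for neighbours
winner = move_towards([0.0, 0.0], [1.0, 2.0], alpha)
neighbour = move_towards([0.0, 0.0], [1.0, 2.0], alpha_neighbour)
print(winner)      # [0.5, 1.0]
print(neighbour)   # [0.25, 0.5]
```

With a rate of 1 the weight vector would jump straight onto the input; smaller rates make the map change more gradually.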

12
Bidirectional Update Propagation
  • Let $w_i^*$ be the winning neuron in SOM $W_{j,k}$ for
    input $p_k$.
  • To propagate upwards:
  • Calculate $\mathrm{Par}(w_i^*) = W_{j-1,m}$, where
    $\mathrm{Lev}(W_{j-1,m}) < \mathrm{Lev}(W_{j,k})$.
  • Update all neurons $w_a$ contained in $W_{j-1,m}$ that
    are similar to $w_i^*$.
  • $v_a(q) = v_a(q-1) + \beta\,(p_k(q) - v_a(q-1))$

13
Bidirectional Update Propagation
  • To propagate downwards
  • Calculate $\mathrm{Chd}(w_i^*) \subseteq A_{j+1}$, where $j+1$ is the
    next level in the hierarchy succeeding level $j$.
  • Update the corresponding weight vectors for all
    neurons $w_b$ in SOM $W_{j+1,t}$, where $W_{j+1,t}$ is on the
    lower level $A_{j+1}$.
  • $v_b(q) = v_b(q-1) + \gamma\,(p_k(q) - v_b(q-1))$
  • The learning rates $\beta$ and $\gamma$ are derived from the
    value of $\alpha$.
  • Generally, updates to a parent neuron are not as
    strong as updates to children neurons.
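
Both directions of the propagation can be sketched together in Python. The function names, the scaling factors used to derive the upward and downward rates from the base rate, and the distance radius that decides which parent neurons count as "similar" are assumptions; the slides only state that the rates are derived from the base learning rate and that parent updates are generally weaker than child updates.

```python
import math
from typing import List

Vector = List[float]


def nudge(v: Vector, p: Vector, rate: float) -> None:
    """In place: v <- v + rate * (p - v)."""
    for d in range(len(v)):
        v[d] += rate * (p[d] - v[d])


def propagate(winner: Vector, parent_som: List[Vector],
              child_soms: List[List[Vector]], p: Vector, alpha: float,
              up_factor: float = 0.25, down_factor: float = 0.5,
              radius: float = 1.0) -> None:
    """Bidirectional update propagation for one winning neuron.

    up_factor < down_factor keeps parent updates weaker than child
    updates; both factors and the radius are assumed values.
    """
    beta = up_factor * alpha      # upward rate
    gamma = down_factor * alpha   # downward rate
    # Upwards: only parent-map neurons similar to the winner are updated.
    for v_a in parent_som:
        if math.dist(v_a, winner) <= radius:
            nudge(v_a, p, beta)
    # Downwards: every neuron in every child SOM of the winner is updated.
    for som in child_soms:
        for v_b in som:
            nudge(v_b, p, gamma)


# Usage: one similar and one dissimilar parent neuron, one child map.
parent_map = [[1.0, 1.0], [5.0, 5.0]]
child_maps = [[[2.0, 2.0]]]
propagate([1.0, 1.0], parent_map, child_maps, p=[3.0, 3.0], alpha=0.4)
```

After the call, the first parent neuron and the child neuron have both moved towards the input, the child by the larger step, while the distant parent neuron is unchanged.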

14
Web-based News Coverage Example
  • The top-most level of the hierarchy contains news
    articles pertaining to high-level concepts,
    arranged according to their features.
  • The entire collection of Web documents on the
    online news site is presented through feature
    maps that abstract their similarities.
  • Individual maps $W_{2,1}, \ldots, W_{2,10}$ contain Web
    documents pertaining to Global, Local, Political,
    Business, Weather, Entertainment, Technology,
    Sports, Opinion, and Health news, respectively.

15
Web-based News Coverage Example
  • Feature map $W_{2,10}$ has neurons linking to three
    children maps: $W_{3,10,1}$, $W_{3,10,2}$, and $W_{3,10,3}$.
  • Articles in $W_{2,10}$ relate to Health News.
  • $W_{3,10,1}$ relates to Health Research Funding.
  • $W_{3,10,2}$ relates to Health Outbreak Crises.
  • $W_{3,10,3}$ relates to Doctor Shortages.
  • New Health-related articles are coming in rapidly,
    relating to a recent international outbreak.
  • Neurons are added to $W_{2,10}$ in the Health Outbreak
    Crises cluster; these neurons point to the SOM
    $W_{3,10,2}$.

16
Conclusion
  • The Web mining model of growing hierarchical
    self-organizing maps minimizes the effect of the
    dynamic data and high-dimensionality problems.
  • Bidirectional Update Propagation allows changes
    in pattern abstractions to be reflected on multiple
    levels in the hierarchy.
  • The Web-based News Coverage example demonstrates
    the effectiveness of growing hierarchical
    self-organizing maps when used in conjunction
    with bidirectional update propagation.

17
Growing Hierarchical Self-Organizing Maps for Web
Mining
Thank-you