Capturing Evolving Patterns for Ontology-based Web Mining - PowerPoint PPT Presentation

1
Capturing Evolving Patterns for Ontology-based
Web Mining
  • Yuefeng Li
  • Ning Zhong
2
My Details
Name: Dr Yuefeng Li
Email: y2.li_at_qut.edu.au
School of Software Engineering and Data Communications
Queensland University of Technology (QUT)
Brisbane, QLD 4001, Australia
3
What's It All About?
  • Web-Based Problem Solving
  • The Effectiveness of Using Web Data
  • Mismatch: some interesting and useful data has
    not been found.
  • Overload: some gathered data is not what users
    want.
  • Web Intelligence: a new direction that can offer
    a fresh approach to Web-based problem solving.
  • Web mining
  • To discover patterns (knowledge) from Web data
  • Web mining categories: Web usage mining, Web
    structure mining, Web user profile mining, and
    Web content mining

4
The Problem
  • There is a gap between the effectiveness of using
    the Web data and Web mining.
  • Why?
  • One reason is that there exist many meaningless
    patterns in the set of discovered patterns.
  • Another reason is that some discovered patterns
    might include uncertainties when we extract them
    from Web data.
  • Objective of This Research
  • To construct data reasoning of discovered
    patterns in order to build effective Web mining
    systems for users.

5
Our Solution
  • Ontology-based Web Mining systems (OWM)
  • OWM aims to refine discovered knowledge through
    the application and maintenance of an ontology.
  • To create a bridge between the effectiveness of
    using the Web data and Web mining.
  • Architecture

[Figure: architecture diagram with the labels: automatic ontology extraction, construction, application, reasoning on ontology, capturing evolving patterns, maintenance, ontology]
6
Ontology-based Web Mining
  • Why OWM
  • User profiles cannot be described in a single
    knowledge format
  • Ontology can provide a complete concept space for
    describing the discovered knowledge for user
    profiles
  • It is easy to explain discovered knowledge on
    ontology to users

7
Automatic Ontology Extraction (Ontology learning)
  • Ontology learning means the discovery of ontology
    for representations of discovered patterns
    (knowledge).
  • Why ontology learning?
  • Manual ontology engineering is a tedious and
    cumbersome task that can easily result in a
    bottleneck for knowledge acquisition
  • Ontology learning vs. ontology engineering

8
Automatic Ontology Extraction (Ontology learning)
cont.
  • Ontology definition
  • $O = \langle C, R, H^C, rel, A^O \rangle$, where
  • $C$ is a set of classes (concepts); $R$ is a set of
    relations; $H^C \subseteq C \times C$ is called the
    taxonomy, and $H^C(c_1, c_2)$ means $c_1$ is-a $c_2$;
    $rel: R \to C \times C$ is a function defined for
    other relations; and $A^O$ is a logical language.
  • Backbone construction
  • Basic assumption - syntactically classes are
    constructed from some primary ones.
  • An ontology consists of
  • primitive classes, the smallest concepts that
    cannot be assembled from other classes; however,
    they may be inherited by some derived concepts or
    their children (sub-terms).
  • compound classes, which can be set up from a set
    of primitive classes using some constructor
    operations.
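As a rough illustration of the definition above, the tuple can be held in a small Python structure. The class name `Ontology`, its field names, the `is_a` helper, and the sample classes are assumptions made for this sketch and do not come from the presentation; the logical language A^O is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Illustrative container for O = <C, R, H^C, rel> (A^O is omitted)."""
    classes: set = field(default_factory=set)    # C
    relations: set = field(default_factory=set)  # R
    taxonomy: set = field(default_factory=set)   # H^C: pairs (c1, c2), c1 is-a c2
    rel: dict = field(default_factory=dict)      # relation name -> set of (c1, c2)

    def is_a(self, c1, c2):
        """True if c1 equals c2 or reaches c2 through a chain of is-a links."""
        seen, frontier = set(), {c1}
        while frontier:
            current = frontier.pop()
            if current == c2:
                return True
            seen.add(current)
            parents = {p for (c, p) in self.taxonomy if c == current}
            frontier |= parents - seen
        return False


# Hypothetical sample data, chosen only to exercise the structure.
onto = Ontology(
    classes={"root", "accommodation", "hotel", "room"},
    relations={"part-of"},
    taxonomy={("accommodation", "root"), ("hotel", "accommodation")},
    rel={"part-of": {("room", "hotel")}},
)
```

Here `onto.is_a("hotel", "root")` follows two is-a links, while the part-of relation is stored separately under `rel`.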

9
Automatic Ontology Extraction (Ontology learning)
cont.
  • Ontology Extraction Algorithm
  • Lexical entry extraction
  • Keyword selection
  • i) Select primitive objects (or terms) from the set
    of keywords, where each term denotes a group of
    keywords, e.g., the term "pet" may include "dog"
    and "cat"
  • ii) Determine compound objects (expanded
    patterns)
  • iii) Decide the id (a set of terms) for all
    compound objects
  • iv) Generate a graph representation
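The extraction steps can be sketched in Python as follows. The slides do not say how compound objects are determined, so this sketch assumes that the set of terms found in one document forms one compound object; `TERM_GROUPS`, the function names, and the sample documents are all hypothetical.

```python
from collections import Counter

# Hypothetical term groups: each primitive term denotes a group of keywords.
TERM_GROUPS = {
    "pet": {"dog", "cat"},
    "accommodation": {"hotel", "motel"},
}


def keyword_to_term(keyword):
    """Step i: map a keyword to its primitive term, or None if it is not selected."""
    for term, keywords in TERM_GROUPS.items():
        if keyword in keywords:
            return term
    return None


def extract_compound_objects(documents):
    """Steps ii-iii: count compound objects, each identified by its id (a set of terms)."""
    counts = Counter()
    for words in documents:
        terms = frozenset(t for t in map(keyword_to_term, words) if t is not None)
        if terms:
            counts[terms] += 1
    return counts


def backbone_edges(ids):
    """Step iv: edge list linking each primitive term to the compound ids that contain it."""
    return sorted((term, tuple(sorted(i))) for i in ids for term in i)


docs = [["dog", "hotel"], ["cat", "motel"], ["dog"]]
objects = extract_compound_objects(docs)
edges = backbone_edges(objects)
```

With these sample documents, the first two collapse into the same compound object because "dog"/"cat" and "hotel"/"motel" map to the same terms.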

10
Automatic Ontology Extraction (Ontology learning)
cont.
An initial backbone, where arrows denote the is-a
relation and diamond arrows denote the part-of
relation.
[Figure: backbone graph; recoverable node labels: root, accommodation]
11
Automatic Ontology Extraction (Ontology learning)
cont.
  • Innovation
  • New structure of the backbone
  • A method to examine the relations between
    patterns of term-weight pairs.

12
Data Reasoning on Ontology
  • Objective
  • Use the discovered knowledge on the ontology to
    answer what users want.
  • Reasoning Model Construction
  • Determine a common hypothesis space
  • Deploy discovered knowledge on the hypothesis
    space
  • Decide decision rules
  • Design filtering algorithm for applying decision
    rules.

13
Data Reasoning on Ontology cont.
  • Normalization
  • A common hypothesis space, $\Theta$, the set of
    primitive objects.
  • Let $PL = \{(p_1, N_1), (p_2, N_2), \ldots, (p_n, N_n)\}$
    be the set of expanded patterns on the ontology
  • $p_i$ denotes a compound object
  • $N_i$ denotes the number of appearances of
    similar objects
  • A support function, which satisfies

  • A deploying function
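The formulas for the support and deploying functions are not preserved in this transcript. The sketch below shows one plausible form, under two stated assumptions: the support of a pattern is its appearance count divided by the total count (so supports sum to 1), and the deploying function spreads each pattern's support evenly over the primitive objects in its id. The function names and sample patterns are hypothetical.

```python
def support(pattern_list):
    """Normalize appearance counts so that the supports sum to 1 (assumed form)."""
    total = sum(n for _, n in pattern_list)
    return [(p, n / total) for p, n in pattern_list]


def deploy(pattern_list):
    """Spread each pattern's support evenly over its primitive objects (assumed form)."""
    weights = {}
    for p, s in support(pattern_list):
        for term in p:
            weights[term] = weights.get(term, 0.0) + s / len(p)
    return weights


# Hypothetical expanded patterns: (id as a set of primitive objects, count N_i).
PL = [(frozenset({"pet", "accommodation"}), 3), (frozenset({"pet"}), 1)]
weights = deploy(PL)
```

Under these assumptions the deployed weights form a distribution over the set of primitive objects, which gives every pattern a common representation regardless of its original form.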
14
Data Reasoning on Ontology cont.
  • Representation of discovered knowledge, a
    computation function on the hypothesis space

15
Data Reasoning on Ontology cont.
  • Decision rules

16
Data Reasoning on Ontology cont.
  • Filtering Algorithm
  • Automatic threshold selection

  • It is complete, since the threshold condition
    (formula not preserved in the transcript) holds
    for all $o \in POS$.
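The threshold formula itself is missing from the transcript. As a minimal sketch of a complete filter, assume that an object's weight is the sum of the weights of the primitive objects it contains, and that the threshold is the smallest weight among the positive training objects, so that no object in POS is rejected. The function names and the numbers are hypothetical, and the presenters' method may differ (for instance, it also uses part of the irrelevant data for training).

```python
def object_weight(obj_terms, term_weights):
    """Score an object as the sum of the weights of the primitive objects it contains."""
    return sum(term_weights.get(t, 0.0) for t in obj_terms)


def select_threshold(positives, term_weights):
    """Smallest score among the positive objects, so that none of them is rejected."""
    return min(object_weight(o, term_weights) for o in positives)


def is_relevant(obj_terms, term_weights, threshold):
    """Decision rule assumed here: accept an object whose score reaches the threshold."""
    return object_weight(obj_terms, term_weights) >= threshold


# Hypothetical weights over primitive objects and positive training objects.
term_weights = {"pet": 0.625, "accommodation": 0.375}
POS = [{"pet", "accommodation"}, {"pet"}]
threshold = select_threshold(POS, term_weights)
```

Choosing the minimum positive score trades precision for completeness: every positive object passes, but so does any other object scoring at least as high.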
17
Data Reasoning on Ontology cont.
Backbone construction, where arrows denote the
is-a relation and diamond arrows denote the
part-of relation.
[Figure: backbone graph; recoverable node labels: root, accommodation]
18
Data Reasoning on Ontology cont.
  • Innovation
  • An ontology-based filtering algorithm
  • Automatic threshold determination, a complete method
  • Use part of irrelevant data for training.

19
Capturing Evolving Patterns
  • Why knowledge evolution?
  • Some patterns may be meaningless, and some
    patterns may include uncertainties, since there
    is a lot of noise in the training data.
  • Increasing the size of the training set is not
    useful because of the noise
  • How to update discovered patterns
  • We present a method for tracing errors made by
    the system. The method provides a novel solution
    for knowledge evolution.

20
Capturing Evolving Patterns
  • Definitions
  • Negative pattern
  • An object that the Web mining system marks as
    relevant but that is actually irrelevant to
    users
  • Total conflict offender
  • A positive pattern whose id is a subset of
    id(np), i.e., np can be derived from it.
  • Partial conflict offender
  • A positive pattern whose id is not a subset of
    id(np) but overlaps with id(np).

21
Capturing Evolving Patterns cont.
  • Algorithm - for a given negative pattern, np
  • Check which positive patterns have been used to
    cause the error. We call these positive
    patterns the offenders of np.
  • Remove all total conflict offenders (meaningless
    ones) from PL, and
  • Reshuffle (remove uncertainties from) each partial
    conflict offender's frequency distribution by
    shifting part of the offering from the part of
    its id shared with id(np) to the rest of its id.
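The update steps for a negative pattern np can be sketched as follows. The sketch assumes that each positive pattern is a dictionary mapping terms to positive weights, that id(np) is a set of terms, and that reshuffling moves a fixed `fraction` of the weight on the shared terms to the remaining terms in proportion to their current weights. The `fraction` parameter, the proportional rule, and all names are assumptions; the slides do not specify how much is shifted.

```python
def classify_offenders(patterns, np_id):
    """Split positive patterns into total and partial conflict offenders of np."""
    total, partial = [], []
    for p in patterns:
        ids = set(p)
        if ids <= np_id:
            total.append(p)      # id is a subset of id(np)
        elif ids & np_id:
            partial.append(p)    # id overlaps id(np) without being a subset
    return total, partial


def reshuffle(pattern, np_id, fraction=0.5):
    """Move `fraction` of the weight on terms shared with np to the other terms."""
    shared = [t for t in pattern if t in np_id]
    rest = [t for t in pattern if t not in np_id]
    moved = sum(pattern[t] * fraction for t in shared)
    rest_total = sum(pattern[t] for t in rest)
    updated = {t: pattern[t] * (1 - fraction) for t in shared}
    for t in rest:
        # Distribute the moved weight in proportion to the existing weights.
        updated[t] = pattern[t] + moved * pattern[t] / rest_total
    return updated


def update_patterns(patterns, np_id, fraction=0.5):
    """Drop total conflict offenders and reshuffle partial conflict offenders."""
    total, partial = classify_offenders(patterns, np_id)
    result = []
    for p in patterns:
        if any(p is t for t in total):
            continue  # meaningless pattern: remove it
        if any(p is q for q in partial):
            p = reshuffle(p, np_id, fraction)
        result.append(p)
    return result


# Hypothetical positive patterns and a negative pattern whose id is {"pet"}.
patterns = [{"pet": 1.0}, {"pet": 0.5, "accommodation": 0.5}, {"travel": 1.0}]
updated = update_patterns(patterns, {"pet"})
```

Under this assumed rule, each call multiplies the weight on the shared terms by `1 - fraction` while keeping the pattern's total weight unchanged, so repeated calls shrink the contribution of the offending terms geometrically.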

22
Capturing Evolving Patterns cont.
Conclusion: For a given negative pattern np,
there is an integer n such that after n
consecutive calls of the reshuffle operation, its
weight is less than the threshold.
23
Capturing Evolving Patterns cont.
  • Innovation
  • A new way for updating and refining of discovered
    knowledge
  • Remove some meaningless patterns.
  • Update some patterns that included uncertainties.

24
Testing and Evaluation
  • The data collection
  • The TREC2002 (Text REtrieval Conference, see
    http://trec.nist.gov/) data collection is used as
    the experimental data. This data collection is
    provided by the Reuters Corpus.
  • It contains more than half a million XML
    documents from the Reuters Corpus, dated from
    1996-08-20 to 1997-08-19.
  • The topics used in our experiments are 101, 102,
    ..., and 109.

25
Performance 1
26
Performance 2
The breakeven point does not exist.
27
Thresholds
28
Negative objects
29
Conclusions
  • An ontology-based Web mining model is presented
    in order to build the bridge between the
    effectiveness of using Web data and Web mining.
  • A new algorithm for automatic ontology extraction
    is presented.
  • A data reasoning strategy for use of discovered
    knowledge is presented. It first deploys the
    discovered knowledge over a common hypothesis
    space. It also uses decision rules to determine
    what users want (a novel filtering algorithm).
  • A novel method is presented for capturing
    evolving patterns.
  • Future Work
  • Formalization of the detailed relations between
    patterns.

30
The End
  • QUESTIONS?