Title: Capturing Evolving Patterns for Ontology-based Web Mining
1Capturing Evolving Patterns for Ontology-based
Web Mining
Ning Zhong
2My Details
Name: Dr Yuefeng Li
Email: y2.li_at_qut.edu.au
School of Software Engineering and Data Communications
Queensland University of Technology (QUT)
Brisbane, QLD 4001, Australia
3What's It All About?
- Web-Based Problem Solving
- The Effectiveness of Using Web Data
- Mismatch: some interesting and useful data has not been found.
- Overload: some gathered data is not what users want.
- Web Intelligence: a new direction that can provide new ideas for Web-based problem solving.
- Web mining
- To discover patterns (knowledge) from Web data
- Web mining categories: Web usage mining, Web structure mining, Web user profile mining, and Web content mining
4The Problem
- There is a gap between the effectiveness of using Web data and Web mining.
- Why?
- One reason is that there exist many meaningless patterns in the set of discovered patterns.
- Another reason is that some discovered patterns might include uncertainties when we extract them from Web data.
- Objective of This Research
- To construct data reasoning on discovered patterns in order to build effective Web mining systems for users.
5Our Solution
- Ontology-based Web Mining systems (OWM)
- OWM aims to refine discovered knowledge using an ontology, through application and maintenance of the ontology.
- To create a bridge between the effectiveness of using Web data and Web mining.
- Architecture
[Figure: OWM architecture built around the ontology. Labels: automatic ontology extraction (construction), reasoning on ontology (application), capturing evolving patterns (maintenance).]
6Ontology-based Web Mining
- Why OWM?
- User profiles cannot be described in a single knowledge format.
- An ontology can provide a complete concept space for describing the discovered knowledge for user profiles.
- It is easy to explain discovered knowledge on an ontology to users.
7Automatic Ontology Extraction (Ontology learning)
- Ontology learning means discovering an ontology for representing discovered patterns (knowledge).
- Why ontology learning?
- Manual ontology engineering is a tedious and cumbersome task that can easily result in a bottleneck for knowledge acquisition.
- Ontology learning vs. ontology engineering
8Automatic Ontology Extraction (Ontology learning)
cont.
- Ontology definition
- $O = \langle C, R, H^C, rel, A^O \rangle$, where
- $C$ is a set of classes (concepts); $R$ is a set of relations; $H^C \subseteq C \times C$ is called the taxonomy, where $H^C(c_1, c_2)$ means $c_1$ is-a $c_2$; $rel: R \to C \times C$ is a function defined for the other relations; and $A^O$ is a logical language.
- Backbone construction
- Basic assumption: syntactically, classes are constructed from some primary ones.
- An ontology consists of
- primitive classes, the smallest concepts, which cannot be assembled from other classes; however, they may be inherited by some derived concepts or their children (sub-terms).
- compound classes, which can be constructed from a set of primitive classes using some constructor operations.
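The split into primitive and compound classes can be pictured with a small data structure. This is only an illustrative sketch: the `Ontology` class, its method names, and the choice of "set of primitive parts" as the constructor operation are assumptions made here, not the structure used in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy backbone: primitive classes, compound classes, and is-a links."""
    primitives: set = field(default_factory=set)    # smallest concepts
    compounds: dict = field(default_factory=dict)   # name -> frozenset of primitives
    is_a: set = field(default_factory=set)          # taxonomy as (child, parent) pairs

    def add_primitive(self, name):
        self.primitives.add(name)

    def add_compound(self, name, parts):
        # Assumed constructor operation: a compound class is the set of its primitive parts.
        parts = frozenset(parts)
        unknown = parts - self.primitives
        if unknown:
            raise ValueError(f"unknown primitive classes: {sorted(unknown)}")
        self.compounds[name] = parts

    def add_is_a(self, child, parent):
        self.is_a.add((child, parent))

    def ancestors(self, name):
        # Follow is-a links transitively from `name` upwards.
        found, stack = set(), [name]
        while stack:
            current = stack.pop()
            for child, parent in self.is_a:
                if child == current and parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found
```

For example, after adding the primitives `hotel`, `motel`, and `accommodation`, and the link `hotel` is-a `accommodation`, `ancestors("hotel")` contains `accommodation`, and a compound class can be registered from `{"hotel", "motel"}`.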
9Automatic Ontology Extraction (Ontology learning)
cont.
- Ontology Extraction Algorithm
- Lexical entry extraction
- Keyword selection
- i) Select primitive objects (or terms) from the set of keywords, where each term denotes a group of keywords; e.g., the term "pet" may include "dog" and "cat"
- ii) Determine compound objects (expanded patterns)
- iii) Decide the id (a set of terms) for all compound objects
- iv) Generate a graph representation
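The extraction steps can be sketched roughly as follows. Everything here is a hypothetical illustration: the `TERM_OF` grouping, the rule that a compound object is any document whose keywords map to more than one term, and the choice of part-of edges from terms to compound objects are assumptions, not the algorithm presented in the talk.

```python
from collections import Counter

# Hypothetical keyword-to-term grouping; the slide's own example maps "dog" and "cat" to "pet".
TERM_OF = {"dog": "pet", "cat": "pet", "hotel": "accommodation", "motel": "accommodation"}


def extract_backbone(documents, term_of=TERM_OF):
    """Return (primitive terms, compound objects with counts, part-of edges)."""
    compounds = Counter()
    for keywords in documents:
        # The id of an object is the set of terms its keywords map to.
        ident = frozenset(term_of[k] for k in keywords if k in term_of)
        if len(ident) > 1:  # assumed rule: more than one term makes a compound object
            compounds[ident] += 1
    primitives = set(term_of.values())
    # Graph representation: a part-of edge from each term to every compound object containing it.
    edges = {(term, ident) for ident in compounds for term in ident}
    return primitives, compounds, edges
```

Called on `[["dog", "hotel"], ["cat", "motel"], ["dog"]]`, the first two documents collapse to the same compound object because their keywords map to the same pair of terms, while the third yields only a primitive term.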
10Automatic Ontology Extraction (Ontology learning)
cont.
[Figure: an initial backbone, where arrows denote the is-a relation and diamond arrows denote the part-of relation. Visible node labels: root, accommodation.]
11Automatic Ontology Extraction (Ontology learning)
cont.
- Innovation
- New structure of the backbone
- A method to examine the relations between patterns of term-weight pairs.
12Data Reasoning on Ontology
- Objective
- Use the discovered knowledge on the ontology to answer what users want.
- Reasoning Model Construction
- Determine a common hypothesis space
- Deploy discovered knowledge on the hypothesis space
- Decide decision rules
- Design a filtering algorithm for applying decision rules.
13Data Reasoning on Ontology cont.
- Normalization
- A common hypothesis space, $\Theta$, the set of primitive objects.
- Let $PL = \{(p_1, N_1), (p_2, N_2), \ldots, (p_n, N_n)\}$ be the set of expanded patterns on the ontology, where
- $p_i$ denotes a compound object
- $N_i$ denotes the number of appearances of similar objects
- A support function, which satisfies [equation not recovered]
- A deploying function [equation not recovered]
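The formulas for the support and deploying functions are not available in this transcript, so the following is only a sketch under stated assumptions: support is taken to be each pattern's share of the total count (so that supports sum to 1), and deployment shares a pattern's support among its terms in proportion to their weights. Patterns are represented as hypothetical term-to-weight dictionaries.

```python
def support(pattern_list):
    """Assumed normalization: support(p_i) = N_i / sum_j N_j, so supports sum to 1."""
    total = sum(count for _, count in pattern_list)
    return [count / total for _, count in pattern_list]


def deploy(pattern_list):
    """Assumed deployment: spread each pattern's support over its terms by term weight."""
    mass = {}
    for (pattern, _), sup in zip(pattern_list, support(pattern_list)):
        weight_sum = sum(pattern.values())
        for term, weight in pattern.items():
            mass[term] = mass.get(term, 0.0) + sup * weight / weight_sum
    return mass
```

With `[({"hotel": 2, "beach": 2}, 3), ({"hotel": 1}, 1)]`, the two patterns get supports 0.75 and 0.25, and the deployed mass over the primitive terms again sums to 1, which is what makes it usable as a distribution on the hypothesis space.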
14Data Reasoning on Ontology cont.
- Representation of discovered knowledge, a
computation function on the hypothesis space
15Data Reasoning on Ontology cont.
16Data Reasoning on Ontology cont.
- Filtering Algorithm
- Automatic threshold selection
- It is complete since [equation not recovered] for all $o \in POS$.
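One way a threshold can be chosen automatically and still be complete is to take the lowest score among the positive training objects, so that every object in POS passes the filter. The sketch below assumes this rule; the scoring function (sum of per-term mass) and all names are hypothetical and may differ from the method in the talk.

```python
def score(obj_terms, mass):
    """Assumed scoring: total deployed mass of the terms that occur in the object."""
    return sum(mass.get(term, 0.0) for term in obj_terms)


def select_threshold(positives, mass):
    """Lowest score among positive objects; no positive object can fall below it."""
    return min(score(obj, mass) for obj in positives)


def filter_objects(objects, mass, threshold):
    """Keep the objects whose score reaches the threshold."""
    return [obj for obj in objects if score(obj, mass) >= threshold]
```

Because the threshold is a minimum over POS, `score(o, mass) >= threshold` holds for every positive object by construction, which is the sense of completeness assumed here.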
17Data Reasoning on Ontology cont.
[Figure: backbone construction, where arrows denote the is-a relation and diamond arrows denote the part-of relation. Visible node labels: root, accommodation 0.]
18Data Reasoning on Ontology cont.
- Innovation
- An ontology-based filtering algorithm
- Automatic threshold determination, a complete method
- Uses part of the irrelevant data for training.
19Capturing Evolving Patterns
- Why knowledge evolution?
- Some patterns may be meaningless, and some may include uncertainties, since there is a lot of noise in the training data.
- Increasing the size of the training set does not help because of the noise.
- How to update discovered patterns?
- We present a method for tracing errors made by the system. The method provides a novel solution for knowledge evolution.
20Capturing Evolving Patterns
- Definitions
- Negative pattern
- An object that the Web mining system marks as relevant but that is actually irrelevant to users.
- Total conflict offender
- A positive pattern whose id is a subset of id(np), i.e., np can be derived from it.
- Partial conflict offender
- A positive pattern whose id is not a subset of id(np) but overlaps with id(np).
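The two kinds of offender reduce to simple set tests on ids. In this sketch the function name and the string labels are hypothetical, and an id that shares no term with id(np) is treated as not an offender, which is an assumption about the intended reading.

```python
def classify_offender(pattern_id, np_id):
    """Classify a positive pattern against a negative pattern np using their ids (sets of terms)."""
    pattern_id, np_id = set(pattern_id), set(np_id)
    if not pattern_id & np_id:
        return "none"      # shares no term with np (assumed: not an offender)
    if pattern_id <= np_id:
        return "total"     # id is a subset of id(np)
    return "partial"       # overlaps id(np) without being contained in it
```

For id(np) = `{"hotel", "beach"}`, the pattern id `{"hotel"}` is a total conflict offender and `{"hotel", "pool"}` is a partial one.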
21Capturing Evolving Patterns cont.
- Algorithm: for a given negative pattern, np
- Check which positive patterns were used in causing the error. We call these positive patterns offenders of np.
- Remove all total conflict offenders (meaningless ones) from PL, and
- Reshuffle (remove uncertainties from) each partial conflict offender's frequency distribution by shifting part of the weight from the part of its id shared with id(np) to the rest of its id.
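The update step can be sketched as below. The representation of a pattern as a term-to-weight dictionary, the `fraction` parameter, and the proportional redistribution rule are all assumptions for illustration; the talk does not specify how much weight is shifted or how it is shared out.

```python
def reshuffle(pattern, np_id, fraction=0.5):
    """Move `fraction` of the weight on terms shared with np onto the pattern's other terms."""
    shared = [t for t in pattern if t in np_id]
    rest = [t for t in pattern if t not in np_id]
    moved = sum(pattern[t] * fraction for t in shared)
    rest_total = sum(pattern[t] for t in rest)
    updated = {t: pattern[t] * (1 - fraction) for t in shared}
    for t in rest:
        # Share the moved weight by current weight; fall back to an even split if all are zero.
        share = pattern[t] / rest_total if rest_total else 1 / len(rest)
        updated[t] = pattern[t] + moved * share
    return updated


def evolve(pattern_list, np_id, fraction=0.5):
    """Drop total conflict offenders of np and reshuffle partial conflict offenders."""
    np_id = set(np_id)
    result = []
    for pattern in pattern_list:
        ident = set(pattern)
        if not ident & np_id:
            result.append(dict(pattern))   # not an offender: keep unchanged
        elif ident <= np_id:
            continue                       # total conflict offender: remove
        else:
            result.append(reshuffle(pattern, np_id, fraction))
    return result
```

Under this assumed rule each call multiplies the weight on the shared terms by `1 - fraction`, so repeated calls shrink it geometrically while the pattern's total weight stays the same.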
22Capturing Evolving Patterns cont.
Conclusion: for a given negative pattern np, there is an integer n such that, after n consecutive calls of the reshuffle, its weight is less than the threshold.
23Capturing Evolving Patterns cont.
- Innovation
- A new way of updating and refining discovered knowledge
- Remove some meaningless patterns.
- Update some patterns that included uncertainties.
24Testing and Evaluation
- The data collection
- The TREC2002 (Text REtrieval Conference, see http://trec.nist.gov/) data collection is used as the experimental data. This data collection is provided by the Reuters Corpus.
- It contains more than half a million XML documents from the Reuters Corpus, dated from 1996-08-20 to 1997-08-19.
- The topics used in our experiments are 101, 102, ..., and 109.
25Performance 1
26Performance 2
The breakeven point does not exist.
27Thresholds
28Negative objects
29Conclusions
- An ontology-based Web mining model is presented in order to build a bridge between the effectiveness of using Web data and Web mining.
- A new algorithm for automatic ontology extraction is presented.
- A data reasoning strategy for using discovered knowledge is presented. It first deploys the discovered knowledge over a common hypothesis space. It also uses decision rules to determine what users want (a novel filtering algorithm).
- A novel method is presented for capturing evolving patterns.
- Future Work
- Formalization of the detailed relations between patterns.
30The End