Semantic web role and its method: Domain ontology

Department of Information Management, Chaoyang University of Technology

1
Semantic web role and its method: Domain ontology
  • Rung Ching Chen

5
Outline
  • Introduction
  • Literature reviews
  • Ontology application
  • Ontology construction
  • Experimental results
  • Conclusions and current research

6
Introduction
  • Introduction
  • Background
  • Motivation
  • Objective
  • Literature reviews
  • Ontology construction
  • Experimental results
  • Conclusions and future works

7
Background (1/2)
  • The content of web sites changes rapidly and
    the number of sites grows very fast.
  • Understanding users' query needs and finding
    related web pages on the Internet are very
    important.
  • Yahoo vs. Google

8
Background (2/2)
  • The main drawback of current search engines is
    that they cannot read the actual semantics of
    web page content. They do not use
    domain-specific knowledge for web page analysis.
  • The concept of the Semantic Web has been proposed
    recently.

9
Motivation
  • Semantic web and ontology
  • The construction of a successful Semantic Web
    depends on whether the ontology can be
    constructed rapidly and easily.
  • In most research, ontology construction is
    carried out by domain experts. It is difficult
    to modify the concepts of an existing domain
    ontology for a Semantic Web.

10
Objective
  • A large number of ontology representation methods
    have been proposed.
  • We use a hierarchical tree structure to
    represent the domain ontology because it is the
    most general one.
  • Methods of ontology construction:
  • Manual construction
  • Semi-automatic construction
  • Fully automatic construction

11
Literature reviews
  • Introduction
  • Literature reviews
  • Semantic web
  • Ontology
  • Information classification model
  • Singular value decomposition
  • Adaptive resonance theory network
  • Ontology construction
  • Experimental results
  • Conclusions and future works

12
Semantic web (1/2)
  • Drawbacks of the existing Web:
  • The information is presented in documents.
  • Machines are unable to process or extract the
    information that people actually need.
  • The Semantic Web is an extension of the existing
    Web structure:
  • It provides a new foundation for data
    description.
  • It promotes the automatic development of network
    services.
  • It makes the information understandable to
    machines.

13
Semantic web (2/2)
  • Builds the high-level languages on low-level
    languages progressively.
  • Offers the information that the computer can read
    without revising the existing webpage content.

14
Ontology (1/4)
  • The W3C has defined ontology as knowledge for
    describing and expressing various domains using
    concepts, definitions, and relations.
  • An ontology usually appears in the form of a
    semantic network.
  • A node represents a concept or an individual
    entity in the semantic network.

15
Ontology (2/4)
  • Gruber's definition: an ontology is a formal,
    explicit specification of a shared
    conceptualization.
  • Conceptualization: an abstract model of a
    phenomenon in the domain, built from the concepts
    relevant to that phenomenon.
  • Shared: the ontology is shared by a group, not an
    individual.
  • Formal: the ontology can be read and understood
    by computers.
  • Explicit: the concepts and the constraints on
    them are expressed clearly.

16
Ontology (3/4)
  • According to Gruber, the elements of an ontology
    include:
  • Concept: a concept can represent anything in the
    real world. Concepts are usually organized as a
    tree structure in an ontology.
  • Relation: a relation is a connection between
    concepts of certain types.
  • Function: a function is a special case of a
    relation.
  • Axiom: an axiom is used to model a fact.
  • Instance: an instance is a concrete occurrence
    of a concept.

17
Ontology (4/4)
  • Ontology languages are built on XML
    (Extensible Markup Language) syntax.
  • The W3C is responsible for defining and updating
    them.

18
Domain Ontology Applications
  • Grigoris Antoniou
  • Frank van Harmelen

20
  1. Horizontal Information Products at Elsevier
  2. Data Integration at Audi
  3. Skill Finding at Swiss Life
  4. Think Tank Portal at EnerSearch
  5. E-Learning
  6. Web Services
  7. Other Scenarios

21
Elsevier: The Setting
  • Elsevier is a leading scientific publisher.
  • Its products are organized mainly along
    traditional lines:
  • Subscriptions to journals
  • Online availability of these journals has until
    now not really changed the organisation of the
    product line
  • Customers of Elsevier can take subscriptions to
    online content

22
Elsevier: The Problem
  • Traditional journals are vertical products
  • Division into separate sciences covered by
    distinct journals is no longer satisfactory
  • Customers of Elsevier are interested in covering
    certain topic areas that spread across the
    traditional disciplines/journals
  • The demand is rather for horizontal products

23
Elsevier: The Problem (2)
  • Currently, it is difficult for large publishers
    to offer such horizontal products
  • Barriers of physical and syntactic heterogeneity
    can be solved (with XML)
  • The semantic problem remains unsolved
  • We need a way to search the journals on a
    coherent set of concepts against which all of
    these journals are indexed

24
Elsevier: The Contribution of Semantic Web Technology
  • Ontologies and thesauri (very lightweight
    ontologies) have proved to be a key technology
    for effective information access
  • They help to overcome some of the problems of
    free-text search
  • They relate and group relevant terms in a
    specific domain
  • They provide a controlled vocabulary for indexing
    information

25
Elsevier: The Contribution of Semantic Web Technology (2)
  • A number of thesauri have been developed in
    different domains of expertise
  • Medical information: MeSH and Elsevier's life
    science thesaurus EMTREE
  • RDF is used as an interoperability format between
    heterogeneous data sources
  • EMTREE is itself represented in RDF

26
Elsevier: The Contribution of Semantic Web Technology (3)
  • Each of the separate data sources is mapped onto
    this unifying ontology
  • The ontology is then used as the single point of
    entry for all of these data sources

27
Ontology construction
28
Information classification model
  • There are three traditional information
    classification models:
  • Vector space model
  • Probabilistic model
  • Boolean model

29
Vector space and probabilistic model
  • Vector space model:
  • Each document is a vector whose elements
    represent the number of times each keyword
    appears in the document. Cosine similarity is
    used to find related web pages.
  • Probabilistic model:
  • This model uses a probabilistic approach to
    evaluate the relationships among web pages and to
    judge whether they are related.
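
As an illustration of the vector space model described above, the following minimal sketch computes cosine similarity between keyword-count vectors. The keyword list and the page vectors are made-up examples, not data from the presentation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two keyword-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a page with no keywords is treated as unrelated
    return dot / (norm_a * norm_b)

# Hypothetical counts over the keywords ("guitar", "piano", "beer").
page_a = [3, 1, 0]
page_b = [6, 2, 0]   # same proportions as page_a
page_c = [0, 0, 4]   # shares no keyword with page_a

print(cosine_similarity(page_a, page_b))
print(cosine_similarity(page_a, page_c))
```

Pages with the same keyword proportions score close to 1, and pages with no keyword in common score 0.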

30
Boolean model
  • It is the simplest classification method and is
    based on set theory and Boolean algebra. The
    Boolean model distinguishes three relations:
    inheritance, intersection, and independence.

31
Singular Value Decomposition (1/2)
  • Rows represent documents and columns represent
    keywords.
  • Each element indicates whether a keyword appears
    in a document.

32
Singular Value Decomposition (2/2)
  • Latent Semantic Analysis (LSA) projects documents
    and keywords into a low-dimensional space.
  • Singular Value Decomposition (SVD) is used to
    remove unnecessary information.
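
For reference, the standard rank-k truncation used in LSA can be written as follows. This is the textbook form, not a formula taken from the slides; A denotes the document-by-keyword matrix, and the choice of k is left open.

```latex
A = U \Sigma V^{\mathsf{T}}, \qquad
A_k = U_k \Sigma_k V_k^{\mathsf{T}}, \quad k < \operatorname{rank}(A)
```

Here \Sigma_k keeps only the k largest singular values, and U_k and V_k keep the matching columns, so each document is described by k coordinates instead of one coordinate per keyword.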

33
Adaptive resonance theory network (1/3)
  • The ART network is an unsupervised learning
    network.
  • Principle:
  • The theory of ART grew from the theory of
    cognition.
  • It is similar to the human neural system: it not
    only learns new examples but also preserves old
    memories.

34
Adaptive resonance theory network (2/3)
  • Characteristics:
  • It has the features of both stability and
    plasticity.
  • To resolve the conflict between stability and
    plasticity, the ART network adjusts the vigilance
    value.
  • Advantages:
  • Learning is fast.
  • Memory consumption is small.
  • The number of groups does not have to be set in
    advance.

35
Adaptive resonance theory network (3/3)
  • The structure of the ART network:
  • Input layer: the input data are the training
    samples.
  • Output layer: this presents the results of the
    trained network.
  • Weight connections: these connect the input layer
    and the output layer.
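
To make the role of the vigilance value concrete, here is a minimal sketch of a simplified ART1-style clustering of binary vectors. It is not the presenter's implementation: the choice score, the constant 0.5, and the sample patterns are assumptions made for illustration.

```python
def art1_cluster(patterns, vigilance):
    """Cluster binary vectors with a simplified ART1-style procedure.

    Returns (labels, prototypes). Output nodes are created on demand,
    so the number of groups is not fixed in advance.
    """
    prototypes = []  # one binary weight vector per output node
    labels = []
    for x in patterns:
        ones_x = sum(x)

        def choice(j):
            # Overlap with the prototype, relative to the prototype size.
            overlap = sum(a & b for a, b in zip(x, prototypes[j]))
            return overlap / (0.5 + sum(prototypes[j]))

        candidates = sorted(range(len(prototypes)), key=choice, reverse=True)
        for j in candidates:
            overlap = sum(a & b for a, b in zip(x, prototypes[j]))
            match = overlap / ones_x if ones_x else 0.0
            if match >= vigilance:
                # Vigilance test passed: keep only the shared features.
                prototypes[j] = [a & b for a, b in zip(x, prototypes[j])]
                labels.append(j)
                break
        else:
            # No existing node is similar enough: create a new one.
            prototypes.append(list(x))
            labels.append(len(prototypes) - 1)
    return labels, prototypes

# Hypothetical binary keyword vectors for four web pages.
pages = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0]]
loose_labels, _ = art1_cluster(pages, vigilance=0.6)
strict_labels, _ = art1_cluster(pages, vigilance=0.9)
print(loose_labels, strict_labels)
```

A higher vigilance value demands a closer match before a page joins an existing group, so it tends to produce more and smaller groups.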

36
Ontology construction
  • Introduction
  • Literature reviews
  • Ontology construction
  • Analyzing web pages
  • Finding the TF-IDF values of terms
  • Reducing the matrix and converting elements to
    binary data
  • Using a recursive ART network to cluster the web
    pages
  • Applying a Boolean model to construct an ontology
  • Representing the ontology using a Jena package
  • Experimental results
  • Conclusions and future works

37
Ontology construction
(Flowchart of the construction process. Box labels: WWW; Web page analysis; Stop-word removal; Finding TF-IDF; SVD operation; ART network for clustering; Use TF-IDF to find the concept of each group; Boolean method; Construct relations; Create ontology; Produce RDF ontology. Decision label: "Whether satisfied low document".)

38
Analyzing web pages (1/2)
  • After collecting the web pages, the system
    removes stop words.
  • Removing stop words avoids misjudgments caused by
    unimportant words that occur with high
    frequency.

39
Analyzing web pages (2/2)
  • Most web pages are written in HTML. HTML uses
    opening and closing tags to mark up the parts of
    a web page.
  • $T_{ij} = \sum_{m} n_{ij} \times W_m$
  • $T_{ij}$ is the weight with which concept $C_j$
    appears in web page $d_i$.
  • $n_{ij}$ is the frequency with which concept
    $C_j$ appears under each tag $m$.
  • $W_m$ is the weight of tag $m$.
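
The tag weighting above can be sketched as a weighted sum of per-tag counts. The tag names and weight values below are hypothetical; the slides do not list the weights that were actually used.

```python
def tag_weighted_frequency(tag_counts, tag_weights):
    """Sum the occurrences of one concept in one page, weighted by tag.

    tag_counts:  tag name -> occurrences of the concept under that tag
    tag_weights: tag name -> weight W_m (tags without a weight count as 1.0)
    """
    return sum(count * tag_weights.get(tag, 1.0)
               for tag, count in tag_counts.items())

# Hypothetical weights: a keyword in the title counts more than in the body.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "body": 1.0}
counts_for_guitar = {"title": 1, "h1": 2, "body": 5}

weight = tag_weighted_frequency(counts_for_guitar, TAG_WEIGHTS)
print(weight)  # 1*3.0 + 2*2.0 + 5*1.0
```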

40
TF-IDF
  • Our research uses the product of TF and IDF to
    represent the importance of a keyword in a
    document.
  • TF_{i,j} is the relative frequency of keyword i
    in document j after the tag-weighting
    operation.
  • IDF_i is the inverse document frequency of
    term i, based on the reciprocal of the share of
    documents in which term i appears.
  • N is the total number of documents.
  • n_i is the number of documents, out of N, in
    which term i appears.
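
The exact formula is not preserved in this transcript, so the sketch below assumes the common form TF × log(N / n_i), with TF taken as a term's share of the weighted counts in its document. The sample documents are invented.

```python
import math

def tf_idf(weighted_counts):
    """Compute TF-IDF per document.

    weighted_counts: list of dicts, one per document,
                     mapping term -> (tag-weighted) frequency.
    Assumes idf = log(N / n_i); the presentation may use a variant.
    """
    n_docs = len(weighted_counts)
    doc_freq = {}  # n_i: number of documents containing term i
    for doc in weighted_counts:
        for term in doc:
            doc_freq[term] = doc_freq.get(term, 0) + 1

    scores = []
    for doc in weighted_counts:
        total = sum(doc.values())
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in doc.items()
        })
    return scores

# Hypothetical weighted counts for two pages.
docs = [{"guitar": 3, "music": 1}, {"piano": 2, "music": 2}]
scores = tf_idf(docs)
print(scores[0])
```

A term that appears in every document, such as "music" here, gets an IDF of zero and so carries no weight.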

41
Reducing the matrix and converting elements to binary data
  • We list the keywords and web page documents to
    form a binary matrix.
  • If a keyword appears in a document, the element
    is set to 1; if not, it is set to 0. The SVD
    operation is used to reduce the large matrix to a
    small one.
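
A minimal sketch of building such a matrix, with rows for documents and columns for keywords. The keyword list and document contents are invented for illustration.

```python
def binary_matrix(documents, keywords):
    """Return a 0/1 matrix: rows are documents, columns are keywords.

    documents: list of sets of terms found in each web page
    keywords:  ordered list of keywords (one column each)
    """
    return [[1 if keyword in terms else 0 for keyword in keywords]
            for terms in documents]

# Hypothetical keywords and page contents.
keywords = ["guitar", "piano", "violin"]
documents = [{"guitar", "string"}, {"piano", "violin"}, {"drum"}]

matrix = binary_matrix(documents, keywords)
for row in matrix:
    print(row)
```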

42
Using the recursive ART network to cluster the
web pages
  • We propose a recursive ART network algorithm to
    produce a tree structure

43
Recursive ART
44
Recursive ART

45
Applying Boolean operation
  • The Boolean model is used to adjust and
    construct the relations between different
    concepts.
  • For example, imagine ten documents involving four
    concepts: transport, fly, boat, and airplane.
  • Documents containing "transport": 1, 2, 3, 4, 5,
    6, 7, 8, 9, 10.
  • Documents containing "fly": 2, 3, 6, 7, 9, 10.
  • Documents containing "boat": 1, 4, 5, 8.
  • Documents containing "airplane": 6, 7, 10.
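
Using the ten-document example above, the three Boolean relations can be read off the document sets. The sketch below is one plausible reading, not the presenter's code: a concept is placed under the smallest concept whose document set strictly contains its own.

```python
def relation(docs_a, docs_b):
    """Classify two concepts by comparing their document sets."""
    if docs_a.isdisjoint(docs_b):
        return "independence"
    if docs_a <= docs_b or docs_b <= docs_a:
        return "inheritance"
    return "intersection"

def parents(concept_docs):
    """Map each concept to its closest containing concept, or None."""
    result = {}
    for name, docs in concept_docs.items():
        supersets = [other for other, other_docs in concept_docs.items()
                     if other != name and docs < other_docs]
        result[name] = (min(supersets, key=lambda o: len(concept_docs[o]))
                        if supersets else None)
    return result

# Document sets from the example on this slide.
concept_docs = {
    "transport": set(range(1, 11)),
    "fly": {2, 3, 6, 7, 9, 10},
    "boat": {1, 4, 5, 8},
    "airplane": {6, 7, 10},
}

print(relation(concept_docs["fly"], concept_docs["boat"]))
print(parents(concept_docs))
```

In this example "fly" and "boat" share no documents, "airplane" falls under "fly", and both "fly" and "boat" fall under "transport".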

46
Generating ontology through the Jena package (1/3)
  • The Resource Description Framework (RDF) is a
    framework developed by the W3C and metadata
    groups.
  • It can carry many kinds of metadata across the
    Internet.
  • RDF provides interoperability between
    applications that exchange machine-understandable
    information on the web.

47
Generating ontology through the Jena package (2/3)
  • Describes Web resource data:
  • Resource: anything that has a URI
  • Description: describes the properties of a
    resource
  • Three main elements:
  • Subject
  • Predicate
  • Object

48
Generating ontology through the Jena package (3/3)
  • A given statement may be represented as an RDF
    graph,
  • where the URI is a web resource and author is a
    property with the value John.
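
The statement above can be written as a subject, predicate, object triple. The sketch below does not use the Jena package mentioned in the slides; it prints the triple in N-Triples syntax using plain string handling, and the example.org URIs are placeholders.

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples as N-Triples lines.

    Objects starting with http:// or https:// are written as URIs;
    anything else is written as a plain string literal.
    """
    lines = []
    for subject, predicate, obj in triples:
        if obj.startswith(("http://", "https://")):
            obj_text = f"<{obj}>"
        else:
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            obj_text = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {obj_text} .")
    return "\n".join(lines)

# Placeholder URIs: a web resource whose "author" property has the value John.
statement = ("http://example.org/page",
             "http://example.org/terms/author",
             "John")
print(to_ntriples([statement]))
```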

49
Experiments
50
Experimental results
  • Experiment environment
  • Pentium 4, 2.4 GHz
  • 512 MB RAM
  • Java programming language
  • RDF ontology language

51
Experimental results
  • Introduction
  • Literature reviews
  • Ontology construction
  • Experimental results
  • First stage experiment
  • Second stage experiment
  • Conclusions and future works

52
First stage experiment
  • We selected a musical instrument ontology
    constructed by an expert for the semi-automatic
    experiment.
  • We used the keywords of the existing domain
    ontology to produce a new ontology with our
    method.
  • After the new ontology was created, we compared
    it with the expert ontology to demonstrate the
    precision of our method.

53
Data (1/2)
  • Ontology:
  • http://www.db-net.aueb.gr/thesus/onto/instrum.rdf
  • 52 concepts
  • "has" and "sub-class" relations
  • Data:
  • Web pages collected from the Music/Instruments/
    domain.
  • There are 36 categories in that domain.
  • 518 web pages.

54
Data (2/2)
Category     Number  Category  Number  Category   Number  Category   Number
Instrument   15      Lute      5       Gong       2       Woodwind   2
Synthesizer  5       Bass      32      Accordion  44      Bassoon    8
Stringed     3       Cello     9       Brass      17      Clarinet   12
Percussion   9       Viola     5       Horn       14      Flute      13
Wind         6       Violin    20      Saxophone  25      Oboe       12
Banjo        26      Mandolin  24      Trombone   11      Panpipes   3
Guitar       24      Piano     19      Trumpet    29      Piccolo    5
Harp         20      Bell      3       Tuba       6       Recorder   26
Harpsichord  14      Drums     33      Harmonium  6       Harmonica  14

55
Mark matrix
  • After the web pages are analyzed, a matrix is
    built in which the columns denote keywords and
    the rows represent web documents. If a keyword
    is found in a web document, the element is set
    to 1; otherwise it is set to 0.

56
Recursive ART (1/2)
  • The recursive ART network will check whether the
    output values are greater than the vigilance. We
    test the vigilance step-by-step from 0.1 to 0.9
    with an increment of 0.1.

57
Recursive ART (2/2)
  • Clustering with the ART network results in 78
    groups.
  • We calculated the TF-IDF values of the keywords
    in each group and used the keyword with the
    highest value to represent the group.
  • Each group generates one representative keyword.
    After deleting identical representative keywords
    among different groups, only 40 keywords
    remain.
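
The selection step above can be sketched as follows, assuming each group is given as a mapping from keyword to TF-IDF value. The group contents and scores are invented for illustration.

```python
def representative_keywords(group_scores):
    """Pick the top-scoring keyword of each group and drop duplicates.

    group_scores: list of non-empty dicts, one per group,
                  mapping keyword -> TF-IDF value.
    Returns the representative keywords in first-seen order.
    """
    representatives = []
    for scores in group_scores:
        best = max(scores, key=scores.get)
        if best not in representatives:
            representatives.append(best)
    return representatives

# Hypothetical groups: two of them share the same top keyword.
groups = [
    {"drum": 0.9, "cymbal": 0.4},
    {"violin": 0.8, "string": 0.6},
    {"drum": 0.7, "percussion": 0.5},
]
keywords_kept = representative_keywords(groups)
print(keywords_kept)
```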

58
Group  Key-term     Group  Key-term
1      Drum         21     Trumpet
2      Pinched      22     Viola
3      Bass         23     Tuba
4      Harp         24     Clarinet
5      Mandolin     25     String
6      Piccolo      26     Wind
7      Harmonica    27     Trombone
8      Piano        28     Flute
9      Harpsichord  29     Woodwind
10     Violin       30     Bell
11     Guitar       31     Brass
12     Cymbal       32     Recorder
13     Accordion    33     Gong
14     Oboe         34     Panpipes
15     Cello        35     Battery
16     Lyre         36     Tambourine
17     Instrument   37     Triangle
18     Percussion   38     Harmonium
19     Synthesizer  39     Bassoon
20     Saxophone    40     Banjo

59
Output ontology
  • We obtain a 5-level ontology from the 40
    candidate nodes by applying Boolean logic
    operations level by level.

60
Evaluation (1/5)
  • After producing the ontology, we compared this
    new ontology with the expert-defined ontology.
  • Precision and recall are then used to evaluate
    our ontology.
  • To estimate the precision of the system, we
    defined two kinds of precision measures.

61
Evaluation (2/5)
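  • Concept precision measures the precision of the
    keywords the system selects.
  • Concept_location precision measures not only the
    precision of the selected keywords but also the
    precision of their location in the hierarchy
    relations.
  • Precision (C_P): $\mathrm{C\_P} = \frac{A}{A + B}$
  • Precision (C_L_P): $\mathrm{C\_L\_P} = \frac{C}{C + D}$
  • Recall (R): $R = \frac{A}{A + E}$
  • A to E are the counts in the evaluation table:
    expert-defined concepts generated by the system
    (A), generated keywords not defined by the
    expert (B), generated concepts in the right
    location (C), generated concepts in the wrong
    location (D), and expert concepts not generated
    by the system (E).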

62
Evaluation (3/5)
                                  Expert-defined  Not defined  Expert-defined,  Expert-defined,
                                  concepts        by expert    right location   location in error
Keywords generated by system      A               B            C                D
Keywords not generated by system  E
(Figure labels: Expert concepts; System keywords)

63
Experts defined ontology
  • The ontology of the musical instrument domain
    generated by the experts.

64
Evaluation (4/5)
                                  Expert-defined  Not defined  Expert-defined,  Expert-defined,
                                  concepts        by expert    right location   location in error
Keywords generated by system      40              0            29               11
Keywords not generated by system  12
(Figure labels: Expert concepts; System keywords)

65
Evaluation (5/5)
  • Compared with the ontology defined by an expert,
    the experimental results for our proposed method
    are:
  • Precision (C_P): 100% concept precision.
  • Precision (C_L_P): 73% concept hierarchy
    precision.
  • Recall (R): 77%.
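
The reported percentages can be reproduced from the counts in the evaluation table. The formulas in the sketch are inferred from the table layout and the reported values; they are an assumption, since the original equations are not preserved in this transcript.

```python
def evaluate(a, b, c, d, e):
    """Compute the three measures from the evaluation-table counts.

    a: expert-defined concepts generated by the system
    b: generated keywords not defined by the expert
    c: generated concepts placed in the right location
    d: generated concepts placed in the wrong location
    e: expert concepts the system did not generate
    """
    return {
        "C_P": a / (a + b),      # concept precision
        "C_L_P": c / (c + d),    # concept-location precision
        "R": a / (a + e),        # recall
    }

# Counts from the first-stage experiment.
result = evaluate(a=40, b=0, c=29, d=11, e=12)
for name, value in result.items():
    print(f"{name}: {value:.1%}")
```

With these counts the measures come out at 100%, 72.5%, and about 76.9%, which is consistent with the rounded values on the slide.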

66
Second stage experiment (1/2)
  • We selected the beer domain and collected web
    pages from the Internet. There are 18 catalogues,
    212 web pages.

Catalogue  Number of web pages  Catalogue     Number of web pages
ale        26                   pilsner       7
beer       36                   microbrewery  4
bitter     6                    hop           23
brewery    26                   festival      10
lager      14                   bock          5
liquid     2                    bitter        6
yeast      6                    ingredient    11
stout      11                   organization  5
porter     7                    award         7

67
Second stage experiment (2/2)
  • The system selected 1,688 noun terms from the
    6,914 input terms. The system then kept the
    terms with the highest TF-IDF values to obtain
    useful keywords from the 1,688 terms.
  • We also constructed a matrix in which the columns
    denote ontology keywords and the rows represent
    web documents.
  • If a keyword is found in a web document, the
    element is set to 1; otherwise, it is set to
    0.

68
Keyword       TF-IDF value  Keyword         TF-IDF value
ale           0.91          fermentation    0.617
association   0.89          grist           0.61
award         0.88          kraeusen        0.61
beer          0.872         mash            0.61
bitter        0.81          maltose         0.6
bock          0.81          pasteurization  0.6
brewery       0.81          wort            0.6
festival      0.80          cask            0.6
hop           0.77          firkin          0.59
ingredient    0.72          exchanger       0.58
lager         0.71          adjunct         0.58
liquid        0.70          dme             0.57
malt          0.70          hops            0.57
microbrewery  0.698         malt            0.57
organization  0.698         yeast           0.56
pilsner       0.69          alcoholic       0.56
porter        0.69          aroma           0.56
shout         0.68          astringent      0.56
yeast         0.66          bitter          0.55
dope          0.66          diacetyl        0.55

69
Keyword       TF-IDF value  Keyword         TF-IDF value
dunker        0.66          esters          0.55
farmhouse     0.66          grainy          0.52
hefeweizen    0.66          happyhours      0.51
helles        0.658         skunked         0.5
kolsch        0.65          oxidation       0.5
lager         0.65          phenolic        0.5
lambic        0.64          yeasty          0.49
maibock       0.64          brewpub         0.483
marzen        0.63          camre           0.47
mead          0.62          breweriana      0.47
mild          0.62          rauchbier       0.46
munchener     0.62          saison          0.4
pilsener      0.51          steinbier       0.4
pilsner       0.51          stout           0.4
pils          0.51          vienna          0.4
porter        0.51

70
Recursive ART (1/2)
  • The recursive ART network will check whether the
    output values are greater than the vigilance. We
    test the vigilance step-by-step from 0.1 to 0.9
    with an increment of 0.1.

71
Recursive ART (2/2)
  • The clustering performed by recursive ART network
    yields 29 groups.

Group  Documents  Group  Documents
1      26         16     8
2      17         17     9
3      22         18     8
4      23         19     2
5      23         20     6
6      9          21     6
7      2          22     7
8      17         23     8
9      8          24     6
10     6          25     8
11     7          26     9
12     12         27     8
13     6          28     4
14     4          29     4
15     8

72
Output ontology
  • In this manner, each group generates a
    representative keyword. After deleting identical
    representative keywords among different groups,
    only 13 keywords remain. Boolean logic is used
    to calculate the relationships between levels of
    concepts.

73
Evaluate (1/2)
  • After producing the ontology, its precision must
    be evaluated. However, there was no other
    ontology to compare it with, so we invited
    domain experts to evaluate its precision.

                                   Identifies  Does not identify  Identifies the term,  Identifies the term,
                                   the term    the term           location is right     location is in error
The system generates the concepts  A           B                  C                     D
(Figure labels: User view of the terms; System terms)

74
Evaluate (2/2)
  • Precision (C_P): $\mathrm{C\_P} = \frac{A}{A + B}$
  • Precision (C_L_P): $\mathrm{C\_L\_P} = \frac{C}{C + D}$
  • The average Precision (C_P) given by the domain
    experts is 0.794 (about 79%), and the average
    Precision (C_L_P) given by the domain experts is
    0.742 (about 74%).

75
RDF format
  • Finally, we used the W3C standard for ontology
    web languages to record the ontology, and output
    the results in RDF format using the Jena
    package.

76
Conclusions (1/2)
  • An ontology can help users learn and search for
    related information effectively. Constructing an
    ontology quickly and correctly has become an
    important topic for content-based search on the
    Internet.
  • Our proposed method requires less time, selecting
    keywords and defining the relations automatically
    with human intervention.

77
Conclusions (2/2)
  • The proposed method facilitates users'
    understanding of the content of data and its
    relevancy, and is able to suggest content that is
    highly relevant.
  • In the future, we will focus on investigating a
    better method for finding multi-relations among
    terms, and extend the system's abilities to cover
    a multi-field ontology as the foundation for
    robust and accurate ontology construction.

78
Current Research
  • Sensors Network Intrusion Detection.
  • Ontology application on Medical Knowledge
  • Ontology merging and alignment
  • Using applied soft computing to solve problems
  • Web pages analysis
  • Image processing
  • RFID Application

79
  • Thanks for listening!