Title: Learning Domain Ontologies for Web Service Descriptions:
1Learning Domain Ontologies for Web Service Descriptions:
an Experiment in Bioinformatics
Marta Sabou, Vrije Universiteit Amsterdam
Chris Wroe and Carole Goble, University of Manchester
Gilad Mishne, University of Amsterdam
2Outline
- Problem
- Building Web Service Domain ontologies
- A real-life example: MyGrid
- The Bioinformatics solution
- Data Sets
- Extraction Method
- Evaluation
- Summary
3Web Service Domain Ontologies: what are they?
A Semantic WS description consists of:
- a generic WS ontology (e.g., OWL-S)
- a domain ontology (DO) providing domain concepts and functionalities
<… rdf:ID="WS1">
  <owls:hasInput rdf:resource="…"/>
  <owls:hasInput rdf:resource="…"/>
  <owls:hasOutput rdf:resource="…"/>
</…>
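The split between the generic WS ontology and the domain ontology can be illustrated with a minimal Python sketch. It is not OWL-S itself: the class, the helper function and all concept names below are hypothetical and chosen only for illustration. The generic part supplies the properties (inputs and outputs), while the DO supplies the concepts those properties may point to.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDescription:
    """Generic part: every service has an id, inputs and outputs."""
    service_id: str
    has_input: list = field(default_factory=list)
    has_output: list = field(default_factory=list)

    def concepts(self):
        """All domain concepts this description refers to."""
        return set(self.has_input) | set(self.has_output)


def undefined_concepts(description, domain_ontology):
    """Concepts used by the description but missing from the DO."""
    return description.concepts() - set(domain_ontology)


# Hypothetical domain ontology (DO): invented concept names.
domain_ontology = {"DNA_sequence", "Protein_sequence", "Sequence_section"}

# Hypothetical description with two inputs and one output.
ws1 = ServiceDescription(
    "WS1",
    has_input=["DNA_sequence", "Sequence_section"],
    has_output=["DNA_sequence"],
)

print(undefined_concepts(ws1, domain_ontology))  # set()
```

A description that uses a concept the DO lacks would show up in the returned set, which is one simple way to see why a DO with broad coverage matters.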
4Building WS Domain Ontologies
Complex reasoning tasks can be best performed if
many semantic WS descriptions are based on the
same DO.
Therefore DOs should have a broad coverage of
their domain.
- BUT
- DOs have to be built by domain experts.
- The number of WSs is increasing (to hundreds per domain).
- There are no guidelines and very little tool support exists.
- Generic ontology learning is not enough.
Building a high-coverage DO is a time-consuming task that should be automated (at least partially).
5A real life story.
- MyGrid is a UK e-Science project that builds semantic Grid middleware to support in silico experiments in biology.
- Bioinformatics programs are exposed as semantic Web services.
[Figure: 600 services; Domain Expert, 4 months, 150 services; 550 concepts, but only 125 (23%) used for SWS tasks.]
- Our GOAL
- Support Domain Experts to learn
- From more services
- In less time
- A Better ontology (for SWS descriptions)
7Data Provided by MyGrid
- Data Source
- short descriptions of service functionalities
- we used the textual descriptions of 150 EMBOSS services
- characteristics
- small corpora (100/200 documents)
- employ a specific style (sublanguage), for example:
- "Replace or delete sequence sections."
- "Find antigenic sites in proteins."
- "Cai codon usage statistic."
- Gold Standard
- The manually built MyGrid domain ontology.
8Extraction method
[Figure: extraction method, starting from the corpus (D0. Corpus); example description: "Replace or delete sequence sections."]
9GATE Implementation
- Easy to follow extraction (step by step)
- Easy to adapt for domain engineers
10Visual Ontology Inspection
- Easy to understand the results
- why were some concepts learned?
12Evaluation - 1
Assess whether the learned ontology (LO) is good:
1) it covers the domain -> compare to the Gold Standard (GS)
2) it is suitable for the search task -> compare to the Application Ontology (AO)
Compare by Overlap, where Overlap(O1, O2) = concepts shared by ontologies O1 and O2.
Results: Overlap(GS, LO) = 7%, Overlap(AO, LO) = 20%
[Diagram: the learned ontology (LO) is compared with the Gold Standard (GS) for 1. domain coverage, and with the application ontology (AO) for 2. suitability for a task.]
- Conclusion
- LO is closer to AO (more suited for WS search).
- Very low overlap.
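A minimal sketch of the Overlap comparison follows. Two things are assumptions, since the slides do not state them: concepts are matched by exact name after lower-casing, and the shared concepts are reported as a percentage of the reference ontology (GS or AO). The concept sets are toy data invented for illustration, not the MyGrid ontologies.

```python
def overlap(reference, learned):
    """Percentage of reference concepts that also occur in the learned ontology.

    Assumption: concepts match when their names are equal after
    stripping whitespace and lower-casing.
    """
    ref = {c.strip().lower() for c in reference}
    lrn = {c.strip().lower() for c in learned}
    if not ref:
        raise ValueError("reference ontology has no concepts")
    return 100.0 * len(ref & lrn) / len(ref)


# Toy concept sets, invented for illustration only.
gs = {"Sequence", "Protein", "Alignment", "Database"}
lo = {"sequence", "protein", "codon", "site"}

print(overlap(gs, lo))  # 50.0
```

Under this reading, the share of reference concepts that were not learned is simply 100 minus the overlap.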
[Diagram labels: corpus, ontology learning, manual ontology building]
Is the learned ontology really so bad?
13Evaluation - 2
Is the learned ontology (LO) really so bad?
3) Ask a domain expert for a per-concept evaluation.
- Count three categories of concepts:
- Correct: both in LO and GS
- New: only in LO, but relevant and should be in GS as well
- Spurious: useless
- Compute OPrecision = (correct + new) / (correct + new + spurious)
[Diagram: 3. Expert evaluation of the learned ontology (LO)]
Results: OPrecision = 87%
- Conclusion
- More than half of the concepts in the LO are relevant.
- LO brings several new additions to GS.
- New concepts amount to 56% of the GS concepts.
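The per-concept expert evaluation can be sketched as below. The function names and the toy concept sets are hypothetical. The expert's judgement is modelled as a set of learned concepts that the expert marked as relevant, which is an assumption about how the judgements are recorded.

```python
def categorise(learned, gold, expert_relevant):
    """Split learned concepts into correct, new and spurious."""
    correct = learned & gold                       # in both LO and GS
    new = (learned - gold) & expert_relevant       # only in LO, but relevant
    spurious = learned - gold - expert_relevant    # only in LO, not relevant
    return correct, new, spurious


def oprecision(correct, new, spurious):
    """(correct + new) / (correct + new + spurious), using counts."""
    total = correct + new + spurious
    if total == 0:
        raise ValueError("no concepts were evaluated")
    return (correct + new) / total


# Toy data, invented for illustration only.
learned = {"sequence", "protein", "codon", "site", "thing"}
gold = {"sequence", "protein", "alignment"}
expert_relevant = {"codon", "site"}

correct, new, spurious = categorise(learned, gold, expert_relevant)
print(oprecision(len(correct), len(new), len(spurious)))  # 0.8
```

In the toy data two concepts are correct, two are new and one is spurious, so four of the five learned concepts count as relevant.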
So, why did we lose so many (93%) of the GS concepts?
14Evaluation - 3
Why did we lose 93% of the GS concepts?
We analyzed: what does this 93% contain?
[Diagram: LO and GS; 7% correct, 56% new, 93% missed]
2) Domain and auxiliary knowledge external to the corpus.
- Concepts defining views.
- (18% of all concepts)
The GS contains more concepts than can be extracted from the corpus.
15Evaluation - 4
And why are there so many new additions (56%)?
We asked the ontology curator's opinion.
[Diagram: LO and GS; 7% correct, 56% new, 93% missed]
- He did not perform a meticulous examination of the corpus.
- He worked only on 100 services (not 156, as we did).
- He created a preferred term for several synonyms.
- He expanded abbreviations.
The curator says the automatic approach leads to a more faithful reflection of the terms the community uses to describe their services.
16Evaluation - 5
I wish I had that!
18Summary
1. Broad-coverage DOs are important but hard to build:
- many Web services exist per domain
- textual WS descriptions contain important concepts, but domain experts do not read WS descriptions
- no guidelines, few tools.
2. DOs can be semi-automatically learned:
- using textual descriptions of WS functionalities
- adapting simple Ontology Learning methods
- using several evaluation strategies (Gold Standards are not always golden).
3. The semi-automatically learned ontologies:
- are suitable for semantic WS descriptions
- provide a good starting point for building a complex DO.