Title: Ontology Selection for the Real Semantic Web: How to Cover the Queen's Birthday Dinner?
1 Ontology Selection for the Real Semantic Web
- How to Cover the Queen's Birthday Dinner?
Marta Sabou, Vanessa Lopez, Enrico Motta
2 Outline
- What is Ontology Selection?
- Current view
- The real Semantic Web
- New tools
- Characteristics of online knowledge
- Proposed Algorithm
- Conclusions
3 Ontology Selection
- Identify ontologies that fulfil the needs of a task.
[Diagram: a Task issues Query(a, b, c); Selection applies the selection criteria to an Ontology Library and produces the Output]
- Selection criteria: ontology structure, coverage, popularity
- Coverage: all query elements that appear in the ontology.
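The coverage criterion above can be sketched in a few lines. This is a minimal illustration, assuming an ontology is reduced to a flat list of concept labels and that matching is case-insensitive string equality; the function names and the sample labels are hypothetical and do not come from the slides.

```python
def coverage(query_terms, ontology_labels):
    """Return the query terms that appear among the ontology's labels."""
    labels = {label.lower() for label in ontology_labels}
    return {term for term in query_terms if term.lower() in labels}


def coverage_ratio(query_terms, ontology_labels):
    """Fraction of query terms covered (0.0 for an empty query)."""
    terms = list(query_terms)
    if not terms:
        return 0.0
    return len(coverage(terms, ontology_labels)) / len(terms)


# Hypothetical ontology labels, invented for illustration.
royal_events = ["Queen", "Birthday", "Castle"]
query = ["queen", "dinner", "birthday"]

print(sorted(coverage(query, royal_events)))          # ['birthday', 'queen']
print(round(coverage_ratio(query, royal_events), 2))  # 0.67
```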
4 About the Queen
- The Queen will be 80 on 21 April and she is
celebrating her birthday with a family dinner
hosted by Prince Charles at Windsor Castle
Select an ontology that covers
Queen, dinner, birthday
6 Current View
- The output of the selection is interpreted by a human user, therefore:
- Coverage can be partial
- Missing keywords can be added manually
- Coverage can be imprecise
- E.g., Queen as a Bee, or return DinnerFork for Dinner
- These errors can be detected and filtered manually
- Response time is not critical
- Users are happier to wait a few minutes than to build an ontology from scratch
7 Current Approaches
- AKTiveRank, OntoSelect, Swoogle, OntoKhoj
- High coverage is not enforced
- Return the ontology with the highest coverage, but no effort is made to increase that coverage
- Coverage is imprecise
- Corresponding concepts are identified using string similarity metrics, e.g., return Teacher for tea
- The semantics of these concepts is ignored
- Response time is slow
- On average 2 minutes/ontology for AKTiveRank
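The imprecision of string-based matching can be reproduced with a small sketch. The slides do not say which similarity metrics the listed tools use, so plain substring containment stands in here as an assumption; both function names and the label list are hypothetical.

```python
def substring_matches(keyword, concept_labels):
    """Naive string matching: a label matches if it contains the keyword."""
    needle = keyword.lower()
    return [label for label in concept_labels if needle in label.lower()]


def exact_matches(keyword, concept_labels):
    """Stricter matching: the whole label must equal the keyword."""
    needle = keyword.lower()
    return [label for label in concept_labels if label.lower() == needle]


# Hypothetical concept labels, invented for illustration.
labels = ["Teacher", "Tea", "DinnerFork", "Dinner"]

print(substring_matches("tea", labels))     # ['Teacher', 'Tea']
print(substring_matches("dinner", labels))  # ['DinnerFork', 'Dinner']
print(exact_matches("tea", labels))         # ['Tea']
```

Exact matching removes the false hits but finds fewer labels, which is the coverage versus precision tension the later slides address.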
9 AquaLog
1. NL Question
2. Linguistic interpretation
3. Ontology based interpretation
4. Answer
10 Magpie
NL Question
Ontology concepts
Instances highlighted according to their type
11 Tools evolve
- NOW: rely on a single, fixed ontology (that limits their scope)
- NEXT: automatically select and combine multiple ontologies
[Diagram: PowerAqua reaches the Semantic Web through Ontology Selection]
12 Requirements
Is it easy to achieve?
- Increased (complete) coverage
- Cover the maximum number of query terms
- Return ontology combinations that jointly offer high coverage
- Precise coverage
- A semantic relation should hold between query terms and their corresponding covers
- Quick
- Embedded in runtime applications
- Also deal with instances and properties
- Particularly important for PowerAqua and Magpie
- Modular
- Return a relevant module rather than an entire ontology
14 Achieving Coverage 1/3
- How many ontologies contain (queen, dinner, birthday)? Answer: 0
Number of ontologies containing each term or term combination:
Query (t1, t2, t3)              | (t1) | (t2) | (t3) | (t1,t2) | (t1,t3) | (t2,t3) | (t1,t2,t3)
(queen, birthday, dinner)       |    0 |    9 |    2 |       0 |       0 |       1 |          0
(adventurer, expedition, photo) |    1 |    0 |   32 |       0 |       1 |       0 |          0
(mountain, team, talk)          |   12 |   25 |    9 |       2 |       1 |       1 |          1
- Knowledge is sparse
- many terms are not covered
- The more keywords, the harder it is to get good coverage
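Counts of this kind (one number per term and per term combination) can be computed as sketched below. The sketch assumes each ontology is available as a set of lower-cased labels; the toy library is invented for illustration and is unrelated to the ontology collection behind the figures on the slide.

```python
from itertools import combinations


def coverage_counts(terms, ontologies):
    """Map each non-empty subset of terms to the number of ontologies containing all of it."""
    counts = {}
    for size in range(1, len(terms) + 1):
        for subset in combinations(terms, size):
            counts[subset] = sum(
                1 for labels in ontologies.values() if set(subset) <= labels
            )
    return counts


# Hypothetical toy library: ontology name -> set of labels.
library = {
    "O1": {"queen", "bee", "hive"},
    "O2": {"birthday", "dinner", "party"},
    "O3": {"birthday", "card"},
}

counts = coverage_counts(("queen", "birthday", "dinner"), library)
for subset, n in counts.items():
    print(subset, n)
```

The subsets come out in the column order used on the slide: single terms first, then pairs, then the full triple.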
15 Achieving Coverage 2/3
- How about more popular domains?
Query (t1, t2, t3)                | (t1) | (t2) | (t3) | (t1,t2) | (t1,t3) | (t2,t3) | (t1,t2,t3)
(project, article, researcher)    |   84 |   90 |   24 |      19 |      13 |       9 |          8
(researcher, student, university) |   24 |  101 |   64 |      16 |      15 |      38 |         13
(research, publication, author)   |   15 |   77 |  138 |       8 |       5 |      36 |          4
Better coverage, but the number of keywords plays a crucial role.
What about properties?
Query (t1, t2, t3)                | (t1) | (t2) | (t3) | (t1,t2) | (t1,t3) | (t2,t3) | (t1,t2,t3)
(project, relatedTo, researcher)  |   84 |   11 |   24 |       0 |      13 |       0 |          0
(academic, memberOf, project)     |   21 |   36 |   84 |       0 |       3 |       5 |          0
(article, hasAuthor, person)      |   90 |   14 |  371 |       8 |      32 |       2 |          0
Good coverage is difficult to obtain even in popular domains.
16 Achieving Coverage 3/3
- Achieving good coverage is difficult for:
- A high number of keywords (>3)
- Magpie derives several keywords for each webpage
- Keywords from different or weakly covered domains
- Webpages mix terms from different topic domains (Magpie)
- Online knowledge is sparse (few domains are well covered)
- Properties
- Properties (and instances) are crucial for PowerAqua
Query expansion mechanisms are needed to achieve better coverage
17 Semantic expansion
Replace query terms with semantic equivalents (WordNet).
Query (t1, t2, t3)              | (t1) | (t2) | (t3) | (t1,t2) | (t1,t3) | (t2,t3) | (t1,t2,t3)
(queen, birthday, dinner)       |    0 |    9 |    2 |       0 |       0 |       1 |          0
(woman, birthday, dinner)       |   32 |    9 |    2 |       1 |       1 |       1 |          1
(adventurer, expedition, photo) |    1 |    0 |   32 |       0 |       1 |       0 |          0
(person, trip, photo)           |  371 |    7 |   32 |       1 |      20 |       1 |          1
(academic, memberOf, project)   |   21 |   36 |   84 |       0 |       3 |       5 |          0
(person, memberOf, project)     |  371 |   36 |   84 |      16 |      46 |       5 |          5
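A rough sketch of this kind of expansion follows. It assumes a small hand-written replacement table in place of a real WordNet lookup; the table only contains the substitutions visible on the slides (queen to woman or person, adventurer to person, expedition to trip, academic to person), and the names `REPLACEMENTS` and `expand_query` are hypothetical.

```python
from itertools import product

# Hypothetical replacement table standing in for a WordNet lookup.
REPLACEMENTS = {
    "queen": ["woman", "person"],
    "adventurer": ["person"],
    "expedition": ["trip"],
    "academic": ["person"],
}


def expand_query(terms, replacements=REPLACEMENTS):
    """Return the original query followed by every variant with replaced terms."""
    options = [[term] + replacements.get(term, []) for term in terms]
    return [tuple(choice) for choice in product(*options)]


for variant in expand_query(("queen", "birthday", "dinner")):
    print(variant)
# ('queen', 'birthday', 'dinner')
# ('woman', 'birthday', 'dinner')
# ('person', 'birthday', 'dinner')
```

The number of variants grows multiplicatively with the number of replaceable terms, which is one reason to attempt expansion only after the unexpanded query fails.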
19 Putting it All Together
- We need a tool that provides:
- Good (maximal) coverage
- but this is hard to achieve on the current SW
- Return multiple, jointly-covering ontologies
- Semantic expansion
- Syntactic expansion (less rigid syntactic match)
- Precise coverage
- Complement syntactic matching with semantic mapping
- Quick
- Incremental (complex stages only when needed)
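20 Proposed Algorithm
Stage I (example where all terms are covered)
1. Ontology Identification
1.1. Syntactic Match (ExactMatch): queen, birthday and dinner are matched in the candidate ontologies O1, O2 and O3; one match is queen in the bee sense
1.2. Semantic Match (SemMatch): O1 and O2 remain, covering queen, birthday and dinner
All terms covered? Yes.
2. Identify Ontology Combinations (OntoCombinations): O1, O2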
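The slides do not specify how OntoCombinations picks its ontologies. One plausible stand-in, shown purely as an assumption, is a greedy set-cover heuristic: repeatedly take the ontology that covers the most still-uncovered terms. The function name and the toy library are hypothetical.

```python
def greedy_combination(query_terms, ontologies):
    """Greedily pick ontologies until no remaining term can be covered.

    Returns (chosen ontology names, terms left uncovered).
    """
    remaining = set(query_terms)
    chosen = []
    while remaining:
        best = max(
            ontologies,
            key=lambda name: len(ontologies[name] & remaining),
            default=None,
        )
        if best is None or not ontologies[best] & remaining:
            break  # nothing left that covers a missing term
        chosen.append(best)
        remaining -= ontologies[best]
    return chosen, remaining


# Hypothetical toy library: ontology name -> set of matched labels.
library = {
    "O1": {"queen", "birthday"},
    "O2": {"dinner", "menu"},
    "O3": {"queen", "bee"},
}

print(greedy_combination(["queen", "birthday", "dinner"], library))
# (['O1', 'O2'], set())
```

Greedy selection is fast but not guaranteed to return the smallest possible combination.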
21 Proposed Algorithm
Stage I (example where terms remain uncovered)
1. Ontology Identification
1.1. Syntactic Match (ExactMatch): only birthday is matched, in O2
1.2. Semantic Match (SemMatch)
All terms covered? No: go to Stage II.
2. Identify Ontology Combinations (not reached)
22 Proposed Algorithm
Stage II (example where all terms are covered)
0. Query Expansion (SemanticExpansion): Queen -> Woman, Person
1. Ontology Identification
1.1. Syntactic Match (ExactMatch): O1 matches Person, dinner, birthday; O2 matches woman, dinner, birthday
1.2. Semantic Match (SemMatch): compares sense(keyword) with sense(concept)
All terms covered? Yes.
2. Identify Ontology Combinations (OntoCombinations): O1, O2
3. Generality Ranking (GeneralityRanking)
Result: O1
23 Proposed Algorithm
Stage II (example where terms remain uncovered)
0. Query Expansion (SemanticExpansion): Queen -> Woman/Person
1. Ontology Identification
1.1. Syntactic Match (ExactMatch): O1 matches person, dinner; O2 matches woman, dinner
1.2. Semantic Match (SemMatch)
All terms covered? No: go to Stage III.
2. Identify Ontology Combinations (not reached)
24 Proposed Algorithm
Stage III
0. Query Expansion (SyntacticExpansion): birthday also matches the labels BirthdayParty, BirthdayCard
1. Ontology Identification
1.1. Syntactic Match
1.2. Semantic Match (SemMatch, LabelSplit, LabelInterpretation): BirthdayParty -> Birthday Party; BirthdayCard -> Birthday Card
All terms covered? Yes: O1 covers woman, BirthdayParty, dinner
2. Identify Ontology Combinations (OntoCombinations): O1
3. Generality Ranking (GeneralityRanking)
Result: O1
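The label-splitting step of Stage III can be approximated as below. This is a sketch under the assumption that compound labels follow camel-case, underscore or hyphen conventions; the regular expression and both function names are illustrative choices, and the LabelInterpretation step (deciding what the compound actually means) is not covered.

```python
import re

# Matches acronyms, capitalised or lower-case words, and digit runs.
WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_label(label):
    """Split a compound label such as 'BirthdayParty' into its words."""
    return WORD.findall(label)


def compound_matches(keyword, labels):
    """Labels in which the keyword occurs as a whole word of the compound."""
    key = keyword.lower()
    return [
        label
        for label in labels
        if key in (word.lower() for word in split_label(label))
    ]


# Hypothetical concept labels, invented for illustration.
labels = ["BirthdayParty", "BirthdayCard", "Dinner", "Teacher"]

print(split_label("BirthdayParty"))          # ['Birthday', 'Party']
print(compound_matches("birthday", labels))  # ['BirthdayParty', 'BirthdayCard']
print(compound_matches("tea", labels))       # []
```

Matching on whole words keeps the relaxed match from returning Teacher for tea, unlike plain substring matching.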
25 Proposed Algorithm
[Flowchart: Stages I, II and III run the same numbered steps; a later stage starts only when the previous one leaves terms uncovered.]
1. Query Expansion: none in Stage I; SemanticExpansion in Stage II; SyntacticExpansion in Stage III
2. Ontology Identification
2.1. Syntactic Match: ExactMatch in Stages I and II
2.2. Semantic Match: SemMatch in Stages I and II; SemMatch, LabelSplit, LabelInterpretation in Stage III
All terms covered? Stage I: no -> Stage II; Stage II: no -> Stage III; Stage III: no -> I give up!
3. Identify Ontology Combinations (OntoCombinations), reached when all terms are covered
4. Generality Ranking (GeneralityRanking)
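The incremental control flow can be sketched as follows. This is a simplified illustration, not the authors' implementation: SemMatch and GeneralityRanking are left out, ontologies are reduced to lists of labels, and the replacement table, the library and all function names are hypothetical.

```python
import re

# Hypothetical data, invented for illustration.
HYPERNYMS = {"queen": ["woman", "person"]}
LIBRARY = {
    "O1": ["Woman", "BirthdayParty", "Dinner"],
    "O2": ["Birthday", "Card"],
}


def words(label):
    """Lower-cased words of a compound label, e.g. 'BirthdayParty' -> ['birthday', 'party']."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+", label)]


def exact_match(term, label):
    return label.lower() == term


def split_match(term, label):
    return term in words(label)


def covered_terms(query, labels, variants, matcher):
    """Original query terms for which some variant matches some label."""
    return {
        term
        for term in query
        if any(matcher(v, label) for v in variants[term] for label in labels)
    }


def select(query, library=LIBRARY, hypernyms=HYPERNYMS):
    """Try the cheapest stage first; escalate only while terms stay uncovered."""
    query = [term.lower() for term in query]
    plain = {term: [term] for term in query}
    expanded = {term: [term] + hypernyms.get(term, []) for term in query}
    stages = [
        ("Stage I", plain, exact_match),       # exact match only
        ("Stage II", expanded, exact_match),   # plus semantic expansion
        ("Stage III", expanded, split_match),  # plus relaxed label matching
    ]
    for name, variants, matcher in stages:
        per_ontology = {
            onto: covered_terms(query, labels, variants, matcher)
            for onto, labels in library.items()
        }
        jointly_covered = set().union(*per_ontology.values())
        if jointly_covered == set(query):
            return name, {o: c for o, c in per_ontology.items() if c}
    return None, {}  # the "I give up!" branch


stage, covers = select(["queen", "birthday", "dinner"])
print(stage)  # Stage II
```

With the toy library, O1 and O2 jointly cover the query once queen is expanded to woman; if O2 is removed, only the relaxed matching of Stage III finds birthday inside BirthdayParty.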
26 Implementation so far
[Flowchart: the same three-stage diagram as on the previous slide, with the components SemanticExpansion, SyntacticExpansion, ExactMatch, SemMatch, LabelSplit, LabelInterpretation, OntoCombinations and GeneralityRanking.]
27 Modularization
[Diagram: Term extraction yields terms t1, t2, ..., tn, which are passed to Knowledge Selection]
M. d'Aquin, M. Sabou, E. Motta. Modularization: a Key for the Dynamic Selection of Relevant Knowledge Components. Ontology Modularization Workshop, ISWC 2006.
29 Take home
- Current OS is mostly for human users
- On the real SW
- tools pose stricter requirements on OS (increased and improved coverage, fast)
- knowledge is sparse (at least now)
- Our algorithm
- Balances good coverage and time performance
- Incremental
- Many, many open questions
- Finalizing semantic mapping
- Computing semantic generality