Title: oMAP: Combining Classifiers for Aligning Automatically OWL Ontologies
1oMAP Combining Classifiers for Aligning
Automatically OWL Ontologies
Raphaël Troncy, Umberto Straccia
Tuesday 22nd of November, 2005
2Agenda
- Motivations
- oMAP
- A formal framework
- The different classifiers used
- Evaluation
- Conclusion
3Motivations
- Heterogeneity of information systems
- Ontologies as a solution to data heterogeneity on the Web
- Ontologies are themselves heterogeneous
- knowledge representation language
- degree of formalization
- Semantic Web
- More and more OWL/RDF ontologies on the Web
- Need for comparing/reusing/merging ontologies
- partially covering the same domain
- different versions of the same ontology
4Aligning Ontologies
- A matching operator [KnowledgeWeb, 2005]
- Input: a set of discrete entities (tables, XML elements, classes, properties)
- Output:
- relationship holding between the entities (subsumption, equivalence, disjointness)
- a confidence measure
- Automatic vs manual techniques
- A large body of work from various communities
- schema matching, machine learning, data integration
5Example
[Figure: example alignments showing equivalence, subsumption and disjointness relations]
7oMAP A Formal Framework
- Inspirations
- Formal work in data exchange [Fagin et al., 2003]
- GLUE: combining several specialized components for finding the best set of mappings [Doan et al., 2003]
- Notations
- A mapping is a tuple M = (T, S, Σ)
- S and T are the source and target ontologies
- Si is an OWL entity (class, datatype property, object property) of the ontology
- Σ is a set of mapping rules αij Tj ← Si
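As a minimal sketch, the notation of this slide can be encoded as plain data structures. The class and field names (`MappingRule`, `Mapping`, `weight`) are illustrative assumptions and are not taken from the oMAP implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MappingRule:
    """One mapping rule with weight alpha_ij relating T_j to S_i (hypothetical names)."""
    source: str    # S_i, an entity of the source ontology
    target: str    # T_j, an entity of the target ontology
    weight: float  # alpha_ij, the confidence of the rule


@dataclass
class Mapping:
    """M = (T, S, Sigma): target entities, source entities and the rule set."""
    target_entities: set
    source_entities: set
    rules: set = field(default_factory=set)

    def add(self, rule):
        # A rule may only relate entities that belong to the two ontologies.
        if rule.source not in self.source_entities:
            raise ValueError("unknown source entity: " + rule.source)
        if rule.target not in self.target_entities:
            raise ValueError("unknown target entity: " + rule.target)
        self.rules.add(rule)


# Invented toy data, for illustration only.
m = Mapping(target_entities={"Meeting"}, source_entities={"Conference"})
m.add(MappingRule("Conference", "Meeting", 0.8))
```

Making the rule immutable lets it live in a set, which matches the reading of Σ as a set of rules.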
8oMAP Overall Strategy
- A three-step process
- Form possible Σ sets and estimate the quality of each, based on the quality measures of its mapping rules
- For each mapping rule Tj ← Si, estimate its confidence αij, which also depends on the Σ it belongs to
- Use heuristics to iteratively build the final set of mappings
9oMAP Combining Classifiers
- Weight of a mapping rule
- αij = w(Si, Tj, Σ)
- Using different classifiers
- w(Si, Tj, CLk) is the classifier's approximation of the rule Tj ← Si
- Combining the approximations
- Use of a priority list over the classifiers CL1, CL2, ..., CLn
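The slide does not spell out how the priority list is applied. One plausible reading, assumed here, is that classifiers are consulted in priority order and the first non-zero weight is kept. The two toy classifiers and the 0.9 weight are invented for illustration.

```python
def combine_by_priority(source, target, classifiers):
    """Return the weight of the first classifier, in priority order,
    that proposes a non-zero weight for the pair (assumed combination rule)."""
    for classify in classifiers:
        weight = classify(source, target)
        if weight > 0.0:
            return weight
    return 0.0


def same_name(source, target):
    # Hypothetical high-priority classifier: exact name match.
    return 1.0 if source == target else 0.0


def same_lowercase_name(source, target):
    # Hypothetical lower-priority classifier: case-insensitive match.
    return 0.9 if source.lower() == target.lower() else 0.0


priority = [same_name, same_lowercase_name]
```

With this rule a cheap, precise classifier can short-circuit the more expensive ones listed after it.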
10Terminological Classifiers
- Same entity names (or URI)
- Same entity name stems
11Terminological Classifiers
- String distance name
- WordNet distance name
- lcs is the longest common substring between Si and Tj
- sim
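The formulas of these two slides did not survive, so the following is only a sketch of a string-distance classifier built on the longest common substring. Normalising by the length of the longer name and lower-casing both names are assumptions, as is the function name `lcs_weight`.

```python
def longest_common_substring(a, b):
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common substring ending at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def lcs_weight(source_name, target_name):
    """Weight in [0, 1]: lcs length divided by the longer name's length (assumed)."""
    if not source_name or not target_name:
        return 0.0
    a, b = source_name.lower(), target_name.lower()
    return longest_common_substring(a, b) / max(len(a), len(b))
```

For instance, `lcs_weight("Author", "author")` is 1.0 under this normalisation, while names sharing no character get 0.0.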
12Machine Learning-Based Classifiers
- Collecting individuals
- label for the named individuals
- data value for the datatype properties
- type for the anonymous individuals and the range of object properties
- Recursion on the OWL definition
- depth parameter
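The collection rules above can be sketched as a small recursive function. The dictionary encoding of individuals and the exact recursion rule (at depth 0 an object property contributes only the type of its value) are assumptions made for illustration. The data mirrors the conference example used in this talk.

```python
def collect(ind_id, individuals, depth=0):
    """Build the text unit of an individual (hypothetical encoding).

    - named individual: its label; anonymous individual: its type
    - datatype properties: their data values
    - object properties: the type of the value, or, while depth > 0,
      the text unit of the value collected recursively
    """
    ind = individuals[ind_id]
    unit = [ind["label"] if ind["label"] else ind["type"]]
    unit.extend(ind["data"].values())
    for target_id in ind["objects"].values():
        if depth > 0:
            unit.extend(collect(target_id, individuals, depth - 1))
        else:
            unit.append(individuals[target_id]["type"])
    return tuple(unit)


individuals = {
    "x1": {"type": "Conference", "label": "Int. Conf. on WISE",
           "data": {}, "objects": {"location": "x2"}},
    "x2": {"type": "Address", "label": None,
           "data": {"city": "New York City", "country": "USA"},
           "objects": {}},
}

u1 = collect("x1", individuals)
u2 = collect("x2", individuals)
```

The depth bound also guarantees termination when individuals reference each other cyclically.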
13Machine Learning-Based Classifiers
- Example
- Individual (x1 type (Conference) value (label "Int. Conf. on WISE") value (location x2))
- Individual (x2 type (Address) value (city "New York City") value (country "USA"))
- u1 = ("Int. Conf. on WISE", "Address")
- u2 = ("Address", "New York City", "USA")
- Naïve Bayes text classifier
- kNN text classifier
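The slides name a Naïve Bayes text classifier without giving details. The following is a minimal sketch of a multinomial Naïve Bayes model with add-one smoothing, trained on text units grouped by source entity. The tokenisation, the smoothing choice and the training data are assumptions, not the oMAP implementation.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train_nb(units_by_entity):
    """Count tokens per entity and compute entity priors from unit counts."""
    counts = {entity: Counter(tok for unit in units for tok in tokenize(unit))
              for entity, units in units_by_entity.items()}
    vocab = set().union(*counts.values())
    total_units = sum(len(units) for units in units_by_entity.values())
    priors = {entity: len(units) / total_units
              for entity, units in units_by_entity.items()}
    return counts, vocab, priors


def posterior(text, model):
    """Posterior probability of each entity given the text."""
    counts, vocab, priors = model
    log_scores = {}
    for entity, token_counts in counts.items():
        total = sum(token_counts.values())
        score = math.log(priors[entity])
        for tok in tokenize(text):
            # Add-one (Laplace) smoothing over the vocabulary.
            score += math.log((token_counts[tok] + 1) / (total + len(vocab)))
        log_scores[entity] = score
    # Normalise in log space to avoid underflow.
    top = max(log_scores.values())
    exp_scores = {e: math.exp(s - top) for e, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {e: v / norm for e, v in exp_scores.items()}


# Invented training units, for illustration only.
model = train_nb({
    "Conference": ["Int. Conf. on WISE", "Conf. on Semantic Web"],
    "Address": ["New York City USA", "Pisa Italy"],
})
scores = posterior("Int. Conf. on Ontologies", model)
```

The posterior of a source entity for the text units of a target entity could then serve as that classifier's weight for the corresponding mapping rule.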
14Structural and Semantics-Based Classifier
- If Si and Tj are property names
- If Si and Tj are concept names [1]
[1] where D = D(Si) × D(Tj) and D(Si) is the set of direct parent concepts of Si
15Structural and Semantics-Based Classifier
- Let CS = (Q R.C) and DT = (Q' R'.D), then [1]
- Let CS = (op C1 ... Cm) and DT = (op' D1 ... Dm), then [2]
[1] where Q, Q' are quantifiers, R, R' are property names and C, D are concept expressions
[2] where op, op' are concept constructors and n, m ≥ 1
16Structural and Semantics-Based Classifier
- Complexity
- number of mapping rules
- number of possible Σ sets
- Reduction of the space
- considering Σ sets that contain mapping rules for the classes
- considering the range of the datatype properties (XML Schema taxonomy)
- Local maximum heuristic
- pick a concept and consider only the entities involved in its closure definition (detect cycles!)
- choose the best local Σ set
- iterate the process until convergence
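The first step of the local maximum heuristic, collecting the entities involved in a concept's definition while guarding against cycles, can be sketched as a graph traversal. The `definitions` dictionary (entity name to the entities its definition mentions) is a hypothetical encoding, and the toy ontology is invented.

```python
def definition_closure(concept, definitions):
    """Entities reachable from `concept` through definitions.

    The visited set doubles as cycle detection: an entity already in the
    closure is never expanded again.
    """
    closure = set()
    stack = [concept]
    while stack:
        entity = stack.pop()
        if entity in closure:
            continue  # already expanded, so a cycle cannot loop forever
        closure.add(entity)
        stack.extend(definitions.get(entity, ()))
    return closure


definitions = {
    "Conference": ["location", "Address"],
    "Address": ["city", "country", "Conference"],  # deliberate cycle
    "Paper": ["author"],
}
```

Restricting the candidate mapping rules to the entities in one closure keeps each local search small, even though the full search space grows exponentially with the number of candidate rules.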
17Structural and Semantics-Based Classifier
[Table: possible values for the wop and wQ weights]
19Evaluation
- More and more techniques / tools for aligning ontologies [KW D2.2.3, 2005]
- difficult to compare all the approaches theoretically
- pragmatic approach: evaluation campaigns and contests
- I3CON: based on the NIST Text Retrieval Conference model
- EON: systematic benchmark tests on all OWL constructs
- OAEI: http://oaei.inrialpes.fr
- Alignment API [Euzenat, ISWC 2004]
- common format for representing / exchanging the alignments found
- tools and metrics for evaluating these alignments
20Evaluation EON Contest
- 4 competitors: Karlsruhe, INRIA, Fujitsu, Stanford
- 3 series of tests on bibliographic ontologies
- simple tests: identity, specialization/generalization of the language
- systematic tests: some features of the initial ontology are progressively discarded
- complex tests: aligning 4 real ontologies available on the Web
- Results: 2 groups, but inadequacy / incompleteness of the tests
21Evaluation oMAP and EON
22Evaluation oMAP and OAEI
24Conclusion
- oMAP: a formal framework for automatically aligning OWL ontologies
- Combining several specific classifiers
- terminological classifiers
- machine learning-based classifiers
- structural and semantics-based classifier
- Empirical evaluation on benchmark tests
- using traditional information retrieval metrics
- machine resources, memory, computation time not
yet considered
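The information retrieval metrics mentioned above are usually precision, recall and F-measure, computed by comparing the alignment found with a reference alignment. The sketch below assumes an alignment is represented as a set of (source, target) pairs. The function name and the sample pairs are invented.

```python
def evaluate(found, reference):
    """Precision, recall and F-measure of an alignment against a reference."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Invented alignments, for illustration only.
found = {("Conference", "Meeting"), ("Author", "Writer"), ("Paper", "Address")}
reference = {("Conference", "Meeting"), ("Author", "Writer"),
             ("Paper", "Article"), ("City", "Town")}
precision, recall, f_measure = evaluate(found, reference)
```

Here two of the three found correspondences are correct and two of the four reference correspondences are found.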
25Future Work
- Using additional classifiers
- kNN, KL-distance, WordNet or other terminological resources
- straightforward in theory but difficult in practice
- Finding complex alignments
- e.g. name corresponds to firstName + lastName
- OWL and rule-based languages
- take into account this additional expressivity
26Useful Links
- oMAP: http://homepages.cwi.nl/troncy/oMAP/
- Tutorial: Schema and Ontology Matching at ESWC: http://dit.unitn.it/accord/Presentations/ESWC'05-MatchingHandOuts.pdf
- Alignment API: http://co4.inrialpes.fr/align/align.html
- OAEI: http://oaei.inrialpes.fr/
- State of the Art
- P. Shvaiko and J. Euzenat: A Survey of Schema-based Matching Approaches. Journal on Data Semantics (JoDS), 2005
- KW Consortium: State of the Art on Ontology Alignment. Knowledge Web D2.2.3, 2004