Title: Ontology Merging
- Kyriakos Kritikos (??)
- Miltos Stratakis (MET)
Representation Matching
- The problem of creating semantic mappings between two data representations
- Mapping examples
  - Element location of one representation maps to element address of the other
  - Contact-phone maps to agent-phone
  - Listed-price maps to price * (1 + tax-rate)
- A fundamental step in numerous data management applications
- However, manual semantic mapping has become very labor-intensive, because these applications have expanded rapidly
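A minimal Python sketch of the mapping examples, assuming records are plain dictionaries keyed by the element names from the slide. The function `map_record` and the sample values are hypothetical; it shows two 1-1 mappings and one complex mapping in which a single element is computed from two others.

```python
def map_record(b_record: dict) -> dict:
    """Translate a record from the second representation into the first.

    Element names follow the slide's examples; the function itself is a
    hypothetical illustration, not part of any matching system.
    """
    return {
        "location": b_record["address"],           # 1-1 mapping
        "contact-phone": b_record["agent-phone"],  # 1-1 mapping (synonym)
        # Complex mapping: listed-price = price * (1 + tax-rate)
        "listed-price": round(b_record["price"] * (1 + b_record["tax-rate"]), 2),
    }


listing = {"address": "Athens", "agent-phone": "555-0101",
           "price": 200.0, "tax-rate": 0.24}
print(map_record(listing))
```

Finding the complex mapping is the hard part in practice. Applying it once known is trivial, which is why matching is where the manual effort concentrates.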
Applications of Representation Matching (I)
- Schema integration (early 1980s)
  - Need to merge a set of given schemas into a single global schema
- Data warehousing and data mining (early 1990s)
  - Need to translate data between multiple databases
  - Data coming from multiple sources must be transformed into data conforming to a single target schema
- Knowledge base construction (late 1980s and throughout the 1990s)
  - Used in AI
  - KBs store complex types of entities and relationships, using extended database schemas (ontologies)
  - Requires semantic mappings between the involved ontologies (the ontology matching problem)
Applications of Representation Matching (II)
- Data integration systems (recent years)
  - Provide a uniform query interface to a large number of data sources, enabling users to pose queries against a mediated schema
  - Need a set of semantic mappings between the mediated schema and the local schemas of the data sources
- Peer data management systems (recent years)
  - Allow peers to query and retrieve data directly from each other
  - Need semantic mappings to be created among the peers
Using Ontologies as Representations
- Ontology: an explicit specification of a conceptualization
- Can be used in an integration task to
  - Describe the semantics of the information sources
  - Make their content explicit
  - Identify and associate semantically corresponding information concepts
Content Explication
- Ontologies can be employed for content explication in different ways
- We can identify three different directions
  - Single ontology approaches
  - Multiple ontology approaches
  - Hybrid ontology approaches
Single Ontology Approaches
- Use one global ontology that provides a shared vocabulary for specifying the semantics
- Can be applied to integration problems where all information sources to be integrated provide nearly the same view of a domain
- Not effective if one information source has a different view of the domain
Multiple Ontology Approaches
- Each information source is described by its own ontology
- Each source ontology can be developed without regard to other sources or their ontologies
  - Can simplify the integration task
  - Supports changes to the sources
- Comparing different source ontologies is difficult, due to the lack of a common vocabulary
Hybrid Ontology Approaches
- The semantics of each source is described by its own ontology, but these ontologies are built from a global shared vocabulary to make them comparable
- The shared vocabulary contains the basic terms of a domain, which are combined in the local ontologies to describe more complex semantics
- New sources can easily be added without the need for modification
- However, existing ontologies cannot easily be reused
The Need for Ontology Matching (Integration)
- Semantic Web evolution
  - Requires formal descriptions of parts of our human environment (i.e., descriptions of parts of the real world)
  - These descriptions, in various degrees of formality and specificity, are the ontologies
  - To form a real web of semantics, ontologies from different sources should be linked and related to each other
- Problem: existing ontologies often cannot be reused without considerable effort
- Ontologies need to
  - Be integrated (i.e., merged into a new ontology)
  - Be aligned (i.e., brought into mutual agreement)
Ontology Integration Process
- Consists of three steps
  - Find the places where the ontologies overlap
  - Relate concepts that are semantically close via equivalence and subsumption relations (aligning)
  - Check the consistency, coherence and non-redundancy of the result
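The three steps can be sketched in Python on toy taxonomies, assuming each ontology is a dictionary from class name to its direct superclasses. Everything here is a simplifying assumption: overlap detection is plain case-insensitive name equality, and the alignment (an equivalence plus one subsumption) is supplied by hand as if by an engineer. The check step reports cycles and redundant subclass links.

```python
# Toy ontologies: class name -> set of direct superclasses (hypothetical data).
onto_a = {"Car": {"Vehicle"}, "Vehicle": set(), "Truck": {"Vehicle"}}
onto_b = {"car": {"motor_vehicle"}, "motor_vehicle": set(), "bus": {"motor_vehicle"}}


def find_overlap(a, b):
    """Step 1: naive overlap detection by case-insensitive name equality."""
    b_names = {name.lower(): name for name in b}
    return {(x, b_names[x.lower()]) for x in a if x.lower() in b_names}


def align(a, b, equivalences, subsumptions):
    """Step 2: merge b into a, renaming equivalent classes and adding subsumptions."""
    rename = {b_name: a_name for a_name, b_name in equivalences}
    merged = {cls: set(parents) for cls, parents in a.items()}
    for cls, parents in b.items():
        target = merged.setdefault(rename.get(cls, cls), set())
        target.update(rename.get(p, p) for p in parents)
    for child, parent in subsumptions:
        merged.setdefault(child, set()).add(parent)
        merged.setdefault(parent, set())
    return merged


def ancestors(onto, cls):
    """All direct and indirect superclasses of cls."""
    seen, stack = set(), list(onto.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(onto.get(parent, ()))
    return seen


def check(onto):
    """Step 3: report cycles (incoherence) and redundant subclass links."""
    cycles = [c for c in onto if c in ancestors(onto, c)]
    redundant = [(c, p) for c, parents in onto.items() for p in parents
                 if any(p in ancestors(onto, q) for q in parents if q != p)]
    return cycles, redundant


overlap = find_overlap(onto_a, onto_b)
merged = align(onto_a, onto_b, overlap, [("motor_vehicle", "Vehicle")])
print(check(merged))  # ([], [('Car', 'Vehicle')])
```

The redundancy arises because, after merging, Car is a motor_vehicle, which is already a Vehicle. This is the kind of side effect the third step is meant to catch.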
Technical Problems with Ontology Combination
- The technical problems underlying the difficulties in ontology merging and aligning are
  - The mismatches that may exist between separate ontologies (mismatches between ontologies)
  - The synchronization of changes made to an ontology with the revisions to the applications and data sources that use it (ontology versioning)
Mismatches between Ontologies
- The key type of problem that hinders the combined use of independently developed ontologies
- We distinguish two levels at which these mismatches may appear
  - Language or meta-model level
    - The level of the language primitives used to specify an ontology
    - Mismatches at this level are between the mechanisms for defining classes, relations, etc.
  - Ontology or model level
    - The level of the actual ontology of a domain
    - A mismatch at this level is a difference in the way the domain is modelled
Language-Level Mismatches
- Occur when combining ontologies written in different ontology languages
- We distinguish four types of mismatch at this level
  - Syntax
    - Different ontology languages often use different syntaxes
    - Probably the simplest kind of language-level mismatch
  - Logical representation
    - Different representations of the same logical notions
    - Concerns which language constructs should be used to express something, not whether something can be expressed
  - Semantics of primitives
    - Sometimes the same name is used for a language construct in two languages, but the semantics differ (e.g., when there are several interpretations of A equalTo B)
  - Language expressivity
    - Some languages can express things that are not expressible in other languages (e.g., some languages have constructs to express negation and others do not)
Ontology-Level Mismatches
- Occur when combining two or more ontologies that describe (partly) overlapping domains
- We classify the mismatches at this level into four categories
  - Conceptualization mismatch
    - A difference in the way a domain is interpreted, resulting in different ontological concepts or different relations between those concepts
  - Explication mismatch
    - A difference in the way the conceptualization is specified
  - Terminological mismatch
    - A difference in the way the terms are described
  - Encoding mismatch
    - Values in the ontologies may be encoded in different formats (e.g., a date may be represented as dd/mm/yyyy or as mm-dd-yy)
- Terminological and encoding mismatches can be considered specialized explication mismatches
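The date example of an encoding mismatch can be resolved by a value-level conversion. Below is a minimal sketch using Python's standard `datetime` module. The function names are hypothetical, and it assumes the two encodings are exactly dd/mm/yyyy and mm-dd-yy.

```python
from datetime import datetime


def ddmmyyyy_to_mmddyy(value: str) -> str:
    """Re-encode a date written as dd/mm/yyyy into the mm-dd-yy format."""
    return datetime.strptime(value, "%d/%m/%Y").strftime("%m-%d-%y")


def mmddyy_to_ddmmyyyy(value: str) -> str:
    """Reverse conversion; the century has to be guessed from the two-digit year."""
    return datetime.strptime(value, "%m-%d-%y").strftime("%d/%m/%Y")


print(ddmmyyyy_to_mmddyy("25/12/2003"))  # 12-25-03
print(mmddyy_to_ddmmyyyy("12-25-03"))    # 25/12/2003
```

The reverse direction is lossy. Python's `%y` maps two-digit years 69–99 to 1969–1999 and 00–68 to 2000–2068, so resolving an encoding mismatch can require a domain assumption as well as a format conversion.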
Conceptualization Mismatches
- We distinguish two types of these mismatches
  - Scope
    - Two classes seem to represent the same concept but do not have exactly the same instances (e.g., several administrations use slightly different concepts of employee)
  - Model coverage and granularity
    - The mismatch lies in the part of the domain covered by the ontology, or in the level of detail to which that domain is modelled
    - For example, one ontology might model cars but not trucks, another might represent trucks but classify them into only a few categories, while a third might make very fine-grained distinctions between types of trucks based on their general physical structure, weight, etc.
Explication Mismatches
- We distinguish two types of these mismatches, both concerning modelling style
  - Paradigm
    - Different paradigms can be used to represent concepts such as time, action, plans, etc.
    - For example, using different top-level ontologies is a mismatch of this type
  - Concept description
    - Several choices can be made when modelling concepts in the ontology
    - For example, consider where the distinction between scientific and non-scientific publications is made
    - A dissertation can be modelled as dissertation < book < scientific publication < publication, or as dissertation < scientific book < book < publication
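The dissertation example can be made concrete by encoding both modelling choices as child-to-parent links and walking up the hierarchy. This is a hypothetical toy encoding with single inheritance, intended only to show where the two models agree and where they diverge.

```python
# The two modelling choices, as child -> parent links (single inheritance).
model_1 = {"dissertation": "book", "book": "scientific publication",
           "scientific publication": "publication"}
model_2 = {"dissertation": "scientific book", "scientific book": "book",
           "book": "publication"}


def ancestors(hierarchy, concept):
    """Walk up the parent links and return the chain of superconcepts."""
    chain = []
    while concept in hierarchy:
        concept = hierarchy[concept]
        chain.append(concept)
    return chain


# Both models agree that a dissertation is a publication...
for model in (model_1, model_2):
    assert "publication" in ancestors(model, "dissertation")
# ...but only model 1 makes every book a scientific publication.
print(ancestors(model_1, "book"))  # ['scientific publication', 'publication']
print(ancestors(model_2, "book"))  # ['publication']
```

A naive name-based alignment of "book" across the two models would therefore change which instances count as scientific publications.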
Terminological Mismatches
- We distinguish two kinds of terms in which these mismatches can occur
  - Synonym terms
    - The same concept can be represented by different names
    - For example, one ontology may use the term car and another the term automobile
  - Homonym terms
    - The meaning of a term can differ in another context
    - For example, the term conductor has a different meaning in the music domain than in the electrical engineering domain
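The following sketch shows how the two cases pull in opposite directions. Synonyms need different names collapsed onto one concept. Homonyms need one name split by context. The synonym table, the domain-qualified sense table and the glosses are all hypothetical.

```python
# Hypothetical synonym table: local term -> canonical term.
SYNONYMS = {"automobile": "car"}

# Homonyms are disambiguated by qualifying the term with its domain (hypothetical glosses).
SENSES = {
    ("music", "conductor"): "person who directs an orchestra",
    ("electrical engineering", "conductor"): "material that conducts electric current",
}


def canonical(term: str) -> str:
    return SYNONYMS.get(term.lower(), term.lower())


def same_concept(term_a: str, term_b: str) -> bool:
    """Synonyms: different names, same concept."""
    return canonical(term_a) == canonical(term_b)


def sense(domain: str, term: str) -> str:
    """Homonyms: the same name needs its context to fix the meaning."""
    return SENSES[(domain, term)]


print(same_concept("Car", "automobile"))  # True
print(sense("music", "conductor") == sense("electrical engineering", "conductor"))  # False
```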
Ontology Versioning
- In an open domain, changes to the ontologies used are unavoidable, so it becomes very important to keep track of these changes
- Although the problem is introduced by successive changes to one specific ontology, the most important problems are caused by the dependencies on that ontology
- A versioning scheme should pay attention to the following aspects
  - The relation between successive revisions of one ontology
  - The relation between the ontology and its dependencies
    - Instance data that conforms to the ontology
    - Other ontologies that are built from or import the ontology
    - Applications that use the ontology
Versioning Scheme Requirements
- Identification
  - For every use of a concept or relation, a versioning framework should provide an unambiguous reference to the intended definition
- Change tracking
  - A versioning framework should make explicit the relation of one version of a concept or relation to other versions of that construct
- Transparent translation
  - A versioning framework should, as far as possible, automatically perform conversions from one version to another, to enable transparent access
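A minimal Python sketch of the three requirements, under simplifying assumptions. `ConceptRef` qualifies every concept name with ontology and version (identification). A successor table records how a concept evolved (change tracking). `translate` follows that table (transparent translation). The "hr" ontology, its versions and the renamings are hypothetical. In this sketch even unchanged concepts would need an explicit successor entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptRef:
    """Identification: a concept name is always qualified by ontology and version."""
    ontology: str
    version: str
    name: str


# Change tracking: each entry links a concept to its successor in the next version.
SUCCESSOR = {
    ConceptRef("hr", "1.0", "Employee"): ConceptRef("hr", "2.0", "Staff"),
    ConceptRef("hr", "2.0", "Staff"): ConceptRef("hr", "3.0", "StaffMember"),
}


def translate(ref: ConceptRef, target_version: str) -> ConceptRef:
    """Transparent translation: follow recorded successors up to target_version."""
    while ref.version != target_version:
        if ref not in SUCCESSOR:
            raise LookupError(f"no recorded path from {ref} to version {target_version}")
        ref = SUCCESSOR[ref]
    return ref


print(translate(ConceptRef("hr", "1.0", "Employee"), "3.0"))
```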
Practical Problems with Ontology Combination
- Finding alignments
  - It is difficult to find the terms that need to be aligned
- Diagnosis
  - The consequences of a specific mapping (unforeseen implications) are difficult to see
- Repeatability of merges
  - The sources used for merging continue to evolve
  - The alignments created for a merge should be as reusable as possible when merging the revised ontologies
  - Very important in the context of ontology maintenance
Problems Overview
Super-imposed Metamodel
- Transforms information between representations
- Approach
  - Represent information from different models in a uniform way
  - Provide a mapping formalism
- Technique
  - Ontology languages are represented in a meta-model through RDF triples
  - Mappings are specified by production rules over RDF triples
- Advantages
  - Mapping rules provide integration at both schema and instance level
- Disadvantages
  - Handles language mismatches, but not expressivity mismatches
  - Mappings are specified manually
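Production rules over RDF triples can be illustrated with plain Python tuples standing in for triples. Each rule fires on a matching source triple and emits target triples. The "src:" vocabulary and the rules are hypothetical, and no RDF library is used. The example maps both schema-level facts (a frame and its slot) and an instance-level fact.

```python
# Source-language facts as RDF-style triples; the "src:" vocabulary is hypothetical.
source = {
    ("Person", "rdf:type", "src:Frame"),    # schema level
    ("Person", "src:hasSlot", "name"),      # schema level
    ("alice", "src:instanceOf", "Person"),  # instance level
}


def frame_to_class(s, p, o):
    return [(s, "rdf:type", "rdfs:Class")] if (p, o) == ("rdf:type", "src:Frame") else []


def slot_to_property(s, p, o):
    if p == "src:hasSlot":
        return [(o, "rdf:type", "rdf:Property"), (o, "rdfs:domain", s)]
    return []


def instance_of(s, p, o):
    return [(s, "rdf:type", o)] if p == "src:instanceOf" else []


# Production rules: each fires on a matching triple and produces target triples.
RULES = [frame_to_class, slot_to_property, instance_of]


def apply_rules(triples):
    return {new for t in triples for rule in RULES for new in rule(*t)}


for triple in sorted(apply_rules(source)):
    print(triple)
```

As the slide notes, each rule here is written by hand. Nothing in the approach discovers them automatically.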
OKBC (Open Knowledge Base Connectivity)
- A generic interface to knowledge representation systems (KRSs)
- A KR language is mapped to the OKBC Knowledge Model (KM)
- Advantages
  - Interoperability is achieved at the level of the OKBC KM
  - Solves language mismatches, but not expressivity mismatches
- Disadvantages
  - Notions requiring a higher level of expressivity are lost
  - Does not express terminological axioms such as covering, disjointness, partition and exclusion
OntoMorph (I)
- A transformation system for symbolic knowledge
- Facilitates
  - Ontology merging
  - Rapid generation of KB translators
- Provides two mechanisms
  - Syntactic rewriting via pattern-directed rewrite rules
  - Semantic rewriting, which modulates
    - syntactic rewriting via semantic models
    - logical inference via an integrated KR system
- The OntoMorph architecture facilitates incremental development and scripted replay of transforms
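The following sketch shows pattern-directed syntactic rewriting over s-expressions, represented as nested Python tuples. Symbols starting with `?` are pattern variables. The rule syntax and the `defconcept` / `subclass-of` forms are hypothetical stand-ins; OntoMorph's own rewrite language is not reproduced here. Semantic rewriting is not shown.

```python
def match(pattern, expr, bindings):
    """Match expr against pattern; strings starting with '?' are variables."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == expr else None
        return {**bindings, pattern: expr}
    if isinstance(pattern, tuple) and isinstance(expr, tuple) and len(pattern) == len(expr):
        for p, e in zip(pattern, expr):
            bindings = match(p, e, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == expr else None


def substitute(template, bindings):
    if isinstance(template, tuple):
        return tuple(substitute(t, bindings) for t in template)
    return bindings.get(template, template)


def rewrite(expr, rules):
    """Rewrite top-down: apply the first matching rule, then recurse into the result."""
    for lhs, rhs in rules:
        bindings = match(lhs, expr, {})
        if bindings is not None:
            expr = substitute(rhs, bindings)
            break
    if isinstance(expr, tuple):
        return tuple(rewrite(e, rules) for e in expr)
    return expr


RULES = [(("defconcept", "?c", ("is-a", "?p")), ("subclass-of", "?c", "?p"))]
kb = ("kb", ("defconcept", "Truck", ("is-a", "Vehicle")),
      ("defconcept", "Car", ("is-a", "Vehicle")))
print(rewrite(kb, RULES))
```

Because rules are data, a list like `RULES` can be stored and replayed against a revised source. This loosely mirrors the scripted replay of transforms mentioned above.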
OntoMorph (II)
- Focuses on aligning ontologies through three steps
  - (1) Design transforms that bring the sources into mutual agreement
  - (2) Edit the sources to carry out the transforms
  - (3) Take the union of the morphed sources
- Steps
  - Step 2 is facilitated by transforming the ontologies into a common format
  - Step 1 is less automatable and involves human negotiation
- Advantages
  - Handles language mismatches, but not expressivity mismatches
  - Handles ontology-level mismatches, but not model coverage mismatches
  - Repeatability
- Disadvantages
  - Transforms are expressed manually
  - Merging is not addressed at all
Scalable Knowledge Composition
- Developed an algebra for ontology composition that
  - Operates on directed labelled graphs such as ontologies
  - Each operator takes a graph of semi-structured data as input and transforms it into a graph (composable)
  - Operations are knowledge-driven, using articulation rules that are
    - Logical rules (semantic implication between terms)
    - Functional rules (conversion between terms across ontologies)
- The intersection operator produces an articulation ontology containing the related terms and their relations
- Advantages
  - Solves conceptualization and terminological mismatches
  - Rules are expressed by the engineer and by lexical knowledge
  - Repeatability
- Disadvantages
  - Most rules are specified manually
  - No support for merging
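A rough Python sketch of the intersection operator, assuming ontologies are sets of labelled edges and articulation rules are triples over namespaced terms. The two ontologies and both rules are hypothetical. The result keeps the rules plus the source edges whose endpoints are both articulated terms, and nothing else. This is a simplification of the algebra, not its definition.

```python
# Toy ontologies as sets of labelled edges (source, label, target); data is hypothetical.
carrier = {("Car", "subClassOf", "Vehicle"), ("Car", "hasPrice", "DollarPrice")}
factory = {("Automobile", "subClassOf", "Product"), ("Automobile", "hasCost", "EuroPrice")}

# Articulation rules between namespaced terms.
rules = [
    ("carrier:Car", "implies", "factory:Automobile"),            # logical rule
    ("carrier:DollarPrice", "convertsTo", "factory:EuroPrice"),  # functional rule
]


def intersection(onto1, name1, onto2, name2, rules):
    """Articulation ontology: the rules plus source edges among the articulated terms."""
    terms = {t for a, _, b in rules for t in (a, b)}
    articulation = set(rules)
    for name, onto in ((name1, onto1), (name2, onto2)):
        for src, label, dst in onto:
            s, d = f"{name}:{src}", f"{name}:{dst}"
            if s in terms and d in terms:
                articulation.add((s, label, d))
    return articulation


for edge in sorted(intersection(carrier, "carrier", factory, "factory", rules)):
    print(edge)
```

Since the output is again a set of labelled edges, it can be fed to further operators, which is the composability property on the slide.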
Chimaera (I)
- Chimaera is an ontology merging and diagnosis tool
- Supports ontology browsing and editing
- It is targeted at lightweight ontologies
- Supports two merging tasks
  - Joining two similar terms under the same name
  - Identifying terms that should be related by subsumption, disjointness or instance relations, and supporting the introduction of these relations
- Chimaera also uses heuristics to generate
  - Name resolution lists of related terms
  - Taxonomy resolution lists, which suggest taxonomy areas for reorganization
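The sketch below builds a name resolution list of candidate term pairs, ranked by name similarity. As a stand-in heuristic it uses `difflib.SequenceMatcher.ratio` from the Python standard library on normalised names. Chimaera's own heuristics are not reproduced, and the 0.7 threshold and the term lists are arbitrary assumptions.

```python
from difflib import SequenceMatcher


def normalise(term: str) -> str:
    return term.lower().replace("_", "").replace("-", "")


def name_resolution_list(terms_a, terms_b, threshold=0.7):
    """Candidate pairs of related terms, most similar first (stand-in heuristic)."""
    candidates = []
    for a in terms_a:
        for b in terms_b:
            score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
            if score >= threshold:
                candidates.append((round(score, 2), a, b))
    return sorted(candidates, reverse=True)


pairs = name_resolution_list(["Car", "Motor_Vehicle", "Person"],
                             ["car", "MotorVehicle", "Human"])
for score, a, b in pairs:
    print(score, a, b)
```

Person/Human falls below the threshold. Synonyms with unrelated spellings escape a purely lexical heuristic, which is one reason the user stays in the loop.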
Chimaera (II)
- Has diagnostic support for
  - Verifying
  - Validating
  - Critiquing ontologies
- Advantages
  - Solves mismatches at the terminological and concept-scope level
  - Helps alignment by providing possible edit points
  - Diagnosis of the merging process
- Disadvantages
  - Not automatic: everything requires user interaction
  - No repeatability
  - Uses only local context for edit points
Prompt
- Prompt is an interactive ontology-merging tool
- Guides the user by
  - Making suggestions based on linguistic-similarity matches and syntactic clues
  - Detecting conflicts introduced by carrying out a suggestion
  - Proposing conflict-resolution strategies
- For every operation it populates three sets
  - Changes performed automatically
  - New suggestions for the user
  - Conflicts introduced, such as name conflicts, dangling references, redundancy in the class hierarchy and inconsistencies
- Prompt points to the places requiring change and proposes new actions for each of them
- Its advantages and disadvantages are the same as Chimaera's, but it supports repeatability
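The "three sets per operation" idea can be sketched for a single merge-classes operation on a toy ontology of the form `{class: {"parents": set}}`. The data structure, the suggestion rule (propose merging the combined superclasses) and the example are hypothetical illustrations, not Prompt's actual behaviour. In the example, the superclass Machine of Automobile is assumed not to have been copied from its source ontology, so the merge surfaces a dangling reference.

```python
from dataclasses import dataclass, field


@dataclass
class OperationResult:
    changes: list = field(default_factory=list)      # performed automatically
    suggestions: list = field(default_factory=list)  # offered to the user
    conflicts: list = field(default_factory=list)    # introduced by the operation


def merge_classes(onto, a, b, new_name):
    """Merge classes a and b of a toy ontology {class: {"parents": set}}."""
    result = OperationResult()
    if new_name in onto and new_name not in (a, b):
        result.conflicts.append(("name conflict", new_name))
        return result
    parents = onto.pop(a)["parents"] | onto.pop(b)["parents"]
    onto[new_name] = {"parents": parents}
    result.changes.append(("merged", a, b, new_name))
    # Automatic change: redirect references to a or b towards the merged class.
    for cls, info in onto.items():
        if info["parents"] & {a, b}:
            info["parents"] = (info["parents"] - {a, b}) | {new_name}
            result.changes.append(("redirected", cls))
    # Hypothetical suggestion rule: multiple superclasses may also be merge candidates.
    if len(parents) > 1:
        result.suggestions.append(("consider merging", tuple(sorted(parents))))
    # Conflict check: superclasses missing from the ontology (dangling references).
    known = set(onto)
    for cls, info in onto.items():
        for missing in sorted(info["parents"] - known):
            result.conflicts.append(("dangling reference", cls, missing))
    return result


ontology = {
    "Vehicle": {"parents": set()},
    "Car": {"parents": {"Vehicle"}},
    "Automobile": {"parents": {"Machine"}},  # "Machine" not copied from its source
    "SportsCar": {"parents": {"Automobile"}},
}
print(merge_classes(ontology, "Car", "Automobile", "Car"))
```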
FCA-Merge (I)
- FCA-Merge
  - A bottom-up approach to ontology merging
  - Offers a global structural description of the merge process
  - Its mechanism is based on the instances of the two ontologies
- The merge process consists of three steps
  - Instance extraction using natural language processing techniques, and computation of two formal contexts based on the extracted instances
  - Derivation of a common context and computation of a pruned concept lattice using the mathematical techniques of Formal Concept Analysis (FCA)
  - Generation of the merged ontology from the concept lattice, with the help of the engineer and OntoEdit
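The core FCA step can be sketched on a tiny merged formal context. Documents are objects, and concepts of the two source ontologies are attributes. The context below is hypothetical. The enumeration is the naive one: every formal concept is (B', B'') for some attribute set B, which is exponential in the number of attributes. FCA-Merge's pruning and its NLP-based instance extraction are omitted.

```python
from itertools import combinations

# Hypothetical merged formal context: which documents (objects) mention
# which concepts of source ontologies A and B (attributes).
context = {
    "doc1": {"A:Hotel", "B:Accommodation"},
    "doc2": {"A:Hotel", "B:Accommodation"},
    "doc3": {"A:Event", "B:Concert"},
    "doc4": {"A:Event", "B:Event"},
}
attributes = sorted(set().union(*context.values()))


def extent(attrs):
    """Objects having all the given attributes."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(objs):
    """Attributes shared by all the given objects (all attributes if objs is empty)."""
    return frozenset(set(attributes).intersection(*(context[o] for o in objs)))


def formal_concepts():
    """Naively enumerate all formal concepts as (B', B'') for every attribute set B."""
    concepts = set()
    for r in range(len(attributes) + 1):
        for attrs in combinations(attributes, r):
            objs = extent(set(attrs))
            concepts.add((objs, intent(objs)))
    return concepts


for objs, attrs in sorted(formal_concepts(), key=lambda c: -len(c[0])):
    print(sorted(objs), sorted(attrs))
```

A:Hotel and B:Accommodation always share an extent, so they end up in the same lattice node. The engineer would then consider merging them.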
FCA-Merge (II)
- Restrictions
  - Input documents should be domain-specific
  - The documents should cover all concepts of the source ontologies
  - The documents must separate the concepts well enough; if the method does not separate the concepts properly, the engineer should provide more and better documents
- Advantages
  - Handles terminological and concept-scope mismatches
  - Finding alignments with the help of the lattice
  - Diagnosis of results using OntoEdit
  - Repeatability, by storing the pruned concept lattice
GLUE (I)
- Applies machine learning techniques for alignment
- Three main points
  - Computation of the joint probability distribution (JPD) of every pair of concepts involved; in this way
    - Any similarity measure can be computed from the JPD
    - The approach is applicable to a broad range of ontology-matching problems
  - Multi-strategy learning for computing the JPD; in this way
    - Many types of information can be used to maximize matching accuracy
    - The system is extensible to new learners
  - Exploits domain constraints and general heuristics to maximize matching accuracy, using relaxation labeling
- The process consists of three main steps, performed by the automatable components Distribution Estimator, Similarity Estimator and Relaxation Labeler
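The sketch below computes a similarity measure solely from the joint distribution. It uses the Jaccard coefficient P(A ∧ B) / P(A ∨ B). The simplifying assumption is that membership of a shared instance set in concepts A and B is known directly. In GLUE, these memberships are predicted by learners over instances from both ontologies. Names and data are hypothetical.

```python
from fractions import Fraction


def joint_distribution(instances, in_a, in_b):
    """Estimate P(A,B), P(A,~B), P(~A,B), P(~A,~B) by counting instances."""
    counts = {(True, True): 0, (True, False): 0, (False, True): 0, (False, False): 0}
    for x in instances:
        counts[(x in in_a, x in in_b)] += 1
    return {cell: Fraction(c, len(instances)) for cell, c in counts.items()}


def jaccard(jpd):
    """Jaccard similarity P(A and B) / P(A or B), computed only from the JPD."""
    union = jpd[(True, True)] + jpd[(True, False)] + jpd[(False, True)]
    return jpd[(True, True)] / union if union else Fraction(0)


people = ["ann", "bob", "cat", "dan", "eve"]
jpd = joint_distribution(people, in_a={"ann", "bob", "cat"}, in_b={"bob", "cat", "dan"})
print(jaccard(jpd))  # 1/2
```

Swapping in another measure only requires a different function over the same four probabilities. This is the flexibility the first point on the slide refers to.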
GLUE (II)
- Restrictions
  - Only 1-1 mappings between concepts
  - Some nodes are not matched because of insufficient training data
  - The base learners are implemented as general-purpose text classifiers
  - Some nodes are not matched because they are ambiguous; user interaction is then needed
  - Some pairs of nodes should not be examined at all
- Advantages
  - Handles local concept-scope mismatches and proper classification
  - Finding alignments and repeatability are automatic
  - Encoding differences are solved by adding an appropriate learner
Anchor-Prompt (I)
- Takes as input a set of pairs of related terms (anchors), provided by the user or by heuristics
- Its algorithm analyzes the paths in the ontology sub-graph delimited by the anchors and determines which classes frequently appear in similar positions on similar paths
- Extends the approaches used in Prompt
- It is implemented on top of the OKBC protocol
- It finds only 1-1 mappings between concepts
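The path-based idea can be sketched in Python. For each pair of anchors, enumerate cycle-free paths between them in both ontology graphs. For equal-length paths, score the class pairs found at the same inner position. This is a simplification: only equal-length paths are compared, and Anchor-Prompt's further refinements are omitted. The graphs, anchors and `max_len` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def simple_paths(graph, start, goal, max_len):
    """All cycle-free paths from start to goal with at most max_len edges."""
    found, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == goal:
            found.append(path)
            continue
        if len(path) - 1 < max_len:
            for nxt in graph.get(path[-1], ()):
                if nxt not in path:
                    stack.append(path + [nxt])
    return found


def anchor_scores(g1, g2, anchors, max_len=3):
    """Score class pairs that occupy the same position on equal-length paths between anchors."""
    scores = Counter()
    for (a1, b1), (a2, b2) in combinations(anchors, 2):
        for p in simple_paths(g1, a1, a2, max_len):
            for q in simple_paths(g2, b1, b2, max_len):
                if len(p) == len(q):
                    for x, y in zip(p[1:-1], q[1:-1]):
                        scores[(x, y)] += 1
    return scores


g1 = {"Article": ["Writer"], "Writer": ["Institution"]}
g2 = {"Paper": ["Author", "Venue"], "Author": ["Organization"],
      "Venue": ["Publisher"], "Publisher": ["Organization"]}
anchors = [("Article", "Paper"), ("Institution", "Organization")]
print(anchor_scores(g1, g2, anchors))
```

Writer and Author sit in the same position between the anchors, so they score. The longer Paper–Venue–Publisher path has no counterpart of equal length and contributes nothing.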
Anchor-Prompt (II)
- Limitations
  - Very long paths do not produce accurate results
  - Path length = 0 (Chimaera), path length = 1 (Prompt)
  - Incidental matches can be produced (similarity limit)
  - When comparing a deep ontology with many slots to a shallow ontology whose slots relate only the top-level classes, the results are the same as Prompt's
- Advantages
  - Concept scope mismatches are dealt with
  - Finding alignments and repeatability are automatic tasks
SHOE
- An HTML-based ontology language
- Provides a rule mechanism for alignment
  - Common items are mapped by inference rules
  - Terminological differences are mapped by if-and-only-if rules
  - Scope differences require mapping categories where one subsumes the other
  - Encoding differences are handled by mapping individual values
- Assigns version numbers to ontologies, facilitating both the identification of revisions and the explicit specification of their relation to other revisions (change tracking)
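The difference between the if-and-only-if rules and the one-directional subsumption rules can be shown with a small forward-chaining loop over category-membership facts. SHOE's own HTML-based rule syntax is not reproduced. The categories, instances and rules are hypothetical, and value-level encoding mappings are left out.

```python
# Hypothetical category-membership facts (category, instance) from ontologies a and b.
facts = {("a:Automobile", "herbie"), ("b:Car", "kitt"), ("a:Student", "ann")}

# Terminological difference: a:Automobile if and only if b:Car.
IFF_RULES = [("a:Automobile", "b:Car")]
# Scope difference: every a:Student is a b:Person, but not vice versa.
SUBSUMPTION_RULES = [("a:Student", "b:Person")]


def infer(facts):
    """Forward-chain the alignment rules until no new facts are derived."""
    derived = set(facts)
    while True:
        new = set()
        for category, instance in derived:
            for x, y in IFF_RULES:
                if category == x:
                    new.add((y, instance))
                elif category == y:
                    new.add((x, instance))
            for sub, sup in SUBSUMPTION_RULES:
                if category == sub:
                    new.add((sup, instance))
        if new <= derived:
            return derived
        derived |= new


for fact in sorted(infer(facts)):
    print(fact)
```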
Conclusions (I)
- We identified four different approaches that handle interoperability at the language level
  - Aligning the meta-model
  - Layered interoperability
  - Transformation rules
  - Mapping onto a common knowledge model
- We found tools that suggest alignments and mappings using heuristics; there are two types of heuristics
  - Linguistic-based matches (FCA-Merge)
  - Structural and model similarity (Chimaera and Prompt)
Conclusions (II)
- We found tools that semi-automate or fully automate the merging process, though only with 1-1 mappings between concepts, using different techniques
  - Computation of a pruned concept lattice, using linguistic and FCA techniques (FCA-Merge)
  - Machine learning techniques (GLUE)
  - Using global instead of local context (Anchor-Prompt)
- Interoperability at the model level can be achieved through a common top-level ontology that conforms to a common standard
Conclusions (III)
- There are different approaches to diagnosing or checking the results of alignments
  - Domain-independent verification and validation checks: name conflicts, dangling references, etc.
  - Validation that requires reasoning: redundancy in the class hierarchy, violated value restrictions, etc.
- Several tools support an executable specification of mappings and transforms (SKC, OntoMorph, Prompt, FCA-Merge, GLUE, Anchor-Prompt)
- Most techniques and tools do not deal with versioning