Title: Composing Mappings between Schemas using a Reference Ontology
1 Composing Mappings between Schemas using a Reference Ontology
Eduard Dragut, Ramon Lawrence
Iowa Database and Emerging Applications (IDEA) Laboratory
University of Iowa
eduard-dragut, ramon-lawrence_at_uiowa.edu
2 Outline
- Motivation
- Integration Approach
- Background
- Architecture Overview
- Ontological Matching
- Composing Mappings
- Global View Construction
- Experimental Results
- Future Work and Conclusions
3 Motivation
- Many organizations have pre-existing ontologies that are not suitable as global views but are suitable as reference ontologies to aid integration.
- Example: The National Cancer Institute (NCI) and National Institutes of Health (NIH) have the caBIG grid prototype, which standardizes terminology (EVS, caDSR) and data elements in the cancer domain.
- Schema-to-ontology matching requires that integrators understand only their own schema instead of all schemas that they may want to integrate.
4 Integration Approach
[Figure: Reference Ontology]
Composing Mappings between Schemas using a
Reference Ontology - ODBASE04 - Eduard Dragut,
Ramon Lawrence
5 Background: Ontologies and Integration
- Ontologies as the integrated, global view
- Carnot project (Collet91) with Cyc ontology (Lenat90)
- ONTOBROKER (Decker98), OBSERVER (Mena00)
- Tools for semi-automatically merging ontologies
- PROMPT (Noy00), Ontobuilder (Gal04)
- Use ontologies as matching/integration aids
- MOMIS (Beneventano03) using WordNet
- Indirect (Xu03), CUPID (Madhavan01), COMA (Do02)
- Matching ontologies (Doan02)
- Discovering ontologies (Madhavan03)
- Corpus-based matching
6 Background: Model Management
- Model management, as proposed by (Bernstein03), is intended to allow high-level schema operations.
- Operators include Invert, Compose, Match, and Merge.
- Warning: The semantics of some operators are not yet fully defined, and some operators are not completely automatic.
- Definitions:
  - A match is a semantic correspondence between schema elements.
  - A mapping between schema elements is an expression that relates the elements.
- Note that most schema matching systems, such as COMA, produce matches, not mappings.
7 Architecture Overview
- We assume a pre-existing reference ontology that has been accepted in a domain.
- The ontology is NOT a global view and may not cover the information in all schemas. It cannot be edited.
- Global view construction is a 3-step process:
  - 1) Independently match each schema to the ontology.
  - 2) Compose schema-to-ontology matches to produce schema-to-schema mappings.
  - 3) Merge the schema mappings to produce the global view.
- The challenge is to automate this as much as possible.
8 Benefits of Approach
- Even with manual integration, there are several benefits to using a reference ontology:
  - 1) An integrator must understand only their own schema and the ontology, not the other schemas to be integrated.
  - 2) Most validation is performed once, during schema-to-ontology matching, and not for every schema integrated.
  - 3) Schema-to-ontology matchings can be re-used every time a new schema is integrated into the federation.
- Automation can:
  - 1) Help construct schema-to-ontology matchings.
  - 2) Perform composition of mappings.
  - 3) Build a global view from the composed mappings.
9 Automation Challenges
- There are several challenges in automating this process:
  - 1) Schema matching systems such as COMA are designed for simpler relational schemas. Ontologies must be mapped into a suitable format for use with COMA.
  - 2) Schema-to-ontology matching is less accurate due to the more complicated ontological structure and because the ontology may not model the entire domain or may model it differently.
  - 3) Composing matchings often results in many false matches, which must be handled.
  - 4) A method for merging schemas using model management primitive operators is required.
- Even with these operators, Merge is not fully automatic.
10 Background: COMA
- COMA (Do02) is a schema matching system that can flexibly combine different match algorithms and re-use match results.
- Match algorithms use names, paths, and schema properties in various ways.
- The mapping format between two schemas R and S is a triple (r, s, v), where r is in R, s is in S, and v is the similarity value between 0 and 1 for elements r and s.
- A schema in COMA is represented as a rooted directed acyclic graph. Schema elements are nodes, which may be connected by links of different types.
11 Ontological Matching
- The first step is to convert ontologies in OWL/DAML format into COMA's graph representation format.
- We wrote a conversion program that uses the JENA parser.
- During the conversion, we:
  - 1) Explicitly converted each named relationship in the ontology into a node and several edges in the graph.
  - 2) Explicitly encoded attributes inherited over IS-A links, since COMA does not support IS-A.
- After conversion, COMA automatically produces a schema-to-ontology match, because to COMA it appears to be matching two relational schemas.
12 Converting Ontology to a Graph
13 Ontological Matching: Max versus noMax
- One challenge is deciding what this match should look like.
- Two choices:
  - 1) Max - For each schema element, keep the best match with the ontology (if any).
  - 2) NoMax - For each schema element, keep all the matches that are above the cutoff threshold.
- Since Max generates at most one match per schema element, it is probably the best choice in semi-automated settings. NoMax will generate many matches, which must be filtered out by the user or during composition.
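The two selection strategies can be sketched in a few lines of Python. This is only an illustration, not the COMA implementation: the `Match` tuple, the function names, the threshold of 0.5, and all candidate values other than the postalCode/Zip pair are assumptions made for the example.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """One (schema element, ontology element, similarity) triple."""
    schema_elem: str
    onto_elem: str
    sim: float


def no_max(matches, threshold):
    """NoMax: keep every match whose similarity is above the cutoff."""
    return [m for m in matches if m.sim > threshold]


def max_only(matches, threshold):
    """Max: keep only the best above-threshold match per schema element."""
    best = {}
    for m in no_max(matches, threshold):
        current = best.get(m.schema_elem)
        if current is None or m.sim > current.sim:
            best[m.schema_elem] = m
    return list(best.values())


# Hypothetical candidate matches between a schema and the ontology.
candidates = [
    Match("postalCode", "Zip", 0.8),
    Match("postalCode", "City", 0.55),
    Match("street1", "Street", 0.6),
    Match("buyer", "Customer", 0.3),
]

print(no_max(candidates, 0.5))    # three matches survive the cutoff
print(max_only(candidates, 0.5))  # at most one match per schema element
```

With NoMax, the weaker postalCode/City candidate is carried forward and has to be filtered later; with Max it is dropped immediately.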
14 Composing Mappings
- Schema-to-ontology mappings must be composed to produce direct schema-to-schema mappings.
- Since mappings carry no semantics, two objects are assumed to be identical if they map to the same ontological concept. Composition is performed transitively and is implemented using a natural join.
- That is, if element r is similar to o and o is similar to s, then we assume that r is similar to s.
- For example:
  - <postalCode, Zip, 0.8> and <Zip, postCode, 0.7> can be composed to yield <postalCode, postCode, 0.75>.
- The similarity values may be combined using various functions, although average is the most common.
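A minimal sketch of this composition, assuming matches are stored as plain (element, ontology concept, similarity) tuples. The function name `compose` and its `combine` parameter are illustrative, not part of COMA. Both inputs are schema-to-ontology lists, so joining them on the ontology concept is equivalent to inverting the second list and then performing the natural join.

```python
from statistics import mean


def compose(s1_to_o, s2_to_o, combine=mean):
    """Join two schema-to-ontology match lists on the shared ontology concept."""
    # Index the second schema's matches by ontology concept.
    by_concept = {}
    for s, o, v in s2_to_o:
        by_concept.setdefault(o, []).append((s, v))

    # Every pair of elements that reach the same concept becomes a match.
    composed = []
    for r, o, v1 in s1_to_o:
        for s, v2 in by_concept.get(o, []):
            composed.append((r, s, combine([v1, v2])))
    return composed


s1_to_o = [("postalCode", "Zip", 0.8)]
s2_to_o = [("postCode", "Zip", 0.7)]
result = compose(s1_to_o, s2_to_o)
print(result)
```

Passing `combine=min` or `combine=max` instead of the default average gives a more pessimistic or more optimistic similarity for the composed match.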
15 Composition Example
16 Global View Construction
- One possible application of schema-to-schema mappings constructed in this way is building a global view.
- The paper gives a script that uses model management operators to compose any number of schema-to-ontology mappings into a single global view for all sources.
- Note that this algorithm is neither perfect nor fully automatic, as the mappings are not perfect and the Merge operator may require human intervention.
17 Global View Construction Example
18 Experimental Setup
- Matched the 5 sample order schemas (CIDR, Excel, Noris, Paragon, and Apertum) used to evaluate COMA. Numbered these schemas 1, 2, 3, 4, and 5.
- Created a reference ontology that models some of the domain (but not all of it) and is quite different from the schemas (it uses IS-A, for example).
- Used the matchings specified with COMA as ground truth.
- Evaluation metrics:
  - Precision = # of correct matches / # of suggested matches
  - Recall = # of correct matches returned / # of total matches
  - Overall = Recall * (2 - 1 / Precision)
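The three metrics can be computed from a set of suggested matches and a ground-truth set, reading the Overall metric as Recall * (2 - 1 / Precision). The sketch below is illustrative only: the `evaluate` function and the element pairs are made up for the example and are not taken from the experiments.

```python
def evaluate(suggested, truth):
    """Return (precision, recall, overall) for a set of suggested matches."""
    suggested, truth = set(suggested), set(truth)
    correct = len(suggested & truth)
    precision = correct / len(suggested) if suggested else 0.0
    recall = correct / len(truth) if truth else 0.0
    # Overall is undefined at zero precision; it is negative below 0.5.
    overall = recall * (2 - 1 / precision) if precision > 0 else None
    return precision, recall, overall


# Hypothetical data: 4 suggestions, 3 of them correct, 6 true matches.
truth = {("postalCode", "postCode"), ("city", "town"), ("street1", "road"),
         ("country", "nation"), ("telephone", "phone"), ("e-mail", "email")}
suggested = {("postalCode", "postCode"), ("city", "town"),
             ("street1", "road"), ("buyer", "vendor")}

p, r, o = evaluate(suggested, truth)
print(f"precision={p:.2f} recall={r:.2f} overall={o:.2f}")
```

Overall penalizes false suggestions: it drops to zero at 50% precision, where removing wrong matches costs the user as much work as the correct matches save.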
19 Reference Order Ontology
20 Experiment 1: Schema-to-Ontology Matching
- Goal: Evaluate the accuracy of schema-to-ontology matching.
- Method:
  - Automatically convert the ontology into COMA format and match each schema with the ontology.
- Evaluation:
  - Measured the percent overlap of the schema and ontology. For many schemas, only 60% of their concepts were in the ontology.
  - Evaluated the precision, recall, and overall measures relative to the number of matches that could be found.
  - E.g., if overlap was 60% and recall was 50%, then only 30% of all schema elements were matched, BUT of all the possible matches, 50% were found.
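The arithmetic behind this example is a single product: recall is measured only over the part of the schema that the ontology covers, so the share of the whole schema that gets matched is overlap times recall. A tiny sketch, with a hypothetical function name:

```python
def share_of_schema_matched(overlap, recall_within_overlap):
    """Fraction of all schema elements that end up correctly matched."""
    return overlap * recall_within_overlap


# 60% of the schema is covered by the ontology; half of that is found.
share = share_of_schema_matched(0.60, 0.50)
print(f"{share:.0%} of all schema elements matched")
```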
21 Experiment 1 Results
noMax is poor for schema 5 because Buyer was incorrectly matched to the ontology.
22 Experiment 2: Schema-to-Schema Mappings
- Goal: Determine the accuracy of producing schema-to-schema mappings by composing schema-to-ontology matchings.
- Method:
  - Used automatically generated schema-to-ontology matchings and composed them. Evaluated the composition result against the COMA answers for direct matching.
  - Evaluated the noMax and Max techniques and manual mappings.
23 Experiment 2 Results (Overall)
1 <-> 2 is poor because of the Street mapping. 4 <-> 5 is poor because of the Buyer mapping.
24 Experiment 3: Improving Direct Matches
- Goal: Determine whether the accuracy of producing direct schema-to-schema mappings can be improved by re-using schema-to-ontology matches.
- Method:
  - Generate schema-to-schema mappings by composing schema-to-ontology matchings, and then use these as past matching information for COMA.
  - Allow COMA to perform a direct match given this information.
  - Evaluated the noMax and Max techniques and manual mappings.
25 Experiment 3 Results (Overall)
1 <-> 2 is poor because of the Street mapping.
26 Discussion and Conclusions
- Major findings:
  - 1) Schema-to-ontology mappings can be constructed with good accuracy (70-80% precision, 60% recall).
  - 2) The composition of schema-to-ontology matchings produces results similar to direct matching with COMA.
  - 3) Max has higher precision than noMax but lower recall. Max is probably best when the user must filter incorrect matches, and it always saves work.
  - 4) It is valuable to re-use schema-to-ontology matchings (either automatic or manually constructed) to improve the accuracy of direct matchings.
- Major conclusion: There is a benefit to building semi-automatic schema-to-ontology matchings for use in integration and global view construction.
27 Future Work and Challenges
- The major challenge is that the mappings carry no semantics, which often results in incorrect matches being suggested after composition.
- We are currently working on extending the mappings to capture semantics to avoid many of these cases.
- The approach is not fully automatic (nor will it ever be). However, most manual work is in the schema-to-ontology matching stage. We need better algorithms and tools to support this matching.
- We want to perform an experimental evaluation on larger ontologies, such as those from NCI.
- Issue: Many ontologies are not in a suitable form for intermediate mapping with schemas (they are just taxonomies).
29 Extra Slides
30 Ontology Conversion Algorithm
- 1) Each ontology concept (class) becomes a node in the graph.
- 2) For each property (attribute) of a class, add a node to the graph and connect it to its class.
- 3) Non-basetype properties (those with domain and range in the ontology) are converted by:
  - 3a) Creating a node in the graph for the relationship.
  - 3b) Adding an edge from the domain class to this node.
  - 3c) Adding an edge from the new node to the range class.
  - Note: Properties whose domain or range is a union/intersection of concepts are not currently supported.
- 4) IS-A is expanded by graph traversal.
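The conversion steps can be sketched as follows. This is a simplified illustration, not the JENA-based program used in the work: the ontology is assumed to be already parsed into plain Python dictionaries, the function name, the `Class.attribute` naming of attribute nodes, and the example classes are hypothetical, and IS-A expansion here copies only attributes (not relationships) down to subclasses.

```python
def ontology_to_graph(classes, relationships):
    """Convert a parsed ontology into (nodes, edges) of a schema graph.

    classes: {class_name: {"attrs": [...], "is_a": parent_name or None}}
    relationships: [(relationship_name, domain_class, range_class), ...]
    """
    nodes, edges = set(), set()

    def inherited_attrs(cls):
        # Step 4: walk IS-A links upward, collecting attributes on the way.
        attrs, seen = [], set()
        while cls is not None and cls not in seen:
            seen.add(cls)
            attrs.extend(classes[cls]["attrs"])
            cls = classes[cls]["is_a"]
        return attrs

    for cls in classes:
        nodes.add(cls)                          # step 1: class node
        for attr in inherited_attrs(cls):       # steps 2 and 4
            attr_node = f"{cls}.{attr}"
            nodes.add(attr_node)
            edges.add((cls, attr_node))

    for name, domain, rng in relationships:     # step 3
        nodes.add(name)                         # 3a: relationship node
        edges.add((domain, name))               # 3b: domain -> relationship
        edges.add((name, rng))                  # 3c: relationship -> range
    return nodes, edges


# Hypothetical fragment of an order ontology.
classes = {
    "Party": {"attrs": ["name"], "is_a": None},
    "Customer": {"attrs": ["accountCode"], "is_a": "Party"},
    "Order": {"attrs": ["orderDate"], "is_a": None},
}
relationships = [("placedBy", "Order", "Customer")]

nodes, edges = ontology_to_graph(classes, relationships)
print(sorted(nodes))
```

After conversion, `Customer` carries its own `accountCode` plus the `name` attribute inherited from `Party`, so a matcher without IS-A support still sees both.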
31 Mapping Composition Challenges
Composing an N:1 match with a 1:N match results in a cross-product.
These cases cannot be handled, as the mappings have no semantics.
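The cross-product effect is easy to reproduce. In this sketch three elements of one schema map to a single ontology concept, and that concept maps to two elements of another schema; the element names on the second schema are hypothetical.

```python
from itertools import product

# N:1 match: three schema elements map to the same ontology concept.
n_to_1 = [("street1", "Street"), ("street2", "Street"), ("street3", "Street")]
# 1:N match: the same concept maps to two elements of the other schema.
one_to_n = [("Street", "addrLine1"), ("Street", "addrLine2")]

# Joining on the ontology concept pairs every left element with every right one.
composed = [(r, s)
            for (r, o1), (o2, s) in product(n_to_1, one_to_n)
            if o1 == o2]

for pair in composed:
    print(pair)
```

All six pairs are suggested, yet without semantics on the mappings there is no way to tell which pairs, if any, actually correspond.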
32 Global View Construction Script
Computes Global View of N Source Schemas (with ontology mappings)
Operator GlobalView(ArraySchemas, ArrayMappings, O, n)
  // ArraySchemas stores the n schemas
  // ArrayMappings stores the n schema-to-ontology mappings
  If n <= 0 Then Return empty schema
  If n = 1 Then Return ArraySchemas[0]
  S1 = ArraySchemas[0]
  S2 = ArraySchemas[1]
  map1 = ArrayMappings[0]
  map2 = ArrayMappings[1]
  <S, map> = GlobalView2(S1, S2, O, map1, map2)
  For (i = 2; i <= n-1; i++)
    // Fold the next schema into the global view built so far
    S1 = S
    map1 = map
    S2 = ArraySchemas[i]
    map2 = ArrayMappings[i]
    <S, map> = GlobalView2(S1, S2, O, map1, map2)
  end for
  Return <S, map>
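33 Global View Construction Script (2)
Computes Global View of Two Source Schemas (with ontology mappings)
Operator GlobalView2(S1, S2, O, S1_O, S2_O)
  // Compose the two schema-to-ontology mappings into a schema-to-schema mapping
  S1_S2 = Compose(S1_O, Invert(S2_O))
  // Merge the schemas; M is the merged schema
  <M, S1_M, S2_M> = Merge(S1, S2, S1_S2)
  // Derive the mapping from the merged schema to the ontology
  M_O = Compose(Invert(S1_M), S1_O) ∪ Compose(Invert(S2_M), S2_O)
  Return <M, M_O>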
34 Sample Order Schema: Excel XML Schema
<?xml version="1.0"?>
<Schema name="PurchaseOrder.biz" xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="PurchaseOrder" content="eltOnly">
    <element type="Header"/>
    <element type="Items"/>
    <element type="Footer"/>
    <element type="InvoiceTo"/>
    <element type="DeliverTo"/>
  </ElementType>
  <ElementType name="Items" content="eltOnly">
    <AttributeType name="itemCount" dt:type="int"></AttributeType>
    <attribute type="itemCount"/>
    <element type="Item" maxOccurs="*" minOccurs="1"/>
  </ElementType>
  <ElementType name="Item" content="empty">
    <AttributeType name="yourPartNumber" dt:type="string"></AttributeType>
    <AttributeType name="unitPrice" dt:type="number"></AttributeType>
    <AttributeType name="unitOfMeasure" dt:type="string"></AttributeType>
    <AttributeType name="salesValue" dt:type="number"></AttributeType>
    <AttributeType name="quantity" dt:type="number"></AttributeType>
    <AttributeType name="partNumber" dt:type="string"></AttributeType>
    <AttributeType name="partDescription" dt:type="string"></AttributeType>
    <AttributeType name="itemNumber" dt:type="int"></AttributeType>
35 Sample Order Schema: Excel XML Schema (2)
    <attribute type="itemNumber"/>
    <attribute type="yourPartNumber"/>
    <attribute type="partNumber"/>
    <attribute type="partDescription"/>
    <attribute type="quantity"/>
    <attribute type="unitOfMeasure"/>
    <attribute type="unitPrice"/>
    <attribute type="salesValue"/>
  </ElementType>
  <ElementType name="InvoiceTo" content="eltOnly">
    <element type="Contact"/>
    <element type="Address"/>
  </ElementType>
  <ElementType name="Header" content="eltOnly">
    <AttributeType name="yourAccountCode" dt:type="string"></AttributeType>
    <AttributeType name="ourAccountCode" dt:type="string"></AttributeType>
    <AttributeType name="orderNum" dt:type="string"></AttributeType>
    <AttributeType name="orderDate" dt:type="date"></AttributeType>
    <attribute type="orderNum"/>
    <attribute type="orderDate"/>
    <attribute type="ourAccountCode"/>
    <attribute type="yourAccountCode"/>
    <element type="Contact"/>
  </ElementType>
36 Sample Order Schema: Excel XML Schema (3)
  <ElementType name="Footer" content="empty">
    <AttributeType name="totalValue" dt:type="number"></AttributeType>
    <attribute type="totalValue"/>
  </ElementType>
  <ElementType name="DeliverTo" content="eltOnly">
    <element type="Contact"/>
    <element type="Address"/>
  </ElementType>
  <ElementType name="Contact" content="empty">
    <AttributeType name="telephone" dt:type="string"></AttributeType>
    <AttributeType name="e-mail" dt:type="string"></AttributeType>
    <AttributeType name="contactName" dt:type="string"></AttributeType>
    <AttributeType name="companyName" dt:type="string"></AttributeType>
    <attribute type="contactName"/>
    <attribute type="companyName"/>
    <attribute type="e-mail"/>
    <attribute type="telephone"/>
  </ElementType>
37 Sample Order Schema: Excel XML Schema (4)
  <ElementType name="Address" content="empty">
    <AttributeType name="street4" dt:type="string"></AttributeType>
    <AttributeType name="street3" dt:type="string"></AttributeType>
    <AttributeType name="street2" dt:type="string"></AttributeType>
    <AttributeType name="street1" dt:type="string"></AttributeType>
    <AttributeType name="stateProvince" dt:type="string"></AttributeType>
    <AttributeType name="postalCode" dt:type="string"></AttributeType>
    <AttributeType name="country" dt:type="string"></AttributeType>
    <AttributeType name="city" dt:type="string"></AttributeType>
    <attribute type="street1"/>
    <attribute type="street2"/>
    <attribute type="street3"/>
    <attribute type="street4"/>
    <attribute type="city"/>
    <attribute type="stateProvince"/>
    <attribute type="postalCode"/>
    <attribute type="country"/>
  </ElementType>
</Schema>
38 Experiment 2 Results (Precision)
39 Experiment 2 Results (Recall)
40 Experiment 3 Results (Precision)
41 Experiment 3 Results (Recall)