Title: Semantic Adaptation of Schema Mappings when Schemas Evolve
1Semantic Adaptation of Schema Mappings when
Schemas Evolve
- Cong Yu
- University of Michigan
- Lucian Popa
- IBM Almaden Research Center
VLDB05, Trondheim, Norway Sep 2, 2005
2Schema Mappings
?
Schema S
Schema T
J
I
q
q
- Schema Mappings are logical, declarative,
assertions that can describe relationships
between schemas. - enough semantics to guide run-time,
instance-level, transformation - e.g., GLAV mappings (or tuple-generating
dependencies) - They are key elements in two main areas in
information integration - Data Exchange/Translation
- Query Answering/Rewriting (or Federation)
3Schema Evolution and Mapping Adaptation
- Schemas evolve over time Mappings may become
invalid ! - A lot of effort goes into establishing mappings.
How do we reuse them ? - Mapping Adaptation Problem VMP03
- Given
- mapping M from S to T,
- changes/evolution of S to S, or T to T, or
both, - Derive a best mapping M that
- is valid with respect to the new schemas, and
- reflects the original mapping as much as possible
4Prior Solution Incremental Method
M
move elem
S
T
M1
add elem
T1
M2
T2
delete constraint
M3
T3
rename elem
- VMP03 Incrementally adapts the mapping after
each atomic change in the schemas (source and/or
target). - Efficient and intuitive, for one or few changes.
- However, for non-incremental evolution, there are
drawbacks
5M
T
S
Different evolution paths
Mn
Tn
- The new schema may be radically different
- The list of changes may not be known.
- Evolution path must be discovered ? not
necessarily unique - The method will ultimately be inefficient
- The algorithm must be applied at each atomic
change - As we shall see, the resulting mapping may not be
the expected one.
6Our Approach Composition-Based
M
S
Can use schema mapping tools (e.g., Clio) to
construct E.
T
E
M M E
T
- Evolution itself is described as a schema
mapping. - Concise, declarative, and expressive description
of evolution. - Enables efficiency and can deal with arbitrary
evolution - The adapted mapping is then obtained via
composition. - Formal semantics of adaptation.
- At high level, this is part of the model
management vision Ber03.
7Main Contributions
- We study the interplay between schema evolution
and mapping composition - interesting in terms of both semantics and
implementation - We show that the composition-based approach for
mapping adaptation can be practical
8Outline of the Rest of the Talk
- Incremental Approach vs. Composition Approach
- Example (showing why composition is important)
- Composition Semantics and Algorithm
- Transformation semantics
- ? specialized, more suitable for schema
evolution, also more challenging - Optimization and Experiments
- Compose only when necessary
- (Some mapping formulas are unaffected by the
change) - Conclusion
9Simplified Example
Source
Source
Target
- m SuppPart (s, p) ? PartOrder (p, o)
- ? PotentialSupp (s, o)
- ( GLAV mapping Halevy01, or,
- source-to-target tgd FKMP03 )
SuppPart
LineItem
m
s p
PotentialSupp
li s p o qty
s o
PartOrder
p o
- The mapping m exports orders o and all their
potential suppliers s. - Schema evolution scenario
- Data arrives in long tuples, each relating an
order, a part and an available supplier. -
- The mapping m must be adapted to use new schema
Source.
10Incremental Approach VMP03
Source
Target
Source
m SuppPart (s, p) ? PartOrder (p, o)
? PotentialSupp (s, o)
SuppPart
LineItem
m
s p
PotentialSupp
li s p o qty
s o
PartOrder
p o
- Pick a list of changes from Source to Source and
rewrite mapping after each change. - (1) Move element SuppPart/s to PartOrder/s
- SuppPart (p) ? PartOrder (s, p, o) ?
PotentialSupp (s, o) - (2) Delete SuppPart/p and (3) delete SuppPart.
- (4) Rename PartOrder to LineItem, (5) add
LineItem/li and (6) add LineItem/qty - m LineItem (li, s, p, o, qty) ?
PotentialSupp (s, o)
11- Although small, our example already needs 6
schema changes. - For large schemas, this can become challenging
- Furthermore, and somewhat surprisingly, the
semantics of the adapted mapping may not be the
expected one !
12Loss of Semantics
Source
Target
Source
m SuppPart (s, p) ? PartOrder (p, o)
? PotentialSupp (s, o)
SuppPart
LineItem
m
s p
PotentialSupp
li s p o qty
m LineItem (li, s, p, o, qty) ?
PotentialSupp (s, o)
s o
PartOrder
p o
- The original mapping m joins orders with
suppliers - However, m loses relevant suppliers
- It only pairs an order with a supplier provided
they appear in the same LineItem tuple - To retain the original semantics, we must look in
different tuples ! -
- m LineItem (li, s, p, o, qty) ?
LineItem (li, s, p, o, qty) - ?
PotentialSupp (s, o)
13- The incremental approach is a mechanical
procedure that makes local changes to the
mapping. - A sequence of good local changes may not
necessarily yield the best global adaptation
14Mapping Composition Approach
- We look at the evolution globally
- Describe evolution through a schema mapping
Source ? Source.
Source
Target
Source
SuppPart
LineItem
m
e1
s p
PotentialSupp
e1 LineItem (l, s, p, o, q) -gt SuppPart (s, p)
e2 LineItem (l, s, p, o, q) -gt PartOrder (p, o)
li s p o qty
s o
PartOrder
p o
e2
- Define the adapted mapping to be a mapping
Source ? Target, equivalent (e.g., same data
movement) to the sequence of the evolution
mapping and the original mapping. - The previous m satisfies the conditions for
e1,e2 and m.
15- The composition approach is a more systematic
approach, with precise semantics, guaranteed to
behave the right way in all situations. - Although it may appear simple in the previous
example, mapping composition poses challenges
16Challenges in Composition Approach
- Mapping language
- Must handle nesting and complex types (as in XML
Schema) - (details in the paper)
- Furthermore, the usual mapping languages (GLAV,
tuple-generating dependencies) are not closed
under composition ! - Recent extension that ensures composability
second-order tgds FKPT04. - Main idea add functions to gain needed
expressive power - Semantics and Algorithm
- Efficiency/Scalability
Next
17Composition Semantics and Algorithm
18Composition Semantics
- In mapping composition, we want to replace a
sequence of schema mappings with one that is
equivalent and avoids the middle schema. - What does equivalent mean ?
- There are two semantics that we considered
- Relationship semantics
- More general
- Transformation semantics
- More suitable, specialized
19Relationship Semantics
- Mappings can be viewed as describing
relationships between instances over the two
schemas - Rel (M12) (I1, I2) (I1, I2) satisfies M12
- Composition of relationships
- Rel (M12) ? Rel (M23) (I1, I3) there is
I2 such that - (I1,
I2) satisfies M12 and (I2, I3) satisfies M23 - FKPT04, Melnik04 A mapping M13 is equivalent,
to the sequence of M12 and M23, under the
relationship semantics, if - Rel (M13) Rel (M12) ? Rel (M23)
20Example Semantics and Algorithm
S3
S1
S2
Unknown student id (Skolem term)
Student
Second-order tgd FKPT04
E
Takes
sid name
M
Takes
sid name course
M Takes (n, c) ? Student (F(n), n)
? Enrolls (F(n), c) E Student (s,
n) ? Enrolls (s, c) ?
Takes (s, n, c)
name course
Enrolls
sid course
- M13 correctly captures the equivalent
relationship between instances of S1 and S3. - Instances (and function F) can exist a priori.
- A student n must be paired with a course c
- even when c is listed under a different student
name n, - provided the student id is the same F(n)
F(n)
1. Substitution
M13 Takes (n, c) ? Takes (n, c) ? F(n)
F(n) ? Takes (F(n), n, c)
21- However, if we assume that the function F is
one-to-one, an important simplification can be
made
M13 Takes (n, c) ? Takes (n, c) ?
Takes (F(n), n, c)
M13 Takes (n, c) ? Takes (n, c) ? F(n)
F(n) ? Takes (F(n), n, c)
2. Reduction
F(n) F(n) ? n n
3. Minimization
Equivalent relationship
M13 Takes (n, c) ? Takes (F(n), n,
c)
Equivalent transformation
- We can always make this assumption, if mappings
are meant to describe transformations (i.e.,
generation of a target instance). - F is a Skolem function assigning unique student
ids n ? F(n)
22Transformation Semantics
- A mapping is a process (in the spirit of data
exchange FKMP03) - I2 M12(I1)
- Each mapping formula is a generator of target
facts - Functions are one-to-one value generators
- Theorem. Our composition algorithm produces the
schema mapping with the equivalent transformation
semantics - M13 (I1) M23( M12(I1) ) (up to
the renaming of nulls) - Advantage of transformation semantics, in
adaptation - ? simpler and more intuitive formulas !
23Composition Algorithm Further Details
- The substitution step is more complex than shown
- Must handle nesting
- Generate parameterized rules for set types in the
middle schema - Reuse some of the mapping-based query rewriting
techniques YP04 - Minimization
- Good it simplifies formulas and generates
intuitive mapping. - (all this is enabled by the transformation
semantics) - Bad it can be expensive (same as tableau
minimization)
24Optimization and Experimental Results
25Full Adaptation
- Full adaptation ? Compose whole schema mappings
-
- (Compose all the formulas in the original
mapping with all the formulas in the evolution
mapping) - Inevitable when the schema evolution is drastic
and affects most of the original mapping
(non-incremental evolution) - Inefficient when the changes are small and
localized (incremental evolution)
26Compose Only When Necessary
- Mapping Pruning
- 1. Detect those parts (formulas) Mo of the
original mapping Mo that are affected by
evolution. - Only Mo need to be adapted.
-
- 2. Only a subset Me of the formulas in the
evolution mapping Me play a role in the
composition with Mo - The rest are redundant (PTIME containment-like
analysis, see paper) -
- 3. Compose Mo with Me
- ? Big performance gain for incremental evolution
and overall.
27Analysis of Evolution Scenarios
Results based on Clio
Benefits 1 adapted mappings / (blank-sheet
mappings missed mappings)
We also have synthetic scenarios that show
scalability of Mapping Pruning with increasing
schema and mapping complexity
28Conclusion
- We studied
- Mapping composition techniques for mapping
adaptation - Transformation semantics in the context of schema
evolution - Designed and implemented a practical adaptation
system - Mapping pruning (schema evolution specific)
- To Do
- Optimization of composition in general
- Improve performance of minimization