Implementing Mapping Composition - PowerPoint PPT Presentation

About This Presentation
Title:

Implementing Mapping Composition

Description:

Implementing Mapping Composition Todd J. Green* University of Pennsylania with Philip A. Bernstein (Microsoft Research), Sergey Melnik (Microsoft Research), – PowerPoint PPT presentation

Number of Views:72
Avg rating:3.0/5.0
Slides: 25
Provided by: ucd98
Category:

less

Transcript and Presenter's Notes

Title: Implementing Mapping Composition


1
Implementing Mapping Composition
  • Todd J. Green
  • University of Pennsylania
  • with Philip A. Bernstein (Microsoft Research),
  • Sergey Melnik (Microsoft Research),
  • Alan Nash (UC San Diego)
  • VLDB 2006 Seoul, Korea
  • Work partially supported by NSF grants
    IIS0513778 and IIS0415810

2
Schema mappings
  • Mapping a correspondence between instances of
    different schemas

Names SID, Name
Students Name, Address
m
Addresses SID, Address
S1
S2
Students ? ?Name,Address (Names ? Addresses)
3
Applications of mappings
  • Names ? Names?
  • sCountry KR(Addresses) ? ?SID,Address(Local)KR
  • sCountry ? KR(Addresses) ? Foreign
  • Schema evolution

Students ? ?Name,Address,Country(Names ?
Addresses)
...
m12
m23
S3
S2
S1
4
Applications of mappings
  • Data integration, data exchange

Sn
Addresses SID, Address, Country
Names SID, Name
...
m1
mn
Students ? ?Name,Address (Names ?
Addresses)
Names? ? Names Local ? ?SID,Address(?Country
KR(Addresses)) Foreign ? ?Country ? KR(Addresses)
S1
Sn-1
Foreign SID, Address, Country
Students Name, Address, Country
Names? SID, Name
Local SID, Address
...
5
Requirements for constraints
  • First attribute in R is a key for R
  • ?2,4(R ?13 R) µ ?2,2(R)
  • View V equals R joined with S
  • V µ R ? S, V R ? S
  • Second attribute of R is a foreign key in S
  • ?2(R) µ ?1(S)
  • ?2,4(S ?13 S) µ ?2,2(S)
  • Data integration, data exchange GLAV
  • R ? S µ T ? U

6
Mapping composition
S1
S3
7
Composition is hard
  • Hard part write composition in the same language
    as the input mappings. Depending on language
  • Not always possible
  • Not even decidable whether possible
  • Strategy 1 use powerful (second-order) mapping
    language closed under composition FKPT04
  • Not supported by DBMS today
  • Expensive to check
  • Source-target restriction
  • Strategy 2 settle for partial solutions NBM05
  • Containment mappings ? easier integration with
    DBMS
  • The strategy we adopt in this work

8
Our contributions
  • New algorithm for composition problem
  • Incorporates view unfolding and left-composition
    (new technique)
  • Makes best effort in failure cases
  • Algebraic rather than logic-based mappings
  • Use of monotonicity to handle more operators
  • Modular and extensible factoring of algorithm
  • First implementation of composition
  • Experimental evaluation

9
Formal definition of composition
  • Mapping set of pairs of instances of db schemas
  • The composition m12 m23 is the mapping
  • hA,Ci (9B)(hA,Bi 2 m12 and hB,Ci 2 m23)
  • where A,B,C are instances of S1,S2,S3
  • Composition problem find constraints in same
    language as input mappings giving the composition
    of the input mappings
  • Example

S1 R, S2 S,T, S3 U,V,W R ? S?T, S ?
?(U), T V W
) R ? ?(U)?(V - W)
10
Best-effort composition problem
  • Composition not always possible
  • Best-effort composition problem compute set of
    constraints equivalent to input constraints, but
    with as many symbols from S2 eliminated as
    possible
  • R ? U, R ? V,
  • ?1,4(?23(U?U)) ? U, ?1,4(?23(V?V)) ? V,
  • U ? T, V ? T
  • Can eliminate U (cross out left column) or V
    (right column), but not both NBM05

11
Composition algorithm overview
  • For each relation R in S2
  • Try to eliminate R via (1) view unfolding
  • Replace by pairs of ?, ?
  • For each relation R in S2 not yet eliminated
  • Try to eliminate R via (2) left compose
  • Else, try to eliminate R via (3) right compose
  • Output
  • New constraints and list of relations
    successfully eliminated

12
(1) View unfolding
  • Idea exploit equality constraints (if we have
    any)
  • Standard technique substitute view definition
    for occurrences of view relation in mappings
  • T V W, R ? S ?T, T ? X ? ?(U)
  • ? R ? S ?(V W), (V W) ? X ? ?(U)
  • Body must not mention view relation itself
  • Doesnt matter what else is in body
  • Can substitute everywhere

13
(2) Left compose
  • View unfolding for containment constraints
  • ?(V) ? R U, R ? S ? T
  • ? ?(V) ? (S ? T) U
  • Needs monotonicity of expressions in R.
  • E1 ? E2(R), R ? E3 E1 ? E2(E3)
  • if E2(R) is monotone in R (and R not in E3)
  • Partial check for monotonicity
  • Is S (T R) monotone in R?

14
Normalization for left compose
  • Need one constraint of form R ? E1
  • Use identities to normalize, e.g.
  • R ? E1 and R ? E2 iff R ? E1 ? E2
  • E1 ? E2 ? E3 iff E1 ? E3 and E2 ? E3
  • ?(E1) ? E2 iff E1 ? E2 ? Dr
  • More identities in paper
  • After left compose, try to eliminate D

15
(3) Right compose
  • Dual to left compose, from NBM05
  • Example
  • S ?T ? R, R U ???(V)
  • ? (S ?T) U ? ?(V)
  • Monotonicity check needed here too
  • Normalization may introduce Skolem functions
  • E1 ? ?(E2) iff f(E1) ? E2
  • Must eliminate Skolem functions after composition
  • Lots of effort coding this step!

16
User-defined operators
  • User specifies
  • Monotonicity of operator in its arguments
  • If E1 monotone in R and E2 antimonotone in R or
    independent of R, then E1 E2 monotone in R
  • if E1 monotone in R or independent of R and E2
    antimonotone in R, then E1 E2 monotone in R
  • Identities for normalization
  • E1 E2 ? E3 iff E1 ? E2 ? E3
  • User-defined operators and standard relational
    operators treated uniformly

17
Implementation
  • 12K lines of C code, command-line tool
  • Test case 13 PODS05 example 2
  • SCHEMA
  • R(2), S(2), T(2)
  • CONSTRAINTS
  • R lt S,
  • P_0,2 J_0,11,2 (S S) lt R,
  • S lt T
  • ELIMINATE
  • S
  • Output
  • P_0,2 J_0,11,2(R R) lt R,
  • R lt T

18
Experimental evaluation
  • First attempt at a composition benchmark
  • Schema editing and schema reconciliation
    scenarios
  • Add a column to R to produce S ?(R) S
  • Measure
  • of symbols eliminated
  • Running time
  • As a function of
  • Editing primitives allowed, length of edit
    sequence, presence/absence of keys, starting
    schema size,
  • Synthetic data

19
Summary of results
  • Algorithm often effective in eliminating most or
    even all relation symbols from S2
  • Running time in subsecond range even for large
    problems containing hundreds of constraints
  • Certain schema editing primitives problematic
  • Key constraints did not reduce effectiveness,
    although did increase running time (and output
    size)

20
Schema editing
  • Random starting schema (30 relations of 2-10
    attributes)
  • 100 random edits
  • 100 different runs, sorted by execution time

21
Schema reconciliation (1)
  • Random schema (30 relations of 2-10 attributes),
    random edits
  • Point represents median time of reconciliation
    step of 500 runs

22
Schema reconciliation (2)
  • Random schema (variable relations of 2-10
    attributes)
  • 100 random edits
  • 100 different runs, sorted by execution time

23
Related work
  • MH03 J. Madhavan, A. Y. Halevy. Composing
    mappings among data sources. VLDB, 2003.
  • FKPT04 R. Fagin, Ph. G. Kolaitis, L. Popa, W.C.
    Tan. Composing schema mappings second-order
    dependencies to the rescue. PODS, 2004.
  • NBM05 A. Nash, P. A. Bernstein, S. Melnik.
    Composition of mappings given by embedded
    dependencies. PODS, 2005.

24
Conclusion and future work
  • We motivated and described the mapping
    composition problem
  • We presented an implementation of a practical new
    algorithm for the composition problem
  • We also presented an experimental evaluation
  • To do theoretical analysis of impact of
    user-defined operators
  • To do output constraints from algorithm can be a
    mess! How to clean up?
Write a Comment
User Comments (0)
About PowerShow.com