Title: Muse:%20A%20System%20for%20Understanding%20and%20Designing%20Mappings
1Muse A System for Understanding and Designing
Mappings
Bogdan Alexe Laura Chiticariu Renée J.
Miller Daniel Pepper Wang-Chiew Tan
UC Santa Cruz U. of
Toronto UC Santa Cruz
Motivation
Muse Overview
- Muse is a mapping design wizard that uses data
examples to help designers understand, design and
refine schema mappings -
- In Muse, the designer works with data examples
rather than with complex specifications to
understand the semantics of a mapping - Muse uses real data examples whenever possible,
otherwise it constructs synthetic examples -
- Muse consists of two components Muse-G (design
of desired nesting semantics for mappings) and
Muse-D (choosing the desired interpretation of
ambiguous mappings)
- Schema mapping relationship between a source
database schema and a target database schema - Designing a schema mapping is a fundamental
problem in information integration - Specifying a semantically correct schema mapping
is usually a complex task - Automatic tools can suggest potential mappings
- Ensuring mapping correctness still requires
intricate manual work - Few tools are available for helping a designer
understand and design alternative mappings
Designing Nesting Semantics with Muse-G
- Nesting semantics are expressed through grouping
functions, which are defined for each nested set
in the target schema - A grouping function is a form of Skolem
function, with atomic attributes as parameters - Example grouping function from mapping m2
- SKProjs(ltall attributes of c, p and e
gt) target Project records are grouped
according to the values of all attributes of the
Company, Project and Employee source records
Step 2 Probing on the cname attribute
OrgDB Rcd Orgs Set of Org Rcd
oname Projects Set
of Project Rcd
pname
manager Employees Set of
Employee Rcd eid
ename
CompDB Rcd Companies Set of
Company Rcd cid
cname location Projects Set
of Project Rcd pid
pname cid
manager Employees Set of
Employee Rcd eid
ename contact
Scenario 2 OrgDB Orgs IBM
ProjectsSK(y) DB e4 WiFi
e6 SBC ProjectsSK(y) DB
e4 WiFi e6 Employees e4 John
e6 Kat
Example source Companies 11 IBM NY 14
SBC NY Projects P1 DB 11 e4 P4 WiFi 14
e6 Employees e4 John x234 e6 Kat x331
Target instances Scenario 1 OrgDB Orgs
IBM ProjectsSK(IBM,y) DB e4
SBC ProjectsSK(SBC,y) WiFi e6
Employees e4 John e6 Kat
f1
- Example Designing the grouping function for the
target Projects set - Suppose the set of possible arguments is S
cid, cname, location - Muse-G probes every attribute in S
- At each probe, a small carefully chosen source
instance is considered, from which two
differentiating target instances are obtained
one includes the probed attribute in the grouping
function (Scenario 1 below), and the other omits
it (Scenario 2 below).
f2
y subset of NY
The designer chooses scenario 1 (includes cname
in the grouping function)
Step 3 Probing on the location attribute
m1 for c in CompDB.Companies exists o in
OrgDB.Orgs where c.cnameo.oname and
o.Projects SKProjs(c.cid,c.cname,c.location)
m2 for c in CompDB.Companies, p in
CompDB.Projects, e in CompDB.Employees
satisfy p.cidc.cid and e.eidp.manager
exists o in OrgDB.Orgs, p1 in o.Projects, e1 in
OrgDB.Employees satisfy
p1.managere1.eid where c.cnameo.oname
and e.eide1.eid and e.enamee1.ename
and p.pnamep1.pname and
o.Projects SKProjs(ltall attributes of c, p and
e gt) m3 for e in CompDB.Employees exists e1 in
OrgDB.Employees where e.eid e1.eid and
e.enamee1.ename
Example source Companies 11 IBM NY 13 IBM
SF Projects P1 DB 11 e4 P2 Web 13
e5 Employees e4 John x234 e5 Anna x888
Target instances Scenario 1 OrgDB Orgs
IBM ProjectsSK(IBM,NY) DB e4
IBM ProjectsSK(IBM,SF) Web e5
Employees e4 John e5 Anna
Scenario 2 OrgDB Orgs IBM
ProjectsSK(IBM) DB e4 Web
e5 Employees e4 John e5 Anna
The designer chooses scenario 2 (excludes
location from the grouping function)
Conclusion the desired grouping function for
Projects is SK(cname)
Extensions
Choosing Desired Mapping Interpretation with
Muse-D
CompDB Rcd Projects Set of
Project Rcd pid
pname manager
tech-lead Employees Set of
Employee Rcd eid
ename contact
OrgDB Rcd Projects Set of
Project Rcd pname
supervisor email
- The mapping scenario on the left is ambiguous
it can be interpreted in several ways - e.g. the project supervisor can be either the
manager or the tech-lead - In total, there are four alternative
interpretations - Key idea of Muse-D provide an example source
instance to illustrate the four interpretations
in a compact way
- Muse-G can take advantage of constraints on the
source schema (such as keys, and more generally,
functional dependencies) - The designer can refine the desired nesting
semantics incrementally
Example source Projects P1 DB e4
e5 Employees e4 John john_at_ibm e5 Anna
anna_at_ibm
Target instance Orgs Projects DB
John john_at_ibm Anna anna_at_ibm
Ambiguous mapping
ma for p in CompDB.Projects, e1 in
CompDB.Employees, e2 in
CompDB.Employees satisfy
e1.eidp.manager and
e2.eidp.tech-lead exists p1 in
OrgDB.Projects where p.pnamep1.pname and
(e1.enamep1.supervisor
or e2.enamep1.superviso
r) and
(e1.contactp1.email
or e2.contactp1.email)
Choice values for supervisor and email (the
designer makes one selection for each attribute)