Title: Nessun titolo diapositiva
1MOMIS Query Manager Prototipo di un query
manager per la gestione di query globali
D. Beneventano, S. Bergamaschi, F.
Mandreoli Università degli Studi di Modena e
Reggio Emilia
D2I Integrazione, Warehousing e Mining di
sorgenti eterogenee Tema 1 Integrazione di dati
provenienti da sorgenti eterogenee ROMA, 11
OTTOBRE 2002
2Example
Local classes (relational)
L1(firstn,lastn,year,e_mail) L2(name,e_mail,dept_
code,s_code)
INTEGRATION
Global Class G
Name E_mail Section Year Dept
L1 firstn and lastn e_mail null year null
L2 name e_mail s_code null dept_code
Global Class Schema G
S(G) (Name,E_mail,Year,Dept,Section)
Local Class Schemata w.r.t. Global Class
S(L1) (Name,E_mail,Year) S(L2)
(Name,E_mail,Dept,Section)
3Data cleaning and reconciliation
- Integration at the extensional level
- the data returned by various sources need to be
converted/reconciled - interpretation and merging of the data provided
by the sources
- Schema Translation
- (example firstn and lastn to Name)
- Data conversion
- (example Rita Verde to Rita Verde)
name e_mail dept_c S_code
Rossi_Ada RA_at_i.it Dept1 413245
Po_Ugo UP_at_i.it Dept1 2314
firstn lastn e_mail year
Rita Verde PV_at_i.it 2
Ada Rossi RA_at_i.it 1
L1
L2
Name E_mail Year
Rita Verde PV_at_i.it 2
Ada Rossi RA_at_i.it 1
Name E_mail Dept Section
Ada Rossi RA_at_i.it Dept1 413245
Ugo Po UP_at_i.it Dept1 2314
4Redundancy and Reconcilation
Hypothesis
Instances of the same object in different local
class must have the same value for a common
attribute
L2
L1
O1
O
O2
L2
L1
Name E_mail Year
Rita Verde PV_at_i.it 2
Ada Rossi RA_at_i.it 1
Name E_mail Dept Section
Ada Rossi RA_at_i.it Dept1 413245
Ugo Po UP_at_i.it Dept1 2314
O1
O
O
O2
5Object fusion
To identify instances of the same object and fuse
them JoinMap - join criteria among classes
L2
L1
O1
O
O2
Name E_mail Year
Rita Verde PV_at_i.it 2
Ada Rossi RA_at_i.it 1
Name E_mail Dept Section
Ada Rossi RA_at_i.it Dept1 413245
Ugo Po UP_at_i.it Dept1 2314
O1
O
O
O2
JoinMap JM(L1,L2) L1.NameL2.Name
Name Name
Ada Rossi Ada Rossi
6Object fusion indirect map
L1
L2
O1
O2
O3
Id Name E_mail Year
123 Rita Verde PV_at_i.it 2
243 Ada Rossi RA_at_i.it 1
E_mail Dept SN
RA_at_i.it Dept1 XY413245
UP_at_i.it Dept1 XZ2314
O1
O2
O2
O3
JoinMap JMCS.S,UNI.RS
Matr SN
243 XY413245
7Global Class Instance
- GAV with Single database property
- (Lenzerini - Data Integration A Theoretical
Perspective, PODS 2002)
- The computation is based on FULL DISJUNCTION
- (Rajarama, Ullman - Integrating Information by
Outerjoins - and Full Disjunctions. PODS 1996)
- Computing the natural outerjoin of many
relations in a way that preserves all possible
connections amon facts
L1
Name E_mail Year
Rita Verde PV_at_i.it 2
Ada Rossi RA_at_i.it 1
L2
Name E_mail Dept Section
Ada Rossi RA_at_i.it Dept1 413245
Ugo Po UP_at_i.it Dept1 2314
G select S(G) from L1 outer join L2 on JM(L1,L2)
G
Name E_mail Year Dept Section
Ada Rossi RA_at_i.it 1 Dept1 413245
Rita Verde PV_at_i.it 2
Ugo Po UP_at_i.it Dept1 2314
8FULL DISJUNCTION COMPUTATION
- Question when a full disjunction can be
computed by some sequence of natural outerjoins
- Answer there is a natural outerjoin sequence
producing the full disjunction if and only if the
set of relation schemes forms a connected,
?-acyclic hypergraph (Fagin - 1983)
A Global class with n local classes, n gt2
?-cyclic hypergraph
L1
JM(L1,L3)
JM(L1,L2)
New Method
JM(L2,L3)
L3
L2
Example n 3
G select S(G) from (L1 outer join L2 on
JM(L1,L2)) outer join (L1 outer join L3 on
JM(L1,L3)) on JM(L2,L3)
9Query rewiting method
Global query (in DNF) Q1
Local query for the class L Q1_L
where-condition of Q1_L all factors of DNF
which can be solved in L
residual factors of Q1 factors not included in
all local where-condition
select-list of Q1_L attributes of the
select-list of Q1 residual factors JoinMap
Global query reformulation full disjunction
based on the JoinMap residual factors
10Query rewiting example
Global query
Q1 select E_mail from G where (E_mail like
.it' and Dept'Dept1') or (E_mail like .it'
and Year2)
Local queries
Q1_L1 select Name, Year, E_mail from L1
where (E_mail like .it' or Year2)
Q1_L2 select Name, Dept, E_mail from L2
where (E_mail like .it' or Dept'Dept1')
Global query reformulation
Q1 select E_mail from Q1_L1 outer join Q1_L2 on
JM where (Dept'Dept1' or Year2)
residual factor