CSE 636 Data Integration - PowerPoint PPT Presentation

About This Presentation
Title:

CSE 636 Data Integration

Description:

Get the answers from the sources, filter them if needed and combine them ... Closed world assumption vs. open world? Virtual Integration Approaches. Logic ... – PowerPoint PPT presentation

Number of Views:77
Avg rating:3.0/5.0
Slides: 29
Provided by: michailpe
Learn more at: https://cse.buffalo.edu
Category:

less

Transcript and Presenter's Notes

Title: CSE 636 Data Integration


1
CSE 636Data Integration
  • Data Integration Approaches

2
Virtual Integration Architecture
  • Leave the data in the sources
  • When a query comes in
  • Determine the relevant sources to the query
  • Break down the query into sub-queries for the
    sources
  • Get the answers from the sources, filter them if
    needed and combine them appropriately
  • Data is fresh
  • Otherwise known as
  • On Demand Integration

3
Virtual Integration Architecture
Design-Time
Run-Time
?
Mapping Tool
Query Reformulation
Query
Result
End User
Mediation Language
Optimization Execution
Mediator
Global Schema
Web Services
XML
1
Data Source
Data Source
Local Schema
Local Schema
4
Virtual Integration Architecture
Design-Time
Run-Time
?
Mapping Tool
Query Reformulation
Query
Result
End User
Mediation Language
Optimization Execution
Mediator
Global Schema
Web Services
2
XML
1
Data Source
Data Source
Local Schema
Local Schema
5
Virtual Integration Architecture
Design-Time
Run-Time
?
Mapping Tool
Query Reformulation
Query
Result
End User
Mediation Language
Optimization Execution
3
Mediator
Global Schema
Web Services
2
XML
1
Data Source
Data Source
Local Schema
Local Schema
6
Virtual Integration Architecture
Design-Time
Run-Time
?
Mapping Tool
Query Reformulation
4
Query
Result
End User
Mediation Language
Optimization Execution
3
Mediator
Global Schema
Web Services
2
XML
1
Data Source
Data Source
Local Schema
Local Schema
7
Virtual Integration Architecture
Design-Time
Run-Time
?
Mapping Tool
Query Reformulation
4
Query
Result
5
End User
Mediation Language
Optimization Execution
3
Mediator
Global Schema
Web Services
2
XML
1
Data Source
Data Source
Local Schema
Local Schema
8
Virtual Integration Architecture
Design-Time
Run-Time
?
Mapping Tool
Query Reformulation
4
Query
Result
5
End User
Mediation Language
Optimization Execution
3
6
Mediator
Global Schema
Web Services
2
XML
1
Wrapper
Wrapper
Data Source
Data Source
Local Schema
Local Schema
9
Virtual Integration Approaches
  • Dimensions to Consider
  • How many sources are we accessing?
  • How autonomous are they?
  • Meta-data about sources?
  • Is the data structured?
  • Queries or also updates?
  • Requirements accuracy, completeness,
    performance, handling inconsistencies.
  • Closed world assumption vs. open world?

10
Mediation Languages
Global Schema
CD ASIN Title Genre
Artist ASIN Name
Logic
Authors ISBN FirstName LastName
Books Title ISBN Price DiscountPrice Edition
CDs Album ASIN Price DiscountPrice Studio
Artists ASIN ArtistName GroupName
BookCategories ISBN Category
CDCategories ASIN Category
11
Desiderata from Source Descriptions
  • Expressive power distinguish between sources
    with closely related data. Hence, be able to
    prune access to irrelevant sources.
  • Easy addition make it easy to add new data
    sources.
  • Reformulation be able to reformulate a user
    query into a query on the sources efficiently and
    effectively.

12
Reformulation Problem
  • Given
  • A query Q posed over the global schema
  • Descriptions of the data sources
  • Find
  • A query Q over the data source relations, such
    that
  • Q provides only correct answers to Q, and
  • Q provides all possible answers from to Q given
    the sources.

13
Languages for Schema Mapping
Mediator
Global Schema
Mediated Schema
GAV
GLAV
LAV
Source
Source
Source
Source
Source
Local Schema
Local Schema
Local Schema
Local Schema
Local Schema
14
Global-as-View (GAV)
Global Schema Movie(title, dir, year,
genre) Schedule(cinema, title,
time) Integrating View Create View Movie AS
SELECT FROM S1 S1(title,dir,year,genr
e) union SELECT FROM S2
S2(title,dir,year,genre) union SELECT
S3.title, S3.dir, S4.year, S4.genre FROM S3,
S4 S3(title,dir), WHERE
S3.title S4.title S4(title,year,genre)
15
Global-as-View Example 2
Global Schema Movie(title, dir, year,
genre) Schedule(cinema, title,
time) Integrating View Create View Movie AS
SELECT title, dir, year, NULL FROM S1
S1(title,dir,year) union SELECT title,
dir, NULL, genre FROM S2
S2(title,dir,genre)
16
Global-as-View Example 3
Global Schema Movie(title, dir, year,
genre) Schedule(cinema, title,
time) Integrating Views Create View Movie AS
SELECT NULL, NULL, NULL, genre FROM S4
S4(cinema, genre) Create View Schedule AS
SELECT cinema, NULL, NULL FROM S4
S4(cinema, genre) But what if we want to find
which cinemas are playing comedies?
17
Global-as-View Summary
  • Query reformulation boils down to view unfolding.
  • Very easy conceptually.
  • Can build hierarchies of global schemas.
  • You sometimes loose information. Not always
    natural.
  • Adding sources is hard. Need to consider all
    other sources that are available.

18
Local-as-View (LAV)
Create View R1 AS SELECT B.ISBN, B.Title,
A.Name FROM Book B, Author A WHERE A.ISBN
B.ISBN AND B.Year lt 1970
Create View R5 AS SELECT B.ISBN, B.Title FROM
Book B WHERE B.Genre Humor
Mediator
Global Schema
Book ISBN Title Genre Year
Author ISBN Name
Books before 1970
Humor Books
Mediated Schema
Source 1
Source 2
Source 3
Source 4
Source 5
Local Schema
Local Schema
Local Schema
Local Schema
Local Schema
R5 ISBN Title
R1 ISBN Title Name
19
Query Reformulation
Query Find authors of humor books
Plan R1 Join R5
Mediator
Global Schema
Book ISBN Title Genre Year
Author ISBN Name
Books before 1970
Humor Books
Mediated Schema
Source 1
Source 2
Source 3
Source 4
Source 5
Local Schema
Local Schema
Local Schema
Local Schema
Local Schema
R5 ISBN Title
R1 ISBN Title Name
20
Query Reformulation
Query Find authors of humor books before 1960
Plan Cant do it!
Mediator
Global Schema
Book ISBN Title Genre Year
Author ISBN Name
Books before 1970
Humor Books
Mediated Schema
Source 1
Source 2
Source 3
Source 4
Source 5
Local Schema
Local Schema
Local Schema
Local Schema
Local Schema
R5 ISBN Title
R1 ISBN Title Name
21
Local-as-View Example 1
Global Schema Movie(title, dir, year,
genre) Schedule(cinema, title, time) Source
Views Create Source S1 AS S1(title, dir,
year, genre) SELECT FROM Movie Create
Source S3 AS S3(title, dir) SELECT
title, dir FROM Movie Create Source S5 AS
S5(title, dir, year) SELECT title, dir,
year FROM Movie WHERE year gt 1960 AND
genreComedy
22
Local-as-View Example 2
Global Schema Movie(title, dir, year,
genre) Schedule(cinema, title, time) Source
Views Create Source S4 S4(cinema, genre)
SELECT cinema, genre FROM Movie M, Schedule
S WHERE M.titleS.title Now if we want to
find which cinemas are playing comedies, there
is hope!
23
Local-as-View Summary
  • Very flexible. You have the power of the entire
    query language to define the contents of the
    source.
  • Hence, can easily distinguish between contents of
    closely related sources.
  • Adding sources is easy theyre independent of
    each other.
  • Query reformulation answering queries using
    views!

24
The General Problem
  • Given a set of views V1,,Vn, and a query Q, can
    we answer Q using only the answers to V1,,Vn?
  • Many, many papers on this problem
  • The best performing algorithm The MiniCon
    Algorithm (Pottinger Halevy, VLDB 2000)

25
Local Completeness Information
  • If sources are incomplete, we need to look at
    each one of them.
  • Often, sources are locally complete.
  • Movie(title, director, year) complete for years
    after 1960, or for American directors.
  • Question given a set of local completeness
    statements, is a query Q a complete answer to Q?

26
Example
  • Movie(title, director, year)
  • complete after 1960
  • Show(title, theater, city, hour)
  • Query find movies (and directors) playing in
    Seattle
  • SELECT M.title, M.director
  • FROM Movie M, Show S
  • WHERE M.titleS.title
  • AND citySeattle
  • Complete or not?

27
Example 2
  • Movie(title, director, year), Oscar(title, year)
  • Query find directors whose movies won Oscars
    after 1965
  • SELECT M.director
  • FROM Movie M, Oscar O
  • WHERE M.titleO.title
  • AND M.yearO.year
  • AND O.year gt 1965
  • Complete or not?

28
References
  • Information integration
  • Maurizio Lenzerini
  • Eighteenth International Joint Conference on
    Artificial Intelligence, IJCAI 2003
  • Invited Tutorial
  • Data Integration a Status Report
  • Alon Halevy
  • German Database Conference (BTW), 2003
  • Invited Talk
Write a Comment
User Comments (0)
About PowerShow.com