YoungKwang Nam - PowerPoint PPT Presentation

About This Presentation
Title:

YoungKwang Nam

Description:

Kweelt (University of Washington), a XML query engine based on Quilt, used in our prototype ... Uses Quilt as its XML query language. ... – PowerPoint PPT presentation

Number of Views:41
Avg rating:3.0/5.0
Slides: 35
Provided by: nam119
Learn more at: https://cseweb.ucsd.edu
Category:
Tags: youngkwang | nam

less

Transcript and Presenter's Notes

Title: YoungKwang Nam


1
A Metadata Integration Assistant Generator for
Heterogeneous Databases
  • Young-Kwang Nam
  • Joseph Goguen
  • Guilian Wang

2
Data Integration in Synthetic Scientific
Applications
Applications
Integrated result without inconsistency, etc.
Query
global unified schema/ontology
Integration System
local schema/ontology
local schema/ontology
local schema/ontology

data source 1
data source 2
data source n
3
Why Difficult Data Heterogeneity
  • Platform System Heterogeneity
  • OS, Hardware
  • DBMSs, Concurrency control and recovery
    capabilities
  • Syntactic Structural Heterogeneity
  • Machine readable aspects of representation
  • Data models, Schemas,
  • Semantic Heterogeneity
  • Naming conflicts synonyms, homonyms
  • Scaling precision conflicts
  • Sampling rates, error distribution, etc.

4
More Difficult Flexible Integration
  • No all-encompassing system satisfies everyone
  • frequent update of sources
  • frequent change of user requirements
  • non-published data from ones own lab
  • Simplicity and readability are more desirable
    than completeness or exhaustiveness to domain
    scientists
  • Domain knowledge is crucial for
  • solving heterogeneities
  • query optimization
  • Desirable to support domain scientists to do data
    integration on their own

5
A Common Data Integration Architecture
Query
Result
An Integrated View Materialized or Virtual
Mediator
Wrapper
Wrapper
Wrapper
data source 1
data source 2
data source n

6
Structural vs. Semanticwrt Mediation Level
  • Structural approach (Mediated schema approach)
  • integration by generating mediated schema that
    characterize a set of data sources
  • Semantic approach (Ontology-based approach)
  • difficult to integrate structural aspects of
    sources from semantic perspective due to
    inherent embedded semantics within local schemas
    implicit assumptions
  • integration by sharing a common ontology among
    the differentdata sources

7
Global-as-view vs. Local-as-viewwrt Mapping
Direction
  • Global-as-view approach
  • each item in Global schema/ontology as a view
    (query) over source schemas/ontologies
  • query(G) query(f(S1, S2, , Sn))
  • straightforward query rewriting
  • Local-as-view approach
  • Each source as a view/query over global
    schema/ontology
  • query(G) query(f1-1 (S1), f2-1(S2), , fn-1
    (Sn))
  • easy adding or removing sources

8
Representative Systems
  • TSIMMIS (Stanford IBM, 1995)
  • MedMaker (Stanford, 1996)
  • MIX (SDSCUCSD, 2000)
  • IM (ATT, 1996)
  • ClioGarlic (IBM, 2000)
  • DIXSE (UT, 2001)
  • XYLEME (2001)
  • HERMES (UMD, 1994)
  • SIMS (USC, 1996)
  • Observer (UG, 1996)
  • Infosleuth (MCC, 1997)
  • COIN (MIT, 1999)
  • Ontobroker (Ger., 2000)
  • KIND (SDSCUCSD, 2001)

9
Our Approach
  • Virtual Integration retrieve data and resolve
    conflicts at query time, easy maintenance
  • Structural Approach take users knowledge on
    data semantics hidden in structural information
    as input to achieve semantic mediation
  • Local-as-view easily adds or removes sources,
    convenient to fit applications
  • GUI for specifying semantic mappings through
    assigning same index to same meaning nodes
    (paths)
  • Automatically generate DDXMI for query
    decomposition
  • Semantic functions

10
Current Prototype Architecture
11
Distributed Database XML Metadata Interface
(DDXMI)
  • Include Database or XML document name or location
    information
  • Contain table columns or XML path information
  • Function or operation name for resolving semantic
    issues about table columns or XML elements and
    attributes

12
DDXMI DTD
lt!ELEMENT DDXMIA (DDXMI.header,
DDXMI.isequivalent, documentspec)gt lt!ELEMENT
DDXMI.header (documentation,version,date,authoriza
tion)gt lt!ELEMENT documentation (PCDATA)gt lt!ELEMEN
T version (PCDATA)gt lt!ELEMENT date
(PCDATA)gt lt!ELEMENT authorization
(PCDATA)gt lt!ELEMENT DDXMI.isequivalent
(source,destination)gt lt!ELEMENT source
(PCDATA)gt lt!ELEMENT destination
(PCDATA)gt lt!ELEMENT documentspec (document,
(elementname,operation))gt lt!ELEMENT document
(PCDATA)gt lt!ELEMENT elementname
(PCDATA)gt lt!ELEMENT operation (PCDATA)gt
13
How to generate DDXMI
  • Define a Master DTD (global schema) based on
    application requirements for choosing elements or
    tables from the distributed systems
  • Parse the master DTD and generate a path for each
    element from root to current element
  • Assign the master index number to the site
    element node which has the same meaning of the
    master DTD node
  • May include a function name for some nodes
  • Generate DDXMI file automatically by collecting
    over same index numbers

14
Generate Master Index
15
Site1 Book1 DTD Tree
Index number
function name
16
Book1 Path Information
Site1 Index
Master Index
  • 0 book1.xml
  • 1 /bib/book
  • 11 /bib/book/price
  • /bib/book/author
  • 1211 /bib/book/author/first
  • 1212 /bib/book/author/last
  • /bib/book/title
  • 15 /bib/book/publisher
  • 16 /bib/book/editor
  • 161 /bib/book/editor/affiliation
  • 162 /bib/book/editor/last
  • 162 /bib/book/editor/first

17
Site 2 Book2 DTD Tree
18
Book2 Path Information
Site2 Index
Master Index
0 book2.xml 1 /arts/book 12 /arts/book/author
1211 /arts/book/author/firstname 1212
/arts/book/author/lastname 13 /arts/book/title
15 /arts/book/publisher
19
Site 3 Book3 DTD Tree
20
Book3 Path Information
Site3 Index
Master Index
  • 0 book3.xml
  • /bookstore/book
  • 11 /bookstore/book/price
  • 12 /bookstore/book/author
  • 1211 /bookstore/book/author/name
  • 1212 /bookstore/book/author/name
  • 13 /bookstore/book/title

21
XML Query Languages
  • XQL takes a document point of view
  • XML-QL takes a database point of view
  • Quilt draws from both areas
  • proposed by Don Chamberlin, Jonathan Robie, and
    Daniela Florescu
  • Kweelt (University of Washington), a XML query
    engine based on Quilt, used in our prototype
  • XQuery proposal follows Quilt closely

22
How to generate site queries
  • Parse the master query, a query over the global
    schema
  • If encounter a path, depending on its kind, get
    corresponding path name from DDXMI file and
    substitute it
  • If there is no corresponding path in the DDXMI,
    then put it as a null value
  • ? no queries generated for that site

23
How to get site element names
In Quilt Query 1.book ? bookstore/book 2.
price ? bookstore/book/price_info/price ?
price_info/price
cut!!
ltsourcegtbooklt/sourcegt ltdestinationgtbooksore/boo
klt/destinationgt ltsourcegtbook/pricelt/sourcegt
ltdestinationgtbookstore/book/price_info/priceltdesti
nationgt
24
11 Mapping Example
FOR book IN document("book.xml")//book publ
isher "Addison-Wesley" RETURN
ltbookgtbook/titlelt/bookgt
Book1
Master index
bib
Book2
book
arts
book
book
price
editor
publisher
title
year
author
title
publisher
publisher
title
Book3
full_name
affiliation
full_name
bookstore
book
last_name
first_name
title
25
Query Execution Result
26
1N Mapping Example
FOR edi IN document("book.xml")//book/editor
RETURN lteditorgtedi/full_namelt/editorgt
27
Query Execution Result
28
N1 Mapping Example
FOR a IN document("book.xml")//book//author RETUR
N ltauthorgt a/last_name,a/first_name lt/authorgt
ltoperationgtfstringlt/operationgt
ltoperationgtlstringlt/operationgt
29
Query Generation Result
import split as UDF_split FUNCTION
fstring(str) split(" ",str)1 FUNCTION
lstring(str) split(" ",str)2 FOR a IN
document("book3.xml") //book//author RETURN
ltauthorgt fstring(a/name), lstring(a/name) lt/a
uthorgt
30
Query Execution Result
31
Semantic Function Involved Example
FOR book IN document("book.xml")//book RETURN
ltbookgt book/title,book/author,boo
k/price lt/bookgt
ltoperationgtdiv(100)lt/operationgt
32
Query Execution Result
33
Remaining Issues
  • Handle attributes one DTD has an attribute but
    others dont, or an attribute in one DTD as an
    element in others
  • More efficient way for generating DDXMI file
    automatically when there are many paths in the
    master DTD
  • e.g., treetree mapping if two paths are
    indicated as the same and have the same children,
    then the index numbers should be generated
    automatically
  • Migrate to XML schemas, instead of DTDs
  • Support JOIN, PRODUCT generated by queries
  • Move to XQuery and a query engine with
    distributed query support
  • Integrate the individual site query results as
    one return as a single data source ready for
    further analysis
  • Provide mechanisms for removing redundancy
  • Justify the semantics of the query generated

34
Conclusion
  • Our prototype uses distributed metadata to
    generate a GUI tool to describe mappings between
    master and local databases by assigning index
    numbers and specifying conversion function names
  • Uses Quilt as its XML query language.
  • A DDXMI file is generated based on the mappings,
    and is used to translate queries over the virtual
    master database into sub-queries to local
    databases
  • An experiment testing feasibility is reported in
    which 3 different bibliography databases are
    integrated.
  • Implemented with Java Webserver and JavaCC
  • Move to real applications, e.g. in the context of
    NSF project SEEK (Science Environment for
    Ecological Knowledge)
Write a Comment
User Comments (0)
About PowerShow.com