DTD-Directed Transformations for Data Exchange in XML

1
DTD-Directed Transformations for Data Exchange
in XML
  • Wenfei Fan
  • Internet Management Research Dept., Bell Labs
  • Dept. of CIS, Temple University

2
Overview
  • Ongoing application-oriented research
  • DTD-directed transformations
  • Attribute Transformation Grammars (ATGs)
  • Open questions
  • joint work with M. Benedikt, C. Chan, R. Rastogi

3
  • 1. DTD-directed transformations

4
Data exchange on the Web
  • A community (industry) agrees on a certain DTD.
    Subsequently all members of the community
    exchange data w.r.t. the DTD
  • e-commerce
  • health-care
  • ...

[Figure: data exchange on the Web. Labels: Web, DTD, XML, Q (view), RDB, OODB]

5
Example
  • An XML view of patients' SSN, name and treatment
    procedure
  • a relational database at a hospital:
  • Patient (SSN, name, tr-name)
  • Treatment (name, cost)
  • Procedure (name1, name2) -- composition
    hierarchy
  • . . .
  • a DTD predefined by an insurance company:
  • r → patient*
  • patient → SSN, name, treatment
  • treatment → name, procedure
  • procedure → treatment* -- recursive

6
DTD and the structure of the XML data
  • XML: a rooted, node-labeled tree
  • r → patient*
  • patient → SSN, name, treatment
  • treatment → name, procedure
  • procedure → treatment* <-- unbounded

...
7
DTD-directed transformation
  • Given: a relational database with schema R, and
  • a predefined DTD D (with certain semantic
    information)
  • Question: Is there any systematic way to define
    an XML view of R (i.e., σ: R → D) that is
    guaranteed to conform to D?
  • R: Patient (SSN, name, tr-name)
  • Treatment (name, cost)
  • Procedure (name1, name2), ...
  • D: r → patient*
  • patient → SSN, name, treatment
  • treatment → name, procedure
  • procedure → treatment*

8
More on DTDs
  • DTD D = (E, P, r) -- an Extended Context-Free
    Grammar (ECFG)
  • E: a set of element types (nonterminals)
  • P: production rules (in a normal form), e → α
  • α ::= s | ε | e1, ..., en | e1 + ... + en | e*
  • where s: string, ε: empty word,
    ",": concatenation, "+": disjunction,
    "*": Kleene closure
  • r: start symbol
  • e.g., r → patient*
  • patient → SSN, name, treatment
  • treatment → name, procedure
  • procedure → treatment*
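
The grammar view of a DTD can be made concrete with a small sketch. The Python below encodes the example DTD as a table of normal-form productions and checks whether a given tree is a parse tree of it. The `Node` class, the tuple encoding of productions, the function names and the sample values ("123", "Ann", and so on) are illustrative assumptions, not part of the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a rooted, node-labeled tree (hypothetical helper type)."""
    label: str
    children: list = field(default_factory=list)
    text: str = ""


# One production per element type, in the normal form:
# ("str",), ("eps",), ("seq", [e1, ..., en]), ("alt", [e1, ..., en]), ("star", e)
DTD = {
    "r": ("star", "patient"),
    "patient": ("seq", ["SSN", "name", "treatment"]),
    "treatment": ("seq", ["name", "procedure"]),
    "procedure": ("star", "treatment"),
    "SSN": ("str",),
    "name": ("str",),
}


def conforms(node, dtd):
    """Check that the subtree rooted at node follows the productions of dtd."""
    rule = dtd.get(node.label)
    if rule is None:
        return False  # element type not declared in the DTD
    kind = rule[0]
    labels = [child.label for child in node.children]
    if kind == "str":
        ok = not labels
    elif kind == "eps":
        ok = not labels and node.text == ""
    elif kind == "seq":
        ok = labels == rule[1]
    elif kind == "alt":
        ok = len(labels) == 1 and labels[0] in rule[1]
    elif kind == "star":
        ok = all(label == rule[1] for label in labels)
    else:
        raise ValueError(f"unknown production kind: {kind}")
    return ok and all(conforms(child, dtd) for child in node.children)


def is_instance(tree, dtd, root):
    """A tree is an instance of the DTD if its root is the start symbol and it conforms."""
    return tree.label == root and conforms(tree, dtd)


# Made-up sample data: one patient whose treatment has one sub-treatment.
tree = Node("r", [
    Node("patient", [
        Node("SSN", text="123"),
        Node("name", text="Ann"),
        Node("treatment", [
            Node("name", text="surgery"),
            Node("procedure", [
                Node("treatment", [Node("name", text="anesthesia"), Node("procedure")]),
            ]),
        ]),
    ]),
])

# Children of patient in the wrong order, and the treatment is missing.
bad = Node("r", [Node("patient", [Node("name", text="Ann"), Node("SSN", text="123")])])
```

Here `is_instance(tree, DTD, "r")` holds and `is_instance(bad, DTD, "r")` does not. The check only validates a finished tree; it gives no help in constructing one.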

9
Difficulties
  • Given an input type and a fixed (complex) output
    type, is there any systematic way to write a
    program that typechecks?
  • XML view of DTD D: a parse tree of D, possibly
    nested to an arbitrary depth that cannot be
    determined statically

10
Existing systems
  • ignore the requirement of DTD-conformance
  • SilkRoute (AT&T), XPERANTO (IBM), Microsoft,
    Oracle: incapable of coping with a predefined
    DTD
  • simply define an XML view and then check its
    conformance
  • the check is undecidable in general, co-NEXPTIME
    for extremely restricted view definitions
  • no guidance on how to define XML views that
    typecheck
  • one gets an XML view that typechecks only after
    repeated failures and with luck

11
  • 2. Attribute Transformation Grammars (ATGs)

12
Attribute Transformation Grammars (ATGs)
  • ATG σ = (D, Attributes, Rules), σ: R → D
  • D: a DTD
  • Attributes: an (inherited) attribute tuple $e is
    associated with each element type e
  • $e: used to pass values as well as control
  • Rules: for each production e → α and each element
    type e' in α, a semantic rule is associated: $e' =
    Q($e) or $e' ← Q($e)
  • Q: an SQL query on instances of R
  • $e: the attribute of the parent of e' (in the
    production), treated as a constant parameter in Q
  • ←: iteration over tuples; =: single-tuple
    assignment

13
Example ATG
  • r → patient*
  • $patient ← select SSN, name, tr-name
  • from Patient
  • patient → SSN, name, treatment
  • $SSN = $patient.SSN, $name =
    $patient.name
  • $treatment = $patient.tr-name
  • recall: Patient (SSN, name, tr-name)

14
Example (contd)
  • treatment → name, procedure
  • $name = $treatment -- value
  • $procedure = $treatment -- parameter
  • procedure → treatment*
  • $treatment ← select name2
  • from Procedure
  • where name1 = $procedure
  • recall: Procedure (name1, name2)

15
Evaluation of an ATG
  • Top-down, lazy evaluation (data-driven):
    treatment is further expanded as long as
    procedure is not empty.
  • DTD-directed: the XML tree is constructed
    strictly following the productions (the
    evaluation aborts if an inconsistency is detected)
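
The top-down evaluation can be sketched for the running example with Python's standard `sqlite3` and `xml.etree.ElementTree` modules. This is a hand-specialized sketch of the example ATG, not a general ATG interpreter. The function names, the column name `tr_name` (standing in for tr-name) and the sample rows are assumptions made for illustration. It also assumes the Procedure relation is acyclic; on cyclic data the recursion would not stop.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_db():
    """Build a tiny in-memory instance of the hospital schema (made-up rows)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Patient (SSN TEXT, name TEXT, tr_name TEXT);
        CREATE TABLE "Procedure" (name1 TEXT, name2 TEXT);
        INSERT INTO Patient VALUES ('123', 'Ann', 'surgery');
        INSERT INTO "Procedure" VALUES ('surgery', 'anesthesia');
        INSERT INTO "Procedure" VALUES ('surgery', 'suture');
    """)
    return db


def build_treatment(db, tr_name):
    """Expand one treatment element; tr_name plays the role of its inherited attribute."""
    # Production: treatment -> name, procedure
    treatment = ET.Element("treatment")
    ET.SubElement(treatment, "name").text = tr_name   # attribute used as a value
    procedure = ET.SubElement(treatment, "procedure")  # attribute passed on as a parameter
    # Production: procedure -> treatment*, one child per tuple returned by the query
    rows = db.execute(
        'SELECT name2 FROM "Procedure" WHERE name1 = ? ORDER BY name2',
        (tr_name,),
    ).fetchall()
    for (sub_name,) in rows:
        procedure.append(build_treatment(db, sub_name))
    return treatment


def evaluate(db):
    """Build the XML view top-down, strictly following the productions."""
    # Production: r -> patient*, one patient per tuple of Patient
    root = ET.Element("r")
    rows = db.execute(
        "SELECT SSN, name, tr_name FROM Patient ORDER BY SSN"
    ).fetchall()
    for ssn, name, tr_name in rows:
        # Production: patient -> SSN, name, treatment
        patient = ET.SubElement(root, "patient")
        ET.SubElement(patient, "SSN").text = ssn
        ET.SubElement(patient, "name").text = name
        patient.append(build_treatment(db, tr_name))
    return root


view = evaluate(make_db())
xml_text = ET.tostring(view, encoding="unicode")
```

Because every element is created by the code path of exactly one production, the resulting tree follows the DTD by construction, and the nesting depth is driven entirely by the data: `build_treatment` recurses only while the query on Procedure returns tuples.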

16
ATGs vs. Attribute Grammars (AGs)
  • ATGs are inspired by AGs, but are not a mild
    variation of AGs.
  • AGs: (CFG G, Attributes, Rules)
  • given a string s, parse s with the CFG
  • defined with a CFG
  • support both synthesized and inherited attributes
  • ATGs: (ECFG, Attributes, Rules)
  • given a database, extract relevant information
    from the database with SQL to build a parse tree
    of the ECFG
  • defined with an ECFG
  • support inherited attributes and SQL queries
  • it does not make sense to parse a database
    w.r.t. a DTD

17
  • 3. Open questions

18
Open questions (1)
  • The exact expressive power of ATGs
  • powerful enough to deal with all/most cases
    found in practice?
  • Strictly more expressive than SilkRoute,
    XPERANTO, ...
  • extra power by incorporating synthesized
    attributes?
  • Termination analysis: Does an ATG σ: R → D
    terminate without aborting on all instances / a
    fixed instance of R?
  • for a fixed database I: decidable in PTIME in
    |I|
  • for an arbitrary database and any ATG defined
    with conjunctive queries and arbitrary DTDs:
    decidable in EXPTIME in |σ|
  • for an arbitrary database and ATGs defined with
    general SQL queries: undecidable
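
For the running patient example, termination on a fixed instance reduces to a reachability question: the expansion of a treatment is finite exactly when no cycle of the Procedure relation is reachable from it. The sketch below checks this with a depth-first search. It covers this one example only and is not the general decision procedure referred to above; the function name and the sample rows are assumptions.

```python
def terminates(procedure_rows, start_names):
    """Return True if expanding each treatment in start_names yields a finite tree.

    procedure_rows is a list of (name1, name2) pairs, as in Procedure (name1, name2).
    """
    children = {}
    for parent, child in procedure_rows:
        children.setdefault(parent, []).append(child)

    ACTIVE, DONE = 0, 1
    state = {}

    def visit(name):
        if state.get(name) == ACTIVE:
            return False  # name occurs again on the current path: infinite expansion
        if state.get(name) == DONE:
            return True   # already shown to expand to a finite subtree
        state[name] = ACTIVE
        for child in children.get(name, []):
            if not visit(child):
                return False
        state[name] = DONE
        return True

    return all(visit(name) for name in start_names)


# Made-up instances: an acyclic hierarchy and one with a two-step cycle.
acyclic = [("surgery", "anesthesia"), ("surgery", "suture")]
cyclic = [("surgery", "anesthesia"), ("anesthesia", "surgery")]
```

With these rows, `terminates(acyclic, ["surgery"])` is true and `terminates(cyclic, ["surgery"])` is false. Each name is visited at most once, so the check runs in time linear in the size of the Procedure relation.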

19
Open questions (2)
  • Information-preserving transformations
  • Question: Given a relational schema R and a DTD
    D, is there any ATG σ: R → D such that for any
    instance I of R, σ(I) preserves the information
    of I?
  • challenging: after 20 years, the notion of
    information preservation (semantic
    equivalence) is still not clear!
  • important: central to data transformations, data
    integration, storage optimization, and database
    design
  • Existing definitions:
  • either too weak to capture what people want in
    practice
  • or completely impractical

20
Open questions (3)
  • Incorporating integrity constraints
  • XML data is typically specified with (D, Σ),
    where
  • D: a DTD
  • Σ: constraints, e.g., keys and foreign keys
  • Question: Given a relational schema R and an XML
    specification (D, Σ), is there any ATG σ: R → D
    such that for any instance I of R, σ(I) both
    conforms to D and satisfies Σ?
  • challenging: even the consistency of (D, Σ) is
    undecidable
  • important: practical restrictions on D and/or Σ

21
Open questions (4)
  • Combining XML queries and view definitions
  • It is common to query XML views -- XQuery, XSL,
    ...
  • Question: Given an ATG σ: R → D and a query Q on
    XML data that conforms to D, is there another ATG
    σ': R → D' such that for any instance I of R,
    σ'(I) = Q(σ(I))?
  • One needs new query composition/propagation
    techniques in the context of XML

22
Summary
  • ATGs yield a systematic way to conduct
    DTD-directed transformations: they provide
    guidance for defining DTD-conformant XML views
  • ATGs can be naturally generalized to deal with:
  • DTD-directed transformations from OODBs, etc.
  • DTD-directed transformations between XML
    documents
  • DTD-directed integration of heterogeneous data
    sources
  • Many important questions in connection with ATGs
    remain open: application-oriented theoretical
    problems