Title: XML-to-Relational Schema Mapping Algorithm ODTDMap
1XML-to-Relational Schema Mapping Algorithm ODTDMap
- Speaker: Artem Chebotko
- Email: artem_at_wayne.edu
- Wayne State University
- Joint work with Mustafa Atay, Shiyong Lu and Farshad Fotouhi
2Introduction
- XML has emerged as the standard for representing and exchanging data on the World Wide Web.
- The increasing number of XML documents creates the need to store and query XML documents efficiently.
3Current approaches to storing and querying XML documents
- Native XML repositories, e.g., Software AG's Tamino, eXcelon's XIS.
- XML-enabled commercial database systems such as SQL Server, Oracle, and DB2.
- Using an RDBMS/ODBMS to store and query XML documents.
4Issues of the relational approach
- Schema Mapping: the XML data model needs to be mapped into the relational model.
- Data Mapping: XML documents need to be shredded and composed into tuples to be inserted into the relational database.
- Query Mapping: XML queries need to be translated into SQL queries.
- Reverse Data Mapping: query results need to be tagged to XML format.
5Our contributions
- We propose a schema mapping algorithm, ODTDMap, which generates a relational schema from an XML DTD for storing and querying ordered XML documents.
- Improvements over the existing algorithms:
- Losslessness
- Efficient support for XML queries
- Completeness (recursion, set-valued attributes, DTD operators)
6Outline of the talk
- Introduction of XML DTDs
- Mapping DTDs to relational schemas
- Simplifying DTDs
- Creating and inlining DTD graphs
- Generating relational schemas
- An example
- Conclusions and future work
7An overview of DTDs: A DTD example
- <!DOCTYPE memo
- <!ELEMENT memo (to, from, date, subject?, body)>
- <!ATTLIST memo security CDATA>
- <!ATTLIST memo lang CDATA>
- <!ELEMENT to (#PCDATA)>
- <!ELEMENT from (#PCDATA)>
- <!ELEMENT date (#PCDATA)>
- <!ELEMENT subject (#PCDATA)>
- <!ELEMENT body (para)>
- <!ELEMENT para (#PCDATA)>
8DTD: Document Type Definition
- <!DOCTYPE root-element doctype-declaration...
- <!ELEMENT element-name content-model>, where the content model is built with the operators | , * + ?
- <!ATTLIST element-name attr-name attr-type attr-default ...>
9DTD: Document Type Definition (cont)
- <!ATTLIST element-name attr-name attr-type attr-default ...> declares which attributes are allowed or required in which elements
- attribute types:
- CDATA: any value is allowed (the default)
- (value|...): enumeration of allowed values
- ID, IDREF, IDREFS: ID attribute values must be unique (contain "element identity"), IDREF attribute values must match some ID (reference to an element)
- ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION: just forget these... (consider them deprecated)
- attribute defaults:
- #REQUIRED: the attribute must be explicitly provided
- #IMPLIED: attribute is optional, no default provided
- "value": if not explicitly provided, this value is inserted by default
- #FIXED "value": as above, but only this value is allowed
10Mapping DTDs to relational schemas
- Simplifying DTDs
- Creating and inlining DTD graphs
- Generating relational schemas
11Simplifying DTDs
- A DTD might be very complex due to nesting, e.g.,
- <!ELEMENT a ((b, c, d?)?, (e?, f, (g, h?))?)>
- An XML query language is concerned with:
- The parent-child relationships between XML elements
- The relative order relationships between siblings (add an ordinal attribute to each relation)
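To make the ordinal idea concrete, here is a minimal Python sketch that records each child's position among its siblings while walking a document. The function name `shred_with_ordinals` and the tuple layout (parent tag, child tag, ordinal, text) are assumptions made for illustration; they are not the schema that ODTDMap generates.

```python
import xml.etree.ElementTree as ET

def shred_with_ordinals(xml_text):
    """Return (parent_tag, child_tag, ordinal, text) tuples in document order."""
    root = ET.fromstring(xml_text)
    rows = []

    def visit(parent):
        # enumerate() gives each child its 1-based position among its siblings
        for ordinal, child in enumerate(parent, start=1):
            rows.append((parent.tag, child.tag, ordinal, (child.text or "").strip()))
            visit(child)

    visit(root)
    return rows

# Hypothetical sample document, loosely following the memo example
rows = shred_with_ordinals(
    "<memo><to>Ann</to><from>Bob</from><body><para>Hi</para></body></memo>"
)
for row in rows:
    print(row)
```

Sorting the stored tuples by ordinal within each parent is what would let the sibling order be restored later.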
12DTD simplification rules
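- 1. e+ → e*
- 2. e? → e
- 3. (e1 | ... | en) → (e1, ..., en)
- 4. (a) (e1, ..., en)* → (e1*, ..., en*)
- (b) e** → e*
- 5. (a) ..., e, ..., e, ... → ..., e*, ..., ...
- (b) ..., e, ..., e*, ... → ..., e*, ..., ...
- (c) ..., e*, ..., e, ... → ..., e*, ..., ...
- (d) ..., e*, ..., e*, ... → ..., e*, ..., ...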
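The effect of such rules can be sketched in Python under one reading of them: `+` is treated as `*`, `?` is dropped, a choice is flattened into a sequence, a `*` on a group is pushed down to its members, and a repeated element name collapses into a single starred occurrence. The tuple-based representation of a content model and the names `simplify` and `render` are assumptions for illustration, not the authors' implementation.

```python
def simplify(model, starred=False, out=None):
    """Flatten a content model into {element_name: is_starred}, in first-seen order.

    model is one of:
      ("elem", name), ("seq", [models]), ("choice", [models]),
      ("star", model), ("plus", model), ("opt", model)
    """
    if out is None:
        out = {}
    kind = model[0]
    if kind == "elem":
        name = model[1]
        # A name seen before becomes starred; otherwise it inherits the enclosing star
        out[name] = starred or name in out
    elif kind in ("seq", "choice"):
        # A choice is treated like a sequence
        for child in model[1]:
            simplify(child, starred, out)
    elif kind in ("star", "plus"):
        # '+' is treated as '*', and the star is pushed down to every member
        simplify(model[1], True, out)
    elif kind == "opt":
        # '?' is dropped
        simplify(model[1], starred, out)
    return out

def render(flat):
    return "(" + ", ".join(n + ("*" if s else "") for n, s in flat.items()) + ")"

# Hypothetical content model: ((b | c)+, d?, b)
model = ("seq", [
    ("plus", ("choice", [("elem", "b"), ("elem", "c")])),
    ("opt", ("elem", "d")),
    ("elem", "b"),
])
print(render(simplify(model)))  # (b*, c*, d)
```

The result keeps only what the later steps need: which children an element has, and which of them may occur more than once.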
13Example of simplifying a DTD
- <!ELEMENT a ((b, c, d?)?, (e?, f, (g, h?))?)>
- simplified to
- <!ELEMENT a (b, c, d, e, f, g, h)>
14Creating and inlining DTD graphs
- We create a DTD graph based on the simplified DTD.
- Definition 3.2 (DTD graph): The structure of a DTD can be represented by a labeled graph, in which nodes represent elements and attributes, and edges represent their parent-child relationships. The edges are labeled by either '*' (star edge) or ',' (normal edge), where the label ',' is not shown for simplicity.
- Idea: inline a child c to its parent p if p can contain at most one occurrence of c.
- Rationale: inlined elements will produce a relation.
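A DTD graph of this kind can be sketched as a set of nodes plus a list of labeled edges. The input format (a dictionary from element name to `(child, is_starred)` pairs), the `Edge` tuple, and the function name `build_dtd_graph` are assumptions for illustration; the element names are hypothetical, and attribute nodes could be added the same way.

```python
from collections import namedtuple

# label is "*" for a star edge and "," for a normal edge
Edge = namedtuple("Edge", "parent child label")

def build_dtd_graph(simplified):
    """Build (nodes, edges) from a simplified DTD.

    simplified maps an element name to a list of (child, is_starred) pairs.
    """
    nodes, edges = set(), []
    for parent, children in simplified.items():
        nodes.add(parent)
        for child, is_starred in children:
            nodes.add(child)
            edges.append(Edge(parent, child, "*" if is_starred else ","))
    return nodes, edges

# Hypothetical simplified DTD: a (b, c*), c (d)
simplified = {"a": [("b", False), ("c", True)], "c": [("d", False)]}
nodes, edges = build_dtd_graph(simplified)
for edge in edges:
    print(edge.parent, "->", edge.child, "[" + edge.label + "]")
```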
15Inlinable node and subtree, shared node
- Definition 3.3 (Inlinable node): Given a DTD graph, a node is inlinable if and only if it has exactly one incoming edge and that edge is a normal edge.
- Definition 3.4 (Inlinable subtree): Given a DTD graph and a node e in the graph, e and all other inlinable nodes that are reachable from e by normal edges constitute a subtree. This subtree is called the inlinable subtree for the node e (it is rooted at e).
- Definition 3.5 (Shared node): Given a DTD graph, a node is called a shared node if it has more than one incoming edge.
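The three definitions translate almost directly into predicates over an edge list. In this sketch an edge is a `(parent, child, label)` tuple with label `","` (normal) or `"*"` (star); that representation, the function names, and the sample graph are assumptions for illustration. The subtree walk follows normal edges only through inlinable nodes, which is one reading of Definition 3.4.

```python
def incoming(edges, node):
    return [e for e in edges if e[1] == node]

def is_inlinable(edges, node):
    # Exactly one incoming edge, and that edge is a normal edge
    inc = incoming(edges, node)
    return len(inc) == 1 and inc[0][2] == ","

def is_shared(edges, node):
    # More than one incoming edge
    return len(incoming(edges, node)) > 1

def inlinable_subtree(edges, root):
    """Return root plus the inlinable nodes reachable from it by normal edges."""
    subtree, stack = [root], [root]
    while stack:
        current = stack.pop()
        for parent, child, label in edges:
            if (parent == current and label == ","
                    and is_inlinable(edges, child) and child not in subtree):
                subtree.append(child)
                stack.append(child)
    return subtree

# Hypothetical graph: d has two parents, c hangs off a star edge
edges = [("a", "b", ","), ("a", "c", "*"), ("b", "d", ","),
         ("c", "d", ","), ("b", "e", ",")]
print(inlinable_subtree(edges, "a"))  # ['a', 'b', 'e']
```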
16Inlining
- Case 1: Node a is connected to b by a normal edge and b has no other incoming edges: inline b into a.
- Case 2: Node a is connected to b by a normal edge but b has other incoming edges: b is a shared node, so no inlining.
- Case 3: Node a is connected to b by a star edge: no inlining.
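The three cases can be sketched as a per-edge decision. This is not the inlining procedure from the talk, only an illustration of the case analysis; the `(parent, child, label)` edge tuples, the function name `inlining_decisions`, and the sample graph are assumptions.

```python
from collections import Counter

def inlining_decisions(edges):
    """Map each (parent, child) edge to the case that applies to it."""
    # Count incoming edges per node once, so each edge is then decided in constant time
    in_degree = Counter(child for _, child, _ in edges)
    decisions = {}
    for parent, child, label in edges:
        if label == "*":
            decisions[(parent, child)] = "case 3: star edge, no inlining"
        elif in_degree[child] > 1:
            decisions[(parent, child)] = "case 2: shared node, no inlining"
        else:
            decisions[(parent, child)] = "case 1: inline child into parent"
    return decisions

# Hypothetical graph: a -> b (normal), a -> c (star), b -> d and c -> d (normal)
edges = [("a", "b", ","), ("a", "c", "*"), ("b", "d", ","), ("c", "d", ",")]
decisions = inlining_decisions(edges)
for edge, case in decisions.items():
    print(edge, case)
```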
17Inlining (cont)
18Inlining DTD graphs
19Complexity of inlining
- Theorem 3.7 (Time Complexity): The time complexity of our inlining algorithm is O(n), where n is the number of elements in the input DTD.
20The inlining procedure
21The inlining procedure (cont): INCORRECT
22The inlining procedure (cont): CORRECT
23Generating relational schema
24Generating schema mapping info.
- Definition 3.8 (σ-Mapping): σ is a mapping from X to R, where X is the set of XML element and attribute types in the input XML DTD, and R is the set of relations in the relational database. Given an XML element type e, σ(e) will return the corresponding relation that is used to store e. Similarly, given an XML attribute type a of element type e, σ(e.a) will return the corresponding relation that is used to store a of e.
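A mapping of this shape can be sketched as a plain dictionary keyed by element type `e` or by `e.a` for an attribute. The grouping of elements into relations below is hypothetical, and storing every attribute in its element's relation is a simplifying assumption (set-valued attributes, for instance, are not handled); the function name `build_sigma` is also an assumption.

```python
def build_sigma(relation_members, attributes):
    """Build the element/attribute-to-relation lookup.

    relation_members: relation name -> element types stored in that relation
    attributes: element type -> list of its attribute names
    """
    sigma = {}
    for relation, elements in relation_members.items():
        for element in elements:
            sigma[element] = relation
            # Assumption: an attribute lives in the same relation as its element
            for attr in attributes.get(element, []):
                sigma[f"{element}.{attr}"] = relation
    return sigma

# Hypothetical outcome of inlining for the memo example
relation_members = {
    "memo": ["memo", "to", "from", "date", "subject", "body"],
    "para": ["para"],
}
attributes = {"memo": ["security", "lang"]}
sigma = build_sigma(relation_members, attributes)
print(sigma["to"], sigma["memo.lang"], sigma["para"])
```

A lookup table like this is what data mapping and query mapping would consult to find where a given element or attribute is stored.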
25A complete example
26DTD graph and inlined DTD graph
27Generated relational schema
28Conclusions
- We defined the schema mapping algorithm ODTDMap, which has several improvements over the existing ones.
- It is lossless in the sense that one can reconstruct the original XML document in the given document order, based on the target relational schema generated by ODTDMap.
- It has efficient support for recursive queries and schemas.
- It defines how to map set-valued XML attributes.
- Experimental results showed good performance and scalability of the algorithm.
29Future work
- Extending our work to XML Schema to support data types other than the string type.
- Maintaining ID/IDREF/IDREFS in terms of key and foreign key constraints.