Title: XML to Relational Mapping
1XML to Relational Mapping
2XML-to-Relational Mapping
XML Translation Layer
Tuples
Relational Database System
3Where/How to Store XML Data ?
- File system
- OODBMS
- Semistructured DBMS Lore, etc.
- XML DBMS eXcelon, Tamino, etc.
- RDBMS/ORDBMS
- CLOB(Character Large Object) column
- Decomposing (Shredding) into tuples
- XML Type
4Approaches to XML Data Storage Using an RDBMS
- Predefined set of table schema
- Automatic generation of table schema for given
DTD - Data mining approach
5Predefined Schema
- simple ad-hoc schemes
- requires no input by the user or by the system
administrator - works in the absence of type (e.g., DTD)
- does not involve any analysis of the XML data
- many variations possible
- Edge-inlining
- XRel
- XParent
6Predefined SchemaFlor99
7XML Data Model
- Ordered labeled graph
- Each XML element is represented by a node in the
graph the node is labeled with the oid of the
XML object. - Element-subelement relationships are represented
by edges in the graph and labeled by the name of
the subelement. - In order to represent the order of subelements of
an XML object, the outgoing edges of a node in
the graph are also ordered. - Values (e.g., strings) of an XML document are
represented as leaves in the graph.
8Example XML Data
9XML Data Model
10Mapping XML Data into Relational Table
- Mapping Edges
- Edge Approach
- Binary Approach
- Universal Table Approach
- Mapping Values
- Separate Value Tables
- Inlining
- 3 x 2 6 possible mapping schemes
11Edge Approach
- Store all edges of the XML document in a single
table Edge Table
Edge (source, ordinal, name, flag, target)
12Edge Approach (Contd)
- The index on the source column is useful for
forward traversals needed to reconstruct a
specific object given its oid. - The index on name, target is useful for
backward traversals - e.g.) find all objects that have a child
named John.
13Binary Approach
- Group all edges with the same label into one
table - This approach corresponds to a horizontal
partitioning of the Edge Table, using name as the
partitioning attribute.
14Universal Table Approach
- Generate a single Universal Table to store all
the edges. - The Universal Table corresponds to the result of
a full outer join of all Binary Tables. - The Universal Table has many fields which are set
to NULL. - This Universal Table is denormalized.
15Separate Value Tables
- Store values in the separate Value Tables.
- A separate Value Table for each conceivable data
type - e.g., integer, date, ref, string, etc.
16Inlining
- Store values and attributes in the same table
- e.g., binary inlining approach
17Modified Edge-Inlining Approach
- XML Data Model ordered, labelled tree
- Node id (i.e., source) assigned in preorder
traversal of XML tree - 1 Table
- Node id (source)
- Tag name (name)
- Text (PCDATA)
- Parent Node id (tagging)
- Path (optional)
- Type (element or attribute)
- Level
- Doc id
18Automatic Schema Generation out of DTDShan 99
- Assumption
- XML document conforms to a scheme (DTD)
- Transformation Approach
- Three Techniques
- Basic, Shared, Hybrid Inlining
- DTD ? DTD Graph ? Element Graph ? Table Schema
19DTD to Relational Schema
- Naïve Approach
- Each Element gt Relation
- Each Attribute of Element gt Column of Relation
- Connect elements using foreign keys
- Fragmentation problem
- Too many relations
- Requires many joins in query processing
20Naïve Approach Example
lt!ELEMENT author (name, address)gt lt!ATTLIST
author id ID REQUIREDgt lt!ELEMENT name
(firstname?, lastname)gt lt!ELEMENT firstname
(PCDATA)gt lt!ELEMENT lastname (PCDATA)gt lt!ELEME
NT address ANYgt
author (authorID integer, id string) name
(nameID integer, authorID integer) firstname
(firstnameID integer, nameID integer, value
string) lastname (lastnameID integer, nameID
integer, value string) address (addressID
integer, authorID integer, value string)
21Inlining
- Inlining
- solves fragmentation problem
- Basic inlining as many descendents of an element
as possible into a single relation - Shared inlining the nodes with an in-degree of 1
(DTD graph) - Hybrid
- Basic Shared
- inlining the nodes with an in-degree gt 1 (DTD
graph) - Issues
- set-valued attribute
- recursion
22DTD
?
23DTD Graph
24Element Graph for editor
25Basic Inlining Technique
26Shared Inlining Technique
27Hybrid Inlining Technique
- Same as Shared Inlining except
- inlines some elements that are not inlined in
Shared - inlines elements with in-degree greater than one
- that are not recursive or
- reach through a node
28Data Mining Approach
- Data Pattern Analysis
- Wang Lis Semistructured Data Mining Algorithm
- STORED(Semistructured To Relational Data)
- a declarative language for description of
XML-to-Relational Mapping - Overflow Mapping
- definition for the objects not mapped to
relational data - overflow graph
- stored in the object repository of semistructured
data
29Data Pattern Analysis
STORED Mapping
XML Data
Algo
Overflow Mapping