Title: DDL2XSD Transformation
1DDL2XSD Transformation
- Automatic Generation of an optimal XML Schema
for a given Relational Schema
Melanie Weis, Berufsakademie Stuttgart, Germany,
2003
2Agenda
- Introduction
- DDL2XSD Overview
- Graph Generator
- XML Structure Extractor
- XML Schema Generator
- Practical Results
- Conclusion
3Introduction
- Relational Databases are the most widely used
type of database - Their structure is defined in a relational schema
- XML is becoming the standard for data exchange on
the World Wide Web - XML data can be structured using an XML Schema
- XML schema representations for relational
database schemas need to be generated
4Agenda
- Introduction
- DDL2XSD Overview
- Graph Generator
- XML Structure Extractor
- XML Schema Generator
- Practical Results
- Conclusion
5DDL2XSD Overview
- DDL2XSD
- Task perform the tedious task of determining an
XML representation of a relational schema
automatically - Input relational schema specified in SQL DDL
file - Output optimal XML Schema representation of
relational schema - An optimal XML Schema is characterized by
- Structural integrityall relationships between
tables are translated - Data minimizationthe amount of redundant data
and null values in an instance XML document is
minimized - Query efficiencyXML schema has to minimize the
cost of executing queries
6DDL2XSD Overview
7DDL2XSD Overview
3 components
relational graph G(V,E)
Graph Generator
DDLfile
8DDL2XSD Overview
3 components
relational graph G(V,E)
XMLStructureExtractor
XMLStructure G(V,N ? R)
Graph Generator
DDLfile
9DDL2XSD Overview
3 components
relational graph G(V,E)
XMLStructureExtractor
XMLStructure G(V,N U R)
XMLSchemaGenerator
XMLSchema file
Graph Generator
DDLfile
10Agenda
- Introduction
- DDL2XSD Overview
- Graph Generator
- XML Structure Extractor
- XML Schema Generator
- Practical Results
- Conclusion
11Graph Generator
- Task generate internal graph representation of
relational schema - Input relational schema in form of an SQL DDL
file - Output graph G(V, E)
- V is the set of vertices
- E is the set of edges
- A vertex represents a table
- An edge represents a relationship between tables
- G(V,E) stores all information translated into XML
Schema
12Graph Generator
CREATE TABLE Company ( CID INTEGER NOT NULL
PRIMARY KEY, cname VARCHAR(50) ) CREATE TABLE
Project ( PID INTEGER NOT NULL PRIMARY
KEY, name VARCHAR(50), company INTEGER NOT
NULL REFERENCES Company(CID) ) CREATE TABLE
Department ( deptID INTEGER NOT NULL PRIMARY
KEY, deptName VARCHAR(50), manager INTEGER
UNIQUE NOT NULL, company INTEGER NOT NULL
REFERENCES Company ) CREATE TABLE Employee
( serial INTEGER CHECK(serial gt 0) NOT NULL
PRIMARY KEY, name VARCHAR(50), company
INTEGER NOT NULL REFERENCES Company(CID), dept
INTEGER NOT NULL REFERENCES Department, proj
INTEGER REFERENCES Project ) ALTER TABLE
Department ADD CONSTRAINT c1 FOREIGN
KEY(manager) REFERENCES Employee
13Graph Generator
CREATE TABLE Company ( CID INTEGER NOT NULL
PRIMARY KEY, cname VARCHAR(50) ) CREATE TABLE
Project ( PID INTEGER NOT NULL PRIMARY
KEY, name VARCHAR(50), company INTEGER NOT
NULL REFERENCES Company(CID) ) CREATE TABLE
Department ( deptID INTEGER NOT NULL PRIMARY
KEY, deptName VARCHAR(50), manager INTEGER
UNIQUE NOT NULL, company INTEGER NOT NULL
REFERENCES Company ) CREATE TABLE Employee
( serial INTEGER CHECK(serial gt 0) NOT NULL
PRIMARY KEY, name VARCHAR(50), company
INTEGER NOT NULL REFERENCES Company(CID), dept
INTEGER NOT NULL REFERENCES Department, proj
INTEGER REFERENCES Project ) ALTER TABLE
Department ADD CONSTRAINT c1 FOREIGN
KEY(manager) REFERENCES Employee
Company
Employee
Department
Project
14Graph Generator
CREATE TABLE Company ( CID INTEGER NOT NULL
PRIMARY KEY, cname VARCHAR(50) ) CREATE TABLE
Project ( PID INTEGER NOT NULL PRIMARY
KEY, name VARCHAR(50), company INTEGER NOT
NULL REFERENCES Company(CID) ) CREATE TABLE
Department ( deptID INTEGER NOT NULL PRIMARY
KEY, deptName VARCHAR(50), manager INTEGER
UNIQUE NOT NULL, company INTEGER NOT NULL
REFERENCES Company ) CREATE TABLE Employee
( serial INTEGER CHECK(serial gt 0) NOT NULL
PRIMARY KEY, name VARCHAR(50), company
INTEGER NOT NULL REFERENCES Company(CID), dept
INTEGER NOT NULL REFERENCES Department, proj
INTEGER REFERENCES Project ) ALTER TABLE
Department ADD CONSTRAINT c1 FOREIGN
KEY(manager) REFERENCES Employee
Company
Employee
Department
Project
15Agenda
- Introduction
- DDL2XSD Overview
- Graph Generator
- XML Structure Extractor
- XML Schema Generator
- Practical Results
- Conclusion
16XML Structure Extractor
- TaskDetermine hierarchical XML structure for
flat relational schema - Input G(V,E)
- Output structure G(V, N ? R)
- N set of edges translated by nesting, called
nestEdges - R set of edges translated by XML references,
called referenceEdgesN and R disjoined sets, and
N ? R E
17XML Structure Extractor
- 3 informed search algorithms
- Greedy search algorithm GSA
- Seach for all solutions ASSA
- Search for all best solutions BSSA
- Cost/goodness of an edge/solution depends on
- Type of relationship represented by an edge (1N,
11, 1N) - Frequency of possibly stored null values
- Depth of vertices in tree structure G(V,N)
- Join operations performed according to defined
workload
18XML Structure Extractor
GSA
Company
Employee
Department
Project
19XML Structure Extractor
GSA
Company
Employee
Department
Project
20XML Structure Extractor
GSA
Company
Employee
Department
Project
21XML Structure Extractor
GSA
Company
Employee
Department
Project
22XML Structure Extractor
ASSA
23XML Structure Extractor
BSSA
Initial state
Iteration 1
Iteration 2
Iteration 3
24XML Structure Extractor
- GSA
- Low time and space complexity
- only one, possibly suboptimal solution
- ASSA
- High time and space complexity
- Flexibility to choose desired XML structure
- BSSA
- Acceptable time and space complexity
- Limited flexibility to choose desired XML
structure
25Agenda
- Introduction
- DDL2XSD Overview
- Graph Generator
- XML Structure Extractor
- XML Schema Generator
- Practical Results
- Conclusion
26XML Schema Generator
- Task Generates XML Schema file for structure
represented by G(V, N ? R) - Input G(V, N ? R)
- Output XML Schema representation of relational
schema - Vertices are translated into XML element
definitions - Edges in N represent nesting of XML elements
- Edges in R are translated by XML referential
constraints (key / keyref pairs)
27XML Schema Generator
XML Schema
28Agenda
- Introduction
- DDL2XSD Overview
- Graph Generator
- XML Structure Extractor
- XML Schema Generator
- Practical Results
- Conclusion
29Practical Results
- Time measurements over 100 runs for four schemas,
depending on algorithm - As expected, execution time increases from GSA to
BSSA to ASSA - No measurement for ASSA on company.ddl not
enough memory - No correlations between measurements could be
found
30Practical Results
- Number of states in memory for 4 sample schemas
for ASSA and BSSA - GSA discarded because only one state in memory,
regardless of schema
31Agenda
- Introduction
- DDL2XSD Overview
- Graph Generator
- XML Structure Extractor
- XML Schema Generator
- Practical Results
- Conclusion
32Conclusion
- DDL2XSD generates optimal XML Schema in respect
to - (1) structural integrity
- (2) Minimization of stored data
- (3) Query Execution
- Three major components
- Graph Generator generation of graph
representation for relational schema - XML Schema Extractor determination of XML
structure - XML Schema Generator generation of the XML
Schema file according to determined XML structure
33Conclusion
34Conclusion
- Future Work
- Extension to support schema languages other than
DB2 SQL DDL and XML Schema - Introduction of XML Schema normalization to
further reduce data redundancy - Specification of relational schema in form of UML
or Entity-Relationship models
35The End