Tamino a DBMS Designed for XML - PowerPoint PPT Presentation

About This Presentation

Title:

Tamino a DBMS Designed for XML

Description:

Storing, managing, publishing and exchanging XML documents. Business modeling ... Within a collection, declare several document types ... – PowerPoint PPT presentation

Slides: 36

Provided by: protonScs

Transcript and Presenter's Notes

1
Tamino: a DBMS Designed for XML

Dr. Harald Schoning
Presenter: Wenhui Li, University of Ottawa
Instructed by Dr. Mengchi Liu, Carleton University

2
Abstract

Who? Software AG
What? An XML database management system
When?
1999: first unveiled
June 2004: Tamino XML Server 4.2
Why?
Management and transfer of structured and unstructured data
Designed entirely for XML

3
Industry Background

XML is becoming the prevailing format for data processing on the Internet.
Early goal of Tamino
Easy data exchange
Evolution trend
Storing, managing, publishing and exchanging XML documents
Business modeling

4
Industry Background (cont.): XML support in databases

Oracle XML Developer's Kit
SQL Server 2000
DB2 XML Extender

5
Limitations of XML support via a traditional RDBMS or ORDB

XML is not as rigidly structured as the data in an RDB, ORDB or OODB
Storing and querying XML is possible in these DB systems, but not practical

6
Two modeling approaches

Data-centric documents
Regular structure
Order does not matter
No mixed content
Document-centric documents
Less regular structure
Order is significant
Mixed content
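
The "mixed content" criterion can be checked mechanically. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function name has_mixed_content and both sample documents are assumptions made for illustration and are not part of Tamino.

```python
import xml.etree.ElementTree as ET


def has_mixed_content(element):
    """Return True if any element in the tree mixes text with child elements."""
    for node in element.iter():
        if len(node) == 0:
            continue  # leaf elements hold only text, never mixed content
        # Text directly inside a parent element sits in node.text (before the
        # first child) or in a child's .tail (after that child).
        pieces = [node.text] + [child.tail for child in node]
        if any(piece and piece.strip() for piece in pieces):
            return True
    return False


# Hypothetical sample documents, one of each kind.
data_centric = ET.fromstring(
    "<Item><ProductID>17</ProductID><Quantity>2</Quantity></Item>"
)
document_centric = ET.fromstring(
    "<text>XML is <stressed>very</stressed> important</text>"
)

print(has_mixed_content(data_centric))      # False
print(has_mixed_content(document_centric))  # True
```

Mixed content is only one of the three criteria on the slide; regularity and order sensitivity depend on how the application uses the data and cannot be read off a single document this way.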

7
Why not use a relational DB?

XML documents can have schematic information (a DTD), but they are not required to.
Classical database handling, which deals with objects of a predefined type, cannot be applied to XML

8
Why not use XML by itself?

XML is just a markup language; it does not contain processing facilities of its own
Querying a set of XML documents is outside the scope of the XML recommendation
Hence Tamino!

9
What does Tamino do?

What is Tamino? (see the first slide)
Stores XML documents, HTML files, GIF images, etc.
Retrieves them in a set-oriented manner, with sophisticated query facilities

10
Tamino's architecture
11
The schema of XML documents

XML supports schematic information, but it differs from that of classical databases
DTDs have a couple of deficiencies (e.g. in data typing)
A W3C working group is developing an XML schema description language
However, the DTD is the only standard schema language at present

12
XML schema vs. RDB and OODB schema

In an RDB or OODB, the schema is created before instances can be stored
Instances must conform to the declared schema
In an XML database, each instance declares a schema of its own.
For XML documents, grouping objects of homogeneous structure into (pre-defined) tables or classes does not work

13
Querying and indexing with XML schemas

Queries operate on sets
Indexes are defined on the basis of a common schema
For the purpose of querying, arbitrary objects could be grouped into sets
Index definition also requires at least a common subset in the structure

14
Schema handling in Tamino

Grouping documents by open content model
User-directed document grouping
Documents are grouped into collections
Within a collection, several document types are declared
For each document type, a common schema is defined (open content model)
To each document, Tamino assigns one of the document types

15
Type Assignment

Assignment is based on the root element type
A document must match the schema of the document type assigned, but might have additional elements/attributes
Within a document type, documents might differ considerably
If there is no appropriate document type, the document is stored without any schema checking
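
The assignment rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the COLLECTION dictionary, the function assign_document_type and the reduction of a schema to "a set of required child elements" are all invented here and do not reflect Tamino's actual schema language or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection: document type (keyed by root element name) mapped
# to the child elements its schema requires. Open content model: anything
# beyond these is allowed.
COLLECTION = {
    "City": {"Name"},
    "Country": {"Name", "Capital"},
}


def assign_document_type(xml_text, collection=COLLECTION):
    """Return (doctype, accepted) for a document, following the slide's rules."""
    root = ET.fromstring(xml_text)
    required = collection.get(root.tag)
    if required is None:
        # No appropriate document type: stored without schema checking.
        return None, True
    present = {child.tag for child in root}
    # Extra elements/attributes are tolerated; only missing required ones fail.
    return root.tag, required <= present


print(assign_document_type(
    '<City Inhabitants="138000"><Name>Darmstadt</Name><Addition>x</Addition></City>'
))  # ('City', True)
print(assign_document_type("<City><Addition>x</Addition></City>"))  # ('City', False)
print(assign_document_type("<Note>hi</Note>"))  # (None, True)
```

The second call fails because the modeled element Name is missing; the first succeeds even though Addition and the Inhabitants attribute are not modeled, which is the point of an open content model.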

16
Tamino schema example
17
Document accepted by Tamino

<City Inhabitants="138000">
  <Name>Darmstadt</Name>
  <Addition>The city of art nouveau</Addition>
  <Monument Height="39m">
    <Name>Langer Ludwig</Name>
    <Location>
      <Name>Luisenplatz</Name>
      <MapIndex>M5</MapIndex>
    </Location>
  </Monument>
</City>

18
When should an element/attribute be modeled?

An index will be defined on the element/attribute
The element/attribute is to be mapped to an external data source or to a server extension
Dedicated access rights will be defined on the element/attribute
The presence/multiplicity of the element is to be enforced
One of the above conditions holds for a child of the element

19
Indexing in Tamino

Value-based indexes
Well known from traditional database systems
Used to accelerate searches
Must address the data object exactly, because names need not be unique within a DTD

20
Example of value-based index

Value-based indexes
Data-centric view
<!ELEMENT City (Name, Inhabitants, Monument)>
<!ELEMENT Monument (Name, Description)>
<!ELEMENT Inhabitants (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
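
In this DTD the element Name occurs both under City and under Monument, so an index on "Name" alone would be ambiguous. One way to address the data object exactly is to key the index by the full element path. The sketch below assumes that design; build_value_index, the path notation and the sample values are illustrative and say nothing about how Tamino implements its indexes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_value_index(documents):
    """Map (element path, text value) -> set of document ids."""
    index = defaultdict(set)

    def walk(node, path, doc_id):
        here = f"{path}/{node.tag}"
        value = (node.text or "").strip()
        if value and len(node) == 0:  # index leaf values only
            index[(here, value)].add(doc_id)
        for child in node:
            walk(child, here, doc_id)

    for doc_id, xml_text in documents.items():
        walk(ET.fromstring(xml_text), "", doc_id)
    return index


# Hypothetical sample document conforming to the DTD above.
docs = {
    1: "<City><Name>Darmstadt</Name><Inhabitants>138000</Inhabitants>"
       "<Monument><Name>Langer Ludwig</Name>"
       "<Description>Column</Description></Monument></City>",
}
index = build_value_index(docs)

print(sorted(index[("/City/Monument/Name", "Langer Ludwig")]))  # [1]
print(("/City/Name", "Langer Ludwig") in index)                 # False
```

A lookup for a city named "Langer Ludwig" finds nothing, because the monument's name is indexed under a different path.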

21
Indexing in Tamino (cont.)

Text indexing
Document-centric view
The scope can be limited to a specific part of the document
The scope might span element content

22
Example of text index

Text indexing
Document-centric view
<statement>
  <author>
    <firstname>Harald</firstname>
    <lastname>Schoning</lastname>
  </author>
  <text>
    X<italic>M</italic>L and X<italic>S</italic>L
    are <stressed>very</stressed> important
  </text>
</statement>
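
The example shows why a text index has to span element content: the word "XML" is split across three text fragments by the italic markup. A minimal sketch with Python's standard library follows; the two tokenizer functions are hypothetical helpers written for this comparison, not Tamino features.

```python
import xml.etree.ElementTree as ET

statement = ET.fromstring(
    "<statement>"
    "<author><firstname>Harald</firstname><lastname>Schoning</lastname></author>"
    "<text>X<italic>M</italic>L and X<italic>S</italic>L are "
    "<stressed>very</stressed> important</text>"
    "</statement>"
)


def text_tokens(element):
    """Tokenize the concatenated text of an element, ignoring inner markup."""
    flat = "".join(element.itertext())
    return flat.lower().split()


def fragment_tokens(element):
    """Tokenize each text fragment separately, treating markup as a word break."""
    return [tok.lower() for fragment in element.itertext() for tok in fragment.split()]


# Limit the scope of the index to the <text> part of the document.
scoped = statement.find("text")

print(text_tokens(scoped))
# ['xml', 'and', 'xsl', 'are', 'very', 'important']
print(fragment_tokens(scoped))
# ['x', 'm', 'l', 'and', 'x', 's', 'l', 'are', 'very', 'important']
```

Only the first tokenizer lets a search for "XML" succeed, and because the scope is the text element, the author's name is not indexed at all.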

23
Indexing in Tamino (cont.)

Structural index
Useful if multiplicity permits the omission of elements, or if no DTD is known
Example
In a database of all European cities, find all those cities which have an element called beach
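
A structural index answers "which documents contain this element at all?" without scanning every document. A minimal sketch, assuming the index is simply a map from element name to document ids; the function name and the two sample city documents are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_structural_index(documents):
    """Map element name -> set of ids of documents containing that element."""
    index = defaultdict(set)
    for doc_id, xml_text in documents.items():
        for node in ET.fromstring(xml_text).iter():
            index[node.tag].add(doc_id)
    return index


# Hypothetical sample data: only one city document has a beach element.
cities = {
    "city-1": "<City><Name>Seaside</Name><beach>North Shore</beach></City>",
    "city-2": "<City><Name>Darmstadt</Name></City>",
}
structure = build_structural_index(cities)

print(sorted(structure["beach"]))  # ['city-1']
print(sorted(structure["City"]))   # ['city-1', 'city-2']
```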

24
Querying XML documents

Currently, there is no standardized query language
XPath allows positioning within a single document
XPath fits the retrieval needs of data-centric environments well
Document-centric environments need a more content-based retrieval facility
Tamino therefore also supports full-text search
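
As an illustration of XPath-style positioning within one document, the sketch below uses the limited XPath subset supported by Python's xml.etree.ElementTree. It is not Tamino's query interface, and the sample document is a reconstruction of the city example used earlier in the deck.

```python
import xml.etree.ElementTree as ET

city = ET.fromstring(
    '<City Inhabitants="138000">'
    "<Name>Darmstadt</Name>"
    '<Monument Height="39m"><Name>Langer Ludwig</Name>'
    "<Location><Name>Luisenplatz</Name><MapIndex>M5</MapIndex></Location>"
    "</Monument></City>"
)

# Path expression with a predicate on a child element's value.
hits = city.findall("./Monument[Name='Langer Ludwig']/Location/MapIndex")
map_indexes = [hit.text for hit in hits]

print(map_indexes)  # ['M5']
```

The path names exact elements and values, which suits data-centric documents; it offers nothing for finding a phrase somewhere in running text, which is the gap full-text search fills.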

25
Expectations for an XML processor

The W3C XML recommendation specifies how an XML processor handles entities, comments and processing instructions.
Users storing documents expect Tamino to leave comments intact, to evaluate no processing instructions, and to leave entity references unresolved.
Users also expect the output of a Tamino query to match the specification of an XML processor.

26
Why not leave entities unresolved?

The result of a query can be a set of (parts of) matching documents
The result DTD must then include all the different entity declarations of the original documents
The definition of an entity might differ from document to document
So, where the same entity name has different definitions, entities are renamed and the entity references are changed accordingly.
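
The renaming step can be sketched as follows. The slide does not describe the renaming scheme, so the numeric suffix used here is an assumption, as are the function name merge_entity_declarations and the sample entity values.

```python
def merge_entity_declarations(documents):
    """Merge per-document entity declarations into one result DTD.

    documents maps doc id -> {entity name: definition}.
    Returns (merged declarations, per-document rename map).
    """
    merged = {}
    renames = {}
    for doc_id, entities in documents.items():
        renames[doc_id] = {}
        for name, definition in entities.items():
            new_name = name
            suffix = 1
            # Same name, different definition: pick a fresh name.
            while new_name in merged and merged[new_name] != definition:
                suffix += 1
                new_name = f"{name}_{suffix}"
            merged[new_name] = definition
            if new_name != name:
                renames[doc_id][name] = new_name
    return merged, renames


# Hypothetical declarations: both documents define "company", differently.
docs = {
    "a": {"company": "Software AG", "product": "Tamino"},
    "b": {"company": "Carleton University", "product": "Tamino"},
}
merged, renames = merge_entity_declarations(docs)

print(merged)
# {'company': 'Software AG', 'product': 'Tamino', 'company_2': 'Carleton University'}
print(renames)
# {'a': {}, 'b': {'company': 'company_2'}}
```

The rename map tells the server which entity references in each returned fragment must be rewritten; identical definitions are shared rather than duplicated.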

27
Problems of external entities

External entities can change without the database system knowing about it
Thus, the values of external entities must not be included in indexes
Example
<!ENTITY mysubject SYSTEM "http://www.softwareag.com/hottopic.xml">
...
<ticker>Today's hot topic: &mysubject;</ticker>
Checking the current contents of the external entity on every access would lead to unacceptable response times.

28
Relational Databases and XML

Major (object-)relational database systems include some form of XML support
The simplest form is to generate XML documents from existing relational data.
But real database handling of XML requires that XML data can be stored and retrieved
Two approaches

29
XML support approach (1)

The XML document is mapped to relational tables and their columns
Markup is ignored on storage and reconstructed on retrieval
Advantage of this approach
The contents of an XML document can be handled with traditional SQL

30
XML support approach (1) (cont.)

Shortcomings
The sequence information is lost
<Order CustomerId="567" Date="12-12-2000">
  <Item ProductID="17" Quantity="2"/>
  <Item ProductID="16" Quantity="9"/>
  <Item ProductID="19" Quantity="8"/>
</Order>
The order as retrieved
<Order CustomerId="567" Date="12-12-2000">
  <Item ProductID="16" Quantity="9"/>
  <Item ProductID="17" Quantity="2"/>
  <Item ProductID="19" Quantity="8"/>
</Order>
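
The loss of sequence can be reproduced with any relational store. The sketch below uses Python's built-in sqlite3 module; the table layout is a hypothetical mapping chosen for this example, not the mapping used by any particular product.

```python
import sqlite3
import xml.etree.ElementTree as ET

order = ET.fromstring(
    '<Order CustomerId="567" Date="12-12-2000">'
    '<Item ProductID="17" Quantity="2"/>'
    '<Item ProductID="16" Quantity="9"/>'
    '<Item ProductID="19" Quantity="8"/>'
    "</Order>"
)
document_order = [int(item.get("ProductID")) for item in order.findall("Item")]

conn = sqlite3.connect(":memory:")
# Hypothetical mapping: one row per Item, with no column recording position.
conn.execute(
    "CREATE TABLE item (customer_id INTEGER, product_id INTEGER, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [
        (int(order.get("CustomerId")), int(i.get("ProductID")), int(i.get("Quantity")))
        for i in order.findall("Item")
    ],
)

# A relation is an unordered set of rows, so the database may return them in
# any order; ORDER BY product_id stands in for one such order.
rows = conn.execute(
    "SELECT product_id FROM item WHERE customer_id = 567 ORDER BY product_id"
).fetchall()
retrieved_order = [row[0] for row in rows]
conn.close()

print(document_order)   # [17, 16, 19]
print(retrieved_order)  # [16, 17, 19]
```

Preserving document order would require an extra position column that the application maintains itself, which is exactly the kind of bookkeeping a native XML store avoids.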

31
XML support approach (1) (cont.)

For data-centric documents the sequence might not matter, but for document-centric documents it does
This approach loses all comments and processing instructions
Mixed content cannot be stored easily in this model

32
XML support approach (2)

Leaves the XML document intact and stores it in a
large text field (BLOB)
Or even outside the database
Text search is possible
The result can be limited by a text-based condition

33
XML support approach (2) (cont.)

Limitations
No structure-aware combinations are possible
Value-based search is not supported on these text fields
IBM's solution: side tables
But direct manipulation of side tables destroys the consistency of the database
Security can be defined at document level only, not on elements or attributes

34
Summary

Tamino was designed with particular attention to XML
Schema handling for XML differs from schema handling in relational databases
In schema handling, external entities cause conceptual problems
Value-based indexes are useful for XML, as are text indexes and structural indexes
Comments and processing instructions should be preserved when documents are stored
The result of a query against an XML database should be XML

35
Q&A