Title: Tamino: A DBMS Designed for XML
1. Tamino: A DBMS Designed for XML
- Dr. Harald Schöning
- Presenter: Wenhui Li
- University of Ottawa
- Instructed by: Dr. Mengchi Liu
- Carleton University
2. Abstract
- Who? Software AG
- What? An XML database management system
- When?
  - 1999: first unveiled
  - June 2004: Tamino XML Server 4.2
- Why?
  - Management and transfer of structured and unstructured data
  - Designed entirely for XML
3. Industry Background
- XML is becoming the prevailing format for data processing on the Internet.
- Early goals of Tamino
  - Easy data exchange
- Evolution trend
  - Storing, managing, publishing and exchanging XML documents
  - Business modeling
4. Industry Background (cont.): XML support in databases
- Oracle XML Developer's Kit
- SQL Server 2000
- DB2 XML Extender
5. Limitations of XML support via traditional RDBMS or ORDB
- XML is not as rigidly structured as data in an RDB, ORDB or OODB
- Storing and querying XML is possible but not practical in these database systems
6. Two modeling approaches
- Data-centric documents
  - Regular structure
  - Order does not matter
  - No mixed content
- Document-centric documents
  - Less regular structure
  - Order is significant
  - Mixed content
7. Why not use a relational DB?
- XML documents can carry schema information (a DTD), but they are not required to.
- Classical database handling, which operates on objects of a predefined type, cannot be applied to XML.
8. Why not use XML itself?
- XML is just a markup language; it does not contain processing facilities of its own.
- Querying a set of XML documents is outside the scope of the XML recommendation.
- Hence Tamino!
9. What does Tamino do?
- What is Tamino? (see the first slide)
- Stores XML documents, HTML files, GIF images, etc.
- Retrieves them in a set-oriented manner, with sophisticated query facilities
10. Tamino's architecture
11. The schema of XML documents
- XML supports schema information, but it differs from that of classical databases.
- DTDs have a couple of deficiencies (e.g. no data types).
- A W3C working group is developing an XML schema description language.
- However, the DTD is the only standard schema language at present.
12. XML schema vs. RDB and OODB schema
- In an RDB or OODB, the schema is created before instances can be stored.
- Instances must conform to the declared schema.
- In an XML database, each instance declares its own schema.
- For XML documents, grouping objects of homogeneous structure into (predefined) tables or classes doesn't work.
13. Querying and indexing with XML schemas
- Queries operate on sets.
- Indexes are defined on the basis of a common schema.
- For the purpose of querying, arbitrary objects could be grouped into sets.
- Index definition also requires at least a common subset of the structure.
14. Schema handling in Tamino
- Documents are grouped by an open content model (user-directed document grouping).
- Documents are grouped into collections.
- Within a collection, several document types can be declared.
- Each document type defines a common schema (open content model).
- Tamino assigns each document to one of the document types.
15. Type assignment
- Assignment is based on the root element type.
- A document must match the schema of its assigned document type, but it may have additional elements or attributes.
- Documents within one document type may differ considerably.
- If there is no appropriate document type, the document is stored without any schema checking.
16. Tamino schema example
17. Document accepted by Tamino
    <City Inhabitants="138000">
      <Name>Darmstadt</Name>
      <Addition>The city of art nouveau</Addition>
      <Monument Height="39m">
        <Name>Langer Ludwig</Name>
        <Location>
          <Name>Luisenplatz</Name>
          <MapIndex>M5</MapIndex>
        </Location>
      </Monument>
    </City>
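The type assignment and open content model described on the preceding slides can be sketched in a few lines of Python. This is only an illustration, not Tamino's implementation: the `DOC_TYPES` table, the function name `assign_and_check`, and the choice of which elements and attributes count as modeled are all assumptions, since the schema from slide 16 is not reproduced in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a document type: the root element name mapped to
# the child elements and attributes that the schema models. Which items are
# modeled is an assumption made for this sketch.
DOC_TYPES = {
    "City": {"elements": {"Name", "Monument"}, "attributes": {"Inhabitants"}},
}

def assign_and_check(xml_text):
    """Return (document type or None, whether the document is accepted)."""
    root = ET.fromstring(xml_text)
    schema = DOC_TYPES.get(root.tag)
    if schema is None:
        # No document type for this root element: store without schema checking.
        return None, True
    child_tags = {child.tag for child in root}
    accepted = (schema["elements"] <= child_tags
                and schema["attributes"] <= set(root.attrib))
    # Unmodeled extras such as <Addition> do not affect acceptance.
    return root.tag, accepted

city_doc = """<City Inhabitants="138000">
  <Name>Darmstadt</Name>
  <Addition>The city of art nouveau</Addition>
  <Monument Height="39m"><Name>Langer Ludwig</Name></Monument>
</City>"""

result = assign_and_check(city_doc)
```

Here the document is assigned to the `City` type by its root element and accepted even though `<Addition>` is not part of the assumed schema; a document with an unknown root element would be stored unchecked.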
18. When should an element/attribute be modeled?
- An index will be defined on the element/attribute.
- The element/attribute is to be mapped to an external data source or to a server extension.
- Dedicated access rights will be defined on the element/attribute.
- The presence or multiplicity of the element is to be enforced.
- One of the above conditions holds for a child of the element.
19. Indexing in Tamino
- Value-based indexes
  - Well known from traditional database systems
  - Used to accelerate searches
  - Must address the data object exactly
  - Names need not be unique within a DTD
20. Example of a value-based index
- Value-based indexes
- Data-centric view
    <!ELEMENT City (Name, Inhabitants, Monument)>
    <!ELEMENT Monument (Name, Description)>
    <!ELEMENT Inhabitants (#PCDATA)>
    <!ELEMENT Name (#PCDATA)>
    <!ELEMENT Description (#PCDATA)>
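Because `Name` occurs under both `City` and `Monument` in this DTD, a value-based index has to be tied to a path rather than to an element name alone. The following is a minimal sketch of that idea; the function `build_value_index`, the sample documents, and their values are made up for illustration and say nothing about how Tamino stores its indexes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample documents that follow the DTD above.
docs = {
    1: "<City><Name>Darmstadt</Name><Inhabitants>138000</Inhabitants>"
       "<Monument><Name>Langer Ludwig</Name>"
       "<Description>Column</Description></Monument></City>",
    2: "<City><Name>Sampletown</Name><Inhabitants>500</Inhabitants>"
       "<Monument><Name>Sample Tower</Name>"
       "<Description>Tower</Description></Monument></City>",
}

def build_value_index(docs, path):
    """Map each value found at `path` (relative to the root) to document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for elem in ET.fromstring(text).findall(path):
            index[elem.text].add(doc_id)
    return index

# Two separate indexes, because "Name" alone would be ambiguous.
city_name_index = build_value_index(docs, "Name")
monument_name_index = build_value_index(docs, "Monument/Name")
```

A lookup such as `city_name_index["Darmstadt"]` then returns the matching document ids without scanning the documents, and monument names never appear in the city-name index.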
21. Indexing in Tamino (cont.)
- Text indexing
  - Document-centric view
  - The scope can be limited to a specific part of the document
  - The scope may span element content
22. Example of a text index
- Text indexing
- Document-centric view
    <statement>
      <author>
        <firstname>Harald</firstname>
        <lastname>Schöning</lastname>
      </author>
      <text>
        X<italic>M</italic>L and X<italic>S</italic>L
        are <stressed>very</stressed> important
      </text>
    </statement>
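The point of this example is that the words "XML" and "XSL" are split by inline markup, so a text index has to look past element boundaries inside its scope. A rough sketch of that tokenization, using Python's standard `xml.etree.ElementTree`, is shown here; the helper name `words_in_scope` is hypothetical and the whitespace tokenizer is a deliberate simplification.

```python
import xml.etree.ElementTree as ET

statement = """<statement>
  <author><firstname>Harald</firstname><lastname>Schöning</lastname></author>
  <text>X<italic>M</italic>L and X<italic>S</italic>L
  are <stressed>very</stressed> important</text>
</statement>"""

def words_in_scope(xml_text, scope):
    """Return the words inside `scope`, ignoring the markup within it."""
    root = ET.fromstring(xml_text)
    elem = root.find(scope)
    # itertext() yields text and tail fragments in document order, so joining
    # them restores words that inline elements had split apart.
    flat = "".join(elem.itertext())
    return flat.split()

tokens = words_in_scope(statement, "text")
```

Limiting the scope to `text` keeps the author's name out of the token list, while "XML" and "XSL" come out as whole words.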
23. Indexing in Tamino (cont.)
- Structural index
  - Useful if multiplicity permits the omission of elements
  - or if no DTD is known
- Example
  - In a database of all European cities,
  - search for all cities that have an element called "beach"
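A structural index answers "which documents contain this element at all?" without looking at values. The sketch below assumes a simple mapping from element name to document ids; the function `build_structure_index` and the two tiny city documents are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Illustrative documents: only one of them has a <beach> element.
cities = {
    "nice": "<City><Name>Nice</Name><beach/></City>",
    "darmstadt": "<City><Name>Darmstadt</Name></City>",
}

def build_structure_index(docs):
    """Map every element name to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for elem in ET.fromstring(text).iter():
            index[elem.tag].add(doc_id)
    return index

structure_index = build_structure_index(cities)
cities_with_beach = structure_index["beach"]
```

The query for cities with a beach becomes a single lookup, which matters when the element is optional or when no DTD tells the system where it may occur.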
24. Querying XML documents
- Currently, there is no standardized query language.
- XPath allows positioning within a single document.
- XPath fits the needs of retrieval in data-centric environments well.
- Document-centric environments need a more content-based retrieval facility.
- Tamino also supports full-text search.
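To show what "positioning within a single document" looks like, here is a small sketch using the XPath subset built into Python's `xml.etree.ElementTree`. It is not Tamino's query interface, and the sample document is the city example from earlier in the talk.

```python
import xml.etree.ElementTree as ET

city = ET.fromstring("""<City Inhabitants="138000">
  <Name>Darmstadt</Name>
  <Monument Height="39m">
    <Name>Langer Ludwig</Name>
    <Location><Name>Luisenplatz</Name><MapIndex>M5</MapIndex></Location>
  </Monument>
</City>""")

# Path steps plus an attribute predicate select nodes by position and value.
names = [m.text for m in city.findall("./Monument[@Height='39m']/Name")]

# A plain path navigates to one specific nested element.
map_index = city.findtext("./Monument/Location/MapIndex")
```

Both expressions navigate by structure and exact values, which suits data-centric documents; neither can express a content-based condition such as "text that mentions art nouveau anywhere".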
25. Expectations for an XML processor
- The W3C XML recommendation specifies the handling of entities, comments and processing instructions.
- User expectation: Tamino should leave comments intact, evaluate no processing instructions, and leave entity references unresolved.
- User expectation: the output of a Tamino query should match the specification of an XML processor.
26. Why not leave entities unresolved?
- The result may be a set of (parts of) matching documents.
- The result DTD must then include all the different entity declarations of the original documents.
- The definition of an entity might differ from document to document.
- So, where the same entity name has different definitions, the entities are renamed and the entity references are changed accordingly.
27. Problems of external entities
- These entities can change without the database system knowing about it.
- Thus, the values of external entities must not be included in indexes.
- Example
    <!ENTITY mysubject SYSTEM
      "http://www.softwareag.com/hottopic.xml">
    ...
    <ticker>Today's hot topic &mysubject;</ticker>
- Checking the current contents of the external entity leads to unacceptable response times.
28. Relational databases and XML
- Major (object-)relational database systems include some form of XML support.
- The simplest form is to generate XML documents from existing relational data.
- But real database handling of XML requires that XML data can be stored and retrieved.
- Two approaches
29. XML support approach (1)
- Map the XML document to relational tables and their columns.
- Markup is ignored on storage and reconstructed on retrieval.
- Advantage of this approach
  - The contents of an XML document can be handled with traditional SQL.
30. XML support approach (1) (cont.)
- Shortcomings
  - The sequence information is lost
    <Order CustomerId="567" Date="12-12-2000">
      <Item ProductID="17" Quantity="2"/>
      <Item ProductID="16" Quantity="9"/>
      <Item ProductID="19" Quantity="8"/>
    </Order>
- The order as retrieved
    <Order CustomerId="567" Date="12-12-2000">
      <Item ProductID="16" Quantity="9"/>
      <Item ProductID="17" Quantity="2"/>
      <Item ProductID="19" Quantity="8"/>
    </Order>
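The loss of sequence can be reproduced with any relational store. The sketch below uses Python's built-in `sqlite3` module and a hypothetical one-table mapping (`item` with `product_id` and `quantity`); real XML-to-relational mappings are more elaborate, but the effect is the same: a table is an unordered set, so the only ordering available on retrieval comes from the stored column values.

```python
import sqlite3
import xml.etree.ElementTree as ET

order_xml = """<Order CustomerId="567" Date="12-12-2000">
  <Item ProductID="17" Quantity="2"/>
  <Item ProductID="16" Quantity="9"/>
  <Item ProductID="19" Quantity="8"/>
</Order>"""

items = list(ET.fromstring(order_xml).iter("Item"))
document_order = [int(item.get("ProductID")) for item in items]

# Shred the items into a table; the markup and the item positions are dropped.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (product_id INTEGER PRIMARY KEY, quantity INTEGER)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [(int(item.get("ProductID")), int(item.get("Quantity"))) for item in items],
)

# On retrieval, rows can only be ordered by what was stored, e.g. the key.
retrieved_order = [
    row[0] for row in conn.execute("SELECT product_id FROM item ORDER BY product_id")
]
conn.close()
```

After this round trip `document_order` is 17, 16, 19 while `retrieved_order` is 16, 17, 19, matching the two listings above.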
31. XML support approach (1) (cont.)
- For data-centric documents the sequence might not matter, but it does for document-centric ones.
- This approach loses all comments and processing instructions.
- Mixed content cannot be stored easily in this model.
32. XML support approach (2)
- Leave the XML document intact and store it in a large text field (BLOB)
- Or even store it outside the database
- Text search is possible
- Searches can be limited by a text-based condition
33. XML support approach (2) (cont.)
- Limitations
  - No structure-aware combinations are possible
  - Value-based search is not supported on these text fields
  - IBM's solution: side tables
  - But direct manipulation of side tables destroys the consistency of the database
  - Security can be defined at document level only, not on elements or attributes
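The first limitation can be illustrated with a small sketch. Documents are kept as opaque strings, as in the text-field approach; the names `text_search` and `structure_aware_search`, the `<Note>` element, and the sample data are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Documents stored intact as text, as in the large-text-field approach.
stored = {
    "doc1": "<City><Name>Darmstadt</Name></City>",
    "doc2": "<City><Name>Exampletown</Name><Note>Near Darmstadt</Note></City>",
}

def text_search(stored, word):
    """Plain text search: matches the word wherever it occurs."""
    return sorted(doc_id for doc_id, text in stored.items() if word in text)

def structure_aware_search(stored, path, value):
    """Match only when the element at `path` has exactly this value."""
    return sorted(
        doc_id
        for doc_id, text in stored.items()
        if any(e.text == value for e in ET.fromstring(text).findall(path))
    )

text_hits = text_search(stored, "Darmstadt")
name_hits = structure_aware_search(stored, "Name", "Darmstadt")
```

The text search returns both documents because it cannot tell a city name from a passing mention, whereas the structure-aware variant has to parse every document on each query, which is the cost that side tables or native XML indexes are meant to avoid.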
34. Summary
- Tamino was designed with particular attention to XML.
- Schema handling for XML differs from schema handling in relational databases.
- In schema handling, external entities cause conceptual problems.
- Value-based indexes are useful for XML, as are text indexes and structural indexes.
- Comments and processing instructions should be preserved when documents are stored.
- The result of a query against an XML database should be XML.
35. Q&A