Tamino: a DBMS Designed for XML


1
Tamino: a DBMS Designed for XML
  • Dr. Harald Schöning
  • Presenter: Wenhui Li
  • University of Ottawa
  • Instructed by Dr. Mengchi Liu
  • Carleton University

2
Abstract
  • Who? Software AG
  • What? An XML database management system
  • When?
  • 1999: first unveiled
  • June 2004: Tamino XML Server 4.2
  • Why?
  • Management and transfer of structured and
    unstructured data
  • Designed entirely for XML

3
Industry Background
  • XML is becoming the prevailing format for data
    processing on the Internet.
  • Early goal of Tamino
  • Easy data exchange
  • Evolution trend
  • Storing, managing, publishing and exchanging XML
    documents
  • Business modeling

4
Industry Background (cont.): XML support in
databases
  • Oracle XML Developers Kit
  • SQL Server 2000
  • DB2 XML Extender

5
Limitations of XML support via traditional RDBMS
or ORDB
  • XML is not as rigidly structured as data in an
    RDB, ORDB or OODB
  • Storing and querying XML is possible but not
    practical in these DB systems

6
Two Modeling approaches
  • Data-centric documents
  • Regular structure
  • Order does not matter
  • No mixed content
  • Document-centric documents
  • less regular structure
  • significance of the order
  • mixed content

7
Why not use a relational DB?
  • XML documents can have schematic information
    (DTD), but they are not required to.
  • Classical database handling, which deals with
    objects of a predefined type, cannot be applied
    to XML

8
Why not use XML itself?
  • XML is just a markup language; it has no
    processing facilities of its own
  • Querying a set of XML documents is outside the
    scope of the XML recommendation
  • Hence Tamino!

9
What does Tamino do?
  • What is Tamino? (see the first slide)
  • Store XML documents, HTML files and GIF images,
    etc.
  • Retrieve them in a set-oriented manner, with
    sophisticated query facilities

10
Tamino's architecture
11
The schema of XML documents
  • XML supports schema information, but it differs
    from that of classical databases
  • DTDs have several deficiencies (e.g. data
    types)
  • A W3C working group is developing an XML schema
    description language
  • However, the DTD is the only standard schema
    language at present

12
XML schema vs. RDB and OODB schema
  • In RDB or OODB, the schema is created before the
    instances can be stored
  • Instances must conform to the declared schema
  • In an XML database, each instance can declare a
    schema of its own.
  • For XML documents, grouping objects of
    homogeneous structure into (pre-defined) tables
    or classes does not work

13
Query and Index of XML schema
  • Queries operate on sets
  • Indexes are defined on the basis of a common
    schema
  • For the purpose of querying, arbitrary objects
    could be grouped to sets
  • Index definition also requires at least a common
    subset in the structure

14
Schema handling in Tamino
  • Grouping documents by open content model:
    user-directed document grouping
  • Documents are grouped into collections
  • Within a collection, several document types
    can be declared
  • For each document type, a common schema is
    defined (open content model)
  • Tamino assigns each document to one of the
    document types

15
Type Assignment
  • Assignment is based on the root element type
  • Document must match the schema of the document
    type assigned, but might have additional
    elements/attributes
  • In a document type, documents might differ
    considerably
  • If no appropriate document type exists, the
    document is stored without any schema checking
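
The assignment rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DOCUMENT_TYPES` registry and the function `assign_document_type` are hypothetical names, and the "schema" is reduced to a set of required child elements. It is not Tamino's schema language or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: root element name -> minimal schema (required children).
DOCUMENT_TYPES = {
    "City": {"required_children": {"Name"}},
}

def assign_document_type(xml_text):
    """Return (document type, accepted) based on the root element name."""
    root = ET.fromstring(xml_text)
    doctype = DOCUMENT_TYPES.get(root.tag)
    if doctype is None:
        # No appropriate document type: stored without schema checking.
        return None, True
    present = {child.tag for child in root}
    # Open content model: required parts must be present, extras are allowed.
    accepted = doctype["required_children"] <= present
    return root.tag, accepted

city = ('<City Inhabitants="138000"><Name>Darmstadt</Name>'
        '<Addition>The city of art nouveau</Addition></City>')
print(assign_document_type(city))        # extra Addition element is tolerated
print(assign_document_type("<Beach/>"))  # unknown root element
print(assign_document_type("<City/>"))   # required Name is missing
```

The extra `Addition` element does not cause rejection, which is the point of the open content model; only the missing required element does.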

16
Tamino schema example
17
Document accepted by Tamino
  • <City Inhabitants="138000">
  • <Name>Darmstadt</Name>
  • <Addition>The city of art nouveau</Addition>
  • <Monument Height="39m">
  • <Name>Langer Ludwig</Name>
  • <Location>
  • <Name>Luisenplatz</Name>
  • <MapIndex>M5</MapIndex>
  • </Location>
  • </Monument>
  • </City>

18
When should an element/attribute be modeled?
  • an index will be defined on this
    element/attribute
  • the element/attribute is to be mapped to an
    external data source or to a server extension
  • dedicated access rights will be defined on the
    element/attribute
  • the presence / multiplicity of the element is to
    be enforced
  • one of the above conditions holds for a child of
    the element

19
Indexing in Tamino
  • value-based indexes
  • well known from traditional database systems
  • used to accelerate the search
  • must address the data object exactly, because
    element names need not be unique within a DTD

20
Example of value-based index
  • value-based indexes
  • data-centric view
  • <!ELEMENT City (Name, Inhabitants, Monument)>
  • <!ELEMENT Monument (Name, Description)>
  • <!ELEMENT Inhabitants (#PCDATA)>
  • <!ELEMENT Name (#PCDATA)>
  • <!ELEMENT Description (#PCDATA)>
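
Because Name occurs under both City and Monument in this DTD, a value-based index has to be keyed by more than the element name. The following is a minimal sketch, assuming an index keyed by the full element path; `build_value_index` and the sample documents are invented for illustration and say nothing about how Tamino stores its indexes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_value_index(docs):
    """Map (element path, text value) -> set of document ids."""
    index = defaultdict(set)

    def walk(elem, path, doc_id):
        here = f"{path}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:
            index[(here, text)].add(doc_id)
        for child in elem:
            walk(child, here, doc_id)

    for doc_id, xml_text in docs.items():
        walk(ET.fromstring(xml_text), "", doc_id)
    return index

# Invented sample documents following the DTD above.
docs = {
    "d1": "<City><Name>Darmstadt</Name><Inhabitants>138000</Inhabitants>"
          "<Monument><Name>Langer Ludwig</Name>"
          "<Description>Column</Description></Monument></City>",
    "d2": "<City><Name>Sampletown</Name><Inhabitants>10</Inhabitants>"
          "<Monument><Name>Darmstadt</Name>"
          "<Description>Statue</Description></Monument></City>",
}
index = build_value_index(docs)
print(index.get(("/City/Name", "Darmstadt")))
print(index.get(("/City/Monument/Name", "Darmstadt")))
```

A lookup for cities named Darmstadt finds only `d1`, although the same value appears in a Name element of `d2`.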

21
Indexing in Tamino (cont.)
  • text indexing
  • document-centric view
  • limit the scope to a specific part of the
    document
  • the scope might span element content

22
Example of text index
  • text indexing
  • document-centric view
  • <statement>
  • <author>
  • <firstname>Harald</firstname>
  • <lastname>Schoning</lastname>
  • </author>
  • <text>
  • X<italic>M</italic>L and X<italic>S</italic>L
  • are <stressed>very</stressed> important
  • </text>
  • </statement>
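
To illustrate why the scope of a text index "might span element content", here is a small Python sketch. It assumes that the words to be indexed are obtained by ignoring the markup inside the chosen scope; the function name `words_in_scope` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

statement = (
    "<statement><author><firstname>Harald</firstname>"
    "<lastname>Schoning</lastname></author>"
    "<text>X<italic>M</italic>L and X<italic>S</italic>L are "
    "<stressed>very</stressed> important</text></statement>"
)

def words_in_scope(xml_text, scope):
    """Return lowercase words found under the first `scope` element."""
    root = ET.fromstring(xml_text)
    elem = root if root.tag == scope else root.find(f".//{scope}")
    if elem is None:
        return []
    # itertext() yields text across child element boundaries,
    # so "X", "M" and "L" are joined into the single word "XML".
    flat = "".join(elem.itertext())
    return re.findall(r"\w+", flat.lower())

print(words_in_scope(statement, "text"))
```

Restricting the scope to `text` keeps the author's name out of the indexed words, while "XML" and "XSL" are found even though markup splits them.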

23
Indexing in Tamino (cont.)
  • structural index
  • useful if multiplicity permits the omission of
    elements
  • or if no DTD is known
  • Example:
  • in a database of all European cities
  • search all those cities which have an element
    called beach
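
The beach example can be sketched as an index from element names to the documents that contain them. This is an assumption about what a structural index minimally records; `build_structural_index` and the two sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_structural_index(docs):
    """Map element name -> ids of documents containing such an element."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for elem in ET.fromstring(xml_text).iter():
            index[elem.tag].add(doc_id)
    return index

# Invented sample documents; only the first has a beach element.
cities = {
    "c1": "<City><Name>Seaside</Name><beach/></City>",
    "c2": "<City><Name>Darmstadt</Name><Monument/></City>",
}
structure = build_structural_index(cities)
print(sorted(structure["beach"]))  # documents that contain a beach element
```

With such an index, the query never has to open documents that lack the element.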

24
Querying XML documents
  • Currently, there is no standardized query
    language
  • XPath allows positioning within a single document
  • XPath fits well the needs of retrieval in
    data-centric environments
  • document-centric environments need a more
    content-based retrieval facility
  • Tamino also supports full text search
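
As a rough illustration of XPath-style positioning within a single document, the sketch below uses Python's `xml.etree.ElementTree`, which implements only a subset of XPath. It is not Tamino's query interface.

```python
import xml.etree.ElementTree as ET

city = ET.fromstring(
    '<City Inhabitants="138000"><Name>Darmstadt</Name>'
    '<Monument Height="39m"><Name>Langer Ludwig</Name>'
    '<Location><Name>Luisenplatz</Name><MapIndex>M5</MapIndex></Location>'
    '</Monument></City>'
)

# Path steps select by position in the tree, so the three Name
# elements can be told apart by where they occur.
monument_names = [e.text for e in city.findall("./Monument/Name")]
all_names = [e.text for e in city.findall(".//Name")]
tall = [e.get("Height") for e in city.findall(".//Monument[@Height='39m']")]

print(monument_names)
print(all_names)
print(tall)
```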

25
Expectations for an XML processor
  • The W3C XML recommendation specifies the handling
    of entities, comments and processing instructions.
  • Users expect Tamino to leave comments intact,
    evaluate no processing instructions, and leave
    entity references unresolved.
  • Users also expect the output of a Tamino query to
    match the specification of an XML processor.

26
Why not leave entities unresolved?
  • A query result may be a set of (parts of)
    matching documents
  • The DTD of the result must then include all the
    different entity declarations of the original
    documents
  • The definition of an entity might differ from
    document to document
  • So, where the same entity name has different
    definitions, entities are renamed and the entity
    references are changed accordingly.
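
The renaming step can be sketched as follows. The function `merge_entity_declarations`, the `_2` suffix scheme and the sample entity tables are all hypothetical; the slides do not say how renamed entities are actually named.

```python
def merge_entity_declarations(docs):
    """Merge per-document entity tables, renaming conflicting names.

    `docs` maps a document id to {entity name: replacement text}.
    Returns (merged declarations, per-document rename maps).
    """
    merged = {}
    renames = {}
    for doc_id, entities in docs.items():
        renames[doc_id] = {}
        for name, value in entities.items():
            new_name = name
            suffix = 1
            # Pick a fresh name while this one is taken by another definition.
            while new_name in merged and merged[new_name] != value:
                suffix += 1
                new_name = f"{name}_{suffix}"
            merged[new_name] = value
            if new_name != name:
                # References to `name` in this document must be rewritten.
                renames[doc_id][name] = new_name
    return merged, renames

# Invented entity tables: "company" is defined differently in each document.
docs = {
    "doc1": {"company": "Software AG", "product": "Tamino"},
    "doc2": {"company": "University of Ottawa", "product": "Tamino"},
}
merged, renames = merge_entity_declarations(docs)
print(merged)
print(renames)
```

Identical definitions (`product`) are shared, while the conflicting one is declared twice under different names, so unresolved references in the result stay unambiguous.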

27
Problems of external entities
  • These entities can change without the database
    system knowing about this
  • Thus, the values of external entities must not be
    included in indexes
  • Example
  • <!ENTITY mysubject SYSTEM
    "http://www.softwareag.com/hottopic.xml">
  • ...
  • <ticker>Today's hot topic: &mysubject;</ticker>
  • Checking the current contents of the external
    entity would lead to unacceptable response times.

28
Relational Databases and XML
  • Major (object-)relational database systems
    include some form of XML support
  • The simplest form is to generate XML documents
    for existing relational data.
  • But, real database handling of XML requires that
    XML data can be stored and retrieved
  • Two approaches

29
XML support approach (1)
  • Map the XML document to relational tables and
    their columns
  • Markup is ignored on storage, and reconstructed
    on retrieval
  • advantage of this approach
  • the contents of an XML document can be handled
    with traditional SQL

30
XML support approach (1) (cont.)
  • Shortcomings
  • Sequence information is lost. The stored order:
  • <Order CustomerId="567" Date="12-12-2000">
  • <Item ProductID="17" Quantity="2"/>
  • <Item ProductID="16" Quantity="9"/>
  • <Item ProductID="19" Quantity="8"/>
  • </Order>
  • Retrieval of the order might return:
  • <Order CustomerId="567" Date="12-12-2000">
  • <Item ProductID="16" Quantity="9"/>
  • <Item ProductID="17" Quantity="2"/>
  • <Item ProductID="19" Quantity="8"/>
  • </Order>
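
The loss of sequence can be reproduced with a small sketch using Python's built-in `sqlite3` module. The table layout is a hypothetical mapping chosen for illustration (one row per Item, no column for document position); it does not describe how any particular product maps XML to tables.

```python
import sqlite3
import xml.etree.ElementTree as ET

order_xml = (
    '<Order CustomerId="567" Date="12-12-2000">'
    '<Item ProductID="17" Quantity="2"/>'
    '<Item ProductID="16" Quantity="9"/>'
    '<Item ProductID="19" Quantity="8"/>'
    '</Order>'
)

conn = sqlite3.connect(":memory:")
# Hypothetical mapping: attributes become columns, markup is discarded.
conn.execute(
    "CREATE TABLE item (customer_id TEXT, product_id INTEGER, quantity INTEGER)"
)

order = ET.fromstring(order_xml)
for item in order.findall("Item"):
    conn.execute(
        "INSERT INTO item VALUES (?, ?, ?)",
        (order.get("CustomerId"), int(item.get("ProductID")),
         int(item.get("Quantity"))),
    )

original = [int(i.get("ProductID")) for i in order.findall("Item")]
# A table is an unordered set of rows; SQL can only order by stored values.
retrieved = [row[0] for row in
             conn.execute("SELECT product_id FROM item ORDER BY product_id")]
print(original)   # document order
print(retrieved)  # order after reconstruction from the table
```

Recovering the document order would require an extra position column, which is exactly the information this mapping does not store.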

31
XML support approach (1) (cont.)
  • For data-centric documents sequence might not
    matter, but it does for document-centric ones
  • this approach loses all comments and processing
    instructions
  • mixed content cannot be stored easily in this
    model

32
XML support approach (2)
  • Leaves the XML document intact and stores it in a
    large text field (BLOB)
  • Or even outside the database
  • Text search is possible
  • Can limit a certain text-based condition

33
XML support approach (2) (cont.)
  • Limitations
  • no structure-aware combinations are possible
  • Value-based search is not supported on these text
    fields
  • IBM's solution: side tables
  • But, direct manipulation of side tables destroys
    the consistency of the database
  • Security can be defined on document level only,
    but not on elements or attributes

34
Summary
  • Tamino was designed with particular attention to
    XML
  • Schema handling for XML differs from schema
    handling in relational databases
  • In schema handling, external entities cause
    conceptual problems
  • value-based indexes are useful for XML, as well
    as text index and structural index
  • Comments and processing instructions should be
    preserved when documents are stored
  • The result of a query against an XML database
    should be XML

35
Q&A
  • Thanks!