Chapter 8 Storing: XML and Databases

1
Chapter 8 Storing: XML and Databases

From Jim Melton and Stephen Buxton, Querying
XML, Morgan Kaufmann, 2006

2
8.1 Introduction

Discuss ways in which XML documents can be made
available for querying.
Ordinary computer file systems, websites,
relational database systems, XML database
systems, and other persistent storage systems
Another source of XML is streaming: generating XML (usually dynamically) and transmitting it to one or more clients in "real time."
Querying XML that is persistently stored offers
several advantages and challenges, while querying
streaming XML presents other advantages and
challenges.
Message queuing systems: data may be stored in some temporary location until it can be transmitted to its consumer, but such systems rarely involve long-term persistence of the data.

3
8.2 The Need for Persistence

Examples of persistent XML data
A movie collection in XML
Corporations are increasingly likely to store business data, such as purchase orders, in XML form.
Many technical books are being produced from XML sources.
The W3C's specifications themselves are all coded in XML.
Even computer applications' initialization and scripting information is increasingly represented in XML.

4
8.2.1 Databases

Characteristics of a DBMS
Query tools, such as a query language like SQL or XQuery
Transaction capabilities that include the so-called ACID properties: atomicity of operations, consistency of the database as a whole, isolation from other concurrent users' operations, and durability of operations even across system crashes
Scalability and robustness
Management of security and performance, including registration and management of users and their privileges, creation of indices on the data, and provision of hints for the optimization of operations
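As a minimal sketch of the atomicity property, assuming Python's built-in sqlite3 module as the DBMS and a hypothetical po table that holds purchase orders as XML text, the following stores a batch of documents so that either every document is stored or none is. A malformed document in the second batch aborts the whole batch, including the well-formed document before it.

```python
import sqlite3
import xml.etree.ElementTree as ET

def store_batch(conn: sqlite3.Connection, docs: list[str]) -> None:
    """Store every document in docs, or none of them."""
    with conn:  # commits if the block succeeds, rolls back if an exception escapes
        for doc in docs:
            ET.fromstring(doc)  # raises ET.ParseError if doc is not well-formed
            conn.execute("INSERT INTO po (doc) VALUES (?)", (doc,))

# Hypothetical table holding one XML purchase order per row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE po (id INTEGER PRIMARY KEY, doc TEXT NOT NULL)")
store_batch(conn, ["<po id='1'/>", "<po id='2'/>"])
try:
    store_batch(conn, ["<po id='3'/>", "<po id='4'>"])  # second document is unclosed
except ET.ParseError:
    pass  # the insert of po 3 was rolled back along with the failed batch
print(conn.execute("SELECT COUNT(*) FROM po").fetchone()[0])  # 2
```

Using the connection as a context manager is how sqlite3 commits on success and rolls back on error; other DBMSs and drivers express the same idea with explicit BEGIN, COMMIT, and ROLLBACK statements.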

5
8.2.1 Databases

Types of databases commonly used to store and manage XML data
Relational, object-oriented (OODB), and pure (native) XML databases
XML is based on Unicode, and all three types can support Unicode data.

6
Relational DBs

Starting in roughly 2001, most commercial DB vendors began adding support for XML data into their products.
Approaches include storing XML documents as a whole (initially) and breaking XML data down into its component elements, attributes, and other nodes for storage in columns of various tables (shredding).

7
Object-Oriented DBs

OODBMS, a new form of DBMS, was introduced into
the marketplace.
Objects instead of tuples of attributes and
columns
In any case, we do not perceive a near-term
movement toward the use of OODBMS products for
large-scale management of XML data.

8
Native XML DBs

A DBMS that was designed specifically to deal
with semistructured data
Defines a (logical) model for an XML document
Has an XML document as its fundamental unit of
(logical) storage
Is not required to have any particular underlying
physical storage model

9
8.2.2 Other Persistent Media

XML documents are found in ordinary OS files and
on web pages.
Advantages of storing XML documents in ordinary files
Every computer has a file system.
Files are completely under your control.
Disadvantages
Backing up the files is your own responsibility.
Lack of transactional control makes data loss more likely.
Keeping track of perhaps thousands of XML files is quite tedious.
There is no way to enforce any consistent relationships among those thousands of XML files.
We believe there is a market for XML querying
tools that don't depend on the existence of a
DBMS but that search XML documents in local file
systems and across the web.
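A file-system query tool of this kind might, in its simplest form, look like the sketch below. It is an illustration only: it assumes a directory of *.xml files, uses ElementTree's limited XPath subset rather than XQuery, and find_in_files is a hypothetical name. A malformed file is simply skipped, since no DBMS stands guard over what is in the directory.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def find_in_files(directory: str, path_expr: str) -> list[tuple[str, str]]:
    """Return (file name, text) for every element matching path_expr in *.xml files."""
    hits = []
    for file in sorted(Path(directory).glob("*.xml")):
        try:
            root = ET.parse(file).getroot()
        except ET.ParseError:
            # Nothing prevents a malformed file from sitting next to good ones.
            continue
        for elem in root.iterfind(path_expr):
            hits.append((file.name, (elem.text or "").strip()))
    return hits

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "a.xml").write_text("<movies><movie><title>Alpha</title></movie></movies>")
        Path(d, "b.xml").write_text("<movies><movie><title>Beta</title></movie>")  # unclosed root
        print(find_in_files(d, "movie/title"))  # [('a.xml', 'Alpha')]
```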

10
8.2.3 Shredding Your Data

Shredding of XML documents
Some relational database vendors provided a way
for XML documents to be broken down into their
component elements, attributes, and other nodes
for storage into columns in one or more tables.
Shredding may not preserve the integrity, the "XML-ness," of those documents.
Users may be given control over what level of XML-ness is preserved.
Reconstructing the XML documents from the
shredded fragments

11
8.2.3 Shredding Your Data

The purpose of shredding is to improve the
efficiency of access to the data found in XML
documents.
Shredding might not be an appropriate way of handling semi-structured data, such as books and technical reports, but it is more commonly used for data-oriented XML, such as purchase orders and personnel records.
Shredding can be done in a very naive manner,
such as defining a SQL table for each element
type in a document, with columns for each
attribute, the non-element content of those
elements, and the content of child elements that
are not allowed to have element content
themselves.
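The naive approach can be sketched in Python with the standard xml.etree.ElementTree and sqlite3 modules. The movie document, its element names, and the column layout below are hypothetical (the transcript does not reproduce the actual XML or table definitions of Example 8-1); the table names movie_table and director_table and the columns runtime, director_id, givenName, and familyName are borrowed from the SQL queries on a later slide. Each element type with leaf children (movie, director) gets its own table, and director_id links the two.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input; not the actual document of Example 8-1.
MOVIES_XML = """<movies>
  <movie>
    <title>Alpha</title><yearReleased>1999</yearReleased>
    <director><givenName>Ann</givenName><familyName>Lee</familyName></director>
    <runtime>121</runtime>
  </movie>
  <movie>
    <title>Beta</title><yearReleased>2003</yearReleased>
    <director><givenName>Bo</givenName><familyName>Chan</familyName></director>
    <runtime>95</runtime>
  </movie>
  <movie>
    <title>Gamma</title><yearReleased>2005</yearReleased>
    <director><givenName>Ann</givenName><familyName>Lee</familyName></director>
    <runtime>140</runtime>
  </movie>
</movies>"""

# One table per element type that has leaf children; columns hold leaf content.
SCHEMA = """
CREATE TABLE director_table (
    director_id INTEGER PRIMARY KEY,
    givenName   TEXT NOT NULL,
    familyName  TEXT NOT NULL,
    UNIQUE (givenName, familyName));
CREATE TABLE movie_table (
    movie_id     INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    yearReleased INTEGER,
    runtime      INTEGER,
    director_id  INTEGER REFERENCES director_table (director_id));
"""

def shred(xml_text: str) -> sqlite3.Connection:
    """Break each <movie> and <director> element into rows of its own table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for movie in ET.fromstring(xml_text).findall("movie"):
        d = movie.find("director")
        names = (d.findtext("givenName"), d.findtext("familyName"))
        # A director who appears in several movies is stored only once.
        conn.execute("INSERT OR IGNORE INTO director_table (givenName, familyName) "
                     "VALUES (?, ?)", names)
        (director_id,) = conn.execute(
            "SELECT director_id FROM director_table "
            "WHERE givenName = ? AND familyName = ?", names).fetchone()
        conn.execute(
            "INSERT INTO movie_table (title, yearReleased, runtime, director_id) "
            "VALUES (?, ?, ?, ?)",
            (movie.findtext("title"), int(movie.findtext("yearReleased")),
             int(movie.findtext("runtime")), director_id))
    conn.commit()
    return conn

if __name__ == "__main__":
    conn = shred(MOVIES_XML)
    print(conn.execute("SELECT MAX(runtime) FROM movie_table").fetchone())  # (140,)
```

Note that the whitespace between elements and the nesting of director inside movie are not stored anywhere; only the data survives, which is the loss of "XML-ness" mentioned earlier.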

12
Example 8-1 Shredding an XML Document into a Relational Database: The XML to be shredded

13
The definitions of (reasonable) SQL tables into which the shredded XML data will be placed

14
Table 8-1 Result of Shredding Movies Document

15
8.2.3 Shredding Your Data

Write ordinary SQL statements to query and otherwise manipulate that data, for example to find the longest runtime and the name of the director of the longest movie:

SELECT MAX(runtime) FROM movie_table;

SELECT givenName || ' ' || familyName
FROM movie_table AS m, director_table AS d
WHERE m.director_id = d.director_id
  AND m.runtime = (SELECT MAX(runtime) FROM movie_table);
16
8.2.3 Shredding Your Data

It is a bit harder to reconstruct the original structure of the input. This requires you to:
Discover the names of the tables and columns
Join the various tables together on their
respective PRIMARY KEY and FOREIGN KEY
relationships.
Construct the resulting XML document.
Most vendors of shredding-capable relational
systems provide tools that reproduce the original
XML document automatically.
However, such relational systems normally aim to
preserve a data model representation of the XML
documents and not the actual sequence of
characters that may have been provided in the
serialized XML input.
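The three steps can be sketched with Python's sqlite3 and ElementTree modules. The table contents are hypothetical, and the first step (discovering table and column names) is skipped because the sketch already knows the schema. The element names and their order are hard-coded in reconstruct(), because the tables record neither; that is one concrete way in which the rebuilt document preserves the data but not necessarily the original character sequence.

```python
import sqlite3
import xml.etree.ElementTree as ET

def load_sample() -> sqlite3.Connection:
    """Hypothetical shredded data, using the table names from the queries above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE director_table (director_id INTEGER PRIMARY KEY,
                                     givenName TEXT, familyName TEXT);
        CREATE TABLE movie_table (movie_id INTEGER PRIMARY KEY, title TEXT,
                                  runtime INTEGER,
                                  director_id INTEGER REFERENCES director_table);
        INSERT INTO director_table VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Chan');
        INSERT INTO movie_table VALUES (10, 'Alpha', 121, 1), (11, 'Beta', 95, 2);
    """)
    return conn

def reconstruct(conn: sqlite3.Connection) -> ET.Element:
    """Join on the PRIMARY KEY/FOREIGN KEY relationship and rebuild the tree."""
    root = ET.Element("movies")
    rows = conn.execute("""
        SELECT m.title, m.runtime, d.givenName, d.familyName
        FROM movie_table AS m JOIN director_table AS d
          ON m.director_id = d.director_id
        ORDER BY m.movie_id""")
    for title, runtime, given, family in rows:
        movie = ET.SubElement(root, "movie")
        ET.SubElement(movie, "title").text = title
        director = ET.SubElement(movie, "director")
        ET.SubElement(director, "givenName").text = given
        ET.SubElement(director, "familyName").text = family
        ET.SubElement(movie, "runtime").text = str(runtime)
    return root

if __name__ == "__main__":
    # Prints a single line with no indentation, whatever the input looked like.
    print(ET.tostring(reconstruct(load_sample()), encoding="unicode"))
```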

17
8.2.3 Shredding Your Data

With the increased emphasis in all major relational database implementations on true native XML support, shredding is going to diminish in popularity for most applications.
However, implementers continue to come up with
more and more sophisticated shredding techniques
targeted at a variety of usage scenarios.

18
8.3 SQL/XML's XML Type

"SQL/XML," a relatively new part of the SQL
standard, designed to allow applications to
integrate their XML data and their ordinary
business data in their SQL statements
The centerpiece of SQL/XML is the creation of a new built-in SQL type: the XML type.
Logically enough, the name of the type is "XML," just as the type intended for storing integers is named "INTEGER."
The design of SQL/XML's XML type makes it a true native-XML database type.
Therefore, if you were to create a SQL table with a column of type XML, the values stored in that column must be XML values, and those values retain all of their "XML-ness."

19
8.3 SQL/XML's XML Type

In SQL/XML:2003, the XML type was based on the XML Information Set (Infoset).
The next edition of SQL/XML replaces its use of
the Infoset with the adoption of the XQuery 1.0
and XPath 2.0 Data Model.
Implementations might choose to store serialized
XML documents and dynamically parse them into
data model instances whenever they are
referenced, or they might store some other
already-parsed representation that can be mapped
onto the data model definitions when required, or
they might shred the XML data.
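The first two implementation choices can be contrasted in a small sketch. It uses an ElementTree tree as a stand-in for an XQuery 1.0 and XPath 2.0 Data Model instance (an approximation: ElementTree does not implement that data model), and the class names are hypothetical. Both classes yield equivalent trees, but only the serialized one still holds the exact input characters, and the two pay the parsing cost at different times.

```python
import xml.etree.ElementTree as ET

class SerializedXml:
    """Keeps the serialized text; builds a fresh tree each time it is referenced."""
    def __init__(self, text: str):
        self.text = text  # exact characters survive (quoting style, whitespace)

    def tree(self) -> ET.Element:
        return ET.fromstring(self.text)  # parsing cost paid on every access

class ParsedXml:
    """Parses once when stored and keeps only the parsed tree."""
    def __init__(self, text: str):
        self._root = ET.fromstring(text)  # parsing cost paid once, at store time

    def tree(self) -> ET.Element:
        return self._root

if __name__ == "__main__":
    doc = "<po id='1'>  <item>pen</item></po>"
    print(SerializedXml(doc).text)  # <po id='1'>  <item>pen</item></po>
    print(ET.tostring(ParsedXml(doc).tree(), encoding="unicode"))  # <po id="1">  <item>pen</item></po>
```

The second line of output shows the single quotes around the attribute value turned into double quotes: the data model is the same, the serialization is not.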

20
8.4 Accessing Persistent XML Data

Neither XQuery nor SQL exists in a vacuum.
Applications are typically written in one or more
other programming languages, such as C/C++, Java,
and even COBOL.
Most of the conventional programming languages
(such as C and COBOL) access SQL database systems
by invoking a call-level interface such as
SQL/CLI.
Because languages like C and COBOL do not have
built-in data types for XML, all results of SQL
statements that return a value of the XML type
are implicitly cast to character string before
the result is given to the invoking program.
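The practical consequence can be illustrated with a small analog. In this sketch SQLite, which has no XML type, stands in for a SQL/XML database and stores the document as TEXT; the po table and fetch_po function are hypothetical. The program receives a plain character string and must parse it itself, much as a C or COBOL program must parse the string that SQL/CLI returns.

```python
import sqlite3
import xml.etree.ElementTree as ET

def fetch_po(conn: sqlite3.Connection, po_id: int) -> ET.Element:
    """Fetch one stored purchase order and turn it back into a tree."""
    (doc,) = conn.execute("SELECT doc FROM po WHERE id = ?", (po_id,)).fetchone()
    # The driver hands back a plain str; building a tree is the client's job.
    return ET.fromstring(doc)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE po (id INTEGER PRIMARY KEY, doc TEXT NOT NULL)")
conn.execute("INSERT INTO po (id, doc) VALUES (1, ?)", ("<po><item qty='2'>pen</item></po>",))
print(fetch_po(conn, 1).find("item").get("qty"))  # 2
```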

21
8.4 Accessing Persistent XML Data

Java programs typically access SQL database
systems through the JDBC API.
The most standard way for Java programs to
access the XML data stored in SQL databases is
for them to retrieve XML data using JDBC's
getObject() method and then to cast the retrieved
object to an XML class defined in another
Java-related specification, such as JAXP.
XQJ (the XQuery API for Java) will define a set of interfaces and classes that enable an application to submit XQuery queries to an XML data source and process the results of these queries.
XQJ is intended to give Java programs a direct interface to XML data sources, without their having to intermix multiple APIs such as JDBC and JAXP.

22
8.5 XML on the Fly: Nonpersistent XML Data

Not all applications find it suitable to store
XML data persistently before querying it.
Stock market quotations in XML might be broadcast
to WAP-enabled cell phones that are programmed to
alert their owners whenever particular stocks
achieve a particular price.
Such data streams are literally never-ending.
The queries are supposed to detect the specified
conditions immediately and not after periodic
store-and-query episodes.
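For the stock-quotation scenario, a query can run as the data arrives rather than after a store-and-query cycle. The sketch below uses ElementTree's XMLPullParser, which accepts input incrementally through feed(); the <quotes>/<quote symbol="..." price="..."/> format, the watch() function, and the threshold rule are hypothetical. Each quote is removed from the tree once examined, so memory does not grow with a never-ending stream.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator

def watch(chunks: Iterable[str], symbol: str, threshold: float) -> Iterator[float]:
    """Yield each price of `symbol` that reaches `threshold`, as soon as it is parsed."""
    parser = ET.XMLPullParser(events=("start", "end"))
    root = None
    for chunk in chunks:  # e.g. pieces of data as they arrive from a socket
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start" and root is None:
                root = elem  # the <quotes> element, which never closes
            elif event == "end" and elem.tag == "quote":
                price = float(elem.get("price"))
                if elem.get("symbol") == symbol and price >= threshold:
                    yield price
                root.remove(elem)  # forget processed quotes; memory stays bounded

if __name__ == "__main__":
    feed = ['<quotes><quote symbol="XMPL" price="9.5"/>',
            '<quote symbol="ACME" price="50"/>',
            '<quote symbol="XMPL" price="10.25"/>']
    for p in watch(feed, "XMPL", 10.0):
        print("alert:", p)  # alert: 10.25
```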

23
8.5 XML on the Fly: Nonpersistent XML Data

Reasons why querying streaming XML is problematic: some queries depend on data that has already gone by in the stream, for example
A query that must retrieve the current price of XMPL if and only if the preceding 10 trades all increased in price
Access to an element's ancestors and preceding siblings
Queries against streaming XML are best suited for
small XML documents and relatively simple
queries, perhaps involving a transformation of
source XML into a more desirable form of XML or
directly into HTML or even plain text.
Another form of query eminently suitable for
streaming applications is the sort that depends
solely on "very local" data.
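The XMPL query shows why such queries need memory: its condition refers to trades that have already streamed past. For this particular query the state is bounded, as the sketch below shows with a fixed-size window. It assumes a hypothetical stream of (symbol, price) pairs already extracted from the XML, and it interprets "the preceding 10 trades all increased" as the last 10 price changes, up to and including the current trade, all being increases (so 11 prices are kept). Queries over ancestors or arbitrary preceding siblings can require keeping much more of the document.

```python
from collections import deque
from typing import Iterable, Iterator

def rising_run_prices(trades: Iterable[tuple[str, float]], symbol: str = "XMPL",
                      run: int = 10) -> Iterator[float]:
    """Yield the current price of `symbol` whenever its last `run` changes were all rises."""
    window: deque = deque(maxlen=run + 1)  # run changes need run + 1 prices
    for sym, price in trades:
        if sym != symbol:
            continue  # other symbols do not affect this query
        window.append(price)  # the oldest price falls out automatically
        prices = list(window)
        if len(prices) == run + 1 and all(a < b for a, b in zip(prices, prices[1:])):
            yield price

if __name__ == "__main__":
    stream = [("XMPL", float(p)) for p in range(1, 13)]
    print(list(rising_run_prices(stream)))  # [11.0, 12.0]
```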