Title: Chapter 8 Storing: XML and Databases
Chapter 8 Storing: XML and Databases
- From Jim Melton and Stephen Buxton, Querying XML, Morgan Kaufmann, 2006
8.1 Introduction
- Discuss ways in which XML documents can be made available for querying.
  - Ordinary computer file systems, websites, relational database systems, XML database systems, and other persistent storage systems
- Another source of XML: streaming
  - Generating XML (usually dynamically) and transmitting it to one or more clients in "real time"
- Querying XML that is persistently stored offers several advantages and challenges, while querying streaming XML presents other advantages and challenges.
- Message queuing systems
  - Data may be stored in some temporary location until it can be transmitted to its consumer, but such systems rarely involve long-term persistence of the data.
8.2 The Need for Persistence
- Examples of persistent XML data
  - Movie collection in XML
  - Corporations are increasingly likely to store business data like purchase orders in an XML form.
  - Many technical books are being produced from XML sources.
  - The W3C's specifications themselves are all coded in XML.
  - Even computer applications' initialization and scripting information is increasingly represented in XML.
8.2.1 Databases
- Characteristics of DBMS
  - Query tools, such as a query language like SQL or XQuery
  - Transaction capabilities that include the so-called ACID properties: atomicity of operations, consistency of the database as a whole, isolation from other concurrent users' operations, and durability of operations even across system crashes
  - Scalability and robustness
  - Management of security and performance, including registration and management of users and their privileges, creation of indices on the data, and provision of hints for the optimization of operations
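As a rough illustration of the atomicity property listed above, the following sketch stores a batch of XML purchase orders in one transaction and shows that a failure partway through leaves nothing from the batch behind. It uses Python's built-in sqlite3 module as a stand-in DBMS; the table name `purchase_order`, its columns, and the sample documents are assumptions made for this example only.

```python
import sqlite3

# In-memory SQLite database standing in for a DBMS (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase_order (po_id INTEGER PRIMARY KEY, doc TEXT NOT NULL)"
)
conn.execute("INSERT INTO purchase_order VALUES (1, '<order id=\"1\"/>')")
conn.commit()


def store_batch(conn, orders):
    """Insert all orders as one atomic unit; return True on success."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.executemany("INSERT INTO purchase_order VALUES (?, ?)", orders)
        return True
    except sqlite3.IntegrityError:
        return False


# The second row reuses primary key 1, so the whole batch must be undone,
# including the first row, which on its own would have been valid.
batch = [(2, '<order id="2"/>'), (1, '<order id="1-duplicate"/>')]
stored = store_batch(conn, batch)
count = conn.execute("SELECT COUNT(*) FROM purchase_order").fetchone()[0]
```

Plain files offer no equivalent of the rollback step; an application writing two XML files would have to undo the first write itself if the second failed.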
8.2.1 Databases (continued)
- Types of database commonly used to store and manage XML data
  - Relational, OODB, pure XML
- XML is encoded in Unicode.
  - All three types can support it.
Relational DBs
- Starting in roughly 2001, most commercial DB vendors began adding support for XML data to their products.
- Storing XML as a whole initially, and breaking XML data down into its components (elements, attributes, and other nodes) for storage in columns of various tables (shredding)
Object-Oriented DBs
- OODBMS, a new form of DBMS, was introduced into the marketplace.
- Objects instead of tuples of attributes and columns
- In any case, we do not perceive a near-term movement toward the use of OODBMS products for large-scale management of XML data.
Native XML DBs
- A DBMS that was designed specifically to deal with semistructured data
  - Defines a (logical) model for an XML document
  - Has an XML document as its fundamental unit of (logical) storage
  - Is not required to have any particular underlying physical storage model
8.2.2 Other Persistent Media
- XML documents are found in ordinary OS files and on web pages.
- Advantages of storing XML documents in ordinary files
  - Every computer has a file system.
  - Files are completely under your control.
- Disadvantages
  - Backing up files
  - Lack of transactional control makes data loss more likely.
  - Keeping track of perhaps thousands of XML files is quite tedious.
  - No way to enforce any consistent relationships among those thousands of XML files
- We believe there is a market for XML querying tools that don't depend on the existence of a DBMS but that search XML documents in local file systems and across the web.
8.2.3 Shredding Your Data
- Shredding of XML documents
  - Some relational database vendors provide a way for XML documents to be broken down into their component elements, attributes, and other nodes for storage into columns in one or more tables.
  - May not preserve the integrity (the "XML-ness") of those documents
  - User control over what level of XML-ness is preserved
  - Reconstructing the XML documents from the shredded fragments
8.2.3 Shredding Your Data (continued)
- The purpose of shredding is to improve the efficiency of access to the data found in XML documents.
- Shredding might not be an appropriate way of handling semistructured data, like books and technical reports, but is used more for data-oriented XML, like purchase orders and personnel records.
- Shredding can be done in a very naive manner, such as defining a SQL table for each element type in a document, with columns for each attribute, the non-element content of those elements, and the content of child elements that are not allowed to have element content themselves.
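The naive approach just described can be sketched in Python with the standard sqlite3 and xml.etree.ElementTree modules. This is only an illustration under stated assumptions: the sample document, its element and attribute names, and the key columns `movie_id` and `director_id` are invented here and are not the book's Example 8-1. Each element type with element content (movie, director) gets a table; attributes and text-only children become columns.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input document (not the movies document from the book).
MOVIES_XML = """
<movies>
  <movie id="m1" runtime="99">
    <title>Example Movie One</title>
    <director id="d1"><givenName>Ann</givenName><familyName>Example</familyName></director>
  </movie>
  <movie id="m2" runtime="142">
    <title>Example Movie Two</title>
    <director id="d1"><givenName>Ann</givenName><familyName>Example</familyName></director>
  </movie>
</movies>
"""

# One table per element type that has element content.
SCHEMA = """
CREATE TABLE director_table (
    director_id TEXT PRIMARY KEY,
    givenName   TEXT,
    familyName  TEXT
);
CREATE TABLE movie_table (
    movie_id    TEXT PRIMARY KEY,
    title       TEXT,
    runtime     INTEGER,
    director_id TEXT REFERENCES director_table (director_id)
);
"""


def shred(xml_text, conn):
    """Break each <movie> into rows of director_table and movie_table."""
    root = ET.fromstring(xml_text.strip())
    for movie in root.iter("movie"):
        director = movie.find("director")
        # The same director may appear under several movies; store it once.
        conn.execute(
            "INSERT OR IGNORE INTO director_table VALUES (?, ?, ?)",
            (director.get("id"),
             director.findtext("givenName"),
             director.findtext("familyName")),
        )
        conn.execute(
            "INSERT INTO movie_table VALUES (?, ?, ?, ?)",
            (movie.get("id"),
             movie.findtext("title"),
             int(movie.get("runtime")),
             director.get("id")),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
shred(MOVIES_XML, conn)
movie_rows = conn.execute(
    "SELECT movie_id, title, runtime, director_id FROM movie_table ORDER BY movie_id"
).fetchall()
director_rows = conn.execute("SELECT * FROM director_table").fetchall()
```

Note what is lost along the way: document order, whitespace, and the nesting of director inside movie survive only implicitly, through the foreign key.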
Example 8-1 Shredding an XML Document into a Relational Database
- The XML to be shredded
- The definitions of (reasonable) SQL tables into which the shredded XML data will be placed
Table 8-1 Result of Shredding Movies Document
8.2.3 Shredding Your Data (continued)
- Write ordinary SQL statements to query and otherwise manipulate that data.
  SELECT MAX(runtime) FROM movie_table;
  SELECT givenName || ' ' || familyName
  FROM movie_table AS m, director_table AS d
  WHERE m.director_id = d.director_id
    AND m.runtime = ( SELECT MAX(runtime) FROM movie_table );
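To see the two queries above run end to end, here is a small sketch using Python's sqlite3 module. The table and column names follow the slide (`movie_table`, `director_table`, `runtime`, `director_id`, `givenName`, `familyName`); the `movie_id` and `title` columns and all row values are placeholders invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE director_table (
    director_id TEXT PRIMARY KEY, givenName TEXT, familyName TEXT
);
CREATE TABLE movie_table (
    movie_id TEXT PRIMARY KEY, title TEXT, runtime INTEGER, director_id TEXT
);
""")
# Placeholder rows standing in for shredded XML data.
conn.executemany(
    "INSERT INTO director_table VALUES (?, ?, ?)",
    [("d1", "Ann", "Example"), ("d2", "Bob", "Sample")],
)
conn.executemany(
    "INSERT INTO movie_table VALUES (?, ?, ?, ?)",
    [("m1", "Example Movie One", 99, "d1"),
     ("m2", "Example Movie Two", 142, "d2")],
)

# Longest running time of any movie.
longest = conn.execute("SELECT MAX(runtime) FROM movie_table").fetchone()[0]

# Full name of the director of the longest movie.
director_name = conn.execute("""
    SELECT givenName || ' ' || familyName
    FROM movie_table AS m, director_table AS d
    WHERE m.director_id = d.director_id
      AND m.runtime = ( SELECT MAX(runtime) FROM movie_table )
""").fetchone()[0]
```

Once the data is shredded, nothing in these statements is XML-specific; they are the same queries one would write against any relational schema.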
8.2.3 Shredding Your Data (continued)
- What is a bit harder to do is to reconstruct the original structure of the input.
  - Discover the names of the tables and columns.
  - Join the various tables together on their respective PRIMARY KEY and FOREIGN KEY relationships.
  - Construct the resulting XML document.
- Most vendors of shredding-capable relational systems provide tools that reproduce the original XML document automatically.
- However, such relational systems normally aim to preserve a data model representation of the XML documents and not the actual sequence of characters that may have been provided in the serialized XML input.
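The three reconstruction steps can be sketched by hand as follows. This is a minimal illustration, not a vendor tool: it uses SQLite's catalog (`sqlite_master`, `PRAGMA table_info`) for the discovery step, and the schema, the key columns, and the output element names are assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE director_table (
    director_id TEXT PRIMARY KEY, givenName TEXT, familyName TEXT
);
CREATE TABLE movie_table (
    movie_id TEXT PRIMARY KEY, title TEXT, runtime INTEGER,
    director_id TEXT REFERENCES director_table (director_id)
);
INSERT INTO director_table VALUES ('d1', 'Ann', 'Example');
INSERT INTO movie_table VALUES ('m1', 'Example Movie One', 99, 'd1');
INSERT INTO movie_table VALUES ('m2', 'Example Movie Two', 142, 'd1');
""")

# Step 1: discover the names of the tables and their columns.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
columns = {
    table: [col[1] for col in conn.execute(f"PRAGMA table_info({table})")]
    for table in tables
}

# Step 2: join the tables on their primary key / foreign key relationship.
rows = conn.execute("""
    SELECT m.movie_id, m.title, m.runtime,
           d.director_id, d.givenName, d.familyName
    FROM movie_table AS m
    JOIN director_table AS d ON m.director_id = d.director_id
    ORDER BY m.movie_id
""").fetchall()

# Step 3: construct the resulting XML document.
root = ET.Element("movies")
for movie_id, title, runtime, director_id, given, family in rows:
    movie = ET.SubElement(root, "movie", id=movie_id, runtime=str(runtime))
    ET.SubElement(movie, "title").text = title
    director = ET.SubElement(movie, "director", id=director_id)
    ET.SubElement(director, "givenName").text = given
    ET.SubElement(director, "familyName").text = family

xml_out = ET.tostring(root, encoding="unicode")
```

The output is equivalent at the data model level only. Original whitespace, attribute quoting, and any ordering not captured by a column (here the `ORDER BY` on `movie_id` is a guess at document order) cannot be recovered from the tables.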
8.2.3 Shredding Your Data (continued)
- With the increased emphasis in all major relational database implementations on true native XML support, shredding is going to diminish in popularity for most applications.
- However, implementers continue to come up with more and more sophisticated shredding techniques targeted at a variety of usage scenarios.
8.3 SQL/XML's XML Type
- "SQL/XML," a relatively new part of the SQL standard, is designed to allow applications to integrate their XML data and their ordinary business data in their SQL statements.
- The centerpiece of SQL/XML is the creation of a new built-in SQL type: the XML type.
- Logically enough, the name of the type is "XML," just as the type intended for storing integers is named "INTEGER."
- The design of SQL/XML's XML type makes it a true native-XML database type.
- Therefore, if you were to create a SQL table with a column of type XML, the values stored in that column must be XML values, and those values retain all of their "XML-ness."
8.3 SQL/XML's XML Type (continued)
- In SQL/XML:2003, the XML type was based on the XML Information Set.
- The next edition of SQL/XML replaces its use of the Infoset with the adoption of the XQuery 1.0 and XPath 2.0 Data Model.
- Implementations might choose to store serialized XML documents and dynamically parse them into data model instances whenever they are referenced, or they might store some other already-parsed representation that can be mapped onto the data model definitions when required, or they might shred the XML data.
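The first of these storage strategies (keep the serialized text, parse it whenever it is referenced) can be imitated in Python. SQLite has no SQL/XML XML type, so this sketch is an analogy rather than an implementation of the standard: it declares a column with the type name XML and registers a sqlite3 converter that parses the stored text into an ElementTree element on every fetch. The table name `movie_docs` and the sample document are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Parse the stored bytes into an element tree each time an XML column is read.
sqlite3.register_converter("XML", lambda raw: ET.fromstring(raw))

# PARSE_DECLTYPES makes sqlite3 pick a converter from the declared column type.
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE movie_docs (id INTEGER PRIMARY KEY, doc XML)")

# The document is stored as serialized text, exactly as supplied.
conn.execute(
    "INSERT INTO movie_docs (doc) VALUES (?)",
    ('<movie runtime="99"><title>Example Movie One</title></movie>',),
)

# On retrieval the application sees a parsed tree, not a character string.
(doc,) = conn.execute("SELECT doc FROM movie_docs WHERE id = 1").fetchone()
title = doc.findtext("title")
runtime = int(doc.get("runtime"))
```

Unlike a real XML type, nothing here stops ill-formed text from being inserted; the error would only surface when the value is next read and parsed.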
8.4 Accessing Persistent XML Data
- Neither XQuery nor SQL exists in a vacuum.
- Applications are typically written in one or more other programming languages, such as C/C++, Java, and even COBOL.
- Most of the conventional programming languages (such as C and COBOL) access SQL database systems by invoking a call-level interface such as SQL/CLI.
- Because languages like C and COBOL do not have built-in data types for XML, all results of SQL statements that return a value of the XML type are implicitly cast to character string before the result is given to the invoking program.
8.4 Accessing Persistent XML Data (continued)
- Java programs typically access SQL database systems through the JDBC API.
- The most standard way for Java programs to access the XML data stored in SQL databases is for them to retrieve XML data using JDBC's getObject() method and then to cast the retrieved object to an XML class defined in another Java-related specification, such as JAXP.
- XQJ will define a set of interfaces and classes that enable an application to submit XQuery queries to an XML data source and process the results of these queries.
  - A direct interface from Java programs to XML data sources without those programs having to intermix multiple APIs, such as JDBC and JAXP
8.5 XML on the Fly: Nonpersistent XML Data
- Not all applications find it suitable to store XML data persistently before querying it.
  - Stock market quotations in XML might be broadcast to WAP-enabled cell phones that are programmed to alert their owners whenever particular stocks achieve a particular price.
  - Such data streams are literally never-ending.
  - The queries are supposed to detect the specified conditions immediately and not after periodic store-and-query episodes.
8.5 XML on the Fly: Nonpersistent XML Data (continued)
- Reasons why querying streaming XML is problematic
  - A query that must retrieve the current price of XMPL if and only if the preceding 10 trades all increased in price
  - Access to an element's ancestors and preceding siblings
- Queries against streaming XML are best suited for small XML documents and relatively simple queries, perhaps involving a transformation of source XML into a more desirable form of XML or directly into HTML or even plain text.
- Another form of query eminently suitable for streaming applications is the sort that depends solely on "very local" data.
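The contrast between the two kinds of streaming query can be sketched with Python's xml.etree.ElementTree.XMLPullParser, which accepts XML in arbitrary chunks. The stream format (a `quotes` root with `trade` children carrying `symbol` and `price` attributes) and the function names are assumptions made for this example. The price alert looks only at the current element ("very local" data), whereas the rising-run query cannot consult preceding siblings after they have been discarded and so must carry its own state.

```python
import xml.etree.ElementTree as ET


def stream_trades(chunks):
    """Yield (symbol, price) for each <trade> as soon as it is complete."""
    parser = ET.XMLPullParser(events=("start", "end"))
    root = None
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                if root is None:
                    root = elem  # first element seen is the stream's root
                continue
            if elem.tag == "trade":
                yield elem.get("symbol"), float(elem.get("price"))
                # Detach the processed trade so memory does not grow with
                # the length of a never-ending stream.
                root.remove(elem)


def price_alerts(trades, symbol, threshold):
    """Local query: needs nothing but the current trade."""
    for sym, price in trades:
        if sym == symbol and price >= threshold:
            yield price


def after_rising_run(trades, symbol, run_length):
    """Yield the current price if the preceding run_length trades all rose."""
    last = None   # price of the previous trade for this symbol
    streak = 0    # consecutive increases seen before the current trade
    for sym, price in trades:
        if sym != symbol:
            continue
        if streak >= run_length:
            yield price
        streak = streak + 1 if last is not None and price > last else 0
        last = price


# A stream cut at arbitrary points and never closed (no </quotes>).
CHUNKS = [
    '<quotes><trade sy',
    'mbol="XMPL" price="10.0"/><trade symbol="XMPL" price="11.0"/>',
    '<trade symbol="OTHR" price="50.0"/><trade symbol="XMPL" price="12.0"/>',
    '<trade symbol="XMPL" price="12.5"/>',
]

alerts = list(price_alerts(stream_trades(CHUNKS), "XMPL", 12.0))
runs = list(after_rising_run(stream_trades(CHUNKS), "XMPL", 2))
```

With `run_length=10` the second function corresponds to the XMPL query on the slide. It stays cheap only because the needed history compresses into two variables; a query needing arbitrary ancestors or preceding siblings would force the consumer to buffer the stream.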