1
Chapter 8 Storing: XML and Databases
  • From Jim Melton and Stephen Buxton, Querying
    XML, Morgan Kaufmann, 2006

2
8.1 Introduction
  • Discuss ways in which XML documents can be made
    available for querying.
  • Ordinary computer file systems, websites,
    relational database systems, XML database
    systems, and other persistent storage systems
  • Another source of XML: streaming
  • Generating XML (usually dynamically) and
    transmitting it to one or more clients in "real
    time"
  • Querying XML that is persistently stored offers
    several advantages and challenges, while querying
    streaming XML presents other advantages and
    challenges.
  • Message queuing systems
  • Data may be stored in some temporary location
    until it can be transmitted to its consumer, but
    such systems rarely involve long-term persistence
    of the data.

3
8.2 The Need for Persistence
  • Examples of persistent XML data
  • A movie collection in XML
  • Corporations are increasingly likely to store
    business data like purchase orders in XML form
  • Many technical books are being produced from XML
    sources
  • The W3C's specifications themselves are all coded
    in XML
  • Even computer applications' initialization and
    scripting information is increasingly represented
    in XML.

4
8.2.1 Databases
  • Characteristics of a DBMS:
  • Query tools, such as a query language like SQL or
    XQuery
  • Transaction capabilities that include the
    so-called ACID properties: atomicity of
    operations, consistency of the database as a
    whole, isolation from other concurrent users'
    operations, and durability of operations even
    across system crashes
  • Scalability and robustness
  • Management of security and performance, including
    registration and management of users and their
    privileges, creation of indices on the data, and
    provision of hints for the optimization of
    operations
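A minimal sketch of atomicity, the "A" in ACID, using Python's standard sqlite3 module. The table xml_docs and the function store_documents are hypothetical: a batch of XML documents is either stored completely or, if any one of them is not well-formed, not stored at all.

```python
import sqlite3
import xml.etree.ElementTree as ET

def store_documents(conn, docs):
    """Store every (name, xml_text) pair, or none of them (atomicity).
    Returns True if the whole batch was committed, False if it was rolled back."""
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            for name, text in docs:
                ET.fromstring(text)  # raises ET.ParseError if not well-formed
                conn.execute("INSERT INTO xml_docs (name, body) VALUES (?, ?)", (name, text))
    except ET.ParseError:
        return False
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xml_docs (name TEXT PRIMARY KEY, body TEXT NOT NULL)")

# The second document lacks its closing </movie> tag, so neither document is kept.
ok = store_documents(conn, [("a.xml", "<movie><title>A</title></movie>"),
                            ("b.xml", "<movie><title>B</title>")])
count = conn.execute("SELECT COUNT(*) FROM xml_docs").fetchone()[0]
print(ok, count)  # False 0
```

The first INSERT succeeds inside the open transaction, but the parse error on the second document rolls it back, so the database never holds a half-stored batch.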

5
8.2.1 Databases
  • Types of databases commonly used to store and
    manage XML data:
  • Relational, object-oriented (OODB), and pure
    (native) XML
  • XML is based on Unicode
  • All three types can support

6
Relational DBs
  • Starting in roughly 2001, most commercial DB
    vendors began adding support for XML data into
    their products.
  • Approaches: storing XML documents as a whole
    (initially), and breaking XML data down into
    their components (elements, attributes, and other
    nodes) for storage in columns of various tables
    ("shredding")

7
Object-Oriented DBs
  • The OODBMS, a new form of DBMS, was introduced
    into the marketplace.
  • It stores objects instead of tuples of attributes
    (columns)
  • In any case, we do not perceive a near-term
    movement toward the use of OODBMS products for
    large-scale management of XML data.

8
Native XML DBs
  • A DBMS that was designed specifically to deal
    with semistructured data
  • Defines a (logical) model for an XML document
  • Has an XML document as its fundamental unit of
    (logical) storage
  • Is not required to have any particular underlying
    physical storage model

9
8.2.2 Other Persistent Media
  • XML documents are found in ordinary OS files and
    on web pages.
  • Advantages of storing XML documents in ordinary
    files:
  • Every computer has a file system.
  • Files are completely under your control.
  • Disadvantages:
  • Backing up the files is left to you.
  • Lack of transactional control makes data loss
    more likely.
  • Keeping track of perhaps thousands of XML files
    is quite tedious.
  • There is no way to enforce any consistent
    relationships among those thousands of XML files.
  • We believe there is a market for XML querying
    tools that don't depend on the existence of a
    DBMS but that search XML documents in local file
    systems and across the web.
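A minimal sketch of such a DBMS-free query tool, assuming Python's standard xml.etree.ElementTree and its limited XPath-like path syntax (not full XQuery). The function name search_xml_files is hypothetical; the demo writes two throwaway files to a temporary directory.

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET

def search_xml_files(directory, path):
    """Yield (filename, element) for each element matching the ElementTree
    path expression `path` in every *.xml file under `directory`.
    Files that are not well-formed XML are skipped."""
    pattern = os.path.join(directory, "**", "*.xml")
    for filename in sorted(glob.glob(pattern, recursive=True)):
        try:
            root = ET.parse(filename).getroot()
        except ET.ParseError:
            continue  # no DBMS guards these files, so tolerate broken ones
        for elem in root.iterfind(path):
            yield filename, elem

# Demo: two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "m1.xml"), "w", encoding="utf-8") as f:
        f.write("<movies><movie><title>Alpha</title></movie></movies>")
    with open(os.path.join(tmp, "broken.xml"), "w", encoding="utf-8") as f:
        f.write("<movies>")  # not well-formed, so it is skipped
    results = [(os.path.basename(name), el.text)
               for name, el in search_xml_files(tmp, ".//title")]

print(results)  # [('m1.xml', 'Alpha')]
```

Note that the tool has to cope with the file-system disadvantages listed above itself: nothing stopped broken.xml from being saved in the first place.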

10
8.2.3 Shredding Your Data
  • Shredding of XML documents
  • Some relational database vendors provided a way
    for XML documents to be broken down into their
    component elements, attributes, and other nodes
    for storage into columns in one or more tables.
  • Shredding may not preserve the integrity (the
    "XML-ness") of those documents.
  • Users may control what level of XML-ness is kept.
  • Reconstructing the XML documents from the
    shredded fragments

11
8.2.3 Shredding Your Data
  • The purpose of shredding is to improve the
    efficiency of access to the data found in XML
    documents.
  • Shredding might not be an appropriate way of
    handling semistructured data, like books and
    technical reports, but it is more commonly used
    for data-oriented XML, like purchase orders and
    personnel records.
  • Shredding can be done in a very naive manner,
    such as defining a SQL table for each element
    type in a document, with columns for each
    attribute, the non-element content of those
    elements, and the content of child elements that
    are not allowed to have element content
    themselves.
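A minimal sketch of the naive strategy just described, in Python with sqlite3. It is an illustration under stated assumptions, not any vendor's shredder: the sample movies document and the shred function are hypothetical, attribute and element values are stored as text, and mixed content, repeated leaf children, and name clashes are not handled, which is exactly the loss of "XML-ness" that shredding risks.

```python
import sqlite3
import xml.etree.ElementTree as ET

def is_leaf(elem):
    """A leaf has no attributes and no element children (only text)."""
    return len(elem) == 0 and not elem.attrib

def shred(root, conn):
    """Naively shred an XML tree into SQL tables: one table per non-leaf
    element type, with a surrogate id, a parent_id link, one column per
    attribute, and one column per leaf child element."""
    rows = {}        # table name -> list of row dicts
    next_id = [1]

    def visit(elem, parent_id):
        row = {"id": next_id[0], "parent_id": parent_id}
        next_id[0] += 1
        row.update(elem.attrib)               # name clashes (e.g. an "id" attribute) not handled
        if elem.text and elem.text.strip():
            row["text"] = elem.text.strip()   # mixed content is kept only partially
        for child in elem:
            if is_leaf(child):
                row[child.tag] = child.text   # repeated leaf tags overwrite each other
            else:
                visit(child, row["id"])
        rows.setdefault(elem.tag, []).append(row)

    visit(root, None)
    for table, table_rows in rows.items():
        # Union of all column names seen for this element type, in first-seen order.
        columns = list(dict.fromkeys(c for r in table_rows for c in r))
        quoted = ", ".join(f'"{c}"' for c in columns)
        conn.execute(f'CREATE TABLE "{table}" ({quoted})')
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f'INSERT INTO "{table}" ({quoted}) VALUES ({placeholders})',
            [[r.get(c) for c in columns] for r in table_rows],
        )

doc = ET.fromstring("""
<movies>
  <movie year="2001"><title>First Feature</title><runtime>95</runtime>
    <director><givenName>Ann</givenName><familyName>Lee</familyName></director>
  </movie>
  <movie year="2004"><title>Second Feature</title><runtime>130</runtime>
    <director><givenName>Bo</givenName><familyName>Park</familyName></director>
  </movie>
</movies>""")

conn = sqlite3.connect(":memory:")
shred(doc, conn)
print(conn.execute('SELECT id, year, title, runtime FROM movie ORDER BY id').fetchall())
# [(2, '2001', 'First Feature', '95'), (4, '2004', 'Second Feature', '130')]
```

The result is three tables (movies, movie, director) linked only by the surrogate parent_id values; any richer structure has to be reconstructed by joining on them.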

12
Example 8-1 Shredding an XML Document into a Relational Database
The XML to be shredded
13
The definitions of (reasonable) SQL tables into
which the shredded XML data will be placed
14
Table 8-1 Result of Shredding Movies Document
15
8.2.3 Shredding Your Data
  • You can then write ordinary SQL statements to
    query and otherwise manipulate that data.

-- Longest runtime of any movie
SELECT MAX(runtime) FROM movie_table;

-- Name of the director of the longest movie
SELECT givenName || ' ' || familyName
FROM movie_table AS m, director_table AS d
WHERE m.director_id = d.director_id
  AND m.runtime = (SELECT MAX(runtime) FROM movie_table);
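The two queries can be tried with Python's sqlite3 module, since SQLite supports the standard || concatenation operator. The schema and rows below are hypothetical stand-ins for Table 8-1, whose contents are not reproduced here; only the table and column names used by the queries come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical shredded tables: one row per director and one per movie.
conn.executescript("""
CREATE TABLE director_table (
  director_id INTEGER PRIMARY KEY,
  givenName   TEXT,
  familyName  TEXT
);
CREATE TABLE movie_table (
  movie_id    INTEGER PRIMARY KEY,
  title       TEXT,
  runtime     INTEGER,
  director_id INTEGER REFERENCES director_table(director_id)
);
INSERT INTO director_table VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Park');
INSERT INTO movie_table VALUES (10, 'First Feature', 95, 1), (11, 'Second Feature', 130, 2);
""")

longest = conn.execute("SELECT MAX(runtime) FROM movie_table").fetchone()[0]
director = conn.execute("""
  SELECT givenName || ' ' || familyName
  FROM movie_table AS m, director_table AS d
  WHERE m.director_id = d.director_id
    AND m.runtime = (SELECT MAX(runtime) FROM movie_table)
""").fetchone()[0]
print(longest, director)  # 130 Bo Park
```

Once the data is shredded, nothing about these queries is XML-specific; that is the efficiency gain shredding aims for.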
16
8.2.3 Shredding Your Data
  • It is a bit harder to reconstruct the original
    structure of the input:
  • Discover the names of the tables and columns
  • Join the various tables together on their
    respective PRIMARY KEY and FOREIGN KEY
    relationships.
  • Construct the resulting XML document.
  • Most vendors of shredding-capable relational
    systems provide tools that reproduce the original
    XML document automatically.
  • However, such relational systems normally aim to
    preserve a data model representation of the XML
    documents and not the actual sequence of
    characters that may have been provided in the
    serialized XML input.
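A minimal sketch of the reconstruction steps, in Python with sqlite3 and ElementTree. It assumes a hypothetical movie_table/director_table schema whose names are already known (so the discovery step is skipped) and that surrogate ids were assigned in document order. The rebuilt document matches the data model, not the original characters: indentation and other serialization details are gone.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical shredded tables (names assumed known, so no discovery step).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE director_table (director_id INTEGER PRIMARY KEY, givenName TEXT, familyName TEXT);
CREATE TABLE movie_table (movie_id INTEGER PRIMARY KEY, title TEXT, runtime INTEGER,
                          director_id INTEGER REFERENCES director_table(director_id));
INSERT INTO director_table VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Park');
INSERT INTO movie_table VALUES (10, 'First Feature', 95, 1), (11, 'Second Feature', 130, 2);
""")

def rebuild_movies(conn):
    """Join the tables on their PRIMARY KEY / FOREIGN KEY relationship and
    construct a <movies> element tree from the joined rows."""
    root = ET.Element("movies")
    rows = conn.execute("""
        SELECT m.title, m.runtime, d.givenName, d.familyName
        FROM movie_table AS m JOIN director_table AS d
          ON m.director_id = d.director_id
        ORDER BY m.movie_id  -- assumes ids were assigned in document order
    """)
    for title, runtime, given, family in rows:
        movie = ET.SubElement(root, "movie")
        ET.SubElement(movie, "title").text = title
        ET.SubElement(movie, "runtime").text = str(runtime)
        director = ET.SubElement(movie, "director")
        ET.SubElement(director, "givenName").text = given
        ET.SubElement(director, "familyName").text = family
    return root

xml_text = ET.tostring(rebuild_movies(conn), encoding="unicode")
print(xml_text)
```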

17
8.2.3 Shredding Your Data
  • With the increased emphasis in all major
    relational database implementations on true
    native XML support, shredding is going to
    diminish in popularity for most applications.
  • However, implementers continue to come up with
    more and more sophisticated shredding techniques
    targeted at a variety of usage scenarios.

18
8.3 SQL/XML's XML Type
  • SQL/XML is a relatively new part of the SQL
    standard, designed to allow applications to
    integrate their XML data and their ordinary
    business data in their SQL statements.
  • The centerpiece of SQL/XML is the creation of a
    new built-in SQL type: the XML type.
  • Logically enough, the name of the type is "XML,"
    just as the type intended for storing integers is
    named "INTEGER."
  • The design of SQL/XML's XML type makes it a true
    native-XML database type.
  • Therefore, if you were to create a SQL table with
    a column of type XML, the values stored in that
    column must be XML values, and those values retain
    all of their "XML-ness."
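SQLite has no XML type, so the following Python sketch only approximates one facet of the behavior just described: a column that refuses values that are not well-formed XML. It does not give the stored values any further "XML-ness"; they remain plain text. The is_well_formed function and the movie_docs table are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return 1 if `text` parses as a well-formed XML document, else 0."""
    try:
        ET.fromstring(text)
        return 1
    except (ET.ParseError, TypeError):
        return 0

conn = sqlite3.connect(":memory:")
# Mark it deterministic: a pure function of its input, safe to use in a CHECK constraint.
conn.create_function("is_well_formed", 1, is_well_formed, deterministic=True)
conn.execute("""
  CREATE TABLE movie_docs (
    id  INTEGER PRIMARY KEY,
    doc TEXT NOT NULL CHECK (is_well_formed(doc))
  )
""")
conn.execute("INSERT INTO movie_docs (doc) VALUES (?)",
             ("<movie><title>First Feature</title></movie>",))
try:
    # Mismatched tags: the CHECK constraint rejects the row.
    conn.execute("INSERT INTO movie_docs (doc) VALUES (?)", ("<movie><title>oops</movie>",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

A real SQL/XML implementation goes further: values of the XML type are XML data model instances that XQuery can navigate, not strings that merely happened to parse.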

19
8.3 SQL/XML's XML Type
  • In SQL/XML:2003, the XML type was based on the
    XML Information Set (Infoset).
  • The next edition of SQL/XML replaces its use of
    the Infoset with the adoption of the XQuery 1.0
    and XPath 2.0 Data Model.
  • Implementations might choose to store serialized
    XML documents and dynamically parse them into
    data model instances whenever they are
    referenced, or they might store some other
    already-parsed representation that can be mapped
    onto the data model definitions when required, or
    they might shred the XML data.

20
8.4 Accessing Persistent XML Data
  • Neither XQuery nor SQL exists in a vacuum.
  • Applications are typically written in one or more
    other programming languages, such as C/C++, Java,
    and even COBOL.
  • Most of the conventional programming languages
    (such as C and COBOL) access SQL database systems
    by invoking a call-level interface such as
    SQL/CLI.
  • Because languages like C and COBOL do not have
    built-in data types for XML, all results of SQL
    statements that return a value of the XML type
    are implicitly cast to character string before
    the result is given to the invoking program.
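The same pattern can be seen from Python's sqlite3 module, used here only as a stand-in for a C or COBOL program calling SQL/CLI (an assumption for illustration: SQLite simply stores the hypothetical movie_docs.doc column as TEXT). The host program receives a character string and must parse it itself before it can navigate the XML.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie_docs (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute("INSERT INTO movie_docs (doc) VALUES (?)",
             ("<movie year='2001'><title>First Feature</title></movie>",))

(raw,) = conn.execute("SELECT doc FROM movie_docs WHERE id = 1").fetchone()
# What the host program receives is just a character string...
print(type(raw).__name__)  # str
# ...so it must parse the string before it can navigate the XML.
movie = ET.fromstring(raw)
print(movie.get("year"), movie.findtext("title"))  # 2001 First Feature
```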

21
8.4 Accessing Persistent XML Data
  • Java programs typically access SQL database
    systems through the JDBC API.
  • The most standard way for Java programs to
    access the XML data stored in SQL databases is
    for them to retrieve XML data using JDBC's
    getObject() method and then to cast the retrieved
    object to an XML class defined in another
    Java-related specification, such as JAXP.
  • XQJ (the XQuery API for Java) will define a set
    of interfaces and classes that enable an
    application to submit XQuery queries to an XML
    data source and process the results of these
    queries.
  • XQJ will give Java programs a direct interface to
    XML data sources without having to intermix
    multiple APIs, such as JDBC and JAXP.

22
8.5 XML on the Fly: Nonpersistent XML Data
  • Not all applications find it suitable to store
    XML data persistently before querying it.
  • Stock market quotations in XML might be broadcast
    to WAP-enabled cell phones that are programmed to
    alert their owners whenever particular stocks
    achieve a particular price.
  • Such data streams are literally never-ending.
  • The queries are supposed to detect the specified
    conditions immediately and not after periodic
    store-and-query episodes.
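A minimal sketch of such a continuous query, using the incremental XMLPullParser from Python's xml.etree.ElementTree. The quote format, the price_alerts function, and the threshold rule are hypothetical; the point is that each quote is tested the moment its closing tag arrives, with no store-then-query step.

```python
import xml.etree.ElementTree as ET

def price_alerts(chunks, symbol, threshold):
    """Watch a (possibly never-ending) stream of XML text chunks and yield an
    alert as soon as `symbol` trades at or above `threshold`.
    Assumed, hypothetical format: <quotes><quote symbol="..." price="..."/>...</quotes>"""
    parser = ET.XMLPullParser(events=("start", "end"))
    root = None
    for chunk in chunks:            # chunks arrive over time
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start" and root is None:
                root = elem         # the outer <quotes> element
            elif event == "end" and elem.tag == "quote":
                price = float(elem.get("price"))
                if elem.get("symbol") == symbol and price >= threshold:
                    yield f"{symbol} reached {elem.get('price')}"
                root.remove(elem)   # drop processed quotes so memory stays bounded

# Chunk boundaries can fall anywhere, even inside a tag.
stream = ['<quotes><quote symbol="XMPL" price="9.50"/>',
          '<quote symbol="OTHR" price="12.00"/><quote symbol="XM',
          'PL" price="10.25"/>']
alerts = list(price_alerts(stream, "XMPL", 10.0))
print(alerts)  # ['XMPL reached 10.25']
```

Removing each processed quote from the root matters for a never-ending stream: otherwise the parser's tree would keep growing for as long as the feed runs.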

23
8.5 XML on the Fly: Nonpersistent XML Data
  • Reasons why querying streaming XML is
    problematic:
  • A query that must retrieve the current price of
    XMPL if and only if the preceding 10 trades all
    increased in price
  • Queries that need access to an element's
    ancestors and preceding siblings, which have
    already streamed past
  • Queries against streaming XML are best suited for
    small XML documents and relatively simple
    queries, perhaps involving a transformation of
    source XML into a more desirable form of XML or
    directly into HTML or even plain text.
  • Another form of query eminently suitable for
    streaming applications is the sort that depends
    solely on "very local" data.
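The XMPL query above shows why history is the hard part: the stream cannot be re-read, so the query must buffer past data itself. A minimal sketch, assuming "the preceding 10 trades all increased" means each of those trades was priced above the trade before it (so 11 prices must be remembered); the trade format and the rising_streak_prices function are hypothetical.

```python
from collections import deque
import xml.etree.ElementTree as ET

def rising_streak_prices(chunks, symbol, streak=10):
    """Yield the price of each `symbol` trade that comes right after `streak`
    consecutive price increases. The stream cannot be re-read, so the query
    buffers the last streak + 1 prices itself."""
    window = deque(maxlen=streak + 1)  # streak increases involve streak + 1 prices
    parser = ET.XMLPullParser(events=("start", "end"))
    root = None
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start" and root is None:
                root = elem  # outer <trades> element
            elif event == "end" and elem.tag == "trade":
                if elem.get("symbol") == symbol:
                    price = float(elem.get("price"))
                    prices = list(window)
                    if len(prices) == window.maxlen and all(
                        a < b for a, b in zip(prices, prices[1:])
                    ):
                        yield price
                    window.append(price)
                root.remove(elem)  # drop processed trades so memory stays bounded

# Demo with a streak of 3 to keep the input short.
trades = "".join(f'<trade symbol="XMPL" price="{p}"/>' for p in [1, 2, 3, 4, 5, 4.5, 6])
stream = ["<trades>", trades, "</trades>"]
picks = list(rising_streak_prices(stream, "XMPL", streak=3))
print(picks)  # [5.0, 4.5]
```

By contrast, a "very local" query, such as the price-alert sketch for the previous slide, inspects each element in isolation and needs no buffer at all.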