Introduction to Databases: Relational and XML Models and Languages




1
Introduction to Databases: Relational and XML
Models and Languages
  • Instructors
  • Bertram Ludaescher
  • Kai Lin

2
Overview (Part 2)
  • 0915-1020 Relational Databases (1h05)
  • 1020-1030 BREAK (10)
  • 1030-1150 Relational Databases (1h20)
  • 1150-1315 LUNCH (1h25)
  • 1315-1345 Demo Hands-on (30)
  • 1345-1510 XML Basics (1h25)
  • 1510-1530 BREAK (20)
  • 1530-1630 XML Querying (1h)
  • 1630-1700 Demo Hands-on (30)

3
XML and Related Standards
  • An introduction to XML, DTDs, XML Schema, and the
    DOM
  • includes material by
  • Shawn Bowers, SDSC
  • Michael Gertz, UC Davis

5
A Neuroscientist's Information Integration Problem
Biomedical Informatics Research
Network: http://nbirn.net
What is the cerebellar distribution of rat
proteins with more than 70% homology with human
NCS-1? Any structure specificity? How about other
rodents?
Complex Multiple-Worlds Mediation
6
A Home Buyer's Information Integration Problem
What houses for sale under $500k have at least 2
bathrooms, 2 bedrooms, a nearby school ranking
in the upper third, in a neighborhood with
below-average crime rate and diverse population?
→ Information Integration
Multiple-Worlds Mediation
7
An Online Shopper's Information Integration
Problem
El Cheapo: Where can I get the cheapest copy
(including shipping cost) of Wittgenstein's
Tractatus Logico-Philosophicus within a week?
One-World Mediation
Mediator (virtual DB) (vs. Data Warehouse)
8
Information Integration Challenges
  • System aspects: Grid Middleware
  • distributed data & computing
  • Web Services, WSDL/SOAP, OGSA, ...
  • sources: functions, files, data sets
  • Syntax & Structure:
  • (XML-Based) Data Mediators
  • wrapping, restructuring
  • (XML) queries and views
  • sources: (XML) databases
  • Semantics:
  • Model-Based/Semantic Mediators
  • conceptual models and declarative views
  • Knowledge Representation: ontologies, description
    logics (RDF(S), OWL, ...)
  • sources: knowledge bases (DB + CMs + ICs)

9
Information Integration Challenges: S4
Heterogeneities
  • Systems Integration
  • platforms, devices, data & service distribution,
    APIs, protocols, ...
  • → Grid middleware technologies
  • e.g. single sign-on, platform independence,
    transparent use of remote resources, ...
  • Syntax & Structure
  • heterogeneous data formats (one for each tool
    ...)
  • heterogeneous data models (RDBs, ORDBs, OODBs,
    XMLDBs, flat files, ...)
  • heterogeneous schemas (one for each DB ...)
  • → Database mediation technologies
  • XML-based data exchange, integrated views,
    transparent query rewriting, ...
  • Semantics
  • fuzzy metadata, terminology, hidden semantics,
    implicit assumptions, ...
  • → Knowledge representation & semantic mediation
    technologies
  • smart data discovery & integration
  • e.g. ask about X ("mafic"), find data about Y
    ("diorite"), be happy anyway!

10
Structural / XML-Based Mediation
11
Information Integration from a DB Perspective
  • Information Integration Problem
  • Given data sources S1, ..., Sk (DBMS, web sites,
    ...) and user questions Q1,..., Qn that can be
    answered using the Si
  • Find the answers to Q1, ..., Qn
  • The Database Perspective: source = database
  • Si has a schema (relational, XML, OO, ...)
  • Si can be queried
  • define virtual (or materialized)
    integrated/global view G over S1, ..., Sk using
    database query languages (SQL, XQuery, ...)
  • questions become queries Qi against G(S1, ..., Sk)
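
The idea of a virtual global view can be sketched in a few lines of Python. Everything here is an illustrative assumption: the two sources S1 and S2, their field names, and the prices are made up, and plain lists stand in for real databases. The point is only that the user question is asked against G, never against the individual sources.

```python
from typing import Dict, Iterator, List

# Hypothetical source S1: a store exposing (title, price_usd) records.
S1: List[Dict] = [
    {"title": "Tractatus", "price_usd": 12.0},
    {"title": "Investigations", "price_usd": 20.0},
]

# Hypothetical source S2: another store with a different schema.
S2: List[Dict] = [
    {"name": "Tractatus", "cost": 9.5, "shipping": 4.0},
]


def global_view() -> Iterator[Dict]:
    """Virtual integrated view G(S1, S2) with one common schema."""
    for row in S1:
        yield {"title": row["title"], "total": row["price_usd"], "source": "S1"}
    for row in S2:
        total = row["cost"] + row["shipping"]
        yield {"title": row["name"], "total": total, "source": "S2"}


def cheapest(title: str) -> Dict:
    """A user question phrased as a query against G, not against S1 or S2."""
    offers = [r for r in global_view() if r["title"] == title]
    return min(offers, key=lambda r: r["total"])


best = cheapest("Tractatus")
print(best)
```

Because G is computed on demand, nothing is copied out of the sources; a materialized view would instead store the result of global_view().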

12
Standard (XML-Based) Mediator Architecture
[Figure: mediator architecture. The USER/Client sends queries Q1, Q2, Q3 (step 3) and receives answers(Q1), answers(Q2), answers(Q3) (step 4).]
13
Query Planning for Mediators
  • Given
  • User query Q: answer(...) ← G ...
  • G ← S: global-as-view (GAV)
  • S ← G: local-as-view (LAV)
  • ic(...) ← S, G: integrity constraints
    (ICs)
  • Find
  • equivalent (or min. containing, max. contained)
    query plan Q': answer(...) ← S
  • Results
  • A variety of results/algorithms depending on
    classes of queries, views, and ICs: P, NP, ...,
    undecidable
  • many variants still open

14
Background
  • Markup
  • Annotations (tags) for carrying information about
    a document's content
  • a writer's handwritten notes for typesetting
  • an editor's corrections in a manuscript
  • A Markup Language defines a syntax and grammar
    for tags

15
Background (cont'd)
  • SGML
  • Standard Generalized Markup Language
  • Standardized in 1986 (ISO)
  • A language for defining markup languages
  • And for marking-up content
  • Syntax: Document Type Definition (DTD)
  • Tools aimed at document management

16
Background (cont'd)
  • HTML
  • A markup language
  • A particular SGML Document Type (called an
    application)
  • Tools for browsing and authoring

17
Background (cont'd)
  • Limitations
  • SGML
  • Complex, many options and shortcuts
  • Must know the DTD to parse correctly
  • Cost of SGML technology is high
  • HTML
  • Not extensible: can't define new tags
  • Tags for presenting data, not describing it
  • Doesn't capture much document structure or
    content meaning

18
Enter XML
  • XML (Extensible Markup Language)
  • Standardized by W3C in 1998
  • For data interchange over the Web
  • A Simpler SGML
  • Actually, a subset of SGML
  • DTDs are optional
  • Fewer features and options
  • Widely available tools for parsing, authoring,
    browsing, etc.

19
Uses for XML
  • Why XML?
  • Capture logical structure of documents
  • Presentation Independent
  • Data Interchange
  • XML is implementation independent
  • Storage Format
  • Maier's Maxim: "Any successful interchange format
    becomes a storage format"
  • Metadata
  • Searching, filtering, organizing
  • Data Packaging, Movement, and Processing
  • Client-Side processing, Server-to-Server
    communication, Non-browser based clients,
    Simplified Server Processing, etc.

21
(Some of) The Many Standards of XML
  • XML Document and XML DTD
  • Schema and Types: XML Schema and XML data types
  • Programming: Document Object Model (DOM), an API to
    XML documents
  • Query: XQuery, XQL, XML-QL
  • Transformation: XSLT for rearranging and
    restructuring XML documents
  • Addressing: XPath and XPointer for addressing XML
    subdocuments
  • Transport: XML-RPC, SOAP, XML-Protocol for
    message and object serialization and remote
    procedure calls
  • Metadata: RDF, using XML to define resource
    metadata
  • Linking: XLink for simple and complex hyperlinks
    between XML documents
22
The Running Example
  • Lego Product Catalogs
  • catalogs have
  • a publishing date, an identifier, a title, etc.
  • catalogs are made up of products
  • either a kit or accessory
  • each has an item #, price, name, picture, etc.
  • kits can have
  • an age level, # of pieces, set type (duplo,
    basic), a theme (star wars), a system (space)

23
An Example XML Catalog Document
<?xml version="1.0"?>
<LegoCatalog>
  <pubDate>2000</pubDate>
  <products>
    <kit>
      <name>X-Wing Fighter</name>
      <ages>
        <minAge>7</minAge>
        <maxAge>12</maxAge>
      </ages>
      <pieces>263</pieces>
      <theme>Star Wars</theme>
      <desc>Take to the skies with Luke
        as he battles the forces of evil!</desc>
    </kit>
  </products>
</LegoCatalog>
24
An Example XML Document
prolog:
<?xml version="1.0"?>
body:
<LegoCatalog>
  <pubDate>2000</pubDate>
  <products>
    <kit>
      <name>X-Wing Fighter</name>
      <ages>
        <minAge>7</minAge>
        <maxAge>12</maxAge>
      </ages>
      <pieces>263</pieces>
      <theme>Star Wars</theme>
      <desc>Take to the skies with Luke
        as he battles the forces of evil!</desc>
    </kit>
  </products>
</LegoCatalog>
  • elements have start and end tags
  • elements are nested: boxes within boxes
  • elements can also contain content
25
Well Formed Documents
  • Well-formed XML documents
  • A single root element
  • Start and end tags required (unlike HTML)
  • <name>X-Wing Fighter</name>
  • empty-element tags: <theme/>
  • Elements must be properly nested
  • <kit><pieces>263</kit></pieces>
    (this is NOT properly nested!)
  • More rules
  • naming elements, document has at least one
    element, etc.
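
A quick way to try these rules is to hand candidate strings to a parser and see whether it accepts them. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed is made up for illustration, and the parser checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<kit><pieces>263</pieces></kit>"    # properly nested
bad = "<kit><pieces>263</kit></pieces>"     # end tags in the wrong order
empty_tag = "<kit><theme/></kit>"           # empty-element tag
two_roots = "<kit/><kit/>"                  # more than one root element

for label, doc in [("good", good), ("bad", bad),
                   ("empty_tag", empty_tag), ("two_roots", two_roots)]:
    print(label, is_well_formed(doc))
```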
26
XML Attributes
  • Elements can contain attributes
  • <kit unitId="7140" price="29.99"
    shipWeight="1lb">
  • element name: kit
  • attribute names: unitId, price, shipWeight
  • attribute values: "7140", "29.99", "1lb"

Attributes are always assigned in element start
tags (or empty-element tags), their values are
always quoted (single or double quotes), and
attribute names must be unique within the element
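
The attribute rules can be checked with Python's standard xml.etree.ElementTree. This is a small sketch, not part of the original slides; the helper name parses is an assumption introduced here, and the kit element is written as an empty-element tag so the snippet is a complete document.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if text is accepted by the XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


kit = ET.fromstring('<kit unitId="7140" price="29.99" shipWeight="1lb"/>')
attrs = dict(kit.attrib)            # attribute name -> attribute value (always strings)
price = float(kit.get("price"))     # conversion to a number is up to the application

print(attrs, price)
print(parses("<kit price='29.99'/>"))         # single quotes are accepted
print(parses("<kit price=29.99/>"))           # unquoted value is rejected
print(parses('<kit price="1" price="2"/>'))   # duplicate attribute name is rejected
```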
27
Attributes vs. Content
  • In general, it is up to the document designer
  • In SGML, content usually was for data you see and
    attributes for metadata
  • How I do it:
  • Attribute: "atomic" content, applying to the
    whole element
  • Content (Subelement): otherwise

28
Document Type Definition
  • Why DTDs?
  • To standardize tags and structure for interchange
    and creation
  • To make the documents machine processable
  • What is a DTD?
  • A grammar for describing XML documents (tags,
    attributes, nesting, etc.)
  • An XML document that is well-formed and conforms
    to a DTD is said to be valid

29
An Example DTD: Elements
An element content model for LegoCatalog
<!ELEMENT LegoCatalog (pubDate, products)>
<!ELEMENT pubDate (#PCDATA)>
<!ELEMENT products (kit | accessory)*>
<!ELEMENT kit (name, ages, pieces, theme?, series?, desc)>
<!ELEMENT ages (minAge, maxAge)>
<!ELEMENT minAge (#PCDATA)>
<!ELEMENT maxAge (#PCDATA)>
<!ELEMENT pieces (#PCDATA)>
<!ELEMENT series (#PCDATA)>
<!ELEMENT desc (#PCDATA)>
A character data content model for pubDate
* zero or more, + one or more, ? optional
| Choice, "," Strict Sequence, () Grouping
Empty, Any, and Mixed content models
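
A content model is essentially a regular expression over child element names. The sketch below makes that concrete for the kit model (name, ages, pieces, theme?, series?, desc) by hand-translating it into a Python regex over the sequence of child tags. This is an illustration only, not a DTD validator: the translation is manual, the helper name children_match is made up, and Python's standard library does not validate against DTDs.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of (name, ages, pieces, theme?, series?, desc):
# each child tag is followed by one space, "?" becomes an optional group.
KIT_MODEL = re.compile(r"name ages pieces (theme )?(series )?desc ")


def children_match(element: ET.Element, model: "re.Pattern[str]") -> bool:
    """Check the ordered child tags of element against a content-model regex."""
    tags = "".join(child.tag + " " for child in element)
    return model.fullmatch(tags) is not None


ok_kit = ET.fromstring(
    "<kit><name>X-Wing Fighter</name><ages/><pieces>263</pieces>"
    "<theme>Star Wars</theme><desc>Take to the skies</desc></kit>"
)
no_desc = ET.fromstring("<kit><name/><ages/><pieces/></kit>")
wrong_order = ET.fromstring("<kit><ages/><name/><pieces/><desc/></kit>")

print(children_match(ok_kit, KIT_MODEL))
print(children_match(no_desc, KIT_MODEL))
print(children_match(wrong_order, KIT_MODEL))
```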
30
An Example DTD: Attributes
<!ATTLIST kit
  price       CDATA       #REQUIRED
  shipWeight  CDATA       #REQUIRED
  avail       (yes | no)  #IMPLIED
  image       CDATA       "na.jpg"
  unitId      ID          #IMPLIED >
<!ATTLIST accessory
  forKits     IDREFS      #IMPLIED
  orderStatus CDATA       #FIXED "special" >
each attribute has the form: attr-name type
default-decl
Types:
  • CDATA: character data
  • ID: unique identifier
  • IDREF: reference to an ID
  • IDREFS: list of references
  • enumeration: list of possible values
Default declarations:
  • #REQUIRED: must appear
  • #IMPLIED: optionally appear
  • #FIXED "default": if attribute is missing, parser
    assumes value (if present, it must equal that value)
  • "default": only if attribute is missing, default is
    assumed, otherwise any value
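
The four kinds of default declaration can be mimicked with a small Python function. This is a sketch under stated assumptions: the table encoding (kind, value) and the function name apply_attlist are invented here, the attribute names are taken from the kit and accessory declarations on this slide, and attribute types (ID, IDREFS, enumerations) are deliberately not checked.

```python
from typing import Dict, Optional, Tuple

AttList = Dict[str, Tuple[str, Optional[str]]]

# (kind, value) per attribute; "DEFAULT" marks a plain default value.
KIT_ATTLIST: AttList = {
    "price": ("REQUIRED", None),
    "shipWeight": ("REQUIRED", None),
    "avail": ("IMPLIED", None),
    "image": ("DEFAULT", "na.jpg"),
    "unitId": ("IMPLIED", None),
}
ACCESSORY_ATTLIST: AttList = {
    "forKits": ("IMPLIED", None),
    "orderStatus": ("FIXED", "special"),
}


def apply_attlist(attrs: Dict[str, str], attlist: AttList) -> Dict[str, str]:
    """Return attrs with defaults filled in; raise ValueError on a violation."""
    result = dict(attrs)
    for name, (kind, value) in attlist.items():
        if kind == "REQUIRED" and name not in result:
            raise ValueError(f"missing required attribute: {name}")
        if kind == "FIXED":
            # Missing: assume the fixed value. Present: it must equal it.
            if result.setdefault(name, value) != value:
                raise ValueError(f"attribute {name} must be {value!r}")
        if kind == "DEFAULT":
            result.setdefault(name, value)
        # IMPLIED attributes may simply be absent: nothing to do.
    return result


kit_attrs = apply_attlist({"price": "29.99", "shipWeight": "1lb"}, KIT_ATTLIST)
accessory_attrs = apply_attlist({}, ACCESSORY_ATTLIST)
print(kit_attrs)
print(accessory_attrs)
```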
31
Limitations of DTDs
  • DTDs are not optimal
  • Not well-formed XML
  • can't parse them with an XML parser
  • need different tools to create them
  • but at least you can sort-of read/understand
    them (try XML Schema :-)
  • Limited support for defining data types
  • Limited modeling capabilities
  • hard to express some structures
  • no support for reusing structure

32
Enter XML Schema
  • XML Schema
  • W3C Recommendation (May 2001)
  • Divided into 2 parts: structures, datatypes
  • Main features
  • Schemas are themselves well-formed XML documents
  • A schema can span multiple documents
  • Can define new data types and constraints
  • Inheritance among content model types
  • Improves data interchange
  • Offers more precision for computer-computer
    transfer

33
Example XML Schema
<schema>
  <element name="products">
    <complexType>
      <sequence>
        <element name="kit" type="Product"
                 minOccurs="1" maxOccurs="unbounded"/>
        <element name="accessory" type="Product"
                 minOccurs="0" maxOccurs="unbounded"/>
        ...
  </element>
  <complexType name="Product">
    <attribute name="price" type="DollarType"/>
  </complexType>
  <simpleType name="DollarType">
    <pattern value="reg-exp"/>
    ...
ComplexType: Content Model
Many ways to describe new data types (not just
regular expressions)
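
The minOccurs/maxOccurs constraints in the products sequence can be illustrated with a hand-written check in Python. This is only a sketch of the occurrence idea, not an XML Schema validator: it models just the two particles visible on the slide (the slide elides the rest of the sequence), ignores types such as Product and DollarType, and the names PRODUCTS_SEQUENCE and matches_sequence are invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# (element name, minOccurs, maxOccurs); None stands for "unbounded".
PRODUCTS_SEQUENCE: List[Tuple[str, int, Optional[int]]] = [
    ("kit", 1, None),
    ("accessory", 0, None),
]


def matches_sequence(parent: ET.Element,
                     sequence: List[Tuple[str, int, Optional[int]]]) -> bool:
    """Check that the children of parent follow the sequence and its bounds."""
    tags = [child.tag for child in parent]
    pos = 0
    for name, min_occurs, max_occurs in sequence:
        count = 0
        while (pos < len(tags) and tags[pos] == name
               and (max_occurs is None or count < max_occurs)):
            pos += 1
            count += 1
        if count < min_occurs:
            return False
    return pos == len(tags)   # no unexpected children left over


ok = ET.fromstring("<products><kit/><kit/><accessory/></products>")
no_kit = ET.fromstring("<products><accessory/></products>")
interleaved = ET.fromstring("<products><kit/><accessory/><kit/></products>")

print(matches_sequence(ok, PRODUCTS_SEQUENCE))
print(matches_sequence(no_kit, PRODUCTS_SEQUENCE))
print(matches_sequence(interleaved, PRODUCTS_SEQUENCE))
```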
34
XML Schema: User-Defined Type/Class Hierarchy
Time to Leave the Trees: From Syntactic to
Conceptual Querying of XML, B. Ludäscher, I.
Altintas, A. Gupta, Intl. Workshop on XML Data
Management (XMLDM), Prague, Czech Republic, March
2002, LNCS 2490, Springer
35
XML Schema Declarations (home-style syntax)
Complex Type Declarations
36
XML Schema (home-style)
Simple Type Declarations
Complex Types
37
Programming with XML
  • The DOM (document object model)
  • Maintained by the W3C
  • Language and platform independent
  • An object model for XML (actually, an API)
  • core, views, events, style, persistence, etc.

[Figure: the Parser reads XML and generates DOM objects; the Application accesses, creates, and manipulates DOM objects and outputs XML.]
38
DOM Example
<LegoCatalog>
  <kit price="29.99">
    Take to the skies ...
  </kit>
</LegoCatalog>
NOTE: I left off the desc element and just placed
its content under kit.
  • Document Node: d.load()
  • Document Root, Element Node <LegoCatalog>:
    ln = d.documentElement
  • NodeList: lnl = ln.childNodes
  • Element Node <kit>: kn = lnl.item(0)
  • Named Node Map: knm = kn.attributes
  • Attr Node (price="29.99"): ka = knm.item(0)
  • NodeList: knl = kn.childNodes
  • Text (Char. Data) Node ("Take to the skies ..."):
    knl.item(0)
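
The same walk can be reproduced with Python's standard xml.dom.minidom, which follows the W3C DOM interfaces. A few assumptions to note: parseString replaces the slide's d.load(), the document is written without whitespace between tags so that item(0) is the kit element rather than a whitespace text node, and the variable name kt for the text node is introduced here.

```python
from xml.dom.minidom import parseString

# Document node (the slide loads a file with d.load(); here we parse a string).
d = parseString(
    '<LegoCatalog><kit price="29.99">Take to the skies ...</kit></LegoCatalog>'
)

ln = d.documentElement      # Element node <LegoCatalog>
lnl = ln.childNodes         # NodeList with the children of <LegoCatalog>
kn = lnl.item(0)            # Element node <kit>
knm = kn.attributes         # NamedNodeMap with the attributes of <kit>
ka = knm.item(0)            # Attr node: price="29.99"
knl = kn.childNodes         # NodeList with the children of <kit>
kt = knl.item(0)            # Text node holding the character data

print(ln.tagName, kn.tagName)
print(ka.name, ka.value)
print(kt.data)
```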
39
XML Query Languages
  • XPath
  • /order//books/book[@cover_style="paperback"][price<80]
  • XQuery
  • the W3C XML query language
  • XSLT
  • XML transformations (XML -> HTML, XML -> XML)
  • ...
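
One reading of the XPath example on this slide is: under order, at any depth, find books, and select those book children that are paperbacks costing less than 80. The sketch below follows that reading with Python's standard xml.etree.ElementTree, assuming cover_style is an attribute and price a child element; the sample order document is made up. ElementTree supports only a subset of XPath (no numeric comparisons), so the price test is done in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document, invented for this example.
order = ET.fromstring("""
<order>
  <shipment>
    <books>
      <book cover_style="paperback"><title>A</title><price>25</price></book>
      <book cover_style="hardcover"><title>B</title><price>60</price></book>
      <book cover_style="paperback"><title>C</title><price>95</price></book>
    </books>
  </shipment>
</order>
""")

# ".//books/book[...]" covers the //books/book[@cover_style=...] part.
paperbacks = order.findall(".//books/book[@cover_style='paperback']")

# The [price<80] predicate, evaluated in Python.
cheap = [b for b in paperbacks if float(b.findtext("price")) < 80]
titles = [b.findtext("title") for b in cheap]
print(titles)
```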

40
XPath
41
Example
42
XSLT Processing Model
43
XSLT Elements
  • <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  • root element of an XSLT stylesheet "program"
  • <xsl:template match=pattern name=qname
    priority=number mode=qname>
  • ...template...
  • </xsl:template>
  • declares a rule (pattern => template)
  • <xsl:apply-templates select=node-set-expression
    mode=qname>
  • apply templates to selected children
    (default=all)
  • optional mode attribute
  • <xsl:call-template name=qname>

44
XSLT Processing Model
  • XSL stylesheet: collection of template rules
  • template rule: (pattern => template)
  • main steps
  • match pattern against source tree
  • instantiate template (replace current node "." by
    the template in the result tree)
  • select further nodes for processing
  • control can be a mix of
  • recursive processing ("push": <xsl:apply-templates>
    ...)
  • program-driven ("pull": <xsl:for-each> ...)

45
Template Rule Example
<xsl:template match="product">
  <table>
    <xsl:apply-templates select="sales/domestic"/>
  </table>
  <table>
    <xsl:apply-templates select="sales/foreign"/>
  </table>
</xsl:template>
pattern: match="product"
template: the body of the xsl:template element
(i) match pattern: process <product> elements
(ii) instantiate template: replace each product
element with two HTML tables
(iii) select the <product> grandchildren
(sales/domestic, sales/foreign) for further processing
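
The three steps (match, instantiate, select further nodes) can be mimicked in Python to make the "push" control flow visible. This is a toy sketch, not an XSLT processor: patterns are matched by element name only, the rules for domestic and foreign are invented to have something to recurse into, and the default rule only loosely mirrors XSLT's built-in rules.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Rule = Callable[[ET.Element], str]
rules: Dict[str, Rule] = {}   # pattern (here: just a tag name) -> template


def apply_templates(nodes: List[ET.Element]) -> str:
    """Push-style processing: match each node, then instantiate its template."""
    return "".join(rules.get(node.tag, default_rule)(node) for node in nodes)


def default_rule(node: ET.Element) -> str:
    # Fallback: emit the node's leading text and recurse into all children.
    return (node.text or "").strip() + apply_templates(list(node))


def product_rule(node: ET.Element) -> str:
    # Replace the product element with two tables and select its
    # grandchildren (sales/domestic, sales/foreign) for further processing.
    return ("<table>" + apply_templates(node.findall("sales/domestic")) + "</table>"
            + "<table>" + apply_templates(node.findall("sales/foreign")) + "</table>")


def sales_row(node: ET.Element) -> str:
    # Hypothetical rule for domestic/foreign elements: one table row each.
    return "<tr><td>" + (node.text or "").strip() + "</td></tr>"


rules["product"] = product_rule
rules["domestic"] = sales_row
rules["foreign"] = sales_row

source = ET.fromstring(
    "<product><sales><domestic>10</domestic><foreign>7</foreign></sales></product>"
)
result = apply_templates([source])
print(result)
```

A "pull" version would instead loop over the selected nodes inside product_rule and build the rows there, without dispatching through the rule table.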
46
XSLT Example
47
XSLT Example (contd)
48
XSLT Example (contd)
49
Demonstrations
  • XML Queries and Transformations

50
A Commercial Tool: XML Spy
51
XQuery
52
Example
53
XQuery Example
54
An XQuery Implementation: Galax
  • http://www.galaxquery.org/

55
Example: Relational Data => XML
R
<R>
  <tuple> <A> a1 </A> <B> b1 </B> <C> c1 </C> </tuple>
  <tuple> <A> a2 </A> <B> b2 </B> <C> c2 </C> </tuple>
</R>
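
This encoding (one element for the relation, one tuple element per row, one element per column) is easy to generate. A minimal Python sketch using the standard xml.etree.ElementTree follows; the function name relation_to_xml is made up, and the output has no whitespace around the values.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Sequence


def relation_to_xml(name: str, columns: List[str],
                    rows: Iterable[Sequence[str]]) -> ET.Element:
    """Encode a relation as <name><tuple><col>value</col>...</tuple>...</name>."""
    root = ET.Element(name)
    for row in rows:
        tup = ET.SubElement(root, "tuple")
        for column, value in zip(columns, row):
            ET.SubElement(tup, column).text = str(value)
    return root


r = relation_to_xml("R", ["A", "B", "C"],
                    [("a1", "b1", "c1"), ("a2", "b2", "c2")])
xml_text = ET.tostring(r, encoding="unicode")
print(xml_text)
```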
56
XQuery References
  • XQuery: An XML query language, Don Chamberlin, IBM
    Systems Journal, 41(4), 2002.
    http://www.research.ibm.com/journal/sj/414/chamberlin.pdf
  • Galax XQuery implementation,
    http://www.galaxquery.org/