Title: Introduction to Databases: Relational and XML Models and Languages
- Instructors
- Bertram Ludaescher
- Kai Lin
Overview (Part 2)
- 09:15-10:20 Relational Databases (1h05)
- 10:20-10:30 BREAK (10 min)
- 10:30-11:50 Relational Databases (1h20)
- 11:50-13:15 LUNCH (1h25)
- 13:15-13:45 Demo / Hands-on (30 min)
- 13:45-15:10 XML Basics (1h25)
- 15:10-15:30 BREAK (20 min)
- 15:30-16:30 XML Querying (1h)
- 16:30-17:00 Demo / Hands-on (30 min)
XML and Related Standards
- An introduction to XML, DTDs, XML Schema, and the DOM
- Includes material by
  - Shawn Bowers, SDSC
  - Michael Gertz, UC Davis
A Neuroscientist's Information Integration Problem
Biomedical Informatics Research Network (BIRN): http://nbirn.net
"What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?"
Complex Multiple-Worlds Mediation
A Home Buyer's Information Integration Problem
"What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with below-average crime rate and diverse population?"
Information Integration
Multiple-Worlds Mediation
An Online Shopper's Information Integration Problem
El Cheapo: "Where can I get the cheapest copy (including shipping cost) of Wittgenstein's Tractatus Logico-Philosophicus within a week?"
One-World Mediation
Mediator (virtual DB) (vs. data warehouse)
Information Integration Challenges
- System aspects: Grid middleware
  - distributed data & computing
  - Web Services, WSDL/SOAP, OGSA, ...
  - sources: functions, files, data sets
- Syntax & Structure
  - (XML-based) data mediators
  - wrapping, restructuring
  - (XML) queries and views
  - sources: (XML) databases
- Semantics
  - Model-based/semantic mediators
  - conceptual models and declarative views
  - Knowledge representation: ontologies, description logics (RDF(S), OWL, ...)
  - sources: knowledge bases (DBs + CMs + ICs)
Information Integration Challenges: S4 Heterogeneities
- System integration
  - platforms, devices, data & service distribution, APIs, protocols, ...
  - → Grid middleware technologies
    - e.g., single sign-on, platform independence, transparent use of remote resources, ...
- Syntax & Structure
  - heterogeneous data formats (one for each tool ...)
  - heterogeneous data models (RDBs, ORDBs, OODBs, XMLDBs, flat files, ...)
  - heterogeneous schemas (one for each DB ...)
  - → Database mediation technologies
    - XML-based data exchange, integrated views, transparent query rewriting, ...
- Semantics
  - fuzzy metadata, terminology, hidden semantics, implicit assumptions, ...
  - → Knowledge representation & semantic mediation technologies
    - "smart" data discovery & integration
    - e.g., ask about X (mafic), find data about Y (diorite), be happy anyway!
Structural / XML-Based Mediation
Information Integration from a DB Perspective
- Information Integration Problem
  - Given: data sources S1, ..., Sk (DBMS, web sites, ...) and user questions Q1, ..., Qn that can be answered using the Si
  - Find: the answers to Q1, ..., Qn
- The Database Perspective: source = database
  - Si has a schema (relational, XML, OO, ...)
  - Si can be queried
  - define a virtual (or materialized) integrated/global view G over S1, ..., Sk using database query languages (SQL, XQuery, ...)
  - questions become queries Qi against G(S1, ..., Sk)
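Below is a minimal sketch of the DB perspective, using the El Cheapo question as the case. Everything in it is hypothetical: the two bookstore sources `S1` and `S2`, their schemas, and the prices are made up for illustration. The global view `G` is virtual. It is a Python generator that is recomputed from the sources on every query rather than stored, which is the mediator (not the data-warehouse) approach.

```python
from typing import Iterator, NamedTuple, Optional


class Offer(NamedTuple):
    """One row of the (hypothetical) global view G."""
    title: str
    store: str
    total: float  # price including shipping
    days: int     # delivery time in days


# Hypothetical source S1: price and shipping are stored separately.
S1 = [
    {"title": "Tractatus Logico-Philosophicus", "price": 9.95, "ship": 3.99, "days": 5},
    {"title": "Philosophical Investigations", "price": 14.00, "ship": 3.99, "days": 5},
]

# Hypothetical source S2: a different schema, (title, total_cost, delivery_days).
S2 = [
    ("Tractatus Logico-Philosophicus", 12.49, 10),
    ("Tractatus Logico-Philosophicus", 15.00, 2),
]


def G() -> Iterator[Offer]:
    """Virtual global view over S1 and S2: recomputed on every call, never stored."""
    for r in S1:
        yield Offer(r["title"], "S1", round(r["price"] + r["ship"], 2), r["days"])
    for title, total, days in S2:
        yield Offer(title, "S2", total, days)


def cheapest(title: str, max_days: int) -> Optional[Offer]:
    """A user question, posed as a query against G instead of against S1 or S2."""
    offers = [o for o in G() if o.title == title and o.days <= max_days]
    return min(offers, key=lambda o: o.total, default=None)


print(cheapest("Tractatus Logico-Philosophicus", 7))  # the S1 offer: 13.94 total, 5 days
```

The cheaper S2 copy is excluded here because it takes 10 days. Computing `list(G())` once and querying the stored list instead would turn the sketch into a (tiny) materialized warehouse.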
Standard (XML-Based) Mediator Architecture
[Figure: mediator architecture; the USER/Client poses queries Q1, Q2, Q3 (step 3) and receives answers(Q1), answers(Q2), answers(Q3) (step 4)]
Query Planning for Mediators
- Given
  - user query Q: answer(...) ← G ...
  - G ← S: global-as-view (GAV)
  - S ← G: local-as-view (LAV)
  - ic(...) ← S, G: integrity constraints (ICs)
- Find
  - an equivalent (or minimally containing, maximally contained) query plan Q': answer(...) ← S
- Results
  - a variety of results/algorithms depending on the classes of queries, views, and ICs: P, NP, ..., undecidable
  - many variants still open
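In the GAV case, finding a plan reduces to view unfolding: every atom over G in the user query is replaced by the body of each rule that defines it, which yields a union of queries over S. The code below is a minimal sketch of that step only. It assumes that Datalog-style rules are encoded as Python tuples, that uppercase terms are variables, and that view heads list distinct variables. The predicates `book`, `store1`, `store2`, and `cheap` are hypothetical. LAV requires more involved algorithms for answering queries using views and is not covered.

```python
import itertools

# An atom is (predicate, args); terms starting with an uppercase letter are
# variables, anything else is a constant.
_fresh = itertools.count()


def is_var(term):
    return term[:1].isupper()


def expand(atom, rule):
    """Replace one atom over G by the body of one GAV rule defining it.

    Assumes the rule head lists distinct variables (no constants, no repeats).
    """
    (_, head_args), body = rule
    subst = dict(zip(head_args, atom[1]))  # head variables -> query terms
    n = next(_fresh)

    def rename(t):
        if not is_var(t):
            return t
        # Head variables take the query's terms; body-only variables get a
        # fresh name so they cannot clash with the query's variables.
        return subst.get(t, f"{t}_{n}")

    return [(p, tuple(rename(t) for t in args)) for p, args in body]


def unfold(query, views):
    """Rewrite a conjunctive query over G into a union of queries over S."""
    head, body = query
    options = []
    for atom in body:
        if atom[0] in views:  # atom over the global schema: one option per rule
            options.append([expand(atom, r) for r in views[atom[0]]])
        else:                 # source or built-in atom: keep unchanged
            options.append([[atom]])
    return [(head, [a for part in combo for a in part])
            for combo in itertools.product(*options)]


def show(rule):
    (p, args), body = rule
    atoms = ", ".join(f"{q}({', '.join(a)})" for q, a in body)
    return f"{p}({', '.join(args)}) <- {atoms}"


# Hypothetical GAV mappings: the global relation book(T, P) over two sources.
views = {
    "book": [
        (("book", ("T", "P")), [("store1", ("T", "P", "Ship"))]),
        (("book", ("T", "P")), [("store2", ("T", "P"))]),
    ]
}
query = (("answer", ("T",)), [("book", ("T", "P")), ("cheap", ("P",))])
for plan in unfold(query, views):
    print(show(plan))
```

Running it prints one plan per GAV rule: `answer(T) <- store1(T, P, Ship_0), cheap(P)` and `answer(T) <- store2(T, P), cheap(P)`.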
Background
- Markup
  - Annotations (tags) for carrying information about a document's content
    - e.g., a writer's handwritten notes for typesetting
    - an editor's corrections in a manuscript
- A markup language defines a syntax and grammar for tags
Background (cont'd)
- SGML
  - Standard Generalized Markup Language
  - Standardized in 1986 (ISO)
  - A language for defining markup languages
  - and for marking up content
  - Syntax: Document Type Definition (DTD)
  - Tools aimed at document management
Background (cont'd)
- HTML
  - A markup language
  - A particular SGML document type (called an "application")
  - Tools for browsing and authoring
Background (cont'd)
- Limitations
  - SGML
    - Complex, many options and shortcuts
    - Must know the DTD to parse correctly
    - Cost of SGML technology is high
  - HTML
    - Not extensible: can't define new tags
    - Tags for presenting data, not describing it
    - Doesn't capture much document structure or content meaning
Enter XML
- XML (Extensible Markup Language)
  - Standardized by the W3C in 1998
  - For data interchange over the Web
- A simpler SGML
  - Actually, a subset of SGML
  - DTDs are optional
  - Fewer features and options
- Widely available tools for parsing, authoring, browsing, etc.
Uses for XML
- Why XML?
- Capture the logical structure of documents
  - presentation independent
- Data interchange
  - XML is implementation independent
- Storage format
  - Maier's Maxim: "Any successful interchange format becomes a storage format"
- Metadata
  - searching, filtering, organizing
- Data packaging, movement, and processing
  - client-side processing, server-to-server communication, non-browser-based clients, simplified server processing, etc.
(Some of) The Many Standards of XML
- XML Document, XML DTD
- Schema and Types: XML Schema and XML data types
- Programming: Document Object Model (DOM), an API to XML documents
- Query: XQuery, XQL, XML-QL
- Transformation: XSLT for rearranging and restructuring XML documents
- Addressing: XPath and XPointer for addressing XML subdocuments
- Transport: XML-RPC, SOAP, XML-Protocol for message and object serialization and remote procedure calls
- Metadata: RDF, using XML to define resource metadata
- Linking: XLink for simple and complex hyperlinks between XML documents
The Running Example
- Lego product catalogs
  - catalogs have a publishing date, an identifier, a title, etc.
  - catalogs are made up of products
    - either a kit or an accessory
    - each has an item #, price, name, picture, etc.
  - kits can have an age level, # of pieces, set type (duplo, basic), a theme (star wars), a system (space)
An Example XML Catalog Document
<?xml version="1.0"?>
<LegoCatalog>
  <pubDate>2000</pubDate>
  <products>
    <kit>
      <name>X-Wing Fighter</name>
      <ages>
        <minAge>7</minAge>
        <maxAge>12</maxAge>
      </ages>
      <pieces>263</pieces>
      <theme>Star Wars</theme>
      <desc>Take to the skies with Luke as he battles the forces of evil!</desc>
    </kit>
  </products>
</LegoCatalog>
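One way to see the tree structure of this catalog is to load it with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch: the document is the catalog from the slide, inlined as a string, and the printed summary format is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# The catalog document from the slide; the XML declaration must come first.
CATALOG = """<?xml version="1.0"?>
<LegoCatalog>
  <pubDate>2000</pubDate>
  <products>
    <kit>
      <name>X-Wing Fighter</name>
      <ages><minAge>7</minAge><maxAge>12</maxAge></ages>
      <pieces>263</pieces>
      <theme>Star Wars</theme>
      <desc>Take to the skies with Luke as he battles the forces of evil!</desc>
    </kit>
  </products>
</LegoCatalog>"""

root = ET.fromstring(CATALOG)  # root is the LegoCatalog element
print(root.tag, root.findtext("pubDate"))  # LegoCatalog 2000
for kit in root.iter("kit"):  # every kit element, at any depth
    name = kit.findtext("name")
    min_age = int(kit.findtext("ages/minAge"))  # element text is always a string
    max_age = int(kit.findtext("ages/maxAge"))
    pieces = int(kit.findtext("pieces"))
    print(f"{name}: ages {min_age}-{max_age}, {pieces} pieces")
```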
An Example XML Document
(The catalog document above, annotated.)
- The prolog (<?xml version="1.0"?>) is followed by the body (the LegoCatalog element and everything inside it)
- Elements have start and end tags
- Elements are nested: boxes within boxes
- Elements can also contain (text) content
Well-Formed Documents
- Well-formed XML documents have
  - A single root element
  - Start and end tags (required, unlike HTML)
    - <name>X-Wing Fighter</name>
    - empty-element tags: <theme/>
  - Properly nested elements
    - <kit><pieces>263</kit></pieces>   This is NOT properly nested!
  - More rules
    - naming elements, document has at least one element, etc.
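An XML processor must report violations of well-formedness as fatal errors, so a quick check is simply to try parsing the text. This is a minimal sketch with Python's `xml.etree.ElementTree`, whose `fromstring` raises `ParseError` on the violations listed above. `is_well_formed` is a helper introduced here for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the text parses as an XML document (ElementTree raises otherwise)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<name>X-Wing Fighter</name>"))      # True
print(is_well_formed("<theme/>"))                         # True: empty-element tag
print(is_well_formed("<kit><pieces>263</kit></pieces>"))  # False: improper nesting
print(is_well_formed("<name>X-Wing Fighter"))             # False: missing end tag
print(is_well_formed("<name>A</name><name>B</name>"))     # False: two root elements
```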
XML Attributes
- Elements can contain attributes
  - <kit unitId="7140" price="29.99" shipWeight="1lb">
  - here kit is the element name; unitId, price, and shipWeight are attribute names; "7140", "29.99", and "1lb" are the attribute values
Attributes are always assigned in element start tags, are always quoted (with double or single quotes), and must be unique within the element.
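This minimal sketch reads attributes with Python's `xml.etree.ElementTree` (`parses` is a hypothetical helper). Attribute values always come back as strings. The parser also enforces the rules above: values must be quoted, single quotes are accepted as well as double quotes, and names must be unique.

```python
import xml.etree.ElementTree as ET

kit = ET.fromstring('<kit unitId="7140" price="29.99" shipWeight="1lb"/>')
print(kit.tag, kit.get("unitId"), kit.get("price"), kit.get("shipWeight"))
price = float(kit.get("price"))  # attribute values are strings; convert explicitly
print(kit.get("avail"))          # None: the attribute is absent


def parses(text):
    """Hypothetical helper: True if ElementTree accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(parses("<kit price='29.99'/>"))                # True: single quotes are allowed
print(parses('<kit price="29.99" price="19.99"/>'))  # False: duplicate attribute
print(parses("<kit price=29.99/>"))                  # False: unquoted value
```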
Attributes vs. Content
- In general, it is up to the document designer
- In SGML, content was usually for data you see, and attributes for metadata
- How I do it:
  - Attribute: atomic content that applies to the whole element
  - Content (subelement): otherwise
Document Type Definition
- Why DTDs?
  - To standardize tags and structure for interchange and creation
  - To make documents machine-processable
- What is a DTD?
  - A grammar for describing XML documents (tags, attributes, nesting, etc.)
  - An XML document that is well-formed and conforms to a DTD is said to be valid
An Example DTD: Elements
An element content model for LegoCatalog:
<!ELEMENT LegoCatalog (pubDate, products)>
<!ELEMENT pubDate (#PCDATA)>
<!ELEMENT products (kit | accessory)*>
<!ELEMENT kit (name, ages, pieces, theme?, series?, desc)>
<!ELEMENT ages (minAge, maxAge)>
<!ELEMENT minAge (#PCDATA)>
<!ELEMENT maxAge (#PCDATA)>
<!ELEMENT pieces (#PCDATA)>
<!ELEMENT series (#PCDATA)>
<!ELEMENT desc (#PCDATA)>
A character data content model for pubDate: (#PCDATA)
- * zero or more, + one or more, ? optional
- | choice, "," strict sequence, ( ) grouping
- Also: EMPTY, ANY, and mixed content models
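A DTD content model is a regular expression over the names of an element's children. To see what validation checks, a model can be transcribed by hand into a Python regex. This is a minimal sketch, not a DTD validator: `KIT_MODEL`, `PRODUCTS_MODEL`, and `conforms` are hypothetical names. Only the order of one element's children is checked, and the children themselves are not validated.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the DTD, transcribed by hand into regular expressions
# over child element names; every name is followed by a space so that
# adjacent names cannot run together.
KIT_MODEL = re.compile(r"name ages pieces (theme )?(series )?desc ")  # (name, ages, pieces, theme?, series?, desc)
PRODUCTS_MODEL = re.compile(r"((kit|accessory) )*")                   # (kit | accessory)*


def conforms(element, model):
    """True if the element's children, in document order, match the model."""
    children = "".join(child.tag + " " for child in element)
    return model.fullmatch(children) is not None


ok = ET.fromstring("<kit><name/><ages/><pieces/><theme/><desc/></kit>")
swapped = ET.fromstring("<kit><name/><ages/><pieces/><desc/><theme/></kit>")
missing = ET.fromstring("<kit><name/><ages/><pieces/></kit>")
products = ET.fromstring("<products><kit/><accessory/><kit/></products>")
print(conforms(ok, KIT_MODEL), conforms(swapped, KIT_MODEL), conforms(missing, KIT_MODEL))
print(conforms(products, PRODUCTS_MODEL))
```

The first line prints `True False False`: `theme` may not follow `desc`, and `desc` is required. The second prints `True`, since `(kit | accessory)*` accepts any mix in any order.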
An Example DTD: Attributes
<!ATTLIST kit
  price       CDATA     #REQUIRED
  shipWeight  CDATA     #REQUIRED
  avail       (yes|no)  #IMPLIED
  image       CDATA     "na.jpg"
  unitId      ID        #IMPLIED>
<!ATTLIST accessory
  forKits     IDREFS    #IMPLIED
  orderStatus CDATA     #FIXED "special">
Each attribute has the form: attr-name type default-decl
- Types: CDATA = character data; ID = unique identifier; IDREF = reference to an ID; IDREFS = list of references; enumeration = list of possible values
- Default declarations:
  - #REQUIRED: must appear
  - #IMPLIED: may optionally appear
  - #FIXED "value": if present, must have exactly this value; if missing, the parser assumes it
  - "value" (plain default): if the attribute is missing, the default is assumed; otherwise any value is allowed
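A validating parser also checks the ID/IDREF(S) constraints and fills in declared defaults. The sketch below hand-codes both for the attributes declared above, using only `xml.etree.ElementTree`. `id_problems` and `apply_defaults` are hypothetical helpers, and the document is made up. Note that an ID value must be an XML name, so it cannot start with a digit; that is why the sketch uses ids such as `k7140`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical document; ID values must be XML names, hence the "k" prefix.
DOC = """<products>
  <kit unitId="k7140" price="29.99" shipWeight="1lb"/>
  <kit unitId="k7141" price="19.99" shipWeight="1lb" image="k7141.jpg"/>
  <accessory forKits="k7140 k7141 k9999"/>
</products>"""


def id_problems(root):
    """Report duplicate ID values and IDREFS entries that match no ID."""
    ids = Counter(e.get("unitId") for e in root.iter() if "unitId" in e.attrib)
    problems = [f"duplicate ID {i}" for i, n in ids.items() if n > 1]
    for acc in root.iter("accessory"):
        for ref in acc.get("forKits", "").split():  # IDREFS: whitespace-separated
            if ref not in ids:
                problems.append(f"dangling IDREF {ref}")
    return problems


def apply_defaults(root):
    """Add the declared default values for attributes that are missing."""
    for kit in root.iter("kit"):
        kit.attrib.setdefault("image", "na.jpg")         # image CDATA "na.jpg"
    for acc in root.iter("accessory"):
        acc.attrib.setdefault("orderStatus", "special")  # #FIXED "special"


root = ET.fromstring(DOC)
print(id_problems(root))  # ['dangling IDREF k9999']
apply_defaults(root)
print([k.get("image") for k in root.iter("kit")])  # ['na.jpg', 'k7141.jpg']
```

A complete check would also reject an `orderStatus` value other than `"special"`, because that attribute is `#FIXED`.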
Limitations of DTDs
- DTDs are not optimal
  - Not well-formed XML
    - can't parse them with an XML parser
    - need different tools to create them
    - but at least you can sort of read/understand them (try that with XML Schema ;-)
  - Limited support for defining data types
  - Limited modeling capabilities
    - hard to express some structures
    - no support for reusing structure
Enter XML Schema
- XML Schema
  - W3C Recommendation (2001)
  - Divided into two parts: structures and datatypes
- Main features
  - Schemas are themselves well-formed XML documents
  - A schema can span multiple documents
  - Can define new data types and constraints
  - Inheritance among content model types
- Improves data interchange
  - Offers more precision for computer-to-computer transfer
Example XML Schema
<schema>
  <element name="products">
    <complexType>
      <sequence>
        <element name="kit" type="Product" minOccurs="1" maxOccurs="unbounded"/>
        <element name="accessory" type="Product" minOccurs="0" maxOccurs="unbounded"/>
        ...
  </element>
  <complexType name="Product">
    <attribute name="price" type="DollarType"/>
  </complexType>
  <simpleType name="DollarType">
    <pattern value="reg-exp"/>
    ...
- complexType: a content model
- Many ways to describe new data types (not just regular expressions)
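The point of `DollarType` is that a schema can restrict a built-in type with a pattern facet, something a DTD (`CDATA`) cannot express. The plain-Python sketch below shows what such a check amounts to. The slide elides the actual regular expression, so the two-decimal pattern here is a hypothetical stand-in. `schema_errors` is likewise hypothetical: it checks only the kit `minOccurs` and the price facet, not element order or other constraints. XML Schema patterns must match the entire value, hence `fullmatch`.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the elided DollarType pattern:
# digits, a dot, and exactly two decimal places.
DOLLAR_PATTERN = re.compile(r"[0-9]+\.[0-9]{2}")


def schema_errors(products):
    """Check minOccurs="1" for kit and the DollarType facet on price attributes."""
    errors = []
    if len(products.findall("kit")) < 1:
        errors.append("products: needs at least one kit (minOccurs=1)")
    for item in products:
        price = item.get("price")
        if price is not None and not DOLLAR_PATTERN.fullmatch(price):
            errors.append(f"{item.tag}: price {price!r} is not a DollarType")
    return errors


good = ET.fromstring('<products><kit price="29.99"/><accessory price="4.50"/></products>')
bad = ET.fromstring('<products><accessory price="$4.5"/></products>')
print(schema_errors(good))  # []
print(schema_errors(bad))
```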
XML Schema: User-Defined Type/Class Hierarchy
(From: B. Ludäscher, I. Altintas, A. Gupta, "Time to Leave the Trees: From Syntactic to Conceptual Querying of XML", Intl. Workshop on XML Data Management (XMLDM), Prague, Czech Republic, March 2002, LNCS 2490, Springer)
XML Schema Declarations (home-style syntax)
Complex Type Declarations
XML Schema (home-style)
Simple Type Declarations
Complex Types
Programming with XML
- The DOM (Document Object Model)
  - Maintained by the W3C
  - Language- and platform-independent
  - An object model for XML (actually, an API)
  - Modules: core, views, events, style, persistence, etc.
[Figure: the Parser reads XML and generates DOM objects; the Application accesses, creates, and manipulates DOM objects and produces output]
DOM Example
<LegoCatalog><kit price="29.99">Take to the skies ...</kit></LegoCatalog>
- d.load(...) → Document node (the document root)
- ln = d.documentElement → Element node <LegoCatalog>
- lnl = ln.childNodes → NodeList
- kn = lnl.item(0) → Element node <kit>
- knm = kn.attributes → NamedNodeMap
- ka = knm.item(0) → Attr node price="29.99"
- knl = kn.childNodes → NodeList
- knl.item(0) → Text (character data) node: "Take to the skies ..."
NOTE: I left off the desc element and just placed its content under kit.
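The same walk through the tree can be done with Python's `xml.dom.minidom`, a minimal DOM implementation in the standard library. Loading differs from the slide's `d.load()`, because minidom uses `parseString` or `parse`. The rest (`documentElement`, `childNodes`, `attributes`, `item()`) follows the DOM interfaces. The variable names mirror the slide; `kt` is added here for the text node.

```python
from xml.dom import Node, minidom

d = minidom.parseString(
    '<LegoCatalog><kit price="29.99">Take to the skies ...</kit></LegoCatalog>')
ln = d.documentElement   # Element node <LegoCatalog>
lnl = ln.childNodes      # NodeList of LegoCatalog's children
kn = lnl.item(0)         # Element node <kit>
knm = kn.attributes      # NamedNodeMap of kit's attributes
ka = knm.item(0)         # Attr node price="29.99"
knl = kn.childNodes      # NodeList of kit's children
kt = knl.item(0)         # Text node holding the character data

print(d.nodeType == Node.DOCUMENT_NODE)  # True
print(kn.tagName, ka.name, ka.value)     # kit price 29.99
print(kt.nodeType == Node.TEXT_NODE, kt.data)
```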
XML Query Languages
- XPath
  - /order//books/book[@cover_style='paperback'][@price<80]
- XQuery
  - the W3C XML query language
- XSLT
  - XML transformations (XML → HTML, XML → XML)
- ...
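Python's `xml.etree.ElementTree` implements only a subset of XPath in `findall`: paths, `//`, and predicates such as `[tag='text']` or `[@attr='value']`. Numeric comparisons such as `price<80` are not included, so those predicates have to be evaluated in Python (or with a full XPath engine). This is a minimal sketch on the Lego catalog. The second kit, "Space Shuttle", is hypothetical and added for contrast.

```python
import xml.etree.ElementTree as ET

# The slide's kit plus a second, hypothetical kit for contrast.
root = ET.fromstring(
    "<LegoCatalog><products>"
    "<kit><name>X-Wing Fighter</name><pieces>263</pieces><theme>Star Wars</theme></kit>"
    "<kit><name>Space Shuttle</name><pieces>120</pieces><theme>Space</theme></kit>"
    "</products></LegoCatalog>")

# XPath //kit[theme='Star Wars']/name: within ElementTree's supported subset.
star_wars = [n.text for n in root.findall(".//kit[theme='Star Wars']/name")]

# XPath //kit[pieces > 200]/name: the numeric comparison is not supported,
# so the predicate is evaluated in Python instead.
big = [k.findtext("name") for k in root.iter("kit") if int(k.findtext("pieces")) > 200]

print(star_wars, big)  # ['X-Wing Fighter'] ['X-Wing Fighter']
```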
XPath
Example
XSLT Processing Model
XSLT Elements
- <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  - root element of an XSLT stylesheet "program"
- <xsl:template match="pattern" name="qname" priority="number" mode="qname">
    ...template...
  </xsl:template>
  - declares a rule (pattern → template)
- <xsl:apply-templates select="node-set-expression" mode="qname"/>
  - apply templates to the selected children (default: all)
  - optional mode attribute
- <xsl:call-template name="qname"/>
XSLT Processing Model
- XSL stylesheet: a collection of template rules
- template rule: (pattern → template)
- main steps
  - match the pattern against the source tree
  - instantiate the template (replace the current node "." by the template in the result tree)
  - select further nodes for processing
- control can be a mix of
  - recursive processing ("push": <xsl:apply-templates> ...)
  - program-driven processing ("pull": <xsl:for-each> ...)
Template Rule Example
<xsl:template match="product">
  <table>
    <xsl:apply-templates select="sales/domestic"/>
  </table>
  <table>
    <xsl:apply-templates select="sales/foreign"/>
  </table>
</xsl:template>
- pattern: match="product"; template: the content of xsl:template
(i) match pattern: process <product> elements
(ii) instantiate template: replace each product element with two HTML tables
(iii) select the <product> grandchildren (sales/domestic, sales/foreign) for further processing
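The three steps of the push model can be mimicked in a few lines of Python to make them concrete. This is a toy, not an XSLT processor. Patterns are reduced to element names, and there are no priorities or modes. The source data is hypothetical, and `apply_templates`, `process`, and `product_rule` are names made up for the sketch. Elements without a matching rule fall back to a built-in rule in the spirit of XSLT's: process the children and copy text through.

```python
import xml.etree.ElementTree as ET


def apply_templates(nodes, rules):
    """Like <xsl:apply-templates>: process each selected node, concatenate results."""
    return "".join(process(node, rules) for node in nodes)


def process(node, rules):
    rule = rules.get(node.tag)  # (i) match: patterns are just element names here
    if rule is not None:
        return rule(node, rules)  # (ii) instantiate the matching template
    # Built-in rule: process the children and copy text content through.
    out = node.text or ""
    for child in node:
        out += process(child, rules) + (child.tail or "")
    return out


def product_rule(node, rules):
    """Template for match="product": two tables, one per sales region."""
    # (iii) select the grandchildren for further processing
    domestic = apply_templates(node.findall("sales/domestic"), rules)
    foreign = apply_templates(node.findall("sales/foreign"), rules)
    return f"<table>{domestic}</table><table>{foreign}</table>"


RULES = {"product": product_rule}

# Hypothetical source tree; the figures are made up for illustration.
src = ET.fromstring(
    "<products><product><sales><domestic>100</domestic>"
    "<foreign>40</foreign></sales></product></products>")
print(apply_templates([src], RULES))  # <table>100</table><table>40</table>
```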
XSLT Example
XSLT Example (cont'd)
XSLT Example (cont'd)
Demonstrations
- XML Queries and Transformations
A Commercial Tool: XML Spy
XQuery
Example
XQuery Example
An XQuery Implementation: Galax
- http://www.galaxquery.org/
Example: Relational Data → XML
Relation R(A, B, C) with tuples (a1, b1, c1) and (a2, b2, c2), encoded as XML:
<R>
  <tuple><A>a1</A><B>b1</B><C>c1</C></tuple>
  <tuple><A>a2</A><B>b2</B><C>c2</C></tuple>
</R>
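This mapping is easy to produce programmatically: one element per relation, one `tuple` element per row, and one child element per attribute. The code is a minimal sketch with `xml.etree.ElementTree`. `relation_to_xml` is a hypothetical helper. It assumes the attribute names are valid XML element names and converts values with `str()`; ElementTree escapes characters such as `&` and `<` in the text.

```python
import xml.etree.ElementTree as ET


def relation_to_xml(name, attributes, rows):
    """Encode a relation as <name><tuple><attr>value</attr>...</tuple>...</name>."""
    relation = ET.Element(name)
    for row in rows:
        tup = ET.SubElement(relation, "tuple")
        for attr, value in zip(attributes, row):
            ET.SubElement(tup, attr).text = str(value)
    return ET.tostring(relation, encoding="unicode")


doc = relation_to_xml("R", ["A", "B", "C"], [("a1", "b1", "c1"), ("a2", "b2", "c2")])
print(doc)
# <R><tuple><A>a1</A><B>b1</B><C>c1</C></tuple><tuple><A>a2</A><B>b2</B><C>c2</C></tuple></R>
```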
XQuery References
- Don Chamberlin, "XQuery: An XML query language", IBM Systems Journal, 41(4), 2002. http://www.research.ibm.com/journal/sj/414/chamberlin.pdf
- Galax XQuery implementation, http://www.galaxquery.org/