Title: Agenda from now on
1Agenda from now on
- Done: SQL, views, transactions, conceptual modeling, E/R, relational algebra.
- Starting: XML
- To do: the database engine
  - Storage
  - Query execution
  - Query optimization
2XML
3XML
- eXtensible Markup Language
- XML 1.0: a W3C Recommendation, 1998
- Roots: SGML (a very nasty language).
- After the roots: a format for sharing data
4Why XML is of Interest to Us
- XML is just syntax for data
- Note: we have no syntax for relational data
- But XML is not relational: it is semistructured
- This is exciting because:
  - Can translate any data to XML
  - Can ship XML over the Web (HTTP)
  - Can input XML into any application
  - Thus: data sharing and exchange on the Web
5XML Data Sharing and Exchange
[Figure: applications and data sources (relational data, object-relational, legacy data) share and exchange XML data over the Web (HTTP).]
Specific data management tasks: Integrate, Transform, Warehouse
6From HTML to XML
HTML describes the presentation
7HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
    Abiteboul, Hull, Vianu
    <br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
    Abiteboul, Buneman, Suciu
    <br> Morgan Kaufmann, 1999
8XML
<bibliography>
  <book> <title> Foundations </title>
         <author> Abiteboul </author>
         <author> Hull </author>
         <author> Vianu </author>
         <publisher> Addison Wesley </publisher>
         <year> 1995 </year>
  </book>
  ...
</bibliography>
XML describes the content
9Web Services
- A new paradigm for creating distributed applications?
- Systems communicate via messages, contracts.
- Example: order processing system.
- MS .NET, J2EE: some of the platforms
- XML: a part of the story, the data format.
10XML Terminology
- tags: book, title, author, ...
- start tag: <book>, end tag: </book>
- elements: <book>...</book>, <author>...</author>
- elements are nested
- empty element: <red></red>, abbreviated <red/>
- an XML document: single root element
- well-formed XML document: one whose tags all match
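The terminology above can be tried out with a small well-formedness check. This is a minimal sketch that assumes Python's standard `xml.etree.ElementTree` parser; the helper name `is_well_formed` is made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML: matching tags and a single root element."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<book><title>Foundations</title><red/></book>"))  # True
print(is_well_formed("<book><title>Foundations</book></title>"))        # False: tags do not match
print(is_well_formed("<book/><book/>"))                                 # False: two root elements
```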
11More XML: Attributes
<book price = "55" currency = "USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  ...
  <year> 1995 </year>
</book>
attributes are alternative ways to represent data
12More XML: Oids and References
<person id="o555"> <name> Jane </name> </person>
<person id="o456"> <name> Mary </name>
                   <children idref="o123 o555"/>
</person>
<person id="o123" mother="o456"><name>John</name>
</person>
oids and references in XML are just syntax
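Because oids and references are just syntax, an application has to resolve them itself. The sketch below shows one way to do that with Python's standard `xml.etree.ElementTree`. The `<people>` wrapper element and the helper `name_of` are assumptions added for illustration, since a document needs a single root.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>
""")

# Index every person by its id attribute; this parser does not do it for us.
by_id = {p.get("id"): p for p in doc.iter("person")}

def name_of(oid: str) -> str:
    """Follow a reference to a person element and return its name."""
    return by_id[oid].findtext("name")

mary = by_id["o456"]
children = [name_of(oid) for oid in mary.find("children").get("idref").split()]
mother_of_john = name_of(by_id["o123"].get("mother"))
print(children)        # ['John', 'Jane']
print(mother_of_john)  # Mary
```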
13XML Semantics: a Tree !
<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street>
      <no> 345 </no>
      <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
  </person>
</data>
The corresponding tree:
data
  person
    id: o555
    name: Mary
    address
      street: Maple
      no: 345
      city: Seattle
  person
    name: John
    address: Thailand
    phone: 23456
Order matters !!!
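The tree view, and the fact that child order is preserved, can be seen by walking a parsed document. This is a small sketch that assumes Python's standard `xml.etree.ElementTree`; the function `outline` is a made-up helper that prints elements only, not attributes.

```python
import xml.etree.ElementTree as ET

data = ET.fromstring("""
<data>
  <person id="o555">
    <name>Mary</name>
    <address><street>Maple</street><no>345</no><city>Seattle</city></address>
  </person>
  <person>
    <name>John</name><address>Thailand</address><phone>23456</phone>
  </person>
</data>
""")

def outline(node, depth=0):
    """Return one line per element, indented by depth, in document order."""
    text = (node.text or "").strip()
    lines = ["  " * depth + node.tag + (": " + text if text else "")]
    for child in node:  # children are yielded in document order
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(data)))
first, second = list(data)
print([child.tag for child in second])  # ['name', 'address', 'phone']
```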
14XML Data
- XML is self-describing
- Schema elements become part of the data
  - Relational schema: persons(name,phone)
  - In XML, <persons>, <name>, <phone> are part of the data, and are repeated many times
- Consequence: XML is much more flexible
- XML = semistructured data
15Relational Data as XML
Relation person:
  name   phone
  John   3634
  Sue    6343
  Dick   6363
XML tree: a person root with three row children; each row has a name child and a phone child.
<person>
  <row> <name>John</name>
        <phone> 3634</phone></row>
  <row> <name>Sue</name>
        <phone> 6343</phone></row>
  <row> <name>Dick</name>
        <phone> 6363</phone></row>
</person>
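The encoding above is mechanical: one element for the relation, one `row` element per tuple, and one child per column. Here is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the function name `relation_to_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET

rows = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]

def relation_to_xml(table, columns, rows):
    """Encode a relation as <table><row><col>value</col>...</row>...</table>."""
    root = ET.Element(table)
    for values in rows:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, values):
            ET.SubElement(row, column).text = value
    return root

person = relation_to_xml("person", ["name", "phone"], rows)
print(ET.tostring(person, encoding="unicode"))
```

Note that the column names are repeated in every row, which is the self-describing property mentioned earlier.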
16XML is Semi-structured Data
- Missing attributes:
- Could represent in a table with nulls
<person> <name> John</name>
         <phone>1234</phone> </person>
<person> <name>Joe</name> </person>   ← no phone !
17XML is Semi-structured Data
- Repeated attributes
- Impossible in tables:
<person> <name> Mary</name>
         <phone>2345</phone>
         <phone>3456</phone> </person>   ← two phones !
A table row for Mary has only one phone column: ???
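To see how missing and repeated attributes interact with tables, the sketch below flattens such persons into (name, phone) rows. It assumes Python's standard `xml.etree.ElementTree` and a made-up `<people>` wrapper element. Emitting one row per phone is just one possible workaround, not something the slides prescribe.

```python
import xml.etree.ElementTree as ET

people = ET.fromstring("""
<people>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
  <person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>
</people>
""")

rows = []
for p in people.iter("person"):
    phones = [ph.text for ph in p.findall("phone")]
    # A missing phone becomes a null; repeated phones force one row per phone.
    for phone in phones or [None]:
        rows.append((p.findtext("name"), phone))

print(rows)
# [('John', '1234'), ('Joe', None), ('Mary', '2345'), ('Mary', '3456')]
```

Mary's name is now duplicated across two rows, which is the redundancy that a single flat table cannot avoid.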
18XML is Semi-structured Data
- Attributes with different types in different objects:
<person> <name> <first> John </first>
                <last> Smith </last>
         </name>
         <phone>1234</phone> </person>   ← structured name !
- Nested collections (no 1NF)
- Heterogeneous collections:
  - <db> contains both <book>s and <publisher>s
19Document Type Definitions: DTD
- part of the original XML specification
- an XML document may have a DTD
- XML document:
  - well-formed = if tags are correctly closed
  - valid = if it has a DTD and conforms to it
- validation is useful in data exchange
20Very Simple DTD
<!DOCTYPE company [
  <!ELEMENT company ((person|product)*)>
  <!ELEMENT person (ssn, name, office, phone?)>
  <!ELEMENT ssn (#PCDATA)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT office (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT product (pid, name, description?)>
  <!ELEMENT pid (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
]>
21Very Simple DTD
Example of valid XML document
<company>
  <person> <ssn> 123456789 </ssn>
           <name> John </name>
           <office> B432 </office>
           <phone> 1234 </phone> </person>
  <person> <ssn> 987654321 </ssn>
           <name> Jim </name>
           <office> B123 </office> </person>
  <product> ... </product>
  ...
</company>
22DTD: The Content Model
<!ELEMENT tag (CONTENT)>
(CONTENT is the content model)
- Content model:
  - Complex = a regular expression over other elements
  - Text-only = #PCDATA
  - Empty = EMPTY
  - Any = ANY
  - Mixed content = (#PCDATA | A | B | C)*
23DTD: Regular Expressions
sequence
  DTD: <!ELEMENT name (firstName, lastName)>
  XML: <name> <firstName> . . . . . </firstName>
              <lastName> . . . . . </lastName> </name>
optional
  DTD: <!ELEMENT name (firstName?, lastName)>
Kleene star
  DTD: <!ELEMENT person (name, phone*)>
  XML: <person> <name> . . . . . </name>
                <phone> . . . . . </phone>
                <phone> . . . . . </phone>
                <phone> . . . . . </phone>
                . . . . . .
       </person>
alternation
  DTD: <!ELEMENT person (name, (phone|email))>
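Since a content model is a regular expression over child element names, validity can be approximated with an ordinary regex engine. The sketch below hand-translates the company DTD from the earlier slide into Python `re` patterns. The dictionary `CONTENT_MODELS`, the set `TEXT_ONLY`, and the function `is_valid` are illustrative assumptions, not a real DTD validator (for example, it does not check for stray text inside complex elements).

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models: each child tag is written as "tag,",
# so a content model becomes a plain regular expression over that string.
CONTENT_MODELS = {
    "company": r"((person,)|(product,))*",
    "person": r"ssn,name,office,(phone,)?",
    "product": r"pid,name,(description,)?",
}
TEXT_ONLY = {"ssn", "name", "office", "phone", "pid", "description"}  # #PCDATA

def is_valid(elem):
    """Check elem and its descendants against the hand-written content models."""
    children = list(elem)
    if elem.tag in TEXT_ONLY:
        return not children
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return False  # element not declared in the DTD
    sequence = "".join(child.tag + "," for child in children)
    return re.fullmatch(model, sequence) is not None and all(map(is_valid, children))

good = ET.fromstring("<company><person><ssn>123456789</ssn><name>John</name>"
                     "<office>B432</office><phone>1234</phone></person></company>")
bad = ET.fromstring("<company><person><name>Jim</name><ssn>987654321</ssn>"
                    "<office>B123</office></person></company>")
print(is_valid(good), is_valid(bad))  # True False
```

The second document is well-formed but not valid: its person lists name before ssn, which the sequence `(ssn, name, office, phone?)` does not allow.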
24Querying XML Data
- XPath = simple navigation through the tree
- XQuery = the SQL of XML
- XSLT = recursive traversal
  - will not discuss in class
25Sample Data for Queries
<bib>
  <book> <publisher> Addison-Wesley </publisher>
         <author> Serge Abiteboul </author>
         <author> <first-name> Rick </first-name>
                  <last-name> Hull </last-name>
         </author>
         <author> Victor Vianu </author>
         <title> Foundations of Databases </title>
         <year> 1995 </year>
  </book>
  <book price="55">
         <publisher> Freeman </publisher>
         <author> Jeffrey D. Ullman </author>
         <title> Principles of Database and Knowledge Base Systems </title>
         <year> 1998 </year>
  </book>
</bib>
26Data Model for XPath
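[Figure: the XPath data model is a tree. The root has a single child, the root element (bib in the sample data); below it are the book elements, each with publisher, author, . . . children whose leaves are text values such as Addison-Wesley and Serge Abiteboul.]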
27XPath: Simple Expressions
/bib/book/year
- Result: <year> 1995 </year>
          <year> 1998 </year>
/bib/paper/year
- Result: empty (there were no papers)
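These two paths can be tried with the limited XPath subset in Python's standard `xml.etree.ElementTree`. This is a sketch under one assumption worth stating: ElementTree evaluates paths relative to an element, so `/bib/book/year` is written as `./book/year` starting from the root element `bib`. The data is the sample bibliography, with whitespace removed.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><publisher>Addison-Wesley</publisher><author>Serge Abiteboul</author>"
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
    "<author>Victor Vianu</author><title>Foundations of Databases</title>"
    "<year>1995</year></book>"
    "<book price='55'><publisher>Freeman</publisher><author>Jeffrey D. Ullman</author>"
    "<title>Principles of Database and Knowledge Base Systems</title>"
    "<year>1998</year></book>"
    "</bib>"
)

# /bib/book/year, rewritten relative to the root element <bib>
years = [y.text for y in bib.findall("./book/year")]
# /bib/paper/year: no <paper> elements exist, so nothing matches
papers = bib.findall("./paper/year")
print(years)   # ['1995', '1998']
print(papers)  # []
```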
28XPath: Restricted Kleene Closure
//author
- Result: <author> Serge Abiteboul </author>
          <author> <first-name> Rick </first-name>
                   <last-name> Hull </last-name>
          </author>
          <author> Victor Vianu </author>
          <author> Jeffrey D. Ullman </author>
/bib//first-name
- Result: <first-name> Rick </first-name>
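The descendant step `//` is also available in the ElementTree subset, written `.//` when starting from an element. This is a minimal sketch over the sample bibliography (whitespace removed), assuming Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><publisher>Addison-Wesley</publisher><author>Serge Abiteboul</author>"
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
    "<author>Victor Vianu</author><title>Foundations of Databases</title>"
    "<year>1995</year></book>"
    "<book price='55'><publisher>Freeman</publisher><author>Jeffrey D. Ullman</author>"
    "<title>Principles of Database and Knowledge Base Systems</title>"
    "<year>1998</year></book>"
    "</bib>"
)

authors = bib.findall(".//author")                             # //author
first_names = [e.text for e in bib.findall(".//first-name")]   # /bib//first-name
print(len(authors))   # 4
print(first_names)    # ['Rick']
```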
29XPath: Text Nodes
/bib/book/author/text()
- Result: Serge Abiteboul
          Victor Vianu
          Jeffrey D. Ullman
- Rick Hull doesn't appear because he has first-name, last-name
- Functions in XPath:
  - text() = matches the text value
  - node() = matches any node (= * or @* or text())
  - name() = returns the name of the current tag
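ElementTree has no `text()` or `name()` steps, but the same information is exposed as the `.text` and `.tag` properties of each element. The sketch below mimics `/bib/book/author/text()` that way, assuming Python's standard `xml.etree.ElementTree` and the sample bibliography with whitespace removed.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><publisher>Addison-Wesley</publisher><author>Serge Abiteboul</author>"
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
    "<author>Victor Vianu</author><title>Foundations of Databases</title>"
    "<year>1995</year></book>"
    "<book price='55'><publisher>Freeman</publisher><author>Jeffrey D. Ullman</author>"
    "<title>Principles of Database and Knowledge Base Systems</title>"
    "<year>1998</year></book>"
    "</bib>"
)

# Like text(): keep only authors that directly contain text.
texts = [a.text for a in bib.findall("./book/author") if a.text and a.text.strip()]
# Like name(): the tag of each child of the first book.
names = [child.tag for child in bib.find("./book")]
print(texts)  # ['Serge Abiteboul', 'Victor Vianu', 'Jeffrey D. Ullman']
print(names)  # ['publisher', 'author', 'author', 'author', 'title', 'year']
```

Rick Hull is skipped because his author element holds child elements rather than text.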
30XPath: Wildcard
//author/*
- Result: <first-name> Rick </first-name>
          <last-name> Hull </last-name>
- * matches any element
31XPath: Attribute Nodes
/bib/book/@price
- Result: "55"
- @price means that price has to be an attribute
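The wildcard and attribute steps can be approximated in ElementTree as follows. This is a sketch assuming Python's standard `xml.etree.ElementTree`: `*` is supported directly, while an attribute value is read with `.get()` after selecting the elements that carry the attribute with `[@price]`.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><publisher>Addison-Wesley</publisher><author>Serge Abiteboul</author>"
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
    "<author>Victor Vianu</author><title>Foundations of Databases</title>"
    "<year>1995</year></book>"
    "<book price='55'><publisher>Freeman</publisher><author>Jeffrey D. Ullman</author>"
    "<title>Principles of Database and Knowledge Base Systems</title>"
    "<year>1998</year></book>"
    "</bib>"
)

# //author/* : every element child of any author
wild = [e.tag for e in bib.findall(".//author/*")]
# /bib/book/@price : price values of the books that have that attribute
prices = [b.get("price") for b in bib.findall("./book[@price]")]
print(wild)    # ['first-name', 'last-name']
print(prices)  # ['55']
```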
32XPath: Predicates
/bib/book/author[first-name]
- Result: <author> <first-name> Rick </first-name>
                   <last-name> Hull </last-name>
          </author>
33XPath: More Predicates
/bib/book/author[firstname][address[//zip][city]]/lastname
- Result: <lastname> ... </lastname>
          <lastname> ... </lastname>
          ...
34XPath: More Predicates
/bib/book[@price < 60]
/bib/book[author/@age < 25]
/bib/book[author/text()]
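Existence predicates such as `author[first-name]` work in ElementTree's XPath subset, but comparisons such as `@price < 60` do not, so the sketch below applies those filters in plain Python instead. It assumes Python's standard `xml.etree.ElementTree` and the sample bibliography with whitespace removed. The age predicate is omitted because the sample data has no age attribute.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><publisher>Addison-Wesley</publisher><author>Serge Abiteboul</author>"
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
    "<author>Victor Vianu</author><title>Foundations of Databases</title>"
    "<year>1995</year></book>"
    "<book price='55'><publisher>Freeman</publisher><author>Jeffrey D. Ullman</author>"
    "<title>Principles of Database and Knowledge Base Systems</title>"
    "<year>1998</year></book>"
    "</bib>"
)

# /bib/book/author[first-name]
with_first = bib.findall("./book/author[first-name]")
# /bib/book[@price < 60], with the comparison done in Python
cheap = [b for b in bib.findall("./book[@price]") if float(b.get("price")) < 60]
# /bib/book[author/text()] : books with at least one author containing text
with_text = [b for b in bib.findall("./book")
             if any((a.text or "").strip() for a in b.findall("author"))]

print([a.findtext("last-name") for a in with_first])  # ['Hull']
print([b.findtext("year") for b in cheap])            # ['1998']
print(len(with_text))                                 # 2
```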
35XPath: Summary
- bib                  matches a bib element
- *                    matches any element
- /                    matches the root element
- /bib                 matches a bib element under root
- bib/paper            matches a paper in bib
- bib//paper           matches a paper in bib, at any depth
- //paper              matches a paper at any depth
- paper|book           matches a paper or a book
- @price               matches a price attribute
- bib/book/@price      matches price attribute in book, in bib
- bib/book[@price<55]/author/lastname   matches ...