Title: XML
1XML
2XML
- eXtensible Markup Language
- XML 1.0: a W3C Recommendation, 1998
- Roots: SGML (a very nasty language).
- After the roots: a format for sharing data
3Why XML is of Interest to Us
- XML is just syntax for data
- Note: we have no syntax for relational data
- But XML is not relational: it is semistructured
- This is exciting because:
- Can translate any data to XML
- Can ship XML over the Web (HTTP)
- Can input XML into any application
- Thus: data sharing and exchange on the Web
4XML Data Sharing and Exchange
[Figure: applications and data sources (object-relational, relational, and legacy data) exchange XML data over the Web (HTTP). The XML data can be integrated, transformed, and warehoused to support specific data management tasks.]
5From HTML to XML
HTML describes the presentation
6HTML
- Bibliography
- Foundations of Databases
- Abiteboul, Hull, Vianu
- Addison Wesley, 1995
- Data on the Web
- Abiteboul, Buneman, Suciu
- Morgan Kaufmann, 1999
7XML
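<book>
  <title> Foundations </title>
  <author> Abiteboul </author>
  <author> Hull </author>
  <author> Vianu </author>
  <publisher> Addison Wesley </publisher>
  <year> 1995 </year>
</book>
...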
XML describes the content
8Web Services
- A new paradigm for creating distributed applications?
- Systems communicate via messages, contracts.
- Example: order processing system.
- MS .NET, J2EE: some of the platforms
- XML is a part of the story: the data format.
9XML Terminology
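- tags: book, title, author, ...
- start tag: <book>, end tag: </book>
- elements: <book>...</book>, <author>...</author>
- elements are nested
- empty element: a start tag immediately followed by its end tag, abbreviated as a single self-closing tag
- an XML document: a single root element
A well-formed XML document is one with matching tags: every start tag has an end tag, and elements are properly nested.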
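Well-formedness can be checked mechanically with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the three sample strings are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML: matching, nested tags and one root."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Illustrative inputs (assumed, not taken from the slides).
good = "<book><title>Foundations of Databases</title><author>Abiteboul</author></book>"
bad_nesting = "<book><title>Foundations of Databases</book></title>"
two_roots = "<book/><book/>"

for sample in (good, bad_nesting, two_roots):
    print(is_well_formed(sample), sample)
```

Only the first string is accepted: the second closes its tags in the wrong order, and the third has two root elements.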
10More XML Attributes
<book price="55">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  ...
  <year> 1995 </year>
</book>
attributes are alternative ways to represent data
11More XML Oids and References
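[Example: three person elements named Jane, Mary, and John. One of them carries an idref attribute with the value "o123 o555", which refers to other elements by their ids.]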
oids and references in XML are just syntax
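12XML Semantics: a Tree!
[Figure: the document drawn as a tree. The root data element has two person children. The first has an id attribute (o555), a name (Mary), and an address with street (Maple), no (345), and city (Seattle). The second has a name (John), an address (Thailand), and a phone (23456).]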
Order matters !!!
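The tree view can be made concrete by parsing a document and walking it in document order. This is a minimal sketch: the XML text is a reconstruction of the tree on this slide (an assumption, since the original markup is not shown), and outline is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the tree on the slide; the exact markup is an assumption.
DATA = """
<data>
  <person id="o555">
    <name>Mary</name>
    <address><street>Maple</street><no>345</no><city>Seattle</city></address>
  </person>
  <person>
    <name>John</name>
    <address>Thailand</address>
    <phone>23456</phone>
  </person>
</data>
"""


def outline(node, depth=0):
    """Yield one indented line per element, visiting children in document order."""
    text = (node.text or "").strip()
    attrs = "".join(f" @{key}={value}" for key, value in node.attrib.items())
    yield "  " * depth + node.tag + attrs + (f": {text}" if text else "")
    for child in node:
        yield from outline(child, depth + 1)


lines = list(outline(ET.fromstring(DATA)))
print("\n".join(lines))
```

Because children are visited in the order they appear, Mary is always listed before John; swapping the two person elements would give a different tree.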
13XML Data
- XML is self-describing
- Schema elements become part of the data
- Relational schema: persons(name, phone)
- In XML, <persons>, <name>, <phone> are part of the data, and are repeated many times
- Consequence: XML is much more flexible
- XML = semistructured data
14Relational Data as XML
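Relational table person:
name    phone
John    3634
Sue     6343
Dick    6363
The same data as XML:
<person>
  <row> <name> John </name> <phone> 3634 </phone> </row>
  <row> <name> Sue </name> <phone> 6343 </phone> </row>
  <row> <name> Dick </name> <phone> 6363 </phone> </row>
</person>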
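The table-to-XML mapping on this slide is mechanical: one element for the table, one row element per tuple, one child per column. A minimal sketch, assuming a hypothetical helper table_to_xml:

```python
import xml.etree.ElementTree as ET

columns = ("name", "phone")
rows = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]


def table_to_xml(table_name, columns, rows):
    """Build <table_name> with one <row> per tuple and one child element per column."""
    root = ET.Element(table_name)
    for values in rows:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, values):
            ET.SubElement(row, column).text = value
    return root


person = table_to_xml("person", columns, rows)
xml_text = ET.tostring(person, encoding="unicode")
print(xml_text)
```

Note how the column names are repeated in every row, which is the self-describing property discussed earlier.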
15XML is Semi-structured Data
- Missing attributes:
- Could represent in a table with nulls
<person> <name> John </name> <phone> 1234 </phone> </person>
<person> <name> Joe </name> </person>   ← no phone!
16XML is Semi-structured Data
- Repeated attributes:
- Impossible in tables
<person> <name> Mary </name> <phone> 2345 </phone> <phone> 3456 </phone> </person>   ← two phones!
17XML is Semi-structured Data
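- Attributes with different types in different objects
- Nested collections (no 1NF)
- Heterogeneous collections:
- a single collection may contain elements of different kinds
[Example: a person whose name is itself structured into two parts (John, Smith) rather than plain text, with phone 1234 ← structured name!]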
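Missing and repeated attributes need no special treatment when the data is read as a tree. In the sketch below, the persons wrapper element is an assumption added so that the three examples form one document.

```python
import xml.etree.ElementTree as ET

# The <persons> wrapper is assumed; the person elements follow the slides.
PEOPLE = """
<persons>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
  <person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>
</persons>
"""

# findall returns zero, one, or many <phone> children without any schema change.
phones = {
    person.findtext("name"): [phone.text for phone in person.findall("phone")]
    for person in ET.fromstring(PEOPLE)
}
print(phones)
```

Joe simply maps to an empty list and Mary to a list of two, where a relational table would need a null or a second table.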
18Document Type Definitions (DTD)
- part of the original XML specification
- an XML document may have a DTD
- An XML document is:
- Well-formed: if its tags are correctly closed
- Valid: if it has a DTD and conforms to it
- validation is useful in data exchange
19Very Simple DTD
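<!ELEMENT ... ((person|product)*)>
<!ELEMENT person (..., name, office, phone?)>
<!ELEMENT ... (#PCDATA)>
...
<!ELEMENT product (..., description?)>
<!ELEMENT ... (#PCDATA)>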
20Very Simple DTD
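Example of a valid XML document:
[Example: two person elements. The first holds the values 123456789, John (name), B432 (office), and 1234 (phone). The second holds 987654321, Jim (name), and B123 (office), with no phone. Further elements are elided (... ...).]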
21DTD The Content Model
- <!ELEMENT name (contentmodel)>
- Content model:
- Complex = a regular expression over other elements
- Text-only = #PCDATA
- Empty = EMPTY
- Any = ANY
- Mixed content = (#PCDATA | A | B | C)*
22DTD Regular Expressions
Construct      DTD content model
sequence       (firstName, lastName)
optional       e?
Kleene star    e*
alternation    (e1 | e2)
(e, e1, e2 stand for any element name or sub-expression.)
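Since a complex content model is a regular expression over child element names, it can be checked with an ordinary regex engine. This is only a sketch of the idea, not a DTD validator: the model (name, office, phone?) is a simplified person declaration, and matches_model is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# Simplified content model (name, office, phone?) as a regex over child tag names.
# Every tag is followed by a space so that tag names cannot run together.
PERSON_MODEL = re.compile(r"(name )(office )(phone )?")


def matches_model(element, model):
    """Return True if the sequence of child tags matches the content model."""
    children = "".join(child.tag + " " for child in element)
    return model.fullmatch(children) is not None


with_phone = ET.fromstring(
    "<person><name>John</name><office>B432</office><phone>1234</phone></person>"
)
no_phone = ET.fromstring("<person><name>Jim</name><office>B123</office></person>")
wrong_order = ET.fromstring("<person><office>B123</office><name>Jim</name></person>")

for person in (with_phone, no_phone, wrong_order):
    print(matches_model(person, PERSON_MODEL))
```

The sequence operator fixes the order of children, so the third element is rejected even though it has the required children.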
23Querying XML Data
- XPath = simple navigation through the tree
- XQuery = the SQL of XML
- XSLT = recursive traversal
24Sample Data for Queries
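<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>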
25Data Model for XPath
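[Figure: the XPath data model is a tree. The root has a single child, the root element (bib). Below it are the book elements, each with children such as publisher (Addison-Wesley) and author (Serge Abiteboul).]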
26XPath Simple Expressions
/bib/book/year
- Result: <year> 1995 </year> <year> 1998 </year>
/bib/paper/year
- Result: empty (there were no papers)
27XPath Restricted Kleene Closure
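//author
- Result: <author> Serge Abiteboul </author>
  <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
  <author> Victor Vianu </author>
  <author> Jeffrey D. Ullman </author>
/bib//first-name
- Result: <first-name> Rick </first-name>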
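These path expressions can be tried out with the limited XPath subset in Python's xml.etree.ElementTree. The sketch assumes a reconstruction of the sample bibliography; since ElementTree paths are relative to the parsed root element (bib), /bib/book/year is written as book/year and //author as .//author.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the sample data used for the queries.
BIB_XML = """
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""
bib = ET.fromstring(BIB_XML)


def full_text(element):
    """Join all text inside an element, including text of nested children."""
    return " ".join(t.strip() for t in element.itertext() if t.strip())


years = [year.text for year in bib.findall("book/year")]               # /bib/book/year
paper_years = bib.findall("paper/year")                                 # /bib/paper/year
authors = [full_text(a) for a in bib.findall(".//author")]              # //author
first_names = [f.text for f in bib.findall(".//first-name")]            # /bib//first-name

print(years, paper_years, authors, first_names)
```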
28XPath Text Nodes
/bib/book/author/text()
- Result: Serge Abiteboul, Victor Vianu, Jeffrey D. Ullman
- Rick Hull doesn't appear because he has firstname, lastname
- Functions in XPath:
- text() matches the text value
- node() matches any node (* or @* or text())
- name() returns the name of the current tag
29XPath Wildcard
//author/*
- Result: <first-name> Rick </first-name> <last-name> Hull </last-name>
- * matches any element
30XPath Attribute Nodes
/bib/book/@price
- Result: "55"
- @price means that price has to be an attribute
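Text nodes, wildcards, and attribute nodes can be approximated in Python as follows. ElementTree has no text() or @attr selection steps, so the sketch emulates them with element.text and element.get; the sample document is an assumed reconstruction, trimmed to the parts these queries touch.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the sample data, trimmed to authors and price.
BIB_XML = """
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>
"""
bib = ET.fromstring(BIB_XML)

# /bib/book/author/text(): only authors that directly contain text.
author_texts = [
    a.text.strip() for a in bib.findall("book/author") if a.text and a.text.strip()
]

# //author/*: every element child of any author.
author_children = [child.tag for child in bib.findall(".//author/*")]

# /bib/book/@price: the price attribute of each book that has one.
prices = [book.get("price") for book in bib.findall("book[@price]")]

print(author_texts, author_children, prices)
```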
31XPath Predicates
/bib/book/author[firstname]
32XPath More Predicates
/bib/book/author[firstname][address[//zip][city]]/lastname
33XPath More Predicates
/bib/book[@price ...]
/bib/book[author/@age ...]
/bib/book[author/text()]
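Predicates filter the nodes selected by a step. ElementTree supports existence tests such as [tag] and [@attr] and equality tests such as [@attr='value'], but not comparisons like <, so the sketch below does the numeric comparison in Python. The sample data is an assumed reconstruction, and the threshold 60 is a made-up value for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the sample data, trimmed to what the predicates use.
BIB_XML = """
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <title>Foundations of Databases</title>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
  </book>
</bib>
"""
bib = ET.fromstring(BIB_XML)

# author[first-name]: authors that have a first-name child.
structured = [a.findtext("last-name") for a in bib.findall("book/author[first-name]")]

# book[@price='55']: equality test on an attribute.
priced = [b.findtext("title") for b in bib.findall("book[@price='55']")]

# A numeric comparison (hypothetical threshold 60) has to be done in Python.
cheap = [
    b.findtext("title")
    for b in bib.findall("book[@price]")
    if float(b.get("price")) < 60
]

print(structured, priced, cheap)
```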
34XPath Summary
- bib matches a bib element
- * matches any element
- / matches the root element
- /bib matches a bib element under root
- bib/paper matches a paper in bib
- bib//paper matches a paper in bib, at any depth
- //paper matches a paper at any depth
- paper|book matches a paper or a book
- @price matches a price attribute
- bib/book/@price matches price attribute in book, in bib
- bib/book/@price ...
35Comments on XPath?
- What's good about it?
- What can't it do that you want it to do?
- How does it compare, say, to SQL?
36XQuery
- Based on Quilt, which is based on XML-QL
- Uses XPath to express more complex queries
37FLWR (Flower) Expressions
- FOR ...
- LET...
- WHERE...
- RETURN...
38XQuery
- Find all book titles published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
Result: <title> abc </title> <title> def </title> <title> ghi </title>
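For readers more used to general-purpose languages, the FOR / WHERE / RETURN pattern maps directly onto a list comprehension. This Python sketch is an analogy, not XQuery; the inline document stands in for bib.xml and is an assumed reconstruction of the sample data.

```python
import xml.etree.ElementTree as ET

# Stand-in for bib.xml (assumed reconstruction of the sample data).
BIB_XML = """
<bib>
  <book><title>Foundations of Databases</title><year>1995</year></book>
  <book><title>Principles of Database and Knowledge Base Systems</title><year>1998</year></book>
</bib>
"""
bib = ET.fromstring(BIB_XML)

titles = [
    x.findtext("title")                 # RETURN $x/title
    for x in bib.findall("book")        # FOR $x IN .../bib/book
    if int(x.findtext("year")) > 1995   # WHERE $x/year > 1995
]
print(titles)
```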
39XQuery
- Find book titles by the coauthors of "Database Theory":
FOR $x IN bib/book[title/text() = "Database Theory"]/author,
    $y IN bib/book[author/text() = $x/text()]/title
RETURN $y/text()
Result: abc, def, ghi
The answer will contain duplicates!
40XQuery
- Same as before, but eliminate duplicates:
FOR $x IN bib/book[title/text() = "Database Theory"]/author,
    $y IN distinct(bib/book[author/text() = $x/text()]/title)
RETURN $y/text()
Result: abc, def, ghi
distinct = a function that eliminates duplicates
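The reason the join produces duplicates, and what distinct removes, is easy to see in a small Python analogy. The three-book document and the author names A and B are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: two coauthors of "Database Theory" who share other books.
BIB_XML = """
<bib>
  <book><title>Database Theory</title><author>A</author><author>B</author></book>
  <book><title>abc</title><author>A</author><author>B</author></book>
  <book><title>def</title><author>B</author></book>
</bib>
"""
bib = ET.fromstring(BIB_XML)
books = bib.findall("book")

# FOR $x: the authors of "Database Theory".
coauthors = [
    a.text
    for b in books
    if b.findtext("title") == "Database Theory"
    for a in b.findall("author")
]

# FOR $y: titles of books written by $x; a book with two coauthors appears twice.
titles = [
    b.findtext("title")
    for x in coauthors
    for b in books
    if x in [a.text for a in b.findall("author")]
]

# distinct(...): drop repeats, here keeping the first occurrence of each title.
distinct_titles = list(dict.fromkeys(titles))
print(titles, distinct_titles)
```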
41XQuery Nesting
- For each author of a book by Morgan Kaufmann, list all books she published:
FOR $a IN distinct(document("bib.xml")/bib/book[publisher = "Morgan Kaufmann"]/author)
RETURN $a,
       FOR $t IN /bib/book[author = $a]/title
       RETURN $t
42XQuery
Result: one group per author, containing the author followed by the titles of the books she published.
43XQuery
- FOR $x IN expr -- binds $x to each value in the list expr
- LET $x := expr -- binds $x to the entire list expr
- Useful for common subexpressions and for aggregations
44XQuery
FOR $p IN distinct(document("bib.xml")//publisher)
LET $b := document("bib.xml")/bib/book[publisher = $p]
WHERE count($b) > 100
RETURN $p
count = an (aggregate) function that returns the number of elements
45XQuery
- Find books whose price is larger than average:
LET $a := avg(document("bib.xml")/bib/book/price)
FOR $b IN document("bib.xml")/bib/book
WHERE $b/price > $a
RETURN $b
Let's try to write this in SQL
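As a point of comparison before attempting the SQL version, here is the same query as a Python sketch. The three books and their prices are made up, and price is modelled as a child element, as in the query above.

```python
import statistics
import xml.etree.ElementTree as ET

# Hypothetical data with price as a child element.
BIB_XML = """
<bib>
  <book><title>abc</title><price>30</price></book>
  <book><title>def</title><price>50</price></book>
  <book><title>ghi</title><price>100</price></book>
</bib>
"""
bib = ET.fromstring(BIB_XML)
books = bib.findall("book")

# LET $a := avg(...): one value computed over the whole collection.
average = statistics.mean(float(b.findtext("price")) for b in books)

# FOR $b ... WHERE $b/price > $a RETURN $b
expensive = [b.findtext("title") for b in books if float(b.findtext("price")) > average]
print(average, expensive)
```

The LET binding is evaluated once and then reused inside the iteration, which is the common-subexpression role described earlier.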
46XQuery
- Summary:
- FOR-LET-WHERE-RETURN = FLWR
FOR/LET clauses → list of tuples → WHERE clause → list of tuples → RETURN clause → instance of XQuery data model
47FOR vs. LET
- FOR
- Binds node variables → iteration
- LET
- Binds collection variables → one value
48FOR vs. LET
FOR $x IN document("bib.xml")/bib/book
RETURN $x
Returns: one result per book (..., ..., ...)
LET $x := document("bib.xml")/bib/book
RETURN $x
Returns: a single result containing all the books (... ... ...)
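The difference between the two bindings shows up in the shape of the output. In this Python analogy, the result wrapper element and the helper wrap are hypothetical names used only to make that shape visible.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-book document.
BIB_XML = "<bib><book><title>abc</title></book><book><title>def</title></book></bib>"
books = ET.fromstring(BIB_XML).findall("book")


def wrap(items):
    """Put the given elements inside a new <result> element (name is assumed)."""
    result = ET.Element("result")
    result.extend(items)
    return result


# FOR: the RETURN clause runs once per book, so there is one result per book.
for_results = [wrap([book]) for book in books]

# LET: the variable is bound to the whole collection, so there is one result.
let_result = wrap(books)

print([ET.tostring(r, encoding="unicode") for r in for_results])
print(ET.tostring(let_result, encoding="unicode"))
```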
49Collections in XQuery
- Ordered and unordered collections:
- /bib/book/author = an ordered collection
- distinct(/bib/book/author) = an unordered collection
- LET $a := /bib/book → $a is a collection
- $b/author → a collection (several authors...)
RETURN $b/author
Returns: all of the authors together in one collection (..., ..., ...)