Title: about XML/Xquery/RDF
14/1
about XML/Xquery/RDF
3HTML vs. XML
- <h1> Bibliography </h1>
- <p> <i> Foundations of Databases </i>
- Abiteboul, Hull, Vianu
- <br> Addison Wesley, 1995
- <p> <i> Data on the Web </i>
- Abiteboul, Buneman, Suciu
- <br> Morgan Kaufmann, 1999
- <bibliography>
- <book> <title> Foundations </title>
- <author> Abiteboul </author>
- <author> Hull </author>
- <author> Vianu </author>
- <publisher> Addison Wesley </publisher>
- <year> 1995 </year>
- </book>
- ...
- </bibliography>
XML is self-describing: schema information is part of the data.
Good for data exchange (albeit baroque for storage).
4<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
<bibliography> <book> <title> Foundations </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley </publisher>
<year> 1995 </year>
</book> </bibliography>
HTML describes presentation
XML describes content
5Why are Database folks so excited about XML?
- XML is just a syntax for (self-describing) data
- This is still exciting because
- No standard syntax for relational data
- With XML, we can
- Translate any legacy data to XML
- Can exchange data in XML format
- Ship over the web, input to any application
6XML ≠ machine accessible meaning
Jim Hendler
This is what a web page in natural language
looks like to a machine
7XML ≠ machine accessible meaning
Jim Hendler
XML allows meaningful tags to be added to parts
of the text
8XML ≠ machine accessible meaning
Jim Hendler
But to your machine, the tags look like this.
9XML ≠ machine accessible meaning
Jim Hendler
Schemas help.
< CV >
by relating common terms between documents
private
10But other people use other schemas
Jim Hendler
Someone else has one like this.
11But other people use other schemas
Jim Hendler
< CV >
which don't fit in
private
Moral: there is still a need for
ontology mapping.
12The X-standards
- XML: an on-the-wire representation for data
- Xquery: a query language for XML
- Xschema: a schema description language for XML data
- RDF: a language for meta-data description
- WSDL/SOAP/UDDI: languages for describing services
13XML Terminology
- tags: book, title, author, ...
- start tag <book>, end tag </book>
- elements: <book>...</book>, <author>...</author>
- elements are nested
- empty element: <red></red>, abbreviated <red/>
- an XML document has a single root element
- an XML document is well formed if it has matching tags
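A quick way to see what "well formed" means in practice is to hand a few strings to a parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as one XML document with a single root."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Matching, properly nested tags, one root, and an abbreviated empty element.
good = "<book><title>Foundations</title><author>Abiteboul</author><red/></book>"
# Tags close in the wrong order, so they do not match.
bad_nesting = "<book><title>Foundations</book></title>"
# Two top-level elements: there is no single root.
two_roots = "<book/><book/>"

for sample in (good, bad_nesting, two_roots):
    print(is_well_formed(sample), sample)
```

Note that this only checks well-formedness; it says nothing about validity against a DTD.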
17More XML Attributes
- <book price="55" currency="USD">
- <title> Foundations of Databases </title>
- <author> Abiteboul </author>
- ...
- <year> 1995 </year>
- </book>
Attributes are single-valued.
There is no guidance on when to use them.
18More XML Oids and References
Object identifiers
- <person id="o555"> <name> Jane </name> </person>
- <person id="o456"> <name> Mary </name>
- <children idref="o123 o555"/>
- </person>
- <person id="o123" mother="o456"><name>John</name>
- </person>
Oids and references in XML are just syntax
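Because oids and references are just syntax, the application has to resolve them itself. A minimal sketch, assuming the three person elements are wrapped in a hypothetical <people> root (the slide shows no root) and using a plain dictionary as the id index:

```python
import xml.etree.ElementTree as ET

# The <people> wrapper is an assumption added so the fragment has one root.
doc = ET.fromstring("""
<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>
""")

# Build the oid index by hand; the parser attaches no meaning to id/idref.
by_id = {p.get("id"): p for p in doc.iter("person")}


def name_of(oid: str) -> str:
    """Follow a reference and return the referenced person's name."""
    return by_id[oid].findtext("name")


mary = by_id["o456"]
children = [name_of(ref) for ref in mary.find("children").get("idref").split()]
mother_of_john = name_of(by_id["o123"].get("mother"))

print(children, mother_of_john)
```

A dangling reference would surface here as a KeyError, which is the kind of check a validating parser performs for ID/IDREF attributes declared in a DTD.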
19XML vs. Relational Data
- XML is meant as a language that supports both text and structured data
- Conflicting demands...
- XML supports semi-structured data
- In essence, the schema can be a union of multiple schemas
- Easy to represent books with or without prices, books with any number of authors, etc.
- XML supports free mixing of text and data
- using the #PCDATA type
- XML is ordered (while relational data is unordered)
20DTDs
Notice that a DTD is not in XML syntax
<!DOCTYPE paper [
<!ELEMENT paper (section*)>
<!ELEMENT section ((title,section*) | text)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT text (#PCDATA)>
]>
Semi-structured
<paper> <section> <text> </text> </section>
<section> <title> </title> <section> ... </section>
<section> ... </section>
</section> </paper>
21XML Schemas
- More recent proposal (with XML syntax)
- unifies previous schema proposals
- generalizes DTDs
- uses XML syntax
- two documents: structure and datatypes
- http://www.w3.org/TR/xmlschema-1
- http://www.w3.org/TR/xmlschema-2
22XML Schema
23RDF Meta-data Standard for Web
- <rdf:Description about="www.mypage.com">
- <about> birds, butterflies, snakes </about>
- <author> <rdf:Description>
- <firstname> John </firstname>
- <lastname> Smith </lastname>
- </rdf:Description>
- </author>
- </rdf:Description>
Good ol' semantic networks..?
24Querying XML
- Requirements
- Need to handle lack of schema.
- We may not know much about the data, so we need to navigate the XML.
- Need to support both information retrieval and SQL-style queries.
- Ordered vs. un-ordered XML
- Human readable
- like SQL?
- Candidates
- Many, based on conflicting requirements
- XSL: makes IR folks happy
- XML-QL: makes DB folks happy
- Xquery: W3C's attempt to make everybody (un)happy
25Xquery Resources
- XQuery 1.0: An XML Query Language
- W3C Working Draft, 20 December 2001
- XML Query Use Cases
- W3C Working Draft, 20 December 2001
- Microsoft .Net Xquery Language Demo
- http://131.107.228.20/
- http://support.x-hive.com/xquery/index.html
- Supports querying on the documents described in the W3C Use Cases
- Xquery Tutorial by Fankhauser & Wadler
- www.research.avayalabs.com/user/wadler/papers/xquery-tutorial/xquery-tutorial.pdf
26FLoWeR Expressions
- Xquery queries are made up of FLWR expressions that work on paths
- For binds variables to nodes
- Let computes aggregates
- Where applies a formula to find matching elements
- Return constructs the output elements
- Path expressions are of the form
- element//element/element[attrib=value]
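The four clauses map closely onto an ordinary loop. The sketch below is a rough Python analogue, not Xquery itself: it uses xml.etree.ElementTree, hypothetical sample data shaped like the bib.xml examples, and an invented output element name (multi-author).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; author names are flattened to plain text.
bib = ET.fromstring("""
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author>Stevens</author></book>
  <book year="2000"><title>Data on the Web</title>
    <author>Abiteboul</author><author>Buneman</author><author>Suciu</author></book>
</bib>
""")

results = []
for book in bib.findall("book"):              # For: bind a variable to each node on a path
    n_authors = len(book.findall("author"))   # Let: compute an aggregate over the binding
    if n_authors > 1:                         # Where: keep only the matching bindings
        out = ET.Element("multi-author", count=str(n_authors))  # Return: construct output
        out.text = book.findtext("title")
        results.append(out)

for r in results:
    print(ET.tostring(r, encoding="unicode"))
```

The point of the analogy is the division of labor: paths select nodes, For and Let bind them, Where filters, and Return builds new XML rather than rows.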
27Comparison to SQL
- Look at the use case descriptions in the Xquery manual
- Supports all (?) SQL-style queries (with different syntax, of course); see the default queries in the demo
- Has support for
- construction: outputting the answers in arbitrary XML formats (use case XMP)
- path expressions: navigating the XML tree (use case seq)
- simple text queries (use case text)
- Allows queries on tag elements
- Removes the data/meta-data barrier in queries
- For each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional authors. (XMP use case 6)
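The logic of XMP use case 6 can be sketched procedurally. This is a Python approximation under stated assumptions: the sample data is invented (including the "Edited volume" title), authors are flattened to plain text instead of last/first pairs, and the result is built with xml.etree.ElementTree rather than by an Xquery engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: one book with three authors, one with a single
# author, and one with only an editor (which the query must skip).
bib = ET.fromstring("""
<bib>
  <book><title>Data on the Web</title>
    <author>Abiteboul</author><author>Buneman</author><author>Suciu</author></book>
  <book><title>TCP/IP Illustrated</title><author>Stevens</author></book>
  <book><title>Edited volume</title><editor>Gerbarg</editor></book>
</bib>
""")

out = ET.Element("bib")
for book in bib.findall("book"):
    authors = book.findall("author")
    if not authors:                      # "that has at least one author"
        continue
    item = ET.SubElement(out, "book")
    ET.SubElement(item, "title").text = book.findtext("title")
    for a in authors[:2]:                # "first two authors"
        ET.SubElement(item, "author").text = a.text
    if len(authors) > 2:                 # empty et-al marker for the rest
        ET.SubElement(item, "et-al")

print(ET.tostring(out, encoding="unicode"))
```

The first two authors are well defined only because XML is ordered, which is one of the differences from relational data noted earlier.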
28DTD for http://www.bn.com/bib.xml
- <!ELEMENT bib (book* )>
- <!ELEMENT book (title, (author+ | editor+ ), publisher, price )>
- <!ATTLIST book year CDATA #REQUIRED >
- <!ELEMENT author (last, first )>
- <!ELEMENT editor (last, first, affiliation )>
- <!ELEMENT title (#PCDATA )>
- <!ELEMENT last (#PCDATA )>
- <!ELEMENT first (#PCDATA )>
- <!ELEMENT affiliation (#PCDATA )>
- <!ELEMENT publisher (#PCDATA )>
- <!ELEMENT price (#PCDATA )>
29Example Query
Query
- <bib>
- { for $b in /bib/book
- where $b/publisher = "Addison-Wesley"
- and $b/@year > 1991
- return <book year={ $b/@year }>
- { $b/title }
- </book> }
- </bib>
- For all Addison-Wesley books after 1991,
- return the title, with the year as
- an attribute
Result
<bib> <book year="1994"> <title>TCP/IP Illustrated</title> </book>
<book year="1992"> <title>Advanced Programming in the Unix environment</title> </book> </bib>
30Example Query (2)
- Return the books that cost more at amazon than fatbrain
- Let $amazon := document("http://www.amazon.com/books.xml"),
- Let $fatbrain := document("http://www.fatbrain.com/books.xml")
- For $am in $amazon/books/book,
- $fat in $fatbrain/books/book
- Where $am/isbn = $fat/isbn
- and $am/price > $fat/price
- Return <book> { $am/title, $am/price, $fat/price } </book>
Join
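The join above pairs books from two documents on isbn and keeps the pairs where the amazon price is higher. A sketch of the same logic in Python follows; the two catalogs, their ISBNs, and their prices are made-up stand-ins for the remote documents, and the dictionary lookup is a hash join swapped in for the nested loop that the For clause suggests.

```python
import xml.etree.ElementTree as ET

# Invented stand-ins for amazon.com/books.xml and fatbrain.com/books.xml.
amazon = ET.fromstring("""
<books>
  <book><isbn>111</isbn><title>Data on the Web</title><price>39.95</price></book>
  <book><isbn>222</isbn><title>TCP/IP Illustrated</title><price>65.95</price></book>
</books>
""")
fatbrain = ET.fromstring("""
<books>
  <book><isbn>111</isbn><title>Data on the Web</title><price>34.95</price></book>
  <book><isbn>222</isbn><title>TCP/IP Illustrated</title><price>65.95</price></book>
</books>
""")

# Index one side by the join key (isbn).
fat_by_isbn = {b.findtext("isbn"): b for b in fatbrain.findall("book")}

result = []
for am in amazon.findall("book"):
    fat = fat_by_isbn.get(am.findtext("isbn"))
    # Where: same isbn, and strictly more expensive at amazon.
    if fat is not None and float(am.findtext("price")) > float(fat.findtext("price")):
        result.append((am.findtext("title"), am.findtext("price"), fat.findtext("price")))

print(result)
```

Prices are converted to numbers before comparing; comparing the raw element text would be a string comparison.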
31XML frenzy in the DB Community
- Now that XML is there, what can we do with it?
- Convert all databases from relational to XML?
- Or provide XML views of relational databases?
- Develop a theory of native XML databases?
- Or assume that XML data will be stored in relational databases?
- Issues: what sort of storage mechanisms? What sort of indices?
32XML middleware for Databases
- XML adapters (middle-ware) have received significant attention in the DB community
- SilkRoute (AT&T)
- Xperanto (IBM)
- Issues
- Need to convert relational data into XML
- Tagging (easy)
- Need to convert Xquery queries into equivalent SQL queries
- Trickier, as Xquery supports schema querying
33Don't look beyond this..
34Xquery Tutorial
- Craig Knoblock
- University of Southern California
37Data for www.bn.com/bib.xml
- <bib>
- <book year="1994">
- <title>TCP/IP Illustrated</title>
- <author><last>Stevens</last><first>W.</first></author>
- <publisher>Addison-Wesley</publisher>
- <price> 65.95</price>
- </book>
- <book year="1992">
- <title>Advanced Programming in the Unix environment</title>
- <author><last>Stevens</last><first>W.</first></author>
- <publisher>Addison-Wesley</publisher>
- <price>65.95</price>
- </book>
38Data for www.bn.com/bib.xml (cont.)
- <book year="2000">
- <title>Data on the Web</title>
- <author><last>Abiteboul</last><first>Serge</first></author>
- <author><last>Buneman</last><first>Peter</first></author>
- <author><last>Suciu</last><first>Dan</first></author>
- <publisher>Morgan Kaufmann Publishers</publisher>
- <price> 39.95</price>
- </book>
- <book year="1999">
- <title>The Economics of Technology and Content for Digital TV</title>
- <editor> <last>Gerbarg</last><first>Darcy</first> <affiliation>CITI</affiliation> </editor>
- <publisher>Kluwer Academic Publishers</publisher>
- <price>129.95</price>
- </book>
- </bib>
39Document References
- A document can be referenced either explicitly or in the default namespace
- In the Microsoft Demo
- /bib = document("http://www.bn.com/bib.xml")/bib
- We will use /bib throughout, but you must use the expansion to run the demo
- In Theseus, the document for Xquery is passed as input
40Projection
- Return the names of all authors of books
- /bib/book/author
-
- <author><last>Stevens</last><first>W.</first></author>
- <author><last>Stevens</last><first>W.</first></author>
- <author><last>Abiteboul</last><first>Serge</first></author>
- <author><last>Buneman</last><first>Peter</first></author>
- <author><last>Suciu</last><first>Dan</first></author>
41Project (cont.)
- The same query can also be written as a for loop
- /bib/book/author
-
- for $bk in /bib/book return
- for $aut in $bk/author return $aut
-
- <author><last>Stevens</last><first>W.</first></author>
- <author><last>Stevens</last><first>W.</first></author>
- <author><last>Abiteboul</last><first>Serge</first></author>
- <author><last>Buneman</last><first>Peter</first></author>
- <author><last>Suciu</last><first>Dan</first></author>
42Selection
- Return the titles of all books published before 1997
- /bib/book[@year < "1997"]/title
-
- <title>TCP/IP Illustrated</title>
- <title>Advanced Programming in the Unix environment</title>
43Selection (cont.)
- Return the titles of all books published before 1997
- /bib/book[@year < "1997"]/title
-
- for $bk in /bib/book
- where $bk/@year < "1997"
- return $bk/title
-
- <title>TCP/IP Illustrated</title>
- <title>Advanced Programming in the Unix environment</title>
44Selection (cont.)
- Return the book with the title "Data on the Web"
- /bib/book[title = "Data on the Web"]
-
- <book year="2000">
- <title>Data on the Web</title>
- <author><last>Abiteboul</last><first>Serge</first></author>
- <author><last>Buneman</last><first>Peter</first></author>
- <author><last>Suciu</last><first>Dan</first></author>
- <publisher>Morgan Kaufmann Publishers</publisher>
- <price> 39.95</price>
- </book>
45Selection (cont.)
- Return the price of the book "Data on the Web"
- /bib/book[title = "Data on the Web"]/price
-
- <price> 39.95</price>
- How would you return the book with a price of 39.95?
46Selection (cont.)
- Return the book with a price of 39.95
- for $bk in /bib/book
- where $bk/price = " 39.95"
- return $bk
-
- <book year="2000">
- <title>Data on the Web</title>
- <author><last>Abiteboul</last><first>Serge</first></author>
- <author><last>Buneman</last><first>Peter</first></author>
- <author><last>Suciu</last><first>Dan</first></author>
- <publisher>Morgan Kaufmann Publishers</publisher>
- <price> 39.95</price>
- </book>
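The selection queries on the last few slides can be tried out without an Xquery engine. The sketch below is an approximation in Python with xml.etree.ElementTree, which supports only a small XPath subset (no "<" in predicates), so the year filter is written as a list comprehension. The data is a trimmed copy of the bib.xml books; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample data; note the leading space in two prices.
bib = ET.fromstring("""
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><price> 65.95</price></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title><price>65.95</price></book>
  <book year="2000"><title>Data on the Web</title><price> 39.95</price></book>
</bib>
""")

# /bib/book[@year < "1997"]/title, comparing strings as the slides do.
early = [b.findtext("title") for b in bib.findall("book") if b.get("year") < "1997"]

# /bib/book[title = "Data on the Web"]/price
price = bib.find("book[title='Data on the Web']/price").text

# where $bk/price = 39.95, compared numerically so the leading
# space in the stored text does not matter.
hits = [b.findtext("title") for b in bib.findall("book")
        if float(b.findtext("price")) == 39.95]

print(early, repr(price), hits)
```

The string literal " 39.95" on the slide has to include the leading space because the comparison there is textual; converting to a number first avoids that fragility.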
47Construction
- Return year and title of all books published before 1997
- for $bk in /bib/book
- where $bk/@year < "1997"
- return <book> { $bk/@year, $bk/title } </book>
-
- <book year="1994">
- <title>TCP/IP Illustrated</title>
- </book>
- <book year="1992">
- <title>Advanced Programming in the Unix environment</title>
- </book>
48Grouping
- Return titles for each author
- for $author in distinct(/bib/book/author/last) return
- <author name={ $author/text() }>
- { /bib/book[author/last = $author]/title }
- </author>
-
- <author name="Stevens">
- <title>TCP/IP Illustrated</title>
- <title>Advanced Programming in the Unix environment</title>
- </author>
- <author name="Abiteboul">
- <title>Data on the Web</title>
- </author>
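Grouping by author amounts to building a map from each distinct last name to the titles it appears under. A minimal Python sketch, assuming a trimmed copy of the sample data and a hypothetical <authors> wrapper element for the output:

```python
import xml.etree.ElementTree as ET

# Trimmed sample data: only titles and author last names.
bib = ET.fromstring("""
<bib>
  <book><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last></author></book>
  <book><title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last></author></book>
  <book><title>Data on the Web</title>
    <author><last>Abiteboul</last></author><author><last>Buneman</last></author></book>
</bib>
""")

# distinct(/bib/book/author/last): dict keys are distinct and keep
# first-seen order.
titles_by_author = {}
for book in bib.findall("book"):
    for last in book.findall("author/last"):
        titles_by_author.setdefault(last.text, []).append(book.findtext("title"))

# Construct one <author name="..."> element per group.
grouped = ET.Element("authors")  # wrapper name is an assumption
for name, titles in titles_by_author.items():
    author = ET.SubElement(grouped, "author", name=name)
    for t in titles:
        ET.SubElement(author, "title").text = t

print(ET.tostring(grouped, encoding="unicode"))
```

This makes one pass over the books, whereas the query as written re-scans /bib/book for every distinct author.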
49Join
- Return the books that cost more at amazon than fatbrain
- Let $amazon := document("http://www.amazon.com/books.xml"),
- Let $fatbrain := document("http://www.fatbrain.com/books.xml")
- For $am in $amazon/books/book,
- $fat in $fatbrain/books/book
- Where $am/isbn = $fat/isbn
- and $am/price > $fat/price
- Return <book> { $am/title, $am/price, $fat/price } </book>
50Example Query 1
- <bib>
- { for $b in /bib/book
- where $b/publisher = "Addison-Wesley" and $b/@year > 1991
- return <book year={ $b/@year }>
- { $b/title }
- </book> }
- </bib>
- What does this do?
51Result Query 1
- <bib>
- <book year="1994">
- <title>TCP/IP Illustrated</title>
- </book>
- <book year="1992">
- <title>Advanced Programming in the Unix environment</title>
- </book>
- </bib>
52Example Query 2
- <results>
- { for $b in document("http://www.bn.com/bib.xml")/bib/book,
- $t in $b/title,
- $a in $b/author
- return
- <result>
- { $t }
- { $a }
- </result> }
- </results>
53Result Query 2
- <results>
- <result><title>TCP/IP Illustrated</title>
- <last>Stevens</last>
- </result>
- <result><title>Advanced Programming in the Unix environment</title>
- <last>Stevens</last>
- </result>
- <result><title>Data on the Web</title>
- <last>Abiteboul</last>
- </result>
- <result><title>Data on the Web</title>
- <last>Buneman</last>
- </result>
- <result><title>Data on the Web</title>
- <last>Suciu</last>
- </result>
- </results>
54Example Query 3
- <books-with-prices>
- {
- for $b in document("http://www.bn.com/bib.xml")//book,
- $a in document("http://www.amazon.com/reviews.xml")//entry
- where $b/title = $a/title
- return
- <book-with-prices>
- { $b/title }
- <price-amazon> { $a/price/text() } </price-amazon>
- <price-bn> { $b/price/text() } </price-bn>
- </book-with-prices>
- }
- </books-with-prices>
55Result Query 3
- <books-with-prices>
- <book-with-prices>
- <title>TCP/IP Illustrated</title>
- <price-amazon>65.95</price-amazon>
- <price-bn> 65.95</price-bn>
- </book-with-prices>
- <book-with-prices>
- <title>Advanced Programming in the Unix environment</title>
- <price-amazon>65.95</price-amazon>
- <price-bn>65.95</price-bn>
- </book-with-prices>
- <book-with-prices>
- <title>Data on the Web</title>
- <price-amazon>34.95</price-amazon>
- <price-bn> 39.95</price-bn>
- </book-with-prices>
- </books-with-prices>
56Example Query 4
- <bib>
- { for $b in document("www.bn.com/bib.xml")//book
- where $b/publisher = "Addison-Wesley" and $b/@year > "1991"
- return <book> { $b/@year } { $b/title } </book>
- sortby (title) }
- </bib>
57Example Result 4
- <bib>
- <book year="1992">
- <title>Advanced Programming in the Unix environment</title>
- </book>
- <book year="1994">
- <title>TCP/IP Illustrated</title>
- </book>
- </bib>
58Impact of XML on Integration
- If and when all sources accept Xqueries and exchange data in XML format, then
- Mediator can accept user queries in Xquery
- Access sources using Xquery
- Get data back in XML format
- Merge results and send to user in XML format
- How about now?
- Sources can use XML adapters (middle-ware)
59Is XML standardization a magical solution for
Integration?
- If all Web sources standardize on XML format
- Source access (wrapper generation) issues become easier to manage
- BUT all other problems remain
- Still need to relate source (XML) schemas to mediator (XML) schema
- Still need to reason about source overlap, source access limitations, etc.
- Still need to manage execution in the presence of source/network uncertainties
60Semantic Web
- The LAV/GAV approaches assume that some human expert will do the actual schema mapping
- The semantic-web initiative attempts to automate schema mapping
- Idea: allow pages to write logical axioms relating their vocabulary (tags) to other external tags
- Support automatic inference of relations between source and mediator schema using these rules
- DAML+OIL
68Data Model
69Which will have XML Syntax
70Document Type Definition DTD
- part of the original XML specification
- an XML document may have a DTD
- terminology for XML
- well-formed if tags are correctly closed
- valid if it has a DTD and conforms to it
- validation is useful in data exchange
71Notice that a DTD is not in XML syntax
72Two ways to specify a DTD
<?xml version="1.0"?> <!DOCTYPE greeting SYSTEM "hello.dtd"> <greeting>Hello, world!</greeting>
<?xml version="1.0" encoding="UTF-8" ?> <!DOCTYPE greeting [ <!ELEMENT greeting (#PCDATA)> ]> <greeting>Hello, world!</greeting>
74DTDs as Grammars
<!DOCTYPE paper [
<!ELEMENT paper (section*)>
<!ELEMENT section ((title,section*) | text)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT text (#PCDATA)>
]>
<paper> <section> <text> </text> </section>
<section> <title> </title> <section> ... </section>
<section> ... </section>
</section> </paper>
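Reading the DTD as a grammar, paper derives zero or more sections, and each section derives either a title followed by zero or more sections, or a single text. A hand-written recursive checker for exactly this grammar might look as follows. It is a sketch: the function names are invented, and it ignores stray character data between elements, which a real validating parser would reject.

```python
import xml.etree.ElementTree as ET


def valid_section(sec) -> bool:
    """section -> (title, section*) | text"""
    kids = list(sec)
    tags = [k.tag for k in kids]
    if tags == ["text"]:
        return len(kids[0]) == 0          # text holds character data only
    if tags and tags[0] == "title" and all(t == "section" for t in tags[1:]):
        return len(kids[0]) == 0 and all(valid_section(k) for k in kids[1:])
    return False


def valid_paper(root) -> bool:
    """paper -> section*"""
    return root.tag == "paper" and all(
        k.tag == "section" and valid_section(k) for k in root
    )


ok = ET.fromstring(
    "<paper><section><text>t</text></section>"
    "<section><title>T</title><section><text>x</text></section></section></paper>"
)
# A title followed by text matches neither alternative of the section rule.
bad = ET.fromstring("<paper><section><title>T</title><text>x</text></section></paper>")

print(valid_paper(ok), valid_paper(bad))
```

The recursion in valid_section mirrors the recursive section rule, which is what lets the DTD describe sections nested to any depth.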
77Shortcomings of DTDs
- Useful for documents, but not so good for data
- No support for structural re-use
- Object-oriented-like structures aren't supported
- No support for data types
- Can't do data validation
- Can have a single key item (ID), but
- No support for multi-attribute keys
- No support for foreign keys (references to other keys)
- No constraints on IDREFs (e.g., reference only a Section)
78XML Schema
- In XML format
- Includes primitive data types (integers, strings, dates, etc.)
- Supports value-based constraints (integers > 100)
- User-definable structured types
- Inheritance (extension or restriction)
- Foreign keys
- Element-type reference constraints
79XML Schemas
Pre-specified tags
- <elementType name="paper">
- <sequence>
- <elementTypeRef name="title"/>
- <elementTypeRef name="author" minOccurs="0"/>
- <elementTypeRef name="year"/>
- <choice> <elementTypeRef name="journal"/>
- <elementTypeRef name="conference"/>
- </choice>
- </sequence>
- </elementType>
How many different RDBMS schemas are needed here?
DTD: <!ELEMENT paper (title,author*,year, (journal|conference))>
80Sample XML Schema
- <schema version="1.0" xmlns="http://www.w3.org/1999/XMLSchema">
- <element name="author" type="string" />
- <element name="date" type="date" />
- <element name="abstract">
- <type>
- ...
- </type>
- </element>
- <element name="paper">
- <type>
- <attribute name="keywords" type="string"/>
- <element ref="author" minOccurs="0" maxOccurs="*" />
- <element ref="date" />
- <element ref="abstract" minOccurs="0" maxOccurs="1" />
- <element ref="body" />
- </type>
- </element>
- </schema>
81Subtyping in XML Schema
- <schema version="1.0" xmlns="http://www.w3.org/1999/XMLSchema">
- <type name="person">
- <attribute name="ssn" />
- <element name="title" minOccurs="0" maxOccurs="1" />
- <element name="surname" />
- <element name="forename" minOccurs="0" maxOccurs="*" />
- </type>
- <type name="extended" source="person" derivedBy="extension">
- <element name="generation" minOccurs="0" />
- </type>
- <type name="notitle" source="person" derivedBy="restriction">
- <element name="title" maxOccurs="0" />
- </type>
- <key name="personKey">
- <selector>.//person[@ssn]</selector>
- <field>@ssn</field>
- </key>
- </schema>
82DTDs as Schemas
- Not so well suited
- impose unwanted constraints on order: <!ELEMENT person (name,phone)>
- references cannot be constrained
- can be too vague
- <!ELEMENT person ((name|phone|email)*)>
- Union of schemas..?
83XML Schemas
- recent proposal
- unifies previous schema proposals
- generalizes DTDs
- uses XML syntax
- two documents: structure and datatypes
- http://www.w3.org/TR/xmlschema-1
- http://www.w3.org/TR/xmlschema-2
84Although DB folks have several beefs
Give me the names of people who are Listed either
as editor or author of a book
86Differences between XML and SSD
- Pure SSD uses edge-labeled graphs as data model
- XML is ordered, SSD is not
- XML can mix text and elements
- <talk> Making Java easier to type and easier to type
- <speaker> Phil Wadler </speaker>
- </talk>
- XML has lots of other stuff: entities, processing instructions, comments
87XML vs. standard semi-structured data models
- <person id="o123">
- <name> Alan </name>
- <age> 42 </age>
- <email> ab@com </email>
- </person>
- { person: &o123
- { name: "Alan",
- age: 42,
- email: "ab@com" } }
Node labeling
Edge labeling
Similar on trees, different on graphs
89XML seen from (R)DBMS world
- RDBMS may want to publish data in XML: provide an XML view of their data
- Tagging the output
- Support XML-based querying (which is then converted to SQL querying)
- A single XML-QL query may correspond to a set of SQL queries
- E.g. schema queries
- SilkRoute, Xperanto systems
- Support XML-based updating
- Tukwila
- RDBMS can be used to provide efficient storage for XML files
- Efficient indexing/retrieval of path expressions
90Other Important XML Standards
- XSL/XSLT
- presentation and transformation standards
- RDF
- resource description framework (meta-info such as ratings, categorizations, etc.)
- Xpath/Xpointer/Xlink
- standard for linking to documents and elements within
- Namespaces
- for resolving name clashes
- DOM
- Document Object Model for manipulating XML documents
- SAX
- Simple API for XML parsing
91RDF
- http://www.w3.org/TR/REC-rdf-syntax (2/99)
- purpose: metadata for the Web
- help search engines
- syntax: in XML
- semantics: edge-labeled graphs
92RDF Metadata standard
- <rdf:Description about="www.mypage.com">
- <about> birds, butterflies, snakes </about>
- <author> <rdf:Description>
- <firstname> John </firstname>
- <lastname> Smith </lastname>
- </rdf:Description>
- </author>
- </rdf:Description>
93More RDF Examples
95RDF Terminology
statement
96More RDF Containers
- bag, sequence, alternative
- <rdf:Description> <a> <rdf:Bag>
- <rdf:li> s1 </rdf:li>
- <rdf:li> s2 </rdf:li>
- </rdf:Bag>
- </a>
- </rdf:Description>
97RDF Containers (contd)
[Figure: graph of the container above. Edge a leads to a node with rdf:type Bag, rdf:_1 s1, and rdf:_2 s2.]
98More RDF Higher Order Statements
- the author of www.thispage.com says the topic
of www.thatpage.com is environment
RDF uses reification
100XML Parsers
- traditional: return a data structure (DOM?)
- event based: SAX (Simple API for XML)
- http://www.megginson.com/SAX
- write handlers for start tag and for end tag
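The event-based style can be illustrated with Python's standard xml.sax module, where the application supplies callbacks for start and end tags instead of receiving a tree. The handler class TagTracker and the sample document are illustrative assumptions.

```python
import xml.sax


class TagTracker(xml.sax.ContentHandler):
    """Records start tags in document order and tracks nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.starts = []

    def startElement(self, name, attrs):
        # Called once per start tag, as the parser streams through the input.
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        self.starts.append(name)

    def endElement(self, name):
        # Called once per end tag.
        self.depth -= 1


handler = TagTracker()
xml.sax.parseString(
    b"<book><title>Foundations</title><author>Hull</author></book>", handler
)
print(handler.starts, handler.max_depth)
```

No tree is ever built, so memory use stays roughly constant regardless of document size; the cost is that the handler must keep whatever state it needs itself.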
101Need for Ontology standardization
102XML Data Model
- does not exist
- Document Object Model (DOM)
- http://www.w3.org/TR/REC-DOM-Level-1 (10/98)
- class hierarchy (node, element, attribute, ...)
- objects have behavior
- defines API to inspect/modify the document
108Start of 4/9 lecture
109Querying XML
110XML Data Model (Graph)
Think of the labels as names of binary relations.
- Issues
- distinguish between attributes and sub-elements?
- Should we conserve order?
111Need for XML querying
- human-readable documents: to retrieve individual documents, to provide dynamic indexes, to perform context-sensitive searching, and to generate new documents.
- data-oriented documents: to query (virtual) XML representations of databases, to transform data into new XML representations, and to integrate data from multiple heterogeneous data sources.
- mixed-model documents: to perform queries on documents with embedded data, such as catalogs, patient health records, employment records, or business analysis documents.
112Querying XML
- Requirements
- Query a graph, not a relation.
- The result should be a graph (representing an XML document), not a relation.
- No schema.
- We may not know much about the data, so we need to navigate the XML.
113W3C requirements
- The W3C Query Working Group has identified many technical requirements
- at least one XML syntax and at least one human-readable syntax
- must be declarative
- must be protocol independent
- must respect XML data model
- must be namespace aware
- must coordinate with XML Schema
- must work even if schemas are unavailable
- must support simple and complex datatypes
- must support universal and existential quantifiers
- must support operations on hierarchy and sequence of document structures
- must combine information from multiple documents
- must support aggregation
- must be able to transform and to create XML structures
- must be able to traverse ID references
114Query Languages
- XML-QL: invented by DB folks
- XML-QL is relational-complete (allows joins)
- also supports path expressions
- Can extract as well as transform data into different formats (like XSL)
- XML-QL is not in XML syntax
- XSL can also be seen as a query language
- Can transform data
115XML-QL data model
- XML-QL works on an abstraction, called an XML graph, of the concrete XML document
- comments and processing instructions are ignored
- the relative order of elements is ignored
- every node has an ID (autogenerated, if necessary)
- all leaves are character data
- XML graphs are obtained from XML documents but are also generated by queries.
- A graph is mapped back into an XML document by choosing arbitrary orderings of element sequences.
- This abstraction is very similar to that from tables to relations: disregard the order of tuples and attributes.
116Extracting Data by Query
- Matching data using element patterns.
- WHERE <book>
- <publisher><name>Addison-Wesley</></>
- <title> $t </>
- <author> $a </>
- </book> IN "www.a.b.c/bib.xml"
- CONSTRUCT $a
The WHERE clause only specifies what must be in the pattern.
Matching elements can contain other content besides what is listed in WHERE.
117Constructing XML Data
- WHERE <book>
- <publisher><name>Addison-Wesley</></>
- <title> $t </>
- <author> $a </>
- </> IN "www.a.b.c/bib.xml"
- CONSTRUCT <result>
- <author> $a </>
- <title> $t </>
- </>
118Grouping with Nested Queries
- WHERE <book>
- <title> $t </>,
- <publisher><name>Addison-Wesley</></>
- </> CONTENT_AS $p IN "www.a.b.c/bib.xml"
- CONSTRUCT <result>
- <titre> $t </>
- WHERE <author> $a </> IN $p
- CONSTRUCT <auteur> $a </>
- </>
119Joining Elements by Value(also integration)
Multiple queries that share values
- WHERE <article> <author>
- <firstname> $f </> <lastname> $l </>
- </> </> ELEMENT_AS $e IN "www.a.b.c/artbib.xml",
- <book year=$y> <author>
- <firstname> $f </> <lastname> $l </>
- </> </> IN "www.a.b.c/bookbib.xml", $y > 1995
- CONSTRUCT $e
Find all articles whose writers also published a book after 1995.
120Tag variables (schema queries)
WHERE <$p> <title> $t </title>
<year> 1995 </> <$e> Smith </> </> IN "www.a.b.c/bib.xml",
$e IN {author, editor}
CONSTRUCT <$p> <title> $t </title> <$e> Smith </> </>
- $p matches book and article.
- $e matches author and editor.
- This saves us from writing four queries.
- This finds all publications in 1995 where Smith is either author or editor.
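The effect of tag variables can be imitated by treating the tag name as ordinary data. The sketch below is a Python approximation, not XML-QL: the four sample publications (titles B1, A1, B2, A2) are invented, and the loop variables p and e play the role of the tag variables.

```python
import xml.etree.ElementTree as ET

# Invented sample: books and articles, with Smith as author or editor.
bib = ET.fromstring("""
<bib>
  <book><title>B1</title><year>1995</year><author>Smith</author></book>
  <article><title>A1</title><year>1995</year><editor>Smith</editor></article>
  <book><title>B2</title><year>1996</year><author>Smith</author></book>
  <article><title>A2</title><year>1995</year><author>Jones</author></article>
</bib>
""")

matches = []
for p in bib:                                   # p ranges over any tag (book, article)
    if (p.findtext("year") or "").strip() != "1995":
        continue
    for e in p:                                 # e ranges over the child tags
        if e.tag in {"author", "editor"} and (e.text or "").strip() == "Smith":
            # Keep the matched tag names so the output can reuse them,
            # as the CONSTRUCT clause does.
            matches.append((p.tag, p.findtext("title"), e.tag))

print(matches)
```

One loop covers all four combinations of book or article with author or editor, which is the saving the slide refers to.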
121Path Expressions
WHERE <part*> <name> $r </> <brand> Ford </> </> IN "www.a.b.c/parts.xml"
CONSTRUCT <result> $r </>
part* matches any sequence of nodes all of which are labeled part (+ can be substituted for * in the above)
WHERE <part+.(subpart|component.piece)> $r </> IN "www.a.b.c/parts.xml"
CONSTRUCT <result> $r </>
122Due 30th April