about XML/Xquery/RDF - PowerPoint PPT Presentation

About This Presentation
Title:

about XML/Xquery/RDF

Description:

4/1 about XML/Xquery/RDF HTML vs. XML Bibliography Foundations of Databases Abiteboul, Hull, Vianu Addison Wesley, 1995 – PowerPoint PPT presentation

Slides: 53
Provided by: SubbraoKa3
Tags: rdf | xml | advanced | dbms | xquery


Transcript and Presenter's Notes

Title: about XML/Xquery/RDF


1
4/1
about XML/Xquery/RDF
2
(No Transcript)
3
HTML vs. XML
  • <h1> Bibliography </h1>
  • <p> <i> Foundations of Databases </i>
  • Abiteboul, Hull, Vianu
  • <br> Addison Wesley, 1995
  • <p> <i> Data on the Web </i>
  • Abiteboul, Buneman, Suciu
  • <br> Morgan Kaufmann, 1999
  • <bibliography>
  • <book> <title> Foundations </title>
  • <author> Abiteboul </author>
  • <author> Hull </author>
  • <author> Vianu </author>
  • <publisher> Addison Wesley </publisher>
  • <year> 1995 </year>
  • </book>
  • </bibliography>

Self-describing: schema info is part of the
data. Good for data exchange (albeit
baroque for storage)
4
HTML describes presentation
XML describes content
5
Why are Database folks so excited about XML?
  • XML is just a syntax for (self-describing) data
  • This is still exciting because
  • No standard syntax for relational data
  • With XML, we can
  • Translate any legacy data to XML
  • Exchange data in XML format
  • Ship over the web, input to any application

6
XML ? machine accessible meaning
Jim Hendler
This is what a web-page in natural language
looks like for a machine
7
XML ? machine accessible meaning
Jim Hendler
XML allows meaningful tags to be added to parts
of the text
8
XML ? machine accessible meaning
Jim Hendler
But to your machine, the tags look like this.
9
XML ? machine accessible meaning
Jim Hendler
Schemas help,
by relating common terms between documents.
[Figure: schema diagram with the labels < CV > and private]
10
But other people use other schemas
Jim Hendler
Someone else has one like this.
11
But other people use other schemas
Jim Hendler
[Figure: a different schema, with the labels < CV > and private]
...which don't fit in.
Moral: there is still a need for
ontology mapping.
12
The X-standards
  • XML: an on-the-wire representation for data
  • XQuery: a query language for XML
  • XML Schema: a schema description language for XML
    data
  • RDF: a language for meta-data description
  • WSDL/SOAP/UDDI: standards for describing, invoking,
    and discovering services

13
XML Terminology
  • tags: book, title, author, ...
  • start tag: <book>, end tag: </book>
  • elements: <book>...</book>, <author>...</author>
  • elements are nested
  • empty element: <red></red>, abbreviated <red/>
  • an XML document has a single root element

well-formed XML document: its tags are matched and properly nested
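The well-formedness rule can be checked mechanically. A minimal Python sketch using only the standard library's `xml.etree.ElementTree` (the helper name `is_well_formed` is ours, not part of any standard): the parser accepts a document only if its tags match and nest properly and there is a single root element. It does not check the document against any DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)  # raises ParseError on crossed tags, a second root, etc.
        return True
    except ET.ParseError:
        return False

good = "<book><title>Foundations</title><red/></book>"
crossed = "<book><title>Foundations</book></title>"  # tags not properly nested
two_roots = "<book/><book/>"                          # no single root element

for doc in (good, crossed, two_roots):
    print(is_well_formed(doc))  # True, then False, then False
```

Note that `<red></red>` and `<red/>` both pass: the abbreviation is purely syntactic.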
15
(No Transcript)
17
More XML: Attributes
  • <book price = "55" currency = "USD">
  • <title> Foundations of Databases </title>
  • <author> Abiteboul </author>
  • <year> 1995 </year>
  • </book>

Attributes are single-valued. There is no
guidance on when to use them (vs. sub-elements)
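To make the attribute/sub-element contrast concrete, here is a small Python sketch with the standard `xml.etree.ElementTree` module (the helper `parses` is hypothetical). Attribute values come back as untyped strings, and the parser itself rejects a repeated attribute name, while repeated sub-elements are fine.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book price="55" currency="USD">'
    "<title> Foundations of Databases </title>"
    "<author> Abiteboul </author>"
    "<year> 1995 </year>"
    "</book>"
)

# Attributes: a flat name -> string map on the element (no typing in plain XML).
print(book.get("price"), book.get("currency"))  # 55 USD
# Sub-elements: can repeat, nest, and carry order.
print(book.findtext("year").strip())             # 1995

def parses(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# A repeated attribute name is not even well-formed: attributes are single-valued.
print(parses('<book author="Abiteboul" author="Hull"/>'))                    # False
print(parses("<book><author>Abiteboul</author><author>Hull</author></book>"))  # True
```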
18
More XML: Oids and References
Object identifiers
  • <person id="o555"> <name> Jane </name> </person>
  • <person id="o456"> <name> Mary </name>
  • <children
    idref="o123 o555"/>
  • </person>
  • <person id="o123" mother="o456"><name>John</name>
  • </person>

oids and references in XML are just syntax
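Because oids and references are "just syntax", it is the application that turns them into links. A minimal Python sketch of that resolution step, assuming the three `<person>` elements above are wrapped in a hypothetical `<people>` root (the slide fragment has no single root) and that `idref` holds a whitespace-separated list of oids:

```python
import xml.etree.ElementTree as ET

# The slide's three <person> elements, wrapped in a hypothetical <people> root.
doc = ET.fromstring("""
<people>
  <person id="o555"> <name> Jane </name> </person>
  <person id="o456"> <name> Mary </name>
    <children idref="o123 o555"/>
  </person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>""")

# To the parser, id/idref/mother are plain strings; we build the oid index ourselves.
by_id = {e.get("id"): e for e in doc.iter() if e.get("id") is not None}

def name_of(oid: str) -> str:
    return by_id[oid].findtext("name").strip()

# Follow the multi-valued reference, one oid at a time.
mary = by_id["o456"]
children = [name_of(ref) for ref in mary.find("children").get("idref").split()]
print(children)                              # ['John', 'Jane']
print(name_of(by_id["o123"].get("mother")))  # Mary
```

With a DTD declaring these attributes as ID/IDREFS, a validating parser would additionally check that every reference points at an existing id; nothing above does that.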
19
XML vs. Relational Data
  • XML is meant as a language that supports both
    Text and Structured Data
  • Conflicting demands...
  • XML supports semi-structured data
  • In essence, the schema can be a union of multiple
    schemas
  • Easy to represent books with or without prices,
    books with any number of authors, etc.
  • XML supports free mixing of text and data
  • using the #PCDATA type
  • XML is ordered (while relational data is
    unordered)

20
DTDs
Notice that the DTD is not in XML syntax
<!DOCTYPE paper [
  <!ELEMENT paper (section*)>
  <!ELEMENT section ((title,section*) | text)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>
Semi-structured
<paper> <section> <text> </text> </section>
  <section> <title> </title> <section>
  </section>
  <section> </section>
  </section> </paper>
21
XML Schemas
  • More recent proposal (with XML syntax)
  • unifies previous schema proposals
  • generalizes DTDs
  • uses XML syntax
  • two documents: structure and datatypes
  • http://www.w3.org/TR/xmlschema-1
  • http://www.w3.org/TR/xmlschema-2

22
XML Schema
23
RDF Meta-data Standard for Web
  • <rdf:Description about="www.mypage.com">
  • <about> birds, butterflies, snakes
    </about>
  • <author> <rdf:Description>
  • <firstname> John
    </firstname>
  • <lastname> Smith
    </lastname>
  • </rdf:Description>
  • </author>
  • </rdf:Description>

Good ol' semantic networks?
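The "semantic network" reading is that each nested element becomes a labeled edge. A rough Python sketch that flattens the description above into (subject, predicate, object) triples, giving the nested, unnamed `rdf:Description` a blank-node name. Assumptions: the `rdf:` prefix is bound to the standard RDF namespace inside a hypothetical `<rdf:RDF>` wrapper, `about` is left unqualified as on the slide (current RDF/XML writes `rdf:about`), and the function `triples` is ours.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DESC = "{" + RDF + "}Description"

doc = ET.fromstring(f"""
<rdf:RDF xmlns:rdf="{RDF}">
  <rdf:Description about="www.mypage.com">
    <about> birds, butterflies, snakes </about>
    <author> <rdf:Description>
        <firstname> John </firstname>
        <lastname> Smith </lastname>
      </rdf:Description>
    </author>
  </rdf:Description>
</rdf:RDF>""")

_blank = itertools.count(1)

def triples(desc, out):
    """Append one (subject, predicate, object) edge per property of `desc`."""
    subject = desc.get("about") or f"_:b{next(_blank)}"  # unnamed node -> blank node
    for prop in desc:
        nested = prop.find(DESC)
        if nested is not None:
            out.append((subject, prop.tag, triples(nested, out)))  # edge to nested node
        else:
            out.append((subject, prop.tag, (prop.text or "").strip()))
    return subject

edges = []
for d in doc.findall(DESC):
    triples(d, edges)
for edge in edges:
    print(edge)
```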
24
Querying XML
  • Requirements
  • Need to handle lack of schema.
  • We may not know much about the data, so we need
    to navigate the XML.
  • Need to support both information retrieval and
    SQL-style queries.
  • Ordered vs. un-ordered XML
  • Human readable
  • like SQL?
  • Candidates
  • Many based on conflicting requirements
  • XSL: makes IR folks happy
  • XML-QL: makes DB folks happy
  • XQuery: W3C's attempt to make everybody (un)happy

25
Xquery Resources
  • XQuery 1.0: An XML Query Language
  • W3C Working Draft 20 December 2001
  • XML Query Use Cases
  • W3C Working Draft 20 December 2001
  • Microsoft .Net XQuery Language Demo
  • http://131.107.228.20/
  • http://support.x-hive.com/xquery/index.html
  • Supports querying on the documents described in
    the W3C Use Cases
  • XQuery Tutorial by Fankhauser and Wadler
  • www.research.avayalabs.com/user/wadler/papers/xquery-tutorial/xquery-tutorial.pdf

26
FLoWeR Expressions
  • XQuery queries are made up of FLWR expressions
    that work on paths
  • For: binds variables to nodes
  • Let: binds a variable to a whole sequence
    (e.g., to compute aggregates)
  • Where: applies a formula to find matching elements
  • Return: constructs the output elements
  • Path expressions are of the form
  • element//element/element[@attrib = value]
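The three path forms can be tried out without an XQuery engine. Python's `xml.etree.ElementTree` implements only a small subset of XPath (not XQuery), but that subset covers child steps, `//`, and `[@attrib='value']` predicates. A sketch on a trimmed, inlined copy of the bib data:

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="2000"><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author></book>
</bib>""")

# element/element: child steps, relative to the <bib> element
titles = [t.text for t in bib.findall("book/title")]
# element//element: descendants at any depth
lasts = [last.text for last in bib.findall(".//last")]
# element[@attrib = value]: a predicate on an attribute
titles_2000 = [t.text for t in bib.findall("book[@year='2000']/title")]

print(titles)       # ['TCP/IP Illustrated', 'Data on the Web']
print(lasts)        # ['Stevens', 'Abiteboul', 'Buneman']
print(titles_2000)  # ['Data on the Web']
```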

27
Comparison to SQL
  • Look at the use case descriptions in the XQuery manual
  • Supports all (?) SQL-style queries (with
    different syntax, of course): see the default queries in
    the demo
  • Has support for
  • construction: outputting the answers in
    arbitrary XML formats (use case XMP)
  • path expressions: navigating the XML tree
    (use case seq)
  • simple text queries (use case text)
  • Allows queries on tag elements
  • Removes the data/meta-data barrier in queries
  • For each book that has at least one author, list
    the title and first two authors, and an empty
    "et-al" element if the book has additional
    authors (XMP use case 6)

28
DTD for http://www.bn.com/bib.xml
  • <!ELEMENT bib (book* )>
  • <!ELEMENT book (title, (author+ | editor+ ),
    publisher, price )>
  • <!ATTLIST book year CDATA #REQUIRED >
  • <!ELEMENT author (last, first )>
  • <!ELEMENT editor (last, first, affiliation )>
  • <!ELEMENT title (#PCDATA )>
  • <!ELEMENT last (#PCDATA )>
  • <!ELEMENT first (#PCDATA )>
  • <!ELEMENT affiliation (#PCDATA )>
  • <!ELEMENT publisher (#PCDATA )>
  • <!ELEMENT price (#PCDATA )>

29
Example Query
Query
Result
  • <bib>
  • { for $b in /bib/book
  • where $b/publisher = "Addison-Wesley"
  • and $b/@year > 1991
  • return <book year="{ $b/@year }">
  • { $b/title }
  • </book> }
  • </bib>
  • For all Addison-Wesley books published after 1991,
  • return the title, with the year
  • carried over as an attribute

<bib>
  <book year="1994"> <title>TCP/IP Illustrated</title> </book>
  <book year="1992"> <title>Advanced Programming in the Unix environment</title> </book>
</bib>
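One way to read the FLWR query above is as a loop with a filter and a constructor. A rough Python analogue, assuming a trimmed copy of bib.xml inlined as a string (the function name `query` is hypothetical); it mirrors the query clause by clause rather than how an XQuery engine would evaluate it:

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the bib.xml data used in these slides.
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <publisher>Addison-Wesley</publisher></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title>
    <publisher>Addison-Wesley</publisher></book>
  <book year="2000"><title>Data on the Web</title>
    <publisher>Morgan Kaufmann Publishers</publisher></book>
</bib>"""

def query(bib: ET.Element) -> ET.Element:
    out = ET.Element("bib")                                       # <bib> { ... } </bib>
    for b in bib.findall("book"):                                 # for $b in /bib/book
        if (b.findtext("publisher") == "Addison-Wesley"           # where $b/publisher = ...
                and int(b.get("year")) > 1991):                   #   and $b/@year > 1991
            book = ET.SubElement(out, "book", year=b.get("year"))  # <book year="{ $b/@year }">
            ET.SubElement(book, "title").text = b.findtext("title")  # { $b/title }
    return out

result = query(ET.fromstring(BIB))
print(ET.tostring(result, encoding="unicode"))
```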
30
Example Query (2)
  • Return the books that cost more at amazon than
    fatbrain
  • let $amazon := document("http://www.amazon.com/books.xml")
  • let $fatbrain := document("http://www.fatbrain.com/books.xml")
  • for $am in $amazon/books/book,
  • $fat in $fatbrain/books/book
  • where $am/isbn = $fat/isbn
  • and $am/price > $fat/price
  • return <book> { $am/title, $am/price, $fat/price }
    </book>

Join
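The two `for` variables plus the `where` condition form a join. A minimal nested-loop sketch in Python; the two documents are hypothetical stand-ins with made-up ISBNs and prices, and `costs_more_at_amazon` is our name:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two bookstore documents (made-up data).
AMAZON = """<books>
  <book><isbn>isbn-1</isbn><title>Book A</title><price>40.00</price></book>
  <book><isbn>isbn-2</isbn><title>Book B</title><price>25.00</price></book>
</books>"""
FATBRAIN = """<books>
  <book><isbn>isbn-2</isbn><title>Book B</title><price>30.00</price></book>
  <book><isbn>isbn-1</isbn><title>Book A</title><price>35.00</price></book>
</books>"""

def costs_more_at_amazon(amazon: ET.Element, fatbrain: ET.Element) -> list:
    out = []
    for am in amazon.findall("book"):          # for $am in $amazon/books/book,
        for fat in fatbrain.findall("book"):   #     $fat in $fatbrain/books/book
            if (am.findtext("isbn") == fat.findtext("isbn")                   # join on isbn
                    and float(am.findtext("price")) > float(fat.findtext("price"))):
                out.append((am.findtext("title"),
                            am.findtext("price"), fat.findtext("price")))
    return out

rows = costs_more_at_amazon(ET.fromstring(AMAZON), ET.fromstring(FATBRAIN))
print(rows)  # [('Book A', '40.00', '35.00')]
```

The nested loops cost one comparison per pair of books; a query engine would more likely build a hash index on `isbn` first, which is exactly the kind of optimization the declarative form leaves open.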
31
XML frenzy in the DB Community
  • Now that XML is there, what can we do with it?
  • Convert all databases from Relational to XML?
  • Or provide XML views of relational databases?
  • Develop theory of native XML databases?
  • Or assume that XML data will be stored in
    relational databases..
  • Issues: what sort of storage mechanisms? What
    sort of indices?

32
XML middleware for Databases
  • XML adapters (middle-ware) received significant
    attention in DB community
  • SilkRoute (AT&T)
  • Xperanto (IBM)
  • Issues
  • Need to convert relational data into XML
  • Tagging (easy)
  • Need to convert Xquery queries into equivalent
    SQL queries
  • Trickier as Xquery supports schema querying

33
Don't look beyond this..
34
Xquery Tutorial
  • Craig Knoblock
  • University of Southern California

35
References
  • XQuery 1.0: An XML Query Language
  • W3C Working Draft 20 December 2001
  • XML Query Use Cases
  • W3C Working Draft 20 December 2001
  • Microsoft .Net XQuery Language Demo
  • http://131.107.228.20/
  • Supports querying on the documents described in
    the W3C Use Cases
  • XQuery Tutorial by Fankhauser and Wadler
  • www.research.avayalabs.com/user/wadler/papers/xquery-tutorial/xquery-tutorial.pdf

37
Data for www.bn.com/bib.xml
  • <bib>
  • <book year="1994">
  • <title>TCP/IP Illustrated</title>
  • <author><last>Stevens</last><first>W.</first></author>
  • <publisher>Addison-Wesley</publisher>
  • <price> 65.95</price>
  • </book>
  • <book year="1992">
  • <title>Advanced Programming in the Unix environment</title>
  • <author><last>Stevens</last><first>W.</first></author>
  • <publisher>Addison-Wesley</publisher>
  • <price>65.95</price>
  • </book>

38
Data for www.bn.com/bib.xml (cont.)
  • <book year="2000">
  • <title>Data on the Web</title>
  • <author><last>Abiteboul</last><first>Serge</first></author>
  • <author><last>Buneman</last><first>Peter</first></author>
  • <author><last>Suciu</last><first>Dan</first></author>
  • <publisher>Morgan Kaufmann Publishers</publisher>
  • <price> 39.95</price>
  • </book>
  • <book year="1999">
  • <title>The Economics of Technology and Content
    for Digital TV</title>
  • <editor> <last>Gerbarg</last><first>Darcy</first>
    <affiliation>CITI</affiliation> </editor>
  • <publisher>Kluwer Academic Publishers</publisher>
  • <price>129.95</price>
  • </book>
  • </bib>

39
Document References
  • A document can be referenced either explicitly or
    in the default namespace
  • In the Microsoft demo
  • /bib = document("http://www.bn.com/bib.xml")/bib
  • We will use /bib throughout, but you must use the
    expansion to run the demo
  • In Theseus, the document for the XQuery is passed as
    input

40
Projection
  • Return the names of all authors of books
  • /bib/book/author
  • <author><last>Stevens</last><first>W.</first></author>
  • <author><last>Stevens</last><first>W.</first></author>
  • <author><last>Abiteboul</last><first>Serge</first></author>
  • <author><last>Buneman</last><first>Peter</first></author>
  • <author><last>Suciu</last><first>Dan</first></author>

41
Project (cont.)
  • The same query can also be written as a for loop
  • /bib/book/author
  • for $bk in /bib/book return
  • for $aut in $bk/author return $aut
  • <author><last>Stevens</last><first>W.</first></author>
  • <author><last>Stevens</last><first>W.</first></author>
  • <author><last>Abiteboul</last><first>Serge</first></author>
  • <author><last>Buneman</last><first>Peter</first></author>
  • <author><last>Suciu</last><first>Dan</first></author>

42
Selection
  • Return the titles of all books published before
    1997
  • /bib/book[@year < "1997"]/title
  • <title>TCP/IP Illustrated</title>
  • <title>Advanced Programming in the Unix
    environment</title>

43
Selection (cont.)
  • Return the titles of all books published before
    1997
  • /bib/book[@year < "1997"]/title
  • for $bk in /bib/book
  • where $bk/@year < "1997"
  • return $bk/title
  • <title>TCP/IP Illustrated</title>
  • <title>Advanced Programming in the Unix
    environment</title>

44
Selection (cont.)
  • Return the book titled "Data on the Web"
  • /bib/book[title = "Data on the Web"]
  • <book year="2000">
  • <title>Data on the Web</title>
  • <author><last>Abiteboul</last><first>Serge</first></author>
  • <author><last>Buneman</last><first>Peter</first></author>
  • <author><last>Suciu</last><first>Dan</first></author>
  • <publisher>Morgan Kaufmann Publishers</publisher>
  • <price> 39.95</price>
  • </book>

45
Selection (cont.)
  • Return the price of the book "Data on the Web"
  • /bib/book[title = "Data on the Web"]/price
  • <price> 39.95</price>
  • How would you return the book with a price of
    39.95?

46
Selection (cont.)
  • Return the book with a price of 39.95
  • for $bk in /bib/book
  • where $bk/price = " 39.95"
  • return $bk
  • <book year="2000">
  • <title>Data on the Web</title>
  • <author><last>Abiteboul</last><first>Serge</first></author>
  • <author><last>Buneman</last><first>Peter</first></author>
  • <author><last>Suciu</last><first>Dan</first></author>
  • <publisher>Morgan Kaufmann Publishers</publisher>
  • <price> 39.95</price>
  • </book>

47
Construction
  • Return year and title of all books published
    before 1997
  • for $bk in /bib/book
  • where $bk/@year < "1997"
  • return <book> { $bk/@year, $bk/title } </book>
  • <book year="1994">
  • <title>TCP/IP Illustrated</title>
  • </book>
  • <book year="1992">
  • <title>Advanced Programming in the Unix
    environment</title>
  • </book>

48
Grouping
  • Return titles for each author
  • for $author in distinct(/bib/book/author/last)
    return
  • <author name="{ $author/text() }">
  • { /bib/book[author/last = $author]/title }
  • </author>
  • <author name="Stevens">
  • <title>TCP/IP Illustrated</title>
  • <title>Advanced Programming in the Unix
    environment</title>
  • </author>
  • <author name="Abiteboul">
  • <title>Data on the Web</title>
  • </author>
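Grouping here is "distinct keys, then one sub-query per key". A rough Python analogue on a trimmed copy of the bib data (the `<results>` wrapper and `titles_by_author` are hypothetical). One caveat: XQuery's distinct-value functions make no promise about result order; this sketch simply keeps first-seen order.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last></author></book>
  <book><title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last></author></book>
  <book><title>Data on the Web</title>
    <author><last>Abiteboul</last></author>
    <author><last>Buneman</last></author>
    <author><last>Suciu</last></author></book>
</bib>"""

def titles_by_author(bib: ET.Element) -> ET.Element:
    out = ET.Element("results")  # hypothetical wrapper element
    # distinct(/bib/book/author/last): duplicates removed, first-seen order kept
    names = list(dict.fromkeys(last.text for last in bib.findall("book/author/last")))
    for name in names:
        group = ET.SubElement(out, "author", name=name)  # <author name="{ ... }">
        for book in bib.findall("book"):                 # /bib/book[author/last = $author]
            if name in [last.text for last in book.findall("author/last")]:
                ET.SubElement(group, "title").text = book.findtext("title")
    return out

grouped = titles_by_author(ET.fromstring(BIB))
for group in grouped:
    print(group.get("name"), [t.text for t in group])
```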

50
Example Query 1
  • <bib>
  • { for $b in /bib/book
  • where $b/publisher = "Addison-Wesley" and
    $b/@year > 1991
  • return <book year="{ $b/@year }">
  • { $b/title }
  • </book> }
  • </bib>
  • What does this do?

51
Result Query 1
  • <bib>
  • <book year="1994">
  • <title>TCP/IP Illustrated</title>
  • </book>
  • <book year="1992">
  • <title>Advanced Programming in the Unix
    environment</title>
  • </book>
  • </bib>

52
Example Query 2
  • <results>
  • { for $b in document("http://www.bn.com/bib.xml")/bib/book,
  • $t in $b/title,
  • $a in $b/author
  • return
  • <result>
  • { $t }
  • { $a }
  • </result> }
  • </results>

53
Result Query 2
  • <results>
  • <result><title>TCP/IP Illustrated</title>
  • <author><last>Stevens</last><first>W.</first></author>
  • </result>
  • <result><title>Advanced Programming in the Unix
    environment</title>
  • <author><last>Stevens</last><first>W.</first></author>
  • </result>
  • <result><title>Data on the Web</title>
  • <author><last>Abiteboul</last><first>Serge</first></author>
  • </result>
  • <result><title>Data on the Web</title>
  • <author><last>Buneman</last><first>Peter</first></author>
  • </result>
  • <result><title>Data on the Web</title>
  • <author><last>Suciu</last><first>Dan</first></author>
  • </result>
  • </results>

54
Example Query 3
  • <books-with-prices>
  • { for $b in document("http://www.bn.com/bib.xml")//book,
  • $a in document("http://www.amazon.com/reviews.xml")//entry
  • where $b/title = $a/title
  • return
  • <book-with-prices>
  • { $b/title }
  • <price-amazon>{ $a/price/text() }
    </price-amazon>
  • <price-bn>{ $b/price/text() }
    </price-bn>
  • </book-with-prices> }
  • </books-with-prices>

55
Result Query 3
  • <books-with-prices>
  • <book-with-prices>
  • <title>TCP/IP Illustrated</title>
  • <price-amazon>65.95</price-amazon>
  • <price-bn> 65.95</price-bn>
  • </book-with-prices>
  • <book-with-prices>
  • <title>Advanced Programming in the Unix
    environment</title>
  • <price-amazon>65.95</price-amazon>
  • <price-bn>65.95</price-bn>
  • </book-with-prices>
  • <book-with-prices>
  • <title>Data on the Web</title>
  • <price-amazon>34.95</price-amazon>
  • <price-bn> 39.95</price-bn>
  • </book-with-prices>
  • </books-with-prices>

56
Example Query 4
  • <bib>
  • { for $b in document("www.bn.com/bib.xml")//book
  • where $b/publisher = "Addison-Wesley" and
    $b/@year > "1991"
  • return <book> { $b/@year } { $b/title }
    </book>
  • sortby (title) }
  • </bib>

57
Example Result 4
  • <bib>
  • <book year="1992">
  • <title>Advanced Programming in the Unix
    environment</title>
  • </book>
  • <book year="1994">
  • <title>TCP/IP Illustrated</title>
  • </book>
  • </bib>

58
Impact of XML on Integration
  • If and when all sources accept Xqueries and
    exchange data in XML format, then
  • Mediator can accept user queries in Xquery
  • Access sources using Xquery
  • Get data back in XML format
  • Merge results and send to user in XML format
  • How about now?
  • Sources can use XML adapters (middle-ware)

59
Is XML standardization a magical solution for
Integration?
  • If all WEB sources standardize into XML format
  • Source access (wrapper generation issues) becomes
    easier to manage
  • BUT all other problems remain
  • Still need to relate source (XML) schemas to
    mediator (XML) schema
  • Still need to reason about source overlap, source
    access limitations etc.
  • Still need to manage execution in the presence of
    source/network uncertainties

60
Semantic Web
  • The LAV/GAV approaches assume that some human
    expert will do the actual schema mapping
  • The semantic-web initiative attempts to
    automate schema mapping
  • Idea: allow pages to write logical axioms
    relating their vocabulary (tags) to other
    external tags
  • Support automatic inference of relations between
    source and mediator schema using these rules
  • DAML+OIL

61
(No Transcript)
62
(No Transcript)
63
(No Transcript)
64
(No Transcript)
65
(No Transcript)
66
(No Transcript)
67
(No Transcript)
68
Data Model
69
Which will have XML Syntax
70
Document Type Definition DTD
  • part of the original XML specification
  • an XML document may have a DTD
  • terminology for XML
  • well-formed if tags are correctly closed
  • valid if it has a DTD and conforms to it
  • validation is useful in data exchange

71
Notice that the DTD is not in XML syntax
72
Two ways to specify a DTD
  • External DTD:
    <?xml version="1.0"?>
    <!DOCTYPE greeting SYSTEM "hello.dtd">
    <greeting>Hello, world!</greeting>
  • Internal DTD:
    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE greeting [
      <!ELEMENT greeting (#PCDATA)>
    ]>
    <greeting>Hello, world!</greeting>
73
(No Transcript)
74
DTDs as Grammars
<!DOCTYPE paper [
  <!ELEMENT paper (section*)>
  <!ELEMENT section ((title,section*) | text)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>
<paper> <section> <text> </text> </section>
  <section> <title> </title> <section>
  </section>
  <section> </section>
  </section> </paper>
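Reading a DTD as a grammar: each element's content model is a regular expression over the sequence of its children's tag names. A minimal Python sketch of that idea, hand-translating the paper DTD above into regexes (this is not a DTD validator, and Python's standard `xml.etree` does not validate; `CONTENT`, `TEXT_ONLY` and `valid` are our names). For simplicity it ignores character data, which a real validator would reject inside element-only content.

```python
import re
import xml.etree.ElementTree as ET

# Content models of the paper DTD, hand-translated into regexes over the
# sequence of child tag names (each name followed by one space).
CONTENT = {
    "paper": re.compile(r"(section )*"),                  # (section*)
    "section": re.compile(r"(title (section )*|text )"),  # ((title, section*) | text)
}
TEXT_ONLY = {"title", "text"}                             # (#PCDATA)

def valid(elem: ET.Element) -> bool:
    """Check elem and its subtree against the grammar (character data ignored)."""
    if elem.tag in TEXT_ONLY:
        return len(elem) == 0                             # no child elements allowed
    model = CONTENT.get(elem.tag)
    if model is None:
        return False                                      # element not declared
    children = "".join(child.tag + " " for child in elem)
    return model.fullmatch(children) is not None and all(valid(c) for c in elem)

good = ET.fromstring(
    "<paper><section><text>intro</text></section>"
    "<section><title>Body</title>"
    "<section><text>a</text></section><section><text>b</text></section>"
    "</section></paper>")
bad = ET.fromstring("<paper><section><text>x</text><title>y</title></section></paper>")
print(valid(good), valid(bad))  # True False
```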
75
(No Transcript)
76
(No Transcript)
77
Shortcomings of DTDs
  • Useful for documents, but not so good for data
  • No support for structural re-use
  • Object-oriented-like structures aren't supported
  • No support for data types
  • Can't do data validation
  • Can have a single key item (ID), but
  • No support for multi-attribute keys
  • No support for foreign keys (references to other
    keys)
  • No constraints on IDREFs (e.g., cannot require
    that they reference only a Section)

78
XML Schema
  • In XML format
  • Includes primitive data types (integers, strings,
    dates, etc.)
  • Supports value-based constraints (integers > 100)
  • User-definable structured types
  • Inheritance (extension or restriction)
  • Foreign keys
  • Element-type reference constraints

79
XML Schemas
Pre-specified tags
  • <elementType name="paper">
  • <sequence>
  • <elementTypeRef name="title"/>
  • <elementTypeRef name="author"
    minOccurs="0"/>
  • <elementTypeRef name="year"/>
  • <choice> <elementTypeRef
    name="journal"/>
  • <elementTypeRef
    name="conference"/>
  • </choice>
  • </sequence>
  • </elementType>

How many different RDBMS Schemas are needed here?
DTD: <!ELEMENT paper (title,author,year,
(journal|conference))>
80
Sample XML Schema
  • <schema version="1.0" xmlns="http://www.w3.org/1999/XMLSchema">
  • <element name="author" type="string" />
  • <element name="date" type="date" />
  • <element name="abstract">
  • <type>
  • </type>
  • </element>
  • <element name="paper">
  • <type>
  • <attribute name="keywords" type="string"/>
  • <element ref="author" minOccurs="0"
    maxOccurs="*" />
  • <element ref="date" />
  • <element ref="abstract" minOccurs="0"
    maxOccurs="1" />
  • <element ref="body" />
  • </type>
  • </element>
  • </schema>

81
Subtyping in XML Schema
  • <schema version="1.0" xmlns="http://www.w3.org/1999/XMLSchema">
  • <type name="person">
  • <attribute name="ssn"/>
  • <element name="title" minOccurs="0"
    maxOccurs="1" />
  • <element name="surname" />
  • <element name="forename" minOccurs="0"
    maxOccurs="*" />
  • </type>
  • <type name="extended" source="person"
    derivedBy="extension">
  • <element name="generation" minOccurs="0" />
  • </type>
  • <type name="notitle" source="person"
    derivedBy="restriction">
  • <element name="title" maxOccurs="0" />
  • </type>
  • <key name="personKey">
  • <selector>.//person[@ssn]</selector>
  • <field>@ssn</field>
  • </key>
  • </schema>
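The `<key>` element states a constraint that a DTD's single ID cannot: the selected nodes must carry the key field, and its values must be unique. A small Python sketch of what checking such a constraint involves, assuming the selector picks every `person` and the field is the `ssn` attribute (`key_violations` and the sample document are hypothetical; a schema-aware validator would do this for you):

```python
import xml.etree.ElementTree as ET

def key_violations(root: ET.Element, selector: str = ".//person", field: str = "ssn") -> list:
    """Report nodes breaking a key-like rule: field present and unique."""
    seen, problems = set(), []
    for node in root.findall(selector):
        value = node.get(field)
        if value is None:
            problems.append("missing " + field)
        elif value in seen:
            problems.append("duplicate " + field + " " + value)
        else:
            seen.add(value)
    return problems

people = ET.fromstring(
    '<people><person ssn="111"/><person ssn="222"/>'
    '<person ssn="111"/><person/></people>'
)
print(key_violations(people))  # ['duplicate ssn 111', 'missing ssn']
```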

82
DTDs as Schemas
  • Not so well suited
  • impose unwanted constraints on order:
    <!ELEMENT person (name,phone)>
  • references cannot be constrained
  • can be too vague:
  • <!ELEMENT person ((name|phone|email)*)>
  • Union of schemas..?

84
Although DB folks have several beefs
Give me the names of people who are listed either
as editor or author of a book
85
(No Transcript)
86
Differences between XML and SSD
  • Pure SSD uses edge-labeled graphs as data model
  • XML is ordered, ssd is not
  • XML can mix text and elements
  • <talk> Making Java easier to type and easier
    to type
  • <speaker> Phil Wadler </speaker>
  • </talk>
  • XML has lots of other stuff: entities, processing
    instructions, comments

87
XML vs. standard semi-structured data models
  • <person id="o123">
  • <name> Alan </name>
  • <age> 42 </age>
  • <email> ab@com </email>
  • </person>
  • { person: &o123
  • { name: "Alan",
  • age: 42,
  • email: "ab@com" } }

Node labeling
Edge labeling
similar on trees, different on graphs
88
(No Transcript)
89
XML seen from (R)DBMS world
  • RDBMS may want to publish data in XML, i.e., provide
    an XML view of their data
  • Tagging the output
  • Support XML-based querying (which are then
    converted to SQL querying)
  • Single XML-QL query may correspond to a set of
    SQL queries
  • E.g. Schema queries
  • SilkRoute, Xperanto systems
  • Support XML-based updating
  • Tukwila
  • RDBMS can be used to provide an efficient storage
    for XML files
  • Efficient indexing/retrieval of path expressions

90
Other Important XML Standards
  • XSL/XSLT
  • presentation and transformation standards
  • RDF
  • resource description framework (meta-info such as
    ratings, categorizations, etc.)
  • Xpath/Xpointer/Xlink
  • standard for linking to documents and elements
    within
  • Namespaces
  • for resolving name clashes
  • DOM
  • Document Object Model for manipulating XML
    documents
  • SAX
  • Simple API for XML parsing

91
RDF
  • http://www.w3.org/TR/REC-rdf-syntax (2/99)
  • purpose: metadata for the Web
  • help search engines
  • syntax: XML
  • semantics: edge-labeled graphs

93
More RDF Examples
94
(No Transcript)
95
RDF Terminology
statement
96
More RDF: Containers
  • bag, sequence, alternative
  • <rdf:Description> <a> <rdf:Bag>
  • <rdf:li> s1 </rdf:li>
  • <rdf:li> s2 </rdf:li>
  • </rdf:Bag>
  • </a>
  • </rdf:Description>

97
RDF Containers (contd)
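[Figure: RDF graph of the bag: an edge a leads to a bag node, which has an rdf:type edge to Bag and edges rdf:_1 to s1 and rdf:_2 to s2]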
98
More RDF: Higher-Order Statements
  • "the author of www.thispage.com says the topic
    of www.thatpage.com is environment"

RDF uses reification
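Reification turns the inner statement into a resource that other statements can talk about. A sketch in Python with triples as plain tuples, using the RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`); the property names `author`, `says` and `topic`, the blank nodes `_:a` and `_:s`, and the helper `reify` are illustrative assumptions:

```python
# Triples as (subject, predicate, object) tuples; "_:x" marks a blank node.
def reify(node, s, p, o):
    """Describe the statement (s, p, o) as the resource `node`, without asserting it."""
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    ]

graph = [
    ("www.thispage.com", "author", "_:a"),  # the author of www.thispage.com ...
    ("_:a", "says", "_:s"),                 # ... says statement _:s
]
graph += reify("_:s", "www.thatpage.com", "topic", "environment")

for triple in graph:
    print(triple)
```

Note the design consequence: the graph records *that someone said* the topic is environment, but the triple `("www.thatpage.com", "topic", "environment")` itself is never asserted.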
99
(No Transcript)
100
XML Parsers
  • traditional: return a data structure (DOM?)
  • event-based: SAX (Simple API for XML)
  • http://www.megginson.com/SAX
  • write a handler for the start tag and for the end tag
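With the event-based style, the application never sees a tree; it only receives start- and end-tag callbacks. A minimal handler using Python's standard `xml.sax` module (the class `TagCounter` is ours) that counts tags and tracks nesting depth:

```python
import xml.sax

class TagCounter(xml.sax.ContentHandler):
    """Event-based parsing: react to start/end tags instead of building a tree."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.counts = {}

    def startElement(self, name, attrs):   # called once per start tag
        self.counts[name] = self.counts.get(name, 0) + 1
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def endElement(self, name):            # called once per end tag
        self.depth -= 1

handler = TagCounter()
xml.sax.parseString(
    b"<bib><book><title>A</title></book><book><title>B</title></book></bib>", handler)
print(handler.counts)     # {'bib': 1, 'book': 2, 'title': 2}
print(handler.max_depth)  # 3
```

Memory use stays proportional to the nesting depth rather than the document size, which is the usual reason to pick SAX over a DOM-style tree.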

101
Need for Ontology standardization
102
XML Data Model
  • does not exist
  • Document Object Model (DOM)
  • http://www.w3.org/TR/REC-DOM-Level-1 (10/98)
  • class hierarchy (node, element, attribute, ...)
  • objects have behavior
  • defines API to inspect/modify the document

103
(No Transcript)
104
(No Transcript)
105
(No Transcript)
106
(No Transcript)
107
(No Transcript)
108
Start of 4/9 lecture
109
Querying XML
110
XML Data Model (Graph)
Think of the labels as names of binary relations.
  • Issues
  • distinguish between attributes and
    sub-elements?
  • Should we conserve order?

111
Need for XML querying
  • human-readable documents: to retrieve individual
    documents, to provide dynamic indexes, to perform
    context-sensitive searching, and to generate new
    documents.
  • data-oriented documents: to query (virtual) XML
    representations of databases, to transform data
    into new XML representations, and to integrate
    data from multiple heterogeneous data sources.
  • mixed-model documents: to perform queries on
    documents with embedded data, such as catalogs,
    patient health records, employment records, or
    business analysis documents.
112
Querying XML
  • Requirements
  • Query a graph, not a relation.
  • The result should be a graph (representing an XML
    document), not a relation.
  • No schema.
  • We may not know much about the data, so we need
    to navigate the XML.

113
W3C requirements
  • The W3C Query Working Group has identified many
    technical requirements
  • at least one XML syntax and at least one
    human-readable syntax
  • must be declarative
  • must be protocol independent
  • must respect XML data model
  • must be namespace aware
  • must coordinate with XML Schema
  • must work even if schemas are unavailable
  • must support simple and complex datatypes
  • must support universal and existential
    quantifiers
  • must support operations on hierarchy and sequence
    of document structures
  • must combine information from multiple documents
  • must support aggregation
  • must be able to transform and to create XML
    structures
  • must be able to traverse ID references.

114
Query Languages
  • XML-QL: invented by DB folks
  • XML-QL is relationally complete (allows joins)
  • also supports path expressions
  • Can extract as well as transform data into
    different formats (like XSL)
  • XML-QL is not in XML syntax
  • XSL can also be seen as a query language
  • Can transform data

115
XML-QL data model
  • XML-QL works on an abstraction, called an XML
    graph, of the concrete XML document
  • comments and processing instructions are ignored
  • the relative order of elements is ignored
  • every node has an ID (autogenerated, if
    necessary)
  • all leaves are character data.
  • XML graphs are obtained from XML documents but
    are also generated by queries.
  • A graph is mapped back into an XML document by
    choosing arbitrary orderings of element
    sequences.
  • This abstraction is very similar to that from
    tables to relations: disregard the order of
    tuples and attributes.

116
Extracting Data by Query
  • Matching data using element patterns.
  • WHERE <book>
  • <publisher><name>Addison-Wesley</></>
  • <title> $t </>
  • <author> $a </>
  • </book> IN "www.a.b.c/bib.xml"
  • CONSTRUCT $a

The WHERE clause only specifies what must be in
the pattern: matching data can contain other elements
besides those listed in the WHERE clause
117
Constructing XML Data
  • WHERE <book>
  • <publisher><name>Addison-Wesley</></>
  • <title> $t </>
  • <author> $a </>
  • </> IN "www.a.b.c/bib.xml"
  • CONSTRUCT <result>
  • <author> $a </>
  • <title> $t </>
  • </>

118
Grouping with Nested Queries
  • WHERE <book>
  • <title> $t </>,
  • <publisher><name>Addison-Wesley</></>
  • </> CONTENT_AS $p IN "www.a.b.c/bib.xml"
  • CONSTRUCT <result>
  • <titre> $t </>
  • { WHERE <author> $a </> IN $p
  • CONSTRUCT <auteur> $a </> }
  • </>


119
Joining Elements by Value (also integration)
Multiple patterns that share variables
  • WHERE <article> <author>
  • <firstname> $f </> <lastname> $l </>
  • </> </> ELEMENT_AS $e IN "www.a.b.c/artbib.xml",
  • <book year=$y> <author>
  • <firstname> $f </> <lastname> $l </>
  • </> </> IN "www.a.b.c/bookbib.xml", $y > 1995
  • CONSTRUCT $e
Find all articles whose writers also published a
book after 1995.
120
Tag variables (schema queries)
  WHERE <$p>
          <title> $t </title>
          <year> 1995 </>
          <$e> Smith </>
        </> IN "www.a.b.c/bib.xml",
        $e IN {author, editor}
  CONSTRUCT <$p>
              <title> $t </title>
              <$e> Smith </>
            </>
  • $p matches book and article.
  • $e matches author and editor.
  • This saves us from writing four queries.
  • This finds all publications in 1995 where Smith
    is either author or editor
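A tag variable ranges over element *names*, so one query covers book/article × author/editor. A rough Python analogue over a small hypothetical bibliography (the sample data and `smith_1995` are ours), where `$p` becomes "any child of the root" and `$e` becomes a loop over the two allowed tags:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
BIB = """<bib>
  <book><title>T1</title><year>1995</year><author>Smith</author></book>
  <article><title>T2</title><year>1995</year><editor>Smith</editor></article>
  <book><title>T3</title><year>1996</year><author>Smith</author></book>
</bib>"""

def smith_1995(bib: ET.Element) -> list:
    out = []
    for p in bib:                                   # <$p>: any element tag
        if p.findtext("year", "").strip() != "1995":
            continue
        for e in ("author", "editor"):              # $e IN {author, editor}
            for child in p.findall(e):
                if (child.text or "").strip() == "Smith":
                    out.append((p.tag, p.findtext("title"), e))  # tag names are data here
    return out

print(smith_1995(ET.fromstring(BIB)))
# [('book', 'T1', 'author'), ('article', 'T2', 'editor')]
```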

121
Path Expressions
  WHERE <part> <name> $r </> <brand> Ford </> </>
        IN "www.a.b.c/parts.xml"
  CONSTRUCT <result> $r </>
part* matches any sequence of nodes all of which are
labeled part (it can substitute for part in the
above)
  WHERE <part.(subpart|component.piece)> $r </> IN
        "www.a.b.c/parts.xml"
  CONSTRUCT <result> $r </>
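A regular path expression matches a *sequence of labels* from the starting node down to the bound node. A minimal Python sketch that writes such patterns as ordinary regexes over dot-separated label paths and walks the tree; the parts document, the two patterns, and `match_path` are hypothetical. The second pattern, with one or more leading `part` labels, shows the effect of adding a Kleene-style repetition.

```python
import re
import xml.etree.ElementTree as ET

# Regular path expressions, as regexes over dot-separated label paths.
PART_DOT = re.compile(r"part\.(subpart|component\.piece)")            # part.(subpart|component.piece)
PART_MANY = re.compile(r"part(\.part)*\.(subpart|component\.piece)")  # one or more part labels first

def match_path(elem, pattern, prefix=""):
    """Yield every descendant whose label path (from elem) fully matches pattern."""
    for child in elem:
        path = prefix + "." + child.tag if prefix else child.tag
        if pattern.fullmatch(path):
            yield child
        yield from match_path(child, pattern, path)

# Hypothetical parts document.
parts = ET.fromstring(
    "<parts><part><name>engine</name>"
    "<subpart>piston</subpart>"
    "<component><piece>valve</piece></component>"
    "<part><subpart>jet</subpart></part>"
    "</part></parts>")

print([e.text for e in match_path(parts, PART_DOT)])   # ['piston', 'valve']
print([e.text for e in match_path(parts, PART_MANY)])  # ['piston', 'valve', 'jet']
```

This brute-force walk checks every path; systems that support regular path expressions typically compile them to automata and prune subtrees that can no longer match.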
122
Due 30th April
123
(No Transcript)
124
(No Transcript)