From Semistructured Data to XML - PowerPoint PPT Presentation

1
From Semistructured Data to XML
  • Dan Suciu
  • AT&T Labs
  • http://www.research.att.com/suciu/vldb99-tutorial.pdf

2
How the Web is Today
  • HTML documents
  • all intended for human consumption
  • many generated automatically by applications

Easy to fetch any Web page, from any server, any
platform
3
Limits of the Web Today
  • application cannot consume HTML
  • HTML wrapper technology is brittle
  • screen scraping
  • OO technology (CORBA) requires controlled
    environment
  • companies merge, form partnerships: need
    interoperability fast

people are inventive: send data by fax!
4
Paradigm Shift on the Web
  • new Web standard: XML
  • XML generated by applications
  • XML consumed by applications
  • data exchange
  • across platforms: enterprise interoperability
  • across enterprises

Web: from collection of documents to data and
documents
5
Database Community Can Help
  • query optimization, processing
  • views, transformations
  • data warehouses, data integration
  • mediators, query rewriting
  • secondary storage, indexes

6
But Needs a Paradigm Shift Too
  • Web data differs from database data
  • self-describing, schema-less
  • structure changes without notice
  • heterogeneous, deeply nested, irregular
  • documents and data mixed together
  • designed by document, not db experts
  • need Web data management

7
What This Tutorial is About
  • what the database community has done
  • semistructured data model
  • query languages, schemas
  • what the Web community has done
  • data formats/models: XML, RDF
  • transformation language (XSL), schemas
  • where they meet and where they differ

8
Outline
  • Semistructured data and XML
  • Query languages
  • Schemas
  • Systems issues
  • Conclusions

9
Part 1: Semistructured Data and XML
10
Semistructured Data
  • Origins
  • integration of heterogeneous sources
  • data sources with non-rigid structure
  • biological data
  • Web data

11
The Semistructured Data Model
[Figure: Object Exchange Model (OEM) graph of a bibliography. The root Bib (o1) is a complex object with paper, paper and book edges to o12, o24 and o29; further edges are labeled author, title, year, publisher, references, page, http, firstname, lastname, first and last; the atomic objects at the leaves hold values such as Serge, Abiteboul, Victor, Vianu, 1997, 122 and 133.]
Object Exchange Model (OEM)
12
Syntax for Semistructured Data
    Bib: &o1 { paper: &o12 { ... },
               book: &o24 { ... },
               paper: &o29
                 { author: &o52 "Abiteboul",
                   author: &o96 { firstname: &243 "Victor",
                                  lastname: &o206 "Vianu" },
                   title: &o93 "Regular path queries with constraints",
                   references: &o12,
                   references: &o24,
                   pages: &o25 { first: &o64 122, last: &o92 133 }
                 }
             }

13
Syntax for Semistructured Data
  • May omit oids

    { paper: { author: "Abiteboul",
               author: { firstname: "Victor",
                         lastname: "Vianu" },
               title: "Regular path queries ...",
               page: { first: 122, last: 133 } } }
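
One way to picture this label-value syntax inside a program is sketched below in Python. The encoding (lists of (label, value) pairs) and the helper name `children` are assumptions made for illustration, not part of the tutorial; lists are used instead of dicts so that a label such as author can repeat.

```python
# A semistructured object is either an atomic value or a list of
# (label, value) pairs. This encoding is an illustrative assumption.
paper = [
    ("author", "Abiteboul"),
    ("author", [("firstname", "Victor"), ("lastname", "Vianu")]),
    ("title", "Regular path queries ..."),
    ("page", [("first", 122), ("last", 133)]),
]
db = [("paper", paper)]


def children(obj, label):
    """Return every value reachable from obj over one edge with this label."""
    if not isinstance(obj, list):
        return []  # atomic objects have no outgoing edges
    return [value for (lab, value) in obj if lab == label]


authors = children(paper, "author")
print(len(authors))                      # the two author edges
print(children(authors[1], "lastname"))  # lastname of the second author
```

Note that the two author values have different types (a string and a complex object), which is the kind of irregularity the model is meant to tolerate.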

14
Characteristics of Semistructured Data
  • missing or additional attributes
  • multiple attributes
  • different types in different objects
  • heterogeneous collections

self-describing, irregular data, no a priori
structure
15
Comparison with Relational Data
    { row: { name: "John", phone: 3634 },
      row: { name: "Sue", phone: 6343 },
      row: { name: "Dick", phone: 6363 }
    }

16
XML
  • a W3C standard to complement HTML
  • origins: structured text, SGML
  • motivation
  • HTML describes presentation
  • XML describes content
  • http://www.w3.org/TR/REC-xml (2/98)

17
From HTML to XML
HTML describes the presentation
18
HTML
    <h1> Bibliography </h1>
    <p> <i> Foundations of Databases </i>
        Abiteboul, Hull, Vianu
        <br> Addison Wesley, 1995
    <p> <i> Data on the Web </i>
        Abiteboul, Buneman, Suciu
        <br> Morgan Kaufmann, 1999

19
XML
    <bibliography>
      <book> <title> Foundations </title>
             <author> Abiteboul </author>
             <author> Hull </author>
             <author> Vianu </author>
             <publisher> Addison Wesley </publisher>
             <year> 1995 </year>
      </book>
    </bibliography>

XML describes the content
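
Because the tags name the content, an application can consume the bibliography directly. A minimal sketch in Python, using the standard library's `xml.etree.ElementTree` (the choice of library is an assumption made here; the tutorial does not prescribe one):

```python
import xml.etree.ElementTree as ET

doc = """
<bibliography>
  <book> <title> Foundations </title>
         <author> Abiteboul </author>
         <author> Hull </author>
         <author> Vianu </author>
         <publisher> Addison Wesley </publisher>
         <year> 1995 </year>
  </book>
</bibliography>
"""

root = ET.fromstring(doc)
for book in root.findall("book"):
    # Each piece of content is addressed by its tag, not by its layout.
    title = book.findtext("title").strip()
    authors = [a.text.strip() for a in book.findall("author")]
    year = int(book.findtext("year"))
    print(title, authors, year)
```

Doing the same against the HTML version would mean guessing which italic run is a title and which line holds the authors.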
20
XML Terminology
  • tags: book, title, author, ...
  • start tag: <book>, end tag: </book>
  • elements: <book>...</book>, <author>...</author>
  • elements are nested
  • empty element: <red></red>, abbreviated <red/>
  • an XML document: single root element

well-formed XML document: if it has matching tags
21
More XML Attributes
    <book price = "55" currency = "USD">
      <title> Foundations of Databases </title>
      <author> Abiteboul </author>
      <year> 1995 </year>
    </book>

attributes are alternative ways to represent data
22
More XML Oids and References
    <person id="o555"> <name> Jane </name> </person>
    <person id="o456"> <name> Mary </name>
                       <children idref="o123 o555"/>
    </person>
    <person id="o123" mother="o456"><name>John</name>
    </person>

oids and references in XML are just syntax
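
Since the references are just syntax, a program has to resolve them itself. The sketch below does so with a dictionary keyed on the id attribute; the `<people>` wrapper element and the helper `name` are assumptions added here so that the fragment has a single root and stays short.

```python
import xml.etree.ElementTree as ET

# The <people> wrapper is hypothetical: an XML document needs one root element.
doc = """
<people>
  <person id="o555"> <name> Jane </name> </person>
  <person id="o456"> <name> Mary </name>
                     <children idref="o123 o555"/>
  </person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>
"""

root = ET.fromstring(doc)
# The parser does not follow references; build the oid -> element map by hand.
by_id = {p.get("id"): p for p in root.iter("person")}


def name(person):
    return person.findtext("name").strip()


mary = by_id["o456"]
kids = [name(by_id[ref]) for ref in mary.find("children").get("idref").split()]
john_mother = name(by_id[by_id["o123"].get("mother")])
print(kids, john_mother)
```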
23
XML Data Model
  • does not exist
  • Document Object Model (DOM)
  • http://www.w3.org/TR/REC-DOM-Level-1 (10/98)
  • class hierarchy (node, element, attribute, ...)
  • objects have behavior
  • defines API to inspect/modify the document

24
XML Parsers
  • traditional: return data structure (DOM?)
  • event based: SAX (Simple API for XML)
  • http://www.megginson.com/SAX
  • write handler for start tag and for end tag
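
A minimal sketch of the event-based style, using Python's standard `xml.sax` module. The handler class `TagRecorder` and the sample document are assumptions for illustration; the parser calls the start-tag and end-tag handlers as it reads, without building a tree.

```python
import xml.sax


class TagRecorder(xml.sax.ContentHandler):
    """Records one event per start tag and per end tag, with nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, self.depth))
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1
        self.events.append(("end", name, self.depth))


handler = TagRecorder()
xml.sax.parseString(
    b"<book><title>Foundations</title><year>1995</year></book>", handler
)
for event in handler.events:
    print(event)
```

A DOM-style parser would instead return the whole document as a data structure, which is simpler to navigate but keeps everything in memory.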

25
XML Namespaces
  • http://www.w3.org/TR/REC-xml-names (1/99)
  • name ::= [prefix:]localpart

    <book xmlns:isbn="www.isbn-org.org/def">
      <title> ... </title>
      <number> 15 </number>
      <isbn:number> ... </isbn:number>
    </book>

26
XML Namespaces
  • syntactic: <number>, <isbn:number>
  • semantic: provide URL for schema

    <tag xmlns:mystyle = "http://...">
      <mystyle:title> ... </mystyle:title>
      <mystyle:number> ...
    </tag>
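
The syntactic role of a namespace can be seen in how a parser reports the two number elements. A sketch with Python's `xml.etree.ElementTree`; the element contents ("some title", "some-isbn") are placeholders invented here, since the slide elides them.

```python
import xml.etree.ElementTree as ET

# Element contents are hypothetical placeholders.
doc = """
<book xmlns:isbn="www.isbn-org.org/def">
  <title> some title </title>
  <number> 15 </number>
  <isbn:number> some-isbn </isbn:number>
</book>
"""

root = ET.fromstring(doc)
ns = {"isbn": "www.isbn-org.org/def"}

plain = root.find("number")               # the unqualified <number>
qualified = root.find("isbn:number", ns)  # the <isbn:number> element

# ElementTree expands the prefix into the namespace name,
# so the two elements can no longer be confused.
print(plain.tag)
print(qualified.tag)
```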

27
XML v.s. Semistructured Data
  • both described best by a graph
  • both are schema-less, self-describing

28
Similarities and Differences
    <person id="o123">
      <name> Alan </name>
      <age> 42 </age>
      <email> ab@com </email>
    </person>

    { person: &o123
        { name: "Alan",
          age: 42,
          email: "ab@com" }
    }

similar on trees, different on graphs
29
More Differences
  • XML is ordered, ssd is not
  • XML can mix text and elements
    <talk> Making Java easier to type and easier
           to type
      <speaker> Phil Wadler </speaker>
    </talk>
  • XML has lots of other stuff: entities, processing
    instructions, comments

30
RDF
  • http://www.w3.org/TR/REC-rdf-syntax (2/99)
  • purpose: metadata for Web
  • help search engines
  • syntax: in XML
  • semantics: edge-labeled graphs

31
RDF Syntax
    <rdf:Description about="www.mypage.com">
      <about> birds, butterflies, snakes </about>
      <author> <rdf:Description>
                 <firstname> John </firstname>
                 <lastname> Smith </lastname>
               </rdf:Description>
      </author>
    </rdf:Description>

32
RDF Data Model
[Figure: RDF graph. The resource www.mypage.com has an about edge to "birds, butterflies, snakes" and an author edge to a node with firstname John and lastname Smith.]
the RDF Data Model is very close to
semistructured data
33
More RDF Examples
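[Figure: RDF graph with two resources. www.anotherpage.com has a related edge to www.mypage.com; www.mypage.com has an about edge to "birds, butterflies, snakes"; the author edges lead to Joe Doe and to a shared node with firstname John and lastname Smith.]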
34
    <rdf:Description about="www.mypage.com">
      <about> birds, butterflies, snakes </about>
      <author> <rdf:Description ID="o55">
                 <firstname> John </firstname>
                 <lastname> Smith </lastname>
               </rdf:Description> </author>
    </rdf:Description>
    <rdf:Description about="www.anotherpage.com">
      <related> <rdf:Description about="www.mypage.com"/> </related>
      <author rdf:resource="o55"/>
      <author> Joe Doe </author>
    </rdf:Description>
35
RDF Terminology
statement
36
More RDF Containers
  • bag, sequence, alternative
    <rdf:Description> <a> <rdf:Bag>
                            <rdf:li> s1 </rdf:li>
                            <rdf:li> s2 </rdf:li>
                          </rdf:Bag>
                      </a>
    </rdf:Description>

37
RDF Containers (contd)
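[Figure: RDF graph for the container. An a edge leads to a node with rdf:type Bag, whose rdf:_1 and rdf:_2 edges lead to s1 and s2.]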
38
More RDF Higher Order Statements
  • the author of www.thispage.com says the topic
    of www.thatpage.com is environment

RDF uses reification
39
Summary of Data Models
  • semistructured data, XML, RDF
  • data is self-describing, irregular
  • schema embedded in the data

40
Part 2: Query Languages
  • Semistructured data and XML
  • Query languages
  • Schemas
  • Systems issues
  • Conclusions

41
Query Languages Motivation
  • granularity of the HTML Web: one file
  • granularity of Web data varies
  • single data item: get John's salary
  • entire database: get all salaries
  • aggregates: get average salary
  • need query language to define granularity

42
Query Languages Outline
  • for semistructured data
  • Lorel
  • UnQL
  • StruQL
  • for XML: XML-QL
  • a different paradigm
  • structural recursion
  • XSL

43
Lorel
  • part of the Lore system (Stanford)
  • adapts OQL to semistructured data

example

    select X.title
    from Bib.paper X
    where X.year > 1995

abbreviated to

    select Bib.paper.title
    from Bib.paper
    where Bib.paper.year > 1995
44
Lorel v.s. OQL
  • implicit coercions: 1995 to "1995"
  • missing attributes
  • empty answer v.s. type error
  • set-valued attributes
  • in X.year > 1995, X may have several years
  • regular path expressions (next)
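
The three behaviors above (coercion, missing attributes, set-valued attributes) can be imitated in a few lines. This is only a sketch of the semantics in Python, not Lorel itself; the sample papers, the pair-list encoding and the function names are assumptions.

```python
def as_number(value):
    """Coerce a value to a number if possible, else None (no type error)."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def titles_after(papers, year):
    """Mimics: select X.title from Bib.paper X where X.year > year."""
    result = []
    for paper in papers:
        years = [as_number(v) for (label, v) in paper if label == "year"]
        # Existential semantics: one qualifying year is enough.
        if any(y is not None and y > year for y in years):
            result.extend(v for (label, v) in paper if label == "title")
    return result


papers = [
    [("title", "A"), ("year", 1997)],
    [("title", "B"), ("year", "1998")],                # string coerced to number
    [("title", "C")],                                  # missing year: skipped
    [("title", "D"), ("year", 1990), ("year", 1996)],  # several years
    [("title", "E"), ("year", 1994)],
]
print(titles_after(papers, 1995))
```

In a strictly typed OQL setting, papers B and C would raise type errors rather than being handled quietly.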

45
Regular Path Expressions
    select X.title
    from Bib.paper X, Bib.(paper|book) Y
    where Y.author.lastname? = "Ullman" and Y.reference* = X

  • Useful for
  • syntactic substitute for inheritance: paper|book
  • navigating partially known structures: lastname?
  • transitive closure: reference*

46
UnQL
  • Unstructured Query Language
  • patterns, templates, structural recursion
  • patterns

    select T
    where Bib.paper: {title: T, year: Y, journal: "TODS"}
      and Y > 1995
47
UnQL Templates
    select result: {fn: F, ln: L, pub: {title: T, year: Y}}
    where Bib.paper: {title: T, year: Y, journal: "TODS"}
      and Y > 1995

Result looks like

    {result: {fn: "John", ln: "Smith",
              pub: {title: "P equals NP", year: 2005}},
     result: {fn: "Joe", ln: "Doe",
              pub: {title: "Errata to P=NP", year: 2006}}}
48
Skolem Functions
  • Maier, 1986
  • in OO systems
  • Kifer et al, 1989
  • F-logic
  • Hull and Yoshikawa, 1990
  • deductive db (ILOG)
  • Papakonstantinou et al., 1996
  • semistructured db (MSL)
  • illustrate with Strudel (next)

49
Skolem Functions in StruQL
  • Strudel: a Web Site Management System
  • StruQL: its query language

50
Example Bibliography Data
    { Bib: { paper: { author: "Jones",
                      author: "Smith",
                      title: "The Comma",
                      year: 1994
                    },
             paper: { ... }
           }
    }

51
Example A Complex Web Site
Root()
52
Example Skolem Functions in StruQL
    where  Root -> Bib -> X, X -> paper -> P,
           P -> author -> A, P -> title -> T, P -> year -> Y
    create Root(), HomePage(A), YearPage(A,Y), PubPage(P)
    link   Root() -> person -> HomePage(A),
           HomePage(A) -> yearentry -> YearPage(A,Y),
           YearPage(A,Y) -> publication -> PubPage(P),
           PubPage(P) -> author -> HomePage(A),
           PubPage(P) -> title -> T
53
XML-QL A Query Language for XML
  • http://www.w3.org/TR/NOTE-xml-ql (8/98)
  • features
  • regular path expressions
  • patterns, templates
  • Skolem Functions
  • based on OEM data model

54
Pattern Matching in XML-QL
    where <book language="french">
            <publisher> <name> Morgan Kaufmann </name> </publisher>
            <author> $a </author>
          </book> in "www.a.b.c/bib.xml"
    construct $a
55
Simple Constructors in XML-QL
    where <book language=$l>
            <author> $a </> </> in "www.a.b.c/bib.xml"
    construct <result> <author> $a </> <lang> $l </> </>

  • Note: </> abbreviates </book> or </result> or ...

    <result> <author>Smith</author><lang>English</lang></result>
    <result> <author>Smith</author><lang>Mandarin</lang></result>
    <result> <author>Doe</author><lang>English</lang></result>
56
Skolem Functions in XML-QL
    where <book language=$l> <author> $a </> </>
          in "www.a.b.c/bib.xml"
    construct <result> <author id=F($a)> $a </>
                       <lang> $l </>
              </>

    <result> <author>Smith</author>
             <lang>English</lang> <lang>Mandarin</lang>
    </result>
    <result> <author>Doe</author>
             <lang>English</lang> </result>
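
The grouping effect of the Skolem function can be sketched in Python: the same argument always yields the same identifier, so bindings that share an author land in one result. The bindings list and the dictionary-based encoding are assumptions for illustration, not how an XML-QL engine is implemented.

```python
# Variable bindings produced by the where clause: one (a, l) pair per match.
bindings = [("Smith", "English"), ("Smith", "Mandarin"), ("Doe", "English")]


def F(author):
    """Skolem function: a deterministic identifier built from its argument."""
    return ("F", author)


results = {}  # Skolem identifier -> constructed result
for a, l in bindings:
    oid = F(a)
    # Created on the first binding for this oid, reused on later ones.
    node = results.setdefault(oid, {"author": a, "lang": []})
    node["lang"].append(l)

for node in results.values():
    print(node)
```

Without the Skolem function each binding would create a fresh result, giving the three separate results of the previous slide.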
57
A Different Paradigm Structural Recursion
  • Data as sets with a union operator
  • {a:3, a:{b:"one", c:5}, b:4}
  • = {a:3} U {a:{b:"one", c:5}} U {b:4}

58
Structural Recursion
  • Example retrieve all integers in the data

    f(T1 U T2) = f(T1) U f(T2)
    f({L: T})  = f(T)
    f({})      = {}
    f(V)       = if isInt(V) then {result: V} else {}
standard textbook programming on trees
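
A direct transcription of this recursion into Python, as a sketch. The encoding of a tree as a list of (label, subtree) pairs, where the list plays the role of the union of its singleton edges, is an assumption made here.

```python
def f(t):
    """Collect all integers in the data as (result, V) edges."""
    if isinstance(t, list):
        # t = T1 U T2 U ...: recurse into each {L: T} and take the union.
        out = []
        for (_label, subtree) in t:
            out.extend(f(subtree))  # f({L: T}) = f(T): the label is dropped
        return out                  # the empty list covers f({}) = {}
    if isinstance(t, int) and not isinstance(t, bool):
        return [("result", t)]      # f(V) for an integer V
    return []                       # f(V) for any other atomic value


# The data from the preceding slide: {a:3, a:{b:"one", c:5}, b:4}
data = [("a", 3), ("a", [("b", "one"), ("c", 5)]), ("b", 4)]
print(f(data))
```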
59
Structural Recursion
  • Example increase all engine prices by 10

60
XSL
  • two W3C drafts: XSLT and XPath
  • http://www.w3.org/TR/xpath, 7/99
  • http://www.w3.org/TR/WD-xslt, 7/99
  • in commercial products (e.g. IE5.0)
  • purpose: stylesheet specification language
  • stylesheet: XML -> HTML
  • in general: XML -> XML

61
XSL Templates and Rules
  • query = collection of template rules
  • template rule = match pattern + template

62
XPath Expressions in Match Patterns
  • bib matches a bib element
  • * matches any element
  • / matches the root element
  • /bib matches a bib element under root
  • bib/paper matches a paper in bib
  • bib//paper matches a paper in bib, at any depth
  • //paper matches a paper at any depth
  • paper|book matches a paper or a book
  • @price matches a price attribute
  • bib/book/@price matches price attribute in book,
    in bib
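
Several of these patterns can be tried out with the limited XPath subset in Python's `xml.etree.ElementTree`. The sample document is an assumption. Note that ElementTree paths are evaluated relative to an element (here the bib root) and that the subset supports neither the `|` operator nor selecting attribute nodes, so the price is read with `get` after filtering on `[@price]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for illustration.
root = ET.fromstring("""
<bib>
  <paper><title>P1</title></paper>
  <book price="55"><title>B1</title>
    <chapter><paper><title>P2</title></paper></chapter>
  </book>
</bib>
""")

# bib/paper: papers directly under bib
direct = [p.findtext("title") for p in root.findall("paper")]
# bib//paper: papers under bib at any depth
any_depth = [p.findtext("title") for p in root.findall(".//paper")]
# bib/*: any element directly under bib
child_tags = [e.tag for e in root.findall("*")]
# bib/book/@price: books carrying a price attribute, then read it
prices = [b.get("price") for b in root.findall("book[@price]")]

print(direct, any_depth, child_tags, prices)
```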

63
Flow Control in XSL
    <xsl:template> <xsl:apply-templates/> </xsl:template>

    <xsl:template match="a">
      <A><xsl:apply-templates/></A> </xsl:template>

    <xsl:template match="b">
      <B><xsl:apply-templates/></B> </xsl:template>

    <xsl:template match="c">
      <C><xsl:value-of/></C> </xsl:template>
64
    <a> <e> <b> <c> 1 </c>
                <c> 2 </c>
            </b>
            <a> <c> 3 </c>
            </a>
        </e>
        <c> 4 </c>
    </a>

    <A> <B> <C> 1 </C>
            <C> 2 </C>
        </B>
        <A> <C> 3 </C>
        </A>
        <C> 4 </C>
    </A>

65
XSL is Structural Recursion
  • Equivalent to

    f(T1 U T2) = f(T1) U f(T2)
    f({L: T})  = if L = c then {C: T}
                 else if L = b then {B: f(T)}
                 else if L = a then {A: f(T)}
                 else f(T)
    f({})      = {}
    f(V)       = V

XSL query = single function
XSL query with modes = multiple functions
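
The same function can be written over parsed XML. The sketch below, in Python with `xml.etree.ElementTree`, applies it to the a/e/b/c example tree; it is an illustration of the recursion, not an XSL processor, and it assumes that text outside c elements can be ignored.

```python
import xml.etree.ElementTree as ET

RENAME = {"a": "A", "b": "B"}


def f(elem):
    """Return the list of output elements produced for one input element."""
    if elem.tag == "c":
        out = ET.Element("C")          # rule for c: copy the value
        out.text = (elem.text or "").strip()
        return [out]
    kids = [o for child in elem for o in f(child)]  # union of f over children
    if elem.tag in RENAME:
        out = ET.Element(RENAME[elem.tag])          # rules for a and b
        out.extend(kids)
        return [out]
    return kids                        # default rule: drop the tag, keep going


src = ET.fromstring(
    "<a><e><b><c>1</c><c>2</c></b><a><c>3</c></a></e><c>4</c></a>"
)
(result,) = f(src)
print(ET.tostring(result, encoding="unicode"))
```

The e element disappears from the output while its children are still processed, which is the effect of the first (default) template rule.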
66
XSL and Structural Recursion
  • XSL
  • trees only
  • may loop
  • Structural Recursion
  • arbitrary graphs
  • always terminates

add the following rule
    <xsl:template match = "e">
      <xsl:apply-patterns select="/"/> </xsl:template>
stack overflow on IE 5.0
67
Summary of Query Languages
  • studied extensively in semistructured data
  • some quite powerful features
  • no standard for XML QL yet (WG soon)
  • XSL available today (for stylesheets)
  • XSL = structural recursion

68
Part 3: Schemas
  • Semistructured data and XML
  • Query languages
  • Schemas
  • Systems issues
  • Conclusions

69
Schemas
here lies our interest
  • why?
  • XML: to describe semantics
  • semistructured data: to improve processing
  • what?
  • semistructured data: foundational
  • XML: several concrete proposals

70
Schemas
  • when?
  • semistructured data, XML: a posteriori
  • RDBMS: a priori, to interpret binary data
  • how?
  • semistructured data: schema is independent
  • XML: schema is hardwired with the data

71
Outline
  • schemas for semistructured data
  • foundations
  • schema extraction
  • schemas for XML
  • DTD
  • XML-Schema
  • RDF-Schema

72
Schemas An Example
Some database
73
Lower-Bound Schemas
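[Figure: lower-bound schema graph. Root has person and company edges to the classes Employee and Company; the remaining edges are labeled works-for, managed-by, c.e.o., name, address and name, ending in string.]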
74
Upper Bound Schemas
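[Figure: upper-bound schema graph. Root has person and company edges to the classes Employee and Company; the remaining edges are labeled works-for, managed-by, c.e.o., employee, name, address, url, name, phone, position and description, ending in string and Any.]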
75
The Two Questions to Ask
  • Conformance: does that data conform to this
    schema?
  • Classification: if so, then which objects belong
    to what classes?

76
Graph Simulation
  • Definition: given two edge-labeled graphs G1, G2
  • A simulation is a relation R between nodes such that
  • if (x1, x2) in R, and (x1,a,y1) in G1,
  • then there exists (x2,a,y2) in G2 (same label)
  • s.t. (y1,y2) in R

Note: a simulation can be efficiently computed
[Henzinger et al. 1995]
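
The definition translates into a simple fixpoint computation: start from all pairs of nodes and discard pairs that violate the condition until nothing changes. The Python sketch below is this naive version, not the efficient algorithm cited above; the graph encoding (sets of (source, label, target) triples) and the sample data and schema are assumptions.

```python
def simulation(g1, g2):
    """Largest simulation from g1 to g2; graphs are sets of (x, label, y)."""
    nodes1 = {n for (x, _, y) in g1 for n in (x, y)}
    nodes2 = {n for (x, _, y) in g2 for n in (x, y)}
    rel = {(x1, x2) for x1 in nodes1 for x2 in nodes2}
    changed = True
    while changed:
        changed = False
        for (x1, x2) in list(rel):
            for (s, a, y1) in g1:
                if s != x1:
                    continue
                # Every edge (x1,a,y1) needs a matching edge (x2,a,y2)
                # with (y1,y2) still in the relation.
                if not any(t == x2 and b == a and (y1, y2) in rel
                           for (t, b, y2) in g2):
                    rel.discard((x1, x2))
                    changed = True
                    break
    return rel


# Hypothetical data graph D and upper-bound schema S.
D = {("r", "paper", "p1"), ("p1", "title", "t1"),
     ("r", "paper", "p2"), ("p2", "title", "t2"), ("p2", "year", "y2")}
S = {("Root", "paper", "Paper"),
     ("Paper", "title", "Str"), ("Paper", "year", "Str")}

R = simulation(D, S)
print(("r", "Root") in R)    # D conforms to S
print(("p1", "Paper") in R)  # p1 is classified as a Paper
```

Adding an edge the schema does not allow, such as a price on p1, removes (r, Root) from the result, so the data no longer conforms.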
77
Using Simulation
  • Data graph D, schema S
  • upper bound schema
  • conformance: find simulation R from D to S
  • classification: check if (x,c) in R
  • lower bound schema
  • conformance: find simulation R from S to D
  • classification: check if (c,x) in R

[Buneman et al. 1997]
78
Example
Database
Lower Bound
Upper Bound
simulation efficient technique for checking
conformance to schema
79
Application 1 Improve Secondary Storage
Lower-bound schema
Store rest in overflow graph
80
Application 2 Query Optimization
    select X.title
    from Bib._ X
    where X.*.zip = 12345

    select X.title
    from Bib.book X
    where X.address.zip = 12345

Upper-bound schema
[Fernandez, Suciu 1998]
81
Schema Extraction(From Data)
  • Problem statement
  • given data instance D
  • find the most specific schema S for D
  • In practice S too large, need to relax

Nestorov et al. 1998
82
Schema Extraction Sample Data
[Figure: sample data graph. The root r has employee edges to p1, ..., p8 and a company edge to c; the employees are connected by manages and managedby edges, and each employee has a worksfor edge to c.]
83
Lower Bound Schema Extraction
[Figure: lower-bound schema extracted from the sample data. Classes: Root = {r}, Bosses = {p1,p4,p6}, Regulars = {p2,p3,p5,p7,p8}, Company = {c}; edges are labeled employee, company, manages, managedby and worksfor.]
84
Upper Bound Schema Extraction Data Guides
[Figure: upper-bound schema (data guide) extracted from the sample data. Classes: Root = {r}, Employees = {p1,p2,p3,p4,p5,p6,p7,p8}, Bosses = {p1,p4,p6}, Regulars = {p2,p3,p5,p7,p8}, Company = {c}; edges are labeled employee, company, manages, managedby and worksfor.]
85
Schemas in XML
  • Document Type Definition (DTD)
  • XML Schema
  • RDF Schema

86
Document Type Definition (DTD)
  • part of the original XML specification
  • an XML document may have a DTD
  • terminology for XML
  • well-formed: if tags are correctly closed
  • valid: if it has a DTD and conforms to it
  • validation is useful in data exchange

87
DTDs as Grammars
    <!DOCTYPE paper [
      <!ELEMENT paper (section*)>
      <!ELEMENT section ((title,section*) | text)>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT text (#PCDATA)>
    ]>

    <paper> <section> <text> </text> </section>
            <section> <title> </title> <section> ... </section>
                                       <section> ... </section>
            </section>
    </paper>
88
DTDs as Schemas
  • Not so well suited
  • impose unwanted constraints on order
    <!ELEMENT person (name,phone)>
  • references cannot be constrained
  • can be too vague
  • <!ELEMENT person ((name|phone|email)*)>

89
XML Schemas
  • very recent proposal
  • unifies previous schema proposals
  • generalizes DTDs
  • uses XML syntax
  • two documents: structure and datatypes
  • http://www.w3.org/TR/xmlschema-1
  • http://www.w3.org/TR/xmlschema-2

90
XML Schemas
    <elementType name="paper">
      <sequence>
        <elementTypeRef name="title"/>
        <elementTypeRef name="author"
                        minOccurs="0"/>
        <elementTypeRef name="year"/>
        <choice> <elementTypeRef name="journal"/>
                 <elementTypeRef name="conference"/>
        </choice>
      </sequence>
    </elementType>

DTD: <!ELEMENT paper (title,author*,year,
(journal|conference))>
91
RDF Schemas
  • http://www.w3.org/TR/PR-rdf-schema (3/99)
  • object-oriented flavor

92
RDF Schemas
  • recall RDF data
  • resources
  • properties
  • RDF schema
  • classes
  • properties

statement
93
RDF Schemas
  • Data
    <rdf:Description ID="car001">
      <name> My Honda </name>
      <miles> 50000 </miles>
      <rdf:type resource="MotorVehicle"/>
    </rdf:Description>

94
RDF Schemas
  • Schema
    <rdf:Description ID="MotorVehicle">
      <rdf:type resource="Class"/>
      <rdf:subClassOf resource="Resource"/>
    </rdf:Description>
    <rdf:Description ID="Truck">
      <rdf:type resource="Class"/>
      <rdf:subClassOf resource="MotorVehicle"/>
    </rdf:Description>

95
RDF Schemas
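[Figure: RDF graph combining data and schema. car001 has name My Honda, miles 50000 and type MotorVehicle; Truck is a subClassOf MotorVehicle; MotorVehicle and Truck have type Class.]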
96
RDF Schemas
  • different from object-oriented systems
  • OO: define a class by its set of properties
  • RDF: define a property in terms of its classes
  • metadata in RDF
  • an RDF schema is itself described as RDF data

97
Summary of Schemas
  • in SS data
  • graph theoretic
  • data and schema are decoupled
  • used in data processing
  • in XML
  • from grammar to object-oriented
  • schema wired with the data
  • emphasis on semantics for exchange

98
Part 4: Systems Issues
  • Semistructured data and XML
  • Query languages
  • Schemas
  • Systems issues
  • Conclusions

99
Systems Issues
  • servers
  • mediators

100
Servers for Semistructured Data / XML
  • storage
  • index
  • query evaluation [McHugh, Widom 1999]

101
XML Storage
  • text file (XML)
  • store in ternary relation
  • use DTD to derive schema
  • mine data to derive schema
  • build special purpose repository (Lore)

102
XML Storage Text File
  • advantages
  • simple
  • less space than one thinks
  • reasonable clustering
  • disadvantage
  • no updates
  • require special purpose query processor

103
Store XML in Ternary Relation
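[Figure: a document tree stored as a ternary relation. A paper edge leads from o1 to o2; o2 has year, title, author and author edges to o3, o4, o5 and o6; the atomic values shown are The Calculus and 1986.]
  • [Florescu, Kossmann 1999]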

104
Use DTD to derive Schema
  • DTD
  • ODMG classes
  • [Christophides et al. 1994], [Shanmugasundaram et
    al. 1999]

    <!ELEMENT employee (name, address, project*)>
    <!ELEMENT address (street, city, state, zip)>

    class Employee public type tuple (name:string,
        address:Address, project:List(Project))
    class Address public type tuple (street:string, ...)
105
Mine Data to Derive Schema
Deutsch et al. 1999
106
Indexing Semistructured Data
  • coercions: 1995 v.s. "1995"
  • regular path expressions
  • data guides [Goldman, Widom 1997]
  • T-indexes [Milo, Suciu 1999]

107
Indexing All Paths in the Data
108
Mediators for Semistructured Data / XML
  • XML: virtual view of Relational/OO/OR sources
  • mediator: translation, integration
  • issues
  • query composition and rewriting [Papakonstantinou
    et al. 1996]
  • limited source capabilities [Yerneni et al.
    1999]

109
Example An XML Mediator
  • relational database
  • virtual XML view

    <store> <name> n1 </name>
            <book> ... </book>
            <book> ... </book>
            ...
    </store>
    <store> <name> n2 </name>
            <book> ... </book>
            <book> ... </book>
    </store>
110
Example An XML Mediator
  • specify mediator declaratively (a view)

    from Store, SB, Book
    where Store.sid=SB.sid and SB.bid=Book.bid
    construct <store ID=f(Store.sid)>
                <name> Store.name </name>
                <book> Book.title </book>
              </store>
111
Example An XML Mediator
  • users ask XML-QL queries
  • find stores who sell The Calculus

    where <store> <name> $n </name>
                  <book> The Calculus </book>
          </store>
    construct <result> $n </result>
112
Example An XML Mediator
  • system composes query with view

    from Store, SB, Book
    where Store.sid=SB.sid and SB.bid=Book.bid
      and Book.title="The Calculus"
    construct <result> Store.name </result>
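
The composed query is an ordinary relational join whose rows are wrapped in XML. A sketch of that last step in Python with the standard `sqlite3` module; the table contents and column types are assumptions, while the table and column names follow the view definition above.

```python
import sqlite3
from xml.sax.saxutils import escape

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows for the three source relations.
conn.executescript("""
CREATE TABLE Store(sid INTEGER, name TEXT);
CREATE TABLE Book(bid INTEGER, title TEXT);
CREATE TABLE SB(sid INTEGER, bid INTEGER);
INSERT INTO Store VALUES (1, 'n1'), (2, 'n2');
INSERT INTO Book VALUES (10, 'The Calculus'), (11, 'Other');
INSERT INTO SB VALUES (1, 10), (1, 11), (2, 11);
""")

# The composed query: the user's condition pushed down into the view's join.
rows = conn.execute(
    """
    SELECT Store.name
    FROM Store, SB, Book
    WHERE Store.sid = SB.sid AND SB.bid = Book.bid AND Book.title = ?
    """,
    ("The Calculus",),
).fetchall()

# The construct clause: wrap each answer in a <result> element.
results = ["<result> %s </result>" % escape(name) for (name,) in rows]
print(results)
```

The virtual XML view is never materialized; only the matching store names leave the relational source.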
113
Summary of Systems
  • unclear today how XML will be used
  • materialized ? Need servers
  • virtual ? Need mediators
  • most work is still ahead

114
Part 5: Conclusions
  • Semistructured data and XML
  • Query languages
  • Schemas
  • Systems issues
  • Conclusions

115
Summary
  • XML: what is out there
  • semistructured data: what we can process
  • paradigm shift, for both Web and db
  • covered in tutorial
  • data models, queries, schemas

116
Current and Future Technologies
  • Web applications possible today
  • export relational data to XML (e.g. Oracle)
  • import XML directly into applications
  • Web applications in the future
  • mediator technology (XML view)
  • store/process native XML data
  • compress XML
  • mine/analyze XML

117
Why This Is Cool for Database Researchers
  • put to work what you teach in CS101 !
  • tree traversals (structural recursion, XSL)
  • automata theory (DTDs, path expressions)
  • graph theory (simulation)
  • adapt old DB tricks to new kind of data
  • save the trees: from fax to XML

The End
118
Further Readings
  • www.w3.org/XML
  • www-db.stanford.edu/widom
  • www-rocq.inria.fr/abiteboul
  • db.cis.upenn.edu
  • www.research.att.com/suciu
  • Abiteboul, Buneman, Suciu
  • Data on the Web: From Relational to
    Semistructured to XML
  • Morgan Kaufmann, 1999 (appears in October)