Title: Managing XML and Semistructured Data
1 Managing XML and Semistructured Data
Lecture 9: Query Languages - StruQL and XSL
Prof. Dan Suciu
Spring 2001
2 In this lecture
- Website management with Strudel
- Background on Skolem functions
- Skolem functions in StruQL
- Structural recursion
- XSL
- Resources
  - Catching the Boat with Strudel. VLDBJ 2001
  - UnQL: A Query Language and Algebra for Semistructured Data Based on Structural Recursion. Buneman, Fernandez, Suciu. VLDBJ 2000
  - Data on the Web. Abiteboul, Buneman, Suciu. Sections 5.2, 6.4, 6.5
3 Strudel and StruQL
- Strudel: a Website management tool
- Idea: separate the following three tasks
  - Management of data
    - use some database
  - Management of the site's structure
    - use StruQL
  - Management of the site's presentation
    - use HTML templates (this was before XML...)
4 Example: Bibliography Data
Input data:
{ Bib: { paper: { author: "Jones",
                  author: "Smith",
                  title: "The Comma",
                  year: 1994 },
         paper: { author: "Jones",
                  title: "The Dot",
                  year: 1998 },
         paper: { author: "Mark",
                  ... },
         ... } }
[Figure: the input data drawn as a tree; paper edges lead to author, title, and year edges, with leaves such as Jones, Smith, and The Comma.]
5 Simple Website Definition in StruQL
StruQL query:
WHERE  Root -> Bib.paper.author -> A
CREATE Root(), HomePage(A)
LINK   Root() -> person -> HomePage(A),
       HomePage(A) -> name -> A,
       HomePage(A) -> home -> Root()
Result:
[Figure: the result graph; each home page has a name edge to its author (Smith, Jones, Mark) and a home edge.]
Root() and HomePage(A) are Skolem functions (more later).
6 Complex Website Definition in StruQL
WHERE  Root -> Bib -> X, X -> paper -> P,
       P -> author -> A, P -> title -> T,
       P -> year -> Y
CREATE Root(), HomePage(A), YearPage(A,Y), PubPage(P)
LINK   Root() -> person -> HomePage(A),
       HomePage(A) -> yearentry -> YearPage(A,Y),
       YearPage(A,Y) -> publication -> PubPage(P),
       PubPage(P) -> author -> HomePage(A),
       PubPage(P) -> title -> T
7 Example: A Complex Web Site
[Figure: the generated web site; the visible leaf labels are The Comma and The Dot.]
8 Skolem Functions
- Maier, 1986: in OO systems
- Kifer et al., 1989: F-logic
- Hull and Yoshikawa, 1990: deductive db (ILOG)
- Papakonstantinou et al., 1996: semistructured db (MSL)
9 Skolem Functions in Logic
- Origins: First Order Logic
- The satisfiability problem: given a formula φ, does it have a model?
10 Skolem Functions in Logic
- Example: does φ have a model?
- Skolem functions: replace ∃ with functions, drop ∀
- Fact: φ has a model iff φ' has a model
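The example formula from the slide is not preserved in this transcript. As an illustration only (the predicate P and the function symbols f and g are made up here), Skolemizing a formula with alternating quantifiers might look like this: each existentially quantified variable becomes a function of the universally quantified variables that precede it, and the universal quantifiers are then left implicit.

```latex
\varphi \;=\; \forall x\, \exists y\, \forall z\, \exists u\; P(x, y, z, u)
\qquad\leadsto\qquad
\varphi' \;=\; P\bigl(x,\; f(x),\; z,\; g(x, z)\bigr)
```

Here φ and φ' are not equivalent, but one has a model exactly when the other does.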
11 Skolem Functions in Databases
Answer(title, author) :- Paper(author, title, year)
12 Skolem Functions in Databases
- Now consider:
- I want to create a new object x. What meaning?
Answer(author, x) :- Paper(author, title, year)
13 Skolem Functions in Databases
- Better: use Skolem functions directly in Datalog
- Choices:
Answer(author, NewObj(author)) :- Paper(author, title, year)
Answer(author, NewObj(author,title)) :- Paper(author, title, year)
Answer(author, NewObj(title,year)) :- Paper(author, title, year)
Answer(author, NewObj()) :- Paper(author, title, year)
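The four choices differ in how many new objects they create. A minimal Python sketch of that difference, assuming a Skolem function can be modeled as a memo table from argument tuples to fresh object ids (the class `Skolem`, the helper `answer`, and the three-row `paper` relation are illustrative, taken loosely from the bibliography example):

```python
class Skolem:
    """A Skolem function: equal arguments always yield the same new object."""

    def __init__(self, name):
        self.name = name
        self.ids = {}

    def __call__(self, *args):
        if args not in self.ids:
            self.ids[args] = f"{self.name}#{len(self.ids)}"
        return self.ids[args]


def answer(paper, pick):
    """Evaluate Answer(author, NewObj(pick(...))) :- Paper(author, title, year)."""
    new_obj = Skolem("NewObj")
    return {(a, new_obj(*pick(a, t, y))) for a, t, y in paper}


def objects(ans):
    """The set of new objects that appear in an answer."""
    return {obj for _, obj in ans}


paper = [
    ("Jones", "The Comma", 1994),
    ("Smith", "The Comma", 1994),
    ("Jones", "The Dot", 1998),
]

per_author = answer(paper, lambda a, t, y: (a,))         # NewObj(author)
per_author_title = answer(paper, lambda a, t, y: (a, t))  # NewObj(author,title)
per_title_year = answer(paper, lambda a, t, y: (t, y))    # NewObj(title,year)
single = answer(paper, lambda a, t, y: ())                # NewObj()

for name, ans in [("author", per_author), ("author,title", per_author_title),
                  ("title,year", per_title_year), ("()", single)]:
    print(name, len(objects(ans)), "objects,", len(ans), "answer tuples")
```

The arguments of the Skolem function decide the grouping: one object per author, per (author, title) pair, per (title, year) pair, or a single shared object.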
14 Skolem Functions in StruQL
- StruQL's semantics:
  - Input graph: (Node, Edge)
  - Output graph: (Node, Edge)
- Example:
WHERE  Root -> Bib.paper.author -> A
CREATE Root(), HomePage(A)
LINK   Root() -> person -> HomePage(A),
       HomePage(A) -> name -> A,
       HomePage(A) -> home -> Root()
Node(Root()) :-
Node(HomePage(A)) :- Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)
Edge(Root(),person,HomePage(A)) :- Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)
Edge(HomePage(A),name,A) :- Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)
Edge(HomePage(A),home,Root()) :- Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)
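These rules can be read as a small program over edge triples. A minimal Python sketch, assuming the input graph is a list of (source, label, target) triples and a Skolem term is represented by a tuple of its name and arguments (the node ids `x`, `p1`, `p2`, `p3` and the helper names are hypothetical):

```python
def skolem(name, *args):
    """A Skolem term is identified by its function name and arguments."""
    return (name,) + args


def simple_site(edges, root="Root"):
    """Evaluate the five rules for the simple website on an input edge list."""
    # Shared rule body: Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)
    authors = {
        a
        for (r, l1, x) in edges if r == root and l1 == "Bib"
        for (x2, l2, y) in edges if x2 == x and l2 == "paper"
        for (y2, l3, a) in edges if y2 == y and l3 == "author"
    }
    nodes = {skolem("Root")} | {skolem("HomePage", a) for a in authors}
    out = set()
    for a in authors:
        out.add((skolem("Root"), "person", skolem("HomePage", a)))
        out.add((skolem("HomePage", a), "name", a))
        out.add((skolem("HomePage", a), "home", skolem("Root")))
    return nodes, out


input_edges = [
    ("Root", "Bib", "x"),
    ("x", "paper", "p1"), ("p1", "author", "Jones"), ("p1", "author", "Smith"),
    ("x", "paper", "p2"), ("p2", "author", "Jones"),
    ("x", "paper", "p3"), ("p3", "author", "Mark"),
]
nodes, out_edges = simple_site(input_edges)
print(len(nodes), "nodes,", len(out_edges), "edges")
```

Jones appears in two papers but gets one home page, because HomePage("Jones") denotes the same node each time.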
15 A Different Paradigm: Structural Recursion
- Data as sets with a union operator:
- {a: 3, a: {b: "one", c: 5}, b: 4}
- = {a: 3} U {a: {b: "one", c: 5}} U {b: 4}
16 Structural Recursion
- Example: retrieve all integers in the data
f(T1 U T2) = f(T1) U f(T2)
f({L: T}) = f(T)
f({}) = {}
f(V) = if isInt(V) then {result: V} else {}
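A minimal Python sketch of this function, assuming a tree is a list of (label, subtree) pairs and a subtree is either another such list or an atomic value (this encoding is an assumption made for illustration):

```python
def f(t):
    """Collect every integer leaf as a (result, V) edge."""
    if isinstance(t, list):
        out = []
        for _label, sub in t:     # f(T1 U T2) = f(T1) U f(T2)
            out.extend(f(sub))    # f({L: T}) = f(T)
        return out                # f({}) = {}
    # f(V) = if isInt(V) then {result: V} else {}
    if isinstance(t, int) and not isinstance(t, bool):
        return [("result", t)]
    return []


# {a: 3, a: {b: "one", c: 5}, b: 4}
data = [("a", 3), ("a", [("b", "one"), ("c", 5)]), ("b", 4)]
print(f(data))
```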
17 Structural Recursion
f(T1 U T2) = f(T1) U f(T2)
f({L: T}) = if L = a then {b: f(T)} else {L: f(T)}
f({}) = {}
f(V) = V
Returns the same tree with a-edges replaced by b-edges.
18 Structural Recursion
f(T1 U T2) = f(T1) U f(T2)
f({L: T}) = {L: {L: f(T)}}
f({}) = {}
f(V) = V
Input tree with n nodes; output tree with 2n nodes (every edge is doubled).
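Both the relabeling function and the edge-doubling function can be sketched in a few lines of Python, assuming a tree is a list of (label, subtree) pairs with atomic values at the leaves (the names `relabel`, `double`, and `count_edges` are illustrative):

```python
def relabel(t):
    """f({L: T}) = if L = a then {b: f(T)} else {L: f(T)}; f(V) = V."""
    if not isinstance(t, list):
        return t
    return [("b" if label == "a" else label, relabel(sub)) for label, sub in t]


def double(t):
    """f({L: T}) = {L: {L: f(T)}}; f(V) = V."""
    if not isinstance(t, list):
        return t
    return [(label, [(label, double(sub))]) for label, sub in t]


def count_edges(t):
    """Number of edges in a tree."""
    if not isinstance(t, list):
        return 0
    return sum(1 + count_edges(sub) for _, sub in t)


data = [("a", 3), ("a", [("b", "one"), ("c", 5)]), ("b", 4)]
print(relabel(data))
print(count_edges(data), count_edges(double(data)))
```

In this encoding the doubled tree has exactly twice as many edges as the input.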
19 Structural Recursion
- Example: increase all engine prices by 10
20 Structural Recursion
- Retrieve all subtrees reachable by (a.b)*.a
[Figure: diagram with edges labeled a, b, a.]
21 Structural Recursion: General Form
f1(T1 U T2) = f1(T1) U f1(T2)
f1({L: T}) = E1(L, f1(T), ..., fk(T), T)
f1({}) = {}
f1(V) = ...
. . . .
fk(T1 U T2) = fk(T1) U fk(T2)
fk({L: T}) = Ek(L, f1(T), ..., fk(T), T)
fk({}) = {}
fk(V) = ...
Each of E1, ..., Ek consists only of {_ : _}, U, if_then_else_
22 Evaluating Structural Recursion
- Recursive evaluation:
  - Compute the functions recursively, starting with f1 at the root
  - Termination is guaranteed.
- How efficiently can we evaluate this?
23 Structural Recursion
f(T1 U T2) = f(T1) U f(T2)
f({L: T}) = {L: f(T), L: f(T)}
f({}) = {}
f(V) = V
24 Naive Recursive Evaluation
[Figure: an input tree and the much larger output tree produced by f; edge labels a, b, c, d.]
Input tree: n nodes. Output tree: 2^(n+1) - 1 nodes.
25 Efficient Recursive Evaluation
Recursive evaluation with function memoization: PTIME complexity.
f(T1 U T2) = f(T1) U f(T2)
f({L: T}) = {L: f(T), L: f(T)}
f({}) = {}
f(V) = V
Alternatively: apply the function in parallel to each input edge (bulk evaluation).
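The gap between naive and memoized evaluation can be seen on a small input. A minimal Python sketch, assuming trees are lists of (label, subtree) pairs and the memo table is keyed by the identity of the input node (all function names are illustrative). The naive version recomputes f(T) for each copy; the memoized version computes it once and shares the result, so the output is a DAG of linear size that unfolds to the same exponential tree.

```python
def chain(labels):
    """Build a one-path tree, e.g. ['a', 'b'] -> {a: {b: {}}}."""
    t = []
    for label in reversed(labels):
        t = [(label, t)]
    return t


def f_naive(t):
    """f({L: T}) = {L: f(T), L: f(T)}, recomputing f(T) for each copy."""
    out = []
    for label, sub in t:
        out.append((label, f_naive(sub)))
        out.append((label, f_naive(sub)))
    return out


def f_memo(t, memo):
    """Same function, but each input node is processed only once."""
    key = id(t)
    if key not in memo:
        out = []
        for label, sub in t:
            out.append((label, f_memo(sub, memo)))
            out.append((label, f_memo(sub, memo)))  # cache hit, shared result
        memo[key] = out
    return memo[key]


def unfolded_size(t):
    """Number of nodes when the result is read as a tree."""
    return 1 + sum(unfolded_size(sub) for _, sub in t)


def distinct_nodes(t, seen=None):
    """Number of physically distinct nodes (shared nodes counted once)."""
    seen = set() if seen is None else seen
    if id(t) in seen:
        return 0
    seen.add(id(t))
    return 1 + sum(distinct_nodes(sub, seen) for _, sub in t)


inp = chain(["a", "b", "c"])          # 3 edges, 4 nodes
naive = f_naive(inp)
shared = f_memo(inp, {})
print(unfolded_size(naive), distinct_nodes(naive), distinct_nodes(shared))
```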
26 Bulk Evaluation
Sometimes f doesn't return anything: use ε edges.
f(T1 U T2) = f(T1) U f(T2)
f({L: T}) = if L = c then T else f(T)
f({}) = {}
f(V) = V
27 Epsilon Edges
[Figure: example graph with edges labeled a, b, c, d and an ε edge.]
28 Epsilon Edges
- Note: union becomes easy to draw with ε edges
- Example:
[Figure: T1 U T2 drawn as a new root with ε edges to T1 and T2, followed by a concrete example with edges labeled a, b, c, d, e.]
29 Bulk Evaluation
- Idea: apply E1, ..., Ek independently on each edge, then connect with ε edges; this runs in PTIME
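A minimal Python sketch of bulk evaluation for one concrete function, f({L: T}) = if L = c then T else f(T), assuming the input is a list of (source, label, target) triples and ε edges are marked with the label `None` (the node naming scheme `("f", x)` and the helper names are assumptions for illustration). Each input edge is handled on its own; the pieces are then connected by ε edges, which a final pass eliminates. Because no recursion follows the input structure, a cycle in the input is harmless.

```python
EPS = None  # label used for epsilon edges


def eliminate_eps(edges, root):
    """Keep the part reachable from root, collapsing epsilon edges."""
    succ = {}
    for x, label, y in edges:
        succ.setdefault(x, []).append((label, y))

    def closure(n):
        # All nodes reachable from n through epsilon edges only.
        seen, stack = {n}, [n]
        while stack:
            m = stack.pop()
            for label, y in succ.get(m, []):
                if label is EPS and y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    result, todo, done = set(), [root], {root}
    while todo:
        n = todo.pop()
        for m in closure(n):
            for label, y in succ.get(m, []):
                if label is not EPS:
                    result.add((n, label, y))
                    if y not in done:
                        done.add(y)
                        todo.append(y)
    return result


def bulk_eval(edges, root):
    """Apply the function to every edge independently, then connect."""
    out = []
    for x, label, y in edges:
        out.append((x, label, y))                  # input graph, needed when f returns T
        if label == "c":
            out.append((("f", x), EPS, y))         # f at x returns the subtree at y
        else:
            out.append((("f", x), EPS, ("f", y)))  # f at x returns f at y
    return eliminate_eps(out, ("f", root))


graph = [
    (0, "a", 1), (1, "c", 2), (2, "d", 3),
    (0, "b", 4), (4, "c", 5), (5, "e", 6),
    (1, "a", 0),                                   # a cycle back to the root
]
print(sorted(bulk_eval(graph, 0), key=str))
```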
30 Bulk Evaluation
Recall (a.b)*.a
[Figure: bulk evaluation of (a.b)*.a on an example graph with edges labeled a, b, c, d.]
31 Structural Recursion
- Can evaluate in two ways:
  - Recursively: memoize the functions' results
  - Bulk: apply all functions on all edges in parallel, connect, eliminate what is useless
- Complexity: PTIME
  - More precisely: NLOGSPACE
- Works on graphs with cycles too!
32 XSL
- XSLT 1.0 (a recommendation)
  - http://www.w3.org/TR/xslt.html
- XSLT 1.1 (a working draft)
  - http://www.w3.org/TR/xslt11/
- In commercial products (e.g. IE5.0)
33 XSL
- Purpose: stylesheet specification language
  - stylesheet: XML -> HTML
  - in general: XML -> XML
- Uses XPath
34 XSL Program
- XSL program = template-rule ... template-rule
- template-rule = match pattern + template
Example: retrieve all book titles
<xsl:template match="/">
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="/bib//title">
  <result> <xsl:value-of select="."/> </result>
</xsl:template>
35 Simple XSL Program
<xsl:template match="/">
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="text()">
  <xsl:value-of select="."/>
</xsl:template>
<xsl:template match="*">
  <xsl:element name="{name(.)}">
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>
36 Flow Control in XSL
<xsl:template match="/">
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="a">
  <A><xsl:apply-templates/></A>
</xsl:template>
<xsl:template match="b">
  <B><xsl:apply-templates/></B>
</xsl:template>
<xsl:template match="c">
  <C><xsl:value-of select="."/></C>
</xsl:template>
37
Input:
<a> <e> <b> <c> 1 </c>
            <c> 2 </c>
        </b>
        <a> <c> 3 </c>
        </a>
    </e>
    <c> 4 </c>
</a>
Output:
<A> <B> <C> 1 </C>
        <C> 2 </C>
    </B>
    <A> <C> 3 </C>
    </A>
    <C> 4 </C>
</A>
38 XSL is Structural Recursion
f(T1 U T2) = f(T1) U f(T2)
f({L: T}) = if L = c then {C: T}             (<xsl:template match="c">)
            else if L = b then {B: f(T)}     (<xsl:template match="b">)
            else if L = a then {A: f(T)}     (<xsl:template match="a">)
            else f(T)                        (<xsl:template match="/">)
f({}) = {}
f(V) = V
XSL query = single function. XSL query with modes = multiple functions (next).
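To make the correspondence concrete, here is a minimal Python sketch that applies the same single recursive function to an XML document using the standard library's `xml.etree.ElementTree`. It is an illustration of the recursion, not an XSLT processor: text outside of c elements is ignored, and the `RENAME` table and the function name `f` are choices made for this sketch.

```python
import xml.etree.ElementTree as ET

RENAME = {"a": "A", "b": "B"}


def f(elem):
    """Return the list of output elements produced for one input element."""
    if elem.tag == "c":
        out = ET.Element("C")
        out.text = "".join(elem.itertext())   # like xsl:value-of select="."
        return [out]
    # f(T1 U T2) = f(T1) U f(T2): concatenate the results for all children
    children = [o for child in elem for o in f(child)]
    if elem.tag in RENAME:
        out = ET.Element(RENAME[elem.tag])
        out.extend(children)
        return [out]
    return children                           # no rule for this label: just recurse


src = ET.fromstring(
    "<a><e><b><c>1</c><c>2</c></b><a><c>3</c></a></e><c>4</c></a>"
)
(result,) = f(src)
print(ET.tostring(result, encoding="unicode"))
```

The e element has no rule of its own, so its children are processed and their results are spliced into the enclosing A element.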
39 Modes in XSLT
Compute the path (a.b)*.a
f(T1 U T2) = f(T1) U f(T2)
f({a: T}) = {result: T} U g(T)
f({}) = {}
f(V) = V
g(T1 U T2) = g(T1) U g(T2)
g({b: T}) = f(T)
g({}) = {}
g(V) = V
<xsl:template match="/">
  <xsl:apply-templates mode="f"/>
</xsl:template>
<xsl:template match="*" mode="f"/>
<xsl:template match="a" mode="f">
  <result> <xsl:copy-of select="."/> </result>
  <xsl:apply-templates mode="g"/>
</xsl:template>
<xsl:template match="*" mode="g"/>
<xsl:template match="b" mode="g">
  <xsl:apply-templates mode="f"/>
</xsl:template>
<xsl:copy-of ... > copies the input to the output.
Ignoring modes, this computes (a|b)*.
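The two modes behave like two mutually recursive functions. A minimal Python sketch, assuming trees are lists of (label, subtree) pairs; as a simplification made here, edges with any other label and atomic values contribute nothing, which mirrors the empty `match="*"` templates:

```python
def f(t):
    """Mode f: an a-edge yields its subtree as a result, then continues in mode g."""
    out = []
    if isinstance(t, list):
        for label, sub in t:
            if label == "a":
                out.append(("result", sub))   # {result: T}
                out.extend(g(sub))            # U g(T)
    return out


def g(t):
    """Mode g: a b-edge switches back to mode f."""
    out = []
    if isinstance(t, list):
        for label, sub in t:
            if label == "b":
                out.extend(f(sub))            # g({b: T}) = f(T)
    return out


# {a: {b: {a: 1}, c: 2}, b: {a: 3}}
data = [("a", [("b", [("a", 1)]), ("c", 2)]), ("b", [("a", 3)])]
print(f(data))
```

The subtrees reached by the paths a and a.b.a are returned; the subtree under b.a is not, because the path must start with a.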
40 Modes in XSLT
- Mode = a name for a group of template rules
- No mode = empty mode
- Same as having multiple recursive functions
41 Conflict Resolution for Template Rules
- If several template rules match, choose the one with the highest priority.
- Explicit priority: <xsl:template match="abc" priority="1.41">
- Computing implicit priority: ad-hoc rules given by the W3C, based on match:
  - match="P1 | P2 | ...": transform to a set of template rules.
  - match="abc": the priority is 0.
  - match="... some namespace name ...": the priority is -0.25.
  - match="node()": the priority is -0.5.
  - Otherwise, the priority is 0.5.
It is an error if this leaves more than one matching template rule.
42 Built-in Template Rules
- Keeps us going:
  <xsl:template match="*|/"> <xsl:apply-templates/> </xsl:template>
  - there is one such rule for each mode
- Copies what we forgot:
  <xsl:template match="text()|@*"> <xsl:value-of select="."/> </xsl:template>
  - there is only one rule, for the empty mode
- Lowest priorities among all rules; hence, can be easily overridden
43 XSL Template
- <xsl:template match="expression" mode="name" priority="number" name="name">
  - Body
- </xsl:template>
- Defaults: mode = empty mode; priority = computed as explained earlier; name: when no match, no mode
- Body:
  - XML constructors: <myTag> ... </myTag> <b> ... </b> ...
  - XSL instructions:
    - <xsl:apply-templates> (= recursive call)
    - <xsl:value-of> (= copy the value)
    - <xsl:copy> (= shallow copy)
    - <xsl:copy-of> (= deep copy)
    - <xsl:element> (= more flexible than XML constructors)
    - <xsl:attribute> (= add an attribute to the element)
    - <xsl:if> (= conditional)
    - <xsl:for-each>
    - Instructions for variables
44 XSL Apply Templates
- <xsl:apply-templates select="expression" mode="name">
  - Body
- </xsl:apply-templates>
- Defaults:
  - select = (children)
  - mode = (empty mode)
- Body:
  - Sort instructions
  - Parameter instructions
45 XSL Variables
- Declaring a variable:
  - <xsl:variable name="vname" select="value"> value </xsl:variable>
  - Value: either in select, or in body
  - Either in <xsl:template> ... </xsl:template> or at top level
- Declaring a parameter:
  - <xsl:param select="value"> value </xsl:param>
  - In <xsl:template> ... </xsl:template>, at the beginning
- Passing a parameter:
  - <xsl:with-param select="value"> value </xsl:with-param>
  - In <xsl:apply-templates> ... </xsl:apply-templates>
- Using variables: $vname
46 XSL and Structural Recursion
- XSL:
  - mainly on trees
  - may loop
- Structural Recursion:
  - arbitrary graphs
  - always terminates
Add the following rule:
<xsl:template match="e">
  <xsl:apply-templates select="/"/>
</xsl:template>
Result: stack overflow on IE 5.0.