XML Transformation Language Based on Monadic Second Order Logic

1
XML Transformation Language Based on Monadic Second Order Logic
  • Kazuhiro Inaba
  • Haruo Hosoya
  • University of Tokyo
  • PLAN-X 2007

2
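Monadic Second-order Logic (MSO)
  • First-order logic extended with monadic
    second-order variables ranging over sets of
    elements

e.g. $\forall A.\,(A \neq \emptyset \Rightarrow \exists x.\,(x \in A \wedge \forall y.\,(y \in A \Rightarrow x \leq y)))$
  • Variables denoting sets: $A$
  • Set operations: $x \in A$, $A \neq \emptyset$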
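To make the role of a set variable concrete, the example sentence (every non-empty set has a least element) can be checked by brute force on a small finite domain. This is only an illustrative sketch: the helper names `subsets` and `every_nonempty_set_has_minimum` are made up here, and enumerating all subsets is exponential, which is not how an MSO engine evaluates formulae.

```python
from itertools import combinations

def subsets(domain):
    """Yield every subset of `domain` (the range of a second-order variable)."""
    items = list(domain)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)

def every_nonempty_set_has_minimum(domain, leq):
    """Check: forall A. (A != {} => exists x in A. forall y in A. leq(x, y))."""
    return all(
        not A or any(all(leq(x, y) for y in A) for x in A)
        for A in subsets(domain)
    )

# The usual order on 0..4 satisfies the sentence.
holds_for_numbers = every_nonempty_set_has_minimum(range(5), lambda x, y: x <= y)

# Divisibility on {2, 3} does not: the set {2, 3} has no least element.
holds_for_divisibility = every_nonempty_set_has_minimum(
    [2, 3], lambda x, y: y % x == 0
)
```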
3
Monadic Second-order Logic (MSO)
  • As a foundation of XML processing
  • XML Query languages provably MSO-equivalent in
    expressiveness (Neven 2002, Koch 2003)
  • Theoretical models of XML Transformation with MSO
    as a sub-language for node selection (Maneth
    1999, 2005)

4
Monadic Second-order Logic (MSO)
  • Although MSO is used in theoretical research,
  • No actual language system exploits MSO formulae
    themselves for querying XML
  • Why?
  • Little investigation of the advantages of using MSO
    as a construct for XML programming
  • High time complexity of processing MSO
    (hyper-exponential in the worst case), which
    makes practical implementation hard

5
What We Did
  • Bring MSO into a practical language system for
    XML processing!
  • Show the advantages of using MSO formulae as a
    query language for XML
  • Design an MSO-based template language for XML
    transformation
  • Establish an efficient implementation strategy of
    MSO

MTran: http://arbre.is.s.u-tokyo.ac.jp/kinaba/MTran/
6
Outline
  • Why MSO Queries?
  • MSO-Based Transformation Language
  • Efficient Strategy for Processing MSO

7
Why MSO Queries?
8
MSO's Advantages
  • No explicit recursion needed for deep matching
  • Don't-care semantics to avoid mentioning
    irrelevant nodes
  • N-ary queries are naturally expressible
  • All regular queries are definable

              MSO   XPath   RegExpPatterns (XDuce)   MonadicDatalog
NoRecursion   ✓     ✓       -                        -
Don't-care    ✓     ✓       -                        ✓
N-ary         ✓     -       ✓                        -
Regularity    ✓     -       ✓                        ✓
9
Why MSO? (1) No Explicit Recursion
  • MSO does not require recursive definitions for
    reaching nodes at arbitrary depth.
  • Select all <img> elements in the input XML:

x in <img>

              MSO   XPath   RegExpPatterns   MonadicDatalog
NoRecursion   ✓     ✓       -                -
10
Why MSO? (2) Don't-care Semantics
  • No need to mention irrelevant nodes in the query
  • MSO
      ex1 y. (x/y & y in <date>)
  • Regular Expression Patterns
      x as ~[Any, date[Any], Any]
  • Requires specification of whole tree structures

              MSO   XPath   RegExpPatterns   MonadicDatalog
Don't-care    ✓     ✓       -                ✓
11
Why MSO? (3) N-ary Queries
  • Formulae with N free variables define N-ary
    queries
  • MSO
      ex1 p. (p/x:<foo> & p/y:<bar> & p/z:<buz>)
  • XPath
  • Limited to 1-ary (absolute path) and 2-ary
    (relative path) queries

              MSO   XPath   RegExpPatterns   MonadicDatalog
N-ary         ✓     -       ✓                -
12
Why MSO? (4) Regularity
  • MSO can express any regular query,
  • i.e. the class of all queries that are
    representable by finite-state tree automata

Lack of regularity is not just a sign of
theoretical weakness, but has a practical impact.

              MSO   XPath   RegExpPatterns   MonadicDatalog
Regularity    ✓     -       ✓                ✓
13
Example: Generating a Table of Contents
  • Input: XHTML
  • Essentially, a list of headings: <h1>, <h2>,
    <h3>, ...
  • Output
  • Tree structure

Input:
<html><body>
  <h1> <p> <h2> <p> <p> <h2> <p> <h3>
  <h1> <p> <h2> <p> <p> <p> <h3>
  <h1> <p> <p> <p> <p>
</body></html>

Output:
<ul>
  <li> h1
    <ul>
      <li> h2 </li>
      <li> h2
        <ul> <li> h3 </li> </ul></li>
    </ul></li>
  <li> h1
    <ul> <li> h2 </li> </ul></li>
</ul>
14
Example: Generating a Table of Contents
  • Queries required in this transformation
  • Gather all <h1> elements
  • For each <h1> element x,
  • Gather all subheadings of x, that is,
  • All <h2> elements y that
  • Appear after x, and
  • No other <h1>s appear between x and y
  • For each <h2>, ...

<body> <h1> <h2> <h3> <h1> <h2> <h3>
<h2> <h1> <h2> </body>
15
Example: Generating a Table of Contents
  • Straightforward in MSO
  • <h2> element y that
  • Appears after x, and
  • No other <h1>s appear between x and y.

y in <h2> & x < y & all1 z.(z in <h1> => ~(x<z & z<y))

Each condition is expressible in, e.g., XPath
1.0, but combining them is difficult, due to
the lack of universal quantification.
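The subheading condition can be read directly as a filter over document positions. The sketch below is only an illustration in Python, not MTran: it assumes the headings are given as a flat list in document order (the `<body>` sequence from the previous slide), and `subheadings` is a hypothetical helper name.

```python
def subheadings(labels, x):
    """Positions y with: y in <h2> & x < y & all1 z.(z in <h1> => ~(x<z & z<y))."""
    return [
        y
        for y, label in enumerate(labels)
        if label == "h2"
        and x < y
        and all(not (labels[z] == "h1" and x < z < y) for z in range(len(labels)))
    ]

# Headings of the example body, flattened into document order.
labels = ["h1", "h2", "h3", "h1", "h2", "h3", "h2", "h1", "h2"]

# For every <h1> position, collect the positions of its <h2> subheadings.
toc = {x: subheadings(labels, x) for x, label in enumerate(labels) if label == "h1"}
```

The universally quantified part is the `all(...)` clause; it is this "no `<h1>` in between" condition that is awkward to combine with the other two in XPath 1.0.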
16
Example: LPath [Bird et al., 2005] Linguistic
Queries
  • A linguistic query requiring the "immediately
    following" relation
  • Input
  • Parse tree of a statement in a natural language
  • Query
  • Select all elements y that follow x in
    some proper analysis

17
Example: LPath [Bird et al., 2005] Linguistic
Queries
  • Proper analysis
  • A set P of elements such that
  • Every leaf node in the tree has exactly one
    ancestor contained in P

[Figure: parse tree with root S and inner nodes VP, NP, PP, NP, NP over the tagged words N "I", V "saw", Det "the", Adj "old", N "man", Prep "with", Det "a", N "dog", N "today"]
18
Example: LPath [Bird et al., 2005] Linguistic
Queries
  • Straightforward in MSO
  • Every leaf node in the tree has exactly one
    ancestor contained in P.

pred is_leaf(var1 x) = ~ex1 y.(x/y)
pred proper_analysis(var2 P) =
  all1 x.(is_leaf(x) =>
    ex1 p.(p//x & p in P & all1 q.(q//x & q in P => p=q)))
19
Example: LPath [Bird et al., 2005] Linguistic
Queries
  • "Immediately follows" query in MSO
  • Select all elements y that follow x in
    some proper analysis

pred follow_in(var2 P, var1 x, var1 y) =
  x in P & y in P & ~ex1 z.(z in P & x<z & z<y)

ex2 P.(proper_analysis(P) & follow_in(P,x,y))
Second-order variable: P
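The existential second-order quantifier can be mimicked by enumerating candidate sets P. The sketch below is a brute-force illustration only, under several assumptions that are not from the slides: the small tree is hypothetical (it is not the tree in the figure), P ranges over inner nodes only, document order is preorder, and `x < y` is added to `follow_in` so that y comes after x.

```python
from itertools import combinations

# Hypothetical parse tree: S(NP1(I), VP(V(saw), NP2(Det(the), N(man))))
TREE = {"S": ["NP1", "VP"], "NP1": ["I"], "VP": ["V", "NP2"], "V": ["saw"],
        "NP2": ["Det", "N"], "Det": ["the"], "N": ["man"],
        "I": [], "saw": [], "the": [], "man": []}

def preorder(node):
    yield node
    for child in TREE[node]:
        yield from preorder(child)

ORDER = {n: i for i, n in enumerate(preorder("S"))}  # document order
NODES = list(ORDER)
INNER = [n for n in NODES if TREE[n]]                # assumption: P uses inner nodes only

def is_ancestor(a, b):
    """a//b: a is a proper ancestor of b."""
    return any(b in preorder(c) for c in TREE[a])

def is_leaf(x):
    return not TREE[x]

def proper_analysis(P):
    """Every leaf has exactly one ancestor contained in P."""
    return all(sum(is_ancestor(p, x) for p in P) == 1 for x in NODES if is_leaf(x))

def follow_in(P, x, y):
    """x and y are in P, y comes after x, and no member of P lies in between."""
    return (x in P and y in P and ORDER[x] < ORDER[y]
            and not any(z in P and ORDER[x] < ORDER[z] < ORDER[y] for z in NODES))

def follows(x):
    """All y with: ex2 P.(proper_analysis(P) & follow_in(P, x, y))."""
    result = set()
    for size in range(len(INNER) + 1):
        for P in map(set, combinations(INNER, size)):
            if proper_analysis(P):
                result |= {y for y in NODES if follow_in(P, x, y)}
    return result
```

For example, `follows("NP1")` collects the nodes that start right after the subject noun phrase in some proper analysis. The enumeration is exponential in the tree size, which is exactly the cost that compiling the formula to a tree automaton avoids.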
20
MTran: MSO-Based Transformation Language
21
MTran Overview
  • Select and transform style templates (similar
    to XSLT)
  • Select nodes with MSO queries
  • Apply templates to each selected node
  • Question
  • What is a design principle for templates that
    fully exploits the power of MSO?
  • Simply adopting XSLT templates is not our answer

22
MTran Overview
  • MSO does not require explicit recursion
  • Natural design: transformation also does not
    require explicit recursion
  • MSO enables us to write N-ary queries
  • Select a target node depending on N-1 previously
    selected nodes
  • XSLT uses XPath (binary queries) where the
    selection depends only on a single context node

23
1. No-recursion in Templates
  • Visit template
  • Locally transform each node that matches φ(x)
  • Reconstruct the whole tree, preserving unmatched parts

{visit x :: φ(x) :: Subtemplate}
24
1. No-recursion in Templates
  • E.g. wrap every <Target> element in a <Mark> tag

{visit x :: x in <Target> :: Mark[x]}

Input:
<Root> <Target/> <Target>
<N><Target/></N> </Target> </Root>

Output:
<Root> <Mark><Target/></Mark> <Mark><Target>
<N><Mark><Target/></Mark></N>
</Target></Mark> </Root>
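The effect of the visit template on this example can be imitated with Python's standard `xml.etree.ElementTree`. This is a rough sketch of the described behaviour (transform matched nodes locally, rebuild everything else), not MTran's implementation; the function names `visit` and `mark` are chosen here for illustration. The recursion lives in the engine-like `visit` helper, which is the point of the slide: the template author never writes it.

```python
import xml.etree.ElementTree as ET

def visit(node, matches, transform):
    """Rebuild the tree; each matching node is replaced by transform(rebuilt node)."""
    copy = ET.Element(node.tag, dict(node.attrib))
    copy.text, copy.tail = node.text, node.tail
    for child in node:
        copy.append(visit(child, matches, transform))
    return transform(copy) if matches(node) else copy

def mark(elem):
    """Wrap elem in a <Mark> element, moving the trailing text to the wrapper."""
    wrapper = ET.Element("Mark")
    wrapper.tail, elem.tail = elem.tail, None
    wrapper.append(elem)
    return wrapper

source = ET.fromstring("<Root><Target/><Target><N><Target/></N></Target></Root>")
result = visit(source, lambda n: n.tag == "Target", mark)
tags_in_document_order = [e.tag for e in result.iter()]
```

Printing `ET.tostring(result, encoding="unicode")` shows every `Target`, including the nested one, wrapped in `Mark`, with `Root` and `N` preserved.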
25
1. No-recursion in Templates
  • The gather template drops all unmatched parts and
    lists the matched parts.

{gather x :: x in <Target> :: Mark[x]}
26
2. Nested Templates
  • Nested queries can refer to outer variables

{visit x :: x in <textBox> ::
   {visit y from x :: textnode(y) ::
      span[ @style[ {gather z :: ex1 p.(x/p/y & p/@style/z) :: z} ] y ]}
   y in <span>}

Input:
<Document> <textBox> <span style="bold">
<span style="red"> Hi! </span> </span>
</textBox> </Document>

Output:
<Document> <textBox> <span style="boldred">Hi!</span>
</textBox> </Document>
27
Efficient Strategy for Processing MSO
28
MSO Evaluation
  • We follow the usual 2-step strategy
  • Compile a formula to a tree automaton
  • Run queries using the automaton

29
Our Approach
  • Compilation
  • Exploit the MONA system [Klarlund et al., 1999]
  • Our contribution: experimental results in the
    context of XML processing
  • Querying by Tree Automata
  • Similar to the algorithm of [Flum, Frick, and Grohe 2001]
  • O( |input| + |output| )
  • Our contribution: simpler implementation via
    partially lazy evaluation of set operations.

30
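Defining Queries by Tree Automata
  • An automaton running on trees over the alphabet
    $\Sigma \times \{0,1\}^N$ defines an N-ary query over
    trees with alphabet $\Sigma$
  • $A = (\Sigma \times \{0,1\}^N, Q, \delta, q_0, F)$
  • $\Sigma \times \{0,1\}^N$: alphabet
  • $Q$: the set of states
  • $\delta : Q \times Q \times \Sigma \times \{0,1\}^N \to Q$: transition function
  • $q_0$: initial state
  • $F$: accepting states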

31
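Defining Queries by Tree Automata
  • A pair (p,q) in tree T is an answer for
    the binary query defined by an automaton A
  • ⇔ The automaton A accepts the marked tree T'
    (the augmentation of T with a 1 at p and at q)

[Figure: tree T with nodes X, Y, Z, W, V, where p is the node Y and q is the node V; in the marked tree T' the labels become X00, Y10, Z00, W00, V01]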
32
Algorithms for Queries in Tree Automata
  • Naïve algorithm
  • For each tuple, generate the corresponding marked
    tree, and run the automaton
  • $O(|\mathrm{input}|^{N+1})$
33
Algorithms for Queries in Tree Automata
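  • Naïve algorithm using sets
  • For each node p and state q, calculate $m_p(q)$:
  • The set of tuples of nodes such that, if they are
    marked, the automaton reaches the state q at the
    node p
  • $\bigcup \{\, m_{\mathrm{root}}(q) \mid q \in F \,\}$ is the answer
  • $m_p(q)$ is calculated in a bottom-up manner

$$m_p(q) = \bigcup \{\, m_l(q_1) \times \{p\} \times m_r(q_2) \mid \delta(q_1, q_2, Y1) = q \,\} \;\cup\; \bigcup \{\, m_l(q_1) \times m_r(q_2) \mid \delta(q_1, q_2, Y0) = q \,\}$$

[Figure: node p labeled Y, with left child W carrying the sets $m_l$ and right child V carrying the sets $m_r$]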
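A direct transcription of the set-based recurrence, for a unary query, might look like the sketch below. The tree encoding, the toy automaton (the query "x is labeled a", with state 2 as a failure sink) and the helper names are assumptions made for this illustration.

```python
from itertools import product

# A binary tree is (label, left, right); None is the empty tree.
TREE = ("a", ("b", None, None), ("a", None, ("b", None, None)))
STATES, Q0, ACCEPT = (0, 1, 2), 0, {1}  # state 2 is a failure sink

def delta(q1, q2, label, bit):
    """Toy automaton for the unary query 'x is labeled a'."""
    total = q1 + q2 + bit
    if 2 in (q1, q2) or total > 1 or (bit and label != "a"):
        return 2
    return total

def concat_product(*sets):
    """Product of sets of tuples, concatenating the component tuples."""
    return {sum(parts, ()) for parts in product(*sets)}

def m(tree, path=()):
    """Return {q: m_p(q)} for the node p at `path`, computed bottom-up."""
    if tree is None:
        return {q: ({()} if q == Q0 else set()) for q in STATES}
    label, left, right = tree
    ml, mr = m(left, path + (0,)), m(right, path + (1,))
    table = {q: set() for q in STATES}
    for q1, q2 in product(STATES, repeat=2):
        # p marked:   m_l(q1) x {p} x m_r(q2)
        table[delta(q1, q2, label, 1)] |= concat_product(ml[q1], {(path,)}, mr[q2])
        # p unmarked: m_l(q1) x m_r(q2)
        table[delta(q1, q2, label, 0)] |= concat_product(ml[q1], mr[q2])
    return table

root = m(TREE)
answers = set().union(*(root[q] for q in ACCEPT))
```

Note that `root[2]` is also fully computed although it never reaches the answer; sets like this are the wasted work that the next slides set out to avoid.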
34
Flum-Frick-Grohe Algorithm
  • Redundancies in naïve set algorithm
  • Calculation of sets that do not contribute to the
    final result (mroot(q) for q in F)
  • Calculation on unreachable states
  • States that cannot be reached for any marking
    patterns
  • The Flum-Frick-Grohe algorithm avoids these
    redundancies with a 3-pass algorithm
  • Detects the two redundancies in 2 passes of
    precalculation
  • Runs the set algorithm, avoiding those
    redundancies using the results of the first 2 passes

35
Our Approach
  • Eliminate the redundancies by simply implementing
    the naïve set algorithm with Partially Lazy
    Evaluation of Set Operations
  • Delays set operations (i.e., product and union)
    until they are really required
  • except the operations over empty sets

type 'a set = EmptySet
            | NonEmptySet of 'a neset
and 'a neset = Singleton of 'a
             | Union of 'a neset * 'a neset
             | Product of 'a neset * 'a neset
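The same idea can be sketched in Python. This is an illustration of partially lazy set operations under assumptions made here (elements are tuples, the empty set is represented by `None`, and the names `union`, `times` and `force` are invented); it is not the authors' code.

```python
from itertools import product

EMPTY = None  # the empty set is the only case handled eagerly

class Singleton:
    def __init__(self, value):
        self.value = value

class Union:
    def __init__(self, left, right):
        self.left, self.right = left, right

class Product:
    def __init__(self, left, right):
        self.left, self.right = left, right

def union(a, b):
    """Lazy union; empty operands are eliminated immediately."""
    if a is EMPTY:
        return b
    if b is EMPTY:
        return a
    return Union(a, b)

def times(a, b):
    """Lazy product; an empty operand makes the result empty immediately."""
    if a is EMPTY or b is EMPTY:
        return EMPTY
    return Product(a, b)

def force(s):
    """Second pass: evaluate a lazy set into a Python set of tuples."""
    if s is EMPTY:
        return set()
    if isinstance(s, Singleton):
        return {s.value}
    if isinstance(s, Union):
        return force(s.left) | force(s.right)
    return {x + y for x, y in product(force(s.left), force(s.right))}

a, b = Singleton(("p",)), Singleton(("q",))
lazy = union(times(a, b), times(a, EMPTY))  # the second product vanishes at once
result = force(lazy)
```

Because building a `Union` or `Product` node takes constant time and empty sets disappear eagerly, sets that never reach the final answer are never expanded; only `force` on the accepted result pays for enumeration.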
36
Our Approach
  • 2-pass algorithm
  • Run set algorithm using the partially lazy
    operations
  • Actually evaluate the lazy set
  • Easier implementation
  • Implementation of partially lazy set operations
    is straightforward
  • Direct implementation of set algorithm is also
    straightforward (compared to the one containing
    explicit avoidance of redundancies)

37
Experimental Results
  • Experiments on 4 examples
  • Compilation time
  • Execution time for 3 different sizes of documents

           Compile   10KB    100KB   1MB
ToC        0.970     0.038   0.320   3.798
LPath      0.655     0.063   0.429   4.050
MathML     0.703     0.236   1.574   16.512
RelaxNG    0.553     0.068   0.540   5.684

Measured on a 1.6GHz AMD Turion processor with 1GB RAM.
All times are in seconds.
38
Related Work
39
Related Work (MSO-based Transformation)
  • DTL [Maneth and Neven 1999]
  • TL [Maneth, Perst, Berlea, and Seidl 2005]
  • Adopt MSO as the query language.
  • Aim at finding theoretical properties for
    transformation models (such as type checking)
  • MTran aims to be a practical system.
  • Investigation on the design of transformation
    templates and the efficient implementation

40
Related Work (MSO Query Evaluation)
  • Query Evaluation via Tree-Decompositions [Flum,
    Frick, and Grohe 2001]
  • Basis of our algorithm
  • Our contribution is partially lazy operations on
    sets, which allow a simpler implementation
  • Several other studies in this area: [Neven and
    Bussche 98], [Berlea and Seidl 02], [Koch 03],
    [Niehren, Planque, Talbot and Tison 05]
  • They treat only restricted cases of MSO, or have
    higher complexity

41
Future Work
  • Exact Static Type Checking
  • Label Equality
  • "The labels of x and y are equal" is not
    expressible in MSO
  • But is useful in the context of XML processing (e.g.,
    comparison between @id and @idref attributes)
  • Can we extend MSO to allow such formulae while
    maintaining the efficiency?

42
Thank you for listening!
  • Implementation available online
  • http://arbre.is.s.u-tokyo.ac.jp/kinaba/MTran/

43
Appendix: MSO Formulae
  • Primitives
      firstChild(a,b)   nextSibling(a,b)   a=b   a in A
  • Logical Connectives
      ~φ   φ & ψ   φ | ψ   φ => ψ
  • Quantifiers
      all1 a. φ(a)   all2 A. φ(A)   ex1 a. φ(a)   ex2 A. φ(A)
  • Useful Syntax Sugars
      a/b    a is the parent of b
      a//b   a is an ancestor of b
      a<b    a comes before b in document order
44
Appendix: Template
  • Transformations
      visit VAR ( MSO TEMPLATE )
      gather VAR ( MSO TEMPLATE )
  • Static Contents
      text   elem   @attr
45
XSLT version of {visit x :: x in <Target> :: Mark[x]}

<xsl:stylesheet ...>
  <xsl:template match="Target">
    <Mark><Target>
      <xsl:apply-templates/>
    </Target></Mark>
  </xsl:template>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>
46
Exact Type Checking?
  • Given an input/output schema and a transformation
    template, check their conformance (without any
    approximations)
  • Exact type checking over (macro-) tree
    transducers is a hot area
  • MTran
  • Query is already in tree automata
  • Transformation part seems to be related to tree
    transducers

47
MSO Tree Transducers
  • Transformation also in MSO formulae
  • $\Phi_{v,n}(x)$: true if the nth copy of input node x
    is a node in the output
  • $\Phi_{e,n,m}(x,y)$: true if the nth copy of x and the
    mth copy of y are connected in the output
  • Transformation with linear size increase only
  • Quadratic size increase is possible in MTran
  • Whether MTran can express all MSO-TT
    transformations is not clear yet

48
Semantics of visit
  • Transform locally
  • All fragments are combined by a top-down
    recursive traversal of edges
  • One fragment is selected only once per path
  • {visit x :: x in <Target> :: Mark[x]}