Title: Spatial tree logics to reason about Semistructured Data
1Spatial tree logics to reason about Semistructured Data
SEBD 2003
- Speaker: Giovanni Conforti
- Joint work with Giorgio Ghelli
Dipartimento di Informatica, Università di Pisa
2What I'm going to talk about
- A gentle introduction to Spatial Tree Logics (STL)
- STL and Semistructured Data (SSD)
- Properties of SSD (Constraints, Types, Queries) ↔ Spatial Tree Logic (STL) Formulas
- Decision Problems for SSD ↔ Validity/Satisfiability of STL Formulas
- Presentation of a decidable fragment of the TQL logic
3Background: Spatial Logics
- Modal logics to describe properties of structured worlds
- Many applications: Ambient Calculus, π-calculus, tree-structured data, shared data structures, ...
- Spatial (and temporal) modal operators to describe structure (and behavior)
- Equivalence, model checking and validity problems are already studied for many spatial logics
- Many works involving Cardelli, Gordon, Caires, Ghelli, Gardner, ...
4A Simple Ground Spatial Tree Logic
- Worlds: information trees = unordered (multisets of) labeled trees
- F, F' ::= 0 (empty root)
- n[F] (an edge labelled n leading to the i.t. F)
- F | F' (the i.t. F next to the i.t. F')
- Logic: propositional logic connectives + modal operators describing the structure
- A, B ::= True, Not A, A And B,
- 0, n[A], A | B
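The meaning of these operators can be spelled out as a satisfaction relation between an information tree F and a formula. The clauses below are a sketch in the usual style of spatial logics for trees; the symbol ≡ (equality of information trees as multisets) and the LaTeX notation are assumptions, not taken from the slides.

```latex
\begin{align*}
F \models \mathbf{True} &\quad \text{always} \\
F \models \neg A &\iff F \not\models A \\
F \models A \wedge B &\iff F \models A \text{ and } F \models B \\
F \models 0 &\iff F \equiv 0 \\
F \models n[A] &\iff \exists F'.\; F \equiv n[F'] \text{ and } F' \models A \\
F \models A \mid B &\iff \exists F', F''.\; F \equiv F' \mid F'' \text{ and } F' \models A \text{ and } F'' \models B
\end{align*}
```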
5Examples
- F = book[title[Databases[0]] | author[Ghelli[0]] | author[Albano[0]]]
- A = book[author[Ghelli[0]]]
- B = book[author[Ghelli[0]] | True]
- C = book[Not (editor[True] | True)]
- D = book[title[True] And author[True]]
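To see which of these formulas the tree F satisfies, the ground logic can be model checked by brute force. The following is a minimal sketch, not the TQL implementation: the tuple encoding of trees and formulas and the names `edge`, `par`, `splits`, `holds` and `leaf` are all assumptions made for illustration. Composition is checked by trying every split of the edge multiset, which is exponential in the number of sibling edges.

```python
from itertools import product

# An information tree is a tuple of (label, subtree) edges; () is the empty tree 0.
# The tuple is read as a multiset: edge order carries no meaning.

def edge(label, sub=()):
    """The one-edge tree label[sub]."""
    return ((label, sub),)

def par(*trees):
    """Composition F | F' | ...: the edges of all arguments side by side."""
    return tuple(e for t in trees for e in t)

# Formulas are tagged tuples (hypothetical encoding):
#   ("true",)  ("not", A)  ("and", A, B)  ("zero",)  ("loc", n, A)  ("par", A, B)
TRUE = ("true",)
ZERO = ("zero",)

def splits(tree):
    """Yield every division of the edges of `tree` into two parts."""
    for mask in product((True, False), repeat=len(tree)):
        left = tuple(e for e, keep in zip(tree, mask) if keep)
        right = tuple(e for e, keep in zip(tree, mask) if not keep)
        yield left, right

def holds(tree, formula):
    """Model checking for the ground logic: does `tree` satisfy `formula`?"""
    tag = formula[0]
    if tag == "true":
        return True
    if tag == "not":
        return not holds(tree, formula[1])
    if tag == "and":
        return holds(tree, formula[1]) and holds(tree, formula[2])
    if tag == "zero":
        return tree == ()
    if tag == "loc":  # n[A]: exactly one edge, labelled n, whose subtree satisfies A
        return (len(tree) == 1 and tree[0][0] == formula[1]
                and holds(tree[0][1], formula[2]))
    if tag == "par":  # A | B: some split satisfies A on one side and B on the other
        return any(holds(l, formula[1]) and holds(r, formula[2])
                   for l, r in splits(tree))
    raise ValueError(f"unknown formula tag: {tag!r}")

def leaf(field, value):
    """The tree field[value[0]]."""
    return edge(field, edge(value))

F = edge("book", par(leaf("title", "Databases"),
                     leaf("author", "Ghelli"),
                     leaf("author", "Albano")))

ghelli = ("loc", "author", ("loc", "Ghelli", ZERO))
A = ("loc", "book", ghelli)
B = ("loc", "book", ("par", ghelli, TRUE))
C = ("loc", "book", ("not", ("par", ("loc", "editor", TRUE), TRUE)))
D = ("loc", "book", ("and", ("loc", "title", TRUE), ("loc", "author", TRUE)))

results = {name: holds(F, f) for name, f in [("A", A), ("B", B), ("C", C), ("D", D)]}
print(results)  # {'A': False, 'B': True, 'C': True, 'D': False}
```

A fails because the book has more than one edge, while B succeeds because True absorbs the remaining edges. D, read literally, asks for a single edge that is labelled both title and author, so no tree satisfies it.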
6First order and modal recursion
- The full TQL logic extends the ground fragment with
- X: tree variables
- x[A]: locations with label variables
- Exists x. A: quantification over labels (and trees)
- µξ. A: fixpoint (ξ positive in A)
7Decision Problems
- Given a formula A and a model F
- Model checking: F |= A ?
- Query answering: find values of x such that F |= A(x)
- Satisfiability: sat(A): does there exist an F such that F |= A ?
- Validity: vld(A): is it true that, for each model F, F |= A ?
- Negation in the logic: sat(A) ⇔ Not vld(Not A)
- Implication: ∀F. F |= A implies F |= B ⇔ vld(Not A Or B)
- With the simple ground STL all these problems are decidable, but that is not true for satisfiability/validity if we introduce variables and quantification (or fixpoint)
8An SSD Data model: labeled trees
articles[
  article[author[Cardelli] | author[Gordon] | title[Anywhere] | date[Apr, 2000]] |
  article[author[Ghelli] | title[TQL] | conf[ETAPS] | date[month[Feb] | year[2001]]]
]
(Figure: the same data drawn as an edge-labeled tree rooted at articles)
9SSD Schema and Types
- Schemas and types to constrain the structure of SSD
- DTDs
- XML Schema
- Regular Expression Types
- A schema:
- Article = article[title[String], author[String]*, date[True]?]
- A recursive type:
- Section = section[init[String], Section*, conc[String]]
10Types in STL
- Regular type expressions and DTDs can be expressed (up to document order) in STL extended with modal recursion
- A schema:
- article[title[String], author[String]*, date[True]?]
- In STL:
- article[title[True] | (µξ. 0 Or (author[True] | ξ)) | (date[True] Or 0)]
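Because information trees are multisets, checking this type "up to document order" comes down to counting child edges by label. The sketch below is specialized to this one schema and ignores the contents of each field, as the STL formula does by using True. The function name `conforms_to_article` and the list-of-pairs encoding are assumptions made for illustration.

```python
from collections import Counter

def conforms_to_article(children):
    """Check title, author*, date? on the labels of an article's child edges.

    `children` is a list of (label, subtree) pairs; its order is ignored.
    """
    counts = Counter(label for label, _ in children)
    return (counts["title"] == 1                                # exactly one title
            and counts["date"] <= 1                             # at most one date
            and set(counts) <= {"title", "author", "date"})     # nothing else

ok = [("author", "Cardelli"), ("title", "Anywhere"), ("author", "Gordon")]
two_dates = [("title", "TQL"), ("date", "Feb"), ("date", "2001")]
extra = [("title", "TQL"), ("conf", "ETAPS")]
print(conforms_to_article(ok), conforms_to_article(two_dates), conforms_to_article(extra))
# True False False
```

The fixpoint µξ. 0 Or (author[True] | ξ) corresponds to the absence of any bound on the author count. A general checker would instead have to unfold the recursion.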
11SSD Constraints
- Integrity constraints on the values of SSD
- Inclusion Constraints
- Inverse Relationship Constraints
- Key Constraints
- Path expressions to navigate on SSD
- articles.article.title(x)
- root.section.init(x)
- Integrity constraints as inclusion of paths
- student.takes => course.cno
- student.takes ↔ course.taken_by
- Key constraints (first order logic with paths)
- ∀x,y. article.title(x) And article.title(y) And ¬(x = y) => ¬(x ≈ y)
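A path expression can be read as "follow these edge labels from the root and collect what is reached", and an inclusion constraint then compares the two collected sets. The sketch below illustrates that reading on a made-up data source: the function `follow`, the list-of-pairs tree encoding and the sample data `db` are all hypothetical.

```python
def follow(tree, path):
    """Return the subtrees reached from `tree` by the dotted `path`.

    A tree is a list of (label, subtree) edges; a leaf value is a string.
    """
    frontier = [tree]
    for label in path.split("."):
        # Step through every edge with the wanted label; leaf strings have no edges.
        frontier = [sub
                    for node in frontier if isinstance(node, list)
                    for edge_label, sub in node if edge_label == label]
    return frontier

# Hypothetical data source: Bob takes a course that is not listed.
db = [
    ("student", [("name", "Ann"), ("takes", "db101")]),
    ("student", [("name", "Bob"), ("takes", "ai200")]),
    ("course", [("cno", "db101")]),
]

taken = set(follow(db, "student.takes"))
offered = set(follow(db, "course.cno"))
print(sorted(taken), sorted(offered), taken <= offered)
# ['ai200', 'db101'] ['db101'] False
```

The inclusion constraint fails on this data because `ai200` is reached by student.takes but not by course.cno.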
12Constraints in STL
- Integrity constraints over SSD are easily expressed using STL with variables and quantification.
- Examples using path abbreviation (.a[A] = a[A] | True)
- An inclusion constraint:
- ∀X. .student.taking[X] => .course.cno[X]
- A key constraint for SSD:
- ∀X. Not (.article.title[X] | .article.title[X])
- Combining quantification with recursion we can express complex types and constraints (e.g. binary trees)
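Unfolding the path abbreviation, the key constraint fails for some X exactly when the tree can be split into two parts that each contain an article edge holding title[X], that is, when two distinct articles share a title. The sketch below checks that condition directly, without going through a general STL evaluator. The function name `title_is_key`, the list-of-pairs encoding and the use of plain strings for title values are assumptions made for illustration.

```python
from collections import Counter

def title_is_key(root):
    """True when no title value is carried by two distinct article edges.

    `root` is a list of (label, subtree) edges at the top level of the data.
    """
    seen = Counter()
    for label, children in root:
        if label != "article":
            continue
        # Count each title once per article: the two witnesses must be separate edges.
        titles = {value for child, value in children if child == "title"}
        seen.update(titles)
    return all(count < 2 for count in seen.values())

good = [
    ("article", [("title", "TQL"), ("author", "Ghelli")]),
    ("article", [("title", "Anywhere"), ("author", "Cardelli")]),
]
bad = good + [("article", [("title", "TQL")])]
print(title_is_key(good), title_is_key(bad))
# True False
```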
13SSD Queries
- Many query languages (XQuery, Lorel, YATL, ...); essentially, queries are expressions that select data reachable from paths and construct new results
- TQL: a peculiar query language based on spatial tree logic; selection is done by pattern matching over STL formulas
- TQL logic expresses all regular path expressions
- Query answering is implemented for the full TQL logic
14SSD Decision Problems with STL
- Given a data source F, and formulas A representing a schema and B, B' representing sets of integrity constraints
- Validation
- F |= A, F |= B, F |= A And B
- Schema/constraint consistency
- sat(A), sat(B), sat(A And B)
- Constraint implication (inference)
- vld(B => B')
- Constraint implication in presence of a schema
- vld(A And B => B')
15A decidable TQL sublogic
- STL are good to express types, constraints and queries over SSD, but
- Validity in the full TQL logic is undecidable
- The ground logic is decidable, but it is not enough to express all interesting types and constraints
- We are looking for a decidable fragment of TQL expressive enough to reason about SSD
- A first step in this direction is the following logic
16A decidable TQL sublogic
- A, B ::= True, A And B, Not A,
- 0, A, n[A], A | B
- We can define useful operators to describe types and constraints in this decidable logic
- String def 0
- Tree def True
- A Or B def Not (Not A And Not B)
- A => B def Not A Or B
- A^exists def A | True
- A^foreach def Not (Not A | True)
- A^foreachTree def (Tree => A)^foreach
- Note: if A => Tree we can use A^foreachTree to express A*
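Unfolding the two quantifier-like operators against the usual reading of composition shows what they say: the exists form holds when some part of the tree satisfies A, and the foreach form holds when every part does. The derivation below is a sketch; the superscript notation and the symbol ≡ (equality of information trees) are assumptions, not taken from the slides.

```latex
\begin{align*}
F \models A^{\exists}
  &\iff F \models A \mid \mathbf{True}
   \iff \exists F', F''.\; F \equiv F' \mid F'' \text{ and } F' \models A \\
F \models A^{\forall}
  &\iff F \not\models (\neg A) \mid \mathbf{True}
   \iff \forall F', F''.\; F \equiv F' \mid F'' \text{ implies } F' \models A
\end{align*}
```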
17Conclusions and Future Directions
- STL provide a powerful unified framework for types, constraints, and queries over SSD and XML
- This framework is worth studying; it may lead to
- A good formalization of SSD reasoning in terms of model checking and validity
- Generalization of results on reasoning about types and constraints
- Query optimization strategies guided by types/constraints
- (Some) future steps
- Extend the decidable logic to express integrity constraints
- Modeling ordered trees
18Spatial tree logics to reason about Constraints and Types
Università di Pisa, Ph.D. Proposal
- Speaker: Giovanni Conforti
- Supervisor: Giorgio Ghelli
19SSD Query Optimization
- TQL pattern clause uses STL formulas
- We can use validated constraints C and types T as information to optimize queries (e.g. static declaration of empty result)
- A query "from Q |= A select Q'" can be rewritten as "from Q |= B select Q'" for each B such that
- (C And T) => (A <=> B)
20Research Plan: planning
- The challenge is ambitious; it must be understood as a long-term direction of our work
- We address some initial tasks we expect to accomplish
- Comparison of STL with other formalisms for types and constraints
- Find a satisfactory decidable logic fragment to express types (and constraints)
- Write a preliminary formal system for constraint (and type) implication
- We plan two stages
- (2nd year) deep study of basic theories (tree automata, modal logics, description logics) and investigation of the initial tasks
- (3rd year) completion of the initial tasks and integration of the results in a unified formal framework
21Research Plan: directions
- Main directions: investigate
- Expressivity of Spatial Tree Logics (in particular for standard type and constraint specifications)
- Decidability and complexity of model checking and validity for fragments (or extensions) of TQL logic
- Reformulation (or generalization) of known results about reasoning and optimization over SSD
- Other interesting directions
- Implementation of a query rewriter guided by constraints and types
- Extensions to the logic to model order, data updates, private names
22Background: Semi-structured data (SSD)
- Semi-structured data (SSD) are used to
- model and query the web (HTML, XML, ...)
- store experimental data
- integrate heterogeneous databases
- SSD are
- Self-describing (structure is implicit)
- Irregular
- Always in evolution