Title: Distributed Query-Sub-Query
1 Distributed Query-Sub-Query
- Presented by Noam Pettel
- 29/5/05
2 Motivation
- Optimization of query evaluation in a peer-to-peer environment
- Development of a distributed algorithm, based on the Query-Sub-Query technique, for optimizing Datalog queries in a peer-to-peer environment
- Implementation of the algorithm using the Active XML system
3 Outline
- Datalog
- Query-Sub-Query (QSQ)
- Distributed Query-Sub-Query (dQSQ)
- Implementation using AXML
- Using dQSQ for Petri Nets
5 Example
- Input: the parent(x,y) relation below
- We are interested in the ancestor(x,y) relation
- Typical query: Give me all the ancestors of Andy
parent(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth
6 Relational Database
- A database is composed of relations (tables)
- It stores only explicit information, so every anc tuple must be stored
parent(x,y)      anc(x,y)
Alice  Nancy     Alice  Nancy
Alice  Joyce     Alice  Joyce
Joyce  Lois      Joyce  Lois
Lois   Mark      Lois   Mark
Lois   Andy      Lois   Andy
Joyce  Ruth      Joyce  Ruth
                 Alice  Lois
                 Alice  Mark
                 Alice  Andy
                 Alice  Ruth
                 Joyce  Mark
                 Joyce  Andy
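7 Deductive Database
- Explicit information (stored facts)
- Rules that enable inferences based on the stored data
parent(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth
Datalog program:
anc(x,y) :- parent(x,y)
anc(x,y) :- anc(x,z), parent(z,y)
- In each rule, the atom to the left of :- is the head and the atoms to its right form the body
- The second rule is recursive: anc appears in its own body
- anc is not stored; its tuples are inferred from parent
- Logical reading: $\forall x,y\,(\mathit{anc}(x,y) \leftarrow \mathit{parent}(x,y))$ and $\forall x,y,z\,(\mathit{anc}(x,y) \leftarrow \mathit{anc}(x,z) \wedge \mathit{parent}(z,y))$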
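To make the fixpoint semantics of this program concrete, here is a small Python sketch (our illustration, not code from the presentation) that evaluates the two anc rules bottom-up with the standard semi-naive strategy: each round joins only the newly derived anc facts with parent, and evaluation stops when a round adds nothing. The names PARENT and evaluate_anc are hypothetical.

```python
# Stored (explicit) facts: parent(x, y), as on the example slides.
PARENT = {
    ("Alice", "Nancy"), ("Alice", "Joyce"), ("Joyce", "Lois"),
    ("Lois", "Mark"), ("Lois", "Andy"), ("Joyce", "Ruth"),
}


def evaluate_anc(parent):
    """Semi-naive bottom-up evaluation of
         anc(x,y) :- parent(x,y)
         anc(x,y) :- anc(x,z), parent(z,y)
    Returns the whole anc relation once a fixpoint is reached."""
    anc = set(parent)    # first rule: every parent fact is an anc fact
    delta = set(parent)  # facts that were new in the previous round
    while delta:
        # Second rule, joining only last round's new anc facts with parent.
        derived = {(x, y) for (x, z) in delta for (z2, y) in parent if z == z2}
        delta = derived - anc  # keep only facts we have not seen before
        anc |= delta
    return anc


if __name__ == "__main__":
    anc = evaluate_anc(PARENT)
    print(len(anc))                                    # 12
    print(sorted(x for (x, y) in anc if y == "Andy"))  # ['Alice', 'Joyce', 'Lois']
```

Note that this computes all 12 anc tuples even when only the ancestors of Andy are wanted; avoiding that extra materialization is what QSQ is about.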
9 Query Evaluation
- Query: q(y) :- anc(Joyce,y)
- Goal: compute the query with minimal data materialization
10 QSQ
- Query-Sub-Query (QSQ) is a known technique for optimizing Datalog queries
- QSQ rewrites the Datalog program according to the given query
- QSQ is based on two main notions:
- Binding patterns
- Supplementary relations
11 Binding Patterns
anc(x,y) :- parent(x,y)
anc(x,y) :- anc(x,z), parent(z,y)
q(y) :- anc(Joyce,y)
- For each relation, we consider adorned versions of the relation, based on which of its arguments are bound
- For example, the adorned versions of anc are anc_bb, anc_bf, anc_fb and anc_ff
12 Binding Patterns
anc_bf(x,y) :- parent(x,y)
anc_bf(x,y) :- anc_bf(x,z), parent(z,y)
q(y) :- anc_bf(Joyce,y)
(b = bound to a constant, f = free)
- The same relation may appear with different adornments in the Datalog program
- Different adornments of the same relation are treated as different relations during the QSQ computation
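These adornments can be computed mechanically. The Python sketch below is our illustration under two assumptions: lowercase terms are variables while capitalized terms (such as Joyce) are constants, and bindings flow left to right through a rule body (sideways information passing). As on the slide, only derived relations such as anc are adorned. The function names are hypothetical.

```python
from itertools import product


def is_variable(term):
    """Assumed convention: lowercase terms are variables, others are constants."""
    return term[:1].islower()


def all_adornments(arity):
    """Every binding pattern of a relation with the given arity."""
    return ["".join(p) for p in product("bf", repeat=arity)]


def adornment(args, bound):
    """'b' for a constant or an already-bound variable, 'f' otherwise."""
    return "".join("b" if not is_variable(a) or a in bound else "f" for a in args)


def adorn_body(head_args, head_pattern, body, idb):
    """Pass bindings left to right through a rule body. Only derived (IDB)
    relations get an adornment; stored relations such as parent are unchanged."""
    bound = {v for v, p in zip(head_args, head_pattern) if p == "b"}
    adorned = []
    for rel, args in body:
        name = f"{rel}_{adornment(args, bound)}" if rel in idb else rel
        adorned.append(f"{name}({','.join(args)})")
        # Once this atom is evaluated, all of its variables are bound.
        bound |= {a for a in args if is_variable(a)}
    return adorned


if __name__ == "__main__":
    print(all_adornments(2))                 # ['bb', 'bf', 'fb', 'ff']
    print(adornment(("Joyce", "y"), set()))  # bf  (the query q(y) :- anc(Joyce,y))
    body = [("anc", ("x", "z")), ("parent", ("z", "y"))]
    print(adorn_body(("x", "y"), "bf", body, {"anc"}))  # ['anc_bf(x,z)', 'parent(z,y)']
```

The last call shows why the recursive call in the second rule is anc_bf again: x is bound by the head, and z is still free when anc is reached.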
13 Supplementary Relations
Adorned program:
anc_bf(x,y) :- parent(x,y)
anc_bf(x,y) :- anc_bf(x,z), parent(z,y)
q(y) :- anc_bf(Joyce,y)
QSQ rewriting of the program:
sup_10(x) :- in_anc_bf(x)
sup_11(x,y) :- sup_10(x), parent(x,y)
anc_bf(x,y) :- sup_11(x,y)
sup_20(x) :- in_anc_bf(x)
sup_21(x,z) :- sup_20(x), anc_bf(x,z)
sup_22(x,y) :- sup_21(x,z), parent(z,y)
anc_bf(x,y) :- sup_22(x,y)
- For each adorned relation and each position in the body of a rule, we define a supplementary relation that accumulates the bindings relevant to that position: sup_10 and sup_11 for the first rule, sup_20, sup_21 and sup_22 for the second
- in_anc_bf holds the values of x for which anc_bf is requested
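One possible way to generate these supplementary rules automatically is sketched below in Python. It is our own illustration, assuming every argument is a variable and reusing the slide's naming scheme (sup_<rule><position>, in_<relation>). After each body atom, the supplementary relation keeps only the bound variables that a later atom or the head still needs. For a derived body atom it also emits an input rule such as in_anc_bf(x) :- sup_20(x), which general QSQ uses to pass bindings into recursive calls; in this program that rule derives nothing new, which is presumably why the slide leaves it out.

```python
def fmt(variables):
    """Render a variable list as it appears inside an atom."""
    return ",".join(variables)


def qsq_rewrite(rule_no, head, pattern, body):
    """Rewrite one adorned rule into supplementary-relation rules.

    head:    (relation, args), e.g. ("anc", ("x", "y"))
    pattern: binding pattern of the head, e.g. "bf"
    body:    list of (relation, args, is_idb); derived (IDB) relations are
             passed already adorned, e.g. ("anc_bf", ("x", "z"), True)
    Assumes every argument is a variable."""
    head_rel, head_args = head
    adorned_head = f"{head_rel}_{pattern}"

    # sup_<n>0 holds the bound head arguments, read from the input relation.
    sup_vars = [v for v, p in zip(head_args, pattern) if p == "b"]
    rules = [f"sup_{rule_no}0({fmt(sup_vars)}) :- in_{adorned_head}({fmt(sup_vars)})"]

    for j, (rel, args, is_idb) in enumerate(body, start=1):
        prev = f"sup_{rule_no}{j - 1}({fmt(sup_vars)})"
        if is_idb:
            # Pass the bindings known at this position into the derived relation.
            bound_args = [a for a in args if a in sup_vars]
            rules.append(f"in_{rel}({fmt(bound_args)}) :- {prev}")
        # Variables bound after this atom, in order of first appearance...
        seen = sup_vars + [a for a in args if a not in sup_vars]
        # ...kept only if a later body atom or the head still needs them.
        needed = set(head_args).union(*(set(a) for _, a, _ in body[j:]))
        sup_vars = [v for v in seen if v in needed]
        rules.append(f"sup_{rule_no}{j}({fmt(sup_vars)}) :- {prev}, {rel}({fmt(args)})")

    rules.append(f"{adorned_head}({fmt(head_args)}) :- sup_{rule_no}{len(body)}({fmt(sup_vars)})")
    return rules


if __name__ == "__main__":
    rule2_body = [("anc_bf", ("x", "z"), True), ("parent", ("z", "y"), False)]
    for line in qsq_rewrite(2, ("anc", ("x", "y")), "bf", rule2_body):
        print(line)
```

For the second rule this prints the four rules of the slide, plus the extra input rule just after sup_20.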
14 QSQ Example
Input: the query seeds in_anc_bf with Joyce; parent(x,y) is the stored relation
parent(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth
Rule 1: anc_bf(x,y) :- parent(x,y)
sup_10(x):   Joyce
sup_11(x,y): (Joyce, Lois), (Joyce, Ruth)
Rule 2: anc_bf(x,y) :- anc_bf(x,z), parent(z,y)
sup_20(x):   Joyce
sup_21(x,z): (Joyce, Lois), (Joyce, Ruth), (Joyce, Mark), (Joyce, Andy)
sup_22(x,y): (Joyce, Mark), (Joyce, Andy)
Query result of q(y) :- anc_bf(Joyce,y): Lois, Ruth, Mark, Andy
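These tables can be reproduced with a direct Python transcription of the rewritten program, re-applying the supplementary rules until anc_bf stops growing. This is a naive-iteration sketch for this single query, not a general QSQ engine; qsq_evaluate and its return format are hypothetical.

```python
PARENT = {
    ("Alice", "Nancy"), ("Alice", "Joyce"), ("Joyce", "Lois"),
    ("Lois", "Mark"), ("Lois", "Andy"), ("Joyce", "Ruth"),
}


def qsq_evaluate(parent, constant):
    """Naive fixpoint evaluation of the QSQ-rewritten anc program
    for the query q(y) :- anc_bf(constant, y)."""
    in_anc_bf = {constant}  # seeded by the query's bound argument
    anc_bf = set()
    while True:
        # Rule 1: sup_10(x) :- in_anc_bf(x); sup_11(x,y) :- sup_10(x), parent(x,y)
        sup_10 = set(in_anc_bf)
        sup_11 = {(x, y) for x in sup_10 for (p, y) in parent if p == x}
        # Rule 2: sup_20(x) :- in_anc_bf(x); sup_21(x,z) :- sup_20(x), anc_bf(x,z);
        #         sup_22(x,y) :- sup_21(x,z), parent(z,y)
        sup_20 = set(in_anc_bf)
        sup_21 = {(x, z) for x in sup_20 for (a, z) in anc_bf if a == x}
        sup_22 = {(x, y) for (x, z) in sup_21 for (p, y) in parent if p == z}
        # anc_bf(x,y) :- sup_11(x,y) and anc_bf(x,y) :- sup_22(x,y)
        new_anc_bf = anc_bf | sup_11 | sup_22
        # The input rule in_anc_bf(x) :- sup_20(x) adds nothing here,
        # so in_anc_bf stays {constant}.
        if new_anc_bf == anc_bf:
            break
        anc_bf = new_anc_bf
    answer = {y for (x, y) in anc_bf if x == constant}
    relations = {"sup_10": sup_10, "sup_11": sup_11, "sup_20": sup_20,
                 "sup_21": sup_21, "sup_22": sup_22, "anc_bf": anc_bf}
    return answer, relations


if __name__ == "__main__":
    answer, rels = qsq_evaluate(PARENT, "Joyce")
    print(sorted(answer))  # ['Andy', 'Lois', 'Mark', 'Ruth']
```

Every anc_bf tuple materialized here has Joyce as its first argument; the tuples about Alice that full bottom-up evaluation would compute are never produced.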
15 Properties of QSQ
- Computes the correct answer to the query
- Materializes only a minimal set of tuples
- Is guaranteed to terminate
QSQ evaluations have nice properties!
17 Distributed Environment
Centralized Datalog program:
r1: r(x,y) :- a(x,y)
r2: r(x,y) :- s(x,z), t(z,y)
r3: s(x,y) :- r(x,y), b(y,z)
r4: t(x,y) :- c(x,y)
Distribution of the program among 3 peers (R, S, T):
r1: r@R(x,y) :- a@R(x,y)
r2: r@R(x,y) :- s@S(x,z), t@T(z,y)
r3: s@S(x,y) :- r@R(x,y), b@S(y,z)
r4: t@T(x,y) :- c@T(x,y)
- The rules at peer P are the rules where P is the peer of the head
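The placement rule (a rule lives at the peer of its head) is easy to state as code. In this Python sketch, which is ours and drops the variables because placement depends only on the peers, each atom is a (relation, peer) pair; the PROGRAM encoding and the function names are hypothetical.

```python
from collections import defaultdict

# Each rule: (name, head, body); every atom is a (relation, peer) pair.
PROGRAM = [
    ("r1", ("r", "R"), [("a", "R")]),
    ("r2", ("r", "R"), [("s", "S"), ("t", "T")]),
    ("r3", ("s", "S"), [("r", "R"), ("b", "S")]),
    ("r4", ("t", "T"), [("c", "T")]),
]


def rules_by_peer(program):
    """The rules at peer P are those whose head relation lives at P."""
    placement = defaultdict(list)
    for name, (_, head_peer), _ in program:
        placement[head_peer].append(name)
    return dict(placement)


def remote_calls(program):
    """For each peer, the other peers whose relations its rules read."""
    calls = defaultdict(set)
    for _, (_, head_peer), body in program:
        for _, peer in body:
            if peer != head_peer:
                calls[head_peer].add(peer)
    return dict(calls)


if __name__ == "__main__":
    print(rules_by_peer(PROGRAM))  # {'R': ['r1', 'r2'], 'S': ['r3'], 'T': ['r4']}
    print(remote_calls(PROGRAM))
```

The remote_calls map is also a peer dependency graph: R depends on S and T, and S depends back on R, which is the recursion that termination detection has to cope with.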
18 Naïve Distributed Evaluation
- Activation of remote relations
r2: r@R(x,y) :- s@S(x,z), t@T(z,y)
(Figure: peer R sends a request to S and to T, and each of them sends back a response.)
AXML and Web Services make it very easy!
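The following Python sketch simulates naive distributed evaluation in memory: each peer is an object whose request method stands in for a remote service call, and the peers re-run their rules in rounds until no peer derives a new fact. The Peer class, the round-based driver and the facts for a, b and c are all hypothetical; a real deployment would use asynchronous AXML or Web Service calls rather than method calls.

```python
class Peer:
    """In-memory stand-in for a peer that exposes its relations as a service."""

    def __init__(self, name, facts):
        self.name = name
        self.facts = {rel: set(tuples) for rel, tuples in facts.items()}
        self.requests = 0  # number of requests this peer has answered

    def request(self, relation):
        """Answer a remote request with a snapshot of one relation."""
        self.requests += 1
        return set(self.facts.get(relation, set()))


def join(left, right):
    """Join left(x,z) with right(z,y) on z, returning the (x,y) pairs."""
    return {(x, y) for (x, z) in left for (z2, y) in right if z == z2}


def make_example_peers():
    """Hypothetical stored facts: a at R, b at S, c at T."""
    return {
        "R": Peer("R", {"a": {(1, 2)}}),
        "S": Peer("S", {"b": {(2, 5), (3, 6)}}),
        "T": Peer("T", {"c": {(2, 3)}}),
    }


def naive_distributed_eval(peers):
    """Repeat rounds until no peer derives a new fact; return the round count."""
    R, S, T = peers["R"], peers["S"], peers["T"]
    rounds = 0
    while True:
        rounds += 1
        # r1 and r2 at R: r2 requests s from S and t from T.
        new_r = R.facts["a"] | join(S.request("s"), T.request("t"))
        # r3 at S: s(x,y) :- r@R(x,y), b@S(y,z), so S requests r from R.
        new_s = {(x, y) for (x, y) in R.request("r") for (y2, _z) in S.facts["b"] if y == y2}
        # r4 at T is purely local.
        new_t = set(T.facts["c"])
        changed = False
        for peer, rel, derived in ((R, "r", new_r), (S, "s", new_s), (T, "t", new_t)):
            current = peer.facts.setdefault(rel, set())
            if not derived <= current:
                current |= derived
                changed = True
        if not changed:
            return rounds


if __name__ == "__main__":
    peers = make_example_peers()
    naive_distributed_eval(peers)
    print(sorted(peers["R"].facts["r"]))  # [(1, 2), (1, 3)]
```

Each round re-requests entire relations, and the global "nothing changed" test is a centralized shortcut; a real peer-to-peer system needs a distributed termination detection protocol instead.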
19 Termination Detection
- We need to detect when the system reaches a fixpoint
- A fixpoint is reached when no new facts can be derived at any peer
- Termination detection is a standard problem in distributed computing
20 Termination Detection
- The model:
- Communication is asynchronous
- Each message eventually arrives and is acknowledged
- At some point, the site that started the query decides to check for termination
- It calls all the sites it directly invoked and asks them whether they have completed
- These sites contact the sites they invoked, and so on
21 Termination Detection
- A site answers positively if:
- It is idle (cannot produce more data)
- All the data it has sent has been acknowledged
- All its successors believe the computation has terminated
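The positive-answer condition can be written as a recursive predicate. The Python sketch below is our illustration; the Site fields are hypothetical, and it assumes the successor links form a tree, since following a recursive cycle between sites would never end.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical snapshot of one site's state when it is asked about termination."""
    name: str
    idle: bool                 # cannot produce more data
    unacked: int = 0           # messages sent but not yet acknowledged
    successors: list = field(default_factory=list)  # sites it invoked (tree edges)


def believes_terminated(site):
    """Answer positively iff the site is idle, everything it sent has been
    acknowledged, and all its successors (asked recursively) answer positively."""
    return (
        site.idle
        and site.unacked == 0
        and all(believes_terminated(s) for s in site.successors)
    )


if __name__ == "__main__":
    s = Site("S", idle=True)
    t = Site("T", idle=True, unacked=1)  # T is still waiting for an acknowledgment
    r = Site("R", idle=True, successors=[s, t])
    print(believes_terminated(r))  # False
    t.unacked = 0
    print(believes_terminated(r))  # True
```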
22 Termination Detection
r1: r@R(x,y) :- a@R(x,y)
r2: r@R(x,y) :- s@S(x,z), t@T(z,y)
r3: s@S(x,y) :- r@R(x,y), b@S(y,z)
r4: t@T(x,y) :- c@T(x,y)
- Build a graph to represent the distributed Datalog program
- Recursion results in cycles in the graph
- Use a spanning tree of the graph to decide termination
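Here is a small Python sketch of the graph step, assuming (our choice) an edge from each peer to every peer whose relations its rules read, and a breadth-first search from the querying peer R to pick the spanning tree. The hard-coded GRAPH is read off rules r1 to r4; the function names are hypothetical.

```python
from collections import deque

# Peer dependency graph read off rules r1-r4: an edge P -> Q means a rule
# at P reads a relation held by Q (r2: R reads s@S and t@T; r3: S reads r@R).
GRAPH = {"R": ["S", "T"], "S": ["R"], "T": []}


def has_cycle(graph):
    """Depth-first search; reaching a node still on the stack means a cycle."""
    on_stack, done = set(), set()

    def visit(u):
        on_stack.add(u)
        for v in graph.get(u, []):
            if v in on_stack or (v not in done and visit(v)):
                return True
        on_stack.discard(u)
        done.add(u)
        return False

    return any(u not in done and visit(u) for u in graph)


def spanning_tree(graph, root):
    """Breadth-first spanning tree rooted at the querying peer; each peer is
    attached to the first peer that reaches it, which cuts the cycles."""
    seen, tree, queue = {root}, [], deque([root])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                tree.append((u, v))
                queue.append(v)
    return tree


if __name__ == "__main__":
    print(has_cycle(GRAPH))           # True
    print(spanning_tree(GRAPH, "R"))  # [('R', 'S'), ('R', 'T')]
```

For this program the recursion through r2 and r3 shows up as the cycle R → S → R, and the tree keeps only R → S and R → T.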
23 Distributed QSQ Rewriting
- For each rule, the peer in the head of the rule starts the rewriting
- When a remote relation is encountered, the peer delegates the remainder of the rule to the remote peer in charge of that relation
24 Distributed QSQ Rewriting
Centralized rewriting of r2:
sup_0(x) :- in_r_bf(x)
sup_1(x,z) :- sup_0(x), s(x,z)
sup_2(x,y) :- sup_1(x,z), t_bf(z,y)
r_bf(x,y) :- sup_2(x,y)
Distributed rewriting:
- R computes sup_0@R(x) :- in_r_bf@R(x)
- R sends the sup_0 bindings to S, which continues the rule:
sup_2@S(x,y) :- sup_0@R(x), s_bf@S(x,z), t_bf@T(z,y)
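The delegation rule (the head's peer starts, and each remote relation hands the remainder of the rule to its peer) can be sketched in Python as a split of the rule body into per-peer segments. This is our own illustration with hypothetical names; it ignores variables and adornments and applies the rule strictly, so for r2 S hands the t atom on to T instead of calling it itself.

```python
def delegation_chain(head_peer, body):
    """Split a rule body into the segments evaluated by each peer.

    The peer of the head starts (with an empty segment it only reads its
    input relation, i.e. computes sup_0); whenever the next atom belongs to
    a different peer, the remainder of the rule is delegated to that peer.
    body: list of (relation, peer). Returns [(peer, [relations it joins])]."""
    chain = [(head_peer, [])]
    for relation, peer in body:
        if peer != chain[-1][0]:
            chain.append((peer, []))  # delegate the remainder to this peer
        chain[-1][1].append(relation)
    return chain


if __name__ == "__main__":
    # r2: r@R(x,y) :- s@S(x,z), t@T(z,y)
    print(delegation_chain("R", [("s", "S"), ("t", "T")]))
    # [('R', []), ('S', ['s']), ('T', ['t'])]
    # r3: s@S(x,y) :- r@R(x,y), b@S(y,z)
    print(delegation_chain("S", [("r", "R"), ("b", "S")]))
    # [('S', []), ('R', ['r']), ('S', ['b'])]
```

For r2, R only computes sup_0 from its input relation, S joins the bindings with s, and T joins the result with t.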
25 Distributed QSQ Rewriting
- The rewriting is performed locally at each peer, without any global knowledge
- Once the QSQ rewriting is complete, we start the QSQ computation process, as in the centralized case, except that remote relations are obtained by calling remote services
27 Why Active XML?
- AXML is a natural choice
- An AXML document contains both explicit and implicit data, just like a Datalog program
Example AXML document for r: two explicit tuples, plus a service call (<sc>) that produces implicit tuples
<r>
  <t><x>1</x><y>2</y></t>
  <t><x>1</x><y>3</y></t>
  <sc>r@R(x,y) :- s@S(x,z), t@T(z,y)</sc>
</r>
- The service call obtains further r tuples through continuous services at S and T
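Using only Python's standard xml.etree.ElementTree module, the sketch below separates the explicit tuples of such a document from its service calls, and shows how tuples returned by a call could be materialized as new <t> elements. The element names follow the slide; putting the rule text inside <sc> is a placeholder of ours, not the actual AXML service-call syntax, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<r>
  <t><x>1</x><y>2</y></t>
  <t><x>1</x><y>3</y></t>
  <sc>r@R(x,y) :- s@S(x,z), t@T(z,y)</sc>
</r>"""


def explicit_tuples(root):
    """The explicit (materialized) part: one (x, y) pair per <t> element."""
    return {(t.findtext("x"), t.findtext("y")) for t in root.findall("t")}


def service_calls(root):
    """The implicit part: the service calls that can still produce tuples."""
    return [sc.text for sc in root.findall("sc")]


def add_results(root, tuples):
    """Materialize tuples returned by a service call as new <t> elements."""
    for x, y in sorted(tuples):
        t = ET.SubElement(root, "t")
        ET.SubElement(t, "x").text = str(x)
        ET.SubElement(t, "y").text = str(y)


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(sorted(explicit_tuples(root)))  # [('1', '2'), ('1', '3')]
    add_results(root, {(1, 4)})           # e.g. a tuple returned by S and T
    print(sorted(explicit_tuples(root)))  # [('1', '2'), ('1', '3'), ('1', '4')]
```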
28 Implementation Steps
- Given a distributed Datalog program and a query
- Transform the Datalog program to distributed QSQ
- Transform the distributed QSQ to Active XML
- Run!
- Detect termination
30 Article
- Diagnosis of Asynchronous Discrete Event Systems: Datalog to the Rescue!
- S. Abiteboul, Z. Abrams, S. Haar, T. Milo
- PODS, June 2005
31 Datalog P2P
- Deductive databases were a hot topic in the late 80s
- Research in this area led to beautiful results, with little industrial impact
- Years later, with networks everywhere, recursive data management is becoming more essential
- Datalog and QSQ are becoming hot again!
32 Abstract
- Diagnosis of distributed telecommunication systems
- The problem can be modeled in Datalog
- It can therefore benefit from dQSQ
33 Petri Nets
(Figure: a Petri net with places, marked places, and transitions, each transition labeled with an alarm symbol.)
- The marked places model the current state of the peer
- A transition node is enabled iff all its parent nodes are marked
- An enabled transition can fire and yield a new marking of the Petri net
- If a transition fires, its alarm symbol is reported to the supervisor
- For example, if transition (i) fires, the marking moves from places 1,7 to places 2,3
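The firing rule can be sketched in Python for nets where each place holds at most one token, so a marking is simply a set of places. Transition i uses the places from the example (1, 7 to 2, 3); its alarm symbol "a", the TRANSITIONS encoding and the function names are assumptions.

```python
# Hypothetical encoding of one transition: parent (input) places, child
# (output) places and alarm symbol. Places 1,7 -> 2,3 follow transition (i);
# the alarm symbol "a" is an assumption.
TRANSITIONS = {
    "i": {"pre": {1, 7}, "post": {2, 3}, "alarm": "a"},
}


def enabled(marking, name):
    """A transition is enabled iff all its parent places are marked."""
    return TRANSITIONS[name]["pre"] <= marking


def fire(marking, name, supervisor_log):
    """Fire an enabled transition: consume the tokens of its parent places,
    mark its child places, and report its alarm symbol to the supervisor."""
    if not enabled(marking, name):
        raise ValueError(f"transition {name} is not enabled")
    t = TRANSITIONS[name]
    supervisor_log.append(t["alarm"])
    return (marking - t["pre"]) | t["post"]


if __name__ == "__main__":
    log = []
    print(fire({1, 7}, "i", log), log)  # {2, 3} ['a']
```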
34 The Problem
- The supervisor receives an alarm sequence $(a_1,p_1),(a_2,p_2),\ldots,(a_n,p_n)$, where $a_i$ is an alarm symbol and $p_i$ is the peer that emitted the alarm
- Due to asynchronous communication:
- There is no guarantee that alarms sent by different peers appear in the order they were emitted
- We can only assume that the order of alarms is preserved for each individual peer
- Goal: find an explanation for a given alarm sequence
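The per-peer ordering assumption can be made precise with a small Python check: an emission order is consistent with the observed sequence iff both contain the same alarms and every peer's alarms appear in the same relative order. This is our own sketch with hypothetical names; the brute-force enumeration is only meant for short sequences.

```python
from itertools import permutations


def project(seq, peer):
    """The alarms of one peer, in sequence order."""
    return [alarm for (alarm, p) in seq if p == peer]


def compatible(observed, emitted):
    """True iff `emitted` could have been received by the supervisor as
    `observed`: same alarms, and the same order within each peer."""
    if sorted(observed) != sorted(emitted):
        return False
    peers = {p for (_, p) in observed}
    return all(project(observed, p) == project(emitted, p) for p in peers)


def possible_emission_orders(observed):
    """All emission orders consistent with the observation (brute force)."""
    return sorted({o for o in permutations(observed) if compatible(observed, list(o))})


if __name__ == "__main__":
    observed = [("b", "p1"), ("a", "p2"), ("c", "p1")]
    for order in possible_emission_orders(observed):
        print(order)
```

For the sequence (b,p1), (a,p2), (c,p1), three emission orders are possible; the only constraint is that b comes before c, since both were emitted by p1.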
35 Example
- The set of shaded nodes in figure 2 is a diagnosis for the alarm sequence (b,p1), (a,p2), (c,p1).
36 From Petri Nets to dQSQ
- Petri nets can be modeled in Datalog and evaluated with dQSQ
- A set of relations and rules is defined at each peer
- Each peer builds its own Datalog program using local information only, even if it has transitions to other peers
37 From Petri Nets to dQSQ
- Here is a small part of the Datalog rules
38 From Petri Nets to AXML
- Translation steps from Petri nets to Active XML:
Petri Net → (PNet2Datalog) → Datalog → (Datalog2QSQ) → QSQ → (QSQ2AXML) → AXML
39 The End