Title: Instance-Independent Concurrency Control for Semistructured Databases
1Instance-Independent Concurrency Control for
Semistructured Databases
- Jan Paredaens, Jan Hidders and Stijn Dekeyser
- ADReM research group, University of Antwerp
2Problem Statement (1/4)
Concurrency Control for Semistructured Data?
- Access: additions, deletions, path expressions
- Use the tree shape of the data; the tree shape is the data
- Path locks on instance nodes
Instance-independent locking?
- Instance-dependent locking leads to many locks
- Instances are big, transactions are small
3Problem Statement (2/4)
- Example: instance-dependent locking
[Figure: a document tree (document root, document, person, child, and name, age, addr and hobby nodes) in which the nodes carry path locks such as //child//hobby, child//hobby, //hobby and hobby.]
4Problem Statement (3/4)
[Figure: Group with a member edge to Person1, which has a hobby edge to Cycling.]
Transaction T3: Add(Group, member, Person2)
Transaction T4: Add(Person2, hobby, Cycling)
Schedule 4: T3: Add(Group, member, Person2); T4: Add(Person2, hobby, Cycling). Serial, defined.
Schedule 5: T4: Add(Person2, hobby, Cycling); T3: Add(Group, member, Person2). Not defined (for any document).
Schedule 6: T4: Add(Person2, hobby, Cycling). Defined (not defined for documents without Person2).
5Problem Statement (4/4)
- Some schedules are defined for some input documents, not for others
- Some schedules are serializable for some input documents, not for others
- Characterize the schedules for which there is at least one input document for which they are defined and that are serializable for all input documents for which they are defined
- Input documents have neither a DTD nor an XML Schema
- Schedules are given completely, not incrementally
6Path expressions and the paths they represent
Let a, b, c be labels of edges.
- a: L(a) = {a}
- a/b: L(a/b) = {a/b}
- a/*/b: L(a/*/b) = {a/a/b, a/b/b, a/c/b, ...}
- a//b: L(a//b) = {a/b, a/c/b, a/c/a/b/c/b, ...}
- .: L(.) = {ε}
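The membership test behind L(pe) can be sketched in a few lines. This is only an illustration under assumptions that the slide does not spell out: `//` is read as "zero or more intermediate labels" (which agrees with a/b being in L(a//b)), `*` is assumed to be a single-label wildcard, `.` denotes the empty path, and the function names `matches` and `in_language` are hypothetical.

```python
def matches(steps, labels):
    """Return True if the label sequence belongs to the language of the step list."""
    if not steps:
        return not labels
    head, rest = steps[0], steps[1:]
    if head == "":
        # An empty step comes from '//': skip zero or more intermediate labels.
        return any(matches(rest, labels[i:]) for i in range(len(labels) + 1))
    if not labels:
        return False
    # Assumption: '*' matches exactly one arbitrary label.
    return (head == "*" or head == labels[0]) and matches(rest, labels[1:])


def in_language(pe, path):
    """Check whether the '/'-separated label string `path` belongs to L(pe)."""
    steps = [] if pe == "." else pe.split("/")
    labels = path.split("/") if path else []
    return matches(steps, labels)
```

For instance, `in_language("a//b", "a/c/a/b/c/b")` holds, while `in_language("a//b", "a/c")` does not.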
7Queries, Additions, Deletions
- XQuery
- XUpdate
- Query(n, pe)[DT] = {m | there is a path in the document tree DT from n to m that is labeled with a string of L(pe)}
- Add(n, l, n')[DT] = DT ∪ {(n, l, n')}, only defined if the result is a document tree
- Del(n, l, n')[DT] = DT - {(n, l, n')}, only defined if (n, l, n') is in DT and the result is a document tree
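The three operations can be made concrete with a small sketch. It rests on assumptions that are not stated on the slide: a document tree is stored as a root plus a set of (parent, label, child) edges; "the result is a document tree" is read as "the parent exists and the child is new" for Add, and "the child is a leaf" for Del; `//` means zero or more intermediate labels and `*` is an assumed single-label wildcard. The class name `DocumentTree` and its methods are hypothetical.

```python
class DocumentTree:
    """A rooted tree stored as a root node plus a set of (parent, label, child) edges."""

    def __init__(self, root, edges=()):
        self.root = root
        self.edges = set(edges)

    def nodes(self):
        found = {self.root}
        for parent, _, child in self.edges:
            found.update((parent, child))
        return found

    def add(self, n, label, m):
        # Assumed reading: the parent n must exist and the child m must be a new node.
        if n not in self.nodes() or m in self.nodes():
            raise ValueError(f"Add({n},{label},{m}) is not defined")
        self.edges.add((n, label, m))

    def delete(self, n, label, m):
        # The edge must exist, and m must be a leaf so that no subtree is orphaned.
        if (n, label, m) not in self.edges:
            raise ValueError(f"Del({n},{label},{m}): no such edge")
        if any(parent == m for parent, _, _ in self.edges):
            raise ValueError(f"Del({n},{label},{m}): node {m} still has children")
        self.edges.remove((n, label, m))

    def query(self, n, pe):
        """Nodes reachable from n along a path whose labels form a string of L(pe)."""
        current = {n}
        for step in ([] if pe == "." else pe.split("/")):
            if step == "":
                # '//' gap: extend the current set with all its descendants.
                frontier = set(current)
                while frontier:
                    frontier = {c for p, _, c in self.edges if p in frontier} - current
                    current |= frontier
            else:
                current = {c for p, l, c in self.edges
                           if p in current and (step == "*" or l == step)}
        return current
```

Starting from root 1 with the edge (1, a, 2) and replaying the actions of the example slides, the query written `a//` on those slides behaves like `a//*` in this sketch; that reading is an assumption.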
8Action, Transaction, Schedule
An action is a pair (o, t), where o is an Add, Del or Query operation and t is a transaction identifier.
A transaction is a sequence of actions with the same transaction identifier.
A schedule over a set of transactions is an interleaving of these transactions.
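As a minimal sketch of these definitions, an action can be encoded as a pair (operation, transaction identifier), with the operation itself a tuple such as ("Add", 1, "a", 2). The encoding and the helper names `project`, `is_schedule_over` and `is_serial` are illustrative assumptions, not notation from the slides.

```python
def project(schedule, tid):
    """Sub-sequence of the schedule's actions that belong to transaction tid."""
    return [action for action in schedule if action[1] == tid]


def is_schedule_over(schedule, transactions):
    """True if `schedule` is an interleaving of the given transactions.

    `transactions` maps a transaction identifier to its list of actions.
    """
    known = all(tid in transactions for _, tid in schedule)
    return known and all(project(schedule, tid) == actions
                         for tid, actions in transactions.items())


def is_serial(schedule):
    """True if the actions of every transaction are contiguous in the schedule."""
    seen, previous = set(), None
    for _, tid in schedule:
        if tid != previous:
            if tid in seen:
                return False
            seen.add(tid)
            previous = tid
    return True
```

A schedule is an interleaving exactly when projecting it onto each transaction identifier gives that transaction back, which is what `is_schedule_over` checks.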
9Example (animation over slides 9 to 24)
[Figure: the document tree, redrawn after each step. It starts as root 1 with the single edge (1, a, 2).]
The schedule is built up one action per slide:
- (Add(1,b,3), t1)
- (Add(1,a,4), t1)
- (Add(3,a,5), t2)
- (Query(1,a//), t1) returns ∅
- (Add(2,b,5), t2) is shown and then withdrawn (node 5 is already in the tree); it is replaced by (Add(2,b,6), t1)
- (Add(6,c,7), t3)
- (Query(1,a//), t1) returns {6, 7}
- (Del(1,b,4), t3) is shown and then withdrawn (the tree has no edge (1, b, 4))
- (Del(2,b,6), t1) is shown and then withdrawn (node 6 still has the child 7)
- (Del(6,c,7), t1)
- (Query(1,a//), t2) returns {6}
Final tree: edges (1, a, 2), (1, b, 3), (1, a, 4), (3, a, 5), (2, b, 6).
25Defined, correct, equivalence (1/?)
A schedule S is called defined on a document tree DT iff the sequence of actions (Adds and Dels) of S is defined on DT.
A schedule S is called correct if there is at least one DT on which S is defined.
Two correct schedules S1 and S2 over the same set of transactions are called equivalent on DT if they are both defined on DT, S1[DT] = S2[DT] and the corresponding queries give the same result.
26Defined, correct, equivalence (2/?)
Two correct schedules over the same set of transactions are called equivalent if they are defined on the same set of DTs and they are equivalent on these DTs.
A schedule is called serializable if it is equivalent to a serial schedule.
S1 = (Add(1, a, 2), t1) (Del(1, a, 2), t2) (Add(1, a, 2), t1)
S1 is correct. S1 is not serializable, since t1 is not correct.
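27Example
[Figure: three document trees. DT1: root 1 with edge (1, a, 2). DT2: root 1 with edge (1, b, 2). DT3: root 1 only.]
S1 = (Add(2, b, 3), t1) (Query(1, a/b), t2)
S2 = (Query(1, a/b), t2) (Add(2, b, 3), t1)
S1 and S2 are not equivalent on DT1. S1 and S2 are equivalent on DT2. S1 and S2 are not defined on DT3.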
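The notions "defined on DT" and "equivalent on DT" can be checked mechanically for a given tree. The sketch below makes several assumptions: the three trees are read from the figure as DT1 = root 1 with edge (1, a, 2), DT2 = root 1 with edge (1, b, 2) and DT3 = root 1 only; Add needs an existing parent and a new child; Del needs an existing edge to a leaf; queries are restricted to plain label paths such as a/b. The names `query`, `run` and `equivalent_on` are hypothetical.

```python
def query(edges, n, pe):
    """Follow a plain '/'-separated label path from n (no wildcards in this sketch)."""
    current = {n}
    for label in pe.split("/"):
        current = {c for p, l, c in edges if p in current and l == label}
    return current


def run(schedule, root, edges):
    """Apply a schedule to the tree (root, edges).

    Returns (final_edges, query_results), or None if some Add or Del is not defined.
    Query results are keyed by (tid, position of the action inside its transaction).
    """
    edges, results, position = set(edges), {}, {}
    for action, tid in schedule:
        position[tid] = position.get(tid, 0) + 1
        nodes = {root} | {v for p, _, c in edges for v in (p, c)}
        if action[0] == "Query":
            _, n, pe = action
            results[(tid, position[tid])] = frozenset(query(edges, n, pe))
            continue
        op, n, label, m = action
        if op == "Add":
            if n not in nodes or m in nodes:
                return None
            edges.add((n, label, m))
        else:  # Del
            if (n, label, m) not in edges or any(p == m for p, _, _ in edges):
                return None
            edges.remove((n, label, m))
    return edges, results


def equivalent_on(s1, s2, root, edges):
    """Both schedules are defined on the tree, give the same tree and the same query results."""
    r1, r2 = run(s1, root, edges), run(s2, root, edges)
    return r1 is not None and r1 == r2


S1 = [(("Add", 2, "b", 3), "t1"), (("Query", 1, "a/b"), "t2")]
S2 = [(("Query", 1, "a/b"), "t2"), (("Add", 2, "b", 3), "t1")]
DT1, DT2, DT3 = (1, {(1, "a", 2)}), (1, {(1, "b", 2)}), (1, set())
```

On DT1 the query sees node 3 only when the Add comes first, so the two schedules differ there; on DT2 the query is empty in both orders; on DT3 node 2 is missing, so neither schedule is defined.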
28Example
S1 = (Add(2, b, 3), t1) (Query(1, a/b), t2)
S2 = (Query(1, a/b), t2) (Add(2, b, 3), t1)
S1 and S2 are defined on the same set of DTs and are not (necessarily) equivalent on these DTs.
S3 = (Add(2, b, 3), t1) (Add(2, b, 4), t2)
S4 = (Add(2, b, 4), t2) (Add(2, b, 3), t1)
S3 and S4 are defined on the same set of DTs and are equivalent on these DTs.
S5 = (Add(2, b, 3), t1) (Del(2, b, 3), t2)
S6 = empty
S5 and S6 are not defined on the same set of DTs.
29Example (animation over slides 29 to 37)
S1 and S2 are extended slide by slide; the verdict compares the two schedules as built up so far.
- Slide 29: S1 gets (Add(1, a, 2), t1); S2 gets (Add(1, b, 3), t2). NOT EQUIVALENT
- Slide 30: S1 gets (Add(1, b, 3), t2); S2 gets (Add(1, a, 2), t1). EQUIVALENT
- Slide 31: S1 gets (Del(4, c, 5), t1); S2 gets (Del(4, c, 5), t1). EQUIVALENT
- Slide 32: S1 gets (Del(4, c, 6), t2); S2 gets (Del(4, c, 7), t1). NOT EQUIVALENT
- Slide 33: S1 gets (Del(4, c, 7), t1); S2 gets (Del(4, c, 6), t2). EQUIVALENT
- Slide 34: S1 gets (Add(2, c, 8), t1); S2 is unchanged. NOT EQUIVALENT
- Slide 35: S1 gets (Query(1, b), t2); S2 gets (Query(1, b), t2). NOT EQUIVALENT
- Slide 36: S1 is unchanged; S2 gets (Add(2, c, 8), t1). EQUIVALENT
- Slide 37: S1 gets (Query(1, a), t1); S2 gets (Query(1, a), t1). EQUIVALENT
38Results (1/2)
Is it decidable whether a given transaction is correct?
Is it decidable whether a given schedule is correct?
Is it decidable whether two given transactions are equivalent?
Is it decidable whether two given schedules are equivalent?
Is it decidable whether a given schedule is serializable?
39Results (2/2)
Is it decidable whether a given transaction is correct? YES!
Is it decidable whether a given schedule is correct? YES!
Is it decidable whether two given transactions are equivalent? YES!
Is it decidable whether two given schedules are equivalent? YES!
Is it decidable whether a given schedule is serializable? YES!
40Correctness of queryless schedules (1/2)
- Correctness has nothing to do with queries
- Consider queryless schedules (QL schedules).
- The following conditions are necessary and sufficient for correct QL schedules:
  - Between (Add(n,a,n1),t1) and (Add(n2,b,n),t2) there is (Del(n,a,n1),t3)
  - Between (Add(n1,a,n),t1) and (Add(n2,b,n),t2) there is (Del(n1,a,n),t3)
  - Between (Add(n,a,n1),t1) and (Del(n2,b,n),t2) there is (Del(n,a,n1),t3)
  - Between (Add(n1,a,n),t1) and (Del(n,b,n2),t2) there is (Add(n,b,n2),t3)
  - Between (Add(n1,a,n),t1) and (Del(n2,b,n),t2) there is (Del(n1,a,n),t3), (n1,a) <> (n2,b)
  - Between (Del(n,a,n1),t1) and (Add(n2,b,n),t2) there is (Del(n3,c,n),t3)
  - Between (Del(n1,a,n),t1) and (Add(n,b,n2),t2) there is (Add(n3,c,n),t3)
  - Between (Del(n1,a,n),t1) and (Del(n,b,n2),t2) there is (Add(n3,c,n),t3)
  - Between (Del(n1,a,n),t1) and (Del(n2,b,n),t2) there is (Add(n2,b,n),t3)
41Correctness of queryless schedules (2/2)
- It is decidable whether a schedule (a transaction) is correct in O(n^3) time, n being the length of the schedule (transaction), and constant space.
- S[DT] = (DT ∪ ADD(S)) - DEL(S), if S is defined on DT
- ADD(S) = the edges e whose last occurrence in S is Add(e)
- DEL(S) = the edges e whose last occurrence in S is Del(e)
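The last-occurrence rule for ADD(S) and DEL(S) can be illustrated as follows. The sketch assumes a queryless schedule encoded as a list of ((op, n, label, m), tid) pairs and does not check whether S is defined on the given tree; `add_del_sets` and `effect` are hypothetical names.

```python
def add_del_sets(schedule):
    """Return (ADD(S), DEL(S)) for a queryless schedule."""
    last = {}
    for (op, n, label, m), _tid in schedule:
        last[(n, label, m)] = op  # later occurrences overwrite earlier ones
    add = {edge for edge, op in last.items() if op == "Add"}
    delete = {edge for edge, op in last.items() if op == "Del"}
    return add, delete


def effect(edges, schedule):
    """S[DT] = (DT ∪ ADD(S)) - DEL(S), assuming S is defined on DT."""
    add, delete = add_del_sets(schedule)
    return (set(edges) | add) - delete
```

Because only the last occurrence of each edge matters, the result tree is obtained in one pass over the schedule, without replaying every intermediate tree.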
42Equivalence of correct QL schedules (1/5 to 3/5)
Let S1 be correct and equivalent to the serial S2. We cannot necessarily go from S1 to S2 by swapping actions.
S1 = (Add(1,a,2),t1) (Del(1,a,2),t2) (Add(1,b,3),t2) (Del(1,b,3),t2) (Add(1,b,3),t1) (Del(1,b,3),t1)
S2 = (Add(1,a,2),t1) (Add(1,b,3),t1) (Del(1,b,3),t1) (Del(1,a,2),t2) (Add(1,b,3),t2) (Del(1,b,3),t2)
Remark that S1 is not equivalent to the other serial schedule S3 (t2 followed by t1).
45Equivalence of correct QL schedules (4/5)
- NI(S): the nodes that must belong to DTs on which S is defined
  - m such that the first occurrence of m has the form Add(m,l,n), Del(m,l,n) or Del(n,l,m)
- N-I(S): the nodes that may not belong to DTs on which S is defined
  - m such that the first occurrence of m has the form Add(n,l,m)
- EI(S): the edges that must belong to DTs on which S is defined
  - e such that the first occurrence of e has the form Del(e)
- E-I(S): the edges that may not belong to DTs on which S is defined
  - e: see paper
- NI(S), N-I(S), EI(S) and E-I(S) are correct
- NI(S), N-I(S), EI(S) and E-I(S) can be calculated in O(n^2) time and O(n) space
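The first-occurrence rules for NI(S), N-I(S) and EI(S) translate into a single pass over the schedule. This is a sketch for queryless schedules encoded as ((op, n, label, m), tid) pairs; E-I(S) is left out because the slide defers its definition to the paper. The function name `input_sets` is hypothetical.

```python
def input_sets(schedule):
    """Return (NI(S), N-I(S), EI(S)) from the first occurrence of each node and edge."""
    first_node, first_edge = {}, {}
    for (op, n, label, m), _tid in schedule:
        # setdefault keeps only the first occurrence of every node and edge.
        first_node.setdefault(n, (op, "source"))
        first_node.setdefault(m, (op, "target"))
        first_edge.setdefault((n, label, m), op)
    # A node first seen as the target of an Add must be new, so it may not be in the input.
    forbidden_nodes = {v for v, role in first_node.items() if role == ("Add", "target")}
    # Every other first occurrence (Add source, Del source, Del target) requires the node.
    required_nodes = set(first_node) - forbidden_nodes
    required_edges = {e for e, op in first_edge.items() if op == "Del"}
    return required_nodes, forbidden_nodes, required_edges
```

With dictionaries the pass is linear in the length of the schedule, which is comfortably within the O(n^2) time bound quoted on the slide.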
46Equivalence of correct QL schedules (5/5)
- S1 and S2, QL transactions or schedules over the same set of transactions, are equivalent iff
  - NI(S1) = NI(S2)
  - N-I(S1) = N-I(S2)
  - EI(S1) = EI(S2)
  - E-I(S1) = E-I(S2)
- The equivalence of two QL transactions or schedules over the same set of transactions can be decided in O(n^2) time and O(n) space.
47Output Sets vs. Input Sets (1/2)
- NO(S): the nodes that must belong to S[DT]
  - m such that the last occurrence of m has the form Add(m,l,n), Del(m,l,n) or Add(n,l,m)
- N-O(S): the nodes that may not belong to S[DT]
  - m such that the last occurrence of m has the form Del(n,l,m)
- EO(S): the edges that must belong to S[DT]
  - e such that the last occurrence of e has the form Add(e)
- E-O(S): the edges that may not belong to S[DT]
  - e: see paper
- NO(S), N-O(S), EO(S) and E-O(S) are correct
- NO(S), N-O(S), EO(S) and E-O(S) can be calculated in O(n^2) time and O(n) space
48Output Sets vs. Input Sets (2/2)
- If S1 and S2 are correct transactions or schedules, then S1.S2 is correct iff
  - N-O(S1) ∩ NI(S2) = ∅, E-O(S1) ∩ EI(S2) = ∅,
  - NO(S1) ∩ N-I(S2) = ∅, EO(S1) ∩ E-I(S2) = ∅
- If S1, S2, ..., Sk and S1.S2...Sk are k+1 correct schedules, then
  - NI(S1...Sk) = ∪ i=1..k (NI(Si) - ∪ j<i N-I(Sj))
  - N-I(S1...Sk) = ∪ i=1..k (N-I(Si) - ∪ j<i NI(Sj))
  - EI(S1...Sk) = ∪ i=1..k (EI(Si) - ∪ j<i E-I(Sj))
  - E-I(S1...Sk) = ∪ i=1..k (E-I(Si) - ∪ j<i EI(Sj))
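The node half of the concatenation test can be sketched from the first-occurrence and last-occurrence rules. The edge half needs E-O and E-I, whose definitions the slides defer to the paper, so this sketch yields only a necessary condition: if it fails, S1.S2 is not correct, but passing it proves nothing. Schedules are assumed queryless and encoded as ((op, n, label, m), tid) pairs; all function names are hypothetical.

```python
def node_roles(schedule, keep_first):
    """Map each node to the (op, role) of its first or last occurrence."""
    roles = {}
    for (op, n, _label, m), _tid in schedule:
        for node, role in ((n, "source"), (m, "target")):
            if keep_first:
                roles.setdefault(node, (op, role))
            else:
                roles[node] = (op, role)
    return roles


def input_node_sets(schedule):
    """Return (NI(S), N-I(S))."""
    roles = node_roles(schedule, keep_first=True)
    forbidden = {v for v, r in roles.items() if r == ("Add", "target")}
    return set(roles) - forbidden, forbidden


def output_node_sets(schedule):
    """Return (NO(S), N-O(S))."""
    roles = node_roles(schedule, keep_first=False)
    absent = {v for v, r in roles.items() if r == ("Del", "target")}
    return set(roles) - absent, absent


def node_conditions_hold(s1, s2):
    """Check N-O(S1) ∩ NI(S2) = ∅ and NO(S1) ∩ N-I(S2) = ∅ (necessary, not sufficient)."""
    required_in, forbidden_in = input_node_sets(s2)
    present_out, absent_out = output_node_sets(s1)
    return not (absent_out & required_in) and not (present_out & forbidden_in)
```

For example, after S1 = Add(1,a,2), a second schedule starting with Add(1,b,2) fails the check, because node 2 is in NO(S1) and in N-I(S2).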
49Main Results
- Given a QL schedule S of k transactions and n actions, it is decidable whether S is serializable in time O(f(k).n^3), where f(k) can be exponential in k, and in space O(k.n).
- Given a correct schedule S of k transactions and n actions, it is decidable whether S is serializable in time O(f(k).n^6), where f(k) can be exponential in k, and in space O(n^2).
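One way to see where a factor that grows quickly in k can come from is to enumerate all k! serial orders of the transactions. The sketch below does exactly that and compares each serial schedule with S on a handful of sample trees supplied by the caller. It is not the instance-independent procedure of the slides: agreement on samples is only a necessary condition for serializability, so a False answer is conclusive and a True answer is not. The schedule is assumed queryless, Add is assumed to need an existing parent and a new child, Del an existing edge to a leaf, and all names are hypothetical.

```python
from itertools import permutations


def run(schedule, root, edges):
    """Final edge set of a queryless schedule on (root, edges), or None if not defined."""
    edges = set(edges)
    for (op, n, label, m), _tid in schedule:
        nodes = {root} | {v for p, _, c in edges for v in (p, c)}
        if op == "Add":
            if n not in nodes or m in nodes:
                return None
            edges.add((n, label, m))
        else:  # Del
            if (n, label, m) not in edges or any(p == m for p, _, _ in edges):
                return None
            edges.remove((n, label, m))
    return edges


def serial_schedules(schedule):
    """Yield every serial schedule over the transactions of `schedule` (k! of them)."""
    tids = list(dict.fromkeys(tid for _, tid in schedule))
    for order in permutations(tids):
        yield [action for tid in order for action in schedule if action[1] == tid]


def serializable_on_samples(schedule, samples):
    """True if some serial schedule behaves like `schedule` on every sample tree.

    Two schedules agree on a sample when both are undefined or both give the same tree.
    """
    return any(all(run(serial, root, edges) == run(schedule, root, edges)
                   for root, edges in samples)
               for serial in serial_schedules(schedule))
```

On the schedule (Add(1,a,2), t1) (Del(1,a,2), t2) (Add(1,a,2), t1) and the sample tree consisting of root 1 only, no serial order is defined while the schedule itself is, so the check returns False.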
to be continued