1 Results on Data Delivery (WP3)
DBGlobe IST-2001-32645
1st Review Paphos, January 31, 2003
Proactive initiative on Global Computing (GC)
Future and Emerging Technologies (FET)
The roots of innovation
2 WP3 Outline
- Co-ordination/Data Delivery
- Task 3.1: Data delivery among the system components. Derive adaptive data delivery mechanisms considering various modes of delivery, such as
  - push (transmission of data without an explicit request) and pull,
  - periodic and aperiodic,
  - multicast and unicast delivery.
- Task 3.2: Model the co-ordination of the mobile entities using workflow management (and transactional workflows) and techniques used in the multi-agent community.
DBGlobe, 1st Annual Review
Paphos, Jan 2003
3 Timeline
[Gantt chart: WP3 tasks 3.1 Data Delivery, 3.2 Coordination and 3.3 Performance, over Year 1 (months 3 to 12) and Year 2 (months 15 to 24)]
Deliverables
- D8: Data Delivery Mechanisms (Oct 2002)
- D9: Modeling Coordination Through Workflows (April 2003)
- D10: Data Delivery and Querying (August 2003)
4 Outcomes of WP3 so far
D8: Data Delivery Mechanisms
- A taxonomy of mechanisms
- An outline of potential use within the DBGlobe architecture
- A number of specific results in data delivery
- Coherent Push-based Data Delivery
- Adaptive Multi-version Broadcast Data Delivery
- Efficient Publish-Subscribe Data Delivery
5 In this presentation
A note on the different modes
Summary of technical results
1. Coherent Data Delivery
2. Adaptive Multi-version Broadcast Data Delivery
3. Efficient Publish-Subscribe Data Delivery
6 D8: Taxonomy of Different Modes of Data Delivery
Data Delivery Modes
Client Pull vs. Server Push
- pull-based: the transfer of information is initiated by the client
- push-based: server-initiated; servers send information to clients without any specific request
- push is scalable, but clients may receive irrelevant data
- hybrid scheme: hot data are pushed and cold data are pulled
Aperiodic vs. Periodic
- aperiodic: delivery is usually event-driven; a data request (for pull) or transmission (for push) is triggered by an event (e.g., a user action for pull or a data update for push)
- periodic: delivery is performed according to some pre-arranged schedule
7 D8: Taxonomy of Different Modes of Data Delivery
Unicast vs. 1-to-N
- unicast: from a data source (server) to a single client
- 1-to-N: the data sent are received by multiple clients (multicast and broadcast)
Data vs. Query Shipping
- Based on the unit of interaction between clients and data sources
- Depends on whether the data sources have data processing capabilities
- Query shipping may reduce the communication load, since only relevant data sets are delivered to the client
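The four dimensions of the taxonomy can be written down as a small data structure. The sketch below is illustrative only: the class names, the two example classifications and the `hybrid_initiation` helper (with its `hot_threshold` parameter) are assumptions made here, not part of D8. The helper takes the hybrid scheme to mean that hot data are pushed and cold data are pulled.

```python
from dataclasses import dataclass
from enum import Enum


class Initiation(Enum):
    PULL = "client pull"
    PUSH = "server push"


class Timing(Enum):
    APERIODIC = "aperiodic (event-driven)"
    PERIODIC = "periodic (pre-arranged schedule)"


class Fanout(Enum):
    UNICAST = "unicast"
    ONE_TO_N = "1-to-N (multicast or broadcast)"


class Shipping(Enum):
    DATA = "data shipping"
    QUERY = "query shipping"


@dataclass(frozen=True)
class DeliveryMode:
    """One point in the taxonomy: a choice along each of the four dimensions."""
    initiation: Initiation
    timing: Timing
    fanout: Fanout
    shipping: Shipping


# Hypothetical classifications, for illustration only.
EXAMPLE_REQUEST = DeliveryMode(
    Initiation.PULL, Timing.APERIODIC, Fanout.UNICAST, Shipping.QUERY
)
EXAMPLE_BROADCAST = DeliveryMode(
    Initiation.PUSH, Timing.PERIODIC, Fanout.ONE_TO_N, Shipping.DATA
)


def hybrid_initiation(access_rate: float, hot_threshold: float) -> Initiation:
    """Hybrid scheme (assumed rule): push hot items, let clients pull cold ones."""
    return Initiation.PUSH if access_rate >= hot_threshold else Initiation.PULL
```

Keeping the dimensions as independent enums reflects that the modes can be combined freely, for example periodic push over a 1-to-N channel.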
8 D8: Taxonomy of Different Modes of Data Delivery
9 Outline
A note on the different modes
Summary of technical results
1. Coherent Push-based Data Delivery
2. Adaptive Multi-version Broadcast Data Delivery
3. Efficient Publish-Subscribe Data Delivery
10 Coherent Data Delivery
The Data Broadcast Push Model
- The server broadcasts data from a database to a large number of clients
- push mode: no direct communication with the server (stateless server, e.g., sensors)
- client-side protocols
- Data updates at the server
- Periodic updates for the values on the channel
[Figure: Server, Broadcast Channel, Client]
- Efficient way to disseminate information to large client populations with similar interests
- Physical support in wireless networks (satellite, cellular)
- Various other applications: sensor networks, data streams
11 Coherent Data Delivery
Our Goal
Ensure that clients receive temporally coherent (e.g., current) and semantically coherent (transaction-wise) data
- Provide a model for temporal and semantic coherency
- Show what type of coherency we get if there are no additional protocols
- Show what type of coherency is achieved by a number of protocols proposed in the literature (and their extensions)
12 Temporal Coherency Model
The currency properties of the readset (the set of items read and their values) are based on the currency of the items in the readset.
(Currency Interval of an Item) CI(x, R), the currency interval of x in the readset of R, is [cb, ce), where cb is the time instance at which the value of x read by R was stored in the database and ce is the time instance of the next change of this value in the database. If the value read by R has not been changed subsequently, ce is infinity.
- Based on CI(x, R), there are two types of currency of the readset of a transaction R:
  - Overlapping
  - Oldest-value
13 Temporal Coherency Model
R is overlapping current if
∩ CI(x, R) ≠ ∅, taken over all (x, u) ∈ RS(R),
that is, there is an interval of time that is included in the currency interval of all items in R's readset.
If this intersection is [cb, ce), the overlapping currency of R is
Overlap(R) = ce- (if ce is not infinity),
           = current_time (otherwise),
where ce- is the time instance just before ce.
In general, the oldest-value currency of a transaction R, denoted OV(R), is ce-, where ce is the smallest among the endpoints of the CI(x, R), for every x with (x, u) ∈ RS(R).
If R is overlapping current, Overlap(R) = OV(R).
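A minimal sketch of these definitions, reading each currency interval as the half-open interval [cb, ce). The function names and the example intervals are hypothetical. Two simplifications are assumed: the instant ce- is represented by ce itself, and OV(R) falls back to `current_time` when no value read has been overwritten, mirroring the rule given for Overlap(R).

```python
import math
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]  # half-open currency interval [cb, ce)


def overlap_interval(cis: Dict[str, Interval]) -> Optional[Interval]:
    """Intersection of all currency intervals, or None if it is empty."""
    max_cb = max(cb for cb, _ in cis.values())
    min_ce = min(ce for _, ce in cis.values())
    return (max_cb, min_ce) if max_cb < min_ce else None


def is_overlapping_current(cis: Dict[str, Interval]) -> bool:
    """R is overlapping current if the intersection is non-empty."""
    return overlap_interval(cis) is not None


def oldest_value_currency(cis: Dict[str, Interval], current_time: float) -> float:
    """OV(R): the smallest endpoint ce, standing in for ce- (assumption)."""
    min_ce = min(ce for _, ce in cis.values())
    return current_time if math.isinf(min_ce) else min_ce


# Hypothetical readset: item -> [cb, ce)
example = {"x1": (2, 10), "x2": (4, 8), "x3": (6, 12), "x4": (5, math.inf)}
print(overlap_interval(example))           # (6, 8)
print(oldest_value_currency(example, 20))  # 8
```

For an overlapping current readset the same value also serves as Overlap(R), since the endpoint of the intersection is the smallest ce.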
14 Temporal Coherency Model
If R is not overlapping current, we want to measure the discrepancy among the database states seen by the transaction: its temporal spread.
(Temporal Spread of a Readset) Let min_ce be the smallest among the endpoints and max_cb the largest among the begin-points of the CI(x, R) for x in the readset of a transaction R. Then
temporal_spread(R) = max_cb - min_ce, if max_cb > min_ce,
                   = 0, otherwise.
For an overlapping current transaction, the temporal spread is zero!
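The temporal spread translates directly into code. This is a sketch under the assumption that each currency interval is given as a (cb, ce) pair with `math.inf` for an unbounded ce; the function name matches the slide, the example intervals are hypothetical.

```python
import math
from typing import Dict, Tuple

Interval = Tuple[float, float]  # currency interval [cb, ce)


def temporal_spread(cis: Dict[str, Interval]) -> float:
    """max_cb - min_ce if max_cb > min_ce, and 0 otherwise."""
    max_cb = max(cb for cb, _ in cis.values())
    min_ce = min(ce for _, ce in cis.values())
    return max_cb - min_ce if max_cb > min_ce else 0.0


# Hypothetical readsets: item -> [cb, ce)
print(temporal_spread({"x1": (2, 8), "x2": (9, 14)}))         # 1
print(temporal_spread({"x1": (2, 10), "x2": (4, math.inf)}))  # 0.0
```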
15 Temporal Coherency Model
Example
R1 reads x1, x2, x3, x4
[Figure: currency intervals CI(x1, R1), CI(x2, R1), CI(x3, R1), CI(x4, R1) on a time axis from 2 to 20]
Overlapping current with Overlap(R1) = 8 and temporal_spread(R1) = 0
16 Temporal Coherency Model
Example
R1 reads x1, x2, x3, x4
[Figure: currency intervals CI(x1, R1), CI(x2, R1), CI(x3, R1), CI(x4, R1) on a time axis from 2 to 20, marking the oldest value read (min_ce) and the most current begin-point (max_cb)]
Not overlapping, but OV(R1) = 8 and temporal_spread(R1) = 9 - 8 = 1
17 Temporal Coherency Model
Example
R1 reads x1, x2, x3, x4
[Figure: currency intervals CI(x1, R1), CI(x2, R1), CI(x3, R1), CI(x4, R1) on a time axis from 2 to 20, marking the oldest value read (min_ce) and the most current begin-point (max_cb)]
Not overlapping, but OV(R1) = 8 and temporal_spread(R1) = 15 - 8 = 7
18 Temporal Coherency Model
Besides discrepancy, we also measure currency (how old the values seen are).
(Transaction-Relative Currency) R is relative overlapping current with respect to time instance t if t ∈ CI(x, R) for every x read by R. R is relative oldest-value current with respect to time instance t if t ≤ OV(R).
(Temporal Lag) Let tc be the largest t ≤ tcommit_R with respect to which R is relative (overlapping or oldest-value) current. Then temporal_lag(R) = tcommit_R - tc.
The smaller the temporal lag and the temporal spread, the higher the temporal coherency of a read transaction.
The best temporal coherency is achieved when R is relative overlapping current with respect to tcommit_R (both the temporal lag and the temporal spread are zero).
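The temporal lag can be sketched as follows, reading the definition as "tc is the largest t ≤ tcommit_R with t ≤ OV(R)". As an assumption, the instant ce- is represented by ce itself, so tc is simply the smaller of tcommit_R and the smallest endpoint. The function name follows the slide; the currency intervals are hypothetical.

```python
import math
from typing import Dict, Tuple

Interval = Tuple[float, float]  # currency interval [cb, ce)


def temporal_lag(cis: Dict[str, Interval], t_commit: float) -> float:
    """tcommit_R - tc, where tc = min(tcommit_R, smallest ce in the readset)."""
    min_ce = min(ce for _, ce in cis.values())
    t_c = min(t_commit, min_ce)
    return t_commit - t_c


# Hypothetical readset whose oldest value is overwritten at time 8.
readset = {"x1": (2, 10), "x2": (4, 8), "x3": (6, math.inf)}
print(temporal_lag(readset, t_commit=7))   # 0
print(temporal_lag(readset, t_commit=12))  # 4
print(temporal_lag(readset, t_commit=19))  # 11
```

A transaction that commits before any value it read is overwritten has zero lag. Otherwise the lag grows with the time elapsed since the first overwrite.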
19 Temporal Coherency Model
Example
R1
[Figure: currency intervals CI(x1, R1), CI(x2, R1), CI(x3, R1), CI(x4, R1) on a time axis from 2 to 20]
Overlapping current with Overlap(R1) = 8, temporal_spread(R1) = 0, temporal_lag(R1) = 0
20 Temporal Coherency Model
Example
R1
[Figure: currency intervals CI(x1, R1), CI(x2, R1), CI(x3, R1), CI(x4, R1) on a time axis from 2 to 20]
Overlapping current with Overlap(R1) = 8, temporal_spread(R1) = 0, temporal_lag(R1) = 12 - 8 = 4
21 Temporal Coherency Model
Example
R1
[Figure: currency intervals CI(x1, R1), CI(x2, R1), CI(x3, R1), CI(x4, R1) on a time axis from 2 to 20]
Overlapping current with Overlap(R1) = 8, temporal_spread(R1) = 0, temporal_lag(R1) = 19 - 8 = 11
22 Temporal Coherency Protocols
- What is the coherency of R (temporal lag and spread) if R just reads items from the broadcast?
- Let tlastread_R be the time instance at which R performs its last read.
- temporal_lag(R) ≤ tcommit_R - begin_cycle(tbegin_R) and temporal_spread(R) ≤ tlastread_R - begin_cycle(tbegin_R)
- (Tight bounds) There are cases in which we get the worst lag and spread
- If pu = 0 (immediate updates), best (worst) lag and spread
- If all items are read from the same cycle, the spread is 0 and the lag ≤ pu
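The bounds for a transaction that just reads from the broadcast can be computed as below. This is a sketch under stated assumptions: update periods are taken to start at time 0, so that begin_cycle(t) is the largest multiple of pu not exceeding t, and pu = 0 is treated as immediate updates. The function `no_protocol_bounds` and the example times are hypothetical.

```python
import math


def begin_cycle(t: float, pu: float) -> float:
    """Start of the update period containing t (periods assumed to start at 0)."""
    if pu == 0:  # immediate updates: the value on air is the current one
        return t
    return math.floor(t / pu) * pu


def no_protocol_bounds(t_begin: float, t_lastread: float,
                       t_commit: float, pu: float) -> dict:
    """Upper bounds on temporal lag and spread when no extra protocol is used."""
    start = begin_cycle(t_begin, pu)
    return {"lag": t_commit - start, "spread": t_lastread - start}


# Hypothetical transaction: begins at 13, last read at 27, commits at 28, pu = 10.
print(no_protocol_bounds(13, 27, 28, 10))  # {'lag': 18, 'spread': 17}
```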
23 Temporal Coherency Protocols
Basic Techniques
- Protocols fall in two broad categories:
  - invalidation (which corresponds to broadcasting the endpoints (ce's) of the currency interval for each item)
  - versioning (which corresponds to broadcasting the begin-points (cb's) of the currency interval for each item)
And a hybrid protocol that combines versioning and invalidation
24 Temporal Coherency Protocols
Invalidation
Periodically broadcast an invalidation report (IR): a list of the items that have been updated since the broadcast of the previous IR. In the paper: variations that give transactions with different values of temporal spread and lag.
Versioning
With each item, broadcast a timestamp (version) indicating when its value was created. Again, in the paper: variations that give transactions with different values of temporal lag (spread is always 0).
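A minimal client-side sketch of the two techniques. It shows one simple variation of each, chosen here for illustration and not necessarily one of the variations in the paper: the invalidation reader aborts as soon as an item it has read appears in an invalidation report, and the versioning reader accepts only values whose version is no newer than the start of the cycle of its first read. All class, method and parameter names are hypothetical.

```python
class Aborted(Exception):
    """Raised when a read-only client transaction can no longer stay coherent."""


class InvalidationReader:
    """Aborts when an item it has already read is reported as updated."""

    def __init__(self):
        self.readset = {}

    def read(self, item, value):
        self.readset[item] = value

    def on_invalidation_report(self, updated_items):
        # The report lists the items updated since the previous report.
        stale = set(updated_items) & set(self.readset)
        if stale:
            raise Aborted(f"items updated after being read: {sorted(stale)}")


class VersioningReader:
    """Accepts only values written no later than the cycle of its first read."""

    def __init__(self):
        self.readset = {}
        self.snapshot = None  # start of the cycle in which the first read occurs

    def read(self, item, value, version, cycle_start):
        if self.snapshot is None:
            self.snapshot = cycle_start
        if version > self.snapshot:
            raise Aborted(f"{item} was written at {version}, after {self.snapshot}")
        self.readset[item] = value
```

In the versioning sketch every accepted value was written at or before the snapshot and was still current when broadcast in a later cycle, so all values read were current at the snapshot, which is consistent with a temporal spread of 0.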
25 Semantic Coherency Model
Definitions of Semantic Coherency (Consistency)
- C0
- C1: RS(R) ⊆ DS (the readset is a subset of a consistent database state DS)
- C2: R is serializable with the set of server transactions that wrote values read (directly or indirectly) by R
- C3: R is serializable with all server transactions
- C4: R is serializable with all server transactions, and the serialization order of the server transactions that R observes is consistent with the commit order of transactions at the server
Rigorous schedules: the commit order is compatible with the serialization order
26 Relating Semantic and Temporal Coherency
(Currency Interval of an Item) CI(x, R), the currency interval of x in the readset of R, is [cb, ce), where cb is the commit time of the transaction that wrote the value of x read by R and ce is the commit time of the transaction that updated x immediately after, or infinity.
27 Semantic Coherency Protocols
Reading from a single cycle
If transaction R reads all items from the same
cycle, it is C1 but not necessarily C2
If the server schedule is rigorous and R reads
all items from the same cycle, it is C4
28 Semantic Coherency Protocols
Read Test Theorem
It suffices to check for violations of C2, C3, and C4 by a client transaction R when R reads a data item if and only if the server schedule is rigorous.
In the paper: various read-tests (based on testing the serializability graph) for attaining the various Ci-consistency degrees, and their relationships to approaches proposed in the literature.
29 Coherency in Broadcast-Based Dissemination
Future Work
- Multiple servers: what semantic and temporal coherency does the client get?
- Performance evaluation of the various types of coherency
Reference
E. Pitoura, P. K. Chrysanthis and K. Ramamritham.
Characterizing the Temporal and Semantic
Coherency of Broadcast-based Data Dissemination.
Proc. of the 9th International Conference on
Database Theory (ICDT03), January 2003, Siena,
Italy.
30 Outline
A note on the different modes
Summary of technical results
1. Coherent Push-based Data Delivery
2. Adaptive Multi-version Broadcast Data Delivery
3. Efficient Publish-Subscribe Data Delivery
31 Multi-version Broadcast
Similar model, BUT at each cycle the server (data source) sends not just one value per item but multiple versions per item.
Applications
- Multiple data servers share the channel (multi-sensor networks)
- Enhance consistency at the server (similar to multi-version schemes in traditional client-server systems)
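To make the idea concrete, here is a small sketch of a client reading from a cycle that carries several versions per item. The data layout (a dictionary from item to a list of (version timestamp, value) pairs) and the `read_as_of` helper are assumptions made for illustration; the slides do not prescribe a broadcast organization.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, str]  # (version timestamp, value)


def read_as_of(cycle: Dict[str, List[Version]], item: str, as_of: int) -> Optional[str]:
    """Return the newest broadcast version of item written at or before as_of."""
    candidates = [v for v in cycle.get(item, []) if v[0] <= as_of]
    if not candidates:
        return None  # no sufficiently old version is on the air
    return max(candidates, key=lambda v: v[0])[1]


# Hypothetical content of one broadcast cycle.
cycle = {"x": [(3, "x@3"), (7, "x@7"), (12, "x@12")], "y": [(5, "y@5")]}
print(read_as_of(cycle, "x", as_of=8))  # x@7
print(read_as_of(cycle, "y", as_of=8))  # y@5
print(read_as_of(cycle, "x", as_of=2))  # None
```

Reading every item as of the same time lets a client transaction see one database state even when its reads span several cycles, provided old enough versions are still being broadcast.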
32 Multi-version Broadcast
Issues
- How should the broadcast be organized?
- What are appropriate client-cache protocols?
Adaptability
- Performance depends on client access patterns
- Historical queries
- Random queries
33 Multi-version Broadcast
References
E. Pitoura and P. K. Chrysanthis. Multiversion Data Broadcast. IEEE Transactions on Computers 51(10):1224-1230, October 2002.
O. Shigiltchoff, P. K. Chrysanthis and E. Pitoura. Multi-version Data Broadcast Organizations. In Proc. of the 6th East European Conference on Advances in Databases and Information Systems (ADBIS), September 2002, Bratislava, Slovakia.
O. Shigiltchoff, P. K. Chrysanthis and E. Pitoura. Adaptive Multi-version Data Broadcast Organizations. In preparation for journal publication.
34 Outline
A note on the different modes
Summary of technical results
1. Coherent Push-based Data Delivery
2. Adaptive Multi-version Broadcast Data Delivery
3. Efficient Publish-Subscribe Data Delivery
35 Extra Slides
36 Extra Slides (coherency)
37 Coherent Data Delivery
The Model
- The server repetitively pushes data from a database to a large number of clients
- sequential client access
- asymmetry
- large number of clients
- transmission capabilities
- Client-side protocols
- The server is stateless
- Data updates at the server
[Figure: Server, Broadcast Channel, Client]
38 Coherency in Broadcast-Based Dissemination
Updates
Data are updated at the server. What is the value broadcast at time instance t?
We assume periodic updates with an update frequency or period of pu, meaning that the value placed on the broadcast at time t is the value of the item at the beginning of the update period, denoted begin_cycle(t).
For periodic broadcast, pu is usually equal to the broadcast period.
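The update model can be illustrated by looking up which value of an item is on the air at time t. The sketch assumes that update periods start at time 0, that pu = 0 means immediate updates, and that the item's history is available as a sorted list of (commit time, value) pairs. The helper `value_broadcast_at` and the example history are hypothetical.

```python
import bisect
import math
from typing import List, Optional, Tuple


def begin_cycle(t: float, pu: float) -> float:
    """Start of the update period containing t (periods assumed to start at 0)."""
    return t if pu == 0 else math.floor(t / pu) * pu


def value_broadcast_at(t: float, updates: List[Tuple[float, str]],
                       pu: float) -> Optional[str]:
    """Value on the broadcast at time t: the item's value as of begin_cycle(t).

    updates must be sorted by commit time.
    """
    snapshot = begin_cycle(t, pu)
    times = [ts for ts, _ in updates]
    i = bisect.bisect_right(times, snapshot)  # updates committed by the snapshot
    return updates[i - 1][1] if i > 0 else None


history = [(0, "a"), (7, "b"), (12, "c")]  # hypothetical update history of one item
print(value_broadcast_at(9, history, pu=10))   # a  (update at 7 not yet visible)
print(value_broadcast_at(11, history, pu=10))  # b
print(value_broadcast_at(9, history, pu=0))    # b  (immediate updates)
```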
39 Coherent Data Delivery
Preliminary Definitions
- Database state: a set of (data item, value) pairs
- Readset of a transaction R, RS(R): the set of (data item, value) pairs that R read
- BSc: the content of the broadcast at the cycle that starts at time instance c (again a set of (data item, value) pairs)
R may read items from different broadcast cycles; thus items in RS(R) may correspond to different database states.
40 Semantic Coherency Model
Variations: instead of a single client transaction, a set S of client transactions
Example
- C3-site: all transactions of a client are serializable with all server transactions
- C3-All
41 Relating Semantic and Temporal Coherency
- Assumptions
- Server schedules are serializable
- Broadcast only committed values
If R is overlapping current, then it is C1
consistent
42 Relating Semantic and Temporal Coherency
(Currency Interval of an Item) CI(x, R), the currency interval of x in the readset of R, is [cb, ce), where cb is the commit time of the transaction that wrote the value of x read by R and ce is the commit time of the transaction that updated x immediately after, or infinity.
- Note
  - overlapping currency is similar to vintage transactions (server schedules are serializable): ce-vintage
  - semantic currency is similar to t-bound: if OV(R) = to, then to-bound
43 Coherency in Broadcast-Based Dissemination
Previous Work
- Cache consistency (e.g., [Barbara and Imielinski, SIGMOD95], [Acharya et al., VLDB 1996])
- Datacycle [Bowen et al., CACM92]: hardware for detecting changes
- Extended for multiple servers [Banerjee and Li, JCI94]
- Certification reports [Barbara, ICDCS97]
- F-Matrix (for update (C2) consistency) [Shanmugasundaram, SIGMOD99]
- SGT Graph (for serializability) [Pitoura, ER-Workshop98], [Pitoura, DEXA-Workshop98], [Pitoura and Chrysanthis, ICDCS99]
- Multiple Versions [Pitoura and Chrysanthis, VLDB99], [Pitoura and Chrysanthis, IEEE TOC 2003]
44 DBGlobe IST-2001-32645