Title: p2p06
1 Topics in Database Systems: Data Management in Peer-to-Peer Systems
PART 1: Replication and other issues
2 Agenda (for today)
1. Description of the course projects
2. General on Replication
3. Replication Theory for Unstructured (Cohen et al paper)
4. Epidemic Algorithms for Updates (Demers et al paper)
3 Term Projects
- Projects of three types
- They have some research flavor: you will need to think
- There is no single solution (more than one group may take the same project)
- Three people per group, please
- If you have some other idea, it can be accepted, but not automatically
- You will build a web page for the project, which you will send to me
- Replicate content and not index (for durability)!!!
4 Term Projects
TYPE I PROJECTS: You will choose a paper from a list of papers. The papers concern data management problems either in centralized systems or in distributed systems without the properties of peer-to-peer systems. The goal of the project is to design a version of the problem suitable for a system of peer nodes. Your project should include some form of evaluation of your approach. This may be theoretical (e.g., an estimate of the complexity of the solution, a proof of its correctness or of other properties (e.g., load balancing)) and/or include a small implementation. You will deliver a paper in the form of a research paper (guidelines will be given). You will also present your work in class (guidelines will be given).
5 Term Projects
Papers for Type I Projects
- 1-3: Choose any one of sections 3, 4 or 5 from: M. Stonebraker, P. M. Aoki, W. Litwin, A. Pfeffer, A. Sah, J. Sidell, C. Staelin and A. Yu. Mariposa: A Wide-Area Distributed Database System. VLDB J., 5(1), 1996, 48-63.
- 4: Study how the following paper, which we discussed in class, can be adapted to p2p: A. J. Demers, D. H. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. E. Sturgis, D. C. Swinehart, D. B. Terry. Epidemic Algorithms for Replicated Database Maintenance. PODC 1987, 1-12.
- 5: Design a distributed (p2p) version of a bitmap index. For bitmap indexes, you may consult any database textbook and/or the following: P. E. O'Neil and D. Quass. Improved Query Performance with Variant Indexes. Proc. SIGMOD Conference, 1997, 38-49.
- 6: Study how the following paper, which concerns sensor networks, can be applied to p2p systems: D. Zeinalipour-Yazti, Z. Vagena, D. Gunopulos, V. Kalogeraki, V. Tsotras, M. Vlachos, N. Koudas, D. Srivastava. The Threshold Join Algorithm for Top-k Queries in Distributed Sensor Networks. DMSN Workshop, 2005.
6 Term Projects
TYPE II PROJECTS: You will choose a paper on topics of the peer-to-peer area that we have not covered in class, specifically (i) security, (ii) trust/reputation, (iii) incentives, (iv) publish-subscribe systems. You will present the paper in class. Then, either (a) you propose some extension of the paper, e.g., applying it to another type of overlay, improving some of its characteristics, etc.; in this case you should also include some form of evaluation of the extension, which may be theoretical (e.g., an estimate of the complexity of the solution, etc.) and/or include a small implementation; or (b) you implement a substantial part of the paper. You will deliver a paper in the form of a research paper (guidelines will be given). You will also give a second presentation in class, this time of your own work (guidelines will be given).
7 Term Projects
Papers for Type II Projects
- Security: E. Sit and R. Morris. Security Considerations for Peer-to-Peer Distributed Hash Tables. IPTPS 2002, 261-269. D. S. Wallach. A Survey of Peer-to-Peer Security Issues. ISSS 2002, 42-57.
- Incentives: M. Feldman, K. Lai, I. Stoica and J. Chuang. Robust Incentive Techniques for Peer-to-Peer Networks. ACM Conference on Electronic Commerce 2004, 102-111.
- Trust/Reputation: S. D. Kamvar, M. T. Schlosser, H. Garcia-Molina. The EigenTrust Algorithm for Reputation Management in P2P Networks. WWW 2003, 640-651.
- Publish/Subscribe: M. Bender, S. Michel, S. Parkitny, and G. Weikum. A Comparative Study of Pub/Sub Methods in Structured P2P Networks. DBISP2P 2006, Seoul, South Korea, Springer, 2006.
8 Term Projects
TYPE III PROJECTS: You will choose one of the systems (listed on the next slide) that provide peer-to-peer system software. You will need to install the relevant software and build a small application. You will deliver a paper that includes a short manual for the system and a description of your application. You will also present your work in class (guidelines will be given). The presentation should include a short demo.
9 Term Projects
Systems for Type III Projects
1. OpenDHT: a publicly accessible distributed hash table (DHT) service.
2. P2 Declarative Networking: a system which uses a high-level declarative language to express overlay networks in a highly compact and reusable form.
3. PeerSim: a simulation environment for P2P protocols in Java.
10 Term Projects
Deadlines
- Dec 7: Group formation and project selection
- Dec 14: 1-2 page "project proposal" (guidelines will be given)
- Dec 21: we may have a short presentation/discussion of the projects, in the last week before Christmas
- Jan 11: Paper presentations (Type II groups)
- Jan 18: Paper presentations, continued
- Jan 25: Project delivery (for the paper, guidelines will be given)
There will be a final workshop where the projects of all groups will be presented.
11 Agenda (for today)
1. Description of the course projects
2. General on Replication
3. Replication Theory for Unstructured (Cohen et al paper)
4. Epidemic Algorithms for Updates (Demers et al paper)
12 Types of Replication
- Two types of replication
- Metadata/Index: replicate index entries
- Data/Document replication: replicate the actual data (e.g., music files)
- Metadata vs Data
- (+) Lighter storage- and bandwidth-wise
- (+) Sizes of replicated objects are more uniform
- (-) Adds an extra hop for actually getting the data
- (-) More frequent updates
- (-) Less durability/availability
13 Types of Replication
Caching vs Replication
Cache: store data retrieved from a previous request (client-initiated).
Replication: more proactive; a copy of a data item may be stored at a node even if the node has not requested it.
14 Reasons for Replication
- Performance
- load balancing
- locality: place copies close to the requestor
- geographic locality (more choices for the next step in search)
- reduce the number of hops
- Availability
- in case of failures
- peer departures
15 Reasons for Replication
Besides storage, there is a cost associated with replication: consistency maintenance. Replication makes reads faster at the expense of slower writes.
16
- No proactive replication (Gnutella)
- Hosts store and serve only what they requested
- A copy can be found only by probing a host with a copy
- Proactive replication of keys (= metadata pointers) for search efficiency (FastTrack, DHTs)
- Proactive replication of copies for search and download efficiency, and for anonymity (Freenet)
17 Issues
Which items (data/metadata) to replicate? Based on popularity. In traditional distributed systems, also on the rate of reads/writes; cost-benefit: the ratio of read savings to write increase.
Where to replicate (allocation scheme)?
18 Issues
How/when to update? Both data items and metadata.
19 Database-Flavored Replication Control Protocols
Let's assume the existence of a data item x with copies x1, x2, ..., xn. x: the logical data item; the xi's: the physical data items.
A replication control protocol is responsible for mapping each read/write on the logical data item (R(x)/W(x)) to a set of reads/writes on a (possibly proper) subset of the physical copies of x.
20 One Copy Serializability
Correctness: a DBMS for a replicated database should behave like a DBMS managing a one-copy (i.e., non-replicated) database, insofar as users can tell.
One-copy schedule: replace operations on data copies with operations on data items.
One-copy serializable (1SR): the schedule of transactions on a replicated database is equivalent to a serial execution of those transactions on a one-copy database.
21 ROWA
Read One/Write All (ROWA): a replication control protocol that maps each read to only one copy of the item, and each write to a set of writes on all physical data item copies.
If even one of the copies is unavailable, an update transaction cannot terminate.
22 Write-All-Available
Write-all-available: a replication control protocol that maps each read to only one copy of the item, and each write to a set of writes on all available physical data item copies.
23 Quorum-Based Voting
- A read quorum Vr and a write quorum Vw are required to read or write a data item
- If a given data item has a total of V votes, the quorums have to obey the following rules:
- Rule 1: Vr + Vw > V
- Rule 2: Vw > V/2
Rule 1 ensures that a data item is not read and written by two transactions concurrently (R/W). Rule 2 ensures that two write operations from two transactions cannot occur concurrently on the same data item (W/W).
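As a sanity check, here is a minimal sketch (with hypothetical vote counts) of the two rules; ROWA falls out as the special case Vr = 1, Vw = V.

```python
# A minimal sketch checking the two voting rules for hypothetical quorums.
def valid_quorums(V, Vr, Vw):
    rule1 = Vr + Vw > V   # every read quorum overlaps every write quorum (R/W)
    rule2 = Vw > V / 2    # any two write quorums overlap (W/W)
    return rule1 and rule2

assert valid_quorums(V=5, Vr=3, Vw=3)      # majority read and write quorums
assert valid_quorums(V=5, Vr=1, Vw=5)      # ROWA as a special case
assert not valid_quorums(V=5, Vr=2, Vw=3)  # a read could miss the last write
```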
24 Distributing Writes
Immediate writes vs. deferred writes.
Deferred writes: access only one copy of the data item and delay the distribution of writes to other sites until the transaction has terminated and is ready to commit. The transaction maintains an intention list of deferred updates; after the transaction terminates, it sends the appropriate portion of the intention list to each site that contains replicated copies.
(+) Allows optimizations; aborts cost less. (-) May delay commitment; delays the detection of conflicts.
Primary or master copy: updates at a single copy per item.
25 Eager vs Lazy Replication
Eager replication: keeps all replicas synchronized by updating all replicas in a single transaction.
Lazy replication: asynchronously propagates replica updates to other nodes after the replicating transaction commits.
In p2p, lazy replication (or soft state) is used.
26 Update Propagation
- Stateless or stateful (the item owners know which nodes hold copies of the item)
- Who initiates the update?
- Push: by the server, i.e., the site whose item (copy) changes
- Pull: by the client holding the copy
27 Update Propagation
- When?
- Periodic
- Immediate
- Lazy: when an inconsistency is detected
- Threshold-based: on freshness (e.g., number of updates or actual time) or on value
- Expiration time: items expire (become invalid) after that time (most often used in p2p)
- Adaptive periodic: reduce or increase the period based on the updates seen between two successive updates
28 Summary: Design parameters and performance (CAN)

Parameter | Path length | Neighbor state | Total path latency | Per-hop latency | Volume per node | Multiple routes | Replicas
Dimensions (d) | O(d n^(1/d)) | O(d) | reduced | - | - | yes | -
Realities (r) | reduced | O(r) | reduced | - | O(r) | yes | O(r)
MAXPEERS (p) | O(1/p) | O(p) | reduced | reduced | O(p) | yes | O(p)
Hash functions (k) | - | - | reduced (*) | - | O(k) | - | O(k)
RTT-weighted routing | - | - | reduced | reduced | - | - | -
Uniform partitioning heuristic | reduced variance | reduced variance | - | - | reduced variance | - | -

(*) Only on replicated data.
29 CHORD: Failures
- Replication
- Each node maintains a successor list of its r nearest successors
- Upon failure, use the next successor in the list
- Modify stabilize to fix the list
Other nodes may attempt to send requests through the failed node: use alternate nodes found in the routing tables of preceding nodes or in the successor list.
30 CHORD: Failures
- Theorem: If we use a successor list of length r = Ω(log N) in an initially stable network, and then every node fails with probability 1/2, then:
- with high probability, find_successor returns the closest living successor
- the expected time to execute find_successor in the failed network is O(log N)
A lookup fails only if all r nodes in the successor list fail. All fail with probability 2^(-r) (independent failures), which is 1/N for r = log N.
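A small numeric sketch of that bound, assuming independent node failures with probability 1/2 as in the theorem: choosing r = log2 N successors drives the lookup failure probability down to 1/N.

```python
import math

# A lookup fails only if all r successor-list entries are dead: (1/2)^r.
def lookup_failure_prob(r):
    return 0.5 ** r

for N in (2**10, 2**20):
    r = math.ceil(math.log2(N))          # r = Omega(log N)
    print(N, r, lookup_failure_prob(r))  # 2^(-log2 N) = 1/N
```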
31 CHORD: Replication
Store replicas of a key at the k nodes succeeding the key. The successor list helps to keep the number of replicas per item known. Another approach: store a copy per region.
32 BATON: Failures
There is routing redundancy:
- Upon node departure or failure, the parent can reconstruct the entries
- Assume node x fails; any detected failures of x are reported to its parent y
- y regenerates the routing tables of x (Theorem 2)
- Messages are routed
- sideways (redundancy similar to CHORD)
- up-down (a node can find its parent through its neighbors)
33 Replication - Beehive
- Proactive, model-driven replication
- vs. passive (demand-driven) replication, such as caching objects along a lookup path
- A hint for BATON?
- Beehive
- The length of the average query path is reduced by one when an object is proactively replicated at all nodes logically preceding that node on all query paths
- BATON
- Range queries
- Many paths to the data
Any ideas?
34 Agenda (for today)
1. Description of the course projects
2. General on Replication
3. Replication Theory for Unstructured (Cohen et al paper)
4. Epidemic Algorithms for Updates (Demers et al paper)
35 Replication Theory: Replica Allocation Policies in Unstructured P2P Systems
E. Cohen and S. Shenker. Replication Strategies in Unstructured Peer-to-Peer Networks. SIGCOMM 2002.
Q. Lv et al. Search and Replication in Unstructured Peer-to-Peer Networks. ICS 2002 (replication part).
36 Replica Allocation Scheme
Question: how to use replication to improve search efficiency in unstructured networks?
How many copies of each object should there be, so that the search overhead for the object is minimized, assuming that the total amount of storage for objects in the network is fixed?
37 Replication Theory - Model
Assume m objects and n nodes. Each node has capacity ρ; total capacity R = nρ. How to allocate R among the m objects?
Determine ri, the number of copies (at distinct nodes) of object i, with Σ(i=1..m) ri = R (R: total capacity). Also, pi = ri/R: the fraction of the total capacity allocated to i.
An allocation is represented by the vector (p1, p2, ..., pm) = (r1/R, r2/R, ..., rm/R).
38 Replication Theory - Model
Assume that object i is requested with relative rate qi; we normalize by setting Σ(i=1..m) qi = 1. For convenience, assume 1 << ri ≤ n and that q1 ≥ q2 ≥ ... ≥ qm.
Goal: map the query distribution q to an allocation vector p.
39 Replication Theory - Model
Assume all nodes have equal capacity ρ = R/n.
R ≥ m (at least one copy per item); m > ρ (else the problem is trivial: maintain copies of all items everywhere).
Bounds for pi: at least one copy, ri ≥ 1, so the lower value is l = 1/R; at most n copies, ri ≤ n, so the upper value is u = n/R.
40 Replication Theory
Assume that a search goes on until a copy is found. We want to determine the ri that minimize the average search size (number of nodes probed) to locate an item, so we need to compute the average search size per item. Searches consist of randomly probing sites until the desired object is found: each step draws a node uniformly at random and asks whether it has a copy.
41 Search Example
[Figure: a random-probe search that finds a copy on the 4th probe.]
42 Replication Theory
The probability Pr(k) that object i is found at the k-th probe is:
Pr(k) = Pr(not found in the previous k-1 probes) × Pr(found at the k-th probe) = (1 - ri/n)^(k-1) × (ri/n)
k (the search size, i.e., the step at which the item is found) is a random variable with a geometric distribution and success probability ri/n, so its expectation is n/ri.
43 Replication Theory
Ai, the expectation (average search size) for object i, is the inverse of the fraction of sites that have replicas of the object: Ai = n/ri. The average search size A over all objects (the average number of nodes probed per query) is:
A = Σi qi Ai = n Σi qi/ri
Minimize A = n Σi qi/ri.
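A small sketch checking the two formulas above by simulation; the node count, replica counts, and query rates below are made-up values.

```python
import random

# Check E[search size for i] = n/r_i and A = n * sum_i q_i/r_i.
n = 1000                        # number of nodes (assumed)
r = {"a": 10, "b": 100}         # copies per object (assumed)
q = {"a": 0.7, "b": 0.3}        # normalized query rates (sum to 1)

def avg_probes(ri, trials=20_000):
    """Probe nodes uniformly at random until one of the r_i copies is hit."""
    total = 0
    for _ in range(trials):
        k = 1
        while random.random() >= ri / n:   # a probe misses with prob 1 - r_i/n
            k += 1
        total += k
    return total / trials

for i in r:
    print(i, round(avg_probes(r[i]), 1), "analytic:", n / r[i])

A = n * sum(q[i] / r[i] for i in q)
print("A =", A)                 # 1000 * (0.7/10 + 0.3/100) = 73.0
```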
44 Replication Theory
If we have no limit on ri, replicate everything everywhere: then the average search size is Ai = n/ri = 1 and search becomes trivial.
Assume a limit on R, so that the average number of replicas per site ρ = R/n is fixed.
How to allocate these R replicas among the m objects, i.e., how many replicas per object?
45 Replication Theory
Minimize Σi qi/pi subject to Σ pi = 1 and l ≤ pi ≤ u.
Monotonicity: since q1 ≥ q2 ≥ ... ≥ qm, we must have p1 ≥ p2 ≥ ... ≥ pm. More copies to the more popular objects, but how many?
46 Uniform Replication
Create the same number of replicas for each object: ri = R/m.
Average search size for uniform replication: Ai = n/ri = m/ρ, so Auniform = Σi qi (m/ρ) = m/ρ (= m·n/R), which is independent of the query distribution.
47 Proportional Replication
It makes sense to allocate more copies to objects that are frequently queried; this should reduce the search size for the more popular objects.
Create a number of replicas for each object proportional to its query rate: ri = R·qi.
48 Proportional Replication
Number of replicas for each object: ri = R·qi.
Average search size for proportional replication: Ai = n/ri = n/(R·qi), so Aproportional = Σi qi · n/(R·qi) = m·n/R = m/ρ = Auniform, again independent of the query distribution.
Why? Objects whose query rates are above average (> 1/m) do better with proportional, and the others do better with uniform; the weighted average balances out to be the same.
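A short sketch of this balancing act on an assumed query distribution: uniform and proportional allocations yield the same average search size m·n/R.

```python
# Assumed sizes and workload, chosen only for illustration.
n, m, R = 1000, 4, 400
q = [0.5, 0.3, 0.15, 0.05]              # any distribution summing to 1

def A(r):                               # average search size A = n * sum q_i/r_i
    return n * sum(qi / ri for qi, ri in zip(q, r))

r_uniform = [R / m] * m                 # r_i = R/m
r_proportional = [R * qi for qi in q]   # r_i = R * q_i

print(A(r_uniform), A(r_proportional))  # both print m*n/R = 10.0
```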
49 Uniform and Proportional Replication
- Summary
- Uniform allocation: pi = 1/m
- Simple; resources are divided equally
- Proportional allocation: pi = qi
- Fair; resources per item are proportional to demand
- Reflects current P2P practices
50 Space of Possible Allocations
So what is the optimal way to allocate replicas so that A is minimized?
- qi+1/qi ≤ pi+1/pi: as the query rate decreases, the allocation ratio should not drop faster than the query-rate ratio ("reasonable" allocations)
- pi+1/pi ≤ 1: a less popular item never gets more copies (= 1 for uniform)
51 Space of Possible Allocations
- Definition: An allocation p1, p2, ..., pm is in-between Uniform and Proportional if, for 1 ≤ i < m: qi+1/qi < pi+1/pi < 1
- (the ratio is 1 for uniform and qi+1/qi for proportional; we want to favor popular items, but not too much)
- Theorem 1: All (strictly) in-between strategies are (strictly) better than Uniform and Proportional.
Theorem 2: p is worse than Uniform/Proportional if for all i, pi+1/pi > 1 (the more popular get less) OR for all i, qi+1/qi > pi+1/pi (the less popular get less than their fair share).
Proportional and Uniform are the worst "reasonable" strategies.
52 Space of Allocations on 2 Items
[Figure: allocations for two items, plotted as p2/p1 against q2/q1; Uniform is the horizontal line p2/p1 = 1, Proportional is the diagonal p2/p1 = q2/q1, and the in-between allocations lie between the two.]
53 So, what is the best strategy?
54 Square-Root Replication
Find the ri that minimize A = Σi qi Ai = n Σi qi/ri. The minimum is achieved for ri = λ√qi, where λ = R / Σi √qi. Then the average search size is Aoptimal = (1/ρ)(Σi √qi)².
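A sketch comparing the three allocations on an assumed Zipf-like workload (the bounds l ≤ pi ≤ u are not enforced explicitly, though they happen to hold for these numbers).

```python
import math

# Assumed sizes and a Zipf-like query distribution, for illustration only.
n, m, R = 10_000, 100, 1_000
q = [i ** -1.2 for i in range(1, m + 1)]
Z = sum(q)
q = [qi / Z for qi in q]                  # normalize: sum q_i = 1

def A(r):                                 # A = n * sum q_i/r_i
    return n * sum(qi / ri for qi, ri in zip(q, r))

lam = R / sum(math.sqrt(qi) for qi in q)  # lambda = R / sum sqrt(q_i)
r_sr = [lam * math.sqrt(qi) for qi in q]  # r_i = lambda * sqrt(q_i)

print(A([R / m] * m))                     # uniform: m*n/R = 1000
print(A([R * qi for qi in q]))            # proportional: also 1000
print(A(r_sr))                            # SR: (sum sqrt(q_i))^2 / rho, smaller
```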
55 How much can we gain by using SR?
[Figure: the ratio Auniform/ASR under Zipf-like query rates.]
56 Other Metrics: Discussion
- Utilization rate: the rate of requests that a replica of object i receives, Ui = R·qi/ri
- For uniform replication, all objects have the same average search size, but replicas have utilization rates proportional to their query rates
- Proportional replication achieves perfect load balancing, with all replicas having the same utilization rate, but average search sizes vary: more popular objects have smaller average search sizes than less popular ones
57 Replication Summary
58 Pareto Distribution (for the queries)
59 Pareto Distribution (for the queries)
Both model power-law distributions.
Zipf: what is the size (popularity) of the r-th ranked item: y ∝ r^(-b)
Pareto: how many items have size greater than x (look at the frequency distribution): P[X > x] ∝ x^(-k), P[X = x] ∝ x^(-(k+1)) = x^(-a)
"The r-th hottest item has n queries" is equivalent to saying "r items have n or more queries". This is exactly the definition of the Pareto distribution, except that the x and y axes are flipped. Whereas for Zipf we have r (the rank) and compute n, in Pareto we have n and compute r (the rank).
Reference: http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html
60 Replication (summary)
Each object i is replicated on ri nodes, and the total number of copies stored is R, that is Σ(i=1..m) ri = R.
- (1) Uniform: all objects are replicated at the same number of nodes, ri = R/m
- (2) Proportional: the replication of an object is proportional to the query probability of the object, ri ∝ qi
- (3) Square-root: the replication of object i is proportional to the square root of its query probability qi, ri ∝ √qi
61 Assumption: there is at least one copy per object
- A query is soluble if there are sufficiently many copies of the item
- A query is insoluble if the item is rare or nonexistent
- What is the search size of a query?
- Soluble queries: the number of probes until the answer is found
- Insoluble queries: the maximum search size
62
- SR is best for soluble queries
- Uniform minimizes the cost of insoluble queries
What is the optimal strategy?
63
[Figure: 10^4 items, Zipf-like query rates (w = 1.5). Search cost of allocations ranging from Uniform to SR when all queries are soluble, 85% are soluble, and all are insoluble; the optimum moves from SR toward Uniform as more queries become insoluble.]
64 We now know what we need. How do we get there?
65 Replication Algorithms
- Uniform and Proportional are easy:
- Uniform: when an item is created, replicate its key at a fixed number of hosts
- Proportional: for each query, replicate the key at a fixed number of hosts (need to know or estimate the query rate)
Desired properties of an algorithm:
- Fully distributed, where peers communicate through random probes; minimal bookkeeping; and no more communication than what is needed for search
- Converges to/obtains the SR allocation when query rates remain steady
66 Replication - Implementation
Two strategies are popular:
Owner replication: when a search is successful, the object is stored at the requestor node only (used in Gnutella).
Path replication: when a search succeeds, the object is stored at all nodes along the path from the requestor node to the provider node, following the reverse path back to the requestor (used in Freenet).
67 Achieving Square-Root Replication
- How can we achieve square-root replication in practice?
- Assume that each query keeps track of its search size
- Each time a query finishes, the object is copied to a number of sites proportional to the number of probes
- On average, object i will be replicated at c·n/ri sites each time a query for it is issued (for some constant c)
- It can be shown that this yields square-root replication (see the sketch below)
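A toy sketch of this process under stated assumptions (not the paper's simulator): queries arrive with fixed rates qi, each query creates about c·n/ri copies (using the expected search size directly instead of a measured probe count), and random deletion keeps the total at R. The replica fractions drift toward the square-root allocation, since creations balance deletions exactly when ri² ∝ qi.

```python
import random
from collections import Counter

n, R, c = 10_000, 2_000, 0.1
q = {"a": 0.64, "b": 0.16, "c": 0.16, "d": 0.04}   # assumed query rates
r = Counter({i: R // len(q) for i in q})           # start from uniform

items, weights = list(q), list(q.values())
for _ in range(200_000):
    i = random.choices(items, weights=weights)[0]  # a query for i, rate q_i
    expected = c * n / max(1, r[i])                # copies ~ c * search size
    r[i] += int(expected) + (random.random() < expected % 1)
    while sum(r.values()) > R:                     # random deletion to capacity
        victim = random.choices(items, weights=[r[j] for j in items])[0]
        r[victim] -= 1

print({i: round(r[i] / R, 2) for i in q})
# settles near the sqrt(q) ratios 0.8 : 0.4 : 0.4 : 0.2,
# i.e. fractions of roughly 0.44, 0.22, 0.22, 0.11
```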
68 Replication - Conclusion
Thus, for square-root replication, an object should be replicated at a number of nodes that is proportional to the number of probes that the search required.
69 Replication - Implementation
If a p2p system uses k walkers, the number of nodes between the requestor and the provider node is 1/k of the total number of nodes visited (the number of probes). Then, path replication should result in square-root replication.
Problem: it tends to place replicas at nodes that are topologically along the same path.
70 Replication - Implementation
Random replication: when a search succeeds, count the number of nodes on the path between the requestor and the provider, say p; then randomly pick p of the nodes that the k walkers visited and replicate the object there.
Harder to implement.
71 Achieving Square-Root Replication
What about replica deletion? In steady state, the creation rate must equal the deletion rate. The lifetime of replicas must be independent of object identity or query rate: FIFO or random deletion is fine; LRU or LFU are not.
72 Replication: Evaluation
- Study the three replication strategies in the random-graph network topology
- Simulation details:
- Place the m distinct objects randomly into the network
- A query generator generates queries according to a Poisson process at 5 queries/sec
- Zipf distribution of queries among the m objects (with a = 1.2)
- For each query, the initiator is chosen randomly
- Then a 32-walker random walk with state keeping and checking every 4 steps
- Each site stores at most objAllow (= 40) objects
- Random deletion
- Warm-up period of 10,000 secs
- Snapshots every 2,000 query chunks
73 Replication: Evaluation
- For each replication strategy:
- What kind of replication-ratio distribution does the strategy generate?
- What is the average number of messages per node in a system using the strategy?
- What is the distribution of the number of hops in a system using the strategy?
74 Evaluation: Replication Ratio
Both path and random replication generate replication ratios quite close to the square root of the query rates.
75 Evaluation: Messages
Path replication and random replication reduce the overall message traffic by a factor of 3 to 4.
76 Evaluation: Hops
Much of the traffic reduction comes from reducing the number of hops. Path and random replication do better than owner replication: for example, 71% of queries finish within 4 hops with owner replication, 86% with path, and 91% with random replication.
77 Summary
- Random search/replication model: probes to random hosts
- Proportional allocation: current practice
- Uniform allocation: best for insoluble queries
- Soluble queries:
- Proportional and Uniform allocations are two extremes with the same average performance
- Square-root allocation minimizes the average search size
- OPT (over all queries) lies between SR and Uniform
- The SR/OPT allocation can be realized by simple algorithms
78 Discussion
Cohen et al paper: path replication overshoots or undershoots the fixed point if queries arrive in large bursts, or if the time between a search and the subsequent copy generation is large; this calls for more involved algorithms than path replication. Extensions: variable-size objects, nodes with heterogeneous capacities. Many open issues: other types of graphs, adaptability, etc.
79 Agenda (for today)
1. Description of the course projects
2. General on Replication
3. Replication Theory for Unstructured (Cohen et al paper)
4. Epidemic Algorithms for Updates (Demers et al paper)
80 Replication in Unstructured P2P: Epidemic Algorithms
81
- Replication policy
- How many copies?
- Where? (owner, path, random path)
- Update policy
- Synchronous vs asynchronous
- Master copy
82 Methods for spreading updates
- Push: updates originate from the site where the update appeared and must reach the sites that hold copies
- Pull: the sites holding copies contact the master site
- Expiration times
- Epidemics for spreading updates
83 A. Demers et al. Epidemic Algorithms for Replicated Database Maintenance. PODC 1987.
Updates occur at a single site. Randomized algorithms distribute the updates and drive the replicas towards consistency, ensuring that the effect of every update is eventually reflected in all replicas. Sites become fully consistent only when all updating activity has stopped and the system has become quiescent. Analogous to epidemics.
84 Methods for spreading updates
Direct mail: each new update is immediately mailed from its originating site to all other sites.
(+) Timely and reasonably efficient
(-) Not all sites know all other sites; mails may be lost
Anti-entropy: every site regularly chooses another site at random and resolves any differences in content between them.
(+) Extremely reliable, but requires exchanging content and resolving updates
(-) Propagates updates much more slowly than direct mail
85 Methods for spreading updates
- Rumor mongering
- Sites are initially ignorant; when a site receives a new update, the update becomes a hot rumor
- While a site holds a hot rumor, it periodically chooses another site at random and ensures that the other site has seen the update
- When a site has tried to share a hot rumor with too many sites that have already seen it, the site stops treating the rumor as hot and retains the update without propagating it further
- Rumor cycles can be more frequent than anti-entropy cycles, because they require fewer resources at each site, but there is a chance that an update will not reach all sites (a simulation sketch follows)
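A minimal sketch of rumor mongering. It uses the coin-flip stopping variant from the Demers paper (after contacting a site that has already seen the rumor, stop spreading with probability 1/k); k = 4 and the network size are arbitrary choices.

```python
import random

def rumor_monger(n=1000, k=4, seed=1):
    rng = random.Random(seed)
    informed = {0}                       # site 0 starts with the update
    hot = {0}                            # sites still treating it as a hot rumor
    while hot:
        for s in list(hot):
            t = rng.randrange(n)         # choose another site at random
            if t not in informed:
                informed.add(t)          # t now holds a hot rumor too
                hot.add(t)
            elif rng.random() < 1 / k:   # t had already seen it: maybe stop
                hot.discard(s)           # s becomes "removed"
    return 1 - len(informed) / n         # residue: fraction never reached

print(rumor_monger())  # small but typically nonzero, e.g. a few percent
```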
86
- Anti-entropy and rumor spreading are examples of epidemic algorithms
- Three types of sites:
- Infective: a site that holds an update and is willing to share it
- Susceptible: a site that has not yet received the update
- Removed: a site that has received the update but is no longer willing to share it
- Anti-entropy is a simple epidemic, where all sites are always either infective or susceptible
87 A set S of n sites, each storing a copy of a database. The database copy at site s ∈ S is a time-varying partial function
s.ValueOf: K → (V ∪ {NIL}) × T
where K is the set of keys, V the set of values, and T the set of timestamps (totally ordered by <). The range includes the special element NIL: s.ValueOf[k] = (NIL, t) means that the item with key k was deleted from the database at time t.
We assume just one item, so s.ValueOf ∈ (V ∪ {NIL}) × T: an ordered pair consisting of a value and a timestamp. The first component may be NIL, indicating that the item was deleted by the time indicated by the second component.
88
- The goal of the update distribution process is to drive the system towards:
- for all s, s' ∈ S: s.ValueOf = s'.ValueOf
- Operation invoked to update the database:
- Update[u ∈ V]: s.ValueOf ← (u, Now)
89 Direct Mail
At the site s where an update occurs:
FOR EACH s' ∈ S DO PostMail[to: s', msg: ("Update", s.ValueOf)]
(s: originator of the update; s': receiver of the update)
Each site s' receiving the update message ("Update", (u, t)):
IF s'.ValueOf.t < t THEN s'.ValueOf ← (u, t)
- The complete set S must be known to s (stateful)
- PostMail messages are queued so that the sender is not delayed (asynchronous), but they may fail when queues overflow or their destinations are inaccessible for a long time
- n (number of sites) messages per update
- Traffic proportional to n and to the average distance between sites
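A sketch of direct mail with the timestamp rule above. PostMail is modeled here as an immediate in-process call, whereas real mail is queued and may be lost, which is exactly the weakness noted above.

```python
import itertools

clock = itertools.count(1)               # stand-in for "Now"

class Site:
    def __init__(self):
        self.value = (None, 0)           # s.ValueOf = (u, t)

    def update(self, u, others):
        self.value = (u, next(clock))    # Update[u]: s.ValueOf <- (u, Now)
        for s2 in others:                # FOR EACH s' in S DO PostMail[...]
            s2.receive(self.value)

    def receive(self, msg):
        u, t = msg
        if self.value[1] < t:            # IF s'.ValueOf.t < t
            self.value = (u, t)          #   s'.ValueOf <- (u, t)

sites = [Site() for _ in range(5)]
sites[0].update("v1", sites[1:])
assert all(s.value == ("v1", 1) for s in sites)
```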
90 Anti-Entropy
At each site s, periodically execute:
FOR SOME s' ∈ S DO ResolveDifference[s, s']
Three ways to execute ResolveDifference:
Push (sender (server) driven): s pushes its value to s'
IF s.ValueOf.t > s'.ValueOf.t THEN s'.ValueOf ← s.ValueOf
Pull (receiver (client) driven): s pulls from s' and gets s''s value
IF s.ValueOf.t < s'.ValueOf.t THEN s.ValueOf ← s'.ValueOf
Push-Pull:
IF s.ValueOf.t > s'.ValueOf.t THEN s'.ValueOf ← s.ValueOf
IF s.ValueOf.t < s'.ValueOf.t THEN s.ValueOf ← s'.ValueOf
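A sketch of the three ResolveDifference variants, with each site's database reduced to a single mutable value-timestamp pair as on the slide.

```python
import random

def resolve_difference(s, s2, mode="push-pull"):
    if mode in ("push", "push-pull") and s[1] > s2[1]:
        s2[0], s2[1] = s[0], s[1]        # s'.ValueOf <- s.ValueOf
    if mode in ("pull", "push-pull") and s[1] < s2[1]:
        s[0], s[1] = s2[0], s2[1]        # s.ValueOf <- s'.ValueOf

sites = [[None, 0] for _ in range(16)]
sites[0][:] = ["v1", 1]                  # a single update at one site
for cycle in range(8):                   # periodically, at each site s:
    for s in sites:
        s2 = random.choice(sites)        # FOR SOME s' in S
        resolve_difference(s, s2)        # ResolveDifference[s, s']
print(sum(s[0] == "v1" for s in sites), "of", len(sites), "sites infected")
```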
91 Anti-Entropy
- Assume that:
- Site s' is chosen uniformly at random from the set S
- Each site executes the anti-entropy algorithm once per period
- It can be proved that:
- An update will eventually infect the entire population
- Starting from a single affected site, this is achieved in expected time proportional to the log of the population size
92 Anti-Entropy
Let pi be the probability that a site remains susceptible (has not received the update) after the i-th anti-entropy cycle.
For pull: a site remains susceptible after cycle i+1 if (a) it was susceptible after cycle i, and (b) it contacted a susceptible site in cycle i+1:
pi+1 = (pi)²
For push: a site remains susceptible after cycle i+1 if (a) it was susceptible after cycle i, and (b) no infectious site chose to contact it in cycle i+1:
pi+1 = pi (1 - 1/n)^(n(1-pi))
where 1 - 1/n is the probability that the site is not contacted by a given node, and n(1-pi) is the number of infectious nodes at cycle i.
Pull is preferable to push (see the numeric check below).
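A quick numeric check of the two recurrences, assuming a single initially infected site (so p0 = 1 - 1/n):

```python
n = 10_000
p_pull = p_push = 1 - 1 / n
for i in range(1, 21):
    p_pull = p_pull ** 2                                 # p_{i+1} = p_i^2
    p_push = p_push * (1 - 1 / n) ** (n * (1 - p_push))  # push recurrence
    print(f"cycle {i:2d}  pull {p_pull:.2e}  push {p_push:.2e}")
# Early on the two behave similarly, but once most sites are infected the
# pull residue shrinks quadratically per cycle, while push only shrinks by
# about a factor of e per cycle -- hence pull is preferable.
```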
93 Anti-Entropy
More next week.