Title: Optimistic Consistency with Version Vector Weighted Voting
1 Optimistic Consistency with Version Vector Weighted Voting
- João Barreto
- joao.barreto_at_inesc-id.pt
- Paulo Ferreira
- paulo.ferreira_at_inesc-id.pt
- Distributed Systems Group
- GSD INESC-ID Lisbon
- http://www.gsd.inesc-id.pt/
- Summary
  - Introduction
  - Related Work
  - Consistency Protocol
  - Evaluation
  - Conclusions
2 Motivation
- Data replication is fundamental for most distributed systems
  - Enhances performance and scalability
  - Improves fault tolerance and, in particular, availability
  - Replica consistency must be ensured
- Mobile and other loosely-coupled network environments call for optimistic replication strategies
  - High availability is essential
  - Optimistic strategies meet this requirement by allowing replicas to be updated anytime and anywhere
3 Problem
- Consistency of optimistic replication strategies is problematic
  - Updates may conflict if issued concurrently at distinct replicas
  - Weak consistency guarantees
- Consistency protocol must ensure that the replicated system evolves
  - From a possibly inconsistent tentative state
  - To a strongly consistent stable state
4 Effective Update Commitment is Crucial
- In some scenarios, applications are willing to temporarily work with tentative data
  - Disconnected operation
  - Collaborative activities using a mobile ad-hoc network
- However, users want their tentative updates to rapidly become committed into a strongly consistent state
- Effective update commitment is crucial for a useful and trustworthy optimistic replication protocol
5 Existing Approaches
- No stable value guarantees (Roam)
  - Not adequate for applications with strong consistency demands
- Ack Vectors (Golding)
  - Updates are committed when received at every replica
  - Unavailability of any single replica stalls the entire commitment process
- Primary Commit (Bayou, Haddock-FS)
  - Commitment decision taken by a single, differentiated replica
  - Unavailability of such replica stalls the entire commitment process
- Epidemic Weighted Voting (Deno)
  - Eliminates single point of failure of Primary Commit scheme
  - One election round per committed update
6 Basic Weighted Voting: Simple Definition
- Concurrent tentative updates are rival candidates in an election
- Replicas act as voters with possibly differentiated weights in the election (total weight sums up to 1)
- Each replica votes for the update it issues
  - Or is convinced by other replicas to vote for their candidate
- Voting information is propagated epidemically between replicas
- When a candidate collects a plurality of votes, its update is committed onto the stable value
  - The remaining rival candidates are discarded
  - A new election is then started
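The election rule above can be sketched in a few lines. This is only an illustration under stated assumptions, not the protocol's actual implementation: the function name `elect` is hypothetical, and "plurality" is read here as "the candidate already holds more weight than any rival could reach even if every still-unknown vote went to that rival". Tie-breaking is not handled.

```python
from typing import Dict, Optional


def elect(votes: Dict[int, Optional[str]],
          weights: Dict[int, float]) -> Optional[str]:
    """Return the winning candidate, or None while the election is undecided.

    votes maps a replica id to the candidate it is known to have voted for
    (None when that replica's vote is not yet known locally).
    """
    tally: Dict[str, float] = {}
    unknown = 0.0
    for replica, weight in weights.items():
        candidate = votes.get(replica)
        if candidate is None:
            unknown += weight
        else:
            tally[candidate] = tally.get(candidate, 0.0) + weight

    for candidate, weight in tally.items():
        # Assumed plurality test: the strongest rival plus all unknown
        # votes must still fall short of this candidate's weight.
        strongest_rival = max(
            (w for c, w in tally.items() if c != candidate), default=0.0)
        if weight > strongest_rival + unknown:
            return candidate
    return None
```

With four replicas of weight 0.25 each, a candidate holding 0.5 against a rival holding 0.25 is not yet decided under this reading, because the unknown 0.25 could still produce a tie; at 0.75 the candidate wins.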
7Basic Weighted Voting Example
4
2
u4 (0.25 voted)
3
1
u1 (0.25 voted)
8 Basic Weighted Voting: Example
[Figure: replicas 1 to 4. Vote labels: u1 (0.50 voted), shown at two replicas; u4 (0.25 voted).]
9 Basic Weighted Voting: Example
[Figure: replicas 1 to 4. Vote labels: u1 (0.75 voted), shown at two replicas; u4 (0.25 voted); u1 (0.5 voted).]
10 Basic Weighted Voting: Example
[Figure: replicas 1 to 4. Labels: Commit u1, shown at two replicas; u4 (0.25 voted); u1 (0.5 voted).]
11 Basic Weighted Voting: Problem
[Figure: replicas 1 to 4; two replicas hold the tentative update sequence u1, u1, u2, with causal order u1 < u1 < u2.]
- Outcome: sequence of causally ordered tentative updates
- But only one committed update per election round!
12 Our Solution: Use Version Vectors as Candidates
- Election candidates are represented by version vectors
  - These identify the version that is obtained if that candidate wins
- Instead of one update, candidates now represent a sequence of causally ordered updates
- If a prefix of a candidate is discarded, then the whole candidate is also discarded
- If a prefix wins, then the remainder carries on as a candidate in further elections
[Figure: update sequence u1, u4.] Candidate represented by 1,0,0,1
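A minimal sketch of how a candidate can be encoded as a version vector, and of what happens to a candidate once another one is committed. The helper names (`vector_for`, `is_prefix`, `after_commit`) are hypothetical, and the sketch assumes that entry i counts the updates issued by replica i and that "prefix" corresponds to entry-wise less-than-or-equal.

```python
from typing import List, Optional

Vector = List[int]


def vector_for(issuers: List[int], n: int) -> Vector:
    """Version vector for a sequence of updates, given each update's issuer.

    Replica ids are 1-based; entry i-1 counts the updates issued by replica i.
    """
    vector = [0] * n
    for replica in issuers:
        vector[replica - 1] += 1
    return vector


def is_prefix(a: Vector, b: Vector) -> bool:
    """True if the version identified by a is contained in (or equals) b."""
    return all(x <= y for x, y in zip(a, b))


def after_commit(candidate: Vector, winner: Vector) -> Optional[Vector]:
    """Fate of a candidate once `winner` has been committed.

    A candidate that strictly extends the winner carries on (its remainder
    is still to be decided); any other candidate is dropped (None).
    """
    if candidate != winner and is_prefix(winner, candidate):
        return candidate
    return None
```

For example, `vector_for([1, 4], 4)` gives `[1, 0, 0, 1]`, the candidate shown on the slide. If `[1, 0, 0, 0]` wins, that candidate carries on, whereas a rival such as `[0, 1, 0, 0]` is discarded.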
13 Replica State
- Replicas maintain the following state
  - StableTS
    - Most recent stable version that is currently known by the replica
  - Votes[1..N]
    - Candidates voted by each replica, as known by the local replica
  - Log of committed updates
18 Election Decision
- A candidate, or a common prefix of distinct candidates, wins an election when it has collected a plurality of votes
- Election decision is taken locally at each replica
  - Whenever it has received enough voting information
[Figure: candidate update sequences built from u1, u2, u3 and u4.] Common prefix 1,0,0,1 has plurality of votes!
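The common-prefix rule can be illustrated as follows. This sketch assumes that the common prefix of two version-vector candidates is their entry-wise minimum, and that a vote counts towards a prefix whenever the voted candidate extends it. The function names and the vote assignment in the usage note are hypothetical.

```python
from typing import Dict, List, Optional

Vector = List[int]


def common_prefix(a: Vector, b: Vector) -> Vector:
    """Assumed definition: entry-wise minimum of the two candidates."""
    return [min(x, y) for x, y in zip(a, b)]


def support(prefix: Vector,
            votes: Dict[int, Optional[Vector]],
            weights: Dict[int, float]) -> float:
    """Total weight of the replicas whose known vote extends `prefix`."""
    return sum(
        weights[replica]
        for replica, vote in votes.items()
        if vote is not None and all(p <= v for p, v in zip(prefix, vote))
    )
```

Suppose replica 1 votes for `[1, 1, 0, 1]`, replicas 3 and 4 vote for `[1, 0, 1, 1]`, replica 2's vote is unknown, and every weight is 0.25. Neither full candidate can be decided yet, but their common prefix `[1, 0, 0, 1]` is supported by 0.75 of the weight. That exceeds half of the total, which is sufficient for a plurality, so the prefix can be committed while the remainders stay in contention.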
20 Anti-Entropy
[Figure: anti-entropy session between Replica 3 and Replica 1.]
- Update information about completed elections
- Persuade replica to vote for the same candidate as the other replica
  - If Votes3[1] = null or if Votes1[1] > Votes3[3]
- Update remaining votes with more up-to-date voting information
  - If Votes3[i] = null or if Votes1[i] > Votes3[i]
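A rough sketch of the vote-exchange part of an anti-entropy session, in which one replica pulls voting information from another. Several points are assumptions rather than facts from the slides: `newer` and `anti_entropy` are hypothetical names, "greater than" is read as "strictly extends in version-vector order", the persuasion step compares the remote replica's own vote with the local replica's own vote, and the first step (propagating completed elections) is left out.

```python
from typing import List, Optional

Vector = List[int]


def newer(a: Optional[Vector], b: Optional[Vector]) -> bool:
    """True if vote a should replace vote b.

    That is the case when a is known and b is either unknown (None)
    or strictly extended by a.
    """
    if a is None:
        return False
    if b is None:
        return True
    return a != b and all(y <= x for x, y in zip(a, b))


def anti_entropy(local_id: int, local_votes: List[Optional[Vector]],
                 remote_id: int, remote_votes: List[Optional[Vector]]) -> None:
    """Pull voting information from the remote replica (0-based ids)."""
    # Persuasion: adopt the remote replica's own vote as the local vote.
    remote_own = remote_votes[remote_id]
    if remote_own is not None and newer(remote_own, local_votes[local_id]):
        local_votes[local_id] = list(remote_own)
    # Refresh what is known locally about every other replica's vote.
    for i, remote_vote in enumerate(remote_votes):
        if i == local_id or remote_vote is None:
            continue
        if newer(remote_vote, local_votes[i]):
            local_votes[i] = list(remote_vote)
```

A replica that has not voted yet is persuaded immediately, while a replica that already voted for a concurrent candidate keeps its own vote and only refreshes its view of the other replicas' votes.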
24 Version Vector Weighted Voting: Example
[Figure: replicas 1 to 4; two replicas hold the tentative update sequence u1, u1, u2. Labels: Candidate 2,1,0,0 (0.5 voted), shown at two replicas.]
25 Version Vector Weighted Voting: Example
[Figure: replicas 1 to 4; three replicas hold the tentative update sequence u1, u1, u2. Labels: Candidate 2,1,0,0 (0.75 voted), shown at two replicas; Candidate 2,1,0,0 (0.5 voted).]
26 Version Vector Weighted Voting: Example
[Figure: replicas 1 to 4; three replicas hold the update sequence u1, u1, u2. Labels: Commit, shown at two replicas; Candidate 2,1,0,0 (0.5 voted).]
27 Evaluation: Simulated Environment
- C implementation of representative commitment protocols
  - Primary Commit
  - Basic Weighted Voting
  - Dynamic Version Vector Weighted Voting
- 10 replicas running all protocols side-by-side
- Two phases per simulation cycle for every replica
  - With a given probability, issue a tentative update
  - Perform anti-entropy with an accessible replica, chosen randomly
- Total probability of 70% that, at each cycle, one update is issued in the entire system
- Three update models tested
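The two-phase cycle described above could be organised roughly as below. This is a hypothetical sketch, not the simulator used in the evaluation: `run_cycle` and its callback parameters are illustrative names, and the way accessibility is modelled (a predicate per replica) is an assumption.

```python
import random
from typing import Callable, Sequence


def run_cycle(n: int,
              update_prob: Sequence[float],
              is_accessible: Callable[[int], bool],
              issue_update: Callable[[int], None],
              anti_entropy: Callable[[int, int], None],
              rng: random.Random) -> None:
    """Run one simulation cycle over replicas 0..n-1."""
    for replica in range(n):
        # Phase 1: issue a tentative update with this replica's probability.
        if rng.random() < update_prob[replica]:
            issue_update(replica)
        # Phase 2: anti-entropy with one randomly chosen accessible peer.
        peers = [p for p in range(n) if p != replica and is_accessible(p)]
        if peers:
            anti_entropy(replica, rng.choice(peers))
```

The update models would then differ only in the `update_prob` sequence passed in, for instance the same value for every replica in the uniform model.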
28 Evaluation: Commitment Delays with Uniform Update Model
- Identical update probability for every replica
  - 7% per replica
29 Evaluation: Commitment Delays with Hot-Spot Update Model
- Three differentiated replicas with higher update probability
  - 24% per replica for hot-spots
  - 4% per replica for the remaining replicas
30 Evaluation: Commitment Delays with Token Exchange Update Model
- Only a single up-to-date replica with higher update probability
  - 54% for the replica holding the token
  - 4% per replica for the remaining replicas
- Token is exchanged after anti-entropy with another replica, with a probability of 40%
31 Evaluation: Update Commitment Ratio
- Depends on
  - Sensitivity to replica inaccessibility
  - Commitment delay
- With non-zero disconnection probabilities, our solution typically achieves the best commitment ratios
32 Conclusions
- Mobile and loosely-coupled environments demand rapid and fault-tolerant update commitment
- Our solution provides fault tolerance through an epidemic weighted voting approach
- In addition, it optimizes the voting decision by allowing multiple updates to be committed in a single election round
- Experimental results show that the overall commitment ratio is higher than that of the Primary Commit or Basic Weighted Voting alternatives
  - Especially when replicas may become disconnected and in non-uniform update models
33 Conclusions (2)
- Two drawbacks when using static version vectors
  - Necessary to maintain complete knowledge of group membership
  - Static version vectors impose an N-entry array per candidate
    - Instead of a simple scalar in basic weighted voting
- Use of Dynamic Version Vectors (DVVs) eliminates the first and effectively minimizes the second
34 Thank You
Haddock-FS Project: www.gsd.inesc-id.pt/jpbarreto/Haddock-FS.html
Distributed Systems Group at Inesc-ID Lisbon: www.gsd.inesc-id.pt
35 Dynamic Version Vector Candidates
- Two drawbacks when using static version vectors
  - Necessary to maintain complete knowledge of group membership
  - Static version vectors impose an N-entry array per candidate
    - Instead of a simple scalar in basic weighted voting
- Use of Dynamic Version Vectors (DVVs) eliminates the first and minimizes the second
  - Store just the minimal set of consistency-relevant entries
  - Absent entries are treated as zero-valued entries
- Causality statements are equivalent to those of static version vectors
  - Provided that comparisons involve DVVs that have seen the same set of vector compressions
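Treating absent entries as zero makes DVV comparison a small variation on the usual version-vector comparison. The sketch below assumes a DVV is stored as a mapping from replica id to a non-negative counter; the names `dominates` and `compare` are hypothetical, and the result is only meaningful under the proviso above (both DVVs have seen the same compressions).

```python
from typing import Dict

DVV = Dict[int, int]  # replica id -> counter; an absent id counts as zero


def dominates(a: DVV, b: DVV) -> bool:
    """True if a >= b entry-wise, with absent entries read as zero."""
    return all(a.get(replica, 0) >= count for replica, count in b.items())


def compare(a: DVV, b: DVV) -> str:
    """Causal relation of a with respect to b."""
    a_ge_b, b_ge_a = dominates(a, b), dominates(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "after"
    if b_ge_a:
        return "before"
    return "concurrent"
```

For instance, `{1: 2, 2: 1}` compares as "after" `{1: 1}`, and `{1: 1}` is "concurrent" with `{4: 1}`, exactly as the static vectors 2,1,0,0, 1,0,0,0 and 0,0,0,1 would.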
36 Dynamic Version Vector Candidates (2)
- Minimal set of entries comprises entries of replicas that have one or more tentative updates still pending
- Accordingly, DVVs must be dynamically expanded and compressed
- Expansion is trivial
  - When an absent replica issues an update, simply add a new entry
- Compression is typically cumbersome (Ratner98)
  - But effective if incorporated into the Weighted Voting Protocol
  - When an update is locally committed, decrement the entry corresponding to its issuer in all the DVVs stored locally
  - If an entry becomes zero-valued, it may be discarded
- Correct DVV comparisons between replicas are guaranteed by compression roll-backs, when necessary
  - Using information from the log of committed updates
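Expansion and compression as described above might look like the following sketch. The function names are hypothetical, DVVs are assumed to be stored as replica-id-to-counter mappings, and DVVs that lack the issuer's entry are simply left untouched here. Compression roll-backs, which rely on the log of committed updates, are not modelled.

```python
from typing import Dict, List

DVV = Dict[int, int]  # replica id -> counter; an absent id counts as zero


def expand(dvv: DVV, issuer: int) -> None:
    """Record a new tentative update by `issuer`, adding its entry if absent."""
    dvv[issuer] = dvv.get(issuer, 0) + 1


def compress_on_commit(dvvs: List[DVV], issuer: int) -> None:
    """Apply the compression step after an update by `issuer` is committed.

    Decrements the issuer's entry in every locally stored DVV and drops
    the entry once it reaches zero.
    """
    for dvv in dvvs:
        if dvv.get(issuer, 0) > 0:
            dvv[issuer] -= 1
            if dvv[issuer] == 0:
                del dvv[issuer]
```

Starting from an empty DVV, two updates by replica 1 and one by replica 2 give `{1: 2, 2: 1}`; committing replica 1's first update compresses it to `{1: 1, 2: 1}`, and an entry that drops to zero disappears altogether.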