Title: Replication Management
1Replication Management
- Yih-Kuen Tsay
- Dept. of Information Management
- National Taiwan University
2Motivations for Replication
- Performance enhancement
- Client vs. server caching
- Server pools
- Replication of immutable vs. changing data
- Increased availability
- Server failures
- Network partition and disconnected operation
- Fault tolerance: guarantee correctness in spite
of faults
3General Requirements
- Replication transparency
- Clients are not aware of multiple physical copies
(replicas) of an object.
- Clients see one logical copy for each object.
- Consistency
- Servers perform operations in a way that meets
the specification of correctness.
4An Architecture for Replication Management
Source: Coulouris et al., Distributed Systems:
Concepts and Design, Fourth Edition.
5About the Servers
- Recoverability
- State Machines
- Consist of state variables and commands
- Outputs determined by the sequence of requests
processed
- Static vs. dynamic set of replica managers
- Dynamic: servers may crash; new ones may join
- Static: crashed servers are considered to cease
operating (possibly for an indefinite period)
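To make the state-machine view concrete, here is a minimal sketch, assuming a hypothetical bank-account object (the class name, command names, and amounts are illustrative and not from the slides). Because outputs depend only on the sequence of requests processed, two replicas fed the same sequence produce the same outputs.

```python
class AccountStateMachine:
    """A deterministic state machine: state variables plus commands."""

    def __init__(self):
        self.balance = 0  # the only state variable

    def apply(self, command, amount=0):
        # Each command updates the state deterministically and yields an output.
        if command == "deposit":
            self.balance += amount
            return self.balance
        if command == "withdraw":
            if amount > self.balance:
                return "insufficient funds"
            self.balance -= amount
            return self.balance
        if command == "read":
            return self.balance
        raise ValueError(f"unknown command: {command}")


def run(requests):
    """Feed a request sequence to a fresh replica and collect its outputs."""
    sm = AccountStateMachine()
    return [sm.apply(*request) for request in requests]


requests = [("deposit", 10), ("withdraw", 3), ("read",)]
replica_a = run(requests)
replica_b = run(requests)
print(replica_a == replica_b, replica_a)  # True [10, 7, 7]
```

Reordering the same requests can change the outputs, which is why the coordination phase has to fix an order before execution.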
6Phases of Request Processing
- Issuance
- unicast or multicast (from the front end to
replica managers)
- Coordination (to ensure consistency)
- FIFO ordering, causal ordering, total ordering
- Execution (maybe tentatively)
- Agreement (to commit or abort)
- Response
- From one replica manager or several replica
managers to the front end
- The ordering of the phases varies for different
systems.
7Services for Process Groups
Source: Coulouris et al., Distributed Systems:
Concepts and Design, Fourth Edition.
8View-Synchronous Group Communications
Source: Coulouris et al., Distributed Systems:
Concepts and Design, Fourth Edition.
9Correctness Criteria
- Linearizability
- Sequential consistency
- Consider individual operations (instead of
transactions).
10Linearizability
- The interleaved sequence of operations meets the
specification of a single correct copy of the
objects.
- The order of operations in the interleaving is
consistent with the real times at which the
operations occurred in the actual execution.
11Sequential Consistency
- The one-copy semantics of the replicated objects
is respected.
- The order of operations is preserved for each
client, i.e., consistent with the program order
for each client.
- Every linearizable service is also sequentially
consistent.
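The difference between the two criteria can be checked by brute force on a small history. The sketch below is an illustration under stated assumptions: operations are reads and writes on named registers that start at 0, each operation carries hypothetical start and end times, and all function names are invented here. It searches every interleaving, so it is only practical for tiny histories.

```python
from itertools import permutations

# Each operation is (client, kind, key, value, start, end).
# kind "w" writes value; kind "r" is a read that returned value.
# Operations are listed in program order for each client.

def legal(seq):
    """True if seq is a correct history of a single copy (keys start at 0)."""
    store = {}
    for _client, kind, key, value, _start, _end in seq:
        if kind == "w":
            store[key] = value
        elif store.get(key, 0) != value:
            return False
    return True

def keeps_program_order(seq, ops):
    """True if each client's operations appear in seq in their issue order."""
    clients = {op[0] for op in ops}
    return all(
        [op for op in seq if op[0] == c] == [op for op in ops if op[0] == c]
        for c in clients
    )

def keeps_real_time(seq):
    """True if no operation is placed after one that began after it ended."""
    return not any(
        later[5] < earlier[4]
        for i, later in enumerate(seq)
        for earlier in seq[:i]
    )

def sequentially_consistent(ops):
    return any(legal(s) and keeps_program_order(s, ops) for s in permutations(ops))

def linearizable(ops):
    return any(legal(s) and keeps_real_time(s) for s in permutations(ops))

history = [
    ("c1", "w", "x", 1, 0, 1),
    ("c2", "r", "y", 0, 2, 3),
    ("c2", "r", "x", 0, 4, 5),  # stale read: x was set to 1 earlier in real time
    ("c1", "w", "y", 2, 6, 7),
]
print(sequentially_consistent(history), linearizable(history))  # True False
```

The history is sequentially consistent because both reads can be ordered before both writes without violating either client's program order, but it is not linearizable because the read of x began after the write of x had finished.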
12The Primary-Backup (Passive) Model
Consistency is easily guaranteed if the replica
managers are organized as a group and the primary
uses view-synchronous group communication to send
updates.
Source: Coulouris et al., Distributed Systems:
Concepts and Design, Fourth Edition.
13Active Replication
Each front end sends its requests one at a time
to all replica managers using a totally ordered
multicast primitive, ensuring that all requests
are processed in the same order at all replica
managers.
Source: Coulouris et al., Distributed Systems:
Concepts and Design, Fourth Edition.
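A minimal sketch of active replication follows, assuming a single in-process sequencer as a hypothetical stand-in for the totally ordered multicast primitive (real systems use a group communication layer; the class names and the add/mul operations are illustrative). It shows that replicas applying non-commuting requests in the same order end in the same state.

```python
class ReplicaManager:
    """Deterministic replica that applies requests in delivery order."""

    def __init__(self):
        self.value = 0
        self.next_expected = 0

    def deliver(self, seq, request):
        # Total order: every replica sees sequence numbers 0, 1, 2, ...
        assert seq == self.next_expected, "delivery out of order"
        self.next_expected += 1
        op, arg = request
        if op == "add":
            self.value += arg
        elif op == "mul":
            self.value *= arg
        return self.value


class TotalOrderMulticast:
    """Hypothetical sequencer-based stand-in for a totally ordered multicast."""

    def __init__(self, group):
        self.group = group
        self.next_seq = 0

    def send(self, request):
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        return [rm.deliver(seq, request) for rm in self.group]


group = [ReplicaManager() for _ in range(3)]
tom = TotalOrderMulticast(group)

# Two front ends issue non-commuting requests; the multicast fixes one order.
replies_1 = tom.send(("add", 2))
replies_2 = tom.send(("mul", 5))
print(replies_2)  # [10, 10, 10]
```

Since all replies agree, a front end can return the first one it receives to the client.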
14The Gossip Architecture
- A framework for providing high availability of
service through lazy replication
- A request normally executed at one replica
- Replicas updated by lazy exchange of gossip
messages (containing most recent updates).
15Operations in a Gossip Service
Source: Coulouris et al., Distributed Systems:
Concepts and Design, Fourth Edition.
16Timestamps
- Each front end keeps a vector timestamp
reflecting the latest version accessed.
- The timestamp is attached to every request sent
to a replica.
- Two front ends may exchange messages directly;
these messages also carry timestamps.
- Timestamps are merged as usual, by taking the
element-wise maximum.
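The two vector-timestamp operations used throughout the gossip architecture can be sketched as follows. The function names `merge` and `leq` and the sample vectors are illustrative assumptions; the element-wise maximum and component-wise comparison are the standard vector timestamp rules.

```python
def merge(a, b):
    """Element-wise maximum of two vector timestamps."""
    return [max(x, y) for x, y in zip(a, b)]


def leq(a, b):
    """True if a <= b in every component."""
    return all(x <= y for x, y in zip(a, b))


front_end_ts = [2, 0, 1]   # what the front end has seen so far
reply_ts = [1, 3, 1]       # timestamp returned by a replica
front_end_ts = merge(front_end_ts, reply_ts)
print(front_end_ts)  # [2, 3, 1]
```

Note that `leq` is only a partial order: neither `[2, 0, 1]` nor `[1, 3, 1]` is less than or equal to the other.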
17Timestamps (cont.)
- Each replica keeps a replica timestamp
representing those updates it has received.
- It also keeps a value timestamp, reflecting the
updates in the replicated value.
- The replica timestamp is attached to the reply to
an update, while the value timestamp is attached
to the reply to a query.
18Timestamp Propagations
Source: Coulouris et al., Distributed Systems:
Concepts and Design, Fourth Edition.
19The Update Log
- Every update, when received by a replica, is
recorded in the update log of the replica.
- Two reasons for keeping a log:
- The update cannot be applied yet; it is held
back.
- It is uncertain if the update has been received
by all replicas.
- The entries are sorted by timestamps.
20The Executed Operation Table
- The same update may arrive at a replica from a
front end and in a gossip message from another
replica.
- To prevent an update from being applied twice,
the replica keeps a list of identifiers of the
updates that have been applied so far.
21A Gossip Replica Manager
Source: Coulouris et al., Distributed Systems:
Concepts and Design, Fourth Edition.
22Processing Query Requests
- A query request q carries a timestamp q.prev,
reflecting the latest version of the value that
the front end has seen.
- Request q can be applied (i.e., it is stable) if
q.prev ≤ valueTS (the value timestamp of the
replica that received q).
- Once q is applied, the replica returns the
current valueTS along with the reply.
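The stability test for queries can be sketched as below. This is an illustration only: the class `QueryReplica`, its fields, and the sample timestamps are assumptions, and a held-back query is simply recorded rather than retried.

```python
def leq(a, b):
    """True if vector timestamp a <= b in every component."""
    return all(x <= y for x, y in zip(a, b))


class QueryReplica:
    """Hypothetical replica holding a value and its value timestamp."""

    def __init__(self, value, value_ts):
        self.value = value
        self.value_ts = list(value_ts)
        self.held_back = []  # q.prev of queries that are not yet stable

    def query(self, prev):
        # Stable: the replica's value is at least as recent as what the
        # front end has already seen.
        if leq(prev, self.value_ts):
            return self.value, list(self.value_ts)
        self.held_back.append(list(prev))
        return None


replica = QueryReplica(value="v7", value_ts=[2, 1, 0])
print(replica.query([1, 1, 0]))  # ('v7', [2, 1, 0])
print(replica.query([2, 2, 0]))  # None
```

The second query is held back because the front end has seen an update (second component equal to 2) that this replica has not yet applied.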
23Processing Update Requests
- For an update u (not a duplicate), replica i
- increments the i-th element of its replica
timestamp replicaTS by one,
- adds an entry to the log with a timestamp ts
derived from u.prev by replacing the i-th element
with that of replicaTS, and
- returns ts to the front end immediately.
- When the stability condition u.prev ≤ valueTS
holds, update u is applied and its ts is merged
with valueTS.
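The update steps above can be sketched as follows. The class `UpdateReplica`, the update identifiers, and the representation of the value as a list of applied operations are assumptions made for illustration; the timestamp arithmetic follows the slide.

```python
def leq(a, b):
    return all(x <= y for x, y in zip(a, b))


def merge(a, b):
    return [max(x, y) for x, y in zip(a, b)]


class UpdateReplica:
    """Hypothetical replica i in a group of n replicas."""

    def __init__(self, index, n):
        self.i = index
        self.replica_ts = [0] * n
        self.value_ts = [0] * n
        self.log = []          # entries: (ts, uid, prev, op)
        self.executed = set()  # identifiers of applied updates
        self.value = []        # applied operations, in order

    def receive_update(self, uid, prev, op):
        if uid in self.executed or any(e[1] == uid for e in self.log):
            return None  # duplicate: ignore
        self.replica_ts[self.i] += 1
        ts = list(prev)
        ts[self.i] = self.replica_ts[self.i]
        self.log.append((ts, uid, list(prev), op))
        self.apply_stable()
        return ts  # sent back to the front end

    def apply_stable(self):
        progress = True
        while progress:
            progress = False
            for ts, uid, prev, op in self.log:
                if uid not in self.executed and leq(prev, self.value_ts):
                    self.value.append(op)
                    self.value_ts = merge(self.value_ts, ts)
                    self.executed.add(uid)
                    progress = True


rm = UpdateReplica(index=0, n=3)
ts1 = rm.receive_update("u1", [0, 0, 0], "op1")  # stable, applied at once
ts2 = rm.receive_update("u2", [0, 1, 0], "op2")  # depends on an unseen update
print(ts1, ts2, rm.value, rm.value_ts)
# [1, 0, 0] [2, 1, 0] ['op1'] [1, 0, 0]
```

Update u2 stays in the log until gossip brings the update from replica 1 that it depends on.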
24Processing Gossip Messages
- For every gossip message received, a replica does
the following:
- Merge the arriving log with its own; duplicated
updates are discarded.
- Apply updates that have become stable.
- A gossip message need not contain the entire log,
if it is certain that some of the updates have
been seen by the receiving replica.
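Gossip handling can be sketched in the same style. This is a simplified illustration, not the full algorithm from the textbook: the class `GossipReplica` and the log layout are assumptions, the whole log is sent every time, and log entries are never garbage-collected.

```python
def leq(a, b):
    return all(x <= y for x, y in zip(a, b))


def merge(a, b):
    return [max(x, y) for x, y in zip(a, b)]


class GossipReplica:
    def __init__(self, n):
        self.replica_ts = [0] * n
        self.value_ts = [0] * n
        self.log = {}          # update id -> (ts, prev, op)
        self.executed = set()  # executed operation table
        self.value = []        # applied operations, in order

    def receive_gossip(self, their_log, their_replica_ts):
        # Merge the arriving log with our own, discarding duplicates.
        for uid, entry in their_log.items():
            if uid not in self.log and uid not in self.executed:
                self.log[uid] = entry
        self.replica_ts = merge(self.replica_ts, their_replica_ts)
        self.apply_stable()

    def apply_stable(self):
        progress = True
        while progress:
            progress = False
            # Visit entries in timestamp order (lexicographic on the vector).
            for uid, (ts, prev, op) in sorted(self.log.items(),
                                              key=lambda kv: kv[1][0]):
                if uid not in self.executed and leq(prev, self.value_ts):
                    self.value.append(op)
                    self.value_ts = merge(self.value_ts, ts)
                    self.executed.add(uid)
                    progress = True


sender_log = {
    "u2": ([2, 0], [1, 0], "b"),
    "u1": ([1, 0], [0, 0], "a"),
}
receiver = GossipReplica(n=2)
receiver.receive_gossip(sender_log, [2, 0])
receiver.receive_gossip(sender_log, [2, 0])  # repeated gossip has no effect
print(receiver.value, receiver.value_ts)  # ['a', 'b'] [2, 0]
```

Applying u1 makes u2 stable, so both are applied in dependency order even though they arrived in the same message.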
25Updates in Bayou
Source: Coulouris et al., Distributed Systems:
Concepts and Design, Fourth Edition.
26About Bayou
- Consistency guarantees
- Merging of updates
- Dependency checks
- Merge procedures
27Coda vs. AFS
- More general replication
- Greater tolerance toward server crashes
- Allowing disconnected operations
28Transactions with Replicated Data
- A replicated transactional service should appear
the same as one without replicated data.
- The effects of transactions performed by various
clients on replicated data are the same as if
they had been performed one at a time on single
data items; this property is called one-copy
serializability.
29Transactions with Replicated Data (cont.)
- Failures should be serialized with respect to
transactions.
- Any failure observed by a transaction must appear
to have happened before the transaction started.
30Schemes for One-Copy Serializability
- Read one/write all
- Available copies replication
- Schemes that also tolerate network partitioning
- available copies with validation
- quorum consensus
- virtual partition
31Transactions on Replicated Data
Source: Instructor's guide for G. Coulouris et
al., Distributed Systems: Concepts and Design,
Fourth Edition.
32Available Copies Replication
- A client's read request on a logical data item
may be performed by any available replica, but a
client's update request must be performed by all
available replicas.
- A local validation procedure is required to
ensure that any failure or recovery does not
appear to happen during the progress of a
transaction.
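The read and update rules can be sketched as follows. This is a deliberately partial illustration: the class and function names are assumptions, there is no locking or transaction machinery, and the local validation procedure is not modeled.

```python
class ReplicaManager:
    def __init__(self, name):
        self.name = name
        self.available = True
        self.data = {}


def read(replicas, key):
    """Read from any one available copy."""
    for rm in replicas:
        if rm.available:
            return rm.data.get(key)
    raise RuntimeError("no available copy")


def write(replicas, key, value):
    """Write to every available copy; return the names of those updated."""
    targets = [rm for rm in replicas if rm.available]
    if not targets:
        raise RuntimeError("no available copy")
    for rm in targets:
        rm.data[key] = value
    return [rm.name for rm in targets]


x, y, z = (ReplicaManager(name) for name in "XYZ")
group = [x, y, z]
write(group, "A", 100)
x.available = False                # X fails
updated = write(group, "A", 150)   # only Y and Z are updated
print(updated, read(group, "A"), x.data["A"])  # ['Y', 'Z'] 150 100
```

The stale value left at X illustrates why failures and recoveries must be ordered with respect to transactions: a recovering copy has to catch up before it serves reads again.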
33Available Copies Replication (cont.)
Source: Instructor's guide for G. Coulouris et
al., Distributed Systems: Concepts and Design,
Fourth Edition.
34Network Partition
Source: Coulouris et al., Distributed Systems:
Concepts and Design, Fourth Edition.
35Available Copies with Validation
- The available copies algorithm is applied within
each partition.
- When a partition is repaired, the possibly
conflicting transactions that took place in the
separate partitions are validated.
- If the validation fails, some of the transactions
have to be aborted.
36Quorum Consensus Methods
- One way to ensure consistency across different
partitions is to make a rule that operations can
only be carried out within one of the partitions.
- A quorum is a subgroup of replicas whose size
gives it the right to execute operations.
- Version numbers or timestamps may be used to
determine whether copies of the data item are up
to date.
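A minimal sketch of quorum reads and writes with version numbers follows, assuming one vote per copy and the usual constraints R + W > N and W > N/2 (these constraints are not stated on the slide; the class and function names are illustrative). The constraints make every read quorum overlap every write quorum, and any two write quorums overlap each other.

```python
class Copy:
    def __init__(self, name):
        self.name = name
        self.version = 0
        self.value = None
        self.reachable = True


def valid_quorums(n, r, w):
    """Check the assumed overlap constraints for n copies with one vote each."""
    return r + w > n and 2 * w > n


def gather(copies, size):
    reachable = [c for c in copies if c.reachable]
    if len(reachable) < size:
        raise RuntimeError("quorum not available in this partition")
    return reachable[:size]


def quorum_read(copies, r):
    # The highest version in a read quorum is the latest written value.
    newest = max(gather(copies, r), key=lambda c: c.version)
    return newest.value


def quorum_write(copies, w, value):
    quorum = gather(copies, w)
    # Write quorums overlap, so the quorum contains the current version.
    new_version = max(c.version for c in quorum) + 1
    for c in quorum:
        c.version, c.value = new_version, value


N, R, W = 3, 2, 2
a, b, c = Copy("a"), Copy("b"), Copy("c")
copies = [a, b, c]
quorum_write(copies, W, "v1")        # reaches a and b
a.reachable = False                  # a is cut off by a partition
print(quorum_read(copies, R))        # v1
```

Here the read quorum {b, c} contains one stale copy (c, version 0) and one current copy (b, version 1), and the version number identifies which to trust.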
37An Example for Quorum Consensus
Source: Coulouris et al., Distributed Systems:
Concepts and Design, Fourth Edition.
38Two Network Partitions
Source: Coulouris et al., Distributed Systems:
Concepts and Design, Fourth Edition.
39Virtual Partition
Source: Coulouris et al., Distributed Systems:
Concepts and Design, Fourth Edition.
40Overlapping Virtual Partitions
Source: Coulouris et al., Distributed Systems:
Concepts and Design, Fourth Edition.
41Creating Virtual Partitions
Source: Coulouris et al., Distributed Systems:
Concepts and Design, Fourth Edition.