Title: Distributed Systems
Distributed Systems
- Topic 9: Time, Coordination and Replication
- Dr. Michael R. Lyu
- Computer Science Engineering Department
- The Chinese University
Outline
- 1 Time
- 2 Coordination
- 3 Replication
- The gossip architecture
- The process groups approach
- 4 Summary
1 Time
- The notion of time
- story 12/9/1949
- External synchronization
- Internal synchronization
- Physical clocks and their synchronization
- Logical time and logical clocks
1.1 Synchronizing Physical Clocks
- Each computer contains its own physical clock.
- A physical clock is limited by its resolution: the period between updates of the clock register.
- Clock drift often happens to physical clocks.
- To compensate for clock drift, computers are synchronized to a time service, e.g., one that provides UTC (Coordinated Universal Time).
- Several algorithms exist for this synchronization.
1.1 Compensating for Clock Drift
- $S(t) = H(t) + \delta(t)$, where $S$ is the application (software) clock time, $H$ is the hardware clock time, and $\delta$ is the compensating factor.
- Assume a linear relation: $\delta(t) = aH(t) + b$.
- Let the value of the software clock be $T_{skew}$ when $H = h$, and let the actual time be $T_{real}$.
- If $S$ is to give the actual time after $N$ further ticks, we have $T_{skew} = (1 + a)h + b$ and $T_{real} + N = (1 + a)(h + N) + b$.
- Solving: $a = (T_{real} - T_{skew}) / N$ and $b = T_{skew} - (1 + a)h$.
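The two equations above can be checked numerically. The following is a minimal sketch, not a definitive implementation; the function names and the sample numbers are hypothetical and chosen only for illustration.

```python
def drift_coefficients(t_skew, t_real, h, n):
    """Solve for a and b in delta(t) = a*H(t) + b.

    t_skew: software clock value when the hardware clock reads h
    t_real: actual time at that moment
    n:      number of further hardware ticks after which S should be correct
    """
    a = (t_real - t_skew) / n
    b = t_skew - (1 + a) * h
    return a, b


def software_clock(hw, a, b):
    """S = H + delta(H) = (1 + a) * H + b."""
    return (1 + a) * hw + b


# Hypothetical numbers: the software clock is 4 ticks fast when H = 1000,
# and we want it to be correct again 100 ticks later.
a, b = drift_coefficients(t_skew=1004.0, t_real=1000.0, h=1000.0, n=100.0)
now = software_clock(1000.0, a, b)    # still reads T_skew: no jump
later = software_clock(1100.0, a, b)  # reads T_real + N
```

Because the correction is spread over N ticks, the software clock slows down or speeds up gradually instead of being set backward.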
1.1 Cristian's Clock Synchronization
- Let the time returned in S's message $m_t$ be $t$. P should set its clock to $t + T_{round}/2$.
- The time by S's clock when the reply message arrives lies in the range $[t + min,\; t + T_{round} - min]$, where $min$ is the minimum one-way transmission time. This range has width $T_{round} - 2\,min$, so the accuracy is $\pm(T_{round}/2 - min)$.
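As a rough illustration of the estimate and its bounds, the sketch below computes the new clock value, the range of possible server times, and the accuracy. The function name and the sample values (a 20 ms round trip, a 4 ms minimum delay) are assumptions made for this example.

```python
def cristian_estimate(t_server, t_round, t_min=0.0):
    """Return (estimate, (low, high), accuracy) for Cristian's method.

    t_server: time t carried in the server's reply message
    t_round:  round-trip time measured by the client P
    t_min:    assumed minimum one-way transmission time
    """
    estimate = t_server + t_round / 2
    low = t_server + t_min               # earliest possible server time on arrival
    high = t_server + t_round - t_min    # latest possible server time on arrival
    accuracy = t_round / 2 - t_min       # half the width of [low, high]
    return estimate, (low, high), accuracy


# Hypothetical measurement, in seconds
est, (low, high), acc = cristian_estimate(t_server=100.0, t_round=0.020, t_min=0.004)
```

The estimate sits at the midpoint of the range, which is why the error is at most half the range width.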
1.1 The Berkeley Algorithm
- A coordinator computer is chosen to act as the master. The master periodically polls the slaves whose clocks are to be synchronized.
- The master estimates the slaves' local clock times by observing the round-trip times, and it averages the values obtained.
- The master takes a fault-tolerant average: readings that differ too much from the others are left out.
- Should the master fail, another computer can be elected to take over.
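One way to picture the averaging step is the sketch below. The filtering rule used here (drop readings further than `max_diff` from the master's own clock) is a simplifying assumption for illustration, and the function name and numbers are hypothetical.

```python
def berkeley_adjustments(master_time, slave_times, max_diff):
    """Return the clock adjustment for the master followed by each slave.

    slave_times: the master's estimates of the slaves' clocks, assumed to be
                 already corrected for round-trip delay
    max_diff:    readings further than this from the master's clock are
                 excluded from the average (assumed filtering rule)
    """
    readings = [master_time] + list(slave_times)
    trusted = [r for r in readings if abs(r - master_time) <= max_diff]
    average = sum(trusted) / len(trusted)
    # Each machine receives the amount by which to adjust its own clock.
    return [average - r for r in readings]


# Hypothetical readings in minutes past midnight: master 3:00, slaves 3:25 and 2:50
adjustments = berkeley_adjustments(180, [205, 170], max_diff=15)
```

Here the 3:25 reading is excluded from the average but still receives a correction, so all three clocks converge on the same value.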
1.1 The Network Time Protocol
- NTP distributes time information to provide:
- a service to synchronize clients across the Internet
- a reliable service that survives loss of connection
- resynchronization frequent enough to offset clients' clock drift
- protection against interference with the time service
- NTP service is provided by various servers:
- Primary servers, secondary servers, and servers of other levels (called strata).
- Synchronization subnet: the servers, which are connected in a logical hierarchy.
1.1 NTP Synchronization Modes
- NTP servers synchronize in three modes:
- Multicast mode
- Procedure-call mode
- Symmetric mode
1.2 Logical Time and Logical Clocks
- The order of events:
- Two events in the same process occurred in the order in which they appear in that process.
- The event of sending a message occurred before the event of receiving it.
- Happened-before relation, denoted by →
- HB1: If ∃ process p such that x →p y, then x → y.
- HB2: For any message m, send(m) → rcv(m).
- HB3: If x, y and z are events such that x → y and y → z, then x → z.
1.2 Logical Timestamps Example
- Events occurring at three processes
1.2 Logical Timestamps
- Logical clock: a monotonically increasing software counter.
- Cp: logical clock for process p. Cp(a): timestamp of event a at p. C(b): timestamp of event b at whichever process it occurred.
- LC1: before each event issued at process p, Cp := Cp + 1.
- LC2: (a) when p sends message m to q, it attaches the value t = Cp.
- (b) on receiving (m, t), q sets Cq := max(Cq, t) and then applies LC1 before timestamping rcv(m).
- If a → b then C(a) < C(b), but not vice versa!
- Extensions: totally ordered logical clocks and vector clocks.
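Rules LC1 and LC2 can be sketched in a few lines. The class and method names below are hypothetical, and message passing is reduced to handing the timestamp from one object to another.

```python
class LamportClock:
    """A logical clock for one process, following rules LC1 and LC2."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """LC1: increment before each event and return the event's timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """LC2(a): sending is an event; the returned value t travels with m."""
        return self.tick()

    def receive(self, t):
        """LC2(b): take the max with the received value, then apply LC1."""
        self.time = max(self.time, t)
        return self.tick()


p, q = LamportClock(), LamportClock()
a = p.tick()        # local event at p
t = p.send()        # p sends m, carrying t
b = q.receive(t)    # q receives m
```

Since a → send(m) → rcv(m), the timestamps come out strictly increasing along that chain, as the clock condition requires.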
1.2 Logical Timestamps Example
- Events occurring at three processes, annotated with their logical timestamps (figure).
2 Coordination
- Distributed processes need to coordinate their activities.
- Distributed mutual exclusion must satisfy safety, liveness, and ordering properties.
- Election algorithms: methods for choosing a unique process for a particular role.
2.1 Distributed Mutual Exclusion
- The basic requirements for mutual exclusion:
- ME1 (safety): At most one process may execute in the critical section (CS) at a time.
- ME2 (liveness): A process requesting entry to the CS is eventually granted it.
- ME3 (ordering): Entry to the CS should be granted in happened-before order.
- The central server algorithm.
- A ring-based algorithm.
- A distributed algorithm using logical clocks.
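The central server approach is the simplest of the three to sketch. The following is a single-process simulation under stated assumptions: the class name `LockServer` and its methods are hypothetical, and real request and release messages are replaced by method calls.

```python
from collections import deque


class LockServer:
    """Holds a single token for the critical section and queues waiting requests."""

    def __init__(self):
        self.holder = None     # process currently in the CS, if any
        self.queue = deque()   # waiting processes, oldest first

    def request(self, pid):
        """Grant entry immediately if the CS is free; otherwise queue the request."""
        if self.holder is None:
            self.holder = pid
            return True
        self.queue.append(pid)
        return False

    def release(self, pid):
        """Release the token and pass it to the oldest queued request, if any."""
        if self.holder != pid:
            raise ValueError("only the current holder can release")
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder


server = LockServer()
granted = [server.request(p) for p in ("p1", "p2", "p3")]
order = [server.release("p1"), server.release("p2"), server.release("p3")]
```

There is only one holder at any time (ME1) and every queued request is eventually served if holders release (ME2). Entry is granted in order of arrival at the server, which is not necessarily happened-before order, so ME3 is not guaranteed by this scheme.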
2.2 Elections
- An election is a procedure carried out to choose a process from a group.
- A ring-based election algorithm.
- The bully algorithm.
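A ring-based election can be simulated as follows. This is a simplified sketch that assumes unique identifiers, no failures, a single initiator, and that the process with the largest identifier wins. The function name is hypothetical, and the follow-up message announcing the winner is omitted.

```python
def ring_election(ids, start):
    """Simulate one election on a unidirectional ring.

    ids:   process identifiers in ring order (assumed unique)
    start: index of the process that initiates the election
    Returns (coordinator id, number of election messages sent).
    """
    n = len(ids)
    candidate = ids[start]        # the initiator puts its own id in the message
    pos = (start + 1) % n         # the message goes to the next process on the ring
    messages = 1
    while ids[pos] != candidate:  # a process that sees its own id has won
        candidate = max(candidate, ids[pos])  # forward the larger identifier
        pos = (pos + 1) % n
        messages += 1
    return candidate, messages


coordinator, messages = ring_election([3, 7, 2, 5], start=0)
```

The message keeps circulating until the largest identifier returns to its owner, so the cost depends on how far the initiator is from the eventual winner.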
3 Replication
- Replication is the maintenance of on-line copies of data and resources.
- It is used for performance, availability, and fault tolerance.
- Basic Architectural Model.
- Consistency and request ordering.
- The gossip architecture.
- The process group approach.
3 Bulletin Board Example
3 Replication Issues
- Replica management models trade off accuracy against response time:
- Simple asynchronous model
- Totally synchronous model
- Quorum-based schemes
- Causality-ordered
- Multicast updates to a process group.
- Read/write ratio.
3.1 Basic Architectural Model
3.1 The Gossip Architecture
3.1 The Primary Copy Model
3.2 Consistency and Request Ordering
- Criteria: correctness vs. expense.
- Total, causal, and sync ordering requirements.
- Implementing request ordering.
- Implementing total ordering.
- Implementing causal ordering with vector
timestamps.
3.2.1 Total, Causal, and Sync Ordering
- Let r1 and r2 be requests.
- Total ordering: Either r1 is processed before r2 at all RMs, or r2 is processed before r1 at all RMs.
- Causal ordering: If r1 happened-before r2, then r1 is processed before r2 at all RMs.
- FIFO ordering: If r1 is issued before r2 by the same client, then r1 is processed before r2 at all RMs.
- Sync-ordering: If r1 is sync-ordered, then for any other request r2, either r1 is processed before r2 at all RMs or r2 is processed before r1 at all RMs.
3.2.1 Example 1
3.2.1 Example 2
3.2.2 Implementing Request Ordering
- Hold-back: A received request is not processed by an RM until the ordering constraints can be met.
- Stable message: all prior requests have been processed.
- Hold-back queue vs. delivery queue.
- Safety property: no message will be delivered out of order by being prematurely transferred.
- Liveness property: no message should wait on the hold-back queue forever.
3.2.3 Implementing Total Ordering
- Basic approach: assign totally ordered identifiers to requests.
- Sequencer.
- Distributed agreement in assigning request ids.
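The sequencer approach combines naturally with a hold-back queue. The sketch below is one possible arrangement, not a definitive implementation: the class names are hypothetical, the sequencer hands out consecutive integers starting at 0, and network delivery is replaced by direct method calls.

```python
import heapq
import itertools


class Sequencer:
    """Assigns consecutive, totally ordered identifiers to requests."""

    def __init__(self):
        self._ids = itertools.count()

    def assign(self, request):
        return next(self._ids), request


class ReplicaManager:
    """Holds requests back until every request with a smaller id is delivered."""

    def __init__(self):
        self.next_id = 0
        self.hold_back = []   # min-heap of (id, request)
        self.delivered = []   # delivery queue, in total order

    def receive(self, seq_id, request):
        heapq.heappush(self.hold_back, (seq_id, request))
        # A request becomes stable once all smaller ids have been delivered.
        while self.hold_back and self.hold_back[0][0] == self.next_id:
            _, stable = heapq.heappop(self.hold_back)
            self.delivered.append(stable)
            self.next_id += 1


seq = Sequencer()
m0, m1, m2 = (seq.assign(r) for r in ("r0", "r1", "r2"))
rm = ReplicaManager()
rm.receive(*m2)                  # arrives early, so it is held back
after_first = list(rm.delivered)
rm.receive(*m0)
rm.receive(*m1)                  # fills the gap and releases r2 as well
```

Every RM that runs the same rule delivers the requests in identifier order, whatever order they arrive in. The liveness property still depends on the missing request eventually arriving.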
3.2.4 Implementing Causal Ordering
- Vector timestamp: a list of counts of update events, one for each of the replica managers.
- Merging vector timestamps: choose the larger value from the two vectors, component-wise.
- e.g., FE time vector (2,3,4)
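Merging is a component-wise maximum, as the short sketch below shows. The function names are hypothetical, and the second vector (1,5,4) is an invented value used only to show the merge with the (2,3,4) example.

```python
def merge(v1, v2):
    """Component-wise maximum of two vector timestamps of equal length."""
    if len(v1) != len(v2):
        raise ValueError("vector timestamps must have one entry per replica manager")
    return tuple(max(a, b) for a, b in zip(v1, v2))


def less_or_equal(v1, v2):
    """True if v1 <= v2 in every component, i.e. v2 reflects all updates in v1."""
    return all(a <= b for a, b in zip(v1, v2))


fe_time = (2, 3, 4)      # front end's vector from the slide
rm_time = (1, 5, 4)      # hypothetical vector returned by a replica manager
merged = merge(fe_time, rm_time)
```

Neither input dominates the other here, which means the two vectors reflect concurrent updates. The merged vector dominates both.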
3.3 The Gossip Architecture
3.4 Process Group Approach
- Process group and group communication.
- Group structure
- peer group
- server group
- client-server group
- subscription group
- hierarchical groups
3.4 Process Group Services
- Group membership management
- Create
- Join
- Leave
- Group address expansion
- Multicast communication
- unreliable multicast
- reliable multicast
- atomic multicast
3.4 Multicast Communication
- Sample multicasting operation:
- void Multicast(in orderType order, in groupId group, in msg m, in int nReplies, out msgSeq replies) raises ()
- Order types:
- unordered
- total ordering
- causal ordering
- sync-ordering
4 Summary
- Timing issues:
- Synchronizing physical clocks.
- Logical time and logical clocks.
- Distributed coordination and mutual exclusion.
- Replication provides good performance, high availability and fault tolerance.
- The gossip approach and the process group approach.
- CORBA replication service is a research topic.