Title: Replicated Distributed Systems
1. Replicated Distributed Systems
2. Overview
- Introduction and Background (Queenie)
- A Model of Replicated Distributed Programs
- Implementing Distributed Modules and Threads
- Implementing Replicated Procedure Calls (Alix)
- Performance Analysis
- Concurrency Control (Joanne)
- Binding Agents
- Troupe Reconfiguration
3. Background
- Presents a new software architecture for fault-tolerant distributed programs
- Designed by Eric C. Cooper, a co-founder of FORE Systems, a leading supplier of networks for enterprises and service providers
4. Introduction
- Goal: address the problem of constructing highly available distributed programs
- Tolerate crashes of the underlying hardware automatically
- Continue to operate despite failure of components
- First approach: replicate each component of the system, proposed by von Neumann (1955)
- Drawback: costly, since reliable hardware is needed everywhere
5. Introduction (cont'd)
- Eric C. Cooper's new approach
- Replication on a per-module basis
- Flexible, without burdening the programmer
- Provides location and replication transparency to the programmer
- Fundamental mechanisms
- Troupe: a replicated module
- Troupe members: the replicas
- Replicated procedure call: many-to-many communication between troupes
6. Introduction (cont'd)
- Important properties give this mechanism flexibility and power
- Individual members of a troupe do not communicate among themselves
- Members are unaware of one another's existence
- Each troupe member behaves as if there were no replicas
7. A Model of Replicated Distributed Programs
[Figure: a model of a replicated distributed program, showing modules (procedures and state information) replicated as troupes]
8. A Model of Replicated Distributed Programs (cont'd)
- Module
- Packages the procedures and state information needed to implement a particular abstraction
- Separates the interface to that abstraction from its implementation
- Expresses the static structure of a program when it is written
9. A Model of Replicated Distributed Programs (cont'd)
- Threads
- Each thread has a unique identifier (thread ID)
- A particular thread runs in exactly one module at a given time
- Multiple threads may run in the same module concurrently
10. Implementing Distributed Modules and Threads
- No machine boundaries
- Provides location transparency: the programmer does not need to know the eventual configuration of a program
- Module
- Implemented by a server whose address space contains the module's procedures and data
- Thread
- Implemented by using remote procedure calls to transfer control from server to server
11. Adding Replication
- Processor and network failures affect the distributed program
- Partial failures
- Solution: replication
- Introduce replication transparency at the module level
12. Adding Replication (cont'd)
- Assumption: troupe members execute on fail-stop processors
- If not → complex agreement protocols are required
- Replication transparency in the troupe model is guaranteed because all troupes are deterministic (same input → same output)
13. Troupe Consistency
- When all its members are in the same state → the troupe is consistent
- → Its clients do not need to know that it is replicated
- → Replication transparency
14. Troupe Consistency (cont'd)
[Figure: execution of a remote procedure call (I), between client and server]
15. Troupe Consistency (cont'd)
[Figure: execution of a remote procedure call (II), between client and server]
16. Execution of a Procedure Call
- Viewed as a tree of procedure invocations
- The invocation trees rooted at each troupe member are identical
- The server troupe members make the same procedure calls and returns, with the same arguments and results
- All troupes are initially consistent
- → All troupes remain consistent
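The consistency argument above can be illustrated with a tiny sketch (plain Python; the name `apply_call` is illustrative, not from the Circus system): a deterministic procedure applied to identical initial states leaves all replicas in identical states.

```python
# Determinism is what keeps a troupe consistent: a pure procedure
# applied to identical initial states yields identical new states.

def apply_call(state, args):
    """A deterministic procedure: same input -> same output."""
    return state + sum(args)

replicas = [10, 10, 10]                    # an initially consistent troupe
replicas = [apply_call(s, (1, 2)) for s in replicas]
assert len(set(replicas)) == 1             # all members remain identical
```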
17. Replicated Procedure Calls
- Goal: allow distributed programs to be written in the same style as conventional programs for centralized computers
- Replicated procedure call = remote procedure call with exactly-once execution at all troupe members
18. Circus Paired Message Protocol
- Characteristics
- Paired messages (e.g. call and return)
- Reliably delivered
- Variable length
- Call sequence numbers
- Based on RPC
- Uses UDP, the DARPA User Datagram Protocol
- Connectionless, but with retransmission
19. Implementing Replicated Procedure Calls
- Implemented on top of the paired message layer
- Two subalgorithms in the many-to-many call
- One-to-many
- Many-to-one
- Implemented as part of the run-time system that is linked with each user's program
21. One-to-Many Calls
- The client half of the RPC performs a one-to-many call
- Purpose: guarantee that the procedure is executed at each server troupe member
- The same call message is sent with the same call number
- The client waits for return messages
- In Circus, the client waits for all the return messages before proceeding
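The one-to-many half can be sketched as a single-process simulation (a minimal sketch, assuming deterministic server members; `ServerMember` and `one_to_many_call` are illustrative names, not the Circus API):

```python
# Simulation of the one-to-many half of a replicated call: the client
# sends the SAME call message (same call number) to every server troupe
# member and waits for ALL return messages before proceeding.

def one_to_many_call(servers, call_number, proc, args):
    """Send one call message per server member; collect every return."""
    call_msg = {"call": call_number, "proc": proc, "args": args}
    returns = []
    for server in servers:              # same message to each troupe member
        returns.append(server.handle(dict(call_msg)))
    return returns                      # Circus waits for the whole set

class ServerMember:
    """A deterministic server replica: same input -> same output."""
    def __init__(self, procs):
        self.procs = procs
    def handle(self, msg):
        result = self.procs[msg["proc"]](*msg["args"])
        return {"call": msg["call"], "result": result}

troupe = [ServerMember({"add": lambda a, b: a + b}) for _ in range(3)]
replies = one_to_many_call(troupe, call_number=1, proc="add", args=(2, 3))
# Determinism makes every reply identical, so the troupe stays consistent.
assert all(r == {"call": 1, "result": 5} for r in replies)
```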
23. Synchronization Point
- After all the server troupe members have returned:
- Each client troupe member knows that all server troupe members have performed the procedure
- Each server troupe member knows that all client troupe members have received the result
24. Many-to-One Calls
- The server will receive call messages from each client troupe member
- The server executes the procedure only once
- Returns the results to all the client troupe members
- Two problems
- Distinguishing between unrelated call messages
- How many other call messages are expected?
- Circus waits for all clients to send a call message before proceeding
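The many-to-one half (Circus default: wait for the whole call set, execute once, reply to everyone) can be sketched as follows; `ServerSide` and the message shape are illustrative assumptions, not the real implementation:

```python
class ServerSide:
    """Sketch of the many-to-one half: a server member collects one call
    message per client troupe member, executes the procedure only once,
    and returns the same result to every client member."""
    def __init__(self, client_troupe_size, procs):
        self.expected = client_troupe_size
        self.procs = procs
        self.pending = {}   # call number -> call messages received so far

    def receive_call(self, msg):
        msgs = self.pending.setdefault(msg["call"], [])
        msgs.append(msg)
        if len(msgs) < self.expected:
            return None                   # wait for the rest (Circus default)
        result = self.procs[msg["proc"]](*msg["args"])  # executed exactly once
        del self.pending[msg["call"]]
        # one identical return message per client troupe member
        return [{"call": msg["call"], "result": result}] * self.expected

server = ServerSide(client_troupe_size=2, procs={"double": lambda x: 2 * x})
assert server.receive_call({"call": 7, "proc": "double", "args": (21,)}) is None
replies = server.receive_call({"call": 7, "proc": "double", "args": (21,)})
assert replies == [{"call": 7, "result": 42}, {"call": 7, "result": 42}]
```

The call number is what distinguishes related call messages from unrelated ones, answering the first problem on the slide.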
26. Many-to-Many Calls
- A replicated procedure call from a client troupe to a server troupe is called a many-to-many call
27. Many-to-Many Steps
- A call message is sent from each client troupe member to each server troupe member.
- A call message is received by each server troupe member from each client troupe member.
- The requested procedure is run on each server troupe member.
- A return message is sent from each server troupe member to each client troupe member.
- A return message is received by each client troupe member from each server troupe member.
28. Multicast Implementation
- Dramatic difference in efficiency
- Suppose m client troupe members and n server troupe members
- Point-to-point
- m × n messages sent
- Multicast
- m + n messages sent
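Assuming one multicast per sender, the counts above work out as in this small arithmetic sketch (function names are illustrative):

```python
def point_to_point_messages(m, n):
    """Each of m client members sends a call to each of n server members,
    and each server member replies to each client member."""
    return m * n + n * m

def multicast_messages(m, n):
    """Each client member multicasts one call message and each server
    member multicasts one return message."""
    return m + n

# For 3x3 troupes: 18 point-to-point messages versus 6 multicasts.
assert point_to_point_messages(3, 3) == 18
assert multicast_messages(3, 3) == 6
```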
29. Waiting for Messages to Arrive
- Troupes are assumed to be deterministic; therefore all messages in a set are assumed to be identical
- When should computation proceed?
- As soon as the first message arrives, or only after the entire set arrives?
30. Waiting for All Messages
- Able to provide error detection and error correction
- Inconsistencies are detected
- Execution time is determined by the slowest member of each troupe
- Default in the Circus system
31. First-Come Approach
- Forfeits error detection
- Computation proceeds as soon as the first message in each set arrives
- Execution time is determined by the fastest member of each troupe
- Requires a simple change to the one-to-many call protocol
- The client can use the call sequence number to discard return messages from slow server troupe members
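The client-side change described above amounts to remembering which call numbers have already been answered (a minimal sketch; `ClientMember` is an illustrative name):

```python
class ClientMember:
    """First-come variant: proceed on the first return message for a call
    number, and use the call sequence number to discard later returns
    arriving from slow server troupe members."""
    def __init__(self):
        self.answered = set()   # call sequence numbers already satisfied

    def receive_return(self, msg):
        if msg["call"] in self.answered:
            return None          # duplicate from a slow server member: discard
        self.answered.add(msg["call"])
        return msg["result"]     # first arrival: computation proceeds

c = ClientMember()
assert c.receive_return({"call": 5, "result": "fast"}) == "fast"
assert c.receive_return({"call": 5, "result": "slow"}) is None
```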
32. First-Come Approach (cont'd)
- More complicated changes are required in the many-to-one call protocol
- When a call message from another member arrives, the server cannot execute the procedure again
- Doing so would violate exactly-once execution
- The server must retain the return messages until all other call messages have been received from the client troupe members
- A return message is sent as soon as each call message is received
- Execution seems instantaneous to the client
33. A Better First-Come Approach
- Buffer messages at the client rather than at the server
- The server broadcasts return messages to the entire client troupe after the first call message
- A client troupe member may receive a return message before sending its call message
- The return message is retained until the client troupe member is ready to send the call message
34. Advantages of Buffering at the Client
- The work of buffering return messages and pairing them with call messages is placed on the client rather than on a shared server
- The server can broadcast rather than use point-to-point communication
- No communication is required by a slow client
35. What About Error Detection?
- To provide error detection and still allow computation to proceed, a watchdog scheme can be used
- Create another thread of control after the first message is received
- This thread watches for the remaining messages and compares them
- If there is an inconsistency, the main computation is aborted
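The watchdog scheme can be sketched with a background thread that compares stragglers against the first message (a simplified sketch using Python threads; the abort is modeled as setting a flag rather than actually killing the main computation):

```python
import queue
import threading

def watchdog(first_msg, remaining, inconsistent):
    """Compare late-arriving messages against the first one received;
    flag any mismatch so the main computation can be aborted."""
    while True:
        msg = remaining.get()
        if msg is None:               # sentinel: the message set is complete
            return
        if msg != first_msg:
            inconsistent.set()        # inconsistency detected -> abort

remaining = queue.Queue()
flag = threading.Event()
t = threading.Thread(target=watchdog, args=({"result": 5}, remaining, flag))
t.start()                             # main computation proceeds meanwhile
remaining.put({"result": 5})          # consistent replica
remaining.put({"result": 6})          # faulty replica -> inconsistency
remaining.put(None)
t.join()
assert flag.is_set()                  # the mismatch was caught
```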
36. Crashes and Partitions
- The underlying message protocol uses probing and timeouts to detect crashes
- It relies on network connectivity and therefore cannot distinguish between crashes and network partitions
- To prevent troupe members from diverging:
- Require that each troupe member receives a majority of the expected set of messages
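The majority requirement is a one-line predicate (a trivial sketch; `may_proceed` is an illustrative name):

```python
def may_proceed(received, expected):
    """A troupe member proceeds only if it has received a strict majority
    of the expected set of messages; under a partition, at most one side
    can hold a majority, so the troupe cannot diverge."""
    return len(received) > expected / 2

assert may_proceed(received=["m1", "m2"], expected=3)      # 2 of 3: majority
assert not may_proceed(received=["m1"], expected=3)        # 1 of 3: minority
assert not may_proceed(received=["m1"], expected=2)        # 1 of 2: no majority
```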
37. Collators
- The determinism requirement can be relaxed by allowing programmers to reduce a set of messages into a single message
- A collator maps a set of messages into a single result
- A collator needs enough messages to make a decision
- Three kinds
- Unanimous
- Majority
- First-come
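The three kinds of collator can be sketched as plain functions over a list of replies (illustrative signatures; the real collators operate on message sets inside the run-time system):

```python
from collections import Counter

def unanimous(msgs):
    """Return the common value only if every troupe member agrees."""
    if len(set(msgs)) != 1:
        raise ValueError("inconsistent replies")
    return msgs[0]

def majority(msgs):
    """Return the value reported by a strict majority of members, if any."""
    value, count = Counter(msgs).most_common(1)[0]
    if count <= len(msgs) / 2:
        raise ValueError("no majority")
    return value

def first_come(msgs):
    """Return the first reply; fastest, but forfeits error detection."""
    return msgs[0]

assert unanimous(["ok", "ok", "ok"]) == "ok"
assert majority(["ok", "ok", "bad"]) == "ok"
assert first_come(["ok", "bad", "bad"]) == "ok"
```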
38. Performance Analysis
- Experiments conducted at Berkeley during an inter-semester break
- Measured the cost of replicated procedure calls as a function of the degree of replication
- UDP and TCP echo tests used as a comparison
39. Performance Analysis (cont'd)
- Performance of UDP, TCP, and Circus
- The TCP echo test is faster than the UDP echo test
- The cost of TCP connection establishment is ignored
- The UDP test makes two alarm calls and therefore two setitimer calls
- The read and write interface to TCP is more streamlined
41. Performance Analysis (cont'd)
- An unreplicated Circus remote procedure call requires almost twice the time of a simple UDP exchange
- Due to the extra system calls required to handle Circus
- Elaborate code to handle multi-homed machines
- Some Berkeley machines had as many as 4 network addresses
- A design oversight by Berkeley, not a fundamental problem
42. Performance Analysis (cont'd)
- The expense of a replicated procedure call increases linearly as the degree of replication increases
- Each additional troupe member adds between 10 and 20 milliseconds
- Smaller than the time for a UDP datagram exchange
44. Performance Analysis (cont'd)
- An execution profiling tool was used to analyze the Circus implementation in finer detail
- Six Berkeley 4.2BSD system calls account for more than half the total CPU time to perform a replicated call
- Most of the time required for a Circus replicated procedure call is spent in the simulation of multicasting
47. Concurrency Control
- The server troupe handles calls from different clients using multiple threads
- Conflicts arise when concurrent calls need to access the same resource
48. Concurrency Control (cont'd)
- Serialization at each troupe member
- Local concurrency control algorithms
- Serialization in the same order among members
- Preserves troupe consistency
- Need coordination between the replicated procedure call mechanism and the synchronization mechanism
- → Replicated transactions
49. Replicated Transactions
- Requirements
- Serializability
- Atomicity
- Ensure that aborting a transaction does not affect other concurrently executing transactions
- Two-phase locking with unanimous update
- Drawback: too strict
- Troupe commit protocol
50. Troupe Commit Protocol
- Before a server troupe member commits (or aborts) a transaction, it invokes the ready_to_commit remote procedure call-back at the client troupe
- The client troupe returns whether it agrees to commit (or abort) the transaction
- If server troupe members serialize transactions in different orders, a deadlock will occur
- Detecting conflicting serialization orders is thereby converted to deadlock detection
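Why different serialization orders manifest as deadlock can be sketched with a wait-for graph (a simplified model, not the Circus implementation; node names follow the example on the next slides):

```python
# If server troupe members serialize transactions in different orders,
# the ready_to_commit call-back waits form a cycle in the wait-for graph.

def has_cycle(wait_for):
    """Detect a cycle in a wait-for graph given as {waiter: waited_on}."""
    for start in wait_for:
        seen, node = set(), start
        while node in wait_for:
            if node in seen:
                return True
            seen.add(node)
            node = wait_for[node]
    return False

# SM1 committing T1 waits on C1; C1 waits on the call-back from SM2;
# SM2 committing T2 waits on C2; C2 waits on SM1 -> deadlock detected.
assert has_cycle({"SM1": "C1", "C1": "SM2", "SM2": "C2", "C2": "SM1"})
# Same serialization order at both members: no cycle, commits proceed.
assert not has_cycle({"SM1": "C1", "SM2": "C1"})
```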
51. An Example of the Troupe Commit Protocol
- Two server troupe members, SM1 and SM2
- Two client troupes, C1 and C2
- C1 performs transaction T1 and C2 performs transaction T2
52. An Example of the Troupe Commit Protocol (cont'd)
- Scenario 1: T1 and T2 are serialized in the same order, say T1 first and T2 second, on both SM1 and SM2
[Figure: message diagram; step 3 commits T1, step 6 commits T2]
53. An Example of the Troupe Commit Protocol (cont'd)
- Scenario 1 (cont'd)
- SM1 and SM2 call ready_to_commit first at C1, passing true as the argument
- C1 returns true to both SM1 and SM2
- SM1 and SM2 commit T1
- SM1 and SM2 commit T2 by repeating steps (1)–(3)
54. An Example of the Troupe Commit Protocol (cont'd)
- Scenario 2: T1 and T2 are serialized in different orders, say SM1 wants to commit T1 first and SM2 wants to commit T2 first. If the transactions are committed, SM1 and SM2 will be inconsistent
55. An Example of the Troupe Commit Protocol (cont'd)
[Figure: message diagram; SM1 and SM2 each issue a ready_to_commit(true) call-back]
56. An Example of the Troupe Commit Protocol (cont'd)
- Scenario 2 (cont'd)
- SM1 calls ready_to_commit at C1 and SM2 calls ready_to_commit at C2
- C1 will not return any value because it is waiting for the call-back from SM2; the same thing happens to C2
- Without return values from C2, SM2 cannot commit T2 or proceed with T1; neither can SM1
- DEADLOCK! → The different serialization orders are detected
57. An Example of the Troupe Commit Protocol (cont'd)
- Scenario 3: T1 and T2 are serialized in different orders; however, committing them will NOT leave SM1 and SM2 in inconsistent states
- SM1 and SM2 call ready_to_commit at C1 and C2 in parallel
- Both server troupe members commit T1 and T2 in parallel after C1 and C2 return true
58. An Example of the Troupe Commit Protocol (cont'd)
[Figure: message diagram; step 3 commits T1 and T2]
59. Binding
61. Binding for Replicated Programs
- Cache invalidation problem
- A client's binding information becomes stale
- Causes
- A server troupe member or an entire troupe is no longer available
- The specified interface is no longer exported
- A new troupe member is added
62. Binding for Replicated Programs (cont'd)
- Cache invalidation detection
- The paired message protocol can detect missing troupe members
- The remote procedure call level can detect a non-exported interface
- Added troupe members CANNOT be detected by clients alone → need help from binding agents
63. Binding for Replicated Programs (cont'd)
- How is a new troupe member added?
- Assume the new member is already in the same state as the other members
- The new member calls the add_troupe_member procedure at the binding agent
- The binding agent invokes the set_troupe_id procedure at each troupe member
64. Binding for Replicated Programs (cont'd)
- Result of adding a new troupe member
- The updated troupe contains the new member
- The troupe ID is changed
- Clients detect this update by finding that the original server troupe ID is no longer valid
65. Binding for Replicated Programs (cont'd)
- Cache invalidation recovery
- Clients call rebind at the binding agent
- Clients update their binding information
- The binding agent may garbage-collect unavailable servers hinted at by this call
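The recovery path can be sketched as follows (a minimal model of a binding agent; the class layout and the `export` helper are assumptions for illustration, only `rebind` is named in the slides):

```python
class BindingAgent:
    """Sketch of cache-invalidation recovery: a client whose cached
    troupe ID is stale calls rebind to fetch fresh binding information."""
    def __init__(self):
        self.bindings = {}          # interface name -> (troupe_id, members)

    def export(self, interface, troupe_id, members):
        self.bindings[interface] = (troupe_id, list(members))

    def rebind(self, interface, stale_troupe_id):
        # The stale ID also hints which servers may be garbage-collected.
        troupe_id, members = self.bindings[interface]
        return troupe_id, members

agent = BindingAgent()
agent.export("calendar", troupe_id=2, members=["hostA", "hostB", "hostC"])
# A client cached troupe ID 1; its calls fail, so it rebinds:
new_id, members = agent.rebind("calendar", stale_troupe_id=1)
assert new_id == 2 and members == ["hostA", "hostB", "hostC"]
```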
66. Troupe Reconfiguration
- Recovery from partial failure
- Replace a broken troupe member with a new one
- Similar to the problem of adding a new troupe member
67. Troupe Reconfiguration Steps
- Performed as an atomic transaction
- Bring the new member into a state consistent with the other members
- get_state procedure call
- Add the new member at the binding agent
- add_troupe_member
- set_troupe_id
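The reconfiguration steps above can be sketched in one place (a simplified single-process model; the `Troupe` class and the ID-bump are illustrative assumptions, with get_state modeled as copying an existing member's state):

```python
class Troupe:
    """Sketch of a troupe: member states plus the current troupe ID."""
    def __init__(self, members, troupe_id):
        self.members = members       # member name -> state dictionary
        self.troupe_id = troupe_id

def add_troupe_member(troupe, new_member):
    """Model of the reconfiguration steps, as one atomic sequence."""
    # 1. get_state: bring the new member into a consistent state by
    #    copying the state of an existing member.
    state = next(iter(troupe.members.values()))
    troupe.members[new_member] = dict(state)
    # 2. add_troupe_member / set_troupe_id: the binding agent installs a
    #    new troupe ID at every member, so clients holding the old ID
    #    find it invalid and rebind.
    troupe.troupe_id += 1

t = Troupe({"hostA": {"x": 1}, "hostB": {"x": 1}}, troupe_id=7)
add_troupe_member(t, "hostC")
assert t.members["hostC"] == {"x": 1}   # consistent with the others
assert t.troupe_id == 8                 # old client bindings are now stale
```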