Title: Reliable multicast
1Reliable multicast
- Tolerates process crashes. The additional
requirements are:
- Every correct process receives the multicasts
sent by all correct processes in the group.
- A multicast by a faulty process is received
either by every correct process, or by none at
all.
2A theorem on reliable multicast
- In an asynchronous distributed system, total
order reliable multicast cannot be implemented
when even a single process undergoes a crash
failure.
- Why? Total order reliable multicast can be used
to solve consensus, so implementing it would
violate the FLP impossibility result.
3Scalable Reliable Multicast
- IP multicast or application layer multicast has
to detect the loss of messages and use
retransmission for achieving reliability. For
large groups (like distance learning
applications) scalability is a major problem.
4Scalable Reliable Multicast
- Difficult to scale
- Sender state explosion
- Message implosion
(Figure: the sender keeps state for receiver 1, receiver 2, ..., receiver n.)
5Scalable Reliable Multicast
- If omission failures are rare, receivers report
only the non-receipt of a message, using a NACK,
which triggers a selective point-to-point
retransmission. This reduction of
acknowledgements is the underlying principle of
Scalable Reliable Multicast (SRM).
- If several members of a group fail to receive a
message, each such member waits for a random
period of time before sending its NACK. A member
that overhears a NACK for the same message
suppresses its own, which avoids redundant NACKs.
The sender multicasts the missing copy only once.
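The random-wait rule above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `simulate_nack_suppression`, the uniform timer distribution, and the single fixed `propagation` delay are illustrative choices, not part of SRM as described on the slide.

```python
import random


def simulate_nack_suppression(receivers_missing, max_delay=1.0,
                              propagation=0.05, seed=0):
    """Return the receivers that actually send a NACK for one lost message.

    Assumption: every receiver draws a uniform random timer in
    [0, max_delay], and a NACK sent at time t is heard by all other
    receivers at time t + propagation.
    """
    rng = random.Random(seed)
    timers = {r: rng.uniform(0.0, max_delay) for r in receivers_missing}
    senders = []
    # Visit receivers in the order in which their timers fire.
    for receiver, fire_time in sorted(timers.items(), key=lambda kv: kv[1]):
        # The receiver stays silent if an earlier NACK reached it in time.
        heard = any(timers[s] + propagation <= fire_time for s in senders)
        if not heard:
            senders.append(receiver)
    return senders


if __name__ == "__main__":
    group = list(range(100))
    print(len(simulate_nack_suppression(group)), "NACKs sent for 100 losses")
```

With a propagation delay of zero, exactly one receiver sends a NACK. A larger delay lets a few more NACKs through before suppression takes effect.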
6Dealing with open groups
- Processes may join or leave an open group. Life
will be simpler if everyone has a consistent
view of the current membership
(view = current membership).
- What problems can arise if members do not have
identical views?
7Membership service
- A group membership service looks after the
following:
- Joining and leaving groups
- Updating all members about the latest view of the
group
- Failure detection
8Dealing with open groups
- Views should propagate in the same order to all.
- Example.
- Current view v0(g) = {0, 1, 2, 3}.
- Let 1, 2 leave and 4 join the group concurrently.
- This view change can be serialized in many ways:
- {0,1,2,3}, {0,1,3}, {0,3,4}, OR
- {0,1,2,3}, {0,2,3}, {0,3}, {0,3,4}, OR
- {0,1,2,3}, {0,3}, {0,3,4}
- Send these changes by total order multicast.
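The three serializations listed above can be reproduced with a short sketch. The helper `apply_changes` and the encoding of a view change as a (leaving, joining) pair are assumptions made for illustration only. The sketch shows that every serialization ends in the same final view but passes through different intermediate views, which is why all members must agree on a single order.

```python
def apply_changes(initial_view, changes):
    """Return the sequence of views obtained by applying changes in order.

    Each change is a (leaving, joining) pair of sets applied atomically.
    """
    views = [frozenset(initial_view)]
    for leaving, joining in changes:
        views.append((views[-1] - frozenset(leaving)) | frozenset(joining))
    return views


v0 = {0, 1, 2, 3}
# Three possible serializations of "1 and 2 leave, 4 joins".
order_a = [({2}, set()), ({1}, {4})]
order_b = [({1}, set()), ({2}, set()), (set(), {4})]
order_c = [({1, 2}, set()), (set(), {4})]

if __name__ == "__main__":
    for name, order in [("a", order_a), ("b", order_b), ("c", order_c)]:
        views = apply_changes(v0, order)
        print(name, [sorted(v) for v in views])
```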
9View propagation
- Views: v0(g) = {0,1,2,3}, v1(g) = {0,1,3},
v2(g) = {0,3,4}
- Process 0: v0(g); send m1, ...; v1(g);
send m2, send m3; v2(g)
- Process 1: v0(g); send m4, send m5; v1(g);
send m6; v2(g) ...
10View-synchronous communication
- With respect to each message, all correct
processes have the same view.
- m sent in view V ⇒ m received in view V
11View delivery guidelines
- If a process j joins and thereafter continues its
membership in a group g that already contains a
process i, then eventually j appears in all views
delivered by process i.
- If a process j permanently leaves a group g that
contains a process i, then eventually j is
excluded from all views delivered by process i.
12View-synchronous communication
- Agreement. If a correct process k delivers a
message m in vi(g) before delivering the next
view vi+1(g), then every correct process
j ∈ vi(g) ∩ vi+1(g) must deliver m before
delivering vi+1(g).
- Integrity. If a process j delivers a view vi(g),
then vi(g) must include j.
- Validity. If a process k delivers a message m in
view vi(g) and another process j ∈ vi(g) does not
deliver that message m, then the next view
vi+1(g) delivered by k must exclude j.
13Example
- Let process 1 deliver m and then crash.
- Possibility 1. No surviving process delivers m,
but each delivers the new view {0,2,3}.
- Possibility 2. Processes 0, 2, 3 deliver m and
then deliver the new view {0,2,3}.
- Possibility 3. Processes 2, 3 deliver m and
then deliver the new view {0,2,3}, but process 0
first delivers the view {0,2,3} and then delivers
m.
- Are these acceptable?
(Figure: timelines of processes 0, 1, 2, 3 showing the delivery of m and the view change from {0,1,2,3} to {0,2,3}.)
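One way to examine the three possibilities is to encode each process's delivery log and test the agreement property mechanically. The sketch below is an illustration under assumptions: the log encoding (`("view", members)` and `("msg", name)` events) and the function names are hypothetical, and only agreement among the survivors of the view change is checked.

```python
def delivered_in_view(log, view):
    """Messages a process delivered while `view` was its current view."""
    current, delivered = None, set()
    for kind, value in log:
        if kind == "view":
            current = frozenset(value)
        elif current == frozenset(view):
            delivered.add(value)
    return delivered


def agreement_holds(logs, old_view, new_view):
    """True if all survivors delivered the same messages in old_view."""
    survivors = set(old_view) & set(new_view)
    per_process = [delivered_in_view(logs[p], old_view) for p in survivors]
    return all(d == per_process[0] for d in per_process)


V0, V1 = {0, 1, 2, 3}, {0, 2, 3}
normal = [("view", V0), ("msg", "m"), ("view", V1)]
late = [("view", V0), ("view", V1), ("msg", "m")]

possibility_1 = {p: [("view", V0), ("view", V1)] for p in V1}
possibility_2 = {p: list(normal) for p in V1}
possibility_3 = {0: list(late), 2: list(normal), 3: list(normal)}

if __name__ == "__main__":
    for name, logs in [("1", possibility_1), ("2", possibility_2),
                       ("3", possibility_3)]:
        print("Possibility", name, agreement_holds(logs, V0, V1))
```

Under this check, possibilities 1 and 2 pass and possibility 3 fails, because process 0 delivers m in a different view from processes 2 and 3.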
14Overview of Transis
- Group communication system developed by Danny
Dolev at the Hebrew University of Jerusalem.
- Deals with open groups
- Supports scalable reliable multicast
- Tolerates network partitions
15Overview of Transis
- IP multicast (or Ethernet LAN) is used to support
high-bandwidth multicast.
- Acks are piggybacked, and message loss is detected
transparently, leading to selective
retransmission.
- The sequence of messages P1, P2, p2Q1, Q2, q3R1
received by a member i ∈ {P, Q, R, S} shows that
the recipient did not receive the message Q3:
R1 carries an ack for Q3 (written q3), but Q3
itself never arrived.
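The loss detection in the example above can be sketched as follows. The token format (lowercase `q3` for a piggybacked ack of Q3, uppercase `R1` for the message itself) follows the slide's notation; the function `missing_messages` and its parsing rules are assumptions for illustration, not Transis code. The sketch also assumes messages arrive in the order listed, so an ack for an unseen message always indicates a loss.

```python
import re

# A token is zero or more piggybacked acks (e.g. "q3") followed by
# the message identifier itself (e.g. "R1").
TOKEN = re.compile(r"((?:[a-z]\d+)*)([A-Z]\d+)")


def missing_messages(tokens):
    """Return acked-but-never-received messages, in order of detection."""
    received, missing = set(), []
    for token in tokens:
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized token: {token!r}")
        acks, message = match.groups()
        for ack in re.findall(r"[a-z]\d+", acks):
            acked = ack.upper()  # the ack "q3" refers to message "Q3"
            if acked not in received and acked not in missing:
                missing.append(acked)
        received.add(message)
    return missing


if __name__ == "__main__":
    print(missing_messages(["P1", "P2", "p2Q1", "Q2", "q3R1"]))
```

The recipient can then request a selective retransmission of each message in the returned list.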
16Overview of Transis
- Causal mode (maintains causal order)
- Agreed mode (maintains a total order that does
not conflict with the causal order)
- Safe mode (delivers a message only when the lower
levels of the system have acknowledged its
reception at all the destination machines; all
messages are delivered relative to a safe
message)
17Overview of Transis
Dealing with partition
Each partition assumes that the machines in the
other partition have failed, and
maintains consistency within its own partition
only.
After repair, consistency is restored in the
entire system.
18Replication
- Improves reliability
- Improves availability
- (What good is a reliable system if it is not
available?)
- Replication must be transparent and create the
illusion of a single copy.
19Updating replicated data
(Figure: Alice and Bob accessing three replicas of F.)
Update and consistency are primary issues.
20Passive replication
- At most one replica can be the primary server
- Each client maintains a variable L (leader) that
specifies the replica to which it will send
requests. Requests are queued at the primary
server.
- Backup servers ignore client requests.
(Figure: clients, the primary server, and the backup servers.)
21Primary-backup protocol
- Receive. Receive the request from the client and
update the state if appropriate.
- Broadcast. Broadcast an update of the state to
all other replicas.
- Reply. Send a response to the client.
(Figure: the client sends req to the primary; the primary sends update to the backup and reply to the client.)
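The receive, broadcast, and reply steps can be sketched in a few lines. This is a minimal in-memory illustration under assumptions: the classes `Replica` and `Primary`, the key-value state, and the direct assignment standing in for a network broadcast are all hypothetical, and crashes are not modeled.

```python
class Replica:
    """A replica holding a copy of the state (here a simple dict)."""

    def __init__(self, name):
        self.name = name
        self.state = {}


class Primary(Replica):
    """The single replica that accepts client requests."""

    def __init__(self, name, backups):
        super().__init__(name)
        self.backups = backups

    def handle(self, key, value):
        # Receive: apply the client's request to the local state.
        self.state[key] = value
        # Broadcast: push the update to every backup
        # (a direct assignment stands in for a network message).
        for backup in self.backups:
            backup.state[key] = value
        # Reply: respond to the client only after the backups are updated.
        return ("ok", key, value)


if __name__ == "__main__":
    backups = [Replica("b1"), Replica("b2")]
    primary = Primary("p", backups)
    print(primary.handle("x", 1))
    print([b.state for b in backups])
```

Replying only after the broadcast means that any update the client has seen acknowledged is also held by the backups.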
22Primary-backup protocol
- If the client fails to get a response due to the
crash of the primary, then the request is
retransmitted until a backup is promoted to the
primary.
- Failover time is the duration when there is no
primary server.
(Figure: client, primary, and backup exchanging req, update, and reply.)
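The retransmission behavior can be illustrated with a small sketch in which the client keeps a leader variable and retries until a promoted backup answers. Everything here is an assumption for illustration: `Server`, `promote_backup` (a stand-in for whatever failover mechanism the system uses), and `send_with_retry` are hypothetical names, and failover time is collapsed into a single retry step.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.is_primary = False

    def request(self, payload):
        # Crashed servers are silent; backups ignore client requests.
        if not self.alive or not self.is_primary:
            return None
        return f"{self.name} handled {payload}"


def promote_backup(servers):
    """Stand-in for failover: make the first live server the primary."""
    for index, server in enumerate(servers):
        if server.alive:
            for other in servers:
                other.is_primary = False
            server.is_primary = True
            return index
    raise RuntimeError("no live replica left")


def send_with_retry(servers, leader, payload, max_attempts=3):
    """Retransmit until a primary replies; return (reply, leader, attempts)."""
    for attempt in range(1, max_attempts + 1):
        reply = servers[leader].request(payload)
        if reply is not None:
            return reply, leader, attempt
        # No response: wait for failover, then update the leader variable L.
        leader = promote_backup(servers)
    raise TimeoutError("no reply after retries")


if __name__ == "__main__":
    servers = [Server("s0"), Server("s1")]
    servers[0].is_primary = True
    servers[0].alive = False  # the primary crashes
    print(send_with_retry(servers, 0, "req"))
```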