Title: Reliable, Atomic and Causal Broadcast
1 Chapter 4
- Reliable, Atomic and Causal Broadcast
- Presented By Kiran Simon
2 Broadcasting
Three properties are of interest: reliability, consistent ordering, and causality preservation.
Reliability requires that a broadcast message be received by all operational nodes.
Consistent ordering requires that different messages sent by different nodes be delivered to all the nodes in the same order.
Causality preservation requires that the order in which messages are delivered be consistent with the causality between the send events of these messages.
These three properties give rise to three broadcast primitives: 1. Reliable Broadcast, 2. Atomic Broadcast, and 3. Causal Broadcast.
3 Reliable Broadcast supports reliability only. Atomic Broadcast supports ordering in addition to reliability. Causal Broadcast ensures that the order in which messages are delivered is consistent with the causal ordering of these messages.
Reliable Broadcast using Message Forwarding
The protocol should ensure that all the nodes get the message even if the sender node fails after sending the message to only some of the nodes. A tree (logical, not physical), with the sender as its root, is used to make sure that the message reaches all nodes.
Assumptions: If a node fails, all other nodes find out about it in finite time. Each node has a copy of FAILED (the set of all failed nodes).
4 Overview
The protocol is based on the concept of successors. A node gets the message from its predecessor and forwards it to its successors. On receiving a message, a node i sends an acknowledgment to the sender. If node i does not send an acknowledgment and its status is failed, the sender takes over the responsibility of node i. If the root fails after sending the message to only some nodes, some other node that has the message has to finish the task. All the nodes except the root execute the same protocol.
Each node maintains the sets sendto, ackfrom and ackto. They hold, respectively, the nodes to which the message must be sent, the nodes from which acknowledgments are expected, and the nodes to which an acknowledgment has to be sent.
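The takeover rule above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the logical tree is given as a dictionary (a hypothetical representation), the FAILED set is known up front, the root itself does not fail, and acknowledgments are implied rather than exchanged as real messages.

```python
def reliable_broadcast(tree, root, failed):
    """Return the set of operational nodes that receive the message.

    tree   : dict mapping a node to the list of its successors (hypothetical layout)
    failed : the FAILED set, assumed to be known to every node
    """
    delivered = {root}
    pending = [root]                      # nodes that still have to forward the message
    while pending:
        sender = pending.pop()
        sendto = list(tree.get(sender, []))
        while sendto:
            child = sendto.pop()
            if child in failed:
                # No acknowledgment will arrive and the child is in FAILED:
                # the sender takes over the child's successors.
                sendto.extend(tree.get(child, []))
            else:
                delivered.add(child)      # the child acknowledges and forwards later
                pending.append(child)
    return delivered


tree = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
print(sorted(reliable_broadcast(tree, "r", failed={"a"})))
```

With node a failed, the root sends directly to c and d, so every operational node still receives the message.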
5 An Approach by Piggybacking Acknowledgments
This approach uses the Trans protocol for reliable broadcasting. It uses positive and negative acknowledgments of the messages that are being broadcast by nodes. The basic idea is to piggyback positive and negative acknowledgments on a broadcast message. To support the protocol, each node maintains an ack-list, a nack-list, a received list and a pending retransmission list (PR-list).
Ack-list: identifiers of the messages for which the node has to send an acknowledgment.
Nack-list: identifiers of the messages for which the node has to send a negative acknowledgment.
Received list: messages that this node has received or sent recently and which may have to be retransmitted.
PR-list: identifiers of the messages whose retransmission has been requested by some node.
6 Whenever a node has to send a new message m, it does the following: 1. Append the ack-list to m. 2. Append the nack-list to m. 3. Broadcast m.
If a node does not get a positive acknowledgment of a message for a long time, it adds the message to the PR-list. On receiving a message, a node saves it in the received list and adds its id to the ack-list. If the message id is in the nack-list, it is deleted from there. If it is in the PR-list, it is deleted from there too.
In the examples, each uppercase letter is a broadcast message and the lowercase letters that follow it are the acknowledgments it carries.
Example: A Ba Cb Dc Ecd Cb Fec
In this case only c is a negative acknowledgment.
The next example shows a message transmission in which missed messages are detected transitively.
Example: A Ba Cb Ecd Cb Fbec Ba Gfb
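The list bookkeeping described above can be sketched as follows. This is a minimal illustration, not the Trans protocol itself: the class and field names are hypothetical, the rule that an acknowledgment for an unknown message puts that message on the nack-list is an assumption about how misses are detected, and clearing the ack-list after each send is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    mid: str
    acks: tuple = ()     # piggybacked positive acknowledgments
    nacks: tuple = ()    # piggybacked negative acknowledgments


@dataclass
class TransNode:
    ack_list: set = field(default_factory=set)
    nack_list: set = field(default_factory=set)
    received: dict = field(default_factory=dict)
    pr_list: set = field(default_factory=set)

    def make_message(self, mid):
        """Append the ack-list and nack-list to a new message before broadcasting it."""
        m = Message(mid, tuple(sorted(self.ack_list)), tuple(sorted(self.nack_list)))
        self.received[mid] = m       # own messages are kept for retransmission
        self.ack_list.clear()        # assumption: each ack is piggybacked once
        return m

    def on_receive(self, m):
        self.received[m.mid] = m
        self.ack_list.add(m.mid)
        self.nack_list.discard(m.mid)
        self.pr_list.discard(m.mid)
        for mid in m.acks:
            # Assumption: an ack for a message we never saw reveals a missed message.
            if mid not in self.received:
                self.nack_list.add(mid)
        for mid in m.nacks:
            # Some node asks for a message we hold: schedule its retransmission.
            if mid in self.received:
                self.pr_list.add(mid)


node = TransNode()
node.on_receive(Message("A"))
node.on_receive(Message("B", acks=("A",)))
node.on_receive(Message("D", acks=("C",)))   # C was missed by this node
print(node.nack_list)
```

The node that missed C learns about it from the acknowledgment carried by D, and its next broadcast carries a negative acknowledgment for C.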
7 Atomic Broadcast
Atomic broadcast requires that, in addition to reliability, different messages be delivered at all the nodes in the same order. In reliable broadcast it was assumed that a message is delivered to the higher layers once it is received. In atomic broadcast it has to be ensured that the messages are delivered in the correct order.
Extension of the Trans protocol (to satisfy the ordering property)
Here the positive and negative acknowledgments are appended to the messages themselves. We define the observable predicate for delivery OPD(P,A,C), where P is a node and A and C are messages. We denote the sender of a message A by P_A. If OPD(P,A,C) is true, node P is certain that P_C had received and acknowledged message A at the time it broadcast C. The predicate is true if and only if, by deleting some messages from the sequence of all the messages it has received, P can form a sequence Sm of messages.
8 Example
A sequence of messages transmitted by 4 different processors: B1 D1 A1d1 C1d1b1a1 D2a1c1 D1 C2d2d1 B2a1c2
The positive and negative acknowledgments of the messages can be represented as the graph below.
[Figure: acknowledgment graph over the messages B2, C2, D2, D1, C1, B1, A1]
9 The dashed lines are negative acknowledgments and the solid lines are positive acknowledgments. D2 implicitly acknowledges D1, as both come from the same node. Eventually all nodes will have the same global graph, because a retransmission carries exactly the original message. OPD(P,A,C) holds when there is a path from C to A in the graph formed by the messages received by P and there is no negative acknowledgment edge from C to any node in the path from C to A. In the partial order, if C follows A, then C acknowledges the message A and also all the messages that A acknowledges. The partial order graph for the sequence is as shown.
10 [Figure: partial order graph over the messages B2, C2, D2, D1, C1, B1, A1]
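The graph condition stated for OPD(P,A,C) can be sketched as a restricted graph search. The representation below is an assumption made for illustration: P's view is a dictionary from each message id to the set of ids it positively acknowledges and the set it negatively acknowledges, with implicit acknowledgments (such as D2 for D1) already included by the caller. The sample graph is hypothetical and is not the example from the slides.

```python
def opd(graph, a, c):
    """Check the OPD path condition for messages a and c in P's view.

    graph: dict mapping message id -> (acked ids, nacked ids), as received by P.
    True when a path of positive acknowledgments leads from c to a and
    c has no negative acknowledgment edge to any message on that path.
    """
    _, nacked_by_c = graph[c]
    stack, seen = [c], {c}
    while stack:
        m = stack.pop()
        if m == a:
            return True
        acks, _ = graph.get(m, (set(), set()))
        for nxt in acks:
            # A message that c negatively acknowledges cannot lie on a valid path.
            if nxt not in seen and nxt not in nacked_by_c:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Hypothetical view: D1 acknowledges C1 but negatively acknowledges B1.
graph = {
    "A1": (set(), set()),
    "B1": ({"A1"}, set()),
    "C1": ({"B1"}, set()),
    "D1": ({"C1"}, {"B1"}),
}
print(opd(graph, "A1", "C1"), opd(graph, "A1", "D1"))
```

In this view the only path from D1 to A1 passes through B1, which D1 negatively acknowledges, so the condition fails for D1 while it holds for C1.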
11 A Centralized Method
In this method, consistent ordering of messages is guaranteed by conceptually funneling each message through a centralized message exchange. If multiple nodes broadcast, there is no guarantee that the messages reach their destinations in a specific order, so the messages are conceptually sent through a centralized message exchange. To guard against failure of the exchange, its role is rotated among different nodes. The senders do not actually send the messages through the message exchange. Instead, a sender node transmits the message directly, and on receiving it, nodes save it in a buffer queue. A global sequence number is generated by the token site (which is one of the nodes on the token list) and transmitted to all nodes as the acknowledgment. The token site is rotated among a set of nodes called the token list.
12 The protocol has 2 phases: a normal phase and a reformation phase.
- Normal phase: the activities that take place when no failure occurs.
- Reformation phase: entered when some nodes fail.
Normal Phase
Each node i has the following information.
Mij: the sequence number of the next broadcast message it expects from a node j. A missing message is detected when a message with a sequence number higher than expected arrives.
gseqi: the next global sequence number it expects. A missing acknowledgment is detected in the same way.
13 Three activities take place in the normal phase: transmitting, assigning global sequence numbers, and committing.
Transmitting: The sender node keeps transmitting the message until it gets an ACK from the token site.
Assigning global sequence numbers: The token site acknowledges messages broadcast by nodes. A node can process an ACK with sequence number seq only if seq = gseq and the corresponding message is in the buffer queue Qb. When the ACK is processed, gseq is incremented. If seq < gseq, the ACK is a duplicate, and if seq > gseq, some messages have been missed.
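The three cases for an incoming ACK can be written as a small decision function. This is a sketch only: the function name and the returned labels are hypothetical, and the "wait" outcome for an ACK whose message is not yet in Qb is an assumption, since the slides state only when an ACK can be processed.

```python
def process_ack(seq, gseq, in_qb):
    """Classify an ACK carrying global sequence number seq.

    gseq  : the next global sequence number this node expects
    in_qb : True if the acknowledged message is in the buffer queue Qb
    Returns (action, new_gseq).
    """
    if seq < gseq:
        return "duplicate", gseq
    if seq > gseq:
        return "missed", gseq        # some earlier messages have been missed
    if not in_qb:
        return "wait", gseq          # assumption: cannot process until the message arrives
    return "process", gseq + 1       # gseq is incremented when the ACK is processed


print(process_ack(5, 5, True))
print(process_ack(4, 5, True))
print(process_ack(7, 5, True))
```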
14 Committing: A message is said to be committed when at least L+1 token sites have successfully received the broadcast message and the token has been successfully transferred L times. The committed messages are delivered by nodes in the order of their global sequence numbers.
Reformation Phase
The reformation phase is entered when a failure is detected. The reformation process redefines the token list. Any site that detects the failure initiates the reformation and is called the originator. There will be different token lists at different times, so each token list is given a version number. A new token list always has a higher version number than the older one. There is only one valid token list at any time.
15 The list formed becomes a valid token list only when it satisfies the majority test and the sequence test. The majority test requires that a valid list contain a majority of the nodes. This ensures that there is only one valid list at a time. The sequence test ensures that a site joins only a list with a higher version number than the list it belonged to before. The protocol also ensures that none of the messages committed with the old list are lost. This is done by the resiliency test.
The reformation protocol is a 3 phase protocol.
Phase 1: The originator initiates the formation of a new list.
Phase 2: The new list is formed, consisting of all the nodes that have responded. The majority and resiliency tests are applied to the new list.
Phase 3: The originator generates a new token and passes it to the new token site, which accepts it and starts acknowledging messages, completing the reformation process.
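The majority test and the sequence test can be expressed as a short check. This is a sketch under assumptions: the function name and arguments are hypothetical, each site's previous list version is assumed to be available in a dictionary, and the resiliency test is left out because the slides do not describe how it is evaluated.

```python
def is_valid_token_list(new_list, new_version, all_nodes, last_version):
    """Apply the majority test and the sequence test to a candidate token list.

    new_list     : sites that responded and would form the new list
    new_version  : version number proposed for the new list
    all_nodes    : every node in the system
    last_version : dict mapping a site to the version of the list it last belonged to
    """
    # Majority test: the list must contain more than half of all nodes.
    majority = len(set(new_list)) > len(all_nodes) / 2
    # Sequence test: every member must be moving to a strictly higher version.
    sequence = all(new_version > last_version[site] for site in new_list)
    return majority and sequence


nodes = ["a", "b", "c", "d", "e"]
last = {n: 3 for n in nodes}
print(is_valid_token_list(["a", "b", "c"], 4, nodes, last))
```

Because two disjoint lists cannot both contain a majority of the nodes, the majority test rules out two valid lists at the same time.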
16 The Three Phase Protocol
The protocol assigns a priority to every message, and the message with the lowest priority is delivered first. It must make sure that no message with a lower priority reaches a node later, after a message with a higher priority has been delivered. For this, the nodes explicitly agree on the priority of each message. When broadcasting, the node assigns a priority to the message. Each message also has a tag, deliverable or undeliverable.
Working: The sender broadcasts the message. Each receiver puts the message in its queue, tags it as undeliverable, and assigns it a priority that is one greater than the highest priority of all the messages in the queue.
All the nodes send their priorities to the sender.
17 The sender sets the highest of these priorities as the global priority and sends it to all the receiving nodes. Each receiver changes the priority of the message to the new priority and tags the message deliverable. The queue is sorted, and the messages with the lowest priority are delivered until a message tagged undeliverable is encountered.
Failures: The failure of a node other than the sender does not cause any problem. If the sender node fails before the message is delivered, a node that holds the message tagged undeliverable acts as the sender and the coordinator. A separate garbage collection scheme is also required.
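The propose and agree steps above can be sketched with an in-memory simulation. Several details are assumptions made so that the sketch is well defined: each node also remembers the largest priority it has seen so that priorities are never reused after delivery, ties between equal priorities are broken by message id, and failures are not modeled. The class and method names are hypothetical.

```python
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.queue = {}          # message id -> [priority, deliverable]
        self.max_priority = 0    # assumption: largest priority seen so far
        self.delivered = []

    def propose(self, msg_id):
        """Queue the message as undeliverable and return a proposed priority."""
        self.max_priority += 1
        self.queue[msg_id] = [self.max_priority, False]
        return self.max_priority

    def finalize(self, msg_id, priority):
        """Adopt the global priority, tag deliverable, then deliver from the head."""
        self.queue[msg_id] = [priority, True]
        self.max_priority = max(self.max_priority, priority)
        ordered = sorted(self.queue.items(), key=lambda kv: (kv[1][0], kv[0]))
        for mid, (_, deliverable) in ordered:
            if not deliverable:
                break            # an undeliverable message blocks everything behind it
            self.delivered.append(mid)
            del self.queue[mid]


x, y = Node("x"), Node("y")
# The two nodes see the initial broadcasts in different orders.
p1 = [x.propose("m1")]
p2 = [y.propose("m2")]
p1.append(y.propose("m1"))
p2.append(x.propose("m2"))
for node in (x, y):
    node.finalize("m2", max(p2))   # nothing is delivered yet: m1 is still undeliverable
for node in (x, y):
    node.finalize("m1", max(p1))
print(x.delivered, y.delivered)
```

Although x and y received the two messages in opposite orders, both end up delivering them in the same order once the global priorities are known.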
18 Using Synchronized Clocks
This approach uses clocks to implement the ordering. Only fail stop failures are taken into account. Other node failures and link failures are not handled (they are assumed not to happen). It is also assumed that the network delay is bounded: a message m takes at most delta to travel from node a to node b. The worst case message delay is D (which depends on delta). The clocks of 2 nodes differ by at most beta. So a message sent at time t reaches any node by the time that node's clock reads at most t + D + beta. With X = D + beta, the time at the receiving node is at most t + X.
Working: The sender puts a timestamp and its node id on the message and sends it to its neighbours. If an intermediate node gets the message, it forwards it on all outgoing links. The schedule ends by time t + X.
19 The messages are also kept in the history of the node. In the forwarding part, if the clock time of the intermediate node is greater than t + X, the schedule is ended. At the receiver node, if the message it receives is already part of its history, the message is discarded. The messages are delivered in the order of their timestamps.
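The receive, discard and deliver rules can be sketched as follows. This is an illustration under assumptions: X stands for D + beta, messages are identified by the pair (timestamp, sender id), and a message is handed to the application once the local clock reaches t + X, which is one way to make sure that no message with a smaller timestamp can still arrive. The class and method names are hypothetical.

```python
import heapq


class ClockNode:
    def __init__(self, x):
        self.x = x               # X = D + beta
        self.history = set()     # (timestamp, sender) pairs already seen
        self.pending = []        # heap ordered by (timestamp, sender)

    def on_receive(self, timestamp, sender, payload, now):
        """Return True if the message is accepted and should be forwarded."""
        key = (timestamp, sender)
        if key in self.history or now > timestamp + self.x:
            return False         # already in the history, or past t + X: discard
        self.history.add(key)
        heapq.heappush(self.pending, (timestamp, sender, payload))
        return True

    def deliver_due(self, now):
        """Deliver, in timestamp order, the messages whose t + X has been reached."""
        out = []
        while self.pending and self.pending[0][0] + self.x <= now:
            out.append(heapq.heappop(self.pending)[2])
        return out


node = ClockNode(x=5)
node.on_receive(10, "a", "m1", now=12)
node.on_receive(8, "b", "m0", now=12)
print(node.deliver_due(13), node.deliver_due(15))
```

The message with timestamp 8 is delivered before the one with timestamp 10, even though it arrived later.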
20 A Protocol for CSMA/CD Networks
This protocol is for Ethernet and similar networks. The network interface (NI) is responsible for all the MAC layer protocol activities of the CSMA/CD network. During a broadcast, some nodes may miss some of the messages, and this protocol is used to support reliable broadcast in that situation.
Assumptions: The number of nodes that may miss a broadcast message is less than the total number of nodes. The NI can cause a collision at any time (even while receiving a message) by sending a jamming signal.
Working: Each node has a counter. Every message carries a sequence number, which is the current value of the counter. If no collision occurs, the counter is incremented.
21 The whole protocol depends on proper use of the sequence number. When a node's counter value is the same as the sequence number, there are no missed messages. If the counter value is less than the sequence number, some messages have been missed. The alive nodes are thus partitioned into 2 groups: nodes with the same counter value, and nodes with counter values less than the global sequence number. A node with an incorrect counter value should stop the transmission so that the missed messages can be retransmitted. Its NI does this by sending a jamming signal so that a collision occurs. (This happens while the message is being received, not after the message has been received.) The node then requests a retransmission of all the messages in the range between its counter value and the global sequence number. While the retransmission takes place, the counters are not incremented.
22 Causal Broadcast
Causal broadcast is required when causal ordering of the messages is required (i.e. the delivery of a message depends on the causality of its send event). It is needed for operations in distributed databases, etc. There are 2 forms of the requirement: the weaker one states that causality should be preserved, and the stronger one states that the messages should be delivered in the same order at all the nodes and that causality should also be preserved.
23 Causal Broadcast without Total Ordering
Here there is no guarantee that the messages will be delivered in the same order at all nodes, but causality will be preserved. To achieve this, care should be taken that the messages in the delivery queue of each node are in an order that preserves causality.
Working: When a node performs a causal broadcast, the message is added to its buffer. If the node itself is one of the destinations, the message is also added to its delivery queue. When a message m in the buffer is transmitted to another node, the messages that precede m are sent along with it, in the form of a transfer packet.
24 When the destination node receives the packet, it processes the messages in the order in which they appear in the transfer packet. If the node is one of the destination nodes of a message, the message is put into the delivery queue. Otherwise it is kept in the buffer.
Assumption: Only network failures that partition the network can make the protocol fail, so the network is assumed to be free of such failures.
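The transfer packet mechanism can be sketched as follows. This is a simplified illustration with hypothetical names: every message a node knows about stays in its buffer so that it can be forwarded later, the buffer order is taken as the causal order (everything buffered before m is treated as preceding m), and a message already seen is skipped when it arrives again.

```python
class CausalNode:
    def __init__(self, name):
        self.name = name
        self.buffer = []           # messages known to this node, in causal order
        self.seen = set()
        self.delivery_queue = []

    def _accept(self, msg):
        mid, dests = msg
        if mid in self.seen:
            return                 # already processed through an earlier packet
        self.seen.add(mid)
        self.buffer.append(msg)
        if self.name in dests:
            self.delivery_queue.append(mid)

    def broadcast(self, mid, dests):
        self._accept((mid, frozenset(dests)))

    def transfer_packet(self, mid):
        """The messages that precede mid in the buffer, followed by mid itself."""
        index = next(i for i, m in enumerate(self.buffer) if m[0] == mid)
        return list(self.buffer[: index + 1])

    def on_packet(self, packet):
        for msg in packet:         # process in the order given by the packet
            self._accept(msg)


a, b, c = CausalNode("a"), CausalNode("b"), CausalNode("c")
a.broadcast("m1", {"b", "c"})
b.on_packet(a.transfer_packet("m1"))   # m1 reaches b first
b.broadcast("m2", {"c"})               # m2 is sent after b has seen m1
c.on_packet(b.transfer_packet("m2"))   # the packet carries m1 ahead of m2
c.on_packet(a.transfer_packet("m1"))   # late copy of m1 is ignored
print(c.delivery_queue)
```

Node c queues m1 before m2 even though the packet from a arrives last, because the packet from b carries m1 along with m2.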
25 Causal Broadcast with Total Ordering
The protocol broadcasts the messages to all the nodes in the same order and also preserves the causality of the messages. The nodes in the system are divided into 1 primary node, n backups, and the rest, which are simple nodes. Counters and sequence numbers are used to disseminate the dependencies between the nodes. The nodes send their messages to the primary (PS) to be broadcast. Each node has a counter, which is used to assign sequence numbers. Each node also has an array seq, which holds the sequence number of the last message sent by each node.
26 Each node also has a variable las-msg, which is used to identify duplicate messages. The PS has an array expected, which stores the sequence number of the next message it expects from each node. When a node sends a message to the PS, it also sends its array seq (giving common sequence numbers for the messages of all the nodes). The PS also assigns the message a sequence number called gseq, which is used to order the messages globally.
Working: A node sends the message to the PS together with seq. The PS broadcasts it after assigning gseq. If gseq is less than or equal to that of the last message received by a particular node, the message is a duplicate at that node.
27 The PS receives a message from a node and checks its sequence number against expected. If they are the same, the message is accepted.
Backup node: When the PS fails, a backup node becomes the primary. It sets its Ctr value to the gseq value of the last message it received. Then it requests all the nodes to resend messages with a sequence number greater than or equal to expected[j], for each node j.
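The sequencing done by the PS and the duplicate check at the receivers can be sketched as follows. This covers only the ordering part: the seq array that carries the causal dependencies and the backup takeover are left out for brevity. The class names are hypothetical, sequence numbers are assumed to start at 1, and rejecting a message whose sequence number does not match expected is an assumption, since the slides state only the matching case.

```python
class Primary:
    def __init__(self, node_ids):
        self.expected = {j: 1 for j in node_ids}   # next sequence number expected from j
        self.ctr = 0                                # last gseq assigned

    def on_message(self, sender, seq, payload):
        """Return (gseq, sender, seq, payload) to broadcast, or None if not accepted."""
        if seq != self.expected[sender]:
            return None                             # assumption: unexpected numbers are rejected
        self.expected[sender] += 1
        self.ctr += 1
        return (self.ctr, sender, seq, payload)


class Receiver:
    def __init__(self):
        self.last_msg = 0        # gseq of the last message received
        self.delivered = []

    def on_broadcast(self, gseq, sender, seq, payload):
        if gseq <= self.last_msg:
            return False         # duplicate
        self.last_msg = gseq
        self.delivered.append(payload)
        return True


ps = Primary(["a", "b"])
r = Receiver()
first = ps.on_message("a", 1, "x")
r.on_broadcast(*first)
r.on_broadcast(*first)           # a repeated broadcast is recognized as a duplicate
r.on_broadcast(*ps.on_message("b", 1, "y"))
print(r.delivered)
```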