Lightweight Causal and Atomic Group Multicast

Slides: 31
Provided by: ccGa

1
  • Lightweight Causal and Atomic Group Multicast
  • Kenneth Birman
  • André Schiper
  • Pat Stephenson

2
Overview
  • Forms of distributed applications
  • Basic system model for further discussion
  • Issues to be handled based on the model
  • Protocols as the basic framework
  • Protocol extensions
  • Protocol optimizations
  • The thank you slide

3
ISIS Toolkit
  • Tools for building software in distributed
    environments
  • Toolkit based on key concepts of groups of
    virtually synchronous processes and reliable
    multicast
  • CBCAST is the dominant protocol
  • Tools use CBCAST as a generic communication
    framework

4
Structure of Applications
  • Distributed - processes co-operate to do the job
  • Processes clubbed into groups
  • Groups of processes logically overlap in
    various ways forming patterns of communication
  • Group memberships can change
  • Practical observation: most ISIS applications
    have relatively infrequent membership changes

5
Peer Groups

6
Client/Server Groups
  • [Figure: groups G1, G2, G3]

7
Diffusion Groups
  • [Figure: groups G1, G2]

8
Hierarchical Groups
  • [Figure: groups G1, G2, G3]

9
Protocol Extension
  • Basic causal message delivery protocol works for
    single group of processes
  • Multi-group handling added to classic causality
    protocol
  • Asynchronous communication is a key factor for
    high performance in distributed systems
  • CBCAST provides causal order; ABCAST provides total order

10
Basic System Model
  • $P = \{p_1, p_2, \ldots, p_n\}$: processes with disjoint memory
  • $G = \{g_1, g_2, \ldots, g_n\}$: process groups
  • Members need not be identical
  • No theoretical limit on membership size
  • Processes only multicast to groups that they are
    members of
  • Views are defined for group maintenance
  • Put simply, a group defines an area of interest, and
    a view is an instance of that group's membership at a
    particular moment of observation

11
Basic System Model
  • A view of a group is the list of its members
  • A view sequence for g is
    $view_0(g), view_1(g), \ldots, view_n(g)$, where
    1. $view_0(g) = \emptyset$
    2. $\forall i: view_i(g) \subseteq P$
    3. $view_i(g)$ and $view_{i+1}(g)$ differ by exactly one member
  • In short, a view sequence is a series of snapshots
    detailing changes in the membership of g
  • Members learn about the failure of other members
    through the view mechanism
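
The three conditions on a view sequence can be checked mechanically. The following is a minimal sketch, assuming views are represented as Python sets of process names; the function name `is_valid_view_sequence` is hypothetical and not part of ISIS.

```python
def is_valid_view_sequence(views, processes):
    """Check the three view-sequence conditions from the slide.

    views     -- list of sets, views[i] standing for view_i(g)
    processes -- the set P of all processes
    """
    # Condition 1: the initial view is empty.
    if not views or len(views[0]) != 0:
        return False
    # Condition 2: every view is a subset of P.
    if any(not view <= processes for view in views):
        return False
    # Condition 3: consecutive views differ by exactly one member
    # (one process added or one process removed).
    for current, following in zip(views, views[1:]):
        if len(current ^ following) != 1:
            return False
    return True


P = {"p1", "p2", "p3"}
history = [set(), {"p1"}, {"p1", "p2"}, {"p2"}]
print(is_valid_view_sequence(history, P))
```

The symmetric difference `current ^ following` has exactly one element precisely when one process joined or one process left between the two views.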

12
Basic System Model
  • The transport primitives must provide lossless
    uncorrupted message delivery
  • The transport layer discards messages if the process at
    either end fails
  • A process hang (a transient problem) is not
    distinguished from a permanent failure
  • ISIS uses a transport layer built over unreliable datagrams

13
Basic System Model
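  • The execution of a process is a sequence of events
  • $\rightarrow_p$ denotes the dependence of events in p on one
    another
  • Events: $send_p(m)$, $rcv_p(m)$ and $deliver_p(m)$
  • $dests(m)$: the set of destinations of m
  • $rcv_p(m) \rightarrow_p deliver_p(m)$
  • Dependence across processes is based on Lamport's
    happened-before relation $\rightarrow$
  • In general, $\forall m: send(m) \rightarrow rcv(m)$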

14
Virtual Synchrony and Delivery Atomicity
  • A synchronous system cannot exploit concurrency
  • In a virtually synchronous system, users can
    program as if one distributed event happens at a
    time; this abstraction is created chiefly by
    message sequencing
  • Address expansion
  • Delivery atomicity: deliver to all or to none

15
Types of delivery ordering
  • Causal ordering:
    $send_a(m) \rightarrow send_b(m') \Rightarrow \forall p \in dests(m) \cap dests(m'): deliver_p(m) \rightarrow_p deliver_p(m')$
    That is, if the send of m by process a happens before
    the send of m' by process b, then m is delivered before
    m' at every process that is a destination of both.
    Note that rcv(m) may happen after rcv(m').
    This is the CBCAST protocol
  • Total ordering:
    $\forall m, m', p \in g: deliver_p(m, g) \rightarrow deliver_p(m', g) \Rightarrow \forall q \in g: deliver_q(m, g) \rightarrow deliver_q(m', g)$
    That is, arbitrary messages m and m' sent to
    group g will be delivered in the same order at
    processes p and q that are members of g.
    This is the ABCAST protocol
  • For now, we assume group membership is fixed

16
CBCAST
  • Assume processes in P communicate using
    broadcasts to others
  • Vector timestamps are used. A message m sent by $p_i$ is
    stamped with $VT(m)$. $VT(m)[k]$ indicates the number of
    messages sent by $p_k$ that precede m
  • Protocol:
    1. $p_i$ increments $VT(p_i)[i]$ and timestamps m
       before sending m
    2. On reception of m sent by $p_i$, $p_j \neq p_i$ delays
       delivery until, for all $k \in 1 \ldots n$:
       $VT(m)[k] = VT(p_j)[k] + 1$ when $k = i$
       $VT(m)[k] \leq VT(p_j)[k]$ otherwise
    3. On delivery, for all $k \in 1 \ldots n$:
       $VT(p_j)[k] = \max(VT(p_j)[k], VT(m)[k])$
  • That is, deliver only if my count of the sender's
    messages is exactly one less than what the timestamp
    is telling me AND my message count for every other
    sender is at least as up to date as what the
    timestamp is telling me.
  • Again, note the difference between reception and
    delivery
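
The delivery rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming a fixed group of n processes indexed from 0 and an in-memory hand-off of messages instead of a real transport; the class `CbcastProcess` and its method names are hypothetical and not taken from ISIS.

```python
class CbcastProcess:
    """One member of a fixed group of n processes (illustrative only)."""

    def __init__(self, pid, n):
        self.pid = pid            # index of this process, 0-based
        self.vt = [0] * n         # local vector time VT(p_j)
        self.delay_queue = []     # received but not yet deliverable
        self.delivered = []       # payloads in delivery order

    def send(self, payload):
        # Step 1: increment own entry, then timestamp the message.
        self.vt[self.pid] += 1
        return (self.pid, list(self.vt), payload)

    def _deliverable(self, sender, vt_m):
        # Step 2: next message from the sender, and nothing the sender
        # had seen from other processes is still missing here.
        for k in range(len(self.vt)):
            if k == sender:
                if vt_m[k] != self.vt[k] + 1:
                    return False
            elif vt_m[k] > self.vt[k]:
                return False
        return True

    def receive(self, message):
        # Reception only queues the message; delivery may come later.
        self.delay_queue.append(message)
        progress = True
        while progress:
            progress = False
            for queued in list(self.delay_queue):
                sender, vt_m, payload = queued
                if self._deliverable(sender, vt_m):
                    self.delay_queue.remove(queued)
                    # Step 3: merge the timestamp into the local clock.
                    self.vt = [max(a, b) for a, b in zip(self.vt, vt_m)]
                    self.delivered.append(payload)
                    progress = True


p0, p1, p2 = (CbcastProcess(i, 3) for i in range(3))
m1 = p0.send("m1")
p1.receive(m1)
m2 = p1.send("m2")      # m2 causally follows m1
p2.receive(m2)          # arrives first, so it is delayed
p2.receive(m1)          # now both can be delivered in causal order
print(p2.delivered)
```

In this run p2 receives m2 before m1 but delivers m1 first, which is the distinction between reception and delivery that the slide stresses.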

17
CBCAST Example
18
ABCAST
  • A token holder process, $token(g) \in view_i(g)$.
    token(g) is the traffic cop in $view_i(g)$
  • For senders other than the token holder:
    1. The sender CBCASTs m and marks it as undeliverable.
       Processes other than the token holder that receive m
       push it on a delay queue
    2. The token holder delivers messages as they arrive and
       makes a note of the order
    3. The token holder periodically sends these order
       updates to the others in a sets-order control message
    4. The others simply follow that order
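
The token-holder scheme can be sketched as follows. This is a simplified illustration that leaves out the underlying CBCAST layer, failures, and token movement; the class `AbcastMember`, its methods, and the use of message ids are assumptions made for the example.

```python
class AbcastMember:
    """Group member in a token-based total-order sketch (illustrative)."""

    def __init__(self, is_token_holder=False):
        self.is_token_holder = is_token_holder
        self.pending = {}       # msg_id -> payload, marked undeliverable
        self.known_order = []   # ordered ids announced but not yet delivered
        self.order_log = []     # token holder: order not yet announced
        self.delivered = []

    def on_message(self, msg_id, payload):
        if self.is_token_holder:
            # Step 2: deliver on arrival and note the order.
            self.delivered.append(payload)
            self.order_log.append(msg_id)
        else:
            # Step 1: hold the message on the delay queue.
            self.pending[msg_id] = payload
            self._drain()

    def make_sets_order(self):
        # Step 3: batch the recorded order into one control message.
        batch, self.order_log = self.order_log, []
        return batch

    def on_sets_order(self, ordered_ids):
        # Step 4: follow the token holder's order.
        self.known_order.extend(ordered_ids)
        self._drain()

    def _drain(self):
        while self.known_order and self.known_order[0] in self.pending:
            msg_id = self.known_order.pop(0)
            self.delivered.append(self.pending.pop(msg_id))


holder = AbcastMember(is_token_holder=True)
other = AbcastMember()
holder.on_message("a", "A")
holder.on_message("b", "B")
other.on_message("b", "B")      # arrives in a different order
other.on_message("a", "A")
other.on_sets_order(holder.make_sets_order())
print(holder.delivered, other.delivered)
```

Both members end up with the same delivery sequence even though the messages reached them in different orders.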

19
ABCAST Cost
  • Depends on the origin of multicasts and the frequency
    of token movement
  • If the origin is repeatedly the same process, make it
    the token holder; this is useful in diffusion groups
  • In the case of random origin, assuming the token remains
    fixed for the period of observation, we can group
    k ABCASTs in one sets-order message. Hence,
    1 + (1/k) CBCASTs per ABCAST, with an increasing
    penalty of delivery delay as k increases.
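
The per-ABCAST figure follows from a simple count, assuming the token stays fixed and a single sets-order message covers a batch of k ABCASTs: the batch costs k CBCASTs for the data messages plus one CBCAST for the sets-order message.

```latex
\[
\frac{\text{CBCASTs per batch}}{\text{ABCASTs per batch}}
  = \frac{k + 1}{k}
  = 1 + \frac{1}{k}
\]
```

For example, k = 1 gives 2 CBCASTs per ABCAST, while k = 10 gives 1.1, at the price of holding messages until the batch is announced.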

20
Virtually Synchronous Addressing
  • We now assume that the group membership can
    change at run-time.
  • Problem: software layers above the CBCAST/ABCAST
    protocol use a group identifier to multicast to a
    particular group. The protocol should ensure that a
    message gets sent to all and only the intended members.
  • Solution: flush messages. Suppose $view_i(g)$
    changes to $view_{i+1}(g)$ and $p_k$ knows about it. $p_k$,
    followed by everyone else, sends a flush. All
    messages get drained out, and we are done.
  • Next problem: $n^2$ messages!
  • Next solution: a flush coordinator, just like our
    token holder from ABCAST. Now 2n messages are
    required.
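
The two message counts can be compared with a small calculation. This is a rough sketch under stated assumptions: in the all-to-all variant every member sends a flush to every other member, and in the coordinator variant every other member sends one flush to the coordinator, which then sends one message back to each. The function names are hypothetical, and the exact counts differ from the slide's n² and 2n only by lower-order terms.

```python
def flush_messages_all_to_all(n):
    """Each of n members sends a flush to each of the other n - 1."""
    return n * (n - 1)


def flush_messages_with_coordinator(n):
    """n - 1 flushes go to the coordinator, n - 1 messages come back."""
    return 2 * (n - 1)


for n in (4, 16, 64):
    print(n, flush_messages_all_to_all(n), flush_messages_with_coordinator(n))
```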

21
In Case of Failures
  • Problems:
    1. Disruption of multicast transmission: no delivery
       atomicity
    2. We cannot assume that processes will respect the
       flush protocol: no virtual synchrony
  • Scenario:
    1. The flush protocol for $view_i(g)$ completes;
       $view_i(g)$ is installed
    2. $p_f$ fails. The flush protocol for $view_{i+1}(g)$ starts
    3. $p_s$ fails. The coordinator of the flush for
       $view_{i+1}(g)$ waits for $p_s$ to flush
  • Solution: defer the installation of $view_{i+1}(g)$.
    Tailgate it on the flush protocol of some $view_{i+k}(g)$,
    $k \geq 1$. Install $view_{i+1}(g)$ only if flush messages
    for $view_{i+k}(g)$ are received from all processes in
    $view_{i+1}(g) \cap view_{i+k}(g)$
  • In short, check whether every process that belongs to
    both the earlier and the later view has replied to the
    flush for the later view. If they all do, we can safely
    install the earlier view. If they don't, some newer view
    will override it.

22
CBCAST Multiple Groups
  • $p_i$ belongs to groups $g_a$ and $g_b$. Multicasts sent
    to $g_a$ should be distinguished from those sent to
    $g_b$
  • $p_j$ belongs to $g_b$ only and receives a message m
    from $p_i$ timestamped $VT(m)$, where $VT(m)[i] = k$.
  • $p_j$ was interested in only some of these k
    messages. The rest were for $g_a$.
  • Solution: keep multiple VT clocks $VT_a, VT_b, \ldots$ for
    groups $g_a, g_b, \ldots$

23
CBCAST Multiple Groups(Protocol change)
  • Protocol (from an earlier slide):
    1. $p_i$ increments $VT(p_i)[i]$ and timestamps m before
       sending m
    2. On reception of m sent by $p_i$, $p_j \neq p_i$ delays
       delivery until, for all $k \in 1 \ldots n$:
       $VT(m)[k] = VT(p_j)[k] + 1$ when $k = i$
       $VT(m)[k] \leq VT(p_j)[k]$ otherwise
    3. On delivery, for all $k \in 1 \ldots n$:
       $VT(p_j)[k] = \max(VT(p_j)[k], VT(m)[k])$
  • Modification to step 2 of the above protocol for
    multiple groups:
    2. On reception of m sent by $p_i$ in $g_a$, $p_j \neq p_i$
       delays delivery until
       2.1 $VT_a(m)[i] = VT_a(p_j)[i] + 1$
       2.2 $\forall k: (p_k \in g_a \wedge k \neq i): VT_a(m)[k] \leq VT_a(p_j)[k]$
       2.3 $\forall g: (g \in G_j): VT_g(m) \leq VT_g(p_j)$
  • 2.3 says: deliver this message only if I have received
    the messages from other groups that $p_i$ had received
    up to just before sending me this message. The
    scope of other groups is limited to only those
    that I belong to
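
The three conditions of the modified step 2 can be sketched as a single predicate. This is an illustration under assumptions: timestamps are represented as nested dictionaries (group name to process name to count, with missing entries read as 0), condition 2.3 is applied to the receiver's groups other than the one the message was sent in, as the explanation above suggests, and the function `deliverable` and its parameters are hypothetical.

```python
def deliverable(sender, group, vt_msg, vt_local, members, my_groups):
    """Return True if a message sent by `sender` in `group` may be delivered.

    vt_msg    -- timestamps carried by the message: {group: {process: count}}
    vt_local  -- the receiver's clocks, in the same shape
    members   -- {group: set of processes}
    my_groups -- the groups the receiver belongs to (G_j)
    """
    vt_a_msg = vt_msg.get(group, {})
    vt_a_local = vt_local.get(group, {})

    # 2.1: this is the next message from the sender within the group.
    if vt_a_msg.get(sender, 0) != vt_a_local.get(sender, 0) + 1:
        return False

    # 2.2: nothing from other members of the group is missing.
    for pk in members[group]:
        if pk != sender and vt_a_msg.get(pk, 0) > vt_a_local.get(pk, 0):
            return False

    # 2.3: nothing is missing in the other groups the receiver belongs to.
    for g in my_groups:
        if g == group:
            continue  # assumption: the sending group is covered by 2.1 and 2.2
        local_g = vt_local.get(g, {})
        for pk, count in vt_msg.get(g, {}).items():
            if count > local_g.get(pk, 0):
                return False
    return True


members = {"ga": {"pi", "pj"}, "gb": {"pi", "pj", "px"}}
msg_vt = {"ga": {"pi": 1}, "gb": {"px": 1}}   # pi had seen one gb message from px
print(deliverable("pi", "ga", msg_vt, {"ga": {}, "gb": {}}, members, {"ga", "gb"}))
print(deliverable("pi", "ga", msg_vt, {"ga": {}, "gb": {"px": 1}}, members, {"ga", "gb"}))
```

The first call is blocked by condition 2.3 because the receiver has not yet seen the message from px in the other group; the second call passes once that message has been accounted for.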

24
CBCAST Multiple Groups
25
VT Compression
  • Within a single group the benefit is questionable, but
    with multiple groups there is considerable potential
    to reduce the amount of piggybacked data
  • The flush protocol resets vector counters to 0.
    This is good news:
    1. Better scope for sparse representation
    2. Can deal with a vector element rolling over to 0
  • External group updates need not be sent on every
    multicast within a group
  • Compression depends on communication locality

26
Communication Patterns
  • A substantial reduction in piggybacked timestamps is
    possible with certain patterns
  • Define the communication structure as an undirected
    graph $CG = (G, E)$: the nodes are the groups, and
    $(g_1, g_2)$ is an edge if there is at least one process
    common to $g_1$ and $g_2$
  • CG is k-bounded if no biconnected component has
    more than k nodes
  • If a group g is in a biconnected component of
    size k, processes within g need only maintain and
    transmit timestamps for other groups in this
    biconnected component
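
The graph construction and the k bound can be sketched as follows. This is an illustrative implementation, assuming group membership is given as a dictionary from group name to a set of process names; it uses the standard depth-first-search algorithm for biconnected components, and the function names are hypothetical.

```python
from itertools import combinations


def communication_graph(membership):
    """Build CG: one node per group, an edge when two groups share a process."""
    adjacency = {group: set() for group in membership}
    for g1, g2 in combinations(membership, 2):
        if membership[g1] & membership[g2]:
            adjacency[g1].add(g2)
            adjacency[g2].add(g1)
    return adjacency


def biconnected_components(adjacency):
    """Return the biconnected components of an undirected graph as node sets."""
    index, low = {}, {}
    edge_stack, components = [], []
    counter = [0]

    def dfs(u, parent):
        index[u] = low[u] = counter[0]
        counter[0] += 1
        for v in adjacency[u]:
            if v not in index:
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:
                    # u separates the subtree of v: pop one component.
                    component = set()
                    while True:
                        a, b = edge_stack.pop()
                        component.update((a, b))
                        if (a, b) == (u, v):
                            break
                    components.append(component)
            elif v != parent and index[v] < index[u]:
                edge_stack.append((u, v))   # back edge to an ancestor
                low[u] = min(low[u], index[v])

    for node in adjacency:
        if node not in index:
            dfs(node, None)
    return components


def k_bound(membership):
    """Size of the largest biconnected component (1 if there are no edges)."""
    components = biconnected_components(communication_graph(membership))
    return max((len(c) for c in components), default=1)


chain = {"g1": {"p1", "p2"}, "g2": {"p2", "p3"}, "g3": {"p3", "p4"}}
ring = {"g1": {"p1", "p2"}, "g2": {"p2", "p3"}, "g3": {"p3", "p1"}}
print(k_bound(chain), k_bound(ring))
```

In the chain, each pair of neighbouring groups forms its own component, so processes only need timestamps for adjacent groups; in the ring, all three groups fall into one component.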

27
Communication Patterns (Dynamic Structure)
  • Conservative solution:
    1. p can multicast to g if g is the only active group
       for p or there are no active groups
    2. Otherwise, message delivery and sends are
       artificially delayed to make the group inactive
    3. p can then send

28
Conclusion
  • The paper presents an implementation of a multicast
    primitive for constructing distributed systems
  • Efficiency considerations related to group
    multicasts are presented
  • Process groups and group communication can
    achieve performance and scaling comparable to that
    of a raw message transport layer

29
  • Thank You!

30
Communication Patterns (Multicast Epochs)
  • Excluded group
  • p is not safe in g if
    1. the last message p received came from some other
       group g', and
    2. g or g' is excluded
  • p maintains a local variable $epoch_p$ and
    increments it if p is not safe in g. The epoch variable
    is piggybacked on each message
  • On reception, a flush is initiated if
    $epoch_p < epoch(m)$
  • All members update their epoch variables to the
    maximum of those in circulation
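
The epoch bookkeeping described above can be sketched as a small class. This is a minimal illustration, assuming the caller already knows whether the process is safe in the destination group; the class `EpochTracker` and its methods are hypothetical, and the flush itself is not modelled.

```python
class EpochTracker:
    """Per-process epoch variable for the multicast-epoch scheme (sketch)."""

    def __init__(self):
        self.epoch = 0

    def before_send(self, safe_in_group):
        # Increment the epoch when the process is not safe in the group,
        # then return the value to piggyback on the outgoing message.
        if not safe_in_group:
            self.epoch += 1
        return self.epoch

    def on_receive(self, message_epoch):
        # A larger epoch on the message means a flush must be initiated.
        needs_flush = self.epoch < message_epoch
        # Adopt the maximum epoch seen so far.
        self.epoch = max(self.epoch, message_epoch)
        return needs_flush


sender, receiver = EpochTracker(), EpochTracker()
stamp = sender.before_send(safe_in_group=False)
print(stamp, receiver.on_receive(stamp), receiver.epoch)
```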