Title: Distributed Systems 2006
1Distributed Systems 2006
- Group Membership
- With material adapted from Ken Birman
2Plan
- (We skip Sections 15.2 and 15.3)
Robust Web Services: We'll build them with these tools
Tools for solving practical replication and availability problems: We'll base them on ordered multicast
Ordered multicast: We'll base it on fault-tolerant multicast
Fault-tolerant multicast: We'll use membership
Tracking group membership: We'll base it on 2PC and 3PC
2PC and 3PC: Our first tools (lowest layer)
3Basic Operation
4Role of Group Membership Service
- We'll add a new system service to our distributed system, like the Internet DNS but with a new role
- Its job is to track membership of groups
- To join a group a process will ask the GMS
- The GMS will also monitor members and can use this to drop them from a group
- And it will report membership changes
5Group picture with GMS
[Figure: processes p, q, r, s, t, u and the GMS, annotated as follows]
- P requests: "I wish to join or create group X."
- GMS responds: "Group X created with you as the only member"
- T to GMS: "What is current membership for group X?"
- GMS to T: "X = {p}"
- Q joins, now X = {p,q}. Transfer new membership view to members
- r joins
- GMS notices that q has failed (or q decides to leave)
6Group membership service
- Runs on some sensible place, like the server hosting DNS
- Takes as input
- Process join events
- Process leave events
- Apparent failures
- Output
- Membership views for group(s) to which those processes belong
- Seen by the protocol library that the group members are using for communication support
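The inputs and outputs listed above can be sketched as a small data structure. This is a minimal, single-copy illustration only: the names `View`, `ToyGMS`, `join`, `leave` and `suspect` are hypothetical, and there is no networking or fault tolerance.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class View:
    """One membership view of a group: a numbered, ordered member list."""
    group: str
    view_id: int
    members: tuple  # ordered oldest-first


@dataclass
class ToyGMS:
    """Single-copy membership tracker (hypothetical, for illustration)."""
    views: dict = field(default_factory=dict)  # group name -> current View

    def _install(self, group, members):
        old = self.views.get(group)
        next_id = 0 if old is None else old.view_id + 1
        view = View(group, next_id, tuple(members))
        self.views[group] = view
        return view  # the output handed to the protocol library

    def join(self, group, proc):
        old = self.views.get(group)
        members = [] if old is None else list(old.members)
        if proc in members:
            return old  # already a member: no new view
        return self._install(group, members + [proc])

    def leave(self, group, proc):
        old = self.views[group]
        if proc not in old.members:
            return old  # nothing to remove: no new view
        return self._install(group, [m for m in old.members if m != proc])

    # An apparent failure is treated like a leave reported by someone else.
    suspect = leave


gms = ToyGMS()
v0 = gms.join("X", "p")     # group X created with p as the only member
v1 = gms.join("X", "q")     # now X = {p, q}
v2 = gms.suspect("X", "q")  # q is believed to have failed
```

Each input event yields a new numbered view, which is what group members would see through their protocol library.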
7Issues?
- The GMS service itself needs to be fault-tolerant
- Otherwise our entire system could be crippled by a single failure!
- So we'll run two or three copies of it
- Hence the Group Membership Service (GMS) must run some form of Group Membership Protocol (GMP)
8Group picture with GMS
[Figure: processes p, q, r, s, t and the GMS]
9Group picture with GMS
[Figure: processes p, q, r, s, t and GMS replicas GMS0, GMS1, GMS2, annotated as follows]
- Let's start by focusing on how GMS tracks its own membership. Since it can't just ask the GMS to do this, it needs to have a special protocol for this purpose. But only the GMS runs this special protocol, since other processes just rely on the GMS to do this job
- The GMS is a group too. We'll build it first and then will use it when building reliable multicast protocols.
- In fact it will end up using those reliable multicast protocols to replicate membership information for other groups that rely on it
10Approach
- Let's assume that GMS has members {p,q,r} at time t
- Designate the oldest of these as the protocol coordinator
- To initiate a change in GMS membership, coordinator will run the GMP
- Others can't run the GMP; they report events to the coordinator
- (Oldest is well-defined as a causal order based on changing membership views)
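The coordinator rule can be sketched as follows. This assumes each member is tagged with the number of the view in which it joined (a hypothetical bookkeeping format), and it breaks ties by name, which is an assumption made here so that every process picks the same coordinator.

```python
def coordinator(members_by_join_view):
    """Return the oldest member: the one that joined in the earliest view.

    members_by_join_view maps a process name to the view number in which
    it joined. Ties are broken by name (an assumption of this sketch).
    """
    if not members_by_join_view:
        raise ValueError("empty view has no coordinator")
    return min(members_by_join_view, key=lambda p: (members_by_join_view[p], p))


def handle_event(me, members_by_join_view, event):
    """Run the GMP if we are the coordinator; otherwise report the event."""
    leader = coordinator(members_by_join_view)
    if me == leader:
        return ("run_gmp", event)
    return ("report_to", leader, event)


view = {"p": 0, "q": 1, "r": 2}
at_r = handle_event("r", view, "q failed")  # r only reports to p
at_p = handle_event("p", view, "q failed")  # p runs the protocol
```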
11GMP example
[Figure: processes p, q, r]
- Example
- Initially, GMS consists of {p,q,r}
- Then q is believed to have crashed
12Failure detection may make mistakes
- Recall that failures are hard to distinguish from network delay
- We conservatively accept the risk of a mistake and hope that detection is relatively accurate, barring partitioning
- If p is running a protocol to exclude q because q has failed, all processes that hear from p will cut channels to q
- Avoids messages from the dead
- q must rejoin (as a new process) to participate in GMS again
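The channel-cutting rule can be illustrated with a small sketch. The class `Endpoint` and its method names are hypothetical, and the fresh identity `"q#2"` is just one possible way to model "rejoin as a new process".

```python
class Endpoint:
    """A process's record of which peers it still accepts messages from."""

    def __init__(self, name, peers):
        self.name = name
        self.open_channels = set(peers)
        self.delivered = []

    def on_exclusion_notice(self, excluded):
        # Hearing that `excluded` is being dropped is enough to cut the
        # channel, even if the suspicion later turns out to be a mistake.
        self.open_channels.discard(excluded)

    def on_message(self, sender, payload):
        if sender not in self.open_channels:
            return False  # a "message from the dead" is ignored
        self.delivered.append((sender, payload))
        return True

    def on_rejoin(self, new_name):
        # A returning process is admitted under a fresh identity.
        self.open_channels.add(new_name)


r = Endpoint("r", {"p", "q"})
before = r.on_message("q", "hello")   # accepted: channel still open
r.on_exclusion_notice("q")            # p is excluding q
after = r.on_message("q", "late")     # rejected
r.on_rejoin("q#2")
rejoined = r.on_message("q#2", "back")
```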
13Basic GMP
- Someone reports that q has failed
- Leader (process p) runs a 2PC protocol
- Announces a proposed new GMS view
- Excludes q, or might add some members who are joining, or could do both at once
- Waits until a majority of members of current view have voted ok
- Then commits the change
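The steps above can be sketched as a single function. This is a simplified illustration, not the full protocol: message exchange is replaced by a `responders` argument listing who answered "ok", and the function and parameter names are hypothetical.

```python
def majority(n):
    """Smallest number of votes that is more than half of n."""
    return n // 2 + 1


def run_view_change(current_view, leader, drop=(), add=(), responders=()):
    """One two-phase round run by the leader.

    current_view: ordered list of members (oldest first).
    responders:   members other than the leader that answered "ok"
                  (a stand-in for real message exchange).
    Returns the committed new view, or None if the leader must keep waiting.
    """
    proposed = [m for m in current_view if m not in drop] + [
        m for m in add if m not in current_view
    ]
    # Phase 1: the leader counts itself plus acks from the *current* view.
    votes = {leader} | (set(responders) & set(current_view))
    if len(votes) < majority(len(current_view)):
        return None  # no majority: cannot commit
    # Phase 2: commit the proposed view.
    return proposed


v1 = run_view_change(["p", "q", "r"], "p", drop=["q"], responders=["r"])
stuck = run_view_change(["p", "q", "r"], "p", drop=["q"], responders=[])
both = run_view_change(["p", "q", "r"], "p", drop=["q"], add=["s"], responders=["r"])
```

With three members the leader needs one additional "ok"; with no responders it has to wait.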
14GMP example
- Proposes new view {p,r} (-q)
- Needs majority consent: p itself, plus one more (current view had 3 members)
- Can add members at the same time
[Figure: p sends "Proposed V1 = {p,r}"; r answers "OK"; p sends "Commit V1". View V0 = {p,q,r} is replaced by V1 = {p,r}]
15Special concerns?
- What if someone doesn't respond?
- P can tolerate failures of a minority of members of the current view
- New first-round overlaps its commit
- "Commit that q has left. Propose add s and drop r"
- P must wait if it can't contact a majority
- Avoids risk of partitioning
16What if leader fails?
- Here we do a 3PC
- New leader identifies itself based on age ranking in its membership view
- i.e., oldest surviving process
- It runs an inquiry phase
- "The adored leader has died. Did he say anything to you before passing away?"
- Note that this causes participants to cut connections to the adored previous leader
- Then run normal 2PC but terminate any interrupted view changes leader had initiated
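A rough sketch of the takeover, under simplifying assumptions: views are lists ordered oldest-first, the inquiry answers are passed in as a dictionary, and the majority check on inquiry replies is omitted. The function names and the `pending_by_member` format are hypothetical.

```python
def new_leader(view, failed):
    """Oldest surviving member of the view (view is ordered oldest-first)."""
    for member in view:
        if member not in failed:
            return member
    raise RuntimeError("no surviving member")


def take_over(view, old_leader, pending_by_member):
    """Inquiry phase followed by the proposal for an ordinary 2PC round.

    pending_by_member maps each reachable member to the proposal it had
    received from the old leader but not yet seen committed (or None).
    Returns (leader, proposal).
    """
    leader = new_leader(view, failed={old_leader})
    # Inquiry: collect anything the old leader proposed before failing.
    interrupted = [p for p in pending_by_member.values() if p is not None]
    # Finish an interrupted change if there is one, else use the current view.
    base = interrupted[0] if interrupted else list(view)
    # Either way, the failed leader is dropped from the proposal.
    proposal = [m for m in base if m != old_leader]
    return leader, proposal


quiet = take_over(["p", "q", "r"], "p", {"q": None, "r": None})
resumed = take_over(["p", "q", "r"], "p", {"q": ["p", "q", "r", "t"], "r": None})
```

In the second call the old leader had started adding t, so the new leader carries that change forward while excluding p.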
17GMP example
- New leader first sends an inquiry
- Then proposes new view {q,r} (-p)
- Needs majority consent: q itself, plus one more (current view had 3 members)
- Again, can add members at the same time
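[Figure: p has failed. q sends "Inquire (-p)"; r answers "OK: nothing was pending"; q sends "Proposed V1 = {q,r}"; r answers "OK"; q sends "Commit V1". View V0 = {p,q,r} is replaced by V1 = {q,r}]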
18Properties of GMP
- We end up with a single service shared by the entire system
- In fact every process can participate
- But more often we just designate a few processes and they run the GMP
- Typically the GMS runs the GMP and also uses replicated data to track membership of other groups
- Using reliable, ordered multicast (more later)
19Use of GMS
- A process t, not in the GMS, wants to join group Upson309_status
- It sends a request to the GMS
- GMS updates the membership of group Upson309_status to add t
- Reports the new view to the current members of the group, and to t
- Begins to monitor t's health
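The join sequence above might look like this from the GMS side. It is a sketch only: `MembershipFrontEnd`, `handle_join` and the `outbox` list are hypothetical stand-ins for real message sending and health monitoring.

```python
class MembershipFrontEnd:
    """Handles join requests from processes outside the GMS (illustrative)."""

    def __init__(self):
        self.groups = {}        # group name -> ordered list of members
        self.monitored = set()  # processes whose health is being watched
        self.outbox = []        # (recipient, group, view) notifications

    def handle_join(self, group, proc):
        members = self.groups.setdefault(group, [])
        if proc not in members:
            members.append(proc)
        view = tuple(members)
        # Report the new view to the current members and to the joiner.
        for m in members:
            self.outbox.append((m, group, view))
        # Begin monitoring the new member.
        self.monitored.add(proc)
        return view


fe = MembershipFrontEnd()
fe.handle_join("Upson309_status", "u")
view = fe.handle_join("Upson309_status", "t")
```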
20Processes t and u using a GMS
[Figure: processes p, q, r, s, t, u]
- The GMS contains p, q, r (and later, s)
- Processes t and u want to form some other group, but use the GMS to manage membership on their behalf
21Core GMS Protocol Properties
- C-GMS-1
- System membership takes the form of views
- Initial, predetermined system view
- Subsequent views contain addition or deletion of processes
- C-GMS-2
- Only processes that request to be added are added
- Only processes that are suspected of failure or that request to leave are deleted
- C-GMS-3
- A majority of processes in view i must agree on the composition of view i+1
- C-GMS-4
- There is a single sequence of views experienced by all joined processes
- A process receives a view when joined and receives views until it leaves
- C-GMS-5
- Assume process p suspects process q of being faulty and that the core GMS service is able to report new views; then p and/or q will be dropped
- C-GMS-6
- In a system with synchronized clocks and bounded message latencies, any dropped process will know within bounded time
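Two of these properties, C-GMS-2 and C-GMS-3, are easy to check against a recorded run. The checker below is an illustrative sketch; the function name and the log format (sets of process names per view, plus the acknowledgements collected for each change) are assumptions.

```python
def check_view_sequence(views, join_requests, leave_or_suspect, acks):
    """Return a list of violated properties for a recorded run.

    views:            list of sets, views[0] being the initial view.
    join_requests:    set of processes that asked to be added.
    leave_or_suspect: set of processes that asked to leave or were suspected.
    acks:             acks[i] is the set of processes that agreed to views[i+1].
    """
    violations = []
    for i in range(len(views) - 1):
        old, new = views[i], views[i + 1]
        added, removed = new - old, old - new
        # C-GMS-2: only requested additions and justified deletions.
        if not added <= join_requests or not removed <= leave_or_suspect:
            violations.append(f"C-GMS-2 at view {i + 1}")
        # C-GMS-3: more than half of view i agreed to view i+1.
        if len(acks[i] & old) <= len(old) // 2:
            violations.append(f"C-GMS-3 at view {i + 1}")
    return violations


v0, v1 = {"p", "q", "r"}, {"p", "r"}
ok = check_view_sequence([v0, v1], set(), {"q"}, [{"p", "r"}])
no_majority = check_view_sequence([v0, v1], set(), {"q"}, [{"p"}])
uninvited = check_view_sequence([v0, {"p", "q", "r", "s"}], set(), set(), [{"p", "q"}])
```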
22Robust Web Services: We'll build them with these tools
Tools for solving practical replication and availability problems: We'll base them on ordered multicast
Ordered multicast: We'll base it on fault-tolerant multicast
Fault-tolerant multicast: We'll use membership
Tracking group membership: We'll base it on 2PC and 3PC
2PC and 3PC: Our first tools (lowest layer)
23JGroups
- Java toolkit for reliable group communication
- Join group
- Send to all or single group members
- Receive messages from group
- Channels as basic abstraction
- Similar to (BSD) sockets; pull-based
- Building blocks for higher-level functionality
- E.g., PullPushAdapter
- Protocol stack
- Bidirectional list of protocol layers
- E.g., GMS as in Birman, 2005
- Used, e.g., for replication and load balancing in a number of J2EE application servers
24JGroups Example
25Summary
- We moved one step towards practical replication and availability tools
- Dynamic Group Membership Service, GMS, for tracking members
- Join, leave, monitor operations
- Service provided by servers implementing core Group Membership Protocol
- Saw JGroups as an example of a system implementing GMS
- Still need a reliable multicast to have a full group service...
- Will revisit JGroups...