Title: Membership
1 Membership
CS525 Presentation
2 A Gossip-Style Failure Detection Service
- R. van Renesse
- Y. Minsky
- M. Hayden
3 Outline
- Motivation
- System Model
- Mechanism of the Algorithm
- Parameters to tune the Algorithm
- Analysis
- Multi-level Gossiping
- Discussion
- Conclusion
4 Motivation
- Why do we need failure detection?
- System management
- Replication
- Load balancing
- When we detect a failure in the system, we can:
- Stop using the failed component
- Move its responsibility elsewhere
- Make another copy
5 Motivation
(Figure: a group of nodes, e.g. 2, 6, 8, 10, 16, 17, and 22, sharing the file reviews.doc.)
6 System Model
- Fail-stop model (not Byzantine faults)
- Minimal assumptions about the network
- Some messages may be lost
- Most messages are delivered within a predetermined, reasonable time
- Goal: detect failures at each host
- Completeness
- Accuracy
- Speed
7 Characterization Approach
- Strong/weak COMPLETENESS
- A failure of any member is detected by all/some non-faulty members
- STRONG ACCURACY
- No mistakes
- Cannot achieve both
- This service guarantees completeness always, and accuracy with high probability
8 Requirements
- COMPLETENESS
- SPEED
- Every failure is detected by some non-faulty member within T time units
- ACCURACY
- The probability that a non-faulty member (not yet detected as failed) is detected as faulty by any other non-faulty member should be less than Pmistake
9 Outline
- Motivation
- System Model
- Mechanism of the Algorithm
- Parameters to tune the Algorithm
- Analysis
- Multi-level Gossiping
- Discussion
- Conclusion
10 Mechanism of the Algorithm
Each member maintains a list with one entry per known member. Example at node 1 of a four-node group:
Address  Heartbeat Counter  Time
1        10120              66
2        10103              62
3        10098              63
4        10111              65
Each member periodically gossips this list to the others; when a node receives a gossiped list, it merges it with its own list.
11 Mechanism of the Algorithm
Example of a merge: node 1 gossips its list to node 2, whose local clock reads 70.
Node 2's list before the merge (Address, Heartbeat Counter, Time):
1  10118  64
2  10110  64
3  10090  58
4  10111  65
The list gossiped by node 1:
1  10120  66
2  10103  62
3  10098  63
4  10111  65
Node 2's list after the merge: each entry keeps the higher heartbeat counter, and entries whose heartbeat increased are stamped with the current time 70 (a code sketch of this rule follows below):
1  10120  70
2  10110  64
3  10098  70
4  10111  65
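A minimal sketch of this merge rule in Python (the dict layout, function name, and tuple fields are illustrative, not taken from the paper):

def merge_gossip(local_list, received_list, now):
    """Merge a received gossip list into the local membership list.

    For each member, keep the higher heartbeat counter; whenever an entry's
    heartbeat increases, refresh its local timestamp to the current time.
    """
    for addr, (heartbeat, _) in received_list.items():
        local_hb, _ = local_list.get(addr, (-1, now))
        if heartbeat > local_hb:
            local_list[addr] = (heartbeat, now)   # heartbeat advanced: reset the timer
    return local_list

# The example from the slide: node 2 merges node 1's list at local time 70.
node2 = {1: (10118, 64), 2: (10110, 64), 3: (10090, 58), 4: (10111, 65)}
node1 = {1: (10120, 66), 2: (10103, 62), 3: (10098, 63), 4: (10111, 65)}
print(merge_gossip(node2, node1, now=70))
# {1: (10120, 70), 2: (10110, 64), 3: (10098, 70), 4: (10111, 65)}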
12 Mechanism of the Algorithm
- If a member's heartbeat counter has not increased for more than Tfail seconds, the member is considered failed
- After Tcleanup seconds, the member's entry is deleted from the list
13 Mechanism of the Algorithm
- Why not delete a host's entry right after Tfail seconds?
Example, with the current time at node 2 being 75 (a sketch of the corresponding cleanup logic follows below):
- Node 2's list: 1 10120 66, 2 10110 64, 3 10098 50, 4 10111 65 (member 3's heartbeat is stale)
- Node 1's list: 1 10120 66, 2 10103 62, 3 10098 55, 4 10111 65 (it also still carries member 3's stale entry)
- If node 2 deleted member 3 as soon as Tfail expired, its list would become: 1 10120 66, 2 10110 64, 4 10111 65
- A later gossip from node 1 would then re-insert the stale entry with a fresh timestamp: 1 10120 66, 2 10110 64, 3 10098 75, 4 10111 65
- Keeping the failed entry until Tcleanup = 2 Tfail gives the other members time to detect the failure themselves, so the stale entry is not resurrected
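A minimal sketch of the periodic check implied by slides 12 and 13, assuming the same (heartbeat, last-update-time) entries as in the merge sketch above:

def sweep_membership(local_list, now, t_fail, t_cleanup):
    """Periodic sweep of the membership list (illustrative sketch).

    A member whose heartbeat has not increased for t_fail seconds is
    considered failed, but its entry is only deleted after t_cleanup
    (= 2 * t_fail) seconds, so that stale copies gossiped by other
    nodes cannot re-insert an already-removed, failed member.
    """
    failed = set()
    for addr, (heartbeat, last_update) in list(local_list.items()):
        age = now - last_update
        if age >= t_cleanup:
            del local_list[addr]      # safe to forget the member entirely
        elif age >= t_fail:
            failed.add(addr)          # treat as failed, but keep the entry
    return failed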
14 Outline
- Motivation
- System Model
- Mechanism of the Algorithm
- Parameters to tune the Algorithm
- Analysis
- Multi-level Gossiping
- Discussion
- Conclusion
15 Parameters to tune the Algorithm
- Given Pmistake, we tune:
- Tfail
- Tcleanup
- The rate of gossiping, Tgossip
- Choose Tfail so that the probability of an erroneous failure detection does not exceed Pmistake
- Choose Tcleanup to be twice Tfail
- Choose the rate of gossiping according to the available network bandwidth
16 Parameters to tune the Algorithm
(Timeline: a member fails at time t, is detected as failed at t + Tfail, and is removed from the list at t + Tcleanup = t + 2 Tfail.)
17 Analysis
- Simplified analysis
- In a round, only one member can gossip to another member, with success probability Parrival
- Using an iterative (dynamic) method, we obtain the number of rounds required to achieve a given quality of detection
- As the number of rounds r increases, Pmistake(r) decreases
- Detection time: multiply the first round r at which Pmistake(r) ≤ Pmistake by Tgossip
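In symbols (following the last bullet above):

\[
  T_{\text{detect}} \;=\; T_{\text{gossip}} \cdot
  \min\{\, r : P_{\text{mistake}}(r) \le P_{\text{mistake}} \,\}
\]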
18 Analysis
- As the number of members increases, the detection time increases
19 Analysis
- As the requirement is loosened, the detection time decreases
20 Analysis
- As the number of failed members increases, the detection time increases significantly
21 Analysis
- The algorithm is resilient to message loss
22 Outline
- Motivation
- System Model
- Mechanism of the Algorithm
- Parameters to tune the Algorithm
- Analysis
- Multi-level Gossiping
- Discussion
- Conclusion
23 Multi-level Gossiping
- In subnet i, a gossip crosses the subnet boundary with probability 1/ni (ni being the number of members in subnet i)
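A hedged sketch of how a gossip target might be chosen under this rule; it assumes a simple two-level topology in which a member otherwise gossips within its own subnet (the paper's full multi-level scheme also spans domains), and the data layout is mine:

import random

def pick_gossip_target(my_subnet, me, subnets):
    """Pick a gossip target (illustrative two-level sketch).

    subnets: dict {subnet_id: [member addresses]}.
    With probability 1/n_i (n_i = size of my subnet), gossip to a member
    of another subnet; otherwise gossip within the local subnet.
    """
    peers = [m for m in subnets[my_subnet] if m != me]
    n_i = len(subnets[my_subnet])
    go_outside = len(subnets) > 1 and (not peers or random.random() < 1.0 / n_i)
    if go_outside:
        other = random.choice([s for s in subnets if s != my_subnet])
        return random.choice(subnets[other])
    return random.choice(peers)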
24 Discussion
- We might be able to use other gossip methods to improve performance
- A hybrid of pull and push?
- Sending only new content?
- The protocol consumes a lot of network resources
25 Conclusion
- A failure detection service based on a gossip protocol
- Accurate with a known probability
- Resilient against message loss
- A simple analysis of the algorithm
- Multi-level gossiping
26 On Scalable and Efficient Distributed Failure Detectors
- I. Gupta
- T. D. Chandra
- G. S. Goldszmidt
27 Outline
- Motivation & Previous Work
- Problem Statement
- Worst-case Network Load
- Randomized Distributed Failure Detector Protocol
- Analysis and Experimental Result
- Discussion
- Conclusion
28 Motivation & Previous Work
- Most distributed applications rely on failure detector algorithms to circumvent the impossibility result for consensus in asynchronous systems
- Heartbeating algorithms are not as efficient and scalable as claimed
- Previous analysis models did not consider scalability
29 Characterization Approach
- Strong/weak COMPLETENESS
- A failure of any member is detected by all/some non-faulty members
- STRONG ACCURACY
- No mistakes
- Cannot achieve both
- Guarantee completeness always, and accuracy with high probability
30 Requirements
- COMPLETENESS
- SPEED
- Every failure is detected by some non-faulty member within T time units
- ACCURACY
- The probability that a non-faulty member (not yet detected as failed) is detected as faulty by any other non-faulty member should be less than PM(T)
31 System Model
- A large group of n members
- Each member knows every other member
- Crash (non-Byzantine) failures
- Message loss probability pml
- Member failure probability pf
- qml = 1 - pml
- qf = 1 - pf
32 Outline
- Motivation & Previous Work
- Problem Statement
- Worst-case Network Load
- Randomized Distributed Failure Detector Protocol
- Analysis and Experimental Result
- Discussion
- Conclusion
33 Worst-case Network Load
Definition: the worst-case network load L of a failure detector protocol is the maximum number of messages transmitted by any run of the protocol within any time interval of length T, divided by T (formalized below).
- SCALE
- The worst-case network load L should be as close to the optimal worst-case network load as possible
- Equal expected load per member
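Written as a formula (the notation is mine; msgs(R, t, t+T) counts the messages that run R transmits in the interval [t, t+T]):

\[
  L \;=\; \max_{R}\,\max_{t}\; \frac{\mathrm{msgs}(R,\,t,\,t+T)}{T}
\]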
34 Optimal Worst-case Network Load
- Any distributed failure detector algorithm imposes a minimal worst-case network load L*, for
- A group of size n
- Satisfying COMPLETENESS, SPEED, and ACCURACY
- Given values of T and PM(T)
35 Optimal Worst-case Network Load
- Consider a group member that, at a random point in time t, is not yet detected as failed and stays non-faulty until at least time t + T
- Suppose it sends m messages during that interval of length T
- If all m messages are dropped, the SPEED requirement forces the member to be (falsely) detected as failed
- ACCURACY requires the probability of this event to be less than PM(T), which bounds m from below (see the derivation sketch after this list)
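Writing out the slide's argument, with pml the message loss probability:

\[
  p_{ml}^{\,m} \;\le\; P_M(T)
  \quad\Longrightarrow\quad
  m \;\ge\; \frac{\log P_M(T)}{\log p_{ml}}
\]

(both logarithms are negative, so the inequality flips when dividing); this per-member message count over each interval of length T is what yields the minimal worst-case network load L*.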
36 Worst-case Network Load
- Distributed heartbeating
- Gossip-style failure detection
-> They are not scalable
37 Outline
- Motivation & Previous Work
- Problem Statement
- Worst-case Network Load
- Randomized Distributed Failure Detector Protocol
- Analysis and Experimental Result
- Discussion
- Conclusion
38 Randomized Distributed Failure Detector Protocol
- Relaxes the SPEED condition: a failure is detected within an expected time bound of T time units
- COMPLETENESS with probability 1
- ACCURACY with probability (1 - PM(T))
- The ratio of the worst-case network load to the optimal is independent of group size
39 Randomized Distributed Failure Detector Protocol
(Protocol period at member Mi, probing member Mj; the figure shows k = 3 other members.)
- In every protocol period, Mi selects a random member Mj and sends it a ping(Mi, Mj, pr)
- Mj replies with an ack(Mi, Mj, pr)
- If no ack arrives in time, Mi selects k members at random, sends each of them a ping-req(Mi, Mj, pr), and waits for an ack(Mi, Mj, pr)
- Each selected member Mk then sends a ping(Mi, Mj, Mk, pr) to Mj and relays Mj's ack back to Mi
40 Randomized Distributed Failure Detector Protocol
- If no ack has been received by the end of the protocol period T, Mi declares Mj as failed (a sketch of one period follows below)
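A minimal sketch of one protocol period at member Mi, assuming hypothetical ping(target) and ping_req(via, target) helpers that return True when the corresponding ack arrives in time (these stand in for the ping(Mi, Mj, pr) and ping-req(Mi, Mj, pr) messages on the slides):

import random

def protocol_period(members, me, k, ping, ping_req):
    """One period of the randomized failure-detector protocol (sketch).

    Returns the member declared failed in this period, or None.
    """
    candidates = [m for m in members if m != me]
    mj = random.choice(candidates)                 # pick a random member Mj
    if ping(mj):                                   # direct ping acked in time
        return None
    helpers = random.sample([m for m in candidates if m != mj],
                            min(k, len(candidates) - 1))
    if any(ping_req(mk, mj) for mk in helpers):    # indirect probe via k members
        return None
    return mj                                      # no ack by end of period: declare Mj failed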
41 Outline
- Motivation & Previous Work
- Problem Statement
- Worst-case Network Load
- Randomized Distributed Failure Detector Protocol
- Analysis and Experimental Result
- Discussion
- Conclusion
42 Analysis and Parameters
- We need to set the protocol period T and the number of ping-req targets k
43 Experimental Results
- Independent of the number of members n
- Resilient to the number of failures and to message loss
44 Discussion
- Does the worst-case network load capture all aspects of the algorithm's performance?
- Small packet sizes impose a large overhead
- Aggregation of packets? Or another approach?
45 Conclusion
- We characterized failure detection algorithms
- The worst-case network load
- And its optimal value
- A randomized distributed failure detector
- It is better in terms of the worst-case network load
46 SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol
- Abhinandan Das
- Indranil Gupta
- Ashish Motivala
47 Questions
- Why would we need another membership protocol?
- How are process failures detected?
- How are membership updates disseminated?
- How can false failure detection frequency be
reduced?
48 Group Membership Service
(Architecture: the application issues queries, updates, etc. to the Membership Protocol, which combines a Failure Detector with a Dissemination component to maintain the Group Membership List; joins, leaves, and failures of members drive the updates, and all messages travel over an unreliable network.)
49 Large-Scale Process Groups require Scalability
(Figure: a single process versus 1000s of processes.)
50 SWIM Actions
- Mj fails
- Step 1: the Failure Detector component at member Mi detects the failure of Mj
- Step 2: the Dissemination component spreads the failure information to the rest of the group
51 Scalable Failure Detectors
- Strong completeness
- Always guaranteed
- Speed of failure detection
- Time to detect a failure
- Accuracy
- Rate of false failure detections
- Network message load
- In bytes per second (Bps) generated by the protocol
52 Minimal Network Load L*
- To satisfy an application-specified detection time T and false detection rate PM(T), the minimal total network load is
- L* = n * log(PM(T)) / (log(pml) * T)
- n: group size, pml: independent message loss probability
(A worked example with assumed numbers follows below.)
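As a worked example with assumed, purely illustrative values pml = 0.1 and PM(T) = 10^-6:

\[
  \frac{\log P_M(T)}{\log p_{ml}}
  \;=\; \frac{\log 10^{-6}}{\log 10^{-1}}
  \;=\; 6
  \qquad\Longrightarrow\qquad
  L^{*} \;=\; \frac{6\,n}{T}
\]

i.e. each member must send at least 6 messages per detection interval of length T under these assumed values.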
53 Heartbeating
- Centralized: single point of failure
- Logical ring: unpredictable under multiple failures
- All-to-all: O(n/T) load per process
- Problems
- They attempt simultaneous detection at all processes
- They do not separate failure detection from dissemination
- Solutions
- Separate failure detection from dissemination
- Do not use heartbeat-based failure detection
54 SWIM Failure Detector
55 SWIM vs. Heartbeating
For a fixed false positive rate and message loss rate:
Protocol       First Detection Time   Process Load
Heartbeating   constant               O(n)
Heartbeating   O(n)                   constant
SWIM           constant               constant
56 SWIM Failure Detector
Parameter             SWIM
First Detection Time  Expected e/(e-1) protocol periods; constant (independent of group size)
Process Load          Constant per period; < 8L* even at 15% message loss
False Positive Rate   Tunable
Completeness          Deterministic, time-bounded; within O(log(N)) periods w.h.p.
57 Dissemination
- Mj fails
- Step 1: the Failure Detector detects the failure of Mj at Mi
- Step 2: the Dissemination component spreads the failure information
58 Dissemination Options
- Multicast (hardware/IP)
- Costly (multiple simultaneous multicasts)
- Unreliable
- Point-to-point TCP/UDP
- Expensive
- Piggybacking updates on the failure detector's messages: zero extra messages
- Infection-style dissemination
59 Infection-style Dissemination
60 Infection-style Dissemination
- An epidemic process
- After λ log(n) protocol periods, n - n^-(2λ-2) members have heard about an update
- Each member maintains a buffer of recently joined/evicted processes
- Updates are piggybacked on pings, ping-reqs, and acks
- Recent updates are preferred
- Buffer elements are garbage-collected after a while
- The λ log(n)-period horizon is what defines weak consistency (a sketch of such a buffer follows below)
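A sketch of the piggyback buffer, assuming (as in the experiments on slide 65) that an update is dropped after being piggybacked about 3 * ceil(log(n + 1)) times; the class and field names are illustrative:

import math

class PiggybackBuffer:
    """Buffer of recent membership updates, piggybacked on protocol messages
    (illustrative sketch of infection-style dissemination)."""

    def __init__(self, lam=3):
        self.lam = lam
        self.updates = {}          # member -> [update payload, times gossiped]

    def add(self, member, payload):
        self.updates[member] = [payload, 0]

    def select(self, n_members, max_piggyback):
        """Pick updates to piggyback on the next ping/ping-req/ack,
        preferring the least-gossiped (i.e. most recent) updates."""
        limit = self.lam * math.ceil(math.log(n_members + 1))
        chosen = sorted(self.updates.items(), key=lambda kv: kv[1][1])[:max_piggyback]
        for member, entry in chosen:
            entry[1] += 1
            if entry[1] >= limit:          # garbage-collect over-gossiped updates
                del self.updates[member]
        return [(member, entry[0]) for member, entry in chosen]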
61 Suspicion Mechanism
- False detections are caused by
- Perturbed processes
- Packet losses
- Indirect pinging may not solve the problem
- e.g., correlated message losses near the pinged host
- Solution: suspect a process before declaring it failed within the group
62 Suspicion Mechanism
State machine for process Mj's state as viewed at process Mi (FD = Failure Detector, D = Dissemination):
- Alive -> Suspected: FD: Mi's ping of Mj fails, or D: (Suspect Mj) received; disseminate (Suspect Mj)
- Suspected -> Alive: FD: Mi's ping of Mj succeeds, or D: (Alive Mj) received; disseminate (Alive Mj)
- Suspected -> Failed: timeout, or FD: Mi confirms Mj failed, or D: (Confirm Mj failed) received; disseminate (Confirm Mj failed)
63 Suspicion Mechanism
- The timeout can be adjusted to trade off the false detection rate against the failure declaration time
- Multiple suspicions of a process are distinguished by a per-process incarnation number
- The incarnation number is incremented only by its associated process
- An Alive message with a higher incarnation number overrides Alive/Suspect messages with lower incarnation numbers, and vice versa for Suspect messages
- Confirm messages override Alive and Suspect messages (a sketch of these override rules follows below)
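A sketch of the override rules in Python, following the slide's wording; the tie rule at equal incarnation numbers (Suspect over Alive) is an assumption on my part:

# Precedence at equal incarnation numbers; Confirm (failed) always wins.
RANK = {"alive": 0, "suspect": 1, "confirm": 2}

def should_override(current, incoming):
    """Decide whether an incoming (state, incarnation) message about Mj
    overrides the one currently held at Mi."""
    cur_state, cur_inc = current
    new_state, new_inc = incoming
    if new_state == "confirm":                 # confirm overrides everything
        return True
    if cur_state == "confirm":
        return False
    if new_inc != cur_inc:                     # higher incarnation number wins
        return new_inc > cur_inc
    return RANK[new_state] > RANK[cur_state]   # same incarnation: suspect > alive

# A stale Alive cannot clear a Suspect with the same incarnation number;
# Mj must increment its own incarnation number to refute the suspicion.
print(should_override(("suspect", 5), ("alive", 5)))   # False
print(should_override(("suspect", 5), ("alive", 6)))   # True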
64 Time-bounded Completeness
- Round-robin pinging (a sketch follows below)
- Each entry in Mi's membership list is selected as a ping target once during each traversal of the list
- After each traversal, the membership list is randomly reordered
- Worst-case delay between successive selections of the same target: 2ni - 1 protocol periods
- Preserves the average failure detection time of the original failure detector
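A minimal sketch of the round-robin target selection (the class name is mine; insertion of newly joined members at random positions is omitted):

import random

class RoundRobinTargets:
    """Round-robin ping-target selection with a random reorder after each
    full traversal of the membership list."""

    def __init__(self, members):
        self.members = list(members)
        random.shuffle(self.members)
        self.index = 0

    def next_target(self):
        if self.index >= len(self.members):    # traversal finished:
            random.shuffle(self.members)       # reshuffle for the next pass
            self.index = 0
        target = self.members[self.index]
        self.index += 1
        return target

A target chosen first in one traversal and last in the next is selected 2n - 1 periods apart, which is the worst case quoted on the slide.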
65 Experimental Set-Up
- SWIM prototype
- Win2000, Winsock 2
- Uses only UDP messages
- Experimental platform
- Heterogeneous cluster of commodity PCs (> 32)
- 100 Mbps Ethernet
- Protocol settings
- Number of processes for ping-reqs k = 1; protocol period = 2 s; number of times an update is piggybacked and the suspicion timeout both set to 3 * ceil(log(N + 1))
- No perpetual partial membership lists were observed in the experiments
66 Per-process message load is independent of group size
67 Fig. 3a: how the average detection time varies with group size
68 Fig. 3b: the median dissemination latency is uncorrelated with group size
69 Fig. 3c: the suspicion timeout used in the Suspicion Mechanism
70 Benefit of the Suspicion Mechanism
71 Answers
- Why would we need another membership protocol?
- Heartbeating does not scale well
- We need to separate detection from dissemination
- How are process failures detected?
- Randomized pinging
- How are membership updates disseminated?
- Infection-style dissemination
- How can the false failure detection frequency be reduced?
- The suspicion mechanism
72 Critiques/Questions
- In Figure 3a, why are the intervals between the vertical lines different?
- In Figure 3b, why are the latencies high at group sizes 18 and 26?
- Figure 3c does not include information about its experimental setup.
- How does SWIM compare to other membership protocols?
- How does SWIM adapt to application requirements and network dynamism, e.g., by changing k or the suspicion timeout?