Title: Fault Tolerant Distributed Systems
1Fault Tolerant Distributed Systems
2The track
- Review the approaches to fault tolerance (coupled
with mutex algorithms in some cases, 1 - 3), Get
an idea of the terminology and field - 1 Revannaswamy, Bhatt - 97
- 2 Chang, Singhal, Liu - 90
- 3 Helary, Mostefaoui - 94
- 4 Stumm, Zhou - 90
- 5 Nett, Mock, Theisohn - 97
- 6 Ballarini, Bernardi, Donatelli - 02
- 7 Hardekopf, Kwiat, Upadhyaya - 01
- 8 Injong Rhee - 95
- 9 Goyer, Momtahan, Selic - 90
- 10 Frank Mueller - 01
- Summary
- Questions
31 Revannaswamy, Bhatt - 97
- O(log N), N Number of nodes in the spanning
tree - Communicate with neighbours only (Raymonds
algorithm) - Tolerant to single link / single node failures
- Eliminate the failed component and obtain
different tree structure (network is biconnected) - Mechanism to detect the failures is assumed to
exist - Only Branches are used for message exchanges
41 Contd - Reconfiguaration
- Repeat the steps below till all connected OR
insufficient chords
- At the end of reconfiguration, all
data-structures are reset and algorithm is
restarted, older token holder is pre-empted and
token is transferred to a newly elected leader
52 Chang, Singhal, Liu - 90
- O((2K) logN), Each node has K alternate paths
to token node - Works with DAG, no cycles should ever develop
- FIFO violation for greedy strategy
62 Contd. - Fault Tolerance
- Mechanism to detect the failures is assumed to
exist - Single link / single node failure and recovery
- Any path constructed by taking outgoing edges
always leads to the token holder - Studies all the states of the system at which the
failure may occur and gives solutions to each of
the states for tolerance - At recovery, the node asks its neighbours about
its state in order to reconstruct the data
structures - Fairness compromized at recovery - queue order is
arbitrary
73 Helary, Mostefaoui - 94
- Worst case O(lg n) for mutex, average O(lg n) for
fault-tolerance - Fairness compromized at recovery - queue order is
arbitrary - Best of both the worlds Naimi (dynamic),
Raymond(static) - Node topology preserves binomial tree (open cube)
structure - Assumes finite message delay for detection of
node failure
83 Contd. - Fault Tolerance
- Handles multiple and simultaneous node failures,
no link failures - Due to bounded message delays and estimated
critical section lengths, a node can suspect a
failure by timeouts. - Enquire appropriate nodes about their state and
deduce whether the suspicion is correct. - In case of failure, some data structures are
updated, tree is reconfigured
94 Stumm, Zhou
- Extension of coherent DSM algorithms for fault
tolerance - Tolerance to single host failures, no link
failures - Availability of broadcast (multicast)
communication is assumed - Communication cost assumed to be larger than
memory access costs - Critical state (DSM data, state info) replicated
to multiple hosts - Central server algo central server manages data
(request/response) - Backup the central server state, fault detection
by timeouts - Full replication algo all data on each host,
local reads, sequenced writes (by sequencers),
simple recovery by obtaining latest copy from
other hosts
104 Contd.
- Migration algo Data always migrated to the
accessing host, can be integrated with the VMM of
the host OS, maintain sequential consistency - Read replication algo Similar to migration but
no migration for read accesses - Comparisons
- Fault tolerant versions of central server and
full replication do not introdue a substantial
overhead - Central server is better for infrequent shared
data accesses - Full replication is better for sparse write
accesses
115 Nett, Mock, Theisohn
- Specification and realization of a generalized
distributed dependency management system for
fault tolerance - Key observation Management of dynamically
evolving dependencies is a key problem for fault
tolerance - Dependency graph nodes as sites, edges as
dependencies - Dependency set set of nodes reachable from start
nodes - Traversal of dependency set yields solution at
each node - The node is added to the dep. set ?
- Search continues ?
- Which dep. types to follow now ?
- In which direction ?
125 Contd - Example
- Stability completed actions have stable
dependencies - The proposed system supports only applications
that have stable dependency states - Does our application conform to this requirement
? - Next slide
135 Contd - Language
- Generate a state machine with this language and,
verify for stability
145 Contd - Example
- Send messages and generate global dependency
graph to look for inconsistencies
156 Ballarini, Bernardi, Donatelli
- Formal and theoretical discussion of
synchronization problem in distributed systems
167 Hardekopf, Kwiat, Upadhyaya
- Security and fault tolerance for aerospace
distributed communications by means of voting
mechanism - Basic voting protocol
- Initiator sends a request to one of the voters
- The voter multicasts a request to other voters
- Voters decide locally and send their vote to the
client - client waits for f1 alike votes where f is the
degree of fault tolerance - Problems not scalable, assumes exact voting
- Proposes a model of set of voters, interface
module and user where majority of voters are
assumed to be always correct !!
177 Contd - The algorithm
- Voters are authenticated when they commit
- O(1) messages with any number of voters
- Hashing and digital signatures for WAN
- Each voter will follow these steps
- If no other voter has committed an answer, commit
self-answer and skip - If some voter has committed and the result
conflict, then commit self-answer - After everyone had a chance, determine the
majority result - If majority is in conflict, then go to step 1.
- Interface module will receive first commit,
buffer it and start the timer. - If new commit is received old is over-written and
timer is restarted. - Otherwise the buffered result is output to the
client
188 Injong Rhee
- Fault tolerance in distributed resource
allocation by locality - Failure locality max. number of processes dead
due to failure - Depends on degree of distributedness of algorithm
- Tight lower bound on failure locality is
where, is the maximum number of conflicting
processes - Dining philosopher problemhas the optimal failure
locality
- Proposed algorithm breaks the chain by not
allowing a process to wait for a process which is
also blocked, achieves
198 Contd - Model
- Each process is modeled in steps send, receive,
local, fail and local state - Guarded command set receive -gt Action,...
- Configuration C is a set of current local
- Network is a process too
- Enabling and execution of guarded sommand set
- System is initial configuration C0 and execution
of guarded command set of all the processes - No constraints on relative timing of execution of
process steps - Priorities by logical clocks (with waiting
factor) - Centralized resource managers manage exclusion by
responding to the requests for resource (with
preemption)
209 Goyer, Momtahan, Selic
- Hierarchical control strategy for fault tolerance
(centralized app) - Simplicity, efficiency, predictability
- Transient and hard failures - recovers or doesnt
recover ? - Soft QoS guarantees in terms of response times
2110 Frank Mueller
- Extension of Naimis protocol - single node
failures - Backup communication in form of ring structure
- Fault handler node Find out the faulty node by
timeout, collect fault information (from 2 nodes)
and take appropriate action
22Some ideas - for our protocol
- The first node experiencing timeout may send a
message to every other node asking for their
states - The node experiencing timeout must be waiting
for a grant/token to arrive OR waiting for a
release to arrive - In case of grant/token, propagate the message to
the parent chain to detect the pooint of failure - In case of release, send the message down the
chain of children to see who is supposed to send
a release - Once detected the point, recalculate the state of
the affected nodes - the ones falling on the
chain - Assuming no link failures for now, multiple
failures can be detected
23Summary
- Approaches fall into following categories
- Embedded fault tolerance into the base algorithms
- inspiring, concrete, but hard to import - Generalized approaches - demonstrate strength of
applicability but no concrete algorithmic
description - Voting approaches - can be used to reach
consensus, useful in case of multiple failures - Centralized approach - simple, predictable but
may be not desirable - Replication based approaches - simple, incur
overhead, sometimes the only solution (esp for
complex algorithms) - No prominent solution for detecting failure -
even for very specific problems/algorithms
24Questions