Title: The Complexity of Adding Failsafe Fault-tolerance
1The Complexity of Adding Failsafe Fault-tolerance
- Sandeep S. Kulkarni
- Ali Ebnenasir
2Motivations
- Why automatic addition of fault-tolerance?
- Why begin with a fault-intolerant program?
- Reuse of the fault-intolerant program
- Separation of concerns (functionality vs. fault-tolerance)
- Potential to preserve properties such as efficiency
- One obstacle
- Adding masking fault-tolerance to distributed programs is NP-hard [FTRTFT 2000]
3Motivation (Continued)
- Approach for dealing with complexity
- Heuristics [SRDS 2001]
- Weaker form of tolerance
- Failsafe
- Safety only in the presence of faults
- Nonmasking
- Safety may be temporarily violated
- Restricting input
- Programs
- Specifications
4Motivation (Continued)
- Why failsafe fault-tolerance?
- Simplify the design of masking
- Partial automation of masking fault-tolerance (using TSE98)
[Figure: diagram containing an "Intolerant Program" box and two steps labeled "Automate"]
5Outline of the Talk
- Problem of adding fault-tolerance
- Difficulties caused by distribution
- Complexity of failsafe fault-tolerance
- Class of programs and specifications for which
polynomial synthesis is possible
6Basic Concepts: Programs and Faults
- State space S_p
- Program transitions δ_p, faults δ_f
- Invariant S, fault-span T
- Specification spec: Safety is specified by transitions (s_j, s_k) that should not be executed
[Figure: regions labeled T and S]
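The concepts on this slide can be made concrete with a small sketch. The Python fragment below is a hypothetical toy model, not taken from the talk: states are integers, program and fault transitions are sets of pairs, and the safety specification is the set of transitions that must not be executed. The helper names `is_closed`, `reachable`, and `violates_safety`, and all concrete values, are assumptions made for illustration.

```python
# Toy model; every concrete value here is an illustrative assumption.
state_space = {0, 1, 2, 3}              # S_p
delta_p = {(0, 1), (1, 0), (2, 0)}      # program transitions
delta_f = {(1, 2), (2, 3)}              # fault transitions
bad_transitions = {(2, 3)}              # safety spec: must never be executed
invariant = {0, 1}                      # S

def is_closed(states, transitions):
    """A state set is closed if no transition that starts in it leaves it."""
    return all(t in states for (s, t) in transitions if s in states)

def reachable(start, transitions):
    """States reachable from `start` by following the given transitions."""
    seen, frontier = set(start), list(start)
    while frontier:
        s = frontier.pop()
        for (a, b) in transitions:
            if a == s and b not in seen:
                seen.add(b)
                frontier.append(b)
    return seen

def violates_safety(states, transitions):
    """True if some bad transition can be executed from a state in `states`."""
    return any((s, t) in bad_transitions
               for (s, t) in transitions if s in states)

# One candidate fault-span T: everything reachable from S when program
# and fault transitions are both allowed.
fault_span = reachable(invariant, delta_p | delta_f)
```

In this toy model the invariant is closed under program transitions and safe in the absence of faults, but a bad transition becomes executable inside the fault-span, which is the situation failsafe fault-tolerance is meant to rule out.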
7Problem Statement
- Inputs: program p, invariant S, faults f, specification spec
- Outputs: program p', invariant S'
- Requirements: Only fault-tolerance is added; no new functional behavior is added
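The requirement that no new functional behavior is added can be sketched as a check on the input and output of the transformation. The fragment below assumes one common reading, which the slide does not spell out: the new invariant is contained in the old one, and inside the new invariant the new program uses only transitions of the old program. The names `restrict` and `adds_no_new_behavior` are hypothetical.

```python
def restrict(transitions, states):
    """Transitions that both start and end inside `states`."""
    return {(s, t) for (s, t) in transitions if s in states and t in states}

def adds_no_new_behavior(p, S, p_new, S_new):
    """Assumed reading of 'only fault-tolerance is added':
    S_new is a subset of S, and within S_new the new program
    has no transition that the old program lacks."""
    return S_new <= S and restrict(p_new, S_new) <= restrict(p, S_new)

# Old program and invariant (illustrative values).
p = {(0, 1), (1, 0), (1, 2)}
S = {0, 1, 2}

# A candidate output: state 2 is dropped from the invariant, and a
# transition (3, 0) is added outside the invariant.
p_ok = {(0, 1), (1, 0), (3, 0)}
S_ok = {0, 1}

# A candidate that invents a self-loop inside the invariant.
p_bad = {(0, 0)}
S_bad = {0}
```

Under this reading, transitions that start outside the new invariant are unconstrained, so `p_ok` passes while `p_bad` does not.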
8Difficulties with Distribution
- Read/Write restrictions
- Two Boolean variables a and b
- Process cannot read b
- Can we include the following transition?
Groups of transitions (instead of individual
transitions) must be chosen.
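A minimal sketch of why transitions come in groups, using the two Boolean variables a and b from this slide. It assumes that a process which cannot read a variable also cannot write it, so the unreadable variable keeps its value across the transition. The function name `group_of` is hypothetical.

```python
from itertools import product

def group_of(source, target, unreadable):
    """All transitions a process cannot tell apart from source -> target.

    States are dicts of Boolean variables. The process cannot read the
    variables in `unreadable`, so the same update must be allowed for
    every value those variables may have.
    """
    # Assumption: unreadable variables are not written by this process.
    assert all(source[v] == target[v] for v in unreadable)
    group = []
    for values in product([False, True], repeat=len(unreadable)):
        hidden = dict(zip(unreadable, values))
        group.append(({**source, **hidden}, {**target, **hidden}))
    return group

# The process wants to set a from False to True in a state where b is False,
# but it cannot read b.
group = group_of({"a": False, "b": False}, {"a": True, "b": False}, ["b"])
```

Here the group contains two transitions, one for each value of b. Including or excluding the transition with b false forces the same choice for the transition with b true.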
9Reduction from 3-SAT
10Dealing with the Complexity of Adding Failsafe
Fault-tolerance
- For what class of problems can failsafe fault-tolerance be added in polynomial time?
- Restrictions on
- Fault-tolerant programs
- Specifications
- Faults
- Our approach for restrictions
- In the absence of faults, preserve all
computations of the fault-intolerant program
11Restrictions on Programs and Specifications
- Monotonicity requirements
- Capture the notion that safe assumptions can be made about variables that cannot be read
- Focus on specifications and transitions of fault-intolerant programs
12Monotonicity of Specifications
- Definition: A specification spec is positive monotonic with respect to variable x iff
- For every s0, s1, s0', s1'
- The values of all other variables in s0 and s0' are the same
- The values of all other variables in s1 and s1' are the same
13Monotonicity of Programs
- Definition: Program p with invariant S is negative monotonic with respect to variable x iff
- For every s0, s1, s0', s1'
- The values of all other variables in s0 and s0' are the same
- The values of all other variables in s1 and s1' are the same
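The slide text lists only part of the monotonicity conditions, so the following Python sketch rests on an assumed reading: a specification is positive monotonic in a Boolean variable x if any safe transition with x false at both ends stays safe when x is true at both ends, and a program is negative monotonic in x if any program transition with x true at both ends is still a program transition when x is false at both ends. The restriction to the invariant is left out for brevity, and all function names are hypothetical.

```python
from itertools import product

def all_states(variables):
    """All assignments of True/False to the given Boolean variables."""
    return [dict(zip(variables, vals))
            for vals in product([False, True], repeat=len(variables))]

def flip(state, x, value):
    """Copy of `state` with variable x set to `value`; other variables unchanged."""
    return {**state, x: value}

def spec_is_positive_monotonic(violates, variables, x):
    """Assumed reading: safe with x false implies safe with x true."""
    for s0, s1 in product(all_states(variables), repeat=2):
        if s0[x] or s1[x]:
            continue  # only consider transitions where x is false at both ends
        if not violates(s0, s1) and violates(flip(s0, x, True), flip(s1, x, True)):
            return False
    return True

def program_is_negative_monotonic(in_program, variables, x):
    """Assumed reading: a transition with x true also exists with x false."""
    for s0, s1 in product(all_states(variables), repeat=2):
        if not (s0[x] and s1[x]):
            continue  # only consider transitions where x is true at both ends
        if in_program(s0, s1) and not in_program(flip(s0, x, False), flip(s1, x, False)):
            return False
    return True

VARS = ["x", "y"]

def sets_y(s0, s1):
    """True if the transition changes y from False to True and nothing else."""
    return (not s0["y"]) and s1 == {**s0, "y": True}

# Illustrative spec: setting y is unsafe only while x is false.
unsafe_when_x_false = lambda s0, s1: (not s0["x"]) and sets_y(s0, s1)
# Illustrative spec: setting y is unsafe only while x is true.
unsafe_when_x_true = lambda s0, s1: s0["x"] and sets_y(s0, s1)
# Illustrative program that sets y only when x is true.
only_when_x_true = lambda s0, s1: s0["x"] and sets_y(s0, s1)
```

The intuition matches the slide on monotonicity requirements: when both properties hold, a process that cannot read x can act as if x had the value that is safe to assume.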
14Theorem
- Adding failsafe fault-tolerance can be done in polynomial time if either
- Program is negative monotonic, and
- Spec is positive monotonic
- Or
- Program is positive monotonic, and
- Spec is negative monotonic
- If only one of these conditions is satisfied then adding failsafe fault-tolerance is still NP-hard
- For many problems, these requirements are easily met
15Example Byzantine Agreement
- Processes: General, g, and three non-generals j, k, and l
- Variables
- d.g : {0, 1}
- d.j, d.k, d.l : {0, 1, ⊥}
- b.g, b.j, b.k, b.l : {true, false}
- f.g, f.j, f.k, f.l : {0, 1}
- Fault-intolerant program transitions
- d.j = ⊥ /\ f.j = 0  ->  d.j := d.g
- d.j ≠ ⊥ /\ f.j = 0  ->  f.j := 1
- Fault transitions
- ¬b.g /\ ¬b.j /\ ¬b.k /\ ¬b.l  ->  b.j := true
- b.j  ->  d.j, f.j := 0|1, 0|1
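The fault-intolerant program on this slide can be exercised with a short simulation. The sketch below assumes the two actions of a non-general are "copy d.g while undecided" and "finalize once decided", and that a Byzantine process may overwrite its own d and f arbitrarily. The state layout, the initial values, and the names `step`, `byzantine_fault`, and `initial_state` are hypothetical.

```python
BOTTOM = None  # stands for the undecided value of d.j, d.k, d.l

def initial_state(general_decision):
    """Illustrative initial state: nobody is Byzantine, nobody has decided."""
    return {
        "d": {"g": general_decision, "j": BOTTOM, "k": BOTTOM, "l": BOTTOM},
        "f": {"g": 0, "j": 0, "k": 0, "l": 0},
        "b": {"g": False, "j": False, "k": False, "l": False},
    }

def step(state, p):
    """Execute one enabled program action of non-general p, if any."""
    d, f = state["d"], state["f"]
    if d[p] is BOTTOM and f[p] == 0:
        d[p] = d["g"]      # copy the decision of the general
        return True
    if d[p] is not BOTTOM and f[p] == 0:
        f[p] = 1           # finalize the decision
        return True
    return False

def byzantine_fault(state, p, new_d, new_f):
    """Fault action: a Byzantine process changes d and f arbitrarily."""
    if not state["b"][p]:
        return False
    state["d"][p], state["f"][p] = new_d, new_f
    return True

# Fault-free run: every non-general copies d.g and then finalizes.
state = initial_state(1)
for proc in ("j", "k", "l"):
    while step(state, proc):
        pass
```

In the absence of faults, all three non-generals end with the decision of the general and f equal to 1.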
16Example Byzantine Agreement (Continued)
- Safety Specification
- Agreement: No two non-Byzantine non-generals can finalize with different decisions
- Validity: If g is not Byzantine, no process can finalize with a decision different from that of g
- Read/Write restrictions
- Readable variables for process j
- b.j, d.j, f.j
- d.g, d.k, d.l
- Process j can write
- d.j, f.j
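Agreement and validity can be written as predicates on a single state, which is a convenient way to test candidate programs. The sketch below is an illustration under assumptions: the state layout and the helper `make_state` are hypothetical, and validity is restricted to non-Byzantine non-generals, which the slide does not state explicitly.

```python
NON_GENERALS = ("j", "k", "l")

def make_state(d, f, byzantine=()):
    """Hypothetical state layout: decisions d, finalization flags f, Byzantine set."""
    return {"d": d, "f": f,
            "b": {p: p in byzantine for p in ("g",) + NON_GENERALS}}

def finalized_decisions(state):
    """Decisions of non-Byzantine non-generals that have finalized."""
    return [state["d"][p] for p in NON_GENERALS
            if not state["b"][p] and state["f"][p] == 1]

def agreement_holds(state):
    """No two non-Byzantine non-generals finalized with different decisions."""
    return len(set(finalized_decisions(state))) <= 1

def validity_holds(state):
    """If g is not Byzantine, finalized decisions equal the decision of g."""
    if state["b"]["g"]:
        return True
    return all(dec == state["d"]["g"] for dec in finalized_decisions(state))

def is_safe(state):
    return agreement_holds(state) and validity_holds(state)

good = make_state({"g": 1, "j": 1, "k": 1, "l": None}, {"j": 1, "k": 1, "l": 0})
split = make_state({"g": 1, "j": 1, "k": 0, "l": None}, {"j": 1, "k": 1, "l": 0})
excused = make_state({"g": 1, "j": 1, "k": 0, "l": None}, {"j": 1, "k": 1, "l": 0},
                     byzantine=("k",))
```

The state `split` violates both properties, while `excused` is safe because the disagreeing process k is Byzantine.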
17Example Byzantine Agreement (Continued)
- Observation 1
- Positive monotonicity of specification with respect to b.j
- Observation 2
- Negative monotonicity of program, consisting of the transitions of j, with respect to b.k
- Observation 3
- Negative monotonicity of specification with respect to f.j
- Observation 4
- Positive monotonicity of program, consisting of the transitions of j, with respect to f.k
18Summary
- Complexity analysis for failsafe fault-tolerance
- Reduction from 3-SAT
- Restrictions on specifications and programs for which polynomial synthesis is possible
- Several problems fall in this category
- Byzantine agreement, consensus, commit,
- Necessity of these restrictions
19Future Work
- Simplifying the design of masking fault-tolerance using the two-step approach
- Refining boundary between classes for which polynomial synthesis is possible and for which exponential complexity is inevitable
- Using monotonicity requirements for simplifying masking fault-tolerance
20Thank You
21Future Work
- Conclusion
- Specifying the boundary
- Fault-tolerance addition can be done in polynomial time
- Exponential complexity is inevitable
- Goal: what problems can benefit from automation?
- Necessity and sufficiency of monotonicity requirements
- Future Work
- How can we change a non-monotonic program to a monotonic one by modifying its invariant?
- How can we strengthen a non-monotonic specification to a monotonic one?
- How can a nonmasking program be designed manually to satisfy monotonicity requirements?
22Basic Concepts Fault-tolerant Program
- Fault-tolerance in the presence of faults
- Failsafe: Satisfies its safety specification
- Nonmasking: Satisfies its liveness specification
- (safety may be violated temporarily)
- Masking: Satisfies safety and liveness specification
23The complexity of Adding Failsafe
fault-tolerance
- Adding (failsafe/nonmasking/masking) fault-tolerance in high atomicity model is in P
- Adding masking fault-tolerance to distributed programs is in NP
- How about failsafe?
- Adding failsafe to distributed programs is NP-hard!! (proof in the paper)
- Reduction of 3-SAT to the problem of failsafe fault-tolerance addition
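To give a feel for why the high atomicity case is tractable, here is a hedged Python sketch of one polynomial-time procedure for adding failsafe fault-tolerance when every process can read and write all variables. It is an illustration, not the algorithm from the paper: it assumes explicit state sets, it assumes every invariant state of the input program has an outgoing program transition, and the name `add_failsafe` is hypothetical.

```python
def add_failsafe(S, p, f, bad):
    """Return (S_new, p_new), or None if this sketch finds no failsafe program.

    S: invariant, p: program transitions, f: fault transitions,
    bad: transitions forbidden by the safety specification.
    """
    # ms: states from which fault transitions alone can execute a bad transition.
    ms = {s for (s, t) in f if (s, t) in bad}
    changed = True
    while changed:
        changed = False
        for (s, t) in f:
            if t in ms and s not in ms:
                ms.add(s)
                changed = True

    # Drop program transitions that are bad or that lead into ms.
    p_new = {(s, t) for (s, t) in p if (s, t) not in bad and t not in ms}

    # Shrink the invariant: remove ms, then repeatedly remove states that
    # have no remaining transition staying inside the candidate invariant.
    S_new = set(S) - ms
    changed = True
    while changed:
        changed = False
        for s in list(S_new):
            if not any(a == s and b in S_new for (a, b) in p_new):
                S_new.discard(s)
                changed = True
    if not S_new:
        return None

    # Keep the new invariant closed: inside it, only transitions that stay inside.
    p_new = {(s, t) for (s, t) in p_new if s not in S_new or t in S_new}
    return S_new, p_new

# Illustrative input: a fault takes state 1 to state 2, from which the
# program could execute the bad transition (2, 3).
S = {0, 1}
p = {(0, 1), (1, 0), (2, 3), (2, 0)}
result = add_failsafe(S, p, {(1, 2)}, {(2, 3)})

# If the fault itself can execute the bad transition, state 1 becomes
# unsafe to reach and no nonempty invariant survives in this example.
no_result = add_failsafe(S, p, {(1, 2), (2, 3)}, {(2, 3)})
```

Each loop adds or removes at least one state per pass, so the running time is polynomial in the size of the state space. The distributed case is harder because transitions can only be removed in groups.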
24Our Approach
- Stepwise towards masking fault-tolerance
- Automating the addition of failsafe fault-tolerance
- How hard is adding failsafe fault-tolerance?
- Polynomial time boundaries for failsafe tolerance
addition?