Title: Automatic Synthesis of Fault-Tolerance
1Automatic Synthesis of Fault-Tolerance
- Ali Ebnenasir
- Software Engineering and Network Systems Laboratory
- Computer Science and Engineering Department
- Michigan State University
- Advisor: Dr. Sandeep Kulkarni
2Problem
- Given a program p and a class of faults f,
- Question: How do we add desired fault-tolerance properties to p in order to create a new program p' that meets the requirements below?
- Requirements
  - In the absence of f, the resulting fault-tolerant program p' behaves similarly to p
  - In the presence of f, the resulting fault-tolerant program p' satisfies the desired fault-tolerance property
3Solution Strategies
- Two possible approaches
  - Redesign p and verify its correctness w.r.t. the problem requirements
    - Expensive approach
  - Automatically synthesize p' from p
    - Correct by construction
4Previous Work on Automated Synthesis
5Synthesis: Specification-Based
[Diagram: the inputs are the specification of p (temporal logic expressions/automata), the fault-tolerance requirements (temporal logic expressions), and the faults. The synthesis algorithm proves the satisfiability of the specification. The output is the fault-tolerant program p'.]
- Program synthesis and fault-tolerance synthesis: Emerson and Clarke 1982; Arora, Attie, and Emerson 1998; Attie and Emerson 2001; Kupferman and Vardi 2001
6Synthesis: Calculational
[Diagram: the inputs are the fault-intolerant program p (transitions), the fault-tolerance requirements, and the faults (transitions). The synthesis algorithm calculates the set of transitions. The output is the fault-tolerant program p' (transitions).]
- Kulkarni and Arora 2000; Kulkarni, Arora, and Chippada 2001
7The Complexity of Calculational Synthesis
- High atomicity model: processes can atomically read/write all program variables
  - Polynomial in the state space of the fault-intolerant program p [KA00]
- Low atomicity model (distributed programs): processes have read/write restrictions with respect to program variables
  - Exponential in the state space of the fault-intolerant program p for synthesizing masking fault-tolerance [KA00]
Propose techniques for the synthesis of fault-tolerant distributed programs
[KA00] S.S. Kulkarni and A. Arora, "Automating the addition of fault-tolerance", FTRTFT 2000.
8Outline
- Preliminary concepts
- Synthesis problem
- Current results
  - Theoretical issues
    - Step-wise automation
    - Polynomial-time boundary
    - Heuristics
    - Pre-synthesized fault-tolerance components
  - Practical issues
    - A framework for the synthesis of fault-tolerant programs
- Contributions
- Open problems
9Preliminary Concepts: Programs and Faults
- Program
  - Finite number of variables with finite domains
  - Finite number of processes
- State: a valuation of program variables
- Finite state space $S_p$
- State predicate X: $X \subseteq S_p$
- Program p, fault f: each is a subset of $\{(s_0, s_1) \mid (s_0, s_1) \in S_p \times S_p\}$
- Closure: X is closed in p iff every transition of p that starts in X ends in X
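The definitions on this slide can be made concrete with a small sketch. The two Boolean variables, the toy program, and the toy fault below are assumptions chosen for illustration and do not come from the slides. Closure is checked using the standard reading: a predicate is closed in a set of transitions when every transition that starts inside it also ends inside it.

```python
from itertools import product

# Hypothetical toy program with two Boolean variables a and b.
VARIABLES = ("a", "b")
# State space Sp: every valuation of the variables, encoded as a tuple (a, b).
STATE_SPACE = set(product((0, 1), repeat=len(VARIABLES)))

# A program is a set of transitions (s0, s1) drawn from Sp x Sp.
# Illustrative program: set b to 1 whenever a is 1 and b is 0.
program = {((1, 0), (1, 1))}

# A fault is also a set of transitions; this illustrative fault resets a to 0.
fault = {((1, b), (0, b)) for b in (0, 1)}


def is_closed(predicate, transitions):
    """True iff every transition that starts in `predicate` also ends in it."""
    return all(s1 in predicate for (s0, s1) in transitions if s0 in predicate)


# State predicate X = "a is 1", a subset of Sp.
X = {s for s in STATE_SPACE if s[0] == 1}

print(is_closed(X, program))  # True
print(is_closed(X, fault))    # False
```

Here X is closed in the program but not under the fault, which is the situation that motivates distinguishing an invariant from a larger fault-span.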
10Preliminary Concepts: Specifications and Fault-Tolerance
- Safety specification: something bad never happens
  - Representation: a set of (bad) transitions, a subset of $\{(s_0, s_1) \mid (s_0, s_1) \in S_p \times S_p\}$
  - E.g., transitions that change the value of a counter from non-zero values to zero
- Liveness specification: something good will eventually happen
  - In the absence of faults, the fault-tolerant program p' satisfies the liveness specification of the fault-intolerant program p
- Invariant S, fault-span $T \subseteq S_p$
- Fault-tolerance levels: failsafe, nonmasking, masking
[Diagram: state space $S_p$ with program and fault transitions]
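The counter example of a safety specification can be written down directly as a set of forbidden transitions. The sketch below assumes a counter with the hypothetical domain 0..3 and two illustrative programs; none of these names or values come from the slides.

```python
# Hypothetical counter with domain 0..3; a state is just the counter value.
STATES = range(4)

# Safety specification as a set of bad transitions:
# any transition that changes the counter from a non-zero value to zero.
bad_transitions = {(s0, s1) for s0 in STATES for s1 in STATES if s0 != 0 and s1 == 0}


def violates_safety(transitions, bad):
    """Return the transitions of a program (or fault) that the specification forbids."""
    return set(transitions) & bad


# Illustrative program: increment modulo 4, which wraps from 3 back to 0.
wrapping = {(s, (s + 1) % 4) for s in STATES}
# Illustrative alternative: saturating increment, which never returns to 0.
saturating = {(s, min(s + 1, 3)) for s in STATES}

print(violates_safety(wrapping, bad_transitions))    # {(3, 0)}
print(violates_safety(saturating, bad_transitions))  # set()
```

Representing safety as a set of transitions keeps the check local: a program is safe exactly when it shares no transition with the bad set.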
11Preliminary Concepts: Distribution Model
- Read/write restrictions (low atomicity model)
- Assumption: a process cannot write a variable that it cannot read.
- Example program p
  - Two processes j, k
  - Two Boolean variables a and b
  - Process j cannot read b, but can read and write a
- Write restrictions
  - Can we include the following transition in the set of transitions of process j?
[Figure: an example transition over variables a and b, with processes j and k]
Write restrictions identify the set of transitions of each process.
12Preliminary Concepts: Distribution Model (Continued)
- Read restrictions
  - Can we include the following transition in the set of transitions of process j?
Groups of transitions (instead of individual transitions) must be chosen.
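The grouping effect of read restrictions can be sketched for the two-variable example in which process j reads and writes a but cannot read b. The concrete transition used below is an assumption, and `group_of` is a hypothetical helper name. The idea is that j cannot tell apart two transitions that differ only in the value of b, so either both are included or neither is.

```python
from itertools import product

VARIABLES = ("a", "b")      # two Boolean variables, as in the example
READABLE = {"j": ("a",)}    # process j can read (and write) a, but cannot read b


def group_of(transition, process):
    """All transitions that `process` cannot distinguish from `transition`.

    Transitions in a group agree on every readable variable. Each unreadable
    variable is left unchanged by the transition but may hold any value.
    """
    s0, s1 = transition
    unreadable = [v for v in VARIABLES if v not in READABLE[process]]
    # A process cannot write what it cannot read, so unreadable variables
    # must keep their value across any transition of that process.
    if any(s0[VARIABLES.index(v)] != s1[VARIABLES.index(v)] for v in unreadable):
        raise ValueError("transition writes an unreadable variable")
    group = set()
    for values in product((0, 1), repeat=len(unreadable)):
        t0, t1 = list(s0), list(s1)
        for v, val in zip(unreadable, values):
            i = VARIABLES.index(v)
            t0[i] = t1[i] = val
        group.add((tuple(t0), tuple(t1)))
    return group


# Transition (a=0, b=0) -> (a=1, b=0) of process j (an assumed example).
print(sorted(group_of(((0, 0), (1, 0)), "j")))
# [((0, 0), (1, 0)), ((0, 1), (1, 1))]
```

Because the synthesis algorithm must add or remove a whole group at once, removing one unsafe transition can force the removal of safe ones, which is the source of the extra complexity in the low atomicity model.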
14Synthesis Problem
[Diagram: the inputs to the synthesis algorithm are the fault-intolerant program p, its invariant S, the specification spec, the faults f, the distribution restrictions, and the desired level of fault-tolerance (masking/nonmasking/failsafe). The outputs are the (masking/nonmasking/failsafe) fault-tolerant program p' and its invariant S'.]
- Requirements
  - No new behaviors are added in the absence of faults.
  - In the presence of faults, p' provides the desired level of fault-tolerance.
16Theoretical Issues: Step-Wise Automation
[Diagram: boxes labeled Intolerant Program, Failsafe, and Masking fault-tolerant; one connection is labeled KA00]
18Theoretical Issues: Polynomial-Time Boundary
- Complexity: reduction from 3-SAT to the problem of synthesizing failsafe fault-tolerant distributed programs
  - In general, the problem of synthesizing failsafe fault-tolerant distributed programs from their fault-intolerant version is NP-complete.
  - Intuitively, the exponential complexity is due to the inability of a process to safely estimate unreadable variables even in the absence of faults (grouping issue).
- What are the necessary and sufficient conditions for polynomial synthesis of failsafe fault-tolerant distributed programs?
  - Restrictions on
    - The transitions of the fault-intolerant programs
    - Specifications
19Theoretical Issues: Monotonicity of Specifications
- Definition: A specification spec is positive monotonic with respect to a Boolean variable x iff
  - For every (s0, s1) and (s0', s1') grouped together due to the inability to read x: [condition given as a formula on the original slide]
20Theoretical Issues: Monotonicity of Programs
- Definition: Program p with invariant S is positive monotonic with respect to a Boolean variable x iff
  - For every (s0, s1) and (s0', s1') grouped together due to the inability to read x: [condition given as a formula on the original slide]
Monotonicity requirements capture the notion that safe assumptions can be made about variables that cannot be read.
21Theoretical Issues: Monotonicity Theorem
- Sufficiency
  - If
    - Program is negative monotonic, and
    - Spec is positive monotonic
  - Or
    - Program is positive monotonic, and
    - Spec is negative monotonic
  - Then
    - Synthesis of failsafe fault-tolerance can be done in polynomial time
- Necessity: If only one of these conditions is satisfied, then synthesizing failsafe fault-tolerance remains NP-complete.
- For many problems, these requirements are easily met (e.g., Byzantine agreement, consensus, atomic commit)
22Theoretical Issues: An Example for the Monotonicity Theorem
- Dijkstra's guarded commands (actions)
  - Guard $\rightarrow$ Statement
  - $\{(s_0, s_1) \mid$ Guard holds at $s_0$ and atomic execution of Statement yields $s_1\}$
- Example: Byzantine agreement
- Safety specification of Byzantine agreement
  - Agreement: No two non-Byzantine non-generals can finalize with different decisions
  - Validity: If g is not Byzantine, then no non-Byzantine process can finalize with a decision different from that of g
- Processes: general g, and three non-generals j, k, and l
  - $d.g \in \{0, 1\}$
  - $d.j, d.k, d.l \in \{0, 1, \bot\}$
  - $b.g, b.j, b.k, b.l \in \{0, 1\}$
  - $f.j, f.k, f.l \in \{0, 1\}$
[Diagram: general g and non-generals j, k, l]
23Theoretical Issues: An Example for the Monotonicity Theorem
- Program actions for process j
  - $d.j = \bot \land f.j = 0 \rightarrow d.j := d.g$
  - $d.j \neq \bot \land f.j = 0 \rightarrow f.j := 1$
- Fault transitions for process j
  - $\lnot b.g \land \lnot b.j \land \lnot b.k \land \lnot b.l \rightarrow b.j := 1$
  - $b.j \rightarrow d.j := 0|1$
- Read/write restrictions
  - Readable variables for process j: b.j, d.j, f.j, d.g, d.k, d.l
  - Process j can write d.j, f.j
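The guarded commands of process j can be executed on explicit states to see which transitions they denote. The sketch below is one reading of the slide, under stated assumptions: the undecided value is modeled as `None`, the first program action copies the general's decision when j is undecided and not finalized, the second finalizes once a decision exists, the first fault makes j Byzantine only when no process is Byzantine yet, and the second lets a Byzantine j set its decision arbitrarily. The function names and the initial state are hypothetical.

```python
BOT = None  # stands for the undecided value (written as "bottom" in the variable domains)


def program_actions_j(s):
    """Successor states produced by the two program actions of process j."""
    succ = []
    if s["d.j"] is BOT and s["f.j"] == 0:        # undecided and not finalized
        succ.append({**s, "d.j": s["d.g"]})      # copy the general's decision
    if s["d.j"] is not BOT and s["f.j"] == 0:    # decided and not finalized
        succ.append({**s, "f.j": 1})             # finalize
    return succ


def fault_actions_j(s):
    """Successor states produced by the two fault actions affecting process j."""
    succ = []
    # Assumed guard: j may become Byzantine only if nobody is Byzantine yet.
    if not (s["b.g"] or s["b.j"] or s["b.k"] or s["b.l"]):
        succ.append({**s, "b.j": 1})
    # A Byzantine j may change its decision to either value.
    if s["b.j"]:
        succ.extend({**s, "d.j": v} for v in (0, 1))
    return succ


# Hypothetical initial state: the general has decided 1, nobody is Byzantine.
initial = {
    "d.g": 1, "d.j": BOT, "d.k": BOT, "d.l": BOT,
    "b.g": 0, "b.j": 0, "b.k": 0, "b.l": 0,
    "f.j": 0, "f.k": 0, "f.l": 0,
}

step1 = program_actions_j(initial)[0]   # j copies d.g
step2 = program_actions_j(step1)[0]     # j finalizes
print(step1["d.j"], step2["f.j"])       # 1 1
```

Each returned state paired with its source state is one transition (s0, s1) in the sense of the guarded-command semantics; the read/write restrictions show up in the fact that the program actions only consult b.j-independent readable variables and only assign d.j and f.j.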
24Theoretical Issues: An Example for the Monotonicity Theorem (Continued)
- Observation 1: Negative monotonicity of specification with respect to f.j
- Observation 2: Positive monotonicity of program, consisting of the transitions of j, with respect to f.k
- Observation 3: Positive monotonicity of specification with respect to b.j
  - The specification does not stipulate anything about the Byzantine processes
- Observation 4: Negative monotonicity of program, consisting of the transitions of j, with respect to b.k
Synthesis of an agreement program that is failsafe to Byzantine faults can be done in polynomial time.
26Theoretical Issues: Heuristics
- Heuristic: A strategy for making deterministic decisions to reduce the complexity of synthesis
- Example: Reuse the structure of nonmasking programs in the synthesis of their masking versions
[Diagram: boxes labeled Intolerant Program, Fault-Tolerance Enhancement, and Masking fault-tolerant]
28Theoretical Issues: Pre-Synthesized Fault-Tolerance Components
- What if existing heuristics fail?
- How can we reuse the techniques used in the synthesis of one program in the synthesis of another program?
- Can we encapsulate commonly encountered synthesis patterns in terms of pre-synthesized fault-tolerance components?
- Detectors and correctors are necessary and sufficient in the design of fault-tolerance [AK98]
- Detectors and correctors have the potential to provide a rich library of pre-synthesized fault-tolerance components
[AK98] A. Arora and S.S. Kulkarni, "Detectors and Correctors: A Theory of Fault-Tolerance Components", IEEE ICDCS 1998.
29Theoretical Issues: Using Pre-Synthesized Components
- If available heuristics fail to add recovery from a deadlock state sd:
  - Automatically specify the required component
  - Extract the component from the component library
  - Verify the interference-freedom of the composition
  - Add the extracted component to the fault-intolerant program
30Theoretical Issues: Pre-Synthesized Components - Achievements
- Reducing the chance of failure in the synthesis
- Providing a mechanism for the reuse of synthesis techniques
- Extending the scope of the synthesis problem to cases where the state space is expanded during the synthesis
- Controlling the way new variables are introduced
32Practical Issues: Framework Goals
- Goals of the framework design
  - Ability to synthesize fault-tolerant programs from their fault-intolerant versions
  - Ability to integrate new heuristics into the framework
  - Ability to change implementation
33Practical Issues: Synthesis Framework
[Diagram: the user (fault-tolerance developer) works through an interactive user interface that exchanges queries and results with the synthesis algorithm. The input is p, S, f, spec and the output is p', S'. The synthesis algorithm sends component specifications to the library of pre-synthesized fault-tolerance components and receives pre-synthesized detectors/correctors. Format labels on the diagram: guarded commands and state predicates; guarded commands/Promela and state predicates.]
34Practical Issues: Framework Internals - Synthesis Algorithm
[Flowchart: the input is the fault-intolerant program (p, S, f, spec); the output is the fault-tolerant program (p', S') together with the reachability graph of the fault-tolerant program. Box labels, in the order extracted:]
- Interaction points
- Initialization
- Preserve invariant
- Modify invariant
- Expand the reachability graph
- Calculate a valid fault-span
- Calculate a valid invariant
- Remove bad transitions
- Ensure safety
- Ensure deadlock freedom
- Remove bad states
- Ensure deadlock freedom
- Resolve non-progress cycles
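Two of the step names on this slide, calculating a fault-span and ensuring deadlock freedom, can be illustrated with a simplified sketch for the high atomicity case (no grouping). It assumes that a fault-span can be taken as the set of states reachable from the invariant by program and fault transitions, and that deadlock freedom of a candidate invariant is obtained by repeatedly discarding states with no program transition staying inside it. The toy transition sets and function names are hypothetical; this is not the framework's actual code.

```python
from collections import deque


def reachable(start, transitions):
    """States reachable from `start` via `transitions` (breadth-first search)."""
    successors = {}
    for s0, s1 in transitions:
        successors.setdefault(s0, set()).add(s1)
    seen, queue = set(start), deque(start)
    while queue:
        s = queue.popleft()
        for nxt in successors.get(s, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def remove_deadlocks(states, program):
    """Repeatedly drop states with no program transition that stays inside `states`."""
    states = set(states)
    while True:
        alive = {s0 for (s0, s1) in program if s0 in states and s1 in states}
        deadlocked = states - alive
        if not deadlocked:
            return states
        states -= deadlocked


# Hypothetical toy system over states 0..4.
program = {(0, 1), (1, 0), (3, 0), (4, 2)}
fault = {(1, 2), (2, 3)}
candidate_invariant = {0, 1, 4}

# State 4 can only leave the candidate set, so it is discarded.
invariant = remove_deadlocks(candidate_invariant, program)
print(sorted(invariant))    # [0, 1]

# Fault-span: everything reachable from the invariant under program and faults.
fault_span = reachable(invariant, program | fault)
print(sorted(fault_span))   # [0, 1, 2, 3]
```

A full algorithm would also remove unsafe transitions and check that recovery from the fault-span back to the invariant is possible; those steps are omitted here to keep the sketch short.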
35Practical Issues: Current Status of the Framework
- Example synthesized programs
  - Token ring with 7 processes
  - Byzantine agreement with 4 non-general processes and one general process
  - An agreement program that is subject to both Byzantine and fail-stop faults (1.3 million states)
- Currently, the framework can
  - handle different types of faults (e.g., process restart, Byzantine, fail-stop)
  - synthesize programs that are simultaneously subject to multiple types of faults
37Contributions
- Showing the NP-completeness of synthesizing failsafe fault-tolerance
- Identifying the necessary and sufficient conditions for polynomial-time synthesis of failsafe fault-tolerance
- Reusing the computational structure of fault-intolerant programs to reduce the complexity of synthesis (enhancement)
- Identifying synthesis patterns as pre-synthesized fault-tolerance components
- Developing a framework for the synthesis of fault-tolerant programs
39Open Problems
- Theoretical issues
  - Converting non-monotonic programs/specifications to monotonic ones
    - Extending the scope of the programs that can reap the benefit of efficient automation
  - Necessary and sufficient conditions for simultaneous addition of multiple pre-synthesized fault-tolerance components
  - Necessary and sufficient conditions for polynomial-time synthesis of nonmasking fault-tolerant programs
  - Automated synthesis of multitolerance
40Open Problems - Continued
- Practical issues
  - Distributed synthesis algorithm
  - Symbolic synthesis of fault-tolerance
[Diagram: a distributed synthesis algorithm sends queries (verify safety, closure, cycle detection, ...) to several SAT solvers, each of which answers Y/N]
41Open Problems - Continued
- Using model checkers for acquiring behavioral information during synthesis
[Diagram: a distributed synthesis algorithm connected to several SPIN instances]
42Publications
- Published papers
  - Sandeep S. Kulkarni and Ali Ebnenasir. "Enhancing the Fault-Tolerance of Nonmasking Programs". IEEE ICDCS 2003.
  - Ali Ebnenasir. "Algorithmic Synthesis of Fault-Tolerant Distributed Programs". Doctoral Symposium of ICDCS 2003.
  - Sandeep S. Kulkarni and Ali Ebnenasir. "The Complexity of Adding Failsafe Fault-Tolerance". IEEE ICDCS 2002.
- Submitted papers
  - Sandeep S. Kulkarni and Ali Ebnenasir. "Adding Fault-Tolerance Using Pre-Synthesized Components". Submitted to CBSE7, ICSE 2004.
  - Ali Ebnenasir and Sandeep S. Kulkarni. "A Framework for Automatic Synthesis of Fault-Tolerance". Submitted to DSN 2004.
  - Sandeep S. Kulkarni and Ali Ebnenasir. "Automated Synthesis of Multitolerance". Submitted to DSN 2004.
43Thank you!