Seminarie Informatica

1
Seminarie Informatica

Fault-tolerant Systems: The Software Viewpoint

A series of seminars coordinated by Vincenzo De Florio
http://www.pats.ua.ac.be
2
The matter

The exam
The topics
This lecture
Application-level fault tolerance provisions

3
Introduction to the exam

Seminarie informatica
10 seminars on hot topics of computer science
Topic of this cycle: software fault-tolerant systems
Next 3 seminars: 15 and 22 November, 6 December
Next year's seminars to be announced on
http://www.win.ua.ac.be/vincenz/si/0607.html

4
Introduction to the exam

Oral discussion of 2 papers
A 5-6 page paper based on one or more of the topics of the seminars
A paper analysing a case study
See later for examples
Evaluation criteria:
Do the papers contain original ideas, or do they follow the seminar too strictly?
Does the author understand the subject? Is (s)he able to reason independently about it?
Papers must be submitted by May 15, 2007
E-mail: vincenzo.deflorio_at_ua.ac.be

5
The Topics
Dependability: the property of a system such that reliance can justifiably be placed on the service it delivers
Fault tolerance: one of the means of dependability
6
The Dependability Tree
7
Fault tolerance (FT)
A fault-tolerant system is a system that continues to function in spite of faults, such as:
a defective IC
a bug in a program
an operation fault
sensor drift
8
Attributes of dependability

Availability
Readiness for usage
A(t): probability that the system conforms to its specification at time t
Reliability
Continuity of service
R(t): probability that the system conforms to its specification throughout [t0, t], provided that it does so at t0
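For a repairable system, A(t) is often summarised by its long-run (steady-state) value, commonly estimated as MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR the mean time to repair. This formula is a standard simplification assumed here, not something stated on the slide; the C function below is a minimal sketch of it, with a hypothetical name and -1.0 as an illustrative error value.

```c
/* Steady-state availability of a repairable system (assumed textbook
   simplification of A(t) for large t): A = MTTF / (MTTF + MTTR).
   mttf: mean time to failure; mttr: mean time to repair.
   Both must use the same unit (e.g. hours). Returns -1.0 on bad input. */
double steady_state_availability(double mttf, double mttr)
{
    if (mttf < 0.0 || mttr < 0.0 || mttf + mttr == 0.0)
        return -1.0;              /* hypothetical error convention */
    return mttf / (mttf + mttr);  /* long-run fraction of "up" time */
}
```

For instance, MTTF = 999 h and MTTR = 1 h give a steady-state availability of 0.999. A highly available system can still have low reliability if it fails often but is repaired quickly: the two attributes answer different questions.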

9
Attributes of dependability (2)

Safety
Non-occurrence of catastrophic consequences on the environment
S(t): probability that the system, at time t, either conforms to its specification or has reached a safe halt
Fail-safe systems

10
Attributes of dependability (3)

Maintainability
Aptitude to undergo repairs and evolution
M(t): probability that the system is back to its specification at time t, given that it failed at t0

11
Attributes of dependability (4)

Confidentiality
Non-occurrence of unauthorised disclosure of
information
Integrity
Non-occurrence of improper alterations of
information

12
Related attributes

Testability
Ability to test features of a system
Related to maintainability

13
Related attributes

Security
Integrity, availability and confidentiality combined

14
References

Jean-Claude Laprie, "Dependable Computing and Fault Tolerance: Concepts and Terminology," in Proc. of the 15th Int. Symposium on Fault-Tolerant Computing (FTCS-15), Ann Arbor, Mich., June 1985, pp. 2-11.
Jean-Claude Laprie, "Dependability: Its Attributes, Impairments and Means," in Predictably Dependable Computing Systems, ESPRIT Basic Research Series, B. Randell, J.-C. Laprie, H. Kopetz and B. Littlewood (eds.), Springer Verlag, 1995, pp. 3-18.

15
The lecture

We now focus on application-level fault tolerance (ALFT)
Why do we need ALFT? Why do we need software FT in the first place?
We explain why
We survey the existing methods and assess their pros and cons against a set of properties
Surprising conclusion: still an open problem

16
Structure

Introduction
Identification of main problems to tackle / key
properties to achieve
Qualitative survey
Sketch of a possible ideal solution
Conclusions

17
Software Fault Tolerance

Human society increasingly expects and relies on high-quality, complex services supplied by computers

18
Software Fault Tolerance

Consequences of a failure in the 40s (computers as fast solvers of numerical problems):
Errors in computations, long downtimes

Consequences of a failure nowadays (computers controlling nuclear plants, airborne equipment, healthcare):
Incalculable penalty (catastrophes)

19
Software Fault Tolerance

Traditional answer: hardware fault tolerance
This is an important ingredient, but not the
only one needed today!

Complexity is also in the SW layers
Hierarchies of complex abstract machines

20
Software Fault Tolerance

Complexity is also in SW layers (cont'd)
Software is often networked and distributed
Relationships among software components are often complex
Object model ⇒ easier SW reuse ⇒ hidden rather than explicit complexity

21
Software Fault Tolerance

In conclusion: no amount of verification, validation and testing can eliminate all faults in an application and give complete confidence in the availability and data consistency of applications

Fault tolerance in SW is key

SW failures can have consequences as severe as those of HW failures

Ariane 5!
22
Problems of SW FT
The lighter the color, the more general-purpose the (virtual) machine
The lighter the color, the more complex the problem of expressing fault tolerance
23
Problems of Application-levelFault Tolerance

The only alternative, effective means of increasing software reliability is to incorporate provisions for software fault tolerance (SFT) in the application software
The application software then has to manage:
Functional aspects
Fault tolerance (FT) aspects
at the same time / in the same space

24
Problems and properties of Application-level
Fault Tolerance

Hazard: code intrusion
FT provisions are specified side by side with the service
Conflicting design concerns
Overall design complexity increases
Larger development and maintenance costs and times
Larger probability of introducing software bugs
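A minimal, hypothetical illustration of code intrusion in C: the functional concern (averaging sensor samples) and the FT concern (plausibility checks, fallback to the last known good value) end up interleaved in one function. The sensor range, the fallback policy and all names are assumptions made only for this sketch.

```c
#include <stddef.h>

/* Hypothetical plausibility range for a sensor reading. */
#define SENSOR_MIN (-50.0)
#define SENSOR_MAX (150.0)

/* FT state kept right next to the service code. */
static double last_good = 0.0;

/* Functional concern: average n samples.
   FT concern: drop implausible samples, fall back to the last good value.
   Both live in the same function: this is the code-intrusion hazard. */
double average_reading(const double *samples, size_t n)
{
    double sum = 0.0;
    size_t valid = 0;

    for (size_t i = 0; i < n; i++) {
        /* FT: error detection mixed into the functional loop */
        if (samples[i] < SENSOR_MIN || samples[i] > SENSOR_MAX)
            continue;
        sum += samples[i];             /* functional */
        valid++;
    }

    if (valid == 0)                    /* FT: recovery by fallback */
        return last_good;

    last_good = sum / (double)valid;   /* FT: state update */
    return last_good;                  /* functional result */
}
```

Changing the fault model (say, the plausible range or the fallback policy) means editing the service code itself.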

25
Problems and properties of Application-level
Fault Tolerance

Separation of design concerns (SDC)
In what follows, we call ALFT a means to express fault tolerance in the application software
A criterion to compare ALFTs is their degree of SDC

26
Problems and properties of Application-level
Fault Tolerance

Hazard: porting code ≠ porting service
FT code assumes a fault model f(e)
If e changes, or
if the code is moved to another environment e',
the QoS may degrade

27
Problems and properties of Application-level
Fault Tolerance

Hazard: porting code ≠ porting service

370 million euros down the drain

An interesting case: Ariane 5 flight 501
Ariane 4 mission software re-used in Ariane 5
The early part of the trajectory of Ariane 5 differed from that of Ariane 4 and resulted in considerably higher horizontal velocity values

This could be a case study for the exam
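The following C sketch illustrates how a fault model can be silently hardwired into code. The flight 501 failure is commonly traced to an unprotected conversion of a 64-bit floating-point value related to horizontal velocity into a 16-bit signed integer: safe for Ariane 4 trajectories, out of range for Ariane 5's. The original software was written in Ada, not C; the function names here are hypothetical, and saturation is just one illustrative recovery choice.

```c
#include <stdint.h>
#include <stdbool.h>

/* Unchecked conversion: correct only under the implicit fault model
   "the value always fits in 16 bits". In C, converting an out-of-range
   double to int16_t is undefined behaviour. */
int16_t convert_unchecked(double value)
{
    return (int16_t)value;
}

/* Checked conversion: the environmental assumption is made explicit.
   Returns false (and saturates *out) when the value does not fit. */
bool convert_checked(double value, int16_t *out)
{
    if (value != value) {          /* NaN */
        *out = 0;
        return false;
    }
    if (value > INT16_MAX) {
        *out = INT16_MAX;
        return false;
    }
    if (value < INT16_MIN) {
        *out = INT16_MIN;
        return false;
    }
    *out = (int16_t)value;         /* truncates toward zero */
    return true;
}
```

The checked version only detects the violation; what to do next (saturate, switch to a backup, signal an error) is itself part of the fault model and may need to change with the environment.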
28
Problems and properties of Application-level
Fault Tolerance

Problem: service portability
Porting FT does not come for free
Hardwired fault model: static environment assumed
More difficult to adapt / test / maintain
More prone to "Ariane 5 effects"

"What is the most often overlooked risk in software engineering? That the environment will do something the designer never anticipated."

J. Horning
29
Problems and properties of Application-level
Fault Tolerance

Adaptability (AD)
Does the ALFT provide means to adapt dynamically to new environmental conditions?
A criterion to compare 2 ALFTs is their degree of AD

30
Problems and properties of Application-level
Fault Tolerance

Problem: adding complexity can decrease dependability
The ALFT (the means to express FT) must be based on a simple strategy
It must be syntactically adequate to host several mechanisms

31
Problems and properties of Application-level
Fault Tolerance

Hazard:
"Languages shape the way we think" (Whorf)

"If all you have is a hammer, everything looks like a nail" (/usr/share/fortune)

...but is it really a nail?

Syntactical Adequacy (SA)
Does the ALFT provide simple means to host many FT solutions?
A criterion to compare 2 ALFTs is their degree of SA

32
Summary

Separation of design concerns (SDC)
Adaptability (AD)
Syntactical Adequacy (SA)

A base of attributes we can use to compare
ALFTs with one another

33
System structures for SFT

Single-version FT
Multiple-version FT
Object model
Linda Model
FT Languages
Recovery metaprogram

Each of these could be a case study for the exam
34
Single-version Fault Tolerance

Single-version SFT: embedding a set of error detection / recovery features in the user application of a simplex system
Explicit code intrusion (bad SDC)
Increases size and complexity (bad SA)
Bad for transparency, maintainability, portability
Increases development times and costs
No support for dynamic adaptability (bad AD)
Libraries:
SwIFT, HATS, EFTOS
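Single-version FT libraries typically package reusable error-detection tools such as watchdog timers. Below is a hypothetical, single-threaded C sketch of a software watchdog driven by logical ticks: the application must "kick" it periodically, and the monitor declares an error when too many ticks pass without a kick. It is not the API of SwIFT, HATS or EFTOS; names and tick-based timing are assumptions. Note that the kick calls must still be placed inside the application, so the intrusion remains.

```c
#include <stdbool.h>

/* Hypothetical software watchdog driven by logical clock ticks. */
typedef struct {
    unsigned timeout;     /* max ticks allowed between two kicks */
    unsigned since_kick;  /* ticks elapsed since the last kick */
    bool     expired;     /* latched once the timeout is exceeded */
} watchdog;

void wd_init(watchdog *wd, unsigned timeout)
{
    wd->timeout = timeout;
    wd->since_kick = 0;
    wd->expired = false;
}

/* Called from the application code: this call is the intrusion. */
void wd_kick(watchdog *wd)
{
    wd->since_kick = 0;
}

/* Called by the monitor at every tick; returns true while healthy. */
bool wd_tick(watchdog *wd)
{
    if (!wd->expired && ++wd->since_kick > wd->timeout)
        wd->expired = true;   /* error detected: hand over to recovery */
    return !wd->expired;
}
```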

35
Multiple-version Fault Tolerance

Multiple-version SFT: N-version programming (NVP) and recovery blocks (RB)
Idea: redundancy through independently designed versions of the software
Randell (1975): "All fault tolerance must be based on the provision of useful redundancy, both for error detection and error recovery. In software the redundancy required is not simple replication of programs but redundancy of design."
Assumption: random component failures. Correlated failures ⇒ sudden exhaustion of the available redundancy
Again, Ariane 5 flight 501: two crucial components were operating in parallel with identical hardware and software

36
Multiple-version Fault Tolerance
```c
#include <ftmacros.h>
...
ENSURE(acceptance-test)
    Alternate 1
ELSEBY
    Alternate 2
...
ENSURE
```
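For readers without the macro header, here is a self-contained C sketch of the recovery block scheme, assuming its usual structure: establish a checkpoint, run the primary alternate, check the result with the acceptance test, and on rejection restore the checkpoint and try the next alternate. The state type, the alternates and the acceptance test are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical application state protected by the recovery block. */
typedef struct { double x; } state_t;

typedef void (*alternate_fn)(state_t *);
typedef bool (*acceptance_fn)(const state_t *);

/* Returns the index of the alternate that passed the acceptance test,
   or -1 if all failed (state is then restored to the checkpoint). */
int recovery_block(state_t *s, alternate_fn alts[], size_t n,
                   acceptance_fn accept)
{
    const state_t checkpoint = *s;      /* establish recovery point */
    for (size_t i = 0; i < n; i++) {
        alts[i](s);                     /* try alternate i */
        if (accept(s))
            return (int)i;              /* accepted: leave the block */
        *s = checkpoint;                /* roll back before next try */
    }
    return -1;                          /* all alternates failed */
}

/* Example alternates computing sqrt(2) (illustrative only). */
static void buggy_primary(state_t *s) { s->x = -1.0; }   /* faulty */

static void newton_secondary(state_t *s)
{
    double g = 1.0;
    for (int k = 0; k < 20; k++)        /* Newton iteration for sqrt(2) */
        g = 0.5 * (g + 2.0 / g);
    s->x = g;
}

static bool close_to_sqrt2(const state_t *s)
{
    double err = s->x * s->x - 2.0;
    return s->x > 0.0 && err < 1e-9 && err > -1e-9;
}
```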
37
Multiple-version Fault Tolerance
```c
#include <ftmacros.h>
...
NVP
    VERSION
        block 1
        SENDVOTE(v-pointer, v-size)
    VERSION
        block 2
        SENDVOTE(v-pointer, v-size)
    ENDVERSION(timeout, v-size)
    if (!agreeon(v-pointer))
        error_handler();
ENDNVP
```
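A self-contained C sketch of the voting step of N-version programming, under simplifying assumptions: the versions run sequentially in one process (not in parallel, and with no timeout), each returns an integer, and the voter requires a strict majority. The version functions and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int (*version_fn)(int);

/* Strict-majority voter: true (and *winner set) if some value was
   produced by more than n/2 versions. O(n^2), fine for small n. */
bool majority_vote(const int *results, size_t n, int *winner)
{
    for (size_t i = 0; i < n; i++) {
        size_t count = 0;
        for (size_t j = 0; j < n; j++)
            if (results[j] == results[i])
                count++;
        if (count > n / 2) {
            *winner = results[i];
            return true;
        }
    }
    return false;   /* no agreement: caller runs its error handler */
}

/* Run up to 16 independently designed versions on the same input. */
bool nvp_run(version_fn versions[], size_t n, int input, int *out)
{
    int results[16];
    if (n == 0 || n > 16)
        return false;
    for (size_t i = 0; i < n; i++)
        results[i] = versions[i](input);
    return majority_vote(results, n, out);
}

/* Three hypothetical versions of "square the input"; v3 is faulty. */
static int v1(int x) { return x * x; }

static int v2(int x)
{
    int a = x < 0 ? -x : x, r = 0;
    for (int i = 0; i < a; i++)
        r += a;                  /* repeated addition */
    return r;
}

static int v3(int x) { return x * x + 1; }
```

If two versions shared a design fault and produced the same wrong value, the voter would accept it: this is the correlated-failure risk of multiple-version SFT.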
38
Multiple-version Fault Tolerance

Multiple-version SFT:
Implies N-fold design costs and N-fold maintenance costs
The risk of correlated failures is not negligible
Code intrusion is limited (acceptable SDC)
System structure is fixed (bad SA)
No support for dynamic adaptability (bad AD)
Can be combined with other means

39
Object-centred Strategies

Strategies based on the object model:
Metaobject protocols and reflection
Open implementation of the run-time executive of an OO language
Reflection, reification
Composition filters
Each object has a set of filters. Messages sent to an object are trapped by its filters, which may manipulate a message before passing it to the object.
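A hypothetical C sketch of the composition-filters idea: every message addressed to an object passes through its filter chain, and each filter may reject or modify the message before the object's own behaviour sees it. Real composition-filter systems express this declaratively in an extended OO language; the message struct, the function-pointer chain and both filters are assumptions used only to show the control flow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char selector[16];   /* name of the requested operation */
    int  arg;
} message;

/* A filter may modify *m; returning false rejects the message. */
typedef bool (*filter_fn)(message *m);

typedef struct {
    filter_fn filters[4];
    size_t    nfilters;
    int       value;     /* the object's own state */
} object;

/* Functional behaviour: knows nothing about the filters. */
static void object_dispatch(object *o, const message *m)
{
    if (strcmp(m->selector, "set") == 0)
        o->value = m->arg;
}

/* Every message is trapped by the filters before reaching the object. */
bool cf_send(object *o, message m)
{
    for (size_t i = 0; i < o->nfilters; i++)
        if (!o->filters[i](&m))
            return false;            /* rejected by a filter */
    object_dispatch(o, &m);
    return true;
}

/* Hypothetical FT filter: reject unknown requests. */
static bool known_selector(message *m)
{
    return strcmp(m->selector, "set") == 0;
}

/* Hypothetical FT filter that manipulates the message: clamp to [0, 100]. */
static bool clamp_arg(message *m)
{
    if (m->arg < 0)   m->arg = 0;
    if (m->arg > 100) m->arg = 100;
    return true;
}
```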

40
Object-centred Strategies

Active objects
Objects that control the synchronisation of incoming requests from other objects. An object can autonomously decide, e.g., to delay a request until it is acceptable, i.e., until a guard is met
FRIENDS, SINA, Correlate
Full separation of design concerns (good SDC)
No code intrusion
Syntactically adequate, at least for a subset of FT strategies (acceptable SA)
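A minimal single-threaded C sketch of guarded request acceptance in an active object: a buffer object accepts a "get" only when its guard (buffer not empty) holds; otherwise it delays the request in a pending counter and serves it as soon as a later "put" makes the guard true. Real active objects run in their own thread; here scheduling is simulated, and all names are hypothetical. For brevity, "put" on a full buffer is rejected rather than delayed.

```c
#include <stdbool.h>
#include <stddef.h>

#define CAP 8

typedef struct {
    int    items[CAP];
    size_t count;
    int    pending_gets;   /* delayed requests whose guard was false */
    int    served[CAP];    /* values delivered to delayed requests */
    size_t nserved;
} buffer_obj;

/* Guard for "get": acceptable only if data is present. */
static bool get_guard(const buffer_obj *b) { return b->count > 0; }

static int take(buffer_obj *b) { return b->items[--b->count]; }

/* Serve delayed requests while their guard holds. */
static void serve_pending(buffer_obj *b)
{
    while (b->pending_gets > 0 && get_guard(b) && b->nserved < CAP) {
        b->served[b->nserved++] = take(b);
        b->pending_gets--;
    }
}

/* Returns true with *out set if accepted now, false if delayed. */
bool request_get(buffer_obj *b, int *out)
{
    if (!get_guard(b)) {
        b->pending_gets++;   /* delay until the guard is met */
        return false;
    }
    *out = take(b);
    return true;
}

/* After accepting a put, the object re-evaluates pending guards. */
bool request_put(buffer_obj *b, int v)
{
    if (b->count == CAP)
        return false;
    b->items[b->count++] = v;
    serve_pending(b);
    return true;
}
```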

41
Object-centred Strategies

Assumption: the application is written in an extended OO language
Adaptability? (questionable AD)

42
FT Linda Systems

Generative communication: messages are not sent, they are stored in a public, distributed shared memory
A shared relational database for storing and withdrawing tuples
Tuples: lists of objects identified by their contents, cardinality and type
A Linda process inserts, reads, and withdraws tuples via blocking or non-blocking primitives
Synchronisation: presence / absence of a matching tuple
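A minimal, single-process C sketch of the Linda primitives, restricted (as a simplifying assumption) to tuples of exactly two integers and to the non-blocking variants: out inserts a tuple, rdp reads a matching tuple without removing it, inp withdraws it. Matching is by content, with a wildcard value standing for a formal field. The names follow Linda's out/rd/in convention, but this C interface is hypothetical; real Linda also offers blocking rd and in, which suspend the caller until a matching tuple appears.

```c
#include <stdbool.h>
#include <stddef.h>

#define TS_CAP 32
#define ANY (-1)   /* wildcard field; assumes real data is >= 0 */

typedef struct { int f[2]; } tuple;

typedef struct {
    tuple  t[TS_CAP];
    size_t n;
} tuple_space;

/* Associative matching: by content, field by field. */
static bool matches(const tuple *t, const tuple *tmpl)
{
    for (int i = 0; i < 2; i++)
        if (tmpl->f[i] != ANY && tmpl->f[i] != t->f[i])
            return false;
    return true;
}

/* out: store a tuple (generative communication). */
bool ts_out(tuple_space *ts, tuple t)
{
    if (ts->n == TS_CAP)
        return false;
    ts->t[ts->n++] = t;
    return true;
}

/* rdp: non-blocking read; the tuple stays in the space. */
bool ts_rdp(const tuple_space *ts, tuple tmpl, tuple *out)
{
    for (size_t i = 0; i < ts->n; i++)
        if (matches(&ts->t[i], &tmpl)) {
            *out = ts->t[i];
            return true;
        }
    return false;
}

/* inp: non-blocking withdraw; the matching tuple is removed. */
bool ts_inp(tuple_space *ts, tuple tmpl, tuple *out)
{
    for (size_t i = 0; i < ts->n; i++)
        if (matches(&ts->t[i], &tmpl)) {
            *out = ts->t[i];
            ts->t[i] = ts->t[--ts->n];   /* order is irrelevant */
            return true;
        }
    return false;
}
```

In a master-worker program, a worker would withdraw a task tuple with inp and later insert a result tuple with out; if it crashes between the two operations the task is lost, which is why atomicity of single operations alone is not enough.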

43
Linda

In master-worker applications:
Dynamic load balancing, also in heterogeneous clusters
Inherently tolerates crash failures of workers
Limitation: atomicity is guaranteed only for single operations
Solutions:
Atomic transactions with multiple tuple space (TS) operations
Stable tuple space
Tuple space checkpointing, etc.

Possible case study for the exam
44
Linda

FT-Linda, Persistent Linda, ...
Full separation of design concerns (good SDC)
No code intrusion
Syntactically adequate, at least for a subset of FT strategies (acceptable SA)
Assumption: the application is written in Linda
Adaptability? (questionable AD)

45
FT Languages

FT Languages
Enhanced, pre-existing languages
Examples:
FT-SR
Fail-stop modules: abstract unit of encapsulation
Atomic execution
Composability
x-Linda (x = C, Fortran, C++, ...)

46
FT Languages

FT Languages
Novel languages
Examples:
Argus: distributed OO programming language and operating system
Guardians: objects performing user-definable actions in response to remote requests
Atomic transactions
FTAG: functional language based on attribute grammars

47
FT Languages

FTAG
Computation: a collection of pure mathematical functions, the modules
Each module has a set of input values, called inherited attributes, and a set of output variables, called synthesized attributes

48
FTAG (cont'd)

Primitive modules can be executed directly
Non-primitive modules require other modules to be performed first
FTAG program: a root module is decomposed into its basic sub-modules, and this decomposition process is then applied recursively to each of the sub-modules (computation tree)

49
FTAG (cont'd)

Natural support for redoing (replacing a portion of the computation tree with a new computation)
Natural support for replication (replicated decomposition: a module is decomposed into N identical sub-modules implementing the function to replicate)
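FTAG is a functional language, so the following C sketch only mimics its ideas under stated assumptions: a module maps an inherited attribute to a synthesized one, a non-primitive module combines sub-modules, replicated decomposition evaluates N identical sub-modules and takes the majority, and redoing re-evaluates a sub-computation whose result is rejected. Transient faults are simulated by a hypothetical counter-based injector; in a real system the replicas would typically run on different processors. All names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* A module maps an inherited attribute to a synthesized attribute. */
typedef int (*module_fn)(int inherited);

/* Hypothetical fault injector: the fault_at-th evaluation (1-based)
   of square() returns a wrong value; 0 disables injection. */
static int eval_count = 0;
static int fault_at = 0;

/* Primitive module. */
static int square(int x)
{
    int r = x * x;
    if (++eval_count == fault_at)
        r += 1;                        /* injected transient fault */
    return r;
}

/* Replicated decomposition: n (<= 9) identical sub-modules, majority. */
bool replicate(module_fn m, int in, size_t n, int *out)
{
    int r[9];
    if (n == 0 || n > 9)
        return false;
    for (size_t i = 0; i < n; i++)
        r[i] = m(in);
    for (size_t i = 0; i < n; i++) {
        size_t c = 0;
        for (size_t j = 0; j < n; j++)
            c += (r[j] == r[i]);
        if (c > n / 2) {
            *out = r[i];
            return true;
        }
    }
    return false;
}

/* Redoing: replace a rejected sub-computation with a fresh one. */
bool redo(module_fn m, int in, bool (*ok)(int, int), int tries, int *out)
{
    for (int k = 0; k < tries; k++) {
        int r = m(in);
        if (ok(in, r)) {
            *out = r;
            return true;
        }
    }
    return false;
}

/* Hypothetical acceptance check used with redo(). */
static bool is_square_of(int in, int out) { return out == in * in; }

/* Non-primitive module: a*a + b*b, built from two replicated squares. */
bool sum_of_squares(int a, int b, int *out)
{
    int sa, sb;
    if (!replicate(square, a, 3, &sa) || !replicate(square, b, 3, &sb))
        return false;
    *out = sa + sb;
    return true;
}
```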

50
FT Languages

Conclusions for FT languages:
Adequate separation of design concerns, transparency (good SDC)
Special-purpose syntax (potentially good SA)
The application must be written in a non-standard language
Bad portability
Adaptability (AD) unknown

51
RMP

Recovery Metaprogram (RMP)
Two cooperating processing contexts
User-placed breakpoints in the user context trigger the execution of a meta-program
When the meta-program ends, control is returned to the user program
The meta-program is to be written in CSP

52
RMP

Adequate, e.g., for recovery blocks
A breakpoint can trigger the execution of:
CHECKPOINT
ALTERNATES
ACCEPTANCE TESTS...
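In RMP the recovery logic lives in a meta-program running in its own context and written in CSP. The C sketch below collapses both contexts into a single process and models a breakpoint as an ordinary function call, so it shows only the division of labour (the user code merely marks breakpoints; the meta level checkpoints, runs alternates and applies acceptance tests), not the context switch or its overhead. The breakpoint table and every name in it are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { int data; } context;          /* user-level state */

typedef void (*alternate_fn)(context *);
typedef bool (*test_fn)(const context *);

/* Meta-level recipe attached to one breakpoint. */
typedef struct {
    alternate_fn alternates[3];
    size_t       nalt;
    test_fn      acceptance;
} meta_program;

#define MAX_BREAKPOINTS 4
static meta_program *table[MAX_BREAKPOINTS];   /* filled by the meta level */

void rmp_register(int id, meta_program *mp)
{
    if (id >= 0 && id < MAX_BREAKPOINTS)
        table[id] = mp;
}

/* The only FT-related call left in the user program. Returns false if
   every alternate failed its acceptance test (state rolled back). */
bool rmp_breakpoint(int id, context *ctx)
{
    if (id < 0 || id >= MAX_BREAKPOINTS || table[id] == NULL)
        return true;                           /* nothing to do here */
    const meta_program *mp = table[id];
    const context checkpoint = *ctx;           /* CHECKPOINT */
    for (size_t i = 0; i < mp->nalt; i++) {    /* ALTERNATES */
        mp->alternates[i](ctx);
        if (mp->acceptance(ctx))               /* ACCEPTANCE TEST */
            return true;
        *ctx = checkpoint;                     /* roll back */
    }
    return false;
}

/* Example meta-level components (hypothetical). */
static void alt_faulty(context *c) { c->data = -1; }
static void alt_double(context *c) { c->data *= 2; }
static bool non_negative(const context *c) { return c->data >= 0; }
```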

53
RMP

RMP summary:
Full separation of design concerns
No code intrusion (good SDC)
Syntactically adequate, at least for a subset of FT strategies (average SA)
The meta-program is written in a fixed, pre-existing language (CSP)
Inefficient implementation (huge performance overhead for switching execution modes)
No adaptability (bad AD)

54
Summary

No optimal solution exists yet

Challenging research problem!

55
Conclusions: in search of the optimum

A dependable service is one that persists even when, for instance, its corresponding program experiences faults, to some agreed-upon extent
An F-dependable service (resp. F-dependable program, system) is one that persists despite the occurrence of faults as described in F
F is the fault model

56
Conclusions: in search of the optimum

F is the model of an environment (E)
An F-dependable service may tolerate faults in E and may not tolerate those in E'
What if F matches an environment E?
What if E changes into E'?
What if an F-service is moved?
A failure may occur!

57
Conclusions: in search of the optimum

Adapting services:
X-dependable services, where X = f(E)
X changes when:
the service is moved
the environment mutates
Changes should occur automatically (high AD)
The expression of adaptability and dependability concerns should not increase complexity too much (high SA)
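As a hypothetical illustration of X = f(E), the C sketch below recomputes the degree of redundancy of a replicated service from an observed environment parameter: the fraction of voting rounds in the last window in which some replica disagreed. Treating "X" as the number of replicas, as well as the thresholds, are assumptions for this sketch, not a method proposed in the slides.

```c
#include <stddef.h>

/* Observed environment E (hypothetical): statistics from the voter
   over the last observation window. */
typedef struct {
    size_t votes;          /* voting rounds in the window */
    size_t disagreements;  /* rounds in which some replica differed */
} environment;

/* X = f(E): number of replicas to use next. Always odd, so a strict
   majority is defined. Thresholds are purely illustrative. */
size_t choose_redundancy(const environment *e)
{
    if (e->votes == 0)
        return 3;                        /* no data yet: default */
    double rate = (double)e->disagreements / (double)e->votes;
    if (rate < 0.001)
        return 1;                        /* benign environment: simplex */
    if (rate < 0.05)
        return 3;                        /* triple modular redundancy */
    return 5;                            /* hostile environment */
}
```

When running simplex, a single replica gives the voter nothing to compare, so a real design would need another error-detection source to keep observing E.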

58
Conclusions