Seminarie Informatica

1
Seminarie Informatica
  • Fault-tolerant Systems: The Software Viewpoint

A series of seminars coordinated by Vincenzo De
Florio, http://www.pats.ua.ac.be
2
The matter
  • The exam
  • The topics
  • This lecture
  • Application-level fault tolerance provisions

3
Introduction to the exam
  • Seminarie informatica
  • 10 seminars on hot topics of computer science
  • Topic of this cycle: software fault-tolerant
    systems
  • Next 3 seminars: 15 and 22 November, 6 December
  • Next year's seminars: to be announced on
    http://www.win.ua.ac.be/vincenz/si/0607.html

4
Introduction to the exam
  • Oral discussion of 2 papers:
  • A 5-6 page paper based on one or more of the
    topics of the seminars
  • A paper with the analysis of a case study
  • See later for examples
  • Evaluation criteria:
  • Do the papers contain original ideas? Do they
    follow the seminars too strictly?
  • Does the author understand the subject? Is (s)he
    able to reason independently about the subject?
  • Papers must be submitted by May 15, 2007
  • E-mail to vincenzo.deflorio_at_ua.ac.be

5
The Topics
Dependability: the property of a system such
that reliance can justifiably be placed on the
service it delivers
Fault tolerance: one of the means of
dependability
6
The Dependability Tree
7
Fault tolerance (FT)
A fault-tolerant system is a system that continues
to function in spite of faults
Examples of faults: defect in an IC, bug in a
program, operation fault, sensor drift
8
Attributes of dependability
  • Availability
  • Readiness for usage
  • A(t): probability that the system conforms to
    its specification at time t
  • Reliability
  • Continuity of service
  • R(t): probability that the system conforms to
    its specification throughout [t0, t], provided
    that it does so at t0

9
Attributes of dependability (2)
  • Safety
  • Non-occurrence of catastrophic consequences on
    environment
  • S(t): probability that the system either
    conforms to its specification or reaches a safe
    halt at time t
  • Fail-safe systems

10
Attributes of dependability (3)
  • Maintainability
  • Aptitude to undergo repairs and evolution
  • M(t): probability that the system is back to
    its specification at t if it failed at t0

11
Attributes of dependability (4)
  • Confidentiality
  • Non-occurrence of unauthorised disclosure of
    information
  • Integrity
  • Non-occurrence of improper alterations of
    information

12
Related attributes
  • Testability
  • Ability to test features of a system
  • Related to maintainability

13
Related attributes
  • Security
  • Integrity + availability + confidentiality

14
References
  • Jean-Claude Laprie, Dependable Computing and
    Fault Tolerance: Concepts and Terminology, in
    Proc. of the 15th Int. Symposium on
    Fault-Tolerant Computing (FTCS-15), Ann Arbor,
    Mich., June 1985, pp. 2-11.
  • Jean-Claude Laprie, Dependability: Its
    Attributes, Impairments and Means, in
    Predictably Dependable Computing Systems, ESPRIT
    Basic Research Series, B. Randell, J.-C.
    Laprie, H. Kopetz and B. Littlewood (eds.),
    Springer Verlag, 1995, pp. 3-18.

15
The lecture
  • We now focus on application-level fault tolerance
  • Why do we need ALFT? Why do we need software FT
    in the first place?
  • We explain why
  • We survey the existing methods and assess their
    pros and cons against a set of properties
  • Surprising conclusion: still an open problem

16
Structure
  • Introduction
  • Identification of main problems to tackle / key
    properties to achieve
  • Qualitative survey
  • Sketch of a possible ideal solution
  • Conclusions

17
Software Fault Tolerance
  • Human society more and more expects
    and relies on good quality of complex
    services supplied by computers

18
Software Fault Tolerance
  • Consequences of a failure in the 1940s (computers
    as fast solvers of numerical problems):
  • Errors in computations, long downtimes
  • Consequences of a failure nowadays (computers
    controlling nuclear plants, airborne equipment,
    healthcare):
  • Incalculable penalty (catastrophes)

19
Software Fault Tolerance
  • Traditional answer: hardware fault
    tolerance
  • This is an important ingredient, but not the
    only one needed today!
  • Complexity is also in the SW layers
  • Hierarchies of complex abstract machines

20
Software Fault Tolerance
  • Complexity is also in SW layers (cont'd)
  • Software is often networked and distributed
  • Relationships among software components are often
    complex
  • Object model ⇒ easier SW reuse ⇒ hidden
    explicit complexity

21
Software Fault Tolerance
  • In conclusion: no amount of verification,
    validation and testing can eliminate all faults
    in an application and give complete confidence in
    the availability and data consistency of
    applications
  • Fault tolerance in SW is key
  • SW failures can have consequences of the same
    extent as failures in HW

Ariane 5!
22
Problems of SW FT
The lighter the color, the more general purpose
the (virtual) machine
The lighter the color, the more complex the
problem of expressing fault tolerance
23
Problems of Application-level Fault Tolerance
  • The only alternative and effective means for
    increasing software reliability is that of
    incorporating in the application software
    provisions for SFT
  • The application software has to manage:
  • Functional aspects
  • Fault tolerance (FT) aspects
  • both at the same time and in the same space

24
Problems and properties of Application-level
Fault Tolerance
  • Hazard: code intrusion
  • FT provisions are specified side by side with the
    service
  • Conflicting design concerns
  • Overall design complexity increases
  • Larger development and maintenance costs and times
  • Larger probability of introducing software bugs

25
Problems and properties of Application-level
Fault Tolerance
  • Separation of design concerns (SDC)
  • In what follows we call an ALFT a means to
    express fault tolerance in the application
    software
  • A criterion to compare ALFTs is by their degree
    of SDC

26
Problems and properties of Application-level
Fault Tolerance
  • Hazard: porting code ≠
    porting service
  • FT code assumes fault model f(e)
  • If e changes, or
  • If the code is moved to another environment e'
  • the QoS may degrade

27
Problems and properties of Application-level
Fault Tolerance
  • Hazard: porting code ≠
    porting service

370 million euros down the drain
  • An interesting case: Ariane 5 flight 501
  • Ariane 4 mission software was re-used in Ariane 5
  • The early part of the trajectory of Ariane 5
    differed from that of Ariane 4 and resulted in
    considerably higher horizontal velocity values

This could be a case study for the exam
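
The inquiry into flight 501 attributed the failure to an unprotected conversion of a 64-bit floating-point value, related to horizontal velocity, into a 16-bit signed integer. The sketch below is only an illustration of that kind of hidden environmental assumption: the function names and the sample readings are hypothetical, and clamping is just one possible protection, not the fix that was actually adopted.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unprotected(value):
    """Convert under the old assumption that the value always fits in 16 bits."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        # Stands in for the unhandled operand error of an unprotected conversion.
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return result


def to_int16_saturating(value):
    """Protected variant: clamp out-of-range values instead of failing."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


# Hypothetical readings: one inside the envelope of the old environment e,
# one produced by the new environment e'.
old_environment_reading = 12000.0
new_environment_reading = 41000.0

print(to_int16_unprotected(old_environment_reading))
print(to_int16_saturating(new_environment_reading))
try:
    to_int16_unprotected(new_environment_reading)
except OverflowError as error:
    print("conversion failed:", error)
```

The code is unchanged between the two readings; only the environment differs, which is the point of "porting code ≠ porting service".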
28
Problems and properties of Application-level
Fault Tolerance
  • Problem: service portability
  • Porting FT does not come for free
  • Hardwired fault model, static environment
  • More difficult to adapt / test / maintain
  • More prone to "Ariane 5" effects

"What is the most often overlooked risk in SW
engineering? That the environment will do
something the designer never anticipated"

(J. Horning)
29
Problems and properties of Application-level
Fault Tolerance
  • Adaptability (AD)
  • Does the ALFT provide means to adapt,
    dynamically, to new environmental conditions?
  • A criterion to compare 2 ALFTs is by their
    degree of AD

30
Problems and properties of Application-level
Fault Tolerance
  • Problem: adding complexity can decrease
    dependability
  • The ALFT (the means to express FT) must be based
    on a simple strategy
  • It must be syntactically adequate to host several
    mechanisms

31
Problems and properties of Application-level
Fault Tolerance
  • Hazard:
  • "Languages shape the way we think" (Whorf)
  • "If all you have is a hammer, everything looks
    like a nail"
    (/usr/share/fortune)
  • ...but is it really a nail?
  • Syntactical Adequacy (SA)
  • Does the ALFT provide simple means to host many
    FT solutions?
  • A criterion to compare 2 ALFTs is by their
    degree of SA

32
Summary
  • Separation of design concerns (SDC)
  • Adaptability (AD)
  • Syntactical Adequacy (SA)
  • A base of attributes we can use to compare
    ALFTs with one another

33
System structures for SFT
  • Single-version FT
  • Multiple-version FT
  • Object model
  • Linda Model
  • FT Languages
  • Recovery metaprogram

Each of these could be a case study for the exam
34
Single-version Fault Tolerance
  • Single-version SFT: embedding a set of error
    detection / recovery features in the user
    application of a simplex system
  • Explicit code intrusion (bad SDC)
  • Increases size and complexity (bad SA)
  • Bad for transparency, maintainability,
    portability
  • Increases development times and costs
  • No support for dynamic adaptability (bad AD)
  • Libraries:
  • SwIFT, HATS, EFTOS

35
Multiple-version Fault Tolerance
  • Multiple-version SFT: N-version programming
    (NVP) and recovery blocks (RB)
  • Idea: redundancy of software, i.e.,
    independently designed versions of the software
  • Randell (1975): "All fault tolerance must be
    based on the provision of useful redundancy, both
    for error detection and error recovery. In
    software the redundancy required is not simple
    replication of programs but redundancy of design"
  • Assumption: random component failures. Correlated
    failures ⇒ sudden exhaustion of the available
    redundancy
  • Again, Ariane 5 flight 501: two crucial
    components were operating in parallel with
    identical hardware and software

36
Multiple-version Fault Tolerance
#include <ftmacros.h>
...
ENSURE(acceptance-test)
    Alternate 1
ELSEBY
    Alternate 2
...
ENSURE
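
The recovery-block structure expressed by the macros above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the ftmacros.h implementation: `recovery_block`, the two alternates and the acceptance test are hypothetical names, and state is checkpointed with a simple deep copy.

```python
import copy


def recovery_block(state, acceptance_test, alternates):
    """Try each alternate on a fresh copy of `state` until one passes the test."""
    checkpoint = copy.deepcopy(state)           # saved before any alternate runs
    for alternate in alternates:
        candidate = copy.deepcopy(checkpoint)   # roll back to the checkpoint
        try:
            result = alternate(candidate)
        except Exception:
            continue                            # a crash counts as a failed alternate
        if acceptance_test(result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def primary_sort(data):
    """Hypothetical primary alternate with a deliberate bug."""
    data.sort()
    return data[:-1]                            # drops the last element


def backup_sort(data):
    """Hypothetical simpler alternate."""
    return sorted(data)


def make_acceptance_test(original):
    """Cheap plausibility check: same length as the input and in order."""
    def test(result):
        ordered = all(a <= b for a, b in zip(result, result[1:]))
        return len(result) == len(original) and ordered
    return test


values = [3, 1, 2]
result = recovery_block(values, make_acceptance_test(values),
                        [primary_sort, backup_sort])
print(result)
```

The primary alternate fails the acceptance test, so the state is rolled back and the backup alternate delivers the result.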
37
Multiple-version Fault Tolerance
#include <ftmacros.h>
...
NVP
VERSION
    block 1
    SENDVOTE(v-pointer, v-size)
VERSION
    block 2
    SENDVOTE(v-pointer, v-size)
ENDVERSION(timeout, v-size)
if (!agreeon(v-pointer)) error_handler()
ENDNVP
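
The voting step of N-version programming can be sketched as follows. This is an illustrative simplification, not the macro library shown above: the versions run sequentially rather than in parallel with a timeout, results must be hashable, and all function names are hypothetical.

```python
from collections import Counter


def n_version_execute(versions, argument):
    """Run every version on the same input and return the majority result."""
    votes = []
    for version in versions:
        try:
            votes.append(version(argument))
        except Exception:
            pass                                # a crashed version casts no vote
    if not votes:
        raise RuntimeError("no version produced a result")
    winner, count = Counter(votes).most_common(1)[0]
    if count <= len(versions) // 2:             # strict majority of all versions
        raise RuntimeError("no majority among the versions")
    return winner


# Three hypothetical, independently written integer square roots.
def isqrt_v1(n):
    return int(n ** 0.5)


def isqrt_v2(n):
    root = 0
    while (root + 1) * (root + 1) <= n:
        root += 1
    return root


def isqrt_v3(n):
    return n // 2                               # deliberately faulty version


print(n_version_execute([isqrt_v1, isqrt_v2, isqrt_v3], 16))
```

The faulty third version is outvoted by the other two. If all versions shared the same design fault, the vote would agree on a wrong value, which is the correlated-failure risk noted earlier.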
38
Multiple-version Fault Tolerance
  • Multiple-version SFT
  • Implies N-fold design costs, N-fold maintenance
    costs
  • The risk of correlated failures is not negligible
  • Code intrusion is limited (acceptable SDC)
  • System structure is fixed (bad SA)
  • No support for dynamic adaptability (bad AD)
  • Can be combined with other means

39
Object-centred Strategies
  • Strategies based on the object model
  • Metaobject protocols and reflection
  • Open implementation of the run-time executive of
    an OO-language
  • Reflection, reification
  • Composition filters
  • Each object has a set of filters. Messages sent
    to any object are trapped by its filters. These
    filters possibly manipulate the message before
    passing it to the object.
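
The composition-filter idea can be illustrated with a small wrapper that traps every message before it reaches the object. This is a sketch only: `Filtered`, `Thermostat` and both filters are hypothetical names, and real composition-filter systems offer far richer filter types.

```python
class Thermostat:
    """Hypothetical service object, unaware of any filtering."""

    def __init__(self):
        self.target = 20.0

    def set_target(self, celsius):
        self.target = celsius
        return self.target


class Filtered:
    """Wraps an object so that every message passes through its filters first."""

    def __init__(self, target, filters):
        self._target = target
        self._filters = list(filters)

    def send(self, method_name, *args):
        message = (method_name, args)
        for message_filter in self._filters:
            message = message_filter(message)   # a filter may rewrite the message
        method_name, args = message
        return getattr(self._target, method_name)(*args)


log = []


def logging_filter(message):
    log.append(message[0])                      # record the method name only
    return message


def clamping_filter(message):
    name, args = message
    if name == "set_target":
        args = (max(5.0, min(30.0, args[0])),)  # keep the request in a safe range
    return name, args


thermostat = Filtered(Thermostat(), [logging_filter, clamping_filter])
print(thermostat.send("set_target", 100.0))
```

The service class contains no defensive code; the protective behaviour lives entirely in the filters, which is where the separation of concerns comes from.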

40
Object-centred Strategies
  • Active objects
  • Objects that have control over the
    synchronisation of incoming requests from other
    objects. Objects can autonomously decide, e.g.,
    to delay a request until it is acceptable, i.e.,
    until a guard is met
  • Examples: FRIENDS, SINA, Correlate
  • Full separation of design concerns (good SDC)
  • No code intrusion
  • Syntactically adequate, at least for a subset of
    FT strategies (acceptable SA)
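
Guard-based delaying of requests, as described for active objects, might look like the following single-threaded sketch. It assumes a hypothetical bounded buffer with two request kinds, "put" and "get"; a real active object would run its own thread of control.

```python
from collections import deque


class ActiveBuffer:
    """Bounded buffer that decides by itself when a queued request may run."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.pending = deque()    # requests delayed because their guard is false
        self.replies = []         # values handed out to served "get" requests

    def _guard(self, request):
        kind, _ = request
        if kind == "put":
            return len(self.items) < self.capacity
        return len(self.items) > 0              # any other kind is treated as "get"

    def _serve(self, request):
        kind, payload = request
        if kind == "put":
            self.items.append(payload)
        else:
            self.replies.append(self.items.popleft())

    def request(self, kind, payload=None):
        self.pending.append((kind, payload))
        self._run()

    def _run(self):
        progress = True
        while progress:
            progress = False
            for request in list(self.pending):
                if self._guard(request):
                    self.pending.remove(request)
                    self._serve(request)
                    progress = True
                    break                       # state changed: rescan from the start


buffer = ActiveBuffer(capacity=1)
buffer.request("get")                           # delayed: the buffer is empty
buffer.request("put", "a")                      # served, then the delayed get runs
buffer.request("put", "b")
buffer.request("put", "c")                      # delayed: the buffer is full
print(buffer.replies, list(buffer.items), list(buffer.pending))
```

Callers never test the buffer state themselves; acceptability is decided inside the object by its guards.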

41
Object-centred Strategies
  • Assumption: the application is written in an
    extended OO language
  • Adaptability? (questionable AD)

42
FT Linda Systems
  • Generative communication - messages are not
    sent, they are stored in a public, distributed
    shared memory
  • A shared relational database for storing and
    withdrawing tuples
  • Tuples: lists of objects identified by their
    contents, cardinality and type
  • A Linda process inserts, reads, and withdraws
    tuples via blocking or non-blocking primitives
  • Synchronisation: presence / absence of a matching
    tuple
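
A toy, single-process tuple space helps make the matching rules concrete. The sketch assumes the classic Linda primitive names `out`, `rdp` and `inp` (the non-blocking read and withdraw); the blocking forms, distribution and any fault tolerance are left out, and a Python type is used as a wildcard for "any value of this type".

```python
class TupleSpace:
    """Minimal in-memory tuple space with non-blocking primitives only."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Insert a tuple into the space."""
        self._tuples.append(tuple(fields))

    @staticmethod
    def _matches(pattern, candidate):
        if len(pattern) != len(candidate):      # cardinality must agree
            return False
        for wanted, actual in zip(pattern, candidate):
            if isinstance(wanted, type):        # formal field: match by type
                if not isinstance(actual, wanted):
                    return False
            elif wanted != actual:              # actual field: match by content
                return False
        return True

    def rdp(self, *pattern):
        """Read a matching tuple without removing it; None if absent."""
        for candidate in self._tuples:
            if self._matches(pattern, candidate):
                return candidate
        return None

    def inp(self, *pattern):
        """Withdraw a matching tuple; None if absent."""
        found = self.rdp(*pattern)
        if found is not None:
            self._tuples.remove(found)
        return found


space = TupleSpace()
space.out("task", 1, 2.5)
space.out("task", 2, 4.0)
print(space.rdp("task", int, float))
print(space.inp("task", 2, float))
print(space.rdp("task", 2, float))
```

A worker would typically withdraw a "task" tuple and insert a "result" tuple; the absence of a matching tuple is what a blocking primitive would wait on.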

43
Linda
  • In master-worker applications:
  • Dynamic load balancing, also in heterogeneous
    clusters
  • Inherently tolerates crash failures of workers
  • Limitation: atomicity of single operations only
  • Solutions:
  • Atomic transactions with multiple tuple space
    (TS) operations
  • Stable tuple space
  • Tuple space checkpointing, etc.

Possible case study for the exam
44
Linda
  • FT-Linda, Persistent Linda...
  • Full separation of design concerns (good SDC)
  • No code intrusion
  • Syntactically adequate, at least for a subset of
    FT strategies (acceptable SA)
  • Assumption: the application is written in Linda
  • Adaptability? (questionable AD)

45
FT Languages
  • FT Languages
  • Enhanced pre-existing languages
  • Examples:
  • FT-SR
  • Fail-stop modules: abstract unit of
    encapsulation
  • Atomic execution
  • Composability
  • x-Linda (x = C, Fortran, C, ...)

46
FT Languages
  • FT Languages
  • Novel languages
  • Examples:
  • Argus: distributed OO programming language and
    operating system
  • Guardians: objects performing user-definable
    actions in response to remote requests
  • Atomic transactions
  • FTAG: functional language based on attribute
    grammars

47
FT Languages
  • FTAG
  • Computation: a collection of pure mathematical
    functions, the modules
  • Each module has a set of input values, called
    inherited attributes, and a set of output values,
    called synthesized attributes

48
FTAG (cont'd)
  • Primitive modules can be executed
  • Non-primitive modules require other modules to be
    performed first
  • FTAG program: decomposing a root module into
    its basic sub-modules and then recursively
    applying this decomposition process to each of
    the sub-modules (computation tree)

49
FTAG (cont'd)
  • Natural support for redoing (replacing a portion
    of the computation tree with a new computation)
  • Natural support for replication (replicated
    decomposition: a module is decomposed into N
    identical sub-modules implementing the function
    to replicate)

50
FT Languages
  • Conclusions for FT languages:
  • Adequate separation of design concerns,
    transparency (good SDC)
  • Special-purpose syntax (potentially good SA)
  • The application must be written in a non-standard
    language
  • Bad portability
  • Adaptability (AD) unknown

51
RMP
  • Recovery Metaprogram
  • Two cooperating processing contexts
  • User-placed breakpoints in the user context
    trigger the execution of a meta-program
  • When the meta-program ends, control is returned
    to the user program
  • The meta-program is to be written in CSP

52
RMP
  • Adequate, e.g., for recovery blocks
  • A breakpoint can trigger the execution of:
  • CHECKPOINT
  • ALTERNATES
  • ACCEPTANCE TESTS...

53
RMP
  • RMP summary
  • Full separation of design concerns
  • No code intrusion (good SDC)
  • Syntactically adequate, at least for a subset of
    FT strategies (average SA)
  • The meta-program is written in a fixed,
    pre-existing language (CSP)
  • Inefficient implementation (huge performance
    overhead for switching execution modes)
  • No adaptability (bad AD)

54
Summary
  • No optimal solution exists yet
  • Challenging research problem!

55
Conclusions: in search of an optimum
  • A dependable service is one that persists even
    when, for instance, its corresponding program
    experiences faults to some agreed upon extent
  • An F-dependable service (resp. F-dependable
    program, system) is one that persists despite
    the occurrence of faults as described in F
  • F is the fault model

56
Conclusions: in search of an optimum
  • F is the model of an environment (E)
  • An F-dependable service may tolerate faults in E
    and may not tolerate those in E'
  • What if F matches an environment E?
  • What if E changes into E'?
  • What if an F-service is moved?
  • A failure may occur!

57
Conclusions: in search of an optimum
  • Adapting services
  • X-dependable services, where X = f(E)
  • X changes when:
  • The service is moved
  • The environment mutates
  • Changes should occur automatically (high AD)
  • The expression of adaptability and dependability
    concerns should not increase complexity too
    much (high SA)
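
One way to read X = f(E) is that the fault-tolerance configuration is recomputed whenever the observed environment changes. The sketch below is purely illustrative and rests on strong assumptions: replicas fail independently, a single surviving replica suffices, and the environment is summarised by one number. `replicas_needed` and `AdaptiveService` are hypothetical names.

```python
def replicas_needed(replica_failure_probability, target_reliability):
    """Smallest replica count whose combined reliability meets the target."""
    if not 0.0 < replica_failure_probability < 1.0:
        raise ValueError("failure probability must be strictly between 0 and 1")
    if not 0.0 < target_reliability < 1.0:
        raise ValueError("target reliability must be strictly between 0 and 1")
    replicas = 1
    # With independent failures, all n replicas fail with probability p**n.
    while 1.0 - replica_failure_probability ** replicas < target_reliability:
        replicas += 1
    return replicas


class AdaptiveService:
    """Keeps its degree of redundancy X in line with the observed environment E."""

    def __init__(self, target_reliability):
        self.target_reliability = target_reliability
        self.replicas = 1

    def observe_environment(self, replica_failure_probability):
        self.replicas = replicas_needed(replica_failure_probability,
                                        self.target_reliability)
        return self.replicas


service = AdaptiveService(target_reliability=0.995)
print(service.observe_environment(0.1))   # benign environment
print(service.observe_environment(0.3))   # harsher environment after a move
```

The service code stays the same; only the redundancy degree is adjusted, which is the behaviour the slide asks to happen automatically.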

58
Conclusions
  • Ideally, the code should be made of two
    components
  • (service, FT)
  • (optimal SDC)
  • and FT should adapt dynamically w.r.t. e

59
Conclusions
  • Risk: this may call for complexity!
  • But generic architectures can be designed so as
    to keep complexity limited
  • Optimizations are possible
  • In a future seminar: a compliant architecture
    that is being designed within PATS

60
  • Questions?

All quotations are by B. Randell if no author is
specified