Title: Architectural Support for Dependable System Evolution
1 Architectural Support for Dependable System Evolution
Professor Jie Xu, Director of EPSRC WRG e-Science Centre, UK
School of Computing, University of Leeds, UK
2 Outline
- Architectural Issues with NEC
- Research Challenges
- Fundamental Models
- Service-Oriented Architectures
- Evolution Aspects
- Evaluation of Architectural Properties
(NEC: Network-Enabled Capability)
3 NEC and System Architectures
- We are seeking innovative approaches for the delivery of
through-life capability in a network-enabled environment
- System architectures are the most important factor
affecting both a system's functionality and its
non-functional aspects, such as dependability and
adaptability, and they should be the starting point of any
actual system development
- System architectures have been shown to be effective in
assisting the understanding of broader system concerns by
abstracting away from the details of a system
- This is achieved by employing architectural styles
appropriate for describing systems in terms of components,
the interactions between the components (connectors), and
the properties that regulate the composition of components
(configurations)
4 Architectural Aspects for NEC
- To develop a set of architectural frameworks,
representations, and patterns of a system of systems that
support
- 1) adventurous ideas, but with a focus on NEC
- 2) through-life management and system evolution
- Such support may be formalised as key properties of the
architectures at a certain level of abstraction
To enable the virtualisation of distributed computational,
data and other resources to create a single, large virtual
system, granting users and applications seamless access to
vast IT capabilities on demand
NEC is the linkage of sensors, decision makers and weapons
systems so that information can be translated into
synchronised military effect at optimum tempo
5 Support for Through-Life Capability Delivery
[Figure: Through-Life System Management and Evolution, drawn
along a time axis. Labels: capability demand; changes in
capability demand, environmental conditions, etc.;
architecture(s) for organisational and business capability
and their evolving form; architecture(s) for operational
capability and their evolving form; system and evolving
system.]
6 Research Challenges
- Fundamentals for architectures
  - Conceptual and abstract (high-level) system models
  - Understanding of problem and architectural contexts
- Novel architectures for NEC
  - Multiple representations of a system with multiple
  levels of abstraction, and architectural frameworks
  (AFs), including SOAs, P2P and agent-based
- Architectural support for through-life evolution
  - Re-configuration evolution and integration evolution
  - Coping with dynamic evolution and unknowns
- Evaluation of architectures with respect to intended
support for NEC and through-life evolution, e.g.
architectural options, initial costs vs future values,
benefits of NEC
7 C1 Fundamentals
- Conceptual and abstract models
- Interactions (and interfaces) between components for NEC
- Infrastructure
- Intra-enterprise
- Inter-organisations
- Architecture properties
- Key concepts, building blocks (and protocols) for
defining required architectures
- Recursive view
- Nesting
- Boundaries
Judgemental System
8 Conceptual System Model
- Recursive view
- Nesting
- Boundaries
- A system consists of a number of components, which
cooperate under the control of a design to service the
demands of the system environment
- The components and the system environment may also be
viewed as systems. The design can be considered as a
special component that defines the interactions between
components and establishes connections between components
and the system environment
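The recursive view can be made concrete with a small data structure. The following is a minimal sketch only, not a model from the talk: the class name `System`, its fields, and the example component names are all hypothetical, and the design is reduced to a list of named connections.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class System:
    """A system: components cooperating under the control of a design."""
    name: str
    # Each component is itself a System (the recursive view).
    components: List["System"] = field(default_factory=list)
    # Assumption: the design is modelled as (source, target) connections
    # between component names, or between a component and "environment".
    design: List[Tuple[str, str]] = field(default_factory=list)

    def depth(self) -> int:
        """Nesting depth; an atomic component has depth 1."""
        return 1 + max((c.depth() for c in self.components), default=0)

    def flatten(self) -> List[str]:
        """Names of this system and all nested components, depth first."""
        names = [self.name]
        for c in self.components:
            names.extend(c.flatten())
        return names


# Hypothetical example: a command system nesting a fusion subsystem.
sensor = System("sensor")
fusion = System("fusion",
                components=[System("filter"), System("tracker")],
                design=[("filter", "tracker"), ("tracker", "environment")])
command = System("command",
                 components=[sensor, fusion],
                 design=[("sensor", "fusion"), ("fusion", "environment")])
```

Because a component is just another `System`, the same boundary and design notions apply at every level of nesting.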
9 Dependable Systems
- Dependability is the property of a system such that
reliance can justifiably be placed on the service the
system delivers
- A fault-tolerant system is one that is designed to
function dependably despite the effects of faults (e.g.
component faults, software faults, etc.) during normal
processing
- A system failure occurs when the delivered service
deviates from what the system is intended to deliver (e.g.
its specification)
- An error is that part of the system state which is liable
to lead to subsequent failure
- A fault is the (hypothesized) cause of an error
10 Environmental Faults
[Figure: a system and its environment, linked by
interaction. Labels: Attack; Fault; Fault → Error →
Failure.]
- Failure of a system may be observed by its environment
- In the case of nesting, a failure of the nested system is
a fault of its enclosing environment, hence another
sequence of Fault → Error → Failure
- A fault which occurs in the environment of a system is
called an environmental fault; it may cause an error in the
system
- Indeed, an error in the system could be due to an
intentional fault which is in fact an attack from the
environment
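The chain Fault → Error → Failure across nesting levels can be traced with a few lines of code. This is an illustrative sketch under strong simplifying assumptions: every untolerated fault is taken to produce an error and then a failure, fault tolerance is reduced to a boolean flag, and the names `Node`, `propagate`, and the example systems are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A system nested inside an optional enclosing system."""
    name: str
    tolerates_faults: bool = False      # assumption: a flag stands in for fault tolerance
    parent: Optional["Node"] = None


def propagate(node: Node, cause: str,
              trace: Optional[List[str]] = None) -> List[str]:
    """Follow a fault upwards through the enclosing systems."""
    trace = [] if trace is None else trace
    trace.append(f"{node.name}: fault ({cause})")
    if node.tolerates_faults:
        trace.append(f"{node.name}: fault tolerated, service continues")
        return trace
    # Simplification: an untolerated fault always becomes an error,
    # and the error always leads to a failure.
    trace.append(f"{node.name}: error")
    trace.append(f"{node.name}: failure")
    if node.parent is not None:
        # The failure of the nested system is a fault for its enclosing system.
        propagate(node.parent, f"failure of {node.name}", trace)
    return trace


platform = Node("platform")
comms = Node("comms", tolerates_faults=True, parent=platform)
radio = Node("radio", parent=comms)

trace = propagate(radio, "attack from environment")
```

Here the attack on `radio` surfaces as a fault in `comms`, which tolerates it, so nothing reaches `platform`.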
11 C2 Novel Architectures for NEC
- NEC requires novel architectures that are demand-led,
evolvable, and capable of delivering on-demand services
- Service-oriented architectures (SOAs) address almost all
the requirements, which represent some radical changes in
how we design, engineer and maintain systems
- By means of late binding and loose coupling, SOAs
facilitate the virtualisation of various resources,
interoperation between autonomic systems, and system
evolution
- Web services provide a standards-based realisation of
this architecture
12 Recovery Blocks
    ensure   acceptance test
    by       primary alternate
    else by  alternate 2
    ...
    else by  alternate n
    else error
- modules: alternates designed independently
- adjudicator: the acceptance test
- controller: the ensure...by... control structure
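The ensure...by... structure can be sketched in a few lines. This is a minimal illustration, not the original implementation: the function name `recovery_block`, the dictionary-based state, and the use of a deep copy as the recovery point are all assumptions made for the example.

```python
import copy
from typing import Any, Callable, Dict, Sequence

State = Dict[str, Any]


class RecoveryBlockError(Exception):
    """Raised when every alternate fails the acceptance test."""


def recovery_block(state: State,
                   acceptance_test: Callable[[State], bool],
                   alternates: Sequence[Callable[[State], None]]) -> State:
    """Try each alternate in turn until one passes the acceptance test."""
    checkpoint = copy.deepcopy(state)           # establish the recovery point
    for alternate in alternates:
        try:
            alternate(state)
            if acceptance_test(state):
                return state                    # adjudicator accepts the result
        except Exception:
            pass                                # an exception counts as a failed attempt
        # Backward error recovery: restore the state saved at the recovery point.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RecoveryBlockError("all alternates failed the acceptance test")


def buggy_sort(state: State) -> None:
    state["data"] = sorted(state["data"])[:-1]  # seeded fault: drops an element


def simple_sort(state: State) -> None:
    state["data"] = sorted(state["data"])


def accept(state: State) -> bool:
    d = state["data"]
    return len(d) == state["n"] and all(a <= b for a, b in zip(d, d[1:]))


result = recovery_block({"data": [3, 1, 2], "n": 3}, accept,
                        [buggy_sort, simple_sort])
```

The primary (`buggy_sort`) fails the acceptance test, the state is rolled back, and the second alternate delivers the result.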
13 Recovery Blocks: Costs vs Values
- Provide fault-tolerant functional components which allow
a certain degree of evolution, e.g. upgrading at the level
of modules
- Require a suitable mechanism for providing automatic
backward error recovery, e.g. the recovery cache
- The acceptance test is the last line of error detection,
supplemented with run-time assertion statements and
hardware error-detection mechanisms
- Recovery blocks can be nested: the signalling of an
exception from an inner recovery block will invoke recovery
in the enclosing block
- Forward recovery can be incorporated in recovery blocks
to send some compensatory messages to the environment
- Allow the use of functionally degraded alternates
- Case study: a failure coverage of over 70% was achieved
with an extra cost of fault tolerance of up to 60%
14 Enhanced Architecture
- The SCOP architecture allows a greater degree of
adaptability and evolution, at additional cost

    begin
      i := 0;                        {current phase}
      stateMark := Go;
      {S_i: syndrome set}
      C := one of delivery conditions;
      decide(maxPhase);
      while stateMark = Go and i < maxPhase do
      begin
        i := i + 1;
        configure(C, S_{i-1}, V_i);  {V_i is the set of active variants}
        execute(V_i, S_i);
        adjudicate(C, S_i, stateMark, result)
      end;
      if stateMark = End then deliver(result) else signal(failure)
    end
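The phase loop above can be sketched as runnable code. This is an illustrative reading, not the SCOP authors' implementation. The delivery condition C is assumed to be "at least `agree_needed` variants return the same value", `configure` is assumed to activate just enough unused variants to make that agreement possible, and `scop`, `ScopFailure`, and the three example variants are hypothetical names. Unlike the loop above, the sketch also stops when no unused variants remain.

```python
from collections import Counter
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

GO, END = "Go", "End"


class ScopFailure(Exception):
    """Signalled when no result satisfies the delivery condition."""


def scop(variants: Sequence[Callable[[int], Hashable]], x: int,
         agree_needed: int = 2,
         max_phase: Optional[int] = None) -> Tuple[Hashable, int]:
    """Return (result, phases used), running as few variants as possible."""
    if max_phase is None:
        max_phase = len(variants)               # decide(maxPhase): trivial choice here
    i, state_mark = 0, GO
    syndrome: List[Hashable] = []               # results collected so far
    unused = list(variants)
    result = None
    while state_mark == GO and i < max_phase and unused:
        i += 1
        # configure: pick the smallest set of new variants that could
        # complete an agreement of size agree_needed.
        top = Counter(syndrome).most_common(1)
        best = top[0][1] if top else 0
        need = max(1, agree_needed - best)
        active, unused = unused[:need], unused[need:]
        # execute: run the active variants and extend the syndrome.
        syndrome.extend(v(x) for v in active)
        # adjudicate: check the delivery condition against the syndrome.
        value, count = Counter(syndrome).most_common(1)[0]
        if count >= agree_needed:
            state_mark, result = END, value
    if state_mark == END:
        return result, i
    raise ScopFailure("delivery condition not met")


def v1(x: int) -> int:
    return x * x


def v2(x: int) -> int:
    return x * x + (1 if x == 3 else 0)         # seeded fault at x == 3


def v3(x: int) -> int:
    return sum(x for _ in range(x))             # x added x times (x >= 0)
```

With these variants, `scop([v1, v2, v3], 2)` finishes in one phase using two variants, while `scop([v1, v2, v3], 3)` needs a second phase to bring in `v3` and outvote the faulty result.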
15 Configuration on the Fly
- Highly dynamic SOAs
- A dynamic pool of available services, updated constantly
- Well-defined rules and constraints for configuration
based on currently available services from the pool,
perhaps within certain architectural frameworks
- Concrete system architecture is not defined, and perhaps
not known, in advance
- Architecture is derived on the fly in response to users'
demand and the specific application at the point in time
when needed
- Such an architecture may be a one-off instance
- Permits a great degree of adaptability and smooth
evolution: a good way to deal with unexpected future
changes
- But difficult to validate and evaluate
- Trade-offs needed
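Deriving an architecture from a changing service pool can be illustrated with late binding by capability name. This is a minimal sketch under assumptions: the classes `Service` and `ServicePool`, the functions `configure` and `run`, and the service names are hypothetical, and the only configuration rule is "one available service per required capability".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Service:
    name: str
    capability: str
    call: Callable[[str], str]


class ServicePool:
    """A dynamic pool: services may register or withdraw at any time."""

    def __init__(self) -> None:
        self._services: List[Service] = []

    def register(self, service: Service) -> None:
        self._services.append(service)

    def withdraw(self, name: str) -> None:
        self._services = [s for s in self._services if s.name != name]

    def find(self, capability: str) -> Optional[Service]:
        # Late binding: the concrete service is chosen only when asked for.
        for s in self._services:
            if s.capability == capability:
                return s
        return None


def configure(pool: ServicePool, required: List[str]) -> List[Service]:
    """Derive a one-off chain of services for the required capabilities."""
    chain = []
    for capability in required:
        service = pool.find(capability)
        if service is None:
            raise LookupError(f"no available service for {capability!r}")
        chain.append(service)
    return chain


def run(chain: List[Service], payload: str) -> str:
    for service in chain:
        payload = service.call(payload)
    return payload


pool = ServicePool()
pool.register(Service("sensor-a", "sense", lambda p: p + ">a"))
pool.register(Service("fusion-1", "fuse", lambda p: p + ">fused"))
first = configure(pool, ["sense", "fuse"])

# The pool changes; the next request yields a different architecture.
pool.withdraw("sensor-a")
pool.register(Service("sensor-b", "sense", lambda p: p + ">b"))
second = configure(pool, ["sense", "fuse"])
```

The two chains satisfy the same demand with different concrete services, which is also why such one-off instances are hard to validate ahead of time.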
16 C3 Architectural Mechanisms for Evolution
- Evolvable system architectures
- Re-configuration vs. integration
- Design/develop. time vs. operation time
- Expected changes vs. unknown changes
- (Extremely hard to evaluate against unknowns)
17 Problems with Current Practice
[Figure: level of NEC delivered as required (from 0 to 100,
with a threshold k below which it is unacceptable) against
time, across design/development and the designed system
lifetime. Labels: requirements; architectural models;
system in operation; changes in requirements, environment,
etc.; original architectures lost; ad hoc updating; NEC
system; evolving NEC system; several "?" marks.]
18 Evolving Architectures
Objective: To develop an architecture-driven approach that
ensures that a system delivers the required NEC
through-life (with an acceptably high probability P)
[Figure: level of NEC delivered as required (from 0 to 100,
with a threshold k below which it is unacceptable) against
time, across design/development and the designed system
lifetime. Labels: requirements; architectural models;
evolving architectures; system in operation; changes in
requirements, environment, etc.; NEC system; evolving NEC
system.]
19 C4 Evaluation of Architectures
- Measurement and metrics derived from key system features
and attributes in terms of NEC and through-life evolution
- (metrics must truly quantify the characteristics of the
architecture to be measured)
- Analytic models and evaluation tools (architectural
options: components, I/Fs and connectors, configuration
mechanisms)
- Experimental validation with case studies (e.g.
experience with the development of CROWN-C)
20 Analytic Evaluation of Dependability
- For example, modelling approaches (e.g. Markov,
Petri-net, or fault-tree) for analysing fault-tolerant
architectures by carefully identifying the ability of
various approaches to tolerate independent and related
faults
- The results drawn provide designers with rich information
about the fault tolerance properties of various
architectures
- They are also useful in uncovering the relative
advantages and disadvantages of different architectures
- Help to improve designs in terms of dependability
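A very small analytic comparison shows why independent and related faults must be treated separately. This is a textbook-style sketch, not a model from the talk. It assumes a perfect majority voter over three versions, each correct with probability r, and models related faults with a hypothetical probability q that a common fault defeats all versions at once. The function names and the numbers are illustrative.

```python
def two_out_of_three(r: float) -> float:
    """Probability that at least 2 of 3 independent versions are correct."""
    return 3 * r**2 - 2 * r**3


def with_related_faults(r: float, q: float) -> float:
    """Same system when a related fault (probability q) defeats all versions."""
    return (1 - q) * two_out_of_three(r)


r = 0.9                                   # illustrative single-version reliability
independent = two_out_of_three(r)         # about 0.972, better than one version
related = with_related_faults(r, 0.1)     # about 0.875, worse than one version
```

Under these assumptions the multi-version system beats a single version only while related faults are rare, which is the kind of trade-off the modelling approaches above are meant to expose.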
21 Analytic Evaluation of Design for Changes
- Initial cost/value of components, connectors (I/Fs), and
configuration mechanisms
- In order to achieve future value while reducing the
potential cost of maintenance, upgrading, and replacement
- Identify the complex relationships between initial costs
and future costs while delivering through-life capability
as desired by stakeholders
[Figure: value against time t. Labels: initial cost/value;
cost of upgrading; changes; through-life value desired.]
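The relationship between initial and future costs can be illustrated with the simplest possible model. This is a hypothetical sketch: it assumes a fixed cost per change and a linear through-life cost, and the two architecture options and all figures are invented for illustration, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    initial_cost: float       # cost of components, connectors, mechanisms
    cost_per_change: float    # cost of each later upgrade or replacement

    def through_life_cost(self, changes: int) -> float:
        # Assumption: every change costs the same (linear model).
        return self.initial_cost + changes * self.cost_per_change


def break_even(cheap: Option, evolvable: Option) -> float:
    """Number of changes after which the evolvable option is cheaper overall."""
    return ((evolvable.initial_cost - cheap.initial_cost)
            / (cheap.cost_per_change - evolvable.cost_per_change))


# Illustrative figures only (arbitrary cost units).
rigid = Option("rigid", initial_cost=100, cost_per_change=40)
flexible = Option("flexible", initial_cost=160, cost_per_change=10)

changes_needed = break_even(rigid, flexible)     # 2.0 changes
after_five = (rigid.through_life_cost(5),        # 300
              flexible.through_life_cost(5))     # 210
```

In this toy model the extra initial investment pays off after two changes; a realistic analysis would also need the value delivered over time and uncertainty about which changes occur.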
22 Experimental Assessment: CROWN-C (1)
- CROWN-C is a production-quality Grid system which
features specific enhancements designed to support the
development and assessment of high-assurance
service-oriented systems
- The paper (left) highlights some of the new dependability
and security challenges introduced by the service-oriented
paradigm, and relates these challenges to the dependability
and security enhancements featured in CROWN-C
23 Experimental Assessment: CROWN-C (2)
- Dependability vs costs: one good module against a
multi-version system based on a similar amount of resources
- CROWN-FIT: a fault injection tool to assess network-level
system dependability
- Security gains against performance overheads, using
standard security benchmarks
24 Conclusions
- Need a better understanding of architectural issues with
NEC and related application requirements
- Inter-organisational factors must be taken into account
and handled properly, especially environmental faults and
external attacks
- Design decisions and new technique developments require
more realistic models (e.g. those considering human
factors)
- Novel architectures and combined forms (e.g. SOAs, P2P,
agent-based, etc.)
- Model-based approaches to system design and through-life
management (e.g. relationships between initial
costs/investments and future values/gains)