Bran Selic Rational Software Canada bselic@rational.com - PowerPoint PPT Presentation

1
Physical Programming: Beyond Mere Logic

Bran Selic, Rational Software Canada, bselic@rational.com

2
What I am Hoping For
THE THEORY AND PRACTICE OF
SOFTWARE
3
The Ideal and the Real

By focussing on the imperfect world of physical
reality we may miss the essence

Software seems much closer to the ideal world

4
The Software World

Fundamental design principle: separate program
logic from the underlying implementation
technology
separation of concerns
software portability

[Figure labels: Program Logic; HL Programming Languages; Computing Environment Technology]
5
The Real-Time Software World

Key question: How long will it take?
The quantitative characteristics of the computing
environment encroach upon the purity of the logic
software design involves engineering tradeoffs

6
A Simple Programming Application

Traverse a transactions log database and print
all transactions pertaining to a specific account

open (DB)
for i := 1 to DB.size do
    record := read (DB)
    if (record.acctNo = myAccount) then print (record)
enddo
close (DB)
7
Porting to a Distributed Environment

Can it really be this simple?

Local version:
open (DB)
for i := 1 to DB.size do
    record := read (DB)
    if (record.acctNo = myAccount) then print (record)
enddo
close (DB)

Distributed version (database reached across the Network):
RPC_open (DB)
for i := 1 to DB.size do
    record := RPC_read (DB)
    if (record.acctNo = myAccount) then print (record)
enddo
RPC_close (DB)
8
Some (Unstated!) Assumptions

The CPU and database are fast enough for the
needs of the application
e.g. random access database hardware
The CPU and database fail as a unit
i.e., no need to contend with failures of the
database
Communication is reliable
order preserving
exactly-once semantics
A system never has anything more important to do
than what it is doing at the moment

9
Partial Failures

Distributed systems can exhibit partial failures
fault tolerance: the ability to recover from partial
failures
Issue: failure recovery strategy
fault detection
failure recovery
fault diagnosis
Issue: how do other sites detect that a site has
failed?
(apparent) lack of activity/response
how do we distinguish between a failed site and a
lost message?
Timeout is the only general mechanism available
how long do we wait?
Tradeoff between responsiveness and degree of
certainty
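
The timeout tradeoff can be made concrete with a small sketch. This is not from the slides: the class name `TimeoutFailureDetector`, the heartbeat scheme, and the injected clock are all assumptions chosen for illustration. A short timeout reacts quickly but may wrongly suspect a live site whose message was merely delayed or lost; a long timeout is more certain but slower.

```python
import time


class TimeoutFailureDetector:
    """Suspects a site if nothing was heard from it within `timeout` seconds.

    Hypothetical helper for illustration only. The clock is injectable so the
    behaviour can be demonstrated without real waiting.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_heard = {}

    def heartbeat(self, site):
        # Record the arrival time of any message from `site`.
        self.last_heard[site] = self.clock()

    def suspects(self, site):
        # A site never heard from is suspected; otherwise compare the silence
        # interval against the timeout. Silence cannot distinguish a failed
        # site from a lost or delayed message.
        last = self.last_heard.get(site)
        if last is None:
            return True
        return self.clock() - last > self.timeout


def demo():
    now = [0.0]
    clock = lambda: now[0]
    impatient = TimeoutFailureDetector(timeout=2.0, clock=clock)
    patient = TimeoutFailureDetector(timeout=10.0, clock=clock)
    for detector in (impatient, patient):
        detector.heartbeat("siteB")
    now[0] = 3.0  # siteB is alive, but its next message is late
    return impatient.suspects("siteB"), patient.suspects("siteB")
```

In `demo()`, the impatient detector already suspects the site after 3 seconds of silence while the patient one does not; neither can know which of them is right.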

10
A More Realistic Distribution Scenario

Dealing with partial failures

DB := locate_database (Network)
    exception abort
RPC_open (DB)
    exception do
        DB := locate_database (Network)
            exception abort
    enddo
for i := 1 to DB.size do
    record := RPC_read (DB)
        exception do
            DB := locate_database (Network)
                exception abort
            for j := 1 to (i-1) do RPC_read (DB)
                exception abort
            retry
        enddo
    if (record.acctNo = myAccount) then print (record)
enddo
RPC_close (DB)
Most of the code is in the exception handlers!
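
A runnable approximation of this recovery scenario is sketched below in Python. Everything in it is an assumption made for illustration: `Replica` simulates a remote database in memory, `SiteFailure` stands in for an RPC error or timeout, and "locating" a database simply means taking the next replica from a list. The recovery policy follows the slide: on a failed read, find another copy, skip forward over the records already processed, and retry; any failure during recovery aborts.

```python
class SiteFailure(Exception):
    """Raised when a (simulated) remote call fails or times out."""


class Replica:
    """Simulated remote copy of the log; can be told to fail at one read."""

    def __init__(self, records, fail_at=None):
        self.records = records
        self.size = len(records)
        self.fail_at = fail_at
        self.pos = 0

    def rpc_open(self):
        self.pos = 0

    def rpc_read(self):
        if self.pos == self.fail_at:
            raise SiteFailure("replica stopped responding")
        record = self.records[self.pos]
        self.pos += 1
        return record


def scan_account(replicas, my_account):
    """Return all records for `my_account`, switching replicas on failure."""
    remaining = iter(replicas)

    def locate_database():
        db = next(remaining, None)
        if db is None:
            raise SiteFailure("no replica left, aborting")
        db.rpc_open()
        return db

    db = locate_database()
    matches = []
    i = 0
    while i < db.size:
        try:
            record = db.rpc_read()
        except SiteFailure:
            db = locate_database()  # a failure here aborts the scan
            for _ in range(i):      # skip records already processed
                db.rpc_read()       # a failure here also aborts
            continue                # retry the read that failed
        if record["acctNo"] == my_account:
            matches.append(record)
        i += 1
    return matches
```

Even in this toy form, the main path (read, compare, collect) is only a few lines; the rest exists to cope with partial failure.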
11
Asynchronous Events and Fault Tolerance

Partial system failures are only one kind of
event that may need to be handled in the course
of execution of a distributed program
Others
high-priority situations (e.g., imminent
deadlines)
aborts
These events are often unpredictable
may occur at any point in the execution of a
program
fault tolerance requires that whenever they occur
and whatever they are, we need to deal with them

12
Revisiting An Old Assumption

Is the traditional main-path-focussed
programming style appropriate when exceptions are
the rule?

13
Asynchronous Event Handling

This is nicely captured by the state-event matrix
of finite state machines

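[Figure: state-event matrix with events (Event A, ..., Event S) along one axis and states along the other; each cell holds a handler (Handler AN, Handler AN1, Handler AN2, etc.)]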
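
A state-event matrix maps directly onto a lookup table keyed by (state, event). The sketch below is illustrative only: the states, events, and handler names are invented for a log-reading client and do not come from the slides. Note that the `ABORT` event is given a handler in every state, which is the point of the matrix: asynchronous events are dealt with wherever they arrive.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    READING = auto()
    RECOVERING = auto()


class Event(Enum):
    START = auto()
    RECORD = auto()
    TIMEOUT = auto()
    ABORT = auto()


# Each handler appends what it did to `log` and returns the next state.
def open_log(state, log):
    log.append("open")
    return State.READING

def process_record(state, log):
    log.append("process")
    return State.READING

def start_recovery(state, log):
    log.append("relocate")
    return State.RECOVERING

def resume(state, log):
    log.append("resume")
    return State.READING

def abort(state, log):
    log.append("abort")
    return State.IDLE

def ignore(state, log):
    return state  # empty cell in the matrix: stay in the same state


MATRIX = {
    (State.IDLE, Event.START): open_log,
    (State.READING, Event.RECORD): process_record,
    (State.READING, Event.TIMEOUT): start_recovery,
    (State.RECOVERING, Event.RECORD): resume,
    (State.RECOVERING, Event.TIMEOUT): abort,
}
for s in State:  # ABORT can arrive in any state
    MATRIX[(s, Event.ABORT)] = abort


def run(events):
    state, log = State.IDLE, []
    for event in events:
        state = MATRIX.get((state, event), ignore)(state, log)
    return state, log
```
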
14
A Conclusion

In an event-driven and deadline-based
application, a state machine-based programming
model may be more appropriate than the
traditional algorithmic (main path) programming
model
The environment strikes back
the program logic is strongly affected by the
environment

15
Communication Media Failures

Message loss
due to hardware failures
due to software failures (e.g., buffer overflow)
Message reordering
due to different paths
due to variable delays (e.g., due to variable
message lengths)
retransmission due to fault-tolerant protocols
Message duplication
due to faulty hardware
retransmission due to fault-tolerant protocols
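
One common way to mask reordering and duplication is to tag messages with sequence numbers at the sender and reassemble them at the receiver. The sketch below assumes such numbering (starting at 0); the class `OrderedReceiver` is hypothetical and not taken from the slides. It does not mask loss: a missing sequence number simply stalls delivery until a retransmission arrives, and deciding that a message is lost again comes down to a timeout.

```python
class OrderedReceiver:
    """Delivers payloads in sequence order, dropping duplicates."""

    def __init__(self):
        self.next_seq = 0    # lowest sequence number not yet delivered
        self.pending = {}    # out-of-order messages waiting for a gap to fill
        self.delivered = []  # payloads handed to the application, in order

    def receive(self, seq, payload):
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate: already delivered or already buffered
        self.pending[seq] = payload
        # Deliver as many consecutive messages as are now available.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
```

For example, arrivals in the order 1, 0, 1, 3 deliver the payloads for 0 and 1 once each and hold 3 back until 2 shows up.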

16
Transmission Delays

Possibility of out of date status information

17
Relativistic Effects

Relativistic effects
different observers see different event orderings
(due to different and variable transmission
delays)

18
Distribution Transparencies

Providing supporting layers of functionality that
shield the application from the undesirable
effects of distribution
e.g., reliable communication protocols

[Figure: client and server]
19
Impossibility Result No.1

It is not possible to guarantee that agreement
can be reached in finite time over an
asynchronous communication medium, if the medium
is lossy or one of the distributed sites can fail
Fischer, M., N. Lynch, and M. Paterson,
"Impossibility of Distributed Consensus with One
Faulty Process," Journal of the ACM, (32, 2), April
1985.

20
Impossibility Result No.2

Even when communication is fully reliable, it is
not possible to guarantee common knowledge if
communication delays are unbounded
Halpern, J.Y., and Moses, Y., "Knowledge and
common knowledge in a distributed environment,"
Journal of the ACM, (37, 3), 1990.

21
The End-To-End Argument

Transparency mechanisms are intended to protect
the application from observing the undesirable
effects of distribution
Most transparency types require distributed
agreement!
The end-to-end argument (Saltzer et al.):
if transparency cannot be guaranteed, the
application is not really shielded from the
effects of distribution
the overhead of introducing transparency
mechanisms may not be justified

22
Stepping Back...

Most distribution problems are a consequence of
the encroachment of the physical world into the
pliable and limitless logical world of software
the problem is fundamental (e.g., the end-to-end
argument)
Traditional Programming = Logic
Physical Programming = Logic + Physics
like traditional engineers, software designers
must take into account the raw material out of
which they spin their logic
finite resources, finite delays, finite
reliability...

23
Quality of Service Concepts

The physical characteristics of software can be
specified using the general notion of Quality of
Service (QoS)
a specification of how well a service is (to be)
performed
e.g. throughput, capacity, response time
usually a quantitative measure
QoS specifications are two-sided
offered QoS: the QoS that is offered to clients
required QoS: the QoS required by a client

24
Resources and Quality of Service

Resource: an element whose functional capacity is
limited, directly or indirectly, by the finite
capacities of the underlying physical computing
environment
The services of a resource are characterized by
one or more QoS attributes
capacity, reliability, availability, response
time, etc.

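[Figure: a Client places a Resource Demand on a Resource; the client's RequiredQoS is matched against the resource's OfferedQoS]
RequiredQoS ≤ OfferedQoS (the offered QoS must satisfy the required QoS)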
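
Comparing required against offered QoS is not a single numeric comparison, because "better" points in different directions for different attributes: a smaller response time is better, a larger throughput is better. The sketch below is one possible encoding, not the author's; the attribute names and the two direction sets are assumptions.

```python
# Hypothetical attribute names, grouped by which direction counts as "better".
LOWER_IS_BETTER = {"response_time_ms"}
HIGHER_IS_BETTER = {"throughput_per_s", "capacity", "availability"}


def violations(required, offered):
    """Return the names of required QoS attributes that `offered` fails to meet."""
    failed = []
    for name, need in required.items():
        if name not in LOWER_IS_BETTER | HIGHER_IS_BETTER:
            raise ValueError(f"unknown QoS attribute: {name}")
        have = offered.get(name)
        if have is None:
            failed.append(name)  # the resource makes no offer at all
        elif name in LOWER_IS_BETTER and have > need:
            failed.append(name)
        elif name in HIGHER_IS_BETTER and have < need:
            failed.append(name)
    return failed
```

A client that requires a 3 ms response time from a resource offering 4 ms gets `["response_time_ms"]` back, even if every other attribute is comfortably met.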
25
Simple Example

Concurrent tasks accessing a monitor with known
response time characteristics

[Figure: concurrent tasks calling a monitor. Required QoS: Deadline 3 ms. Offered QoS: MaxExecutionTime 4 ms.]
26
Types and Physical Types

The purpose of types is to tell us about the
externally relevant properties of software
components so that we can validate whether they
are being used appropriately
Physical types: type specifications that
incorporate QoS characteristics
They answer two key engineering questions:
can this component support the load intended
for it?
what does this component require to support its
offered QoS?

27
Physical Type Example

A semaphore type
class Semaphore
    heap 10 bytes -- required QoS
    CPU ≥ 5 MIPS -- required QoS
    get(): proc ≤ 0.4CPU µs, stack 4 bytes
    rel(): proc ≤ 0.4CPU µs, stack 4 bytes
Usage
mySema : Semaphore
mySema.get() proc ≤ 3 µs -- req. QoS
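
The idea of a physical type can be approximated in an ordinary language by attaching QoS data to a type description and checking it at the point of use. The Python sketch below is a loose illustration under stated assumptions: the class and function names are invented, and the per-operation processing time is taken as a flat 0.4 µs worst case, which simplifies whatever CPU-dependent expression the slide intended.

```python
from dataclasses import dataclass


@dataclass
class OperationQoS:
    proc_us: float    # offered worst-case processing time, microseconds
    stack_bytes: int  # stack needed per call


@dataclass
class PhysicalType:
    name: str
    heap_bytes: int      # required from the platform
    min_cpu_mips: float  # required from the platform
    operations: dict     # operation name -> OperationQoS (offered)


SEMAPHORE = PhysicalType(
    name="Semaphore",
    heap_bytes=10,
    min_cpu_mips=5.0,
    operations={
        "get": OperationQoS(proc_us=0.4, stack_bytes=4),  # assumed flat figure
        "rel": OperationQoS(proc_us=0.4, stack_bytes=4),  # assumed flat figure
    },
)


def platform_supports(ptype, cpu_mips, free_heap_bytes):
    """What does this component require to support its offered QoS?"""
    return cpu_mips >= ptype.min_cpu_mips and free_heap_bytes >= ptype.heap_bytes


def call_is_feasible(ptype, operation, required_proc_us):
    """Can this component support the load intended for it?"""
    return ptype.operations[operation].proc_us <= required_proc_us
```

With these definitions, a use site that needs `get` within 3 µs is accepted, while a platform with a 4 MIPS CPU is rejected before any call is made.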

28
Violation of Encapsulation?

Aren't the offered QoS characteristics a
consequence of the implementation?
Not necessarily...
The offered QoS characteristics can and should be
defined independently of the implementation
the worst-case numbers of traditional
engineering
The contractual obligations that the component
designer is willing to assume

29
Physical Type Checking

Can physical types be statically checked?
The good news: yes, they can (in most cases)
The bad news: this typically requires complex analysis
methods (queueing network analysis,
schedulability analysis, etc.)
but then, model checking and theorem proving are
not simple either
Some issues
Typically, QoS-based analyses cannot be done
incrementally -- the full system context is
required
but then, the same holds for many formal
verification methods
Each type of QoS (e.g., bandwidth, CPU
performance) combines differently
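
To see why each kind of QoS needs its own combination rule, consider a request that passes through several stages in series. Under the simplifying assumptions of this sketch (strictly sequential stages, independent failures), latencies add, bandwidth is limited by the narrowest stage, and availabilities multiply. The names `StageQoS` and `compose_series` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StageQoS:
    latency_ms: float
    bandwidth_mbps: float
    availability: float  # probability that the stage is up, in [0, 1]


def compose_series(stages):
    """End-to-end QoS of stages traversed one after another."""
    return StageQoS(
        latency_ms=sum(s.latency_ms for s in stages),               # additive
        bandwidth_mbps=min(s.bandwidth_mbps for s in stages),       # bottleneck
        availability=math.prod(s.availability for s in stages),    # all must be up
    )
```

A parallel or replicated arrangement would need different rules again, which is part of why such analyses tend to need the full system context.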

30
Required QoS

Like all guarantees, the offered QoS is
contingent on the component getting what it needs
to do its job
There are two distinct dimensions to this
the peer dimension
the layering dimension

31
Logical Viewpoint

Example: logical view of aircraft simulator
software

[Figure: components INSTRUCTOR STATION, AIRFRAME, ATMOSPHERE MODEL, PILOT CONTROLS, CONTROL SURFACES, GROUND MODEL, ENGINES]
32
Engineering (Realization) Viewpoint

The realization of a specific set of logical
components using facilities of the run-time
environment

33
Viewpoints and Mappings
Realization mappings
34
The Engineering Viewpoint

The engineering viewpoint represents the raw
material out of which we construct the logical
viewpoint
the quality of the outcome is only as good as the
quality of the ingredients that are put in
as in all true engineering, the quantitative
aspects of the logical model are often crucial
(How long will it take? How much will be
required?)

35
Distributed Systems Dilemma

Dilemma: How can we account for the engineering
characteristics of the system without prematurely
and possibly unnecessarily committing to a
specific technology?
Proposed solution: Include in the logical model a
generic (technology-neutral) specification of the
required/expected characteristics of the
engineering environment

36
Viewpoint Separation

Required Environment: a technology-neutral
environment specification required by the logical
elements of a model

[Figure: Logical Viewpoint]
37
Required Environment Specifications

What a logical component needs in order to
perform its function according to spec

[Figure: realization mapping]
38
Required Environment Partitions

Logical elements often share common QoS
requirements

QoS domain (e.g., failure unit, uniform communication
properties)
39
QoS Domains

Specify a domain in which certain QoS values
apply throughout
failure characteristics (failure modes,
availability, reliability)
CPU speeds
communications characteristics (delay,
throughput, capacity)
etc.
The QoS values of a domain can be compared
against those of a concrete engineering
environment to see if a given environment is
adequate for a specific model
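
A minimal sketch of such a comparison follows, assuming a model made of named QoS domains and an engineering environment made of named nodes. The classes, attribute names, and the shape of the mapping are all invented for illustration; a real profile would carry many more attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoSDomain:
    """Technology-neutral requirements shared by a group of logical elements."""
    min_availability: float
    min_cpu_mips: float
    max_delay_ms: float


@dataclass(frozen=True)
class Node:
    """Measured or specified characteristics of a concrete environment."""
    availability: float
    cpu_mips: float
    delay_ms: float


def shortfalls(domain, node):
    """Names of the domain requirements that `node` does not meet."""
    problems = []
    if node.availability < domain.min_availability:
        problems.append("availability")
    if node.cpu_mips < domain.min_cpu_mips:
        problems.append("cpu_mips")
    if node.delay_ms > domain.max_delay_ms:
        problems.append("delay_ms")
    return problems


def verify_mapping(domains, nodes, mapping):
    """Check a realization mapping {domain name: node name}.

    Returns {domain name: [unmet requirements]}; empty means adequate.
    """
    report = {}
    for domain_name, node_name in mapping.items():
        problems = shortfalls(domains[domain_name], nodes[node_name])
        if problems:
            report[domain_name] = problems
    return report
```

Because the domains never mention a specific technology, the same model can be checked against several candidate environments by changing only `nodes` and `mapping`.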

40
Physical Programming

The notions of QoS and QoS domains enable the
design of distributed systems that properly
account for the effects of distribution and other
non-transparent physical phenomena, while
allowing for a high degree of portability and
technology independence
They are also the basis for formal verification
of realization mappings
required QoS ≤ QoS of the proposed engineering
environment
May also be used to automatically synthesize
engineering environments that satisfy a given QoS
specification of a logical model

41
Conclusions and an Appeal...

The physical aspects of software will not go away
ignoring them can be perilous, especially when
working with distributed systems
most interesting software systems of the future
will be distributed and will have stringent
dependability requirements ('cannot reboot the
Internet')
What is needed is a proper theoretical framework
for dealing with physical types
The QoS framework described here is currently
being incorporated into a profile of UML for
real-time applications