Title: Acceptability-Oriented Computing


1
Acceptability-Oriented Computing
  • Martin Rinard
  • Laboratory for Computer Science
  • Massachusetts Institute of Technology

2
Traditional View of Correctness
Execution Space
3
Traditional View of Correctness
Correct Execution
Execution Space
4
Acceptability View
Acceptability Envelope
Correct Execution
Execution Space
5
Acceptability View
Acceptability Envelope
Correct Execution
Acceptable Executions
Execution Space
6
Acceptability View
Acceptability Envelope
Correct Execution
Acceptable Executions
Unacceptable Execution
Execution Space
7
Acceptable Execution
Acceptability Envelope
Correct Execution
Execution Space
8
Fail Stop Execution
Acceptability Envelope
Correct Execution
STOP
Execution Space
9
Safe Exit Execution
Acceptability Envelope
Correct Execution
Safe Exit Point
STOP
Execution Space
10
Resilient Computing Execution
Acceptability Envelope
Correct Execution
Repaired Execution
Execution Space
11
Questions
  • How to identify acceptability envelope?
  • Set of acceptability properties
  • Basic properties that any execution must satisfy
    to be acceptable
  • How to ensure program stays within envelope?
  • Acceptability monitoring
  • Acceptability enforcement

12
Resilient Computing Execution
Acceptability Envelope
Correct Execution
Repaired Execution
Acceptability Monitoring
Acceptability Enforcement
Execution Space
13
Proposed Structure
Inputs
Outputs
Core System
14
Proposed Structure
Inputs
Outputs
Core System
Output Filter
15
Proposed Structure
Inputs
Outputs
Core System
Input Filter
Output Filter
16
Proposed Structure
Outputs
Inputs
Core System
Input Filter
Output Filter
Data Structure Repair
17
Proposed Structure
Outputs
Inputs
Core System
Input Filter
Output Filter
Repair
Probe
Data Structure Repair
18
Proposed Structure
Output Rectification
Control Transfer
Outputs
Inputs
Core System
Input Filter
Output Filter
Repair
Probe
Data Structure Repair
19
Proposed Structure
Output Rectification
Control Transfer
Outputs
Inputs
Core System
Input Filter
Output Filter
Repair
Probe
Exception Recovery
Data Structure Repair
20
Proposed Structure
Response Enforcement
Output Rectification
Control Transfer
Outputs
Inputs
Core System
Input Filter
Output Filter
Repair
Probe
Exception Recovery
Data Structure Repair
21
Monitoring and Enforcement Mechanisms
  • Black Box
  • Do not affect core
  • Input/output filters and correlators
  • White Box
  • New code and data inserted into core
  • Gray Box
  • No change to core program
  • Can change data structures and control flow
  • Mechanisms
  • Procedure call and system call interception
  • Ptrace interface, mmap to access address space

22
Reason for Acceptability-Oriented Computing: Difficulty of Delivering Perfect Software
  • Difficulty in all areas of development effort
  • Understanding domain, obtaining requirements
  • Producing specification, developing software
  • Change Aspiration of Development Process
  • Accept inevitability of imperfection
  • Goal is to deliver acceptable program
  • Augment Development Activities
  • Identify crucial acceptability properties
  • Ensure that program does not violate them

23
Aspiring to Perfection Recognized as Harmful
  • Defocuses development effort
  • All parts seen as equally important
  • No formal way to direct development effort to
    most important parts of code
  • Produces brittle structure
  • Each piece of functionality implemented
  • Once (no redundancy)
  • Completely (hard and easy parts together)
  • No recovery or protection mechanisms
  • Program completely vulnerable to any error

24
Advantages of Acceptability-Oriented Computing
  • Focused, prioritized development effort
  • Appropriately direct engineering activities
  • Ensure satisfaction of acceptability properties
  • Resilient software structure
  • Redundant acceptability property enforcement
  • Mechanisms enforce partial properties
  • Simpler (easier to obtain acceptability) than
    complete modules in core software
  • Resulting software structure tolerates errors

25
Ideal Result
  • Can build systems with less development effort
  • Can reduce testing effort for core
  • Can leave (infrequent) errors in system
  • Can build systems with more functionality
  • Can invest saved development effort on increasing
    functionality of system
  • Can make larger system stable
  • Can use more aggressive, riskier algorithms

26
Map Example
Inputs: put x 10, put y 12, put z 11, get y, rem z
Map Core
Outputs
Acceptability Property: Output must be within min and max inputs
28
Unacceptable Output
Inputs: put x 10, put y 11, rem y, put x 12, rem x, get x
Map Core
Outputs: 10, 11, 11, 12, 12, 2
Unacceptable Output: 2 (result of get x)
29
Input/Output Correlation
Inputs: put x 10, put y 11, rem y, put x 12, rem x, get x
Input Monitor
Map Core
Outputs: 10, 11, 11, 12, 12, 2
Output Filter
Input/Output Correlator: Min, Max
30
Input/Output Correlation
Inputs: put x 10, put y 11, rem y, put x 12, rem x, get x
Input Monitor (passes each input on to Map Core)
Map Core
Outputs: 10, 11, 11, 12, 12, 2
Output Filter
Input/Output Correlator: Min 10, Max 12
31
Input/Output Correlation
Inputs: put x 10, put y 11, rem y, put x 12, rem x, get x
Input Monitor (passes each input on to Map Core)
Map Core
Outputs from Map Core: 10, 11, 11, 12, 12, 2
Output Filter (passes on 10, 11, 11, 12, 12)
Input/Output Correlator: Min 10, Max 12
32
First Option: Shut Down System
Inputs: put x 10, put y 11, rem y, put x 12, rem x, get x
Input Monitor (passes each input on to Map Core)
Map Core
Outputs from Map Core: 10, 11, 11, 12, 12, 2
Output Filter (passes on 10, 11, 11, 12, 12)
Input/Output Correlator: Min 10, Max 12
33
Second Option: Return Error Code
Inputs: put x 10, put y 11, rem y, put x 12, rem x, get x
Input Monitor (passes each input on to Map Core)
Map Core
Outputs from Map Core: 10, 11, 11, 12, 12, 2
Output Filter (passes on 10, 11, 11, 12, 12; replaces 2 with Error Code 0)
Input/Output Correlator: Min 10, Max 12
34
Third Option: Return Min or Max Value
Inputs: put x 10, put y 11, rem y, put x 12, rem x, get x
Input Monitor (passes each input on to Map Core)
Map Core
Outputs from Map Core: 10, 11, 11, 12, 12, 2
Output Filter (passes on 10, 11, 11, 12, 12; replaces 2 with Min Value 10)
Input/Output Correlator: Min 10, Max 12
35
When to Use Each Option
  • Shut down system when
  • It is safe and acceptable
  • External intervention is available
  • Return error code when
  • Client is able to deal with error code
  • Return min or max when
  • Not safe to shut down system
  • No external intervention available
  • Client not prepared to deal with error code

Shut down system: Safe Exit
Return error code: Delegation
Return min or max: Resilient Computing
All options use black box mechanism
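The three options can be sketched as one black box correlator wrapped around the map. This is a minimal illustration, not the deck's implementation: the names Correlator, Response, observe_input and filter_output are hypothetical, and the error code 0 is taken from the Second Option slide.

```python
from enum import Enum


class Response(Enum):
    SHUT_DOWN = "shut down"      # safe exit
    ERROR_CODE = "error code"    # delegation
    CLAMP = "min or max"         # resilient computing


class Correlator:
    """Black box filter: tracks min/max of put values and checks outputs."""

    def __init__(self, response, error_code=0):
        self.response = response
        self.error_code = error_code
        self.lo = None
        self.hi = None

    def observe_input(self, command):
        # Input monitor: record the value of every "put name value".
        parts = command.split()
        if len(parts) == 3 and parts[0] == "put":
            v = int(parts[2])
            self.lo = v if self.lo is None else min(self.lo, v)
            self.hi = v if self.hi is None else max(self.hi, v)

    def filter_output(self, value):
        # Output filter: acceptable outputs pass through unchanged.
        # Assumption: before any put is seen, every output is accepted.
        if self.lo is None or self.lo <= value <= self.hi:
            return value
        if self.response is Response.SHUT_DOWN:
            raise SystemExit("unacceptable output %r" % value)
        if self.response is Response.ERROR_CODE:
            return self.error_code
        return self.lo if value < self.lo else self.hi
```

With the input sequence from the slides, the correlator records Min 10 and Max 12, so an output of 2 is replaced by 0 under ERROR_CODE and by 10 under CLAMP.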
36
Implementation Approach
Acceptability Property
  • Each entry has exactly one incoming reference
  • From table, table entry, or free list
  • Implies no cycles in table or free list
  • Implies disjointness of table and free list

[Figure: Hash Table and Free List; entry names a, e, i, b, d, h; values 1, 7, 11, 3, 4, 10]
37
Checking for Acceptability Violations
  • Auxiliary reference count for each entry
  • Traverse data structures to compute counts
  • Check that no count greater than one
  • Complications
  • Invalid pointers (addressing violations)
  • Out of bounds array indices
    (more addressing violations)
  • Cycles (infinite traversal loops)
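
The check above might look roughly like the following sketch. It assumes, for illustration only, that entries are indices into a next_ array, that NOENTRY is -1, and that the table bins and the free list are both linked through next_; none of these details are confirmed by the slides.

```python
NOENTRY = -1


def check_single_reference(table, freelist, next_):
    """Return a list of violation messages; an empty list means acceptable."""
    n = len(next_)
    counts = [0] * n    # auxiliary reference count for each entry
    violations = []

    def walk(head, label):
        e = head
        while e != NOENTRY:
            # Guard against invalid pointers and out of bounds indices.
            if not isinstance(e, int) or not 0 <= e < n:
                violations.append("%s: invalid reference %r" % (label, e))
                return
            counts[e] += 1
            if counts[e] > 1:
                # A second reference means sharing or a cycle. Stopping
                # here also keeps a cycle from looping forever.
                violations.append(
                    "%s: entry %d referenced more than once" % (label, e))
                return
            e = next_[e]

    for b, head in enumerate(table):
        walk(head, "bin %d" % b)
    walk(freelist, "free list")
    return violations
```

Every step either raises a count from zero to one or stops the walk, so the whole check visits each entry at most once even when the structure is corrupted.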

38
Mechanisms for Accessing Data Structures
  • White Box
  • Link monitor and checking code into core
  • Possibility of core corrupting checker
    (and vice-versa!)
  • Gray Box
  • Checker uses ptrace interface (or mmap)
  • More cumbersome to access data structures
  • But checker isolated from core

39
Inconsistency Responses
  • Fail stop: halt program, await intervention
  • Feasible when halting acceptable
  • And intervention practical
  • May actually decrease reliability
  • Delegation: return error code to client
  • Feasible when client can deal with error
  • Resilient computing: fix inconsistency, continue
  • Enables continued (acceptable) execution
  • Hides effect of inconsistency from clients

40
Code for Put Procedure in Map Example
  /* Hash table and free list */
  int table[M];
  int freelist;
  put(n, v) {
    /* Allocate and initialize new hash table entry */
    e = alloc();
    value(e) = v;
    strcpy(name(e), n);
    /* Free old entry with same name */
    p = find(n);
    if (p != NOENTRY) free(p);
    /* Insert new entry into hash table */
    b = bin(n);
    next(e) = table[b];
    table[b] = e;
    return(v);
  }
  free(e) {
    /* Insert entry into free list */
    value(e) = freelist;
    freelist = e;
  }
41
Code for Put Procedure in Map Example
  /* Hash table and free list */
  int table[M];
  int freelist;
  put(n, v) {
    /* Allocate and initialize new hash table entry */
    /* Error: does not check for empty free list */
    e = alloc();
    value(e) = v;
    strcpy(name(e), n);
    /* Free old entry with same name */
    /* Error: leaves entry in table */
    p = find(n);
    if (p != NOENTRY) free(p);
    /* Insert new entry into hash table */
    /* Error: creates cycle if entry already in table */
    b = bin(n);
    next(e) = table[b];
    table[b] = e;
    return(v);
  }
  free(e) {
    /* Insert entry into free list */
    value(e) = freelist;
    freelist = e;
  }
42
Problem
  • Program crashes if the free list is empty when put is called

New Acceptability Property: Free list is not empty
Acceptability Enforcement: Repair algorithm ensures free list is not empty
43
Data Structure Repair Goal
Before repair (Map Core): Invalid References, Cycle, Empty Free List
After repair (Map Core): All References Valid, No Cycles, Entries in Free List
44
Enforcing Consistency
  • Hand-coded consistency algorithm
  • Coding is difficult because must assume data
    structures can be arbitrarily corrupted
  • Invalid references, out of bounds indices
  • Cycles (can cause infinite loops in repair code)
  • Two data structure traversals
  • First eliminates invalid references and indices
  • Second removes all but first reference to each
    entry (requires auxiliary marking data structure)
  • Reconstruct free list
  • Any unreferenced entry put into list
  • If free list still empty, steal entry from table
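
The two traversals and the free list reconstruction could be sketched as follows. This is an illustration under assumptions that are not in the slides: entries are indices into a next_ array, NOENTRY is -1, and the function name repair is hypothetical. The old free list is discarded and rebuilt rather than repaired.

```python
NOENTRY = -1


def repair(table, next_):
    """Repair table and next_ in place; return the new free list head."""
    n = len(next_)

    def valid(e):
        return isinstance(e, int) and 0 <= e < n

    # Traversal 1: replace invalid references and indices with NOENTRY.
    for b in range(len(table)):
        if table[b] != NOENTRY and not valid(table[b]):
            table[b] = NOENTRY
    for e in range(n):
        if next_[e] != NOENTRY and not valid(next_[e]):
            next_[e] = NOENTRY

    # Traversal 2: keep only the first reference to each entry.
    marked = [False] * n    # auxiliary marking data structure
    for b in range(len(table)):
        if table[b] != NOENTRY and marked[table[b]]:
            table[b] = NOENTRY
        e = table[b]
        while e != NOENTRY:
            marked[e] = True
            if next_[e] != NOENTRY and marked[next_[e]]:
                next_[e] = NOENTRY    # cut a cycle or a shared tail
            e = next_[e]

    # Reconstruct the free list from every unreferenced entry.
    freelist = NOENTRY
    for e in range(n):
        if not marked[e]:
            next_[e] = freelist
            freelist = e

    # If the free list is still empty, steal an entry from the table.
    if freelist == NOENTRY:
        for b in range(len(table)):
            if table[b] != NOENTRY:
                e = table[b]
                table[b] = next_[e]
                next_[e] = NOENTRY
                freelist = e
                break
    return freelist
```

Stealing an entry loses one mapping, which is the kind of suboptimal but still acceptable outcome this approach trades for continued execution.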

45
Issues
  • Replace failure with potentially suboptimal (but
    still acceptable) execution
  • Checking overhead
  • Depends on properties and application
  • Subject to optimization
  • Obscured errors
  • Record violations and updates in logs
  • Use logs to reconstruct actions
  • Potential errors in checking and repair code
  • Acceptability enforcement code deals with simpler
    properties than core
  • Should be simpler and easier to get correct

46
Generalizations
  • Process structure consistency
  • System structured as collection of processes
  • Monitor and regenerate processes to preserve
    consistency properties
  • System configuration consistency
  • Difficult to get configuration settings correct
  • Monitor and update to satisfy properties
  • Properties may depend on running applications,
    attached devices, etc.
  • Both involve structural properties

47
Next Problem
  int table[M];
  int freelist;
  put(n, v) {
    e = alloc();
    value(e) = v;
    strcpy(name(e), n);    /* Buffer Overrun */
    p = find(n);
    if (p != NOENTRY) free(p);
    b = bin(n);
    next(e) = table[b];
    table[b] = e;
    return(v);
  }
  free(e) {
    value(e) = freelist;
    freelist = e;
  }
48
Long Inputs Crash Core
Inputs: put x 10, put y 11, rem y, put xxxxxxxxxxx 12, rem x, get xxxxxxxxxxx
Map Core
Outputs: 10, 11, 11
50
Long Inputs Crash Core
Inputs: put x 10, put y 11, rem y, put xxxxxxxxxxx 12, rem x, get xxxxxxxxxxx
Truncating Input Filter (passes on put x 10, put y 11, rem y, put xxx 12, rem x, get xxx)
Map Core
Outputs: 10, 11, 11, 12, 10, 12
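
A truncating input filter of this kind can be very small. The sketch below assumes a name limit of 3 characters, chosen only to match the put xxx 12 example on the slide; MAX_NAME and truncate_input are hypothetical names.

```python
MAX_NAME = 3  # hypothetical capacity of the core's name buffer


def truncate_input(command, max_name=MAX_NAME):
    """Shorten the name field so the core never sees an over-long name."""
    parts = command.split()
    if len(parts) >= 2:
        parts[1] = parts[1][:max_name]
    return " ".join(parts)
```

The filter trades precision for survival: two long names that share a prefix now map to the same entry, but the core no longer overruns its buffer.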
51
Classification of Techniques
  • Acceptability properties can involve
  • Inputs, outputs, state, behavior, timing
  • In any combination
  • Examples
  • Use data structures to filter outputs
  • Use inputs to repair data structures
  • Process structure and configuration consistency
  • Timing constraints
  • Input arrivals and triggered program actions
  • Frequency of output events

52
Examples from Real Systems
53
5ESS Switch
IBM MVS OS
Both systems use hand-coded data structure repair
54
Maintenance Commands                                    fsck(1M)
NAME
    fsck - check and repair file systems
SYNOPSIS
    fsck [-F FSType] [-m] [-V] [special ...]
    fsck [-F FSType] [-n | N | y | Y] [-V] [-o FSType-specific-options] [special ...]
DESCRIPTION
    fsck audits and interactively repairs inconsistent file system conditions. If the file system is inconsistent the default action for each correction is to wait for the user to respond yes or no. If the user does not have write permission fsck defaults to a no action. Some corrective actions will result in loss of data. The amount and severity of data loss may be determined from the diagnostic out-

55
GPS Wide Area Augmentation System
56
GPS Wide Area Augmentation System
Validates Results
57
Ray Tracing Graphics Computations
Scene Composed Of Triangles
58
Ray Tracing Graphics Computations
Scene Composed Of Triangles
Shoot Rays Into Scene
59
Ray Tracing Graphics Computations
Normal Vectors
Shoot Rays Into Scene Compute How They Interact
with Triangles
60
Ray Tracing Graphics Computations
Normal Vectors
Shoot Rays Into Scene
Degenerate Triangle (colinear vertices) Normal
vector computation fails
61
Acceptability-Oriented Approach
  • Do not code up all degenerate cases
  • Failed computation generates a signal
  • Catch signal
  • Generate some likely value
  • Continue with that value
  • Result
  • Several pixels are incorrect
  • But picture as a whole looks fine
  • Program simpler and works faster
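
As a rough analogue of catching the signal, the sketch below catches Python's ZeroDivisionError when a triangle normal cannot be normalized and continues with a manufactured value. The function name unit_normal and the default fallback vector are assumptions made for illustration, not the ray tracer's actual code.

```python
import math


def unit_normal(p0, p1, p2, fallback=(0.0, 0.0, 1.0)):
    """Unit normal of a triangle, or a manufactured value if degenerate."""
    ux, uy, uz = (p1[i] - p0[i] for i in range(3))
    vx, vy, vz = (p2[i] - p0[i] for i in range(3))
    # Cross product of the two edge vectors.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    try:
        return (nx / length, ny / length, nz / length)
    except ZeroDivisionError:
        # Colinear vertices: no special-case code, just keep going.
        return fallback
```

The common case carries no degenerate-triangle logic at all; only the rare failure pays for the handler.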

62
Sample Images
T. Kay and J. Kajiya, "Ray Tracing Complex Scenes," SIGGRAPH 1986
63
Limp-Home Modes in Engine Controllers
64
Hardware Interlocks
Interlocks Prevent
  • Unsafe entry into enclosure while the bank is energized or not grounded
  • Unsafe operation of air-disconnect while vacuum switches are closed
  • Unsafe operation of ground switch(es) while air-disconnect is closed
65
No Hardware Interlocks
Therac-25
66
Common Theme
  • Presence of acceptability-oriented features
    reduces need for perfection
  • Safety-critical systems
  • Persistent data
  • Can have more ambitious core
  • More functionality
  • More aggressive, riskier algorithms
  • Can tolerate algorithms with known errors
  • Less development effort
  • Less testing and certification
  • Can leave infrequent errors in system

67
Two Kinds of Acceptability-Oriented Computing
  • Opportunistic acceptability-oriented computing
  • Observe acceptability problem
  • Develop acceptability enforcement mechanism
    specifically for that problem
  • Systematic acceptability-oriented computing
  • Identify acceptability properties during
    requirements analysis and design
  • Integrate acceptability features into design
  • Implement acceptability enforcement mechanisms as
    normal development activity

68
Changes to Development Activities
  • Requirements
  • Specification
  • Design
  • Implementation
  • Testing
  • Deployment
  • Maintenance

69
Changes to Development Activities
  • Problem With Standard Approaches
  • Aspiration of Perfection
  • Flat set of requirements
  • Specification expected to perfectly capture
    requirements
  • Implementation goal: produce flawless
    implementation
  • Testing goal: eliminate all implementation errors
  • No attempt to
  • Focus on important properties
  • Build resilient system
  • Requirements
  • Specification
  • Design
  • Implementation
  • Testing
  • Deployment
  • Maintenance

70
With Acceptability-Oriented Computing
  • Requirements
  • Specification
  • Design
  • Implementation
  • Testing
  • Deployment
  • Maintenance

Prioritize Requirements
  • Separate what really matters for system from what would be nice to have
  • Foundation of Acceptability-Oriented Computing
  • Provides basis for acceptability properties
71
With Acceptability-Oriented Computing
  • Requirements
  • Specification
  • Design
  • Implementation
  • Testing
  • Deployment
  • Maintenance

Translate Prioritized Requirements into Acceptability Properties
  • External Properties
  • Inputs and Outputs
72
With Acceptability-Oriented Computing
  • Requirements
  • Specification
  • Design
  • Implementation
  • Testing
  • Deployment
  • Maintenance

Identify Internal Acceptability Properties
  • Data Structures
  • Implementation
How to Integrate Acceptability Property Enforcement
  • How to monitor execution
  • How to intervene
  • Black/gray/white box
73
With Acceptability-Oriented Computing
  • Requirements
  • Specification
  • Design
  • Implementation
  • Testing
  • Deployment
  • Maintenance

Implement and integrate acceptability enforcement
mechanisms
74
With Acceptability-Oriented Computing
  • Requirements
  • Specification
  • Design
  • Implementation
  • Testing
  • Deployment
  • Maintenance

Acceptability enforcement code helps discover and localize errors
Develop, deploy new acceptability properties as necessary
75
With Acceptability-Oriented Computing
  • Requirements
  • Specification
  • Design
  • Implementation
  • Testing
  • Deployment
  • Maintenance

Turn on appropriate resilient computing mechanisms
Helps system execute acceptably with minimal external intervention
76
With Acceptability-Oriented Computing
  • Requirements
  • Specification
  • Design
  • Implementation
  • Testing
  • Deployment
  • Maintenance

Develop, deploy new acceptability properties as
necessary
77
With Acceptability-Oriented Computing
  • Requirements
  • Specification
  • Design
  • Implementation
  • Testing
  • Deployment
  • Maintenance
  • Potential Adoption Paths
  • Adopt incrementally
  • Start with specific activity
  • Or selected part of system
  • Can stop short of complete adoption if it makes
    sense
  • Adopt in parallel
  • Small acceptability team
  • Most developers oblivious
  • Can orient entire development process around
    acceptability

78
Consequences of Systematic Acceptability-Oriented
Computing
  • Better (more acceptable) software
  • Improved understanding of requirements
  • Inevitable errors placed to minimize harm
  • Resilient systems that recover from errors
  • Better documentation
  • Acceptability properties document what is
    important about the system
  • Acceptability enforcement mechanisms ensure that
    they accurately reflect implementation

79
More Consequences
  • Reduced development and maintenance costs
  • Prioritized engineering effort
  • More aggressive software reuse
  • Reduced testing costs
  • Acceptability properties help tester
  • Simpler testing for acceptability enforcers
  • Can leave infrequent errors in system

80
Continued Execution as an Acceptability Property
  • Failure-Oblivious Computing

81
Why Don't PCs Have Memory With Parity?
Manufacturer's Perspective
  • With Parity
  • Memory error occurs
  • PC flags it and stops
  • Consumer blames manufacturer
  • No Parity
  • Memory error occurs
  • PC oblivious, keeps going
  • If system crashes, consumer blames Microsoft

No Incentive to Increase Parts Cost to Get Benefit of Parity
82
Why Don't PCs Have Memory With Parity?
Consumer's Perspective
  • With Parity
  • Memory error occurs
  • PC flags it and stops
  • Have to reboot
  • No Parity
  • Memory error occurs
  • PC oblivious, keeps going
  • System may never crash (at least, not because of parity error)
  • If it does crash, no big surprise

Lack of Parity Increases Reliability! Because It Makes PC Oblivious to Failure
83
Why Will Java Program Fail?
Out of bounds array access
a[i] = x
x = a[i]
Null pointer dereference
o.f = x
x = o.f
Standard Response: Throw an exception and terminate the program
84
Why Will Java Program Fail?
Out of bounds array access
a[i] = x
x = a[i]
Null pointer dereference
o.f = x
x = o.f
Resilient Computing Response: Ignore error and keep executing
  • Writes (a[i] = x, o.f = x): Discard Value
  • Reads (x = a[i], x = o.f): Use Manufactured Value
85
Can Extend the Approach to C
  • When program attempts to access illegal address
  • Discard value (writes)
  • Use manufactured value (reads)
  • Program keeps executing
  • Improved version uses a Safe C compiler
  • Catches pointer and array bounds errors
  • Replace exception handler to
  • Discard value (writes)
  • Use manufactured value (reads)
  • Program keeps executing
  • Improvement reduces data structure damage
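
The discard and manufacture policy can be illustrated with a small array wrapper. This is a sketch only: ObliviousArray is a hypothetical name, the manufactured value 0 is an arbitrary choice, and a real system would apply the policy inside the compiler or runtime rather than in a wrapper class.

```python
class ObliviousArray:
    """Array wrapper that never raises on an out of bounds index."""

    def __init__(self, size, manufactured=0):
        self._data = [0] * size
        self._manufactured = manufactured
        self.log = []    # record of every suppressed error

    def __getitem__(self, i):
        if 0 <= i < len(self._data):
            return self._data[i]
        self.log.append(("read", i))
        return self._manufactured    # use a manufactured value

    def __setitem__(self, i, value):
        if 0 <= i < len(self._data):
            self._data[i] = value
        else:
            self.log.append(("write", i))    # discard the value
```

Negative indices are treated as out of bounds here, as they would be in Java or C, and the log keeps suppressed errors from being silently lost.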

86
Why Continued Execution is so Valuable
  • Systems often consist of
  • Multiple components
  • Each provides important functionality
  • Artificial coupling between components
  • Each component needs flow of control to deliver its
    functionality
  • Any error in any component can deny flow of
    control to all others
  • Continued execution enables control to continue
    to flow to each component

87
Why Continued Execution is so Valuable
  • Furthermore
  • Even within a component, error may not cause
    unacceptable execution
  • Or cause of error may eventually be flushed
  • Moral of the story
  • 90% of life is just showing up
  • Keep program showing up

88
More Ways to Ensure Continued Execution
  • Eliminate special-case code
  • Poorly tested, likely to contain errors
  • Not as important as common-case code
  • Locate code that causes errors and remove it
    (garbage collection is an instance of this idea)

89
Complication: Infinite Loops
  • Failure-oblivious techniques can make a
    computation immortal
  • Need a way to identify, then kill useless or
    misguided computations
  • Bound loop iterations
  • Randomize branch and jump targets
  • Speculatively parallelize computation
  • Lack of good mortality units
  • Can attempt to leverage existing structure:
    threads, transactions, components, ...
  • New construct to express mortality units
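
Of the options listed, bounding loop iterations is the simplest to sketch. The helper below is a hypothetical illustration (bounded_while is not a name from the slides): it runs a loop body at most a fixed number of times and reports whether the loop ended on its own.

```python
def bounded_while(condition, body, max_iterations):
    """Run body() while condition() holds, at most max_iterations times.

    Returns True if the loop ended on its own, False if it was cut off.
    """
    for _ in range(max_iterations):
        if not condition():
            return True
        body()
    return not condition()
```

For example, a list traversal that has wandered into a cycle is cut off after the bound instead of running forever, and the caller can then treat the False result as an acceptability violation.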

90
Acceptability-Oriented Computing is a Perspective
Techniques (slide axis: Conservative to Aggressive)
  • Data Structure Consistency Checks
  • Application-Specific Error Recovery
  • Data Structure Repair
  • Failure-Oblivious Computing
  • Limp Home Modes
  • Redundant Computing
  • Code Excision
  • Development Process Changes
  • Hardware Interlocks
  • Input and Output Filtering
91
Key Ideas
  • Reject aspiration of perfection
  • Focus on acceptability
  • Acceptability properties identify acceptability
    envelope
  • Acceptability enforcement mechanisms keep system
    within acceptability envelope
  • Opportunistic vs. systematic approaches
  • Ideal result
  • More resilient systems
  • Less development and testing effort

92
Binac Avionics System
93
Example Techniques
  • Filter out unacceptable inputs
  • Truncate strings to eliminate buffer overruns
  • Clamp numeric values within range
  • Use data structures to filter inputs and outputs
  • Use inputs to repair data structures
  • Process structure and configuration consistency
  • Continued execution as acceptability property
  • Failure-oblivious computing
  • Code and input variation

94
Aesthetics
  • AOC about how to get along in world without
    perfection
  • One thing to accept perfection as unattainable
  • Another thing to view aspiration for perfection
    as counterproductive
  • Examples from art that informed thinking
  • Bach (Little Fugue in G minor) vs. Mahler 2
  • Scale important differentiator
  • Picasso (19 year old perfect picture, cubism)
  • Michelangelo (David, unfinished)