Title: Acceptability-Oriented Computing
1. Acceptability-Oriented Computing
- Martin Rinard
- Laboratory for Computer Science
- Massachusetts Institute of Technology
2. Traditional View of Correctness
[Diagram: Execution Space]
3. Traditional View of Correctness
[Diagram: Execution Space; Correct Execution]
4. Acceptability View
[Diagram: Execution Space; Acceptability Envelope; Correct Execution]
5. Acceptability View
[Diagram: Execution Space; Acceptability Envelope; Correct Execution; Acceptable Executions]
6. Acceptability View
[Diagram: Execution Space; Acceptability Envelope; Correct Execution; Acceptable Executions; Unacceptable Execution]
7. Acceptable Execution
[Diagram: Execution Space; Acceptability Envelope; Correct Execution]
8. Fail Stop Execution
[Diagram: Execution Space; Acceptability Envelope; Correct Execution; STOP]
9. Safe Exit Execution
[Diagram: Execution Space; Acceptability Envelope; Correct Execution; Safe Exit Point; STOP]
10. Resilient Computing Execution
[Diagram: Execution Space; Acceptability Envelope; Correct Execution; Repaired Execution]
11. Questions
- How to identify acceptability envelope?
  - Set of acceptability properties
  - Basic properties that any execution must satisfy to be acceptable
- How to ensure program stays within envelope?
  - Acceptability monitoring
  - Acceptability enforcement
12. Resilient Computing Execution
[Diagram: Execution Space; Acceptability Envelope; Correct Execution; Repaired Execution; Acceptability Monitoring; Acceptability Enforcement]
13. Proposed Structure
[Diagram: Inputs; Core System; Outputs]
14. Proposed Structure
[Diagram: Inputs; Core System; Output Filter; Outputs]
15. Proposed Structure
[Diagram: Inputs; Input Filter; Core System; Output Filter; Outputs]
16. Proposed Structure
[Diagram: Inputs; Input Filter; Core System; Output Filter; Outputs; Data Structure Repair]
17. Proposed Structure
[Diagram: Inputs; Input Filter; Core System; Output Filter; Outputs; Data Structure Repair (Probe, Repair)]
18. Proposed Structure
[Diagram: Inputs; Input Filter; Core System; Output Filter; Outputs; Data Structure Repair (Probe, Repair); Output Rectification; Control Transfer]
19. Proposed Structure
[Diagram: Inputs; Input Filter; Core System; Output Filter; Outputs; Data Structure Repair (Probe, Repair); Output Rectification; Control Transfer; Exception Recovery]
20. Proposed Structure
[Diagram: Inputs; Input Filter; Core System; Output Filter; Outputs; Data Structure Repair (Probe, Repair); Output Rectification; Control Transfer; Exception Recovery; Response Enforcement]
21. Monitoring and Enforcement Mechanisms
- Black Box
  - Do not affect core
  - Input/output filters and correlators
- White Box
  - Insert new code and data into core
- Gray Box
  - No change to core program
  - Can change data structures and control flow
- Mechanisms
  - Procedure call and system call interception
  - Ptrace interface, mmap to access address space
22. Reason for Acceptability-Oriented Computing: Difficulty of Delivering Perfect Software
- Difficulty in all areas of development effort
  - Understanding domain, obtaining requirements
  - Producing specification, developing software
- Change Aspiration of Development Process
  - Accept inevitability of imperfection
  - Goal is to deliver acceptable program
- Augment Development Activities
  - Identify crucial acceptability properties
  - Ensure that program does not violate them
23. Aspiring to Perfection Recognized as Harmful
- Defocuses development effort
  - All parts seen as equally important
  - No formal way to direct development effort to most important parts of code
- Produces brittle structure
  - Each piece of functionality implemented
    - Once (no redundancy)
    - Completely (hard and easy parts together)
  - No recovery or protection mechanisms
  - Program completely vulnerable to any error
24. Advantages of Acceptability-Oriented Computing
- Focused, prioritized development effort
  - Appropriately direct engineering activities
  - Ensure satisfaction of acceptability properties
- Resilient software structure
  - Redundant acceptability property enforcement
  - Mechanisms enforce partial properties
  - Mechanisms are simpler than complete modules in core software, so acceptability is easier to obtain
  - Resulting software structure tolerates errors
25. Ideal Result
- Can build systems with less development effort
  - Can reduce testing effort for core
  - Can leave (infrequent) errors in system
- Can build systems with more functionality
  - Can invest saved development effort in increasing functionality of system
  - Can make larger system stable
  - Can use more aggressive, riskier algorithms
26. Map Example
[Diagram: Inputs (put x 10, put y 12, put z 11, get y, rem z); Map Core; Outputs]
Acceptability Property: Output must be within the min and max input values
28. Unacceptable Output
[Diagram: Inputs; Map Core; Outputs]
put x 10 → 10
put y 11 → 11
rem y → 11
put x 12 → 12
rem x → 12
get x → 2 (Unacceptable Output)
29. Input/Output Correlation
[Diagram: Inputs; Input Monitor; Map Core; Output Filter; Outputs; Input/Output Correlator (Min, Max)]
put x 10 → 10
put y 11 → 11
rem y → 11
put x 12 → 12
rem x → 12
get x → 2
30. Input/Output Correlation
[Diagram: Inputs pass through Input Monitor to Map Core; Output Filter; Outputs; Input/Output Correlator (Min 10, Max 12)]
put x 10 → 10
put y 11 → 11
rem y → 11
put x 12 → 12
rem x → 12
get x → 2
31. Input/Output Correlation
[Diagram: Inputs pass through Input Monitor to Map Core; outputs pass through Output Filter; Input/Output Correlator (Min 10, Max 12)]
put x 10 → 10
put y 11 → 11
rem y → 11
put x 12 → 12
rem x → 12
get x → 2
32. First Option: Shut Down System
[Diagram: Inputs pass through Input Monitor to Map Core; outputs pass through Output Filter; Input/Output Correlator (Min 10, Max 12)]
put x 10 → 10
put y 11 → 11
rem y → 11
put x 12 → 12
rem x → 12
get x → 2
33. Second Option: Return Error Code
[Diagram: Inputs pass through Input Monitor to Map Core; outputs pass through Output Filter; Input/Output Correlator (Min 10, Max 12)]
put x 10 → 10
put y 11 → 11
rem y → 11
put x 12 → 12
rem x → 12
get x → 2, which the Output Filter replaces with 0 (Error Code)
34. Third Option: Return Min or Max Value
[Diagram: Inputs pass through Input Monitor to Map Core; outputs pass through Output Filter; Input/Output Correlator (Min 10, Max 12)]
put x 10 → 10
put y 11 → 11
rem y → 11
put x 12 → 12
rem x → 12
get x → 2, which the Output Filter replaces with 10 (Min Value)
35. When to Use Each Option
- Shut down system (Safe Exit) when
  - It is safe and acceptable
  - External intervention is available
- Return error code (Delegation) when
  - Client is able to deal with error code
- Return min or max (Resilient Computing) when
  - Not safe to shut down system
  - No external intervention available
  - Client not prepared to deal with error code
All options use a black box mechanism
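The black box input/output correlator and the three responses from slides 29 to 35 can be sketched in C. This is a minimal sketch, not the talk's implementation: `struct correlator`, `observe_put`, `filter_output`, the `enum response` values, and the use of 0 as the error code (taken from slide 33) are hypothetical. The input monitor records the minimum and maximum value of every `put`; the output filter passes outputs inside that range and applies the chosen response to anything outside it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical Input/Output Correlator: smallest and largest put value seen. */
struct correlator {
    int min, max;
    bool seen;   /* false until the first put has been observed */
};

/* Input monitor: called with the value of each "put" before the core sees it. */
void observe_put(struct correlator *c, int v) {
    if (!c->seen || v < c->min) c->min = v;
    if (!c->seen || v > c->max) c->max = v;
    c->seen = true;
}

/* Hypothetical names for the three responses (slides 32 to 34). */
enum response { SHUT_DOWN, RETURN_ERROR, RETURN_MIN_OR_MAX };

#define ERROR_CODE 0  /* hypothetical error code, matching the 0 on slide 33 */

/* Output filter: outputs inside [min, max] pass unchanged; anything else gets
   the configured response. Before any put there is no range, so outputs pass. */
int filter_output(const struct correlator *c, int out, enum response r) {
    if (!c->seen || (out >= c->min && out <= c->max))
        return out;
    switch (r) {
    case SHUT_DOWN:
        abort();                                  /* safe exit */
    case RETURN_ERROR:
        return ERROR_CODE;                        /* delegation */
    case RETURN_MIN_OR_MAX:
    default:
        return out < c->min ? c->min : c->max;    /* resilient computing */
    }
}
```

Replaying slide 34: after the puts of 10, 11, and 12, the correlator holds min 10 and max 12, so `filter_output(&c, 2, RETURN_MIN_OR_MAX)` returns 10 in place of the core's unacceptable 2. Nothing in the filter depends on the core's internals, which is what makes it black box.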
36. Implementation Approach
[Diagram: Hash Table of named entries (a, b, d, e, h, i) with values; Free List]
Acceptability Property
- Each entry has exactly one incoming reference
  - From table, table entry, or free list
- Implies no cycles in table or free list
- Implies disjointness of table and free list
37. Checking for Acceptability Violations
- Auxiliary reference count for each entry
- Traverse data structures to compute counts
- Check that no count is greater than one
- Complications
  - Invalid pointers (addressing violations)
  - Out of bounds array indices (more addressing violations)
  - Cycles (infinite traversal loops)
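A checker for the slide 36 property can be sketched in C. It assumes a hypothetical index-based layout (entries in a fixed array, references stored as integer indices, `NONE` as the null index) and, following the `free` procedure in the map example, a free list linked through each entry's `value` field. `check_map`, `count_list`, and `init_map` are illustrative names. Each complication is handled explicitly: invalid indices are rejected before they are dereferenced, and the walk stops as soon as any count exceeds one, which also keeps a cycle from trapping the checker.

```c
#include <assert.h>
#include <stdbool.h>

#define N 8        /* hypothetical number of entries */
#define M 4        /* hypothetical number of hash bins */
#define NONE (-1)  /* hypothetical null index */

struct entry { int value; int next; };
struct map { struct entry entries[N]; int table[M]; int freelist; };

static bool valid(int i) { return i >= 0 && i < N; }

/* Walks one list, adding one to the auxiliary reference count of each entry
   reached. Table chains link through `next`; the free list links through
   `value`. Returns false on an invalid index (addressing violation) or on a
   count above one (shared entry or cycle), which also bounds the walk. */
static bool count_list(const struct map *m, int head, bool via_value, int counts[N]) {
    for (int i = head; i != NONE; ) {
        if (!valid(i)) return false;
        if (++counts[i] > 1) return false;
        i = via_value ? m->entries[i].value : m->entries[i].next;
    }
    return true;
}

/* True iff every reference is valid and no entry has more than one incoming
   reference (as on slide 37, count zero is not flagged here). */
bool check_map(const struct map *m) {
    int counts[N] = {0};
    for (int b = 0; b < M; b++)
        if (!count_list(m, m->table[b], false, counts)) return false;
    return count_list(m, m->freelist, true, counts);
}

/* Builds an empty map: all bins empty, every entry on the free list. */
void init_map(struct map *m) {
    for (int b = 0; b < M; b++) m->table[b] = NONE;
    for (int i = 0; i < N; i++) {
        m->entries[i].next = NONE;
        m->entries[i].value = (i + 1 < N) ? i + 1 : NONE;  /* free-list link */
    }
    m->freelist = 0;
}
```

Setting `table[0] = 3` on a freshly initialized map makes `check_map` fail, because entry 3 is then reachable from both the table and the free list, exactly the disjointness violation slide 36 rules out.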
38. Mechanisms for Accessing Data Structures
- White Box
  - Link monitor and checking code into core
  - Possibility of core corrupting checker (and vice versa!)
- Gray Box
  - Checker uses ptrace interface (or mmap)
  - More cumbersome to access data structures
  - But checker isolated from core
39. Inconsistency Responses
- Fail stop: halt program, await intervention
  - Feasible when halting acceptable
  - And intervention practical
  - May actually decrease reliability
- Delegation: return error code to client
  - Feasible when client can deal with error
- Resilient computing: fix inconsistency, continue
  - Enables continued (acceptable) execution
  - Hides effect of inconsistency from clients
40. Code for Put Procedure in Map Example
int table[M];                /* Hash table and free list */
int freelist;

put(n, v) {
  e = alloc();               /* Allocate and initialize new hash table entry */
  value(e) = v;
  strcpy(name(e), n);
  p = find(n);               /* Free old entry with same name */
  if (p != NOENTRY) free(p);
  b = bin(n);                /* Insert new entry into hash table */
  next(e) = table[b];
  table[b] = e;
  return(v);
}

free(e) {                    /* Insert entry into free list */
  value(e) = freelist;
  freelist = e;
}
41. Code for Put Procedure in Map Example
int table[M];                /* Hash table and free list */
int freelist;

put(n, v) {
  e = alloc();               /* Allocate and initialize new hash table entry */
                             /* Does not check for empty free list */
  value(e) = v;
  strcpy(name(e), n);
  p = find(n);               /* Free old entry with same name */
  if (p != NOENTRY) free(p); /* Leaves entry in table */
  b = bin(n);                /* Insert new entry into hash table */
  next(e) = table[b];        /* Creates cycle if entry already in table */
  table[b] = e;
  return(v);
}

free(e) {                    /* Insert entry into free list */
  value(e) = freelist;
  freelist = e;
}
42. Problem
- Program crashes if free list is empty when put is called
New Acceptability Property: free list is not empty
Acceptability Enforcement: repair algorithm ensures free list is not empty
43. Data Structure Repair Goal
[Diagram: Map Core before repair (Invalid References; Cycle; Empty Free List) and after repair (All References Valid; No Cycles; Entries in Free List)]
44. Enforcing Consistency
- Hand-coded consistency algorithm
- Coding is difficult because it must assume data structures can be arbitrarily corrupted
  - Invalid references, out of bounds indices
  - Cycles (can cause infinite loops in repair code)
- Two data structure traversals
  - First eliminates invalid references and indices
  - Second removes all but first reference to each entry (requires auxiliary marking data structure)
- Reconstruct free list
  - Any unreferenced entry put into list
  - If free list still empty, steal entry from table
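The repair steps above can be sketched in C under a hypothetical index-based layout (fixed entry array, integer indices, `NONE` as the null index, free list linked through `value` as in the `free` procedure). `repair_map` is an illustrative name. One simplification relative to the slide: the second pass marks only entries reachable from the table and then rebuilds the free list from every unmarked entry, which applies the rule that any unreferenced entry goes into the list without walking the old, possibly corrupt, free list.

```c
#include <assert.h>
#include <stdbool.h>

#define N 8        /* hypothetical number of entries */
#define M 4        /* hypothetical number of hash bins */
#define NONE (-1)  /* hypothetical null index */

struct entry { int value; int next; };  /* free list links through value */
struct map { struct entry entries[N]; int table[M]; int freelist; };

static bool valid(int i) { return i >= 0 && i < N; }

void repair_map(struct map *m) {
    /* Pass 1: replace invalid indices with NONE. Each walk is capped at N
       steps, so a cycle cannot trap the repair code. */
    for (int b = 0; b < M; b++) {
        if (!valid(m->table[b])) m->table[b] = NONE;
        int i = m->table[b];
        for (int steps = 0; i != NONE && steps < N; steps++) {
            if (!valid(m->entries[i].next)) m->entries[i].next = NONE;
            i = m->entries[i].next;
        }
    }
    /* Pass 2: keep only the first reference to each entry; later references
       (a shared entry or the back edge of a cycle) are cut. marked[] is the
       auxiliary marking data structure. */
    bool marked[N] = {false};
    for (int b = 0; b < M; b++) {
        if (m->table[b] != NONE && marked[m->table[b]]) m->table[b] = NONE;
        for (int i = m->table[b]; i != NONE; ) {
            marked[i] = true;
            if (m->entries[i].next != NONE && marked[m->entries[i].next])
                m->entries[i].next = NONE;
            i = m->entries[i].next;
        }
    }
    /* Reconstruct the free list from every entry the table does not reference. */
    m->freelist = NONE;
    for (int i = N - 1; i >= 0; i--) {
        if (!marked[i]) {
            m->entries[i].value = m->freelist;
            m->freelist = i;
        }
    }
    /* If the free list is still empty, steal the first entry of a nonempty bin. */
    for (int b = 0; b < M && m->freelist == NONE; b++) {
        int i = m->table[b];
        if (i != NONE) {
            m->table[b] = m->entries[i].next;
            m->entries[i].next = NONE;
            m->entries[i].value = NONE;
            m->freelist = i;
        }
    }
}
```

The stealing step trades a lost map entry for continued execution: the next `alloc` in `put` succeeds instead of crashing, which is the slide 42 acceptability property.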
45. Issues
- Replace failure with potentially suboptimal (but still acceptable) execution
- Checking overhead
  - Depends on properties and application
  - Subject to optimization
- Obscured errors
  - Record violations and updates in logs
  - Use logs to reconstruct actions
- Potential errors in checking and repair code
  - Acceptability enforcement code deals with simpler properties than core
  - Should be simpler and easier to get correct
46. Generalizations
- Process structure consistency
  - System structured as collection of processes
  - Monitor and regenerate processes to preserve consistency properties
- System configuration consistency
  - Difficult to get configuration settings correct
  - Monitor and update to satisfy properties
  - Properties may depend on running applications, attached devices, etc.
- Both involve structural properties
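Process structure consistency can be illustrated with a small POSIX supervisor in C. This is a sketch under stated assumptions: the hypothetical property enforced is "the worker process keeps being regenerated until it completes its work", and `supervise`, `worker_ok`, and `worker_crashes` are illustrative names. The supervisor monitors the child with `waitpid` and regenerates it with `fork` whenever it terminates abnormally, up to a bound, after which it reports failure so that another response (such as external intervention) can take over.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `work` in a child process and regenerates the child each time it
   terminates abnormally (nonzero exit status or killed by a signal).
   Returns the number of restarts needed for a successful run, or -1 if the
   worker still fails after max_restarts restarts or fork/waitpid fails. */
int supervise(void (*work)(void), int max_restarts) {
    for (int restarts = 0; restarts <= max_restarts; restarts++) {
        pid_t pid = fork();
        if (pid < 0) return -1;
        if (pid == 0) {           /* child: run the worker, then exit cleanly */
            work();
            _exit(EXIT_SUCCESS);
        }
        int status;
        if (waitpid(pid, &status, 0) < 0) return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == EXIT_SUCCESS)
            return restarts;      /* worker completed normally */
        /* otherwise loop around and regenerate the worker */
    }
    return -1;                    /* give up: escalate to another response */
}

/* Illustrative workers. */
static void worker_ok(void) { /* does its work and returns */ }
static void worker_crashes(void) { _exit(EXIT_FAILURE); }
```

A production supervisor would also need to restore whatever shared state the dead worker left behind; this sketch only restores the process structure itself.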
47. Next Problem
int table[M];
int freelist;

put(n, v) {
  e = alloc();
  value(e) = v;
  strcpy(name(e), n);        /* Buffer Overrun */
  p = find(n);
  if (p != NOENTRY) free(p);
  b = bin(n);
  next(e) = table[b];
  table[b] = e;
  return(v);
}

free(e) {
  value(e) = freelist;
  freelist = e;
}
48. Long Inputs Crash Core
[Diagram: Inputs; Map Core; Outputs]
put x 10 → 10
put y 11 → 11
rem y → 11
put xxxxxxxxxxx 12
rem x
get xxxxxxxxxxx
50. Long Inputs Crash Core
[Diagram: Inputs; Truncating Input Filter; Map Core; Outputs]
put x 10 → 10
put y 11 → 11
rem y → 11
put xxxxxxxxxxx 12 (filtered to put xxx 12) → 12
rem x → 10
get xxxxxxxxxxx (filtered to get xxx) → 12
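A truncating input filter like the one on slide 50 can be sketched in C. It assumes commands of the form `op name [value]` separated by single spaces and a hypothetical core name buffer holding `MAX_NAME` = 3 characters (matching the `xxx` on the slide); `truncate_filter` is an illustrative name. Because every command is rewritten before it reaches the core, the core's unchecked `strcpy` never receives a name longer than its buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 3  /* hypothetical capacity of the core's name field, excluding '\0' */

/* Copies `cmd` into `out`, truncating the name (second field) to MAX_NAME
   characters: "put xxxxxxxxxxx 12" becomes "put xxx 12". Returns 0 on
   success, or -1 if the command has no name field or does not fit in `out`. */
int truncate_filter(const char *cmd, char *out, size_t outsz) {
    const char *sp = strchr(cmd, ' ');          /* space after the operation */
    if (sp == NULL) return -1;
    const char *name = sp + 1;
    size_t namelen = strcspn(name, " ");        /* name runs to next space or end */
    size_t keep = namelen < MAX_NAME ? namelen : MAX_NAME;
    int n = snprintf(out, outsz, "%.*s%.*s%s",
                     (int)(name - cmd), cmd,    /* operation and its space */
                     (int)keep, name,           /* truncated name */
                     name + namelen);           /* rest: "" or " value" */
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

As on the slide, truncation can merge distinct long names into one key (`get xxxxxxxxxxx` reads the value stored by `put xxxxxxxxxxx 12` only because both become `xxx`); the filter trades exact naming for continued, acceptable execution.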
51. Classification of Techniques
- Acceptability properties can involve
  - Inputs, outputs, state, behavior, timing
  - In any combination
- Examples
  - Use data structures to filter outputs
  - Use inputs to repair data structures
  - Process structure and configuration consistency
  - Timing constraints
    - Input arrivals and triggered program actions
    - Frequency of output events
52. Examples from Real Systems
53. 5ESS Switch and IBM MVS OS
Both systems use hand-coded data structure repair
54. Maintenance Commands: fsck(1M)
NAME
  fsck - check and repair file systems
SYNOPSIS
  fsck [-F FSType] [-m] [-V] [special ...]
  fsck [-F FSType] [-n | N | y | Y] [-V] [-o FSType-specific-options] [special ...]
DESCRIPTION
  fsck audits and interactively repairs inconsistent file system conditions. If the file system is inconsistent the default action for each correction is to wait for the user to respond yes or no. If the user does not have write permission fsck defaults to a no action. Some corrective actions will result in loss of data. The amount and severity of data loss may be determined from the diagnostic output.
55. GPS Wide Area Augmentation System
56. GPS Wide Area Augmentation System
Validates Results
57. Ray Tracing Graphics Computations
[Diagram: Scene Composed of Triangles]
58. Ray Tracing Graphics Computations
[Diagram: Scene Composed of Triangles; Shoot Rays into Scene]
59. Ray Tracing Graphics Computations
[Diagram: Shoot Rays into Scene, Compute How They Interact with Triangles; Normal Vectors]
60. Ray Tracing Graphics Computations
[Diagram: Shoot Rays into Scene; Normal Vectors; Degenerate Triangle (collinear vertices): normal vector computation fails]
61. Acceptability-Oriented Approach
- Do not code up all degenerate cases
- Failed computation generates a signal
- Catch signal
- Generate some likely value
- Continue with that value
- Result
- Several pixels are incorrect
- But picture as a whole looks fine
- Program simpler and works faster
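The degenerate-triangle case can be sketched in C. The slide's approach catches the signal raised by the failed computation; because whether floating-point division by zero raises a signal at all is platform-dependent (IEEE 754 arithmetic normally produces infinities or NaNs instead), this sketch swaps in an explicit test that detects the same failure, a near-zero cross product for collinear vertices, and substitutes a likely value supplied by the caller (for example, the normal of a neighboring triangle). `struct vec3`, `triangle_normal`, the `fallback` parameter, and the 1e-24 threshold are hypothetical, and the normal is left unnormalized to keep the sketch free of the math library.

```c
#include <assert.h>

struct vec3 { double x, y, z; };

static struct vec3 sub(struct vec3 a, struct vec3 b) {
    return (struct vec3){ a.x - b.x, a.y - b.y, a.z - b.z };
}

static struct vec3 cross(struct vec3 a, struct vec3 b) {
    return (struct vec3){ a.y * b.z - a.z * b.y,
                          a.z * b.x - a.x * b.z,
                          a.x * b.y - a.y * b.x };
}

/* Returns an (unnormalized) normal of triangle abc. For collinear vertices
   the cross product is (nearly) zero and normalizing it would divide by zero,
   so the likely value `fallback` is returned instead and rendering continues. */
struct vec3 triangle_normal(struct vec3 a, struct vec3 b, struct vec3 c,
                            struct vec3 fallback) {
    struct vec3 n = cross(sub(b, a), sub(c, a));
    double len2 = n.x * n.x + n.y * n.y + n.z * n.z;
    if (!(len2 > 1e-24))   /* hypothetical threshold; also catches NaN input */
        return fallback;
    return n;
}
```

A wrong fallback shades only the pixels whose rays hit that triangle, which matches the slide's result: a few incorrect pixels, an acceptable picture, and no special-case geometry code.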
62. Sample Images
T. Kay and J. Kajiya, "Ray Tracing Complex Scenes," SIGGRAPH 1986
63. Limp-Home Modes in Engine Controllers
64. Hardware Interlocks
Interlocks prevent:
- Unsafe entry into enclosure while the bank is energized or not grounded
- Unsafe operation of air-disconnect while vacuum switches are closed
- Unsafe operation of ground switch(es) while air-disconnect is closed
65. No Hardware Interlocks
Therac-25
66. Common Theme
- Presence of acceptability-oriented features reduces need for perfection
  - Safety-critical systems
  - Persistent data
- Can have more ambitious core
  - More functionality
  - More aggressive, riskier algorithms
  - Can tolerate algorithms with known errors
- Less development effort
  - Less testing and certification
  - Can leave infrequent errors in system
67. Two Kinds of Acceptability-Oriented Computing
- Opportunistic acceptability-oriented computing
  - Observe acceptability problem
  - Develop acceptability enforcement mechanism specifically for that problem
- Systematic acceptability-oriented computing
  - Identify acceptability properties during requirements analysis and design
  - Integrate acceptability features into design
  - Implement acceptability enforcement mechanisms as normal development activity
68. Changes to Development Activities
- Requirements
- Specification
- Design
- Implementation
- Testing
- Deployment
- Maintenance
69. Changes to Development Activities
- Problem with standard approaches: aspiration of perfection
  - Flat set of requirements
  - Specification expected to perfectly capture requirements
  - Implementation goal: produce flawless implementation
  - Testing goal: eliminate all implementation errors
- No attempt to
  - Focus on important properties
  - Build resilient system
70. With Acceptability-Oriented Computing: Requirements
- Prioritize requirements
  - Separate what really matters for system from what would be nice to have
- Foundation of acceptability-oriented computing
  - Provides basis for acceptability properties
71. With Acceptability-Oriented Computing: Specification
- Translate prioritized requirements into acceptability properties
- External properties: inputs and outputs
72. With Acceptability-Oriented Computing: Design
- Identify internal acceptability properties
  - Data structures
- Implementation
  - How to integrate acceptability property enforcement
  - How to monitor execution
  - How to intervene
  - Black/gray/white box
73. With Acceptability-Oriented Computing: Implementation
- Implement and integrate acceptability enforcement mechanisms
74. With Acceptability-Oriented Computing: Testing
- Acceptability enforcement code helps discover and localize errors
- Develop, deploy new acceptability properties as necessary
75. With Acceptability-Oriented Computing: Deployment
- Turn on appropriate resilient computing mechanisms
- Helps system execute acceptably with minimal external intervention
76. With Acceptability-Oriented Computing: Maintenance
- Develop, deploy new acceptability properties as necessary
77. With Acceptability-Oriented Computing
- Potential adoption paths
  - Adopt incrementally
    - Start with specific activity
    - Or selected part of system
    - Can stop short of complete adoption if it makes sense
  - Adopt in parallel
    - Small acceptability team
    - Most developers oblivious
  - Can orient entire development process around acceptability
78. Consequences of Systematic Acceptability-Oriented Computing
- Better (more acceptable) software
  - Improved understanding of requirements
  - Inevitable errors placed to minimize harm
  - Resilient systems that recover from errors
- Better documentation
  - Acceptability properties document what is important about the system
  - Acceptability enforcement mechanisms ensure that they accurately reflect implementation
79. More Consequences
- Reduced development and maintenance costs
  - Prioritized engineering effort
  - More aggressive software reuse
- Reduced testing costs
  - Acceptability properties help tester
  - Simpler testing for acceptability enforcers
  - Can leave infrequent errors in system
80. Continued Execution as an Acceptability Property
- Failure-Oblivious Computing
81. Why Don't PCs Have Memory With Parity?
Manufacturer's Perspective
- With parity
  - Memory error occurs
  - PC flags it and stops
  - Consumer blames manufacturer
- No parity
  - Memory error occurs
  - PC oblivious, keeps going
  - If system crashes, consumer blames Microsoft
No incentive to increase parts cost to get benefit of parity
82. Why Don't PCs Have Memory With Parity?
Consumer's Perspective
- With parity
  - Memory error occurs
  - PC flags it and stops
  - Have to reboot
- No parity
  - Memory error occurs
  - PC oblivious, keeps going
  - System may never crash (at least, not because of a parity error)
  - If it does crash, no big surprise
Lack of parity increases reliability! Because it makes PC oblivious to failure
83. Why Will Java Program Fail?
- Out of bounds array access
  - a[i] = x
  - x = a[i]
- Null pointer dereference
  - o.f = x
  - x = o.f
Standard Response: Throw an exception and terminate the program
84. Why Will Java Program Fail?
- Out of bounds array access
  - a[i] = x
  - x = a[i]
- Null pointer dereference
  - o.f = x
  - x = o.f
Resilient Computing Response: Ignore error and keep executing
- Discard value (writes)
- Use manufactured value (reads)
85. Can Extend the Approach to C
- When program attempts to access illegal address
  - Discard value (writes)
  - Use manufactured value (reads)
  - Program keeps executing
- Improved version uses a Safe C compiler
  - Catches pointer and array bounds errors
  - Replace exception handler to
    - Discard value (writes)
    - Use manufactured value (reads)
  - Program keeps executing
  - Improvement reduces data structure damage
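The failure-oblivious response can be sketched in C with explicit checked accessors standing in for the bounds checks a Safe C compiler would insert automatically at every access. `fo_read`, `fo_write`, the two counters, and the choice of 0 as the manufactured value are hypothetical; the counters follow the earlier suggestion to record violations in logs so that obscured errors can still be reconstructed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical violation counters: the program survives the errors, but they
   are still recorded for later diagnosis. */
static unsigned long discarded_writes = 0;
static unsigned long manufactured_reads = 0;

/* Checked write: an out-of-bounds write is discarded instead of corrupting
   adjacent memory or terminating the program. */
void fo_write(int *a, size_t len, size_t i, int v) {
    if (i < len)
        a[i] = v;
    else
        discarded_writes++;
}

/* Checked read: an out-of-bounds read returns a manufactured value
   (hypothetically 0) and execution continues. */
int fo_read(const int *a, size_t len, size_t i) {
    if (i < len)
        return a[i];
    manufactured_reads++;
    return 0;
}
```

Discarding the write is what reduces data structure damage: the neighbors of `a` are never touched, whereas unchecked C would overwrite them.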
86. Why Continued Execution Is So Valuable
- Systems often consist of
  - Multiple components
  - Each provides important functionality
- Artificial coupling between components
  - Components need flow of control to deliver their functionality
  - Any error in any component can deny flow of control to all others
- Continued execution enables control to continue to flow to each component
87. Why Continued Execution Is So Valuable
- Furthermore
  - Even within a component, error may not cause unacceptable execution
  - Or cause of error may eventually be flushed
- Moral of the story
  - 90% of life is just showing up
  - Keep program showing up
88. More Ways to Ensure Continued Execution
- Eliminate special-case code
  - Poorly tested, likely to contain errors
  - Not as important as common-case code
- Locate code that causes errors and remove it (garbage collection is an instance of this idea)
89. Complication: Infinite Loops
- Failure-oblivious techniques can make a computation immortal
- Need a way to identify, then kill, useless or misguided computations
  - Bound loop iterations
  - Randomize branch and jump targets
  - Speculatively parallelize computation
- Lack of good mortality units
  - Can attempt to leverage existing structure: threads, transactions, components
  - New construct to express mortality units
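Bounding loop iterations, the first remedy listed above, can be sketched in C. `bounded_loop`, the `step` callback, and the two example steps are hypothetical; the bound itself would have to be chosen per loop, and what happens to a cut-off computation (discard its result, report it, restart it) is a separate policy decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Calls step(state) until it reports completion or max_iters calls have been
   made. Returns true if the loop finished on its own, false if the bound cut
   it off (the computation is then treated as useless and abandoned). */
bool bounded_loop(bool (*step)(void *state), void *state, long max_iters) {
    for (long i = 0; i < max_iters; i++)
        if (step(state))
            return true;
    return false;
}

/* Example steps: one finishes once its counter reaches 10, one never does. */
static bool count_to_ten(void *state) {
    int *n = state;
    return ++*n >= 10;
}

static bool spin_forever(void *state) {
    (void)state;
    return false;
}
```

The iteration bound acts as a crude mortality unit: a loop that failure-oblivious execution has pushed into an endless state is killed after a fixed amount of work instead of consuming the processor forever.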
90. Acceptability-Oriented Computing Is a Perspective
[Diagram, with an axis running from Conservative to Aggressive: Data Structure Consistency Checks; Application-Specific Error Recovery; Data Structure Repair; Failure-Oblivious Computing; Limp Home Modes; Redundant Computing; Code Excision; Development Process Changes; Hardware Interlocks; Input and Output Filtering]
91. Key Ideas
- Reject aspiration of perfection
- Focus on acceptability
  - Acceptability properties identify acceptability envelope
  - Acceptability enforcement mechanisms keep system within acceptability envelope
- Opportunistic vs. systematic approaches
- Ideal result
  - More resilient systems
  - Less development and testing effort
92. Binac Avionics System
93. Example Techniques
- Filter out unacceptable inputs
  - Truncate strings to eliminate buffer overruns
  - Clamp numeric values within range
- Use data structures to filter inputs and outputs
- Use inputs to repair data structures
- Process structure and configuration consistency
- Continued execution as acceptability property
  - Failure-oblivious computing
  - Code and input variation
94. Aesthetics
- AOC is about how to get along in a world without perfection
  - It is one thing to accept perfection as unattainable
  - It is another thing to view aspiration for perfection as counterproductive
- Examples from art that informed thinking
  - Bach (Little Fugue in G minor) vs. Mahler 2
    - Scale an important differentiator
  - Picasso (perfect picture at 19 years old, cubism)
  - Michelangelo (David, unfinished)