Transcript and Presenter's Notes

Title: Enhancing Software Reliability with Speculative Threads


1
Enhancing Software Reliability with Speculative Threads
  • Jeffrey Oplinger and Monica Lam
  • Stanford University

2
Motivation
  • Reliability, availability and serviceability
    (RAS) are dominant issues in computing
  • Security holes are costly!
  • programmer: finding and fixing vulnerabilities
  • user: applying update after update
  • everyone: the aftermath of a security compromise
  • What can we as architects and system designers do
    to help?
  • first, a look at the current approaches

3
Current Techniques to Address RAS
  • Static analysis
  • Formal verification doesn't really work
  • Tools: PREFIX, LCLINT, ...
  • useful, but unsound/incomplete
  • Runtime schemes: overheads!
  • Safer languages (Java): significantly slower
  • Purify: 2x to 5x slowdown
  • bounds-checking-gcc: > 10x slowdown
  • Programmer discipline: bugs are inevitable!

4
State of Computer Architecture
  • What to do with all these transistors?
  • Use them to increase performance?
  • A perennial target
  • Marginal returns are decreasing
  • RAS is increasingly important
  • Instead, provide new features to software
  • Make safer code easier to write
  • Speed up expensive but useful runtime schemes

5
Proposal: Monitor-and-Recover @ Runtime
  • Monitor the program execution at runtime
  • Verify that execution was correct
  • Recover from detected errors if possible
  • Future programs will hopefully have more
    application-level checking and verification
  • Especially if hardware makes the task easier and
    more efficient!

6
Outline
  • Motivation: Architecture Support for RAS
  • Monitoring Code & Error Recovery
  • Current Schemes
  • Proposed Programming Paradigms
  • Hardware Support: Speculative Threads
  • Experimental Evaluations
  • Monitoring Code
  • Recovery with Fine-grain Transactions
  • Conclusions

7
Execution Monitoring at Runtime
  • Examples
  • Performance monitoring (Pixie)
  • Detecting memory misuse (Purify)
  • Run-time anomaly detection (DIDUCE)
  • Too expensive for shipped code
  • Even painful during the development cycle
  • Viewed as not essential even if useful
  • PROPOSAL: make monitoring more efficient (a Pixie-style instrumentation sketch follows below)
  • efficiency/performance → more use/functionality
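For concreteness, a minimal sketch of the kind of counting code a Pixie-like tool conceptually inserts at each basic block; the counter array, block IDs, and the instrumented routine are hypothetical names, and the real tool rewrites the binary rather than the source:

/* Sketch (hypothetical names): Pixie-style basic-block counting.
 * A real tool inserts equivalent increments by rewriting the binary. */
enum { BLOCK_ENTRY, BLOCK_EMPTY, BLOCK_BODY, NUM_BLOCKS };
static unsigned long bb_count[NUM_BLOCKS];

int parse_request(const char *buf)
{
    bb_count[BLOCK_ENTRY]++;          /* inserted monitoring code */
    if (buf == 0 || buf[0] == '\0') {
        bb_count[BLOCK_EMPTY]++;      /* inserted monitoring code */
        return -1;
    }
    bb_count[BLOCK_BODY]++;           /* inserted monitoring code */
    /* ... original parsing work ... */
    return 0;
}

The increments never feed the program's own computation, which is exactly what makes this kind of monitoring a good candidate for running in parallel with the original code.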

8
Monitor-and-Recover Paradigm
  • Monitoring code
  • Inserted into the original program and obeys
    sequential semantics
  • Typically does not affect the main computation
  • Perhaps a predictable 'OK' value is returned
  • For performance, execute in parallel with rest of
    normal program
  • Re-execute if any data dependences violated
  • Precise exception semantics
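A minimal sketch of monitoring code with a predictable 'OK' result, assuming a hypothetical routine check_heap_object(): because the main computation does not consume the check's result, TLS can run the check in parallel and re-execute only if a real data dependence is violated.

/* Sketch (hypothetical names): monitoring with a predictable "OK" result. */
#include <assert.h>

extern int  check_heap_object(const void *p);   /* hypothetical monitor */
extern void process(char *msg);                 /* original computation */

void handle(char *msg)
{
    assert(check_heap_object(msg));   /* monitoring: normally true, off the main path */
    process(msg);                     /* main computation does not consume the result */
}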

9
Error Detection at Runtime
  • StackGuard
  • Instruments the program to detect stack corruption (a canary-style sketch follows after this slide)
  • Libsafe
  • Replaces unsafe string routines in the C library
  • Catches corruption before or after it happens
  • Corruption detected: how to recover?
  • kill the process → denial-of-service opportunity
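A minimal hand-written sketch of the canary idea behind StackGuard; the real tool has the compiler place and check the canary next to the return address, and the names here are hypothetical:

/* Sketch (hypothetical names): StackGuard-style canary check. */
#include <string.h>

extern unsigned long stack_canary;        /* random value chosen at startup */
extern void stack_guard_error(void);      /* called when corruption is detected */

void copy_input(const char *src)
{
    unsigned long canary = stack_canary;
    char buf[64];

    strcpy(buf, src);                     /* the potential overflow */

    if (canary != stack_canary)           /* canary clobbered: stack was smashed */
        stack_guard_error();              /* detection; recovery is the open question */
}

Detection alone still leaves the recovery problem: by the time the check fires, memory may already be corrupted.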

10
Error Recovery at Runtime
  • Manual recovery
  • Need to know exactly what to fix
  • Hard to write cleanup code
  • Easy to get wrong or incomplete
  • Automatic recovery (transactions, logging)
  • Often too expensive
  • Coarse granularity often not appropriate
  • PROPOSAL: fine-grained transactions

11
Monitor-and-Recover Paradigm
  • Fine-grain recoverable transactions
  • Software marks the beginning of a transaction
  • All further side-effects (memory and register)
    are buffered
  • Software decides when to either commit or abort
    the transaction
  • Allows for robust end-to-end error detection and
    recovery

12
Recovery Programming Model Example
<input string parsing code>
13
Recovery Programming Model Example
<input string parsing code>
if (StackGuardError()) exit(-1)
14
Recovery Programming Model Example
try
    <input string parsing code>
    if (StackGuardError()) ABORT
    COMMIT
catch
    log_error()
    skip_to_next_input()
15
Recovery Programming Model Example
(animation build of the same transaction code)
16
Recovery Programming Model Example
(animation build of the same transaction code, with X marks from the slide)
17
Recovery Programming Model Example
(animation build of the same transaction code)
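Putting the example in context, a C-like pseudocode sketch of how a request-handling loop might use the proposed primitives; try/ABORT/COMMIT/catch are the transaction primitives from the slides, and the helper routines are hypothetical:

/* C-like pseudocode sketch: each input is parsed inside a fine-grain transaction. */
for (;;) {
    char *req = read_next_input();        /* hypothetical helper */
    try {
        parse_request(req);               /* the vulnerable parsing code */
        if (StackGuardError())
            ABORT;                        /* discard all buffered side-effects */
        COMMIT;                           /* make the side-effects visible */
    } catch {
        log_error();                      /* recovery code runs on clean state */
        skip_to_next_input();
    }
}

Because ABORT discards every buffered memory and register side-effect of the transaction, the catch block does not need to know exactly what the corrupted parse touched.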
18
Outline
  • Motivation: Architecture Support for RAS
  • Monitoring Code & Error Recovery
  • Current Schemes
  • Proposed Programming Paradigms
  • Hardware Support: Speculative Threads
  • Experimental Evaluations
  • Monitoring Code
  • Recovery with Fine-grain Transactions
  • Conclusions

19
Hardware Support: Speculative Threads
  • Thread-level Speculation (TLS): originally designed to speed up uniprocessor integer programs
  • Break the computation into (relatively) independent threads; execute them in parallel
  • Buffer side effects and detect data dependence violations; discard and re-execute if needed (a conceptual sketch follows below)
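A small C sketch of the kind of code procedural TLS targets; the function names are illustrative, and the forking is done by the hardware, not the programmer:

/* Sketch (illustrative names): procedural speculation.
 * At the call to B(), the hardware forks a speculative thread that runs the
 * continuation (A2 onward) using a predicted return value, while the main
 * thread executes B(). The speculative thread's loads are checked against
 * B()'s stores; on a violation it is discarded and re-executed. */
extern int  B(int x);
extern int  C(void);
extern void A2(void), A3(void);

void A(int x)
{
    int r = B(x);    /* fork point: the continuation runs speculatively */
    A2();
    (void)C();       /* another fork point */
    A3();
    (void)r;
}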

20-23
Procedural Thread-level Speculation (TLS)
(diagram, built up over several slides: normal sequential execution of a region A; calls to procedures B and C split it into segments A1, A2, A3)
24-31
Procedural Thread-level Speculation (TLS)
(diagram: under TLS, a fork at each CALL starts a speculative thread for the continuation, so A2 and A3 execute in parallel with B and C instead of waiting for the RETs)
32
Procedural Thread-level Speculation (TLS)
(diagram: running the continuations speculatively means we need data dependence checking between the threads)
33
Procedural Thread-level Speculation (TLS)
(diagram: an observed data dependence; the speculative thread's LD executes after the earlier thread's ST, so the speculative work remains valid)
34-36
Procedural Thread-level Speculation (TLS)
(diagram: an unobserved data dependence; the speculative thread's LD executes before the earlier thread's ST, so the speculative work is invalid)
37
Procedural Thread-level Speculation (TLS)
(diagram: after the unobserved dependence, the speculative thread's work is discarded and re-executed)
38
Using TLS to speed up Monitoring
(diagram: instrumentation M1, M2, ... is inserted between the original code segments A1, A2, A3; TLS forks let each monitoring call execute in parallel with the code that follows it)
39
Using TLS to speed up Monitoring
(same diagram, annotated: hopefully significant parallelism between the monitoring and the original code)
40
Using TLS to speed up Heavy Monitoring
(diagram: with heavy instrumentation the monitoring code M1 ... M4 dominates execution; here we need independence between monitoring-code invocations to get decent speedup)
41
Using TLS to Support Transactions
  • Speculative buffers must hold all memory
    side-effects
  • Memory hazard detection not needed
  • Start speculative execution at TRY
  • Initial register state is saved
  • Thread restart is changed to ABORT
  • jumps to CATCH instead of re-executing
  • Thread control exposed to software via COMMIT and
    ABORT primitives

42
Machine Architectures for Thread-level Speculation
  • Variety of proposals
  • Speculative buffering in cache or load-store
    queues
  • Selective recovery or restart whole thread
  • Our machine
  • Fine-grained threads → Simultaneous Multithreading (SMT) based
  • Use load-store queues to buffer the state
  • Trace buffers expensive → no selective recovery
  • Procedural speculation → return value prediction

43
Machine Architecture: Base Superscalar
(diagram: baseline pipeline with PC, FETCH, DECODE, RENAME, instruction queue, functional units, and data cache)
44
Machine Architecture: SMT support
(same pipeline diagram; structures added or modified for SMT are highlighted)
45
Machine Architecture: TLS support
(same pipeline diagram; structures added or modified for TLS are highlighted)
46
Machine Architecture: TLS performance
(same pipeline diagram; structures added or modified for TLS performance are highlighted)
47
Machine Architecture
(complete pipeline diagram)
48
Outline
  • Motivation: Architecture Support for RAS
  • Monitoring Code & Error Recovery
  • Current Schemes
  • Proposed Programming Paradigms
  • Hardware Support: Speculative Threads
  • Experimental Evaluations
  • Monitoring Code
  • Recovery with Fine-grain Transactions
  • Conclusions

49
Experimental Evaluation: Monitoring Code
  • Pixie: counts basic-block executions
  • Third Degree: memory checker (like Purify)
  • DIDUCE: anomaly detection tool
  • Originally for Java; tracks values and reports when anomalies are detected
  • Can watch loads/stores, parameters, return values
  • Our version instruments loads in the binary
  • DIDUCE instruments all static loads
  • DIDUCE.1 instruments only 10% of static loads (a sketch of a DIDUCE-style check follows below)
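A hedged sketch of the kind of check a DIDUCE-like tool conceptually attaches to an instrumented load: keep the first value seen at the load site plus a mask of bits that have never changed, and report a load whose value breaks that learned invariant. The structure and function names are hypothetical.

/* Sketch (hypothetical names): DIDUCE-style anomaly check at a load site. */
#include <stdint.h>
#include <stdio.h>

struct load_site {
    uint32_t value;   /* first value observed at this load site */
    uint32_t mask;    /* bits identical in every value seen so far */
    int      seen;
};

static void diduce_check(struct load_site *s, uint32_t v, const char *where)
{
    if (!s->seen) { s->value = v; s->mask = ~0u; s->seen = 1; return; }
    uint32_t changed = (v ^ s->value) & s->mask;
    if (changed) {                                    /* learned invariant violated */
        fprintf(stderr, "anomaly at %s: value 0x%x\n", where, (unsigned)v);
        s->mask &= ~changed;                          /* relax the invariant */
    }
}

Instrumenting every static load with a check like this is what makes the full DIDUCE configuration so much heavier than DIDUCE.1.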

50
Simulated Machine
  • Common across all experiments
  • 5-stage pipeline
  • Configurable number of thread contexts
  • Maximum of 2 threads fetch per cycle
  • 32KB 4-way L1D, 32KB 2-way L1I, 512KB 4-way L2
  • Based on SimpleScalar simulator

51
Simulation Parameters
  • Sample Configurations
  • SMT1/t1 is a 4-wide processor with one thread (no
    TLS)
  • SMT4/t1 is a 16-wide processor with one thread
    (no TLS)
  • only exploits additional ILP
  • SMT4/t8 is a 16-wide processor with 8 TLS
    threads

52
Simulation Thread Control Operations
  • Thread fork
  • Initiated in DECODE pipeline stage
  • Single-cycle flash copy of starting register
    state
  • New thread begins FETCH in the following cycle
  • Thread meet
  • Before meet, finishing thread must
  • issue all buffered stores to memory
  • finalize outstanding register writes for
    validation
  • Only one fork or meet allowed each cycle

53
Simulated Programs
  • Four different instrumentations
  • Pixie
  • Third
  • DIDUCE.1
  • DIDUCE
  • Two different SPEC95 base programs are
    instrumented
  • (V) Vortex
  • (P) Perl

54-56
Runtime Overhead of Instrumentation
(charts: runtime overhead of each instrumentation scheme)
57
Effective IPC
(chart: effective IPC, comparing ILP alone against ILP + TLS)
58
Relative Performance Improvement
(chart: relative performance improvement from ILP, ILP + TLS, and TLS alone)
59
Outline
  • Motivation: Architecture Support for RAS
  • Monitoring Code & Error Recovery
  • Current Schemes
  • Proposed Programming Paradigms
  • Hardware Support: Speculative Threads
  • Experimental Evaluations
  • Monitoring Code
  • Recovery with Fine-grain Transactions
  • Conclusions

60
Using TLS to Support Recovery
  • Speculative buffers must hold all memory
    side-effects
  • Need significant buffering → use the L1 cache
  • e.g. to hold buffer overrun attacks
  • possibilities when even larger buffering is needed:
  • buffer further in the memory hierarchy (L2, L3)
  • commit by default
  • abort by default
  • fall back to coarse (OS) support

61
Evaluation: Transactions with Recovery
  • Examined three networked programs; wrapped routines with buffer-overflow vulnerabilities into transactions
  • bftpd, imapd: unsafe use of C string library functions; used Libsafe-like error detection (a sketch follows below)
  • ntpd: bug in handwritten string parsing; used StackGuard-like error detection
  • Stack traversal is optimized here (unoptimized in the paper)
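A sketch of the Libsafe-like detection used for bftpd and imapd, assuming a hypothetical helper that bounds a stack destination buffer (Libsafe derives such a limit by walking frame pointers); inside a transaction, detection can ABORT instead of killing the process:

/* Sketch (hypothetical helpers): Libsafe-style checked strcpy inside a transaction. */
#include <stddef.h>
#include <string.h>

extern size_t stack_buffer_limit(const char *dst);  /* hypothetical bound on dst */
extern void   transaction_abort(void);              /* maps to the ABORT primitive */

char *checked_strcpy(char *dst, const char *src)
{
    size_t limit = stack_buffer_limit(dst);   /* max bytes dst can safely hold */
    size_t need  = strlen(src) + 1;
    if (need > limit)
        transaction_abort();                  /* caught before memory is corrupted */
    return memcpy(dst, src, need);
}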

62
Transaction Results
63
Conclusions
  • Performance is still an issue for monitoring
  • Better performance means more utility
  • 1.6x speedup from TLS, 2.4x with ILP as well
  • e.g. 2.5x overhead became 12% overhead
  • 5.3 IPC overall
  • Need to provide more ways for programmers to get
    their code right!
  • Fine-grained transactions allow easier checks
    with precise and complete recovery
  • More research needed!