Title: Automated Derivation of Application-Aware Error Detectors
1. Automated Derivation of Application-Aware Error Detectors
- Karthik Pattabiraman
- Joint work with G.P. Saggese, N. Nakka, D. Chen, W. Healey, W. Gu, Z. Kalbarczyk, and Ravi K. Iyer
2. Fault Tolerance: Myths and Realities
- Myth: error detection is easy or unnecessary
  - Reality: error detection is a hard problem; even the best recovery methods are useless without efficient, low-latency error detection
- Myth: programs follow crash-failure semantics
  - Reality: error propagation causes hard-to-recover failures
- Myth: duplication is easy and cost-efficient
  - Reality: high hardware and performance cost, and correlated failures are hard to avoid
3. Importance of Error Detection

              C=0.99, n=2   C=0.99, n=4   C=0.99, n=inf   C=0.8, n=2   C=0.8, n=4   C=0.8, n=inf
  Rm = 0.9       0.989         0.999         0.999          0.972        0.978        0.978
  Rm = 0.7       0.908         0.988         0.996          0.868        0.918        0.921
  Rm = 0.5       0.748         0.931         0.990          0.700        0.812        0.833

- With low error-detection coverage, reliability saturates
4. Crash Latency Distributions (Linux on Pentium P4 and PowerPC G4)
- A measurement study of the Linux kernel shows significant crash latency
- A billion CPU cycles can elapse between the time a corrupted instruction is executed (or bad data is accessed) and the system crash
- Application-aware checking can reduce this latency
5. Drawbacks of Duplication
- IBM G5 approach
  - Replicated pipelines in lock-step (covering only 30% of the processor)
  - Correlated errors are possible in shared state
- Tandem NonStop Himalaya
  - Voting on every clock cycle at the pins of the processor
  - Modern processors are too complex to support this
  - Vote on I/O or memory operations instead (crash?)
- Many faults detected by duplication do not manifest as application-visible errors
  - Application knowledge is needed to detect the errors that matter
  - Detecting every error can degrade overall availability
6. Application-Aware Error Detection
7. Goals
- Embed error detectors in code based on application-specific properties
- Preemptively detect errors at runtime and prevent error propagation that results in corrupted state
- Automatically derive detectors from application code and execution
- Provide efficient hardware/software support for implementing the error detectors
- Extend error detectors to security checking
8. Approach
- Placement: determine where (program location and variable) to place detectors for best coverage
- Dynamic Analysis: instrument the application to observe values at the detector points and form assertions based on these values
- Static Analysis: perform backward slicing on the application code from the detector points to form a minimum symbolic expression
- Runtime: check the assertions using a combination of software and hardware (reliability and security)
9. Fault Models
- Errors in application data
  - A data value is corrupted at the time of its definition (when it is written to or computed)
- Hardware errors represented
  - Incorrect computation (not detected by ECC!)
  - Soft errors in memory, registers, and caches
  - Errors in instruction issue/decode
- Software errors represented
  - Uninitialized or incorrectly initialized values
  - Memory corruption, dangling pointers
  - Integer overflows, out-of-bounds values
  - Timing errors and race conditions
10. Approach
- Placement: determine where (program location and variable) to place detectors for best coverage
- Dynamic Analysis: instrument the application to observe values at the detector points and form assertions based on these values
- Static Analysis: perform backward slicing on the application code from the detector points to form a minimum symbolic expression
- Runtime: check the assertions using a combination of software and hardware (reliability and security)
11. Where to Place the Detectors?
- Must identify the variable(s) to check and the location at which to place the detectors
- Starting point: the program's Dynamic Dependence Graph (DDG)
- Metrics to choose candidate placement points, e.g., fanout, lifetime, execution (a small sketch of ranking by fanout follows this slide)
- Fault-injection experiments were used to assess the coverage of the selected points
- Experiments verify that it is sufficient to check a single variable at a single detection point
- A single detector in the code provides 60% coverage for a large application like GCC!
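The C++ fragment below is a minimal sketch, not the authors' implementation, of how candidate detector points might be ranked by the fanout metric over a dynamic dependence graph; the DDGNode structure and the rankByFanout helper are assumptions made purely for illustration.

    #include <algorithm>
    #include <cstddef>
    #include <string>
    #include <vector>

    // Hypothetical node of a dynamic dependence graph: one dynamic value
    // definition plus the number of later uses observed for it at runtime.
    struct DDGNode {
        std::string variable;   // source-level variable the value belongs to
        std::size_t location;   // program location (e.g., instruction id) of the definition
        std::size_t numUses;    // fanout: how many later instructions consumed this value
    };

    // Rank candidate detector points by fanout and keep the top `budget` of them.
    std::vector<DDGNode> rankByFanout(std::vector<DDGNode> nodes, std::size_t budget) {
        std::sort(nodes.begin(), nodes.end(),
                  [](const DDGNode& a, const DDGNode& b) { return a.numUses > b.numUses; });
        if (nodes.size() > budget)
            nodes.resize(budget);
        return nodes;
    }

The intuition behind the metric is the one stated above: a value consumed by many later instructions spreads corruption widely, so checking it yields the most coverage per detector.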
12. Dynamic Dependence Graph
- Example: the value of R1 defined by ADDI R1, R1, 1 (assume the loop executes 5 times)
  - Maps onto nodes 11, 16, 21, 26, 28 of the dynamic dependence graph
  - Used in 3 instructions:
    - BNE R1, R2, LOOP (same iteration)
    - LW R3, A(R1) (next iteration)
    - ADDI R1, R1, 1 (next iteration)
13. Coverage Results (Multiple Detectors)
- Fanout: 80% coverage with 10 ideal detectors
- Lifetime: 90% coverage with 25 ideal detectors
- Placing detectors randomly on hot paths: 100 ideal detectors are needed to achieve 90% coverage
14. Approach
- Placement: determine where (program location and variable) to place detectors for best coverage
- Dynamic Analysis: instrument the application to observe values at the detector points and form assertions based on these values
- Static Analysis: perform backward slicing on the application code from the detector points to form a minimum symbolic expression
- Runtime: check the assertions using a combination of software and hardware (reliability and security)
15. Deriving Detectors: Static Code Analysis
- Identify and store the instructions that compute the target variable at the detector location
- Encode the instructions as a symbolic expression
- Reduce the expression to shorten the instruction sequence (to avoid simple duplication)
  - Only encode the variables that affect the value of the chosen variable at the detector location (program slicing)
  - Create specialized versions of the computation slice depending on the path followed at runtime (partial evaluation)
- Instrument the code to track the paths followed at runtime
- Choose the check depending on the path followed at runtime
16. Static Analysis Example

[Figure: control-flow graph of an example with two paths (path1, path2) through assignments to b, c, d, e, and f. The slice that recomputes f is specialized per path: f2 is recomputed from {a, e} when (a != 0) and from {2, c, e} when (a == 0). At the detector point, if (f2 != f) an error in f along that path is declared and the program exits; otherwise execution continues with the rest of the code. A concrete sketch of such a check appears below.]
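To make the example concrete, here is a hedged C++ sketch of the kind of checking code the approach could generate: the original computation of f is kept, a slice specialized to the path actually taken recomputes it as f2, and the two values are compared at the detector point. The variables, expressions, and path condition are illustrative placeholders rather than the actual example in the figure.

    #include <cstdio>
    #include <cstdlib>

    // Illustrative original computation of f; the concrete expressions in the
    // slide's figure are placeholders here.
    double computeF(double a, double c, double e, int& pathTaken) {
        if (a != 0.0) { pathTaken = 1; return a * e; }   // path1
        pathTaken = 2; return 2.0 * c * e;               // path2
    }

    int main() {
        double a = 3.0, c = 4.0, e = 5.0;
        int pathTaken = 0;
        double f = computeF(a, c, e, pathTaken);

        // Detector: recompute f through the slice specialized to the path that
        // was actually taken, and compare against the original value.
        double f2 = (pathTaken == 1) ? a * e : 2.0 * c * e;
        if (f2 != f) {
            std::fprintf(stderr, "Error detected in f along path %d\n", pathTaken);
            std::exit(1);
        }
        std::printf("f = %f checked OK\n", f);           // rest of the code
        return 0;
    }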
17. Path Slicing
- Perform a backward traversal through the Static Dependence Graph (SDG), starting from the detector location
- Upon a fork in the Control Flow Graph (CFG), create two paths and continue expansion
- Stop expanding a path upon encountering:
  - An instruction that has been visited before
  - The beginning or end of the function
  - Function calls, returns, free instructions, or system calls
  (a simplified sketch of this traversal follows)
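A simplified sketch of the backward traversal and its stop conditions, under assumed data structures; the real analysis operates on LLVM IR and also forks the slice at CFG branches, which is omitted here.

    #include <set>
    #include <vector>

    // Simplified instruction node of a static dependence graph (SDG), assumed
    // purely for illustration.
    struct Instr {
        bool isBarrier;             // function call, return, free, or system call
        std::vector<int> dataDeps;  // ids of instructions whose results this one reads
    };

    // Backward traversal from the detector location, stopping at instructions
    // that were already visited, at the function boundary (id < 0), and at
    // barrier instructions, as described on the slide.
    void backwardSlice(const std::vector<Instr>& sdg, int id, std::set<int>& slice) {
        if (id < 0 || slice.count(id)) return;   // function boundary / visited before
        if (sdg[id].isBarrier) return;           // stop at calls, returns, frees, syscalls
        slice.insert(id);
        for (int dep : sdg[id].dataDeps)
            backwardSlice(sdg, dep, slice);
    }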
18. Path-Tracking
- Paths through the example CFG:
  - Entry, A, C, D, F
  - Entry, B, C, E, F
  - Entry, A, C, E, F
  - Entry, B, C, D, F

[Figure: example CFG with basic blocks Entry, A, B, C, D, E, F and the per-edge increments used to assign each of the four paths a distinct path value at runtime; a software sketch of such path tracking follows.]
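Below is a software sketch of how each of the four paths could be given a distinct runtime value by incrementing a counter on branch edges, in the spirit of path numbering; the slide does not show the exact encoding used, so the increments are assumptions.

    #include <cstdio>

    // Path tracking for the CFG in the slide (Entry -> {A,B} -> C -> {D,E} -> F):
    // a single counter is updated on the branch edges so that each of the four
    // acyclic paths reaches F with a distinct path value.
    int trackedPath(bool takeA, bool takeD) {
        int pathValue = 0;
        if (takeA) { /* block A */ } else { pathValue += 1; /* block B */ }
        /* block C */
        if (takeD) { /* block D */ } else { pathValue += 2; /* block E */ }
        /* block F: pathValue is now 0..3, one value per Entry->F path */
        return pathValue;
    }

    int main() {
        std::printf("Entry,A,C,D,F -> %d\n", trackedPath(true, true));
        std::printf("Entry,B,C,E,F -> %d\n", trackedPath(false, false));
        return 0;
    }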
19. Performance Results

[Figure: performance overheads of the software-only and software + hardware implementations.]

- By using programmable hardware and the cache to track paths at runtime, the instrumentation overhead is reduced significantly (from 50% to 5%) and hence the overall performance overhead drops from 65% to 20%
20. Implementation Details
- Implemented using an optimizing compiler
  - LLVM (developed in Vikram Adve's group at Illinois)
  - Technique implemented as an LLVM pass
- Handles recursive calls, dynamic memory allocation, system calls, etc.
- Fanout metric used to choose detector points (using static analysis and profiling)
- Tested on sample C programs
  - e.g., Fibonacci, Bubble sort, Fast Fourier Transform
21. Example: Matrix Multiplication

    void rInnerproduct(float *result, float a[rowsize+1][rowsize+1],
                       float b[rowsize+1][rowsize+1], int row, int column)
        /* computes the inner product of A[row,*] and B[*,column] */
    {
        int i;
        *result = 0.0f;
        for (i = 1; i <= rowsize; i++)
            *result = *result + a[row][i] * b[i][column];
    }

    void Mm(int run)
    {
        int i, j;
        Initrand();
        rInitmatrix(rma);
        rInitmatrix(rmb);
        for (i = 1; i <= rowsize; i++)
            for (j = 1; j <= rowsize; j++)
                rInnerproduct(&rmr[i][j], rma, rmb, i, j);
        printf("%f\n", rmr[run + 1][run + 1]);
    }
22. Example: LLVM Intermediate Code

    void %rInnerproduct(double* %result, [41 x double]* %a, [41 x double]* %b, int %row, int %column)
    loopentry:
        ...
        br bool %tmp.2, label %no_exit, label %loopexit
    no_exit:
        ...
        %tmp.7 = load [41 x double]** %a_addr
        %tmp.8 = load int* %row_addr
        %tmp.9 = getelementptr [41 x double]* %tmp.7, int %tmp.8
        %tmp.10 = load int* %i
        %tmp.11 = getelementptr [41 x double]* %tmp.9, int 0, int %tmp.10
        %tmp.12 = load double* %tmp.11
        %tmp.13 = load [41 x double]** %b_addr
        %tmp.14 = load int* %i
        %tmp.15 = getelementptr [41 x double]* %tmp.13, int %tmp.14
        %tmp.16 = load int* %column_addr
        %tmp.17 = getelementptr [41 x double]* %tmp.15, int 0, int %tmp.16
        %tmp.18 = load double* %tmp.17
        %tmp.19 = mul double %tmp.12, %tmp.18
23. Checking Code Added to the Example

        %tmp.20.i = add double %tmp.12.tmp.2, %tmp.19.i
        switch uint %pathValue-8114, label %rest-8 [
            uint 2, label %path2-8
            uint 3, label %path3-8
            uint 4, label %path4-8 ]
        ...
    path2-8:        ; preds = %rest-9
        %new.2.tmp.19.i = mul double %tmp.12.i, %tmp.18.i
        %new.2.tmp.20.i = add double 0.000000e+00, %new.2.tmp.19.i
        br label %Check-8
    path3-8:        ; preds = %rest-9
        %new.3.tmp.19.i = mul double %tmp.12.i, %tmp.18.i
        %new.3.tmp.20.i = add double %tmp.20.i.copy, %new.3.tmp.19.i
        br label %Check-8
    path4-8:        ; preds = %rest-9
        %new.4.tmp.19.i = mul double %tmp.12.i, %tmp.18.i
24. Approach
- Placement: determine where (program location and variable) to place detectors for best coverage
- Dynamic Analysis: instrument the application to observe values at the detector points and form assertions based on these values
- Static Analysis: perform backward slicing on the application code from the detector points and form a symbolic expression to encode runtime paths
- Runtime: check the assertions using a combination of software and hardware (reliability and security)
25. Approach
26. What is a Detector?
- A check based on the value of a program variable or memory location at a program point
- Only detectors based on the value of a single variable/location are considered: Single-Valued Detectors
  - They involve only the current and previous values of the variable
- A detector consists of:
  - A generic rule (belonging to a template class)
  - An exception condition (a logical expression) for values of the variable that do not satisfy the rule
27. Dynamic Detector Example

    void foo( int N )
    {
        for (int k = 0; k < N; k++)
        {
            ...
        }
    }

- "Either the current value of k is zero, or it is greater than the previous value of k by 1"
- (k_i == k_{i-1} + 1) or (k_i == 0)
  - Rule: (k_i == k_{i-1} + 1); Exception: (k_i == 0) (rendered in code below)
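A minimal C++ rendering of this detector as it might appear in instrumented code; the rule and the exception come directly from the slide, while the checkK helper and its placement are assumptions.

    #include <cstdio>
    #include <cstdlib>

    // Detector state for k: the previously observed value at this detector point.
    static int prevK;
    static bool havePrevK = false;

    // Rule: the current k exceeds the previous k by exactly 1.
    // Exception: k == 0 (the value k takes on loop entry).
    void checkK(int k) {
        if (havePrevK && k != prevK + 1 && k != 0) {
            std::fprintf(stderr, "detector fired: k=%d, previous k=%d\n", k, prevK);
            std::exit(1);
        }
        prevK = k;
        havePrevK = true;
    }

    void foo(int N) {
        for (int k = 0; k < N; ++k) {
            checkK(k);   // detector inserted at the chosen point
            /* ... loop body ... */
        }
    }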
28. Detector Classes
29. Deriving Detectors: Dynamic Analysis
- Detector tightness
  - Probability that a detector detects an erroneous value of the variable it checks
  - Conceptually different from coverage
- Execution cost
  - Amortized additional computation involved in invoking the detector, over the multiple values observed at the detector point
- Choose the detector with the highest (tightness / cost) ratio for each detector point
  - First, choose a rule from the template classes for the data stream
  - Next, form the exception condition to account for values that do not satisfy the rule
  - If no exception can be found, discard the rule and try again (a sketch of this selection loop follows)
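A hedged sketch of the selection loop described above: each rule template is tried against the values observed during training, values that violate it become exception candidates, and a rule is discarded if its exception set grows too large. The data structures and the tightness/cost ranking details are assumptions made for illustration.

    #include <cstddef>
    #include <functional>
    #include <string>
    #include <vector>

    // Hypothetical rule template: a predicate over consecutive observed values.
    struct RuleTemplate {
        std::string name;
        std::function<bool(long prev, long cur)> holds;
    };

    // Learn a detector for one value stream recorded during training: take the
    // first template whose violations can be covered by a small exception set,
    // otherwise discard the rule and try the next one. (The real tool then ranks
    // surviving rules by their tightness/cost ratio; that step is omitted here.)
    std::string learnRule(const std::vector<long>& stream,
                          const std::vector<RuleTemplate>& templates,
                          std::size_t maxExceptions) {
        for (const RuleTemplate& t : templates) {
            std::vector<long> exceptions;
            for (std::size_t i = 1; i < stream.size(); ++i)
                if (!t.holds(stream[i - 1], stream[i]))
                    exceptions.push_back(stream[i]);   // value that would need an exception
            if (exceptions.size() <= maxExceptions)
                return t.name;                         // rule accepted with this exception set
        }
        return "no detector for this stream";
    }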
30. Experimental Setup
- Steps in the evaluation:
  - Analysis: detector placement and code instrumentation
  - Training: learning detectors using representative inputs
  - Testing: fault-injection experiments that flip random bits in application data (illustrated after this slide)
- Tool used for evaluation: modified version of the SimpleScalar simulator (functional simulation)
  - Emulates real-world behavior under faults
- Application workload: Siemens suite
  - C programs with 100-1000 lines of code
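For illustration only (the actual experiments inject faults inside the modified SimpleScalar simulator, not at the source level), the fault model of flipping a random bit in an application data value can be sketched as follows; the injectBitFlip helper is an assumption.

    #include <cstdint>
    #include <cstring>
    #include <random>

    // Flip one random bit in an application data value at the moment of its
    // definition, mirroring the single-bit-flip fault model described above.
    template <typename T>
    T injectBitFlip(T value, std::mt19937& rng) {
        static_assert(sizeof(T) <= sizeof(std::uint64_t), "illustration handles small types only");
        std::uint64_t bits = 0;
        std::memcpy(&bits, &value, sizeof(T));
        std::uniform_int_distribution<int> pick(0, 8 * static_cast<int>(sizeof(T)) - 1);
        bits ^= (std::uint64_t{1} << pick(rng));   // corrupt a single random bit
        std::memcpy(&value, &bits, sizeof(T));
        return value;
    }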
31. Coverage versus Number of Detectors
32. Coverage versus Detector Type
33. False Positives
- An error is detected even when no fault is injected
- Less than 6% for all applications except tot_info
34. Approach
- Placement: determine where (program location and variable) to place detectors for best coverage
- Dynamic Analysis: instrument the application to observe values at the detector points and form assertions based on these values
- Static Analysis: perform backward slicing on the application code from the detector points and form a symbolic expression to encode runtime paths
- Runtime: check the assertions using a combination of software and hardware (reliability and security)
35. Hardware Implementation
- RSE (Reliability and Security Engine) is a reconfigurable processor-level framework for reliability and security
- Detectors are implemented as an RSE module consisting of:
  - Shadow Register File: holds the state of the checked location
  - Assertion Table: stores the assertion parameters
  - Data path: checks the assertions independently of the processor
36. Hardware Synthesis Results
- Area overhead of the EDMs alone: 30%
- Area overhead of the EDMs plus RSE interface: 45%
- Performance overhead: 5.6%
37. Approach
- Placement: determine where (program location and variable) to place detectors for best coverage
- Dynamic Analysis: instrument the application to observe values at the detector points and form assertions based on these values
- Static Analysis: perform backward slicing on the application code from the detector points and form a symbolic expression to encode runtime paths
- Runtime: check the assertions using a combination of software and hardware (reliability and security)
38. Information-Flow Signatures
- Use detection of program data-flow violations as an indicator of malicious tampering with the system
- Prevent an attacker from exploiting the disconnect between the source-level semantics and the execution semantics of a program
- Employ compile-time static program analysis to extract the instructions allowed (at runtime) to write to a given memory location
  - Sign each identified location with the PC(s) of the instruction(s) allowed to write to it
  - Typically, only a few static instructions write to a given program location
- Employ special hardware to perform the runtime check
39. What and How Do We Check?
- Security-critical data (incomplete list):
  - System call arguments
  - Function call and return addresses
  - Control-flow data
  - Pointers on the stack and the heap
- Special hardware maintains a tag for each memory word
  - Write to a location: create the runtime signature corresponding to the location: (PC) XOR (tag)
  - Reference to a location: check the tag against the set of allowed signatures (derived at compile time)
  - If there are no matches, the operation is disallowed (a software sketch of this check follows)
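Below is a software model of the check that the special hardware performs; the (PC) XOR (tag) signature composition follows the slide, while the maps standing in for the tag memory and the compile-time signature sets are assumptions.

    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>
    #include <map>
    #include <set>

    // Every security-critical memory word carries a tag. Stores fold the writing
    // instruction's PC into the tag, and references verify the tag against the
    // signatures the compile-time analysis allows for that location.
    static std::map<std::uintptr_t, std::uint64_t> tagOf;                  // address -> runtime tag
    static std::map<std::uintptr_t, std::set<std::uint64_t>> allowedSigs;  // address -> allowed signatures

    void onWrite(std::uintptr_t addr, std::uint64_t pc) {
        tagOf[addr] = pc ^ tagOf[addr];        // runtime signature: (PC) XOR (current tag)
    }

    void onRead(std::uintptr_t addr) {
        const std::set<std::uint64_t>& allowed = allowedSigs[addr];
        if (allowed.find(tagOf[addr]) == allowed.end()) {
            std::fprintf(stderr, "information-flow violation at address %#llx\n",
                         static_cast<unsigned long long>(addr));
            std::exit(1);                      // operation is disallowed
        }
    }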
40. Summary
- Application-aware error detection and recovery to ensure low-latency error detection
- Technique to place detectors in code achieving up to 80% coverage with 10 detectors
- Dynamic analysis to derive value-based error detectors and implement them in hardware
- Static analysis to derive checking expressions based on backward program slicing
- Efficient implementation in hardware, with significant benefits over full duplication
41. Ongoing and Future Work
- Dynamic analysis: extension to larger programs and multi-valued detectors
- Static analysis: concise representation of checking expressions and compiling them to hardware
- Extension to security: signatures based on information flow in a program
- Formal verification of the derived detectors: model checking / theorem proving