Title: Translation Validation for an Optimizing Compiler
1Translation Validation for an Optimizing Compiler
Based on George C. Necula's article (ACM SIGPLAN 2000)
Advanced Programming Languages Seminar, Winter 2000
2In a Nutshell
- The Problem: Verify that the optimized code and the source code are equivalent
- Partial (heuristic) Solution: Independently prove the validity of each translation pass
- Motivation: Optimizer testing
3Outline
- Introduction
- Intermediate Language
- An extensive example
- Simulation Relation
- Execution Pair
- Equivalence Checking
- Branch Navigation
- Results and Limitations
4Methods of Proving Compiler Correctness
- Prove the general correctness of the compiler
- absolute
- tedious
- impractical for large programs
- highly dependent on the compiler code
5Methods of Proving Compiler Corr. (cont.)
- Show that each translation phase was valid
- weaker
- proof per program
- applicable for large programs
- independent of compiler code
6Compilation Process
Source Code -> Intermediate Language (IL) -> Target Code
7Optimization Process
IL Code 0 -> Optimization Pass -> IL Code 1 -> ... -> IL Code n
Validator: compares the IL code before and after each optimization pass
8The IL in GNU C (subset)
- Instructions
- Expressions
- Operators
9An Example
extern int g;
extern int a[];
main() {
  int n;  /* n contains the length of the array */
  int i;
  for (i = 0; i < n; i++)
    a[i] = g * i + 3;
  return i;
}
10And in IL
for (i = 0; i < n; i++) a[i] = g * i + 3; return i;
11After Transformation
Use registers
Transform while to a repeat loop
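The while-to-repeat transformation can be illustrated with a minimal Python sketch. The function names and the loop body are hypothetical and not taken from the slides. Python has no repeat/until construct, so the repeat loop is emulated here with `while True` and an exit test at the end of the body.

```python
def while_version(n):
    """Source shape: the condition is tested before every iteration."""
    visited = []
    i = 0
    while i < n:
        visited.append(i)
        i += 1
    return i, visited


def repeat_version(n):
    """Transformed shape: one guard up front, then the test at the bottom."""
    visited = []
    i = 0
    if i < n:                 # guard: skip the loop if it would run zero times
        while True:           # emulates repeat ... until
            visited.append(i)
            i += 1
            if not (i < n):   # exit test moved to the end of the body
                break
    return i, visited
```

Both functions return the same result for any integer n, which is the kind of equivalence the validator has to establish without running the code.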
12Equivalence
- x1, ..., xn: variables in the source
- y1, ..., ym: variables in the target
- Variable equivalence: x1 = y3
- Expression equivalence: x1 + x2 = y3 + 6
13Simulation Relation
- A set of equivalences between a source block and
a target block
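One way to picture a simulation relation as a data structure is sketched below. All names (`Equivalence`, `SimulationRelation`, the block labels, and the two sample equivalences) are assumptions made for illustration. The sketch evaluates equivalences on concrete states, whereas the validator described in these slides reasons about them symbolically.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

State = Dict[str, int]  # variable name -> value


@dataclass(frozen=True)
class Equivalence:
    label: str
    source_expr: Callable[[State], int]  # expression over source variables
    target_expr: Callable[[State], int]  # expression over target variables


@dataclass(frozen=True)
class SimulationRelation:
    source_block: str
    target_block: str
    equivalences: Tuple[Equivalence, ...]

    def holds_on(self, source_state: State, target_state: State) -> bool:
        """True if every equivalence holds for this pair of concrete states."""
        return all(
            eq.source_expr(source_state) == eq.target_expr(target_state)
            for eq in self.equivalences
        )


# Hypothetical relation between a source block "b1" and a target block "b1'".
loop_head = SimulationRelation(
    source_block="b1",
    target_block="b1'",
    equivalences=(
        Equivalence("i = r1", lambda s: s["i"], lambda t: t["r1"]),
        Equivalence("4 * i = r2", lambda s: 4 * s["i"], lambda t: t["r2"]),
    ),
)
```

For example, `loop_head.holds_on({"i": 2}, {"r1": 2, "r2": 8})` is true, while changing `r2` to 6 breaks the second equivalence.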
14Execution Pair
- Definition: An execution path in the source and its corresponding path in the target
[Figure: a source path shown next to its corresponding target path]
15Checking Equivalence
- Equivalence is checked at the end of a specific execution pair
- A variable's value after the run is marked with a prime (x' is the value of x after the run)
Symbolic Substitution
Source: x' = x + 1
Target: y' = y + 3
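Symbolic substitution over a straight-line block can be sketched as follows. This is a minimal illustration restricted to linear integer expressions. The representation (a dict from variable name to coefficient) and all function names are assumptions, not the validator's actual data structures.

```python
CONST = "1"  # key that holds the constant term of a linear expression


def var(name):
    return {name: 1}


def num(value):
    return {CONST: value}


def add(a, b):
    """Sum of two linear expressions, dropping zero coefficients."""
    out = dict(a)
    for key, coeff in b.items():
        out[key] = out.get(key, 0) + coeff
    return {k: c for k, c in out.items() if c != 0}


def substitute(expr, env):
    """Replace every variable in expr by its current symbolic value."""
    out = {}
    for key, coeff in expr.items():
        value = num(1) if key == CONST else env.get(key, var(key))
        out = add(out, {k: coeff * c for k, c in value.items()})
    return out


def run_block(assignments, variables):
    """Return each variable's final (primed) value in terms of initial values."""
    env = {v: var(v) for v in variables}
    for target, expr in assignments:
        env[target] = substitute(expr, env)
    return env


# Illustrative blocks: the source executes x := x + 1, the target y := y + 3.
source_after = run_block([("x", add(var("x"), num(1)))], ["x"])
target_after = run_block([("y", add(var("y"), num(3)))], ["y"])
```

Here `source_after["x"]` is the linear form for x + 1 and `target_after["y"]` the form for y + 3, so any equivalence stated over x' and y' can be rewritten in terms of the values at the start of the run.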
16Equivalence Simplification
- An equivalence can be simplified using
- Arithmetic rules
- Already proven equivalences
- Example: if x' = x + 1 and y' = y + 5, then 3x' = y' => 3(x + 1) = y + 5 => 3x + 3 = y + 5
- An equivalence holds if it can be simplified to an already proven equivalence
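The simplification step can be sketched for linear integer expressions as follows. The goal 3x' = y' with x' = x + 1 and y' = y + 5 reduces to 3x + 3 = y + 5, which holds if 3x = y + 2 has already been proven. That proven equivalence is an assumption added to complete the illustration, and the helper names and the dict-based representation are hypothetical. A real validator would need a richer set of arithmetic rules.

```python
CONST = "1"  # key that holds the constant term of a linear expression


def substitute(expr, updates):
    """Rewrite an expression over end-of-run values in terms of start values."""
    out = {}
    for name, coeff in expr.items():
        replacement = updates.get(name, {name: 1})
        for key, c in replacement.items():
            out[key] = out.get(key, 0) + coeff * c
    return {k: c for k, c in out.items() if c != 0}


def difference(lhs, rhs):
    """Normal form of lhs = rhs: the linear expression lhs - rhs."""
    keys = set(lhs) | set(rhs)
    diff = {k: lhs.get(k, 0) - rhs.get(k, 0) for k in keys}
    return {k: c for k, c in diff.items() if c != 0}


def equivalence_holds(goal, updates, proven):
    lhs, rhs = goal
    residue = difference(substitute(lhs, updates), substitute(rhs, updates))
    if not residue:                      # settled by arithmetic alone
        return True
    for known_lhs, known_rhs in proven:  # settled by a proven equivalence
        if residue in (difference(known_lhs, known_rhs),
                       difference(known_rhs, known_lhs)):
            return True
    return False


updates = {"x": {"x": 1, CONST: 1}, "y": {"y": 1, CONST: 5}}  # x' = x+1, y' = y+5
goal = ({"x": 3}, {"y": 1})                                    # 3x' = y'
proven = [({"x": 3}, {"y": 1, CONST: 2})]                      # 3x = y + 2 (assumed)
```

With these inputs `equivalence_holds(goal, updates, proven)` succeeds, and it fails when `proven` is empty, since arithmetic alone cannot discharge 3x + 3 = y + 5.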
17Checking Simulation Relations
- A relation is correct if for each execution pair
entering it, all of its equivalences hold
[Figure: source variable x, target variable y, and the equivalence x = y + 1]
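The rule above amounts to a double loop over execution pairs and equivalences. The sketch below shows only that control structure. `ExecutionPair`, `failed_checks`, and the `holds_after` callback are hypothetical names, and the callback stands in for whatever procedure actually decides a single equivalence.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class ExecutionPair:
    source_path: Tuple[str, ...]   # block labels along the source path
    target_path: Tuple[str, ...]   # block labels along the target path
    ends_in: str                   # name of the relation the pair enters


def failed_checks(
    relation_name: str,
    equivalences: Iterable[str],
    pairs: Iterable[ExecutionPair],
    holds_after: Callable[[ExecutionPair, str], bool],
) -> List[Tuple[ExecutionPair, str]]:
    """Collect every (pair, equivalence) that cannot be established."""
    equivalences = list(equivalences)
    failures = []
    for pair in pairs:
        if pair.ends_in != relation_name:
            continue                       # pair does not enter this relation
        for eq in equivalences:
            if not holds_after(pair, eq):
                failures.append((pair, eq))
    return failures


def relation_is_correct(relation_name, equivalences, pairs, holds_after) -> bool:
    return not failed_checks(relation_name, equivalences, pairs, holds_after)
```

Returning the list of failures rather than a bare boolean makes it possible to report which execution pair and which equivalence triggered an alarm.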
18Something fishy
- What's the point of proving something using the same rules that created it?
- Checking is simpler than optimizing
- Provides an independent perspective on the final code
19Showtime
C. Prove elem. 2 (Trivial)
20Element 5
21Element 5 (cont.)
22Known Equivalences
- Equivalences from the start of the run
- Equivalences at the end of run
23Need to Prove
- The path condition is correct
- The equivalences hold, mainly
24Elem 5 Path Cond.
25Elem 5 The Equivalence
Q.E.D
26Algorithm Parts
- Inferring Simulation Relations
- Finding execution pairs
- Solving Constraints
27Navigating Branches
- An optimizer might eliminate or reverse branches
- Problem: did branch B' in the target originate from branch B in the source?
- Solution: use heuristics
28A Typical Case
29Similarity
- The similarity between two branches depends on the similarity of their
- preceding instruction sequences
- boolean conditions
- two branching sequences
30Similarity (cont.)
- Similarity is a numeric relation with values between 0 and 1
- "and" is multiplication
- "or" is maximum
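The two combination rules can be sketched in a few lines. Only "and is multiplication" and "or is maximum" come from the slides. The function names and the particular way the ingredients are combined in `branch_similarity` (same orientation or reversed branch) are assumptions made for illustration.

```python
import math


def sim_and(*scores):
    """Conjunction of similarity scores: their product."""
    return math.prod(scores)


def sim_or(*scores):
    """Disjunction of similarity scores: their maximum."""
    return max(scores)


def branch_similarity(prefix, cond_same, cond_negated,
                      true_true, false_false, true_false, false_true):
    """Score a source/target branch pair; every argument is in [0, 1].

    prefix        similarity of the preceding instruction sequences
    cond_same     similarity of the conditions as written
    cond_negated  similarity of one condition to the negation of the other
    true_true...  similarity of the corresponding successor sequences
    """
    straight = sim_and(cond_same, true_true, false_false)    # same orientation
    swapped = sim_and(cond_negated, true_false, false_true)  # reversed branch
    return sim_and(prefix, sim_or(straight, swapped))
```

Because every score lies in [0, 1], multiplication can only lower the result, so a single poor match among the conjoined ingredients pulls the whole pairing down.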
31Boolean Similarity
- Branches are similar if
- one can be simplified into the other using simple
transforms, such as
32Instruction Similarity
- Instruction similarity depends on
- the number of function calls
- whether the sequences lead to already related branches (in that case, similarity is 1.0)
33Instruction Similarity
- gcc-specific features
- IL instruction serial numbers
- source line number information (for code duplication detection)
34Results
- Detected a known bug in gcc 2.7.2.2
- Used on large programs
- Increased compile time by a factor of 4
35Limitations
- Cannot handle loop unrolling
- Cannot resolve all types of equivalences
- Produces several false alarms (e.g., the gcc bug was accompanied by 3 false alarms)
36Conclusion
- Automatically infer equivalences
- Uses
- simple rules and substitution
- heuristics
- Good results
- Problems
- false alarms
- runtime overhead