Title: Compiler Optimization-Space Exploration
1Compiler Optimization-Space Exploration
Authors Spyridon Triantafyllis, Manish
Vachharajani, Neil Vachharajani, David I. August
- Adrian Pop
- IDA/PELAB
- adrpo_at_ida.liu.se
2Outline
- Introduction
- The Problem
- Predictive Heuristics and A Priori Evaluation
- Some Solutions
- Iterative Compilation and A Posteriori Evaluation
- Our Solution
- Optimization-Space Exploration
- Evaluation
- Conclusion
3Introduction
- Processors
- become more complex
- incorporate additional computational resources
- Consequence
- Compilers
- become more complex
- use aggressive optimizations
- have to use predictive heuristics in order to
decide where and to what extent optimizations
should be applied
4The Problem Predictive Heuristics
- Predictive Heuristics
- try to determine a priori the benefits of certain optimizations
- are tuned to give the highest average performance
- The Result
- significant performance gains are unrealized!
5Some Solutions Iterative Compilation
- Iterative Compilation
- optimize the programs in many ways
- choose a posteriori the best code version
- Pitfalls of current schemes
- prohibitive compilation times!
- limited to specific architectures
- embedded systems
- limited to specific optimizations
6Our solution Optimization-Space Exploration
- OSE Compiler (Practical Iterative Compilation)
- explores the space of optimization configurations through multiple compilations
- uses the experience of the compiler writer to prune the number of configurations that should be explored
- uses a performance estimator instead of evaluating the code by executing it
- selects a custom configuration for each code segment
- selects the next optimization configuration by examining the characteristics of the previous configurations
7OSE over many configurations
8OSE Limiting the Search Space
- Optimization Space
- derived from a set of optimization parameters
- Optimization Parameters
- Optimization level
- High Level Optimization (HLO) level
- Micro-architecture type
- Coalesce adjacent loads and stores
- HLO phase order
- Loop unroll limit
- Update dependencies after unrolling
- Perform software pipelining
9OSE Limiting the Search Space
- Optimization Parameters
- Heuristic to disable software pipelining
- Allow control speculation during software pipelining
- Software pipeline outer loops
- Enable if-conversion heuristic for software pipelining
- Software pipeline loops with early exits
- Enable if-conversion
- Enable non-standard predication
- Enable pre-scheduling
- Scheduler ready criterion
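The parameter lists on these two slides span the optimization space as a cross product of the parameter domains, which is why the space grows so quickly. A minimal sketch of that idea follows. The parameter names and value domains in `PARAMETERS` are hypothetical stand-ins chosen for illustration, not the compiler's actual flags or ranges.

```python
from itertools import product

# Hypothetical parameter domains (assumption): names and values are
# illustrative only and do not correspond to real compiler flags.
PARAMETERS = {
    "opt_level": [2, 3],
    "hlo_level": [0, 1, 2],
    "unroll_limit": [1, 4, 8],
    "software_pipelining": [False, True],
    "if_conversion": [False, True],
}

def all_configurations(parameters):
    """Yield every configuration as a dict mapping parameter name -> value."""
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))

def space_size(parameters):
    """Number of configurations: the product of the domain sizes."""
    size = 1
    for domain in parameters.values():
        size *= len(domain)
    return size
```

Even these five small domains already yield 72 configurations, which motivates the pruning steps on the following slides.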
10OSE Limiting the Search Space
- Compiler Construction-time Pruning
- limit the total number of configurations that will be considered at compile time
- construct a set S with at most N configurations
- S is chosen by determining the impact on a representative set of code segments C as follows
- S = {default configuration} ∪ {configurations with non-default parameters}
- a) run C compiled with each configuration in S on real hardware and retain in S only the valuable configurations
- b) consider the combinations of configurations in S as a new set S'; repeat a) for S' and retain only the best N configurations
- repeat b) until no new configurations can be generated or the speedup does not improve
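The pruning loop on this slide can be sketched roughly as follows. This is an illustration under several assumptions, not the authors' implementation: a configuration is modeled as a frozenset of (parameter, value) pairs that differ from the default, "valuable" is interpreted as the highest mean speedup over the representative segments, and `measure` is a hypothetical callback standing in for a run on real hardware. The names `score`, `combine`, and `prune` are likewise invented for this sketch.

```python
from itertools import combinations

def score(config, segments, measure):
    """Mean measured speedup of `config` over the representative segments."""
    return sum(measure(config, seg) for seg in segments) / len(segments)

def combine(a, b):
    """Merge two configurations; None if they set one parameter differently."""
    merged = dict(a)
    for name, value in b:
        if merged.get(name, value) != value:
            return None
        merged[name] = value
    return frozenset(merged.items())

def prune(seed, segments, measure, n):
    """Return at most n configurations, best first."""
    def best_n(configs):
        ranked = sorted(configs, key=lambda c: score(c, segments, measure),
                        reverse=True)
        return ranked[:n]

    kept = best_n(set(seed))                      # step a)
    best = score(kept[0], segments, measure)
    while True:
        # step b): pairwise combinations of the retained configurations
        combos = {combine(a, b) for a, b in combinations(kept, 2)}
        new = {c for c in combos if c is not None and c not in kept}
        if not new:                               # nothing new can be generated
            break
        candidate = best_n(set(kept) | new)       # repeat a) on the larger set
        candidate_best = score(candidate[0], segments, measure)
        if candidate_best <= best:                # speedup no longer improves
            break
        kept, best = candidate, candidate_best
    return kept
```

The expensive part is `measure`, which is why this step runs once at compiler construction time rather than on every compilation.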
11OSE Limiting the Search Space
- Characterizing Configuration Correlations
- build an optimization configuration tree
- critical configurations are the configurations at the same level of the tree
- 1. Construct O, the set of the m most important configurations in S for all code segments in C
- 2. Choose all o_i in O as the successors of the root node
- 3. For each configuration o_i in O
- 4. Construct $C_i = \{ c_j \mid \arg\max_{k = 1, \dots, m} p_{j,k} = i \}$
- 5. Repeat steps 3 and 4 to find the successors of o_i, limiting the code segments to C_i and the configurations to S\O
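The tree construction above can be sketched as a short recursive function. The sketch makes two assumptions that the slide does not spell out: p_{j,k} is read as the measured performance of code segment c_j under configuration o_k (higher is better), and "most important" is taken to mean the highest total performance over the current segments. The function name `build_tree` and the nested-list tree representation are hypothetical.

```python
def build_tree(configs, segments, perf, m):
    """Return the successors of one tree node as a list of (config, subtree).

    perf[segment][config] is the assumed performance p_{j,k} of a segment
    under a configuration; higher is better.
    """
    if not configs or not segments:
        return []
    # Step 1: O = the m configurations with the highest total performance.
    ranked = sorted(configs,
                    key=lambda o: sum(perf[c][o] for c in segments),
                    reverse=True)
    top = ranked[:m]
    rest = [o for o in configs if o not in top]    # S \ O
    children = []
    for o_i in top:                                # steps 2 and 3
        # Step 4: C_i = segments whose best configuration within O is o_i.
        c_i = [c for c in segments
               if max(top, key=lambda o: perf[c][o]) == o_i]
        # Step 5: recurse on C_i with the remaining configurations.
        children.append((o_i, build_tree(rest, c_i, perf, m)))
    return children
```

Each subtree is built only from the segments that favored its parent configuration, so correlated configurations end up on the same root-to-leaf path.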
12OSE Limiting the Search Space
- Compile-time search
- do a breadth-first search on the optimization configuration tree
- choose the configuration that yields the best estimated performance
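A plain breadth-first walk over a configuration tree, keeping the configuration with the best estimate, might look like the sketch below. It assumes the tree is a nested list of (config, children) pairs and that `estimate` is a hypothetical callback returning estimated cycles (lower is better). A practical search would probably descend only into promising subtrees rather than visit every node, so treat this as the simplest reading of the slide.

```python
from collections import deque

def best_configuration(tree, estimate):
    """Visit the tree level by level; return (config, estimated cycles).

    tree: list of (config, children) pairs (assumed representation).
    estimate(config): estimated cycle count, lower is better (assumption).
    """
    best_config, best_cost = None, float("inf")
    queue = deque(tree)
    while queue:
        config, children = queue.popleft()
        cost = estimate(config)
        if cost < best_cost:
            best_config, best_cost = config, cost
        queue.extend(children)        # children are visited after this level
    return best_config, best_cost
```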
13OSE Limiting the Search Space
- Limit the OSE application
- to hot code segments
- hot code segments are identified through
profiling or hardware performance counters during
a program run
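One simple way to turn profile data into a set of hot code segments is to take the heaviest functions until they cover a chosen fraction of the total profile weight. The sketch below assumes a profile given as a mapping from function name to sample count; the function name `hot_functions` and the 90% default threshold are illustrative assumptions, not values from the presentation.

```python
def hot_functions(profile, coverage=0.9):
    """Return the heaviest functions that together reach `coverage`
    of the total profile weight, in descending order of weight."""
    total = sum(profile.values())
    hot, covered = [], 0
    by_weight = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
    for name, weight in by_weight:
        if total == 0 or covered / total >= coverage:
            break
        hot.append(name)
        covered += weight
    return hot
```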
14Evaluation
- OSE Compiler Algorithm
- 1. Profile the code
- 2. For each Function
- 3. Compile to the high level IR
- 4. Optimize using HLO
- 5. For each Function
- 6. If the function is hot
- 7. Perform OSE on second HLO and CG
- 8. Emit the function using the best configuration
- 9. If the function is not hot use the standard configuration
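The per-function part of the algorithm (steps 5 to 9) can be sketched as a small driver. Profiling, lowering to the high-level IR, and the first HLO pass are abstracted away here; `is_hot`, `compile_with`, and `estimate` are hypothetical callbacks standing in for the profiler, the second HLO plus code generation, and the compile-time performance estimator.

```python
def compile_program(functions, is_hot, configurations, default_config,
                    compile_with, estimate):
    """Return {function: (configuration used, emitted code)}.

    compile_with(fn, cfg) -> code   (stand-in for second HLO + CG)
    estimate(code) -> cycles        (lower is better; assumption)
    """
    emitted = {}
    for fn in functions:
        if is_hot(fn):
            # Steps 6-8: try every retained configuration, keep the best.
            candidates = [(cfg, compile_with(fn, cfg))
                          for cfg in configurations]
            cfg, code = min(candidates, key=lambda pair: estimate(pair[1]))
        else:
            # Step 9: cold functions use the standard configuration.
            cfg, code = default_config, compile_with(fn, default_config)
        emitted[fn] = (cfg, code)
    return emitted
```

Only hot functions pay for the extra compilations, which keeps total compile time bounded.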
15Compile-time Performance Estimation
- Model Based on
- Ideal Cycle Count T
- Data cache performance, Lambda, L
- Instruction cache performance, I
- Branch misprediction, B
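The slide lists the components of the estimator but not how they are combined. As a rough illustration only, the sketch below assumes each component is expressed in cycles and that the estimate is their plain sum; both the additive form and the names `estimated_cycles` and `pick_best` are assumptions made for this example.

```python
def estimated_cycles(ideal_cycles, dcache_stalls, icache_stalls,
                     branch_penalty):
    """Assumed additive model: T plus data cache, instruction cache,
    and branch misprediction penalties, all in cycles."""
    return ideal_cycles + dcache_stalls + icache_stalls + branch_penalty

def pick_best(versions):
    """versions: {name: (T, data cache, instruction cache, branch)}.
    Return the name of the version with the lowest estimated cycles."""
    return min(versions, key=lambda name: estimated_cycles(*versions[name]))
```

Under this model a version with a slightly higher ideal cycle count can still win if it stalls less on the caches or mispredicts fewer branches.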
16Results
17Conclusions