Title: Statistical Compiler Tuning
1Statistical Compiler Tuning
- M. Haneda
- P.M.W. Knijnenburg
- H.A.G. Wijshoff
2Motivation
- An optimal compiler optimization setting can be
obtained by considering the interaction between
applications, architectures, and compiler
optimizations.
- Profiling is a (the best?) way to understand this
interaction.
- However, a huge number of optimization settings
are possible.
3Example gcc 3.3.1
- 42 options
- defer-pop, force-mem, force-addr,
optimize-sibling-calls, inline-functions,
merge-constants, strength-reduce, thread-jumps,
cse-follow-jumps, cse-skip-blocks,
rerun-cse-after-loop, rerun-loop-opt, gcse,
loop-optimize, crossjumping, if-conversion,
if-conversion2, delete-null-pointer-checks,
expensive-optimizations, optimize-register-move,
schedule-insns, sched-interblock, sched-spec,
schedule-insns2, sched-spec-load,
sched-spec-load-dangerous, caller-saves,
move-all-movables, reduce-all-givs, peephole,
peephole2, reorder-blocks, reorder-functions,
strict-aliasing, align-functions, align-labels,
align-loops, align-jumps, cprop-registers,
function-sections, data-sections, unroll-loops
4Example (Contd)
- 42 options, all on/off switches, lead to
$2^{42} \approx 4.4 \times 10^{12}$ settings.
- Each profile takes 10 runs, each taking
approximately 10 sec. Total time per profile: 100
sec.
- Total time for a full profile:
$4.4 \times 10^{14}$ sec $\approx 7.3 \times 10^{8}$ weeks $\approx 1.4 \times 10^{7}$ years.
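The arithmetic on this slide can be re-derived in a few lines. This is only a check of the numbers above; the variable names are illustrative, and the year count assumes 52 weeks per year.

```python
# Re-derive the exhaustive-search cost quoted on the slide.
n_options = 42
runs_per_profile = 10
seconds_per_run = 10

settings = 2 ** n_options                      # every on/off combination
total_seconds = settings * runs_per_profile * seconds_per_run
total_weeks = total_seconds / (7 * 24 * 3600)
total_years = total_weeks / 52                 # assumption: 52 weeks per year

print(f"settings: {settings:.1e}")
print(f"seconds:  {total_seconds:.1e}")
print(f"weeks:    {total_weeks:.1e}")
print(f"years:    {total_years:.1e}")
```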
5Challenge
- How to find the optimal configuration with a
limited amount of profiling?
6Three Approaches
- Statistical approach
  - Using main effects
  - Using the Mann-Whitney test
- Random approach
- Approach that focuses on the interaction between
compiler optimizations
7Statistical Approach
- Start with an appropriate initial representation
of the full search space based on orthogonal
arrays.
- Each time after data collection:
  - Approach 1: Compute main effects of compiler
  options from the profiling data.
  - Approach 2: Apply inferential statistics
  (Mann-Whitney test) to the profiling data to
  detect effective compiler options.
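The slides do not spell out how a main effect is computed. A minimal sketch, assuming the usual factorial-design definition (mean response with the option on minus mean response with it off) and using made-up timings:

```python
def main_effects(design, times):
    """Main effect of each column: mean time with the option on minus mean time with it off.

    design: list of 0/1 rows (one row per run); times: one measurement per row.
    A negative effect means turning the option on reduced the run time.
    """
    effects = []
    for j in range(len(design[0])):
        on = [t for row, t in zip(design, times) if row[j] == 1]
        off = [t for row, t in zip(design, times) if row[j] == 0]
        effects.append(sum(on) / len(on) - sum(off) / len(off))
    return effects


# Hypothetical data: 4 runs over 2 options, times in seconds.
design = [[0, 0], [0, 1], [1, 0], [1, 1]]
times = [10.0, 9.0, 6.0, 5.0]
effects = main_effects(design, times)
print(effects)
```

In this toy data the first option shortens the run by 4 seconds on average and the second by 1 second.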
8Orthogonal Arrays
- Orthogonal arrays (OAs) are well chosen
fractional factorial designs.
- An OA is expressed as an N x k matrix of 0s and
1s.
- The columns are interpreted as factors (compiler
options).
- Each row of an array defines a compiler setting.
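One way to turn a row into a compiler invocation is sketched below. It relies on the GCC convention that an optimization named `foo` is enabled with `-ffoo` and disabled with `-fno-foo`; the base level `-O`, the helper name `gcc_flags`, and the file name `app.c` are assumptions made for illustration.

```python
def gcc_flags(option_names, row):
    """Map one 0/1 row to GCC flags: 1 -> -f<name>, 0 -> -fno-<name>."""
    return ["-f" + name if bit else "-fno-" + name
            for name, bit in zip(option_names, row)]


# Three of the 42 options listed earlier, with a hypothetical row.
names = ["gcse", "unroll-loops", "strength-reduce"]
row = [1, 0, 1]

# Assumed command shape: base level -O plus explicit on/off flags.
command = ["gcc", "-O"] + gcc_flags(names, row) + ["app.c", "-o", "app"]
print(" ".join(command))
```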
9Orthogonal Arrays (contd)
- An OA has the property that, for any two arbitrary
columns, the patterns 00, 01, 10, 11 occur equally
often.
- According to this property:
  - Each compiler option is turned on and off equally
  often.
  - When we drop columns of an OA, the array is still
  an orthogonal array.
10Example
0 0 0 0 0
1 0 0 1 1
0 1 0 1 0
0 0 1 0 1
1 1 0 0 1
1 0 1 1 0
0 1 1 1 1
1 1 1 0 0

Interpreted as Compiler Settings

       O1   O2   O3   O4   O5
Run1   off  off  off  off  off
Run2   on   off  off  on   on
Run3   off  on   off  on   off
Run4   off  off  on   off  on
Run5   on   on   off  off  on
Run6   on   off  on   on   off
Run7   off  on   on   on   on
Run8   on   on   on   off  off
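The pairwise-balance property can be checked mechanically. The sketch below tests the 8 x 5 example array and confirms that the property survives dropping columns; the function name `is_orthogonal` is illustrative.

```python
from collections import Counter
from itertools import combinations

EXAMPLE = [
    [0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
]


def is_orthogonal(array):
    """True if, for every pair of columns, 00, 01, 10 and 11 occur equally often."""
    n = len(array)
    if n % 4 != 0:
        return False
    for a, b in combinations(range(len(array[0])), 2):
        counts = Counter((row[a], row[b]) for row in array)
        if any(counts[p] != n // 4 for p in [(0, 0), (0, 1), (1, 0), (1, 1)]):
            return False
    return True


print(is_orthogonal(EXAMPLE))

# Dropping columns (here: keep O1, O3, O5) preserves the property.
reduced = [[row[0], row[2], row[4]] for row in EXAMPLE]
print(is_orthogonal(reduced))
```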
11Inferential Statistics
- Inferential statistics is used to predict whether
a factor of an experiment has a significant
effect in the presence of other factors.
- Inferential statistics is based on a null
hypothesis and test statistics.
12Null Hypothesis
- The null hypothesis denies the effect of a factor
in an experiment.
  - Compiler option A is not effective in
  optimizing application B.
- The Mann-Whitney test is used to compute the test
statistic to evaluate the likelihood of the null
hypothesis.
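A minimal pure-Python sketch of the test follows, for comparing the run times measured with an option on against those measured with it off. It uses the normal approximation for the two-sided p-value, with no tie or continuity correction, so it is rough for very small samples; the function names and the sample timings are illustrative, not taken from the slides.

```python
import math


def average_ranks(values):
    """1-based ranks of values, with ties receiving the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U of sample x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical timings (seconds) with an option on and off.
on_times = [7.1, 7.3, 7.0, 7.2]
off_times = [9.8, 10.1, 9.9, 10.4]
u, p = mann_whitney(on_times, off_times)
print(u, p)
```

A small p-value is evidence against the null hypothesis that the option has no effect on run time.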
13Iterative Algorithm
- Input: list of compiler options, OA, target
application, input dataset.
- Compile the application according to the compiler
settings from the OA.
- Collect profiling data.
- Apply the Mann-Whitney test to find the significant
options.
- Remove the significant options from the option list.
- Repeat with the new option list.
14Iterative Algorithm (Contd)
- Until
  - All options are set, or
  - No options with a significant effect are detected
  anymore, or
  - The experimental data does not have enough
  variation (low standard deviation) to apply the
  Mann-Whitney test meaningfully.
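The loop and its three stopping conditions can be sketched as follows. Several parts are assumptions rather than the authors' implementation: the design comes from a Sylvester-Hadamard construction (run counts are powers of two, unlike the order-48 array used later in the slides), `measure` and `p_value` are caller-supplied callables, and a significant option is fixed to whichever state has the lower median time.

```python
import statistics


def sylvester_oa(k):
    """Two-level OA with k columns and n runs, n the smallest power of two with n - 1 >= k."""
    n = 2
    while n - 1 < k:
        n *= 2
    # Entry (r, c) is the parity of the bits that r and c have in common.
    return [[bin(r & c).count("1") % 2 for c in range(1, k + 1)] for r in range(n)]


def iterative_tuning(options, measure, p_value, alpha=0.05, min_sd=1e-9):
    """Fix significant options round by round; return (fixed, undecided).

    measure(setting) -> run time for a dict {option: 0 or 1}.
    p_value(on_times, off_times) -> p-value of a two-sample test.
    """
    fixed, remaining = {}, list(options)
    while remaining:                                   # stop: all options are set
        design = sylvester_oa(len(remaining))
        times = [measure({**fixed, **dict(zip(remaining, row))}) for row in design]
        if statistics.pstdev(times) < min_sd:          # stop: not enough variation
            break
        significant = {}
        for j, opt in enumerate(remaining):
            on = [t for row, t in zip(design, times) if row[j] == 1]
            off = [t for row, t in zip(design, times) if row[j] == 0]
            if p_value(on, off) < alpha:
                # Assumption: keep the state with the lower median run time.
                significant[opt] = 1 if statistics.median(on) < statistics.median(off) else 0
        if not significant:                            # stop: nothing significant
            break
        fixed.update(significant)
        remaining = [o for o in remaining if o not in significant]
    return fixed, remaining
```

Any two-sample test that returns a p-value, such as a Mann-Whitney test, can be passed as `p_value`, and `measure` would wrap compiling and timing the target application.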
15Application to GCC
- Compiler version: 3.3.1
- Number of options: 42
- Architecture: Pentium 4 at 2.8 GHz
- Applications: 7 programs from the SPECint 2000
benchmark suite
- Measurement: Unix time command
- Improvement of the configured setting Onew is
reported relative to the setting Obase
- Obase: optimization level O with all
options explicitly turned off
16Case study (parser, SPECint2000)
- 1st iteration of the experiment using the
benchmark Parser
- We use an OA of order 48, which yields 48
compiler settings
- Option 5 is selected.
17Case study (parser, SPECint2000)
- 2nd iteration of the experiment using the
benchmark Parser
- Options 3 and 13 are selected.
18Case study (parser, SPECint2000)
- 3rd iteration of the experiment using the
benchmark Parser
- Options 4, 19, and 33 are selected.
19Overall Results
20Profiling Time
21Different Architectures
- We apply the Mann-Whitney algorithm to two other
architectures, the IA64 dual-core Itanium2 at
1.296 GHz and the SUN SPARC dual-core at 1.28 GHz, to
check the robustness of the approach.
- We apply only the Mann-Whitney test, and use only
5 of the 12 SPECint benchmarks because of compilation
errors.
22Different Architectures
IA64
SPARC
- The performance for the IA64 is comparable to the
performance of O3.
- The results for the SPARC are better.
23Different Input Sets (1)
- Improvements using the resulting settings and the
reference input datasets.
- The improvement is modest on all architectures, but
the resulting settings appear to be applicable to
different datasets.
24Different Input Sets (2)
25Conclusion
- The Mann-Whitney test can be used to achieve a
fully automated method to determine optimal
compiler settings for a single application.
- The resulting compiler settings are applicable to
different input datasets; however, the results are
better when we use the target datasets.
- The same methodology can be applied to code
size reduction.