Title: Partial Method Compilation using Dynamic Profile Information
1. Partial Method Compilation using Dynamic Profile Information
John Whaley, Stanford University. October 17, 2001.
2. Outline
- Background and Overview
- Dynamic Compilation System
- Partial Method Compilation Technique
- Optimizations
- Experimental Results
- Related Work
- Conclusion
3. Dynamic Compilation
- We want code performance comparable to static compilation techniques
- However, we want to avoid long startup delays and slow responsiveness
- The dynamic compiler should be fast AND good
4. Traditional approach
- Interpreter plus optimizing compiler
- Switch from interpreter to optimizing compiler via some heuristic
- Problems
  - Interpreter is too slow! (10x to 100x)
5. Another approach
- Simple compiler plus optimizing compiler (Jalapeño, JUDO, Microsoft)
- Switch from simple to optimizing compiler via some heuristic
- Problems
  - Code from the simple compiler is still too slow! (30% to 100% slower than optimized code)
  - Memory footprint problems (Suganuma et al., OOPSLA '01)
6. Yet another approach
- Multi-level compilation (Jalapeño, HotSpot)
- Use multiple compiled versions to slowly accelerate into optimized execution
- Problems
  - This simply increases the delay before the program runs at full speed!
7. Problem with compilation
- Compilation takes time proportional to the amount of code being compiled
- Many optimizations are superlinear in the size of the code
- Compilation of large amounts of code is the cause of undesirably long compilation times
8. Methods can be large
- All of these techniques operate at method boundaries
- Methods can be large, especially after inlining
- Cutting inlining too much hurts performance considerably (Arnold et al., Dynamo '00)
- Even when being frugal about inlining, methods can still become very large
9. Methods are poor boundaries
- Method boundaries do not correspond very well to the code that would most benefit from optimization
- Even hot methods typically contain some code that is rarely or never executed
10. Example: SpecJVM db

    void read_db(String fn) {
        int n = 0, act = 0, b;
        byte buffer[] = null;
        try {
            FileInputStream sif = new FileInputStream(fn);
            buffer = new byte[n];
            while ((b = sif.read(buffer, act, n - act)) > 0) {   // <-- hot loop
                act = act + b;
            }
            sif.close();
            if (act != n) {
                /* lots of error handling code, rare */
            }
        } catch (IOException ioe) {
            /* lots of error handling code, rare */
        }
    }
11. Example: SpecJVM db
- Same code as above, now with the error-handling paths highlighted: lots of rare code!
12. Hot regions, not methods
- The regions that are important to compile have nothing to do with the method boundaries
- Using a method granularity causes the compiler to waste time optimizing large pieces of code that do not matter
13. Overview of our technique
- Increase the precision of selective compilation to operate at a sub-method granularity
- Collect basic block level profile data for hot methods
- Recompile using the profile data, replacing rare code entry points with branches into the interpreter
14. Overview of our technique
- Takes advantage of the well-known fact that a large amount of code is rarely or never executed
- Simple to understand and implement, yet highly effective
- Beneficial secondary effect of improving optimization opportunities on the common paths
15. Overview of Dynamic Compilation System
16. [diagram of the stage transitions]
Stage 1: interpreted code → (when execution count reaches t1) → Stage 2: compiled code → (when execution count reaches t2) → Stage 3: fully optimized code
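For concreteness, a minimal sketch of counter-driven stage promotion under the thresholds above; the slides only specify the counters and thresholds, so the class and method names here (StageController, onExecute) are illustrative, not the actual system's API:

    import java.util.HashMap;
    import java.util.Map;

    final class StageController {
        static final int T1 = 2000;    // Stage 1 -> Stage 2 threshold (t1)
        static final int T2 = 25000;   // Stage 2 -> Stage 3 threshold (t2)

        enum Stage { INTERPRETED, COMPILED, OPTIMIZED }

        private final Map<String, Integer> counts = new HashMap<>();
        private final Map<String, Stage> stages = new HashMap<>();

        // Called on each method invocation while the method is still
        // being profiled.
        Stage onExecute(String method) {
            int c = counts.merge(method, 1, Integer::sum);
            Stage s = stages.getOrDefault(method, Stage.INTERPRETED);
            if (s == Stage.INTERPRETED && c >= T1) {
                s = Stage.COMPILED;    // Stage 2: compile, with block profiling
            } else if (s == Stage.COMPILED && c >= T2) {
                s = Stage.OPTIMIZED;   // Stage 3: recompile only the hot regions
            }
            stages.put(method, s);
            return s;
        }
    }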
17. Identifying rare code
- Simple technique: any basic block executed during Stage 2 is said to be hot
  - Effectively ignores initialization
- Add instrumentation to the targets of conditional forward branches (sketched below)
- Better techniques exist, but using this one we saw no performance degradation
- Enabling/disabling profiling is handled implicitly by the stage transitions
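A minimal sketch of this coverage-bit profiling, assuming the compiler numbers each method's basic blocks and inserts a call at the target of every conditional forward branch (BlockProfile and touch are illustrative names):

    final class BlockProfile {
        private final boolean[] executed;   // one coverage bit per basic block

        BlockProfile(int numBlocks) {
            executed = new boolean[numBlocks];
        }

        // Instrumentation stub compiled into the target of each conditional
        // forward branch during Stage 2.
        void touch(int blockId) {
            executed[blockId] = true;
        }

        // At the Stage 2 -> Stage 3 transition: blocks never executed are rare.
        boolean isRare(int blockId) {
            return !executed[blockId];
        }
    }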
18. Method-at-a-time strategy
[chart: % of basic blocks vs. execution threshold]
19. Actual basic blocks executed
[chart: % of basic blocks vs. execution threshold]
20. Partial method compilation technique
21. Technique
- Based on the profile data, determine the set of rare blocks
- Use the code coverage information from the first compiled version
22. Technique
- Perform live variable analysis (a sketch of the standard fixpoint follows)
- Determine the set of live variables at rare block entry points
[diagram: live = {x, y, z} at a rare block entry]
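For reference, a sketch of the textbook backward liveness fixpoint that could compute these sets; BasicBlock here is a hypothetical stand-in for the compiler's IR node:

    import java.util.*;

    final class Liveness {
        static final class BasicBlock {
            Set<String> use = new HashSet<>(), def = new HashSet<>();
            List<BasicBlock> succs = new ArrayList<>();
            Set<String> liveIn = new HashSet<>(), liveOut = new HashSet<>();
        }

        // Iterate liveOut[b] = union of liveIn over successors, and
        // liveIn[b] = use[b] + (liveOut[b] - def[b]), until a fixpoint.
        static void solve(List<BasicBlock> blocks) {
            boolean changed = true;
            while (changed) {
                changed = false;
                for (BasicBlock b : blocks) {
                    Set<String> out = new HashSet<>();
                    for (BasicBlock s : b.succs) out.addAll(s.liveIn);
                    Set<String> in = new HashSet<>(out);
                    in.removeAll(b.def);
                    in.addAll(b.use);
                    if (!in.equals(b.liveIn) || !out.equals(b.liveOut)) {
                        b.liveIn = in;
                        b.liveOut = out;
                        changed = true;
                    }
                }
            }
        }
    }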
23. Technique
- Redirect the control flow edges that targeted rare blocks, and remove the rare blocks (sketched below)
[diagram: the redirected edges now transfer to the interpreter]
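A sketch of this CFG rewrite, under the assumption that each rare entry point gets its own transfer stub (so each stub can later carry its own live-variable map); Block and excise are illustrative names, not the system's real IR:

    import java.util.*;

    final class RareBlockExcision {
        static final class Block {
            List<Block> succs = new ArrayList<>();
            boolean rare;
            boolean transferStub;   // branches into the interpreter
        }

        static void excise(List<Block> blocks) {
            Map<Block, Block> stubs = new HashMap<>();
            for (Block b : blocks) {
                if (b.rare) continue;   // about to be removed anyway
                for (int i = 0; i < b.succs.size(); i++) {
                    Block target = b.succs.get(i);
                    if (target.rare) {
                        // One transfer stub per rare entry point.
                        Block stub = stubs.computeIfAbsent(target, t -> {
                            Block s = new Block();
                            s.transferStub = true;
                            return s;
                        });
                        b.succs.set(i, stub);
                    }
                }
            }
            blocks.removeIf(b -> b.rare);   // drop the rare code entirely
            blocks.addAll(stubs.values());
        }
    }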
24. Technique
- Perform compilation normally
- Analyses treat the interpreter transfer point as an unanalyzable method call
25. Technique
- Record a map for each interpreter transfer point (a sketch follows)
- In code generation, generate a map that specifies the location, in registers or memory, of each of the live variables
- Maps are typically < 100 bytes
[diagram: live = {x, y, z}; map: x → sp − 4, y → R1, z → sp − 8]
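A sketch of what such a map could look like as a data structure; TransferPointMap, Location, and the field names are hypothetical:

    import java.util.HashMap;
    import java.util.Map;

    final class TransferPointMap {
        enum Kind { REGISTER, STACK_SLOT }

        static final class Location {
            final Kind kind;
            final int where;   // register number, or byte offset from sp
            Location(Kind kind, int where) {
                this.kind = kind;
                this.where = where;
            }
        }

        final int bytecodeIndex;   // where the interpreter should resume
        final Map<String, Location> live = new HashMap<>();

        TransferPointMap(int bytecodeIndex) {
            this.bytecodeIndex = bytecodeIndex;
        }
    }

The slide's example would then record x → (STACK_SLOT, −4), y → (REGISTER, 1), z → (STACK_SLOT, −8).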
26. Optimizations
27. Partial dead code elimination
- Modified dead code elimination to treat rare blocks specially
- Move computation that is only live on a rare path into the rare block, saving computation in the common case
28. Partial dead code elimination
- Optimistic approach on SSA form
- Mark all instructions that compute essential values, recursively
- Eliminate all non-essential instructions
29. Partial dead code elimination
- Calculate the necessary code, ignoring all rare blocks
- For each rare block, calculate the instructions that are necessary for that rare block, but not necessary in non-rare blocks
- If these instructions are recomputable at the point of the rare block, they can be safely copied there (a sketch of the marking pass follows)
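A simplified sketch of the marking pass over a hypothetical SSA-style IR (Instr and its fields are stand-ins for the real compiler's data structures); the per-rare-block sinking step is summarized in the final comment:

    import java.util.ArrayList;
    import java.util.List;

    final class PartialDCE {
        static final class Instr {
            List<Instr> operands = new ArrayList<>();   // SSA def-use edges
            boolean rare;        // resides in a rare block
            boolean essential;   // has a side effect: store, call, branch, ...
            boolean needed;      // set by the marking pass
        }

        // Pass 1: mark everything transitively needed by essential
        // instructions in NON-rare blocks. Anything left unmarked is dead
        // on the common path.
        static void markCommon(List<Instr> all) {
            for (Instr i : all) {
                if (i.essential && !i.rare) mark(i);
            }
        }

        private static void mark(Instr i) {
            if (i.needed) return;
            i.needed = true;
            for (Instr op : i.operands) mark(op);
        }

        // Pass 2 (per rare block, omitted here): instructions needed only by
        // a rare block, and recomputable at its entry, are copied into that
        // block and removed from the common path.
    }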
30. Partial dead code example

    x = 0;
    if (rare branch 1) {
        ...
        z = x + y;
        ...
    }
    if (rare branch 2) {
        ...
        a = x + z;
        ...
    }
31. Partial dead code example

    if (rare branch 1) {
        x = 0;
        ...
        z = x + y;
        ...
    }
    if (rare branch 2) {
        x = 0;
        ...
        a = x + z;
        ...
    }
32. Pointer and escape analysis
- Treating an entrance to the rare path as a method call is a conservative assumption
- Typically this does not matter, because there are no merges back into the common path
- However, this conservativeness hurts pointer and escape analysis, because a single unanalyzed call kills all information
33. Pointer and escape analysis
- Stack allocate objects that don't escape in the common blocks
- Eliminate synchronization on objects that don't escape the common blocks
- If a branch to a rare block is taken (illustrated below):
  - Copy stack-allocated objects to the heap and update pointers
  - Reapply the eliminated synchronizations
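An illustrative source-level before/after; the real transformation operates on the compiler IR, and scalar replacement stands in here for stack allocation:

    import java.util.ArrayList;
    import java.util.List;

    final class EscapeExample {
        static final List<Object> globalList = new ArrayList<>();

        // Before: p is heap-allocated even though it escapes only on the
        // rare path.
        static int before(int x, int y, boolean rare) {
            int[] p = new int[] { x, y };
            if (rare) globalList.add(p);   // p escapes only here
            return p[0] + p[1];
        }

        // After: on the common path p's fields live in locals (the "stack
        // object"); taking the rare branch first materializes a heap copy
        // and uses it for everything downstream (pointer rewriting). Any
        // eliminated synchronization would be reapplied at the same point.
        static int after(int x, int y, boolean rare) {
            int px = x, py = y;   // scalar-replaced "stack object"
            if (rare) {
                int[] p = new int[] { px, py };   // copy to the heap
                globalList.add(p);
            }
            return px + py;
        }
    }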
34. Copying from stack to heap
[diagram: each stack object is copied to the heap, and pointers are rewritten to refer to the heap copy]
35. Reconstructing interpreter state
- We use a runtime glue routine (sketched below):
  - Construct a set of interpreter stack frames, initialized with their corresponding method and bytecode pointers
  - Iterate through each location pair in the map, and copy the value at each location to its corresponding position in the interpreter stack frame
  - Branch into the interpreter, and continue execution
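A minimal sketch of such a glue routine, reusing the hypothetical TransferPointMap from the earlier sketch; InterpreterFrame and LocationReader model the VM's frame layout and register/stack access, which the slides do not specify:

    import java.util.HashMap;
    import java.util.Map;

    final class DeoptGlue {
        static final class InterpreterFrame {
            final String method;
            final int bytecodeIndex;
            final Map<String, Object> locals = new HashMap<>();
            InterpreterFrame(String method, int bci) {
                this.method = method;
                this.bytecodeIndex = bci;
            }
        }

        interface LocationReader {   // models reading a register or stack slot
            Object read(TransferPointMap.Location loc);
        }

        // Build the interpreter frame for this transfer point and populate
        // it from the compiled code's registers and stack slots via the map.
        static InterpreterFrame buildFrame(String method, TransferPointMap map,
                                           LocationReader regsAndStack) {
            InterpreterFrame f = new InterpreterFrame(method, map.bytecodeIndex);
            for (Map.Entry<String, TransferPointMap.Location> e : map.live.entrySet()) {
                f.locals.put(e.getKey(), regsAndStack.read(e.getValue()));
            }
            return f;   // ...then branch into the interpreter and continue
        }
    }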
36. Experimental Results
37. Experimental Methodology
- Fully implemented in a proprietary system
  - Unfortunately, we cannot publish those numbers!
- Proof-of-concept implementation in the joeq virtual machine (http://joeq.sourceforge.net)
  - Unfortunately, joeq does not perform significant optimizations!
38. Experimental Methodology
- Also implemented as an offline step, using refactored class files
  - Use offline profile information to split methods into hot and cold parts
  - Then rely on the virtual machine's default method-at-a-time strategy
- Provides a reasonable approximation of the effectiveness of this technique
- Can also be used as a standalone optimizer
  - Available under the LGPL as part of the joeq release
39. Experimental Methodology
- IBM JDK 1.3 cx130-20010626 on Red Hat Linux 7.1
- Pentium 3, 600 MHz, 512 MB RAM
- Thresholds: t1 = 2000, t2 = 25000
- Benchmarks: SpecJVM, SwingSet, Linpack, JavaLex, JavaCup
40. Run time improvement
[chart; legend: first bar = original, second bar = PMC, third bar = PMC + my opts; blue = optimized execution]
41. Related Work
- Dynamic techniques
  - Dynamo (Bala et al., PLDI '00)
  - Self (Chambers et al., OOPSLA '91)
  - HotSpot (JVM '01)
  - IBM JDK (Ishizaki et al., OOPSLA '00)
42. Related Work
- Static techniques
  - Trace scheduling (Fisher, 1981)
  - Superblock scheduling (IMPACT compiler)
  - Partial redundancy elimination with cost-benefit analysis (Horspool, 1997)
  - Optimal compilation unit shapes (Bruening, FDDO '00)
  - Profile-guided code placement strategies
43. Conclusion
- The partial method compilation technique is simple to implement, yet very effective
- Compile times are reduced drastically
- Overall run times improved by an average of 10%, and by up to 32%
- The system is available under the LGPL at http://joeq.sourceforge.net