Title: Trace Fragment Selection within Method-based JVMs
1Trace Fragment Selection within Method-based JVMs
- Duane Merrill, Kim Hazelwood
VEE '08
2Overview
- Would trace fragment dispatch benefit VMs with JITs?
- Fragment dispatch as a feedback-directed optimization
- Why?
- Improve VM performance via better instruction layout
- Overview
- Motivation
- New scheme for trace selection
- Viability in JikesRVM
- Evaluate opportunities for code improvement
- Evaluate trace selection overhead
3Traditional VM Adaptive Code Generation
- Phase 3: More Advanced JIT Compilation
- Update Class/TOC dispatch tables, perform OSR
- Phase 2: JIT Method Compilation
- Compilation Shape: Source Method
- Dispatch Shape: Corresponding MC Code Array
- Machine Code Trace Fragment
- Phase 1: Interpreter
- Compilation Shape: Source Instruction
- Dispatch Shape: Corresponding MC Instruction(s)
- Machine Code Trace Fragment
4SDT/DBI/Embedded VM Adaptive Code Generation
- Phase 3: More Advanced JIT Compilation
- Update Class/TOC dispatch tables, perform OSR
- Phase 2: JIT Method Compilation
- Compilation Shape: Source Method
- Dispatch Shape: Corresponding MC Code Array
- Machine Code Trace Fragment
- Phase 1: Interpreter
- Compilation Shape: Source Instruction
- Dispatch Shape: Corresponding MC Instruction(s)
- Machine Code Trace Fragment
5Proposed VM Adaptive Code Generation
- Phase 3: More Advanced JIT Compilation
- Update Class/TOC dispatch tables, perform OSR
- Phase 2: JIT Method Compilation
- Compilation Shape: Source Method
- Dispatch Shape(s): Corresponding MC Code Array
- Machine Code Trace Fragment
- Phase 1: Interpreter
- Compilation Shape: Source Instruction
- Dispatch Shape: Corresponding MC Instruction(s)
- Machine Code Trace Fragment
6Trace Fragment Dispatch
- Trace
- A specific sequence of instructions observed at runtime
- Span
- Branches
- Procedure calls and returns
- Potentially arbitrary number of instructions
- Trace Fragment
- A finite, linear sequence of machine code instructions
- Single-entry, multiple-exit (viz. superblock)
- Cached, linked
[Figure: foo() and bar() as basic blocks A, B, C, D, E and M, N, O, P; trace fragment A B D M O P E with exits to C and to N]
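The single-entry, multiple-exit shape of a trace fragment can be sketched in a few lines of Python. This is only an illustration: the `SUCCESSORS` edges, the `TraceFragment` class, and `build_fragment` are hypothetical names and data, chosen so that the trace A B D M O P E from the figure ends up with side exits to C and N. A branch from the last block back to the fragment's own entry is treated as a loop-back link, not an exit.

```python
from dataclasses import dataclass, field

# Hypothetical control-flow edges for the blocks named on the slide.
# These edges are assumptions, not taken from the original figure.
SUCCESSORS = {
    "A": ["B"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["M"],       # call into bar()
    "M": ["N", "O"],
    "N": ["P"],
    "O": ["P"],
    "P": ["E"],       # return into foo()
    "E": ["A"],
}


@dataclass
class TraceFragment:
    blocks: list                                 # linear block sequence, entered only at blocks[0]
    exits: dict = field(default_factory=dict)    # block -> list of off-trace targets


def build_fragment(trace, successors):
    """Build a single-entry, multiple-exit fragment from a recorded trace."""
    fragment = TraceFragment(blocks=list(trace))
    head = trace[0]
    for i, block in enumerate(trace):
        is_last = i + 1 == len(trace)
        # The on-trace successor is the next block, or the head for the last block.
        on_trace = head if is_last else trace[i + 1]
        off_trace = [s for s in successors[block] if s != on_trace]
        if off_trace:
            fragment.exits[block] = off_trace
    return fragment


fragment = build_fragment(["A", "B", "D", "M", "O", "P", "E"], SUCCESSORS)
print(fragment.exits)
```

Under these assumed edges, the only exits are from B (to C) and from M (to N), which matches the two exit arrows in the figure.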
7Trace Fragment Dispatch: The Good
- Location, Location, Location
- Inlining-like
- Context sensitive
- Partial
- Spatial locality provides most of achieved speedup
- Simple, low-cost local optimizations
- Redundancy elimination
- Nimbly adjusts to changing behavior
- Efficient
- Lots of early exits? Discard fragment and re-trace
[Figure: foo() and bar() as basic blocks A, B, C, D, E and M, N, O, P; trace fragment A B D M O P E with exits to C and to N]
8Trace Fragment Dispatch: The Bad
- Lacks optimization power
- Data flow analysis
- Code motion, loop optimizations
- Code expansion
- Tail duplication
- Exponential growth (if all paths maintained indefinitely)
[Figure: foo() and bar() as basic blocks A, B, C, D, E and M, N, O, P; trace fragment A B D M O P E with exits to C and to N]
9Trace Fragment Dispatch: The Bad
- Lacks optimization power
- Data flow analysis
- Code motion, loop optimizations
- Code expansion
- Tail duplication
- Exponential growth (if all paths maintained indefinitely)
[Figure: foo() and bar() as basic blocks A, B, C, D, E and M, N, O, P; trace fragment A B D M O P E with exits to C and to N; second fragment C D M O P E with exits to A and to N]
10Trace Fragment Dispatch: The Bad
- Lacks optimization power
- Data flow analysis
- Code motion, loop optimizations
- Code expansion
- Tail duplication
- Exponential growth (if all paths maintained indefinitely)
[Figure: foo() and bar() as basic blocks A, B, C, D, E and M, N, O, P; trace fragment A B D M O P E with exits to C and to N; second fragment C D M O P E with exits to A and to N; third fragment N P E with exit to A]
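The cost of tail duplication can be made concrete by counting block copies across the three fragments shown in the figure. The sketch below is illustrative only; `path_count` is a hypothetical helper that assumes a chain of independent two-way branches, which is the simplest case in which keeping every path leads to exponential growth.

```python
from collections import Counter

# The three fragments from the figure, as sequences of basic block names.
fragments = [
    ["A", "B", "D", "M", "O", "P", "E"],
    ["C", "D", "M", "O", "P", "E"],
    ["N", "P", "E"],
]

# Count how many copies of each block exist across all fragments.
copies = Counter(block for frag in fragments for block in frag)
duplicated = {block: n for block, n in copies.items() if n > 1}

total_copies = sum(copies.values())   # blocks emitted into the fragment cache
unique_blocks = len(copies)           # distinct blocks in the original code


def path_count(two_way_branches):
    """Distinct paths through a chain of independent two-way branches (assumed shape)."""
    return 2 ** two_way_branches


print(duplicated)
print(total_copies, unique_blocks)
print(path_count(10))
```

Here 9 distinct blocks become 16 cached copies, with the shared tail (D, M, O, P, E) accounting for all of the duplication.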
11Supplement Method Dispatch with Trace Dispatch
- Why?
- Improve VM performance via better instruction layout
- Easily-disposable fragments reflect current program behavior
- How?
- JIT compiler inserts instrumentation into method code arrays
- Monitor potential hot trace headers
- Record control flow
- VM runtime assembles and patches trace fragments
- Blocks scavenged from compiled code arrays
- Conditionals adjusted for proper fallthroughs
- Method code arrays patched to transfer control to fragments
- New fragments linked to existing fragments
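The step "conditionals adjusted for proper fallthroughs" can be illustrated with a small sketch. It assumes a toy block representation (`Block`, with hypothetical `cond`, `taken`, and `fallthrough` fields) and emits pseudo-assembly strings; it is not the JikesRVM implementation. When the recorded trace followed the taken side of a branch, the condition is inverted so that the on-trace successor becomes the fallthrough and the branch leaves the fragment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    name: str
    cond: Optional[str] = None          # condition of the terminating branch, if any
    taken: Optional[str] = None         # target when the branch is taken
    fallthrough: Optional[str] = None   # next block in the original layout


def assemble_fragment(trace, blocks):
    """Lay out traced blocks linearly so the on-trace path always falls through."""
    lines = []
    for i, name in enumerate(trace):
        block = blocks[name]
        nxt = trace[i + 1] if i + 1 < len(trace) else None
        lines.append(f"{name}:")
        if block.cond is None:
            # Unconditional transfer: a jump is only needed if the target is not next.
            target = block.taken or block.fallthrough
            if target is not None and target != nxt:
                lines.append(f"  jump {target}")
        elif nxt == block.taken:
            # Trace followed the taken side: invert so the trace falls through.
            lines.append(f"  if not ({block.cond}) exit {block.fallthrough}")
        elif nxt == block.fallthrough:
            lines.append(f"  if ({block.cond}) exit {block.taken}")
        else:
            # Last block of the trace: keep both targets explicit.
            lines.append(f"  if ({block.cond}) exit {block.taken}")
            lines.append(f"  jump {block.fallthrough}")
    return lines


blocks = {
    "A": Block("A", fallthrough="B"),
    "B": Block("B", cond="x == 0", taken="D", fallthrough="C"),
    "D": Block("D", taken="A"),
}
fragment_code = assemble_fragment(["A", "B", "D"], blocks)
print("\n".join(fragment_code))
```

In this example block B originally branched to D and fell through to C; in the fragment the test is inverted, D follows B directly, and C becomes an off-trace exit.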
12Easy Fragment Management
- Improved trace selection
- JIT to identify trace starting locations
- VM to determine trace stopping locations
- Friendly encoding of instructions
- Patch spots built-in
- Avoid pesky PC-relative jumps (e.g., switch statements)
- Knowledge of language implementation features
- Calling conventions
- Stack layout
- Virtual method dispatch tables
13Efficient Fragment Management
- Mixed-mode scheme
- Execution in both method code arrays and trace fragments
- Share the same register allocation
- Control flows off-trace into method code arrays
- Fewer trace fragments
- Manageable code expansion
- JVM control is already built into yield points
- Disposable trace fragments
- No need to redo expensive analysis as behavior changes
14Our Work: Trace Fragment Selection
- Develop new trace selection methodology
- Leverage JIT global analysis, VM runtime
- Implement trace selection in JikesRVM and evaluate viability
- Do recorded traces indicate room for code improvement?
- Do the traces exhibit good characteristics?
- Is instrumentation overhead reasonable?
15Improved Trace Selection: Starting Locations
- Loop header locations
- Identified by JIT loop analysis
- More accurate than target-of-backward-branch heuristic
- Early exit blocks
- Allows trace fragments to be layered
- Method prologue
- Catches recursive execution
[Figure: foo() and bar() as basic blocks A, B, C, D, E and M, N, O, P; trace fragment A B D M O P E with exits to C and to N]
16Improved Trace Selection: Starting Locations
- Loop header locations
- Identified by JIT loop analysis
- More accurate than target-of-backward-branch heuristic
- Early exit blocks
- Allows trace fragments to be layered
- Method prologue
- Catches recursive execution
[Figure: foo() and bar() as basic blocks A, B, C, D, E and M, N, O, P; trace fragment A B D M O P E with exits to C and to N; layered fragment N P E with exit to A]
17Improved Trace Selection: Starting Locations
- Loop header locations
- Identified by JIT loop analysis
- More accurate than target-of-backward-branch heuristic
- Early exit blocks
- Allows trace fragments to be layered
- Method prologue
- Catches recursive execution
[Figure: foo() as basic blocks A, B, C, D; trace fragment A B D with exits to Epilogue and to C]
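The three kinds of starting location can be sketched as a set of candidate trace heads with execution counters. Everything named here is an assumption for illustration: the `TraceHeadMonitor` class, its methods, and the `HOT_THRESHOLD` value are hypothetical and do not come from the slides.

```python
from collections import Counter

HOT_THRESHOLD = 50  # hypothetical hotness threshold, not a value from the slides


class TraceHeadMonitor:
    """Counts executions of candidate trace heads and reports when one becomes hot."""

    def __init__(self, loop_headers, prologue):
        # Loop headers come from JIT loop analysis; the prologue catches recursion.
        self.heads = set(loop_headers) | {prologue}
        self.counts = Counter()

    def add_early_exit_target(self, block):
        # Targets of early exits become heads too, so fragments can be layered.
        self.heads.add(block)

    def on_block_entry(self, block):
        """Return True exactly when `block` is a candidate head that just became hot."""
        if block not in self.heads:
            return False
        self.counts[block] += 1
        return self.counts[block] == HOT_THRESHOLD


monitor = TraceHeadMonitor(loop_headers={"A"}, prologue="foo:prologue")
monitor.add_early_exit_target("C")
became_hot = [monitor.on_block_entry("A") for _ in range(HOT_THRESHOLD)]
print(became_hot[-1], monitor.on_block_entry("B"))
```

A block that is not a candidate head, such as B here, is never counted, which keeps monitoring limited to the locations listed above.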
18Improved Trace Selection: Stopping Criteria
- Cycle
- Returned to the loop header
- Abutted
- Arrived at another loop header
- Length limited (unusual)
- 128 basic blocks encountered
- Rejoined (unusual)
- Returned to a basic block already in trace
- Exited (unusual)
- Exited the method without meeting above conditions (identifiable by stack height)
[Figure: foo() and bar() as basic blocks A, B, C, D, E and M, N, O, P; trace fragment A B D M O P E with exits to C and to N; fragment N P E with exit to A]
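The five stopping criteria can be read as an ordered series of checks made before each block is appended to the trace being recorded. The sketch below follows the order of the list above; the function name, its parameters, and the use of a plain integer for stack height are assumptions made for illustration.

```python
MAX_TRACE_BLOCKS = 128  # length limit stated on the slide


def stop_reason(trace, next_block, loop_headers, stack_height, start_height):
    """Return why recording should stop before appending next_block, or None to continue.

    trace         -- blocks recorded so far; trace[0] is the trace head
    loop_headers  -- loop headers identified by JIT loop analysis
    stack_height  -- current stack height (hypothetical integer measure)
    start_height  -- stack height when recording began
    """
    if next_block == trace[0]:
        return "cycle"            # returned to the loop header that started the trace
    if next_block in loop_headers:
        return "abutted"          # arrived at another loop header
    if len(trace) >= MAX_TRACE_BLOCKS:
        return "length-limited"   # 128 basic blocks encountered
    if next_block in trace:
        return "rejoined"         # returned to a block already in the trace
    if stack_height < start_height:
        return "exited"           # left the method without meeting the above
    return None


headers = {"A", "M"}
print(stop_reason(["A", "B", "D"], "A", headers, 1, 1))
print(stop_reason(["A", "B", "D"], "M", headers, 2, 1))
print(stop_reason(["A", "B", "D"], "E", headers, 1, 1))
```

The "exited" check comes last because the slide defines it as leaving the method without any of the other conditions having been met.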
20JIT-Inserted Instrumentation
[Figure: (a) assembly of the original method code-block (loop header); (b) assembly of the code-block to be used for tracing. Low-fidelity instrumentation layout: A, JUMP_BLOCK, TRACE_HEAD_A, B, C, D, TRACE_HEAD_B, TRAMPOLINE_A, TRAMPOLINE_B. High-fidelity instrumentation layout: A, INSTRUM_A, B, C, D, INSTRUM_B, TRAMPOLINE_A, TRAMPOLINE_B, INSTRUM_C, TRAMPOLINE_C, TRAMPOLINE_D, INSTRUM_D. Annotations: loop header counters; paths through blocks.]
21JIT-Inserted Instrumentation
[Figure: (a) assembly of the original method code-block (loop header); (b) assembly of the code-block to be used for tracing. Annotations: loop header counters; paths through blocks.]
23JIT-Inserted Instrumentation
[Figure: (a) assembly of the original method code-block (loop header); (b) assembly of the code-block to be used for tracing. Low-fidelity instrumentation layout: A, JUMP_BLOCK, B, C, D, TRACE_HEAD_B, TRAMPOLINE_A, TRAMPOLINE_B. High-fidelity instrumentation layout: A, INSTRUM_A, B, C, D, INSTRUM_B, TRAMPOLINE_A, TRAMPOLINE_B, INSTRUM_C, TRAMPOLINE_C, TRAMPOLINE_D, INSTRUM_D. Annotations: loop header counters; paths through blocks.]
24Improvement Opportunity
[Figure: basic blocks A, B, D, E, C, M, N, P, O]
25Improvement Opportunity
[Figure: basic blocks A, B, D, E, C, M, N, P, O placed in the virtual address space (1GB), from 5B0480C6 (low) to 9BFE8D1F (high)]
26Trace Layouts in Address Space (_227_mtrt)
[Plot: traces across the virtual address space (1GB), from 5B0480C6 (low) to 9BFE8D1F (high)]
27Improvement Opportunity
[Figure: basic blocks A, B, D, E, C, M, N, P, O, with transitions between them marked as gap transitions or fallthrough transitions]
28Trace Continuity: DaCapo and SpecJVM98 Benchmarks
- 1/3 of traces necessarily fragmented (inter-procedural)
- Most intra-procedural traces non-contiguous
29Transitions between basic blocks
- Appropriate fallthrough block 80% of the time
- 15% misprediction rate for local control flow
- 20% of all transitions could benefit from trace fragment dispatch
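One way to picture these measurements is to classify each block-to-block transition of a recorded trace by the blocks' addresses. The sketch below is an assumption-laden illustration: `BlockLayout`, the addresses, and the 4096-byte `PAGE_SIZE` are hypothetical and not taken from the study. A transition counts as a fallthrough when the destination starts exactly where the source ends, and as a gap otherwise.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed virtual memory page size


@dataclass(frozen=True)
class BlockLayout:
    start: int   # address of the block's first byte
    size: int    # length of the block in bytes

    @property
    def end(self):
        return self.start + self.size


def classify(src, dst):
    """A transition is a fallthrough if dst begins where src ends, else a gap."""
    return "fallthrough" if dst.start == src.end else "gap"


def crosses_page(src, dst):
    """True if control leaves the page holding the last byte of src."""
    return (src.end - 1) // PAGE_SIZE != dst.start // PAGE_SIZE


def fallthrough_ratio(trace, layout):
    """Fraction of consecutive transitions in the trace that are fallthroughs."""
    transitions = list(zip(trace, trace[1:]))
    hits = sum(classify(layout[a], layout[b]) == "fallthrough" for a, b in transitions)
    return hits / len(transitions)


# Hypothetical layout: A, B, C, D are contiguous; M lives on another page.
layout = {
    "A": BlockLayout(0x1000, 16),
    "B": BlockLayout(0x1010, 16),
    "C": BlockLayout(0x1020, 16),
    "D": BlockLayout(0x1030, 16),
    "M": BlockLayout(0x3000, 32),
}
ratio = fallthrough_ratio(["A", "B", "D", "M"], layout)
print(ratio, crosses_page(layout["D"], layout["M"]))
```

In this toy layout only A to B is a fallthrough; B to D skips over C, and D to M is a gap that also leaves the page, which is the kind of transition a relocated trace fragment would turn into sequential code.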
30Trace Characteristics
- Cycle and abutted traces make up the majority
- Few length-limited, rejoined traces
- Surprisingly large number of exited traces
- Sporadic loops
31Instrumentation Overhead (Startup)
- One-iteration tests (40x)
- Mixed slowdown results: 7.4% (jython), -6.5% (_227_mtrt)
- Average startup overhead: 1.7%
32Instrumentation Overhead (Steady State)
- 40-iteration tests (8x)
- Average steady-state overhead: 1.7%
33Summary
- Envision trace fragment dispatch as a feedback-directed optimization
- Locality optimizations not addressed by JIT compiler
- Adapt to changing behavior without recompilation
- More accurate trace selection
- Enabled by the co-location with the JIT and VM runtime
- Evaluated opportunity and cost
- 20% of basic block transitions do not use sequential fallthrough
- 25% of taken branches/calls transfer control flow to locations outside the VM page
- Minimal startup and maintenance overhead for trace selection
34Questions?
35Improved Trace Selection: Starting Locations
- Loop header locations
- Identified by JIT loop analysis
- More accurate than target-of-backward-branch heuristic
- Early exit blocks
- Allows trace fragments to be layered
- Method prologue
- Catches recursive execution
[Figure: foo() as basic blocks A, B, C, D; trace fragment B C with exit to D]
36Improved Trace Selection: Starting Locations
- Loop header locations
- Identified by JIT loop analysis
- More accurate than target-of-backward-branch heuristic
- Early exit blocks
- Allows trace fragments to be layered
- Method prologue
- Catches recursive execution
[Figure: foo() as basic blocks A, B, C, D; trace fragment B C with exit to D; second fragment D A with exit to A]
37Normalized Trace Layouts (_227_mtrt)
[Plot: normalized trace layouts]