1
Trace Fragment Selection within Method-based JVMs

Duane Merrill, Kim Hazelwood

VEE '08
2
Overview

Would trace fragment dispatch benefit VMs with JITs?
Fragment dispatch as a feedback-directed optimization
Why?
Improve VM performance via better instruction layout
Overview
Motivation
New scheme for trace selection
Viability in JikesRVM
Evaluate opportunities for code improvement
Evaluate trace selection overhead

3
Traditional VM Adaptive Code Generation

Phase 3: More Advanced JIT Compilation
Update Class/TOC dispatch tables, perform OSR
Phase 2: JIT Method Compilation
Compilation Shape: Source Method
Dispatch Shape: Corresponding MC Code Array
Machine Code Trace Fragment
Phase 1: Interpreter
Compilation Shape: Source Instruction
Dispatch Shape: Corresponding MC Instruction(s)
Machine Code Trace Fragment

4
SDT/DBI/Embedded VM Adaptive Code Generation

Phase 3: More Advanced JIT Compilation
Update Class/TOC dispatch tables, perform OSR
Phase 2: JIT Method Compilation
Compilation Shape: Source Method
Dispatch Shape: Corresponding MC Code Array
Machine Code Trace Fragment
Phase 1: Interpreter
Compilation Shape: Source Instruction
Dispatch Shape: Corresponding MC Instruction(s)
Machine Code Trace Fragment

5
Proposed VM Adaptive Code Generation

Phase 3: More Advanced JIT Compilation
Update Class/TOC dispatch tables, perform OSR
Phase 2: JIT Method Compilation
Compilation Shape: Source Method
Dispatch Shape(s): Corresponding MC Code Array
Machine Code Trace Fragment
Phase 1: Interpreter
Compilation Shape: Source Instruction
Dispatch Shape: Corresponding MC Instruction(s)
Machine Code Trace Fragment

6
Trace Fragment Dispatch

Trace
A specific sequence of instructions observed at runtime
Can span branches and procedure calls and returns
Potentially an arbitrary number of instructions
Trace Fragment
A finite, linear sequence of machine code instructions
Single-entry, multiple-exit (viz. superblock)
Cached, linked

[Figure: foo() and bar() over blocks A, B, C, D, E, M, N, O, P, beside the trace fragment A, B, D, M, O, P, E with exits to C and to N]
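As a minimal sketch of the single-entry, multiple-exit definition above (not the authors' implementation), a fragment can be modeled as the recorded block sequence plus one side exit for every successor that leaves the recorded path. The control-flow graph is an assumption chosen to be consistent with the figure's block names; `CFG` and `build_fragment` are hypothetical names. The sketch also assumes that a final block branching back to the fragment's own head closes a cycle rather than forming an exit.

```python
# Hypothetical CFG consistent with the figure: foo() calls bar() from block D.
CFG = {
    "A": ["B"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["M"],        # call into bar()
    "M": ["N", "O"],
    "N": ["P"],
    "O": ["P"],
    "P": ["E"],        # return to foo()
    "E": ["A"],
}


def build_fragment(trace, cfg):
    """Return (blocks, exits) for a single-entry, multiple-exit fragment.

    Every successor of a trace block that is not the next block on the
    recorded path becomes a side exit back to non-trace code. The block
    after the last one is taken to be the fragment head (assumed cycle).
    """
    exits = []
    for i, block in enumerate(trace):
        on_trace = trace[(i + 1) % len(trace)]
        for succ in cfg[block]:
            if succ != on_trace:
                exits.append((block, succ))
    return list(trace), exits


blocks, exits = build_fragment(["A", "B", "D", "M", "O", "P", "E"], CFG)
print(blocks)
print(exits)  # side exits B -> C and M -> N
```

Under this assumed graph the recorded path A, B, D, M, O, P, E yields exactly the two exits drawn in the figure, to C and to N.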
7
Trace Fragment Dispatch: The Good

Location, Location, Location
Inlining-like
Context sensitive
Partial
Spatial locality provides most of the achieved speedup
Simple, low-cost local optimizations
Redundancy elimination
Nimbly adjusts to changing behavior
Efficient
Lots of early exits? Discard the fragment and re-trace

[Figure: foo() and bar() over blocks A, B, C, D, E, M, N, O, P, beside the trace fragment A, B, D, M, O, P, E with exits to C and to N]
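The "discard and re-trace" policy can be sketched as a per-fragment counter pair. This is only an illustration: the class name `FragmentProfile` and both thresholds are hypothetical, and the slides do not say how the decision is actually made.

```python
class FragmentProfile:
    """Tracks how often a fragment is left through a side exit."""

    def __init__(self, max_exit_ratio=0.5, min_entries=100):
        # Both thresholds are hypothetical tuning knobs.
        self.max_exit_ratio = max_exit_ratio
        self.min_entries = min_entries
        self.entries = 0
        self.early_exits = 0

    def record(self, exited_early):
        """Record one execution of the fragment."""
        self.entries += 1
        if exited_early:
            self.early_exits += 1

    def should_discard(self):
        """True once enough samples show the recorded path is no longer hot."""
        if self.entries < self.min_entries:
            return False
        return self.early_exits / self.entries > self.max_exit_ratio


profile = FragmentProfile(max_exit_ratio=0.5, min_entries=4)
for exited_early in (True, True, True, False):
    profile.record(exited_early)
print(profile.should_discard())
```

Because fragments are cheap to rebuild, a policy like this can afford to be aggressive: a wrong discard costs only one more round of trace recording.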
8
Trace Fragment Dispatch: The Bad

Lacks optimization power
Data flow analysis
Code motion, loop optimizations
Code expansion
Tail duplication
Exponential growth (if all paths maintained indefinitely)

[Figure: foo() and bar() over blocks A, B, C, D, E, M, N, O, P, beside the trace fragment A, B, D, M, O, P, E with exits to C and to N]
9
Trace Fragment Dispatch: The Bad

[Figure: the same example with a second fragment C, D, M, O, P, E added, with exits to A and to N]
10
Trace Fragment Dispatch: The Bad

[Figure: the same example with a third fragment N, P, E added, with an exit to A]
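To illustrate the growth claim under a simplifying assumption: if a loop body is a chain of k independent two-way branches (diamonds), there are 2^k distinct paths, and keeping one fragment per path duplicates the shared blocks into every fragment. The block naming and helper names below are hypothetical.

```python
from itertools import product


def enumerate_paths(k):
    """All block sequences through k chained diamonds H_i -> (L_i | R_i)."""
    paths = []
    for choice in product("LR", repeat=k):
        path = []
        for i, side in enumerate(choice):
            path += [f"H{i}", f"{side}{i}"]
        paths.append(path)
    return paths


def expansion(k):
    """Return (original block count, blocks emitted across all fragments)."""
    original = 3 * k                      # H_i, L_i, R_i per diamond
    emitted = sum(len(p) for p in enumerate_paths(k))
    return original, emitted


for k in (1, 2, 4, 8):
    print(k, expansion(k))
```

The emitted count is 2^k paths times 2k blocks each, so eight diamonds turn 24 original blocks into 4096 emitted ones if every path is kept indefinitely.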
11
Supplement Method Dispatch with Trace Dispatch

Why?
Improve VM performance via better instruction layout
Easily disposable fragments reflect current program behavior
How?
JIT compiler inserts instrumentation into method code arrays
Monitor potential hot trace headers
Record control flow
VM runtime assembles and patches trace fragments
Blocks scavenged from compiled code arrays
Conditionals adjusted for proper fallthroughs
Method code arrays patched to transfer control to fragments
New fragments linked to existing fragments
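The "conditionals adjusted for proper fallthroughs" step can be sketched as follows. When a block is copied into a fragment, its on-trace successor should become the fallthrough; if the recorded path followed the taken side of a conditional, the condition is inverted and the off-trace side becomes a branch to an exit stub. The `Block` type, the condition mnemonics, and the `exit_to_` stub names are all hypothetical; JikesRVM's real machine code representation is not shown in the slides.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical condition codes and their inverses.
INVERSE = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt", "gt": "le", "le": "gt"}


@dataclass
class Block:
    name: str
    cond: Optional[str] = None         # None means no conditional branch
    taken: Optional[str] = None        # branch target when cond holds
    fallthrough: Optional[str] = None  # next block in the method code array


def layout_fragment(trace, blocks):
    """Emit pseudo-instructions so each on-trace successor is the fallthrough.

    A real implementation would also terminate the fragment with a jump;
    that is omitted here.
    """
    out = []
    for i, name in enumerate(trace):
        blk = blocks[name]
        nxt = trace[i + 1] if i + 1 < len(trace) else None
        out.append(f"{name}:")
        if blk.cond is None:
            continue
        if nxt == blk.taken:
            # On-trace path was the taken side: invert so it falls through.
            out.append(f"  b{INVERSE[blk.cond]} exit_to_{blk.fallthrough}")
        else:
            out.append(f"  b{blk.cond} exit_to_{blk.taken}")
    return out


blocks = {
    "A": Block("A", fallthrough="B"),
    "B": Block("B", cond="eq", taken="D", fallthrough="C"),
    "C": Block("C"),
    "D": Block("D"),
}
for line in layout_fragment(["A", "B", "D"], blocks):
    print(line)
```

In the method code array, B falls through to C and branches to D; in the fragment the roles swap, so the hot path A, B, D runs without a taken branch.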

12
Easy Fragment Management

Improved trace selection
JIT to identify trace starting locations
VM to determine trace stopping locations
Friendly encoding of instructions
Patch spots built-in
Avoid pesky PC-relative jumps (e.g., switch statements)
Knowledge of language implementation features
Calling conventions
Stack layout
Virtual method dispatch tables

13
Efficient Fragment Management

Mixed-mode scheme
Execution in both method code arrays and trace fragments
Share the same register allocation
Control flows off-trace into method code arrays
Fewer trace fragments
Manageable code expansion
JVM control is already built into yield points
Disposable trace fragments
No need to redo expensive analysis as behavior changes

14
Our Work: Trace Fragment Selection

Develop new trace selection methodology
Leverage JIT global analysis, VM runtime
Implement trace selection in JikesRVM and evaluate viability
Do recorded traces indicate room for code improvement?
Do the traces exhibit good characteristics?
Is instrumentation overhead reasonable?

15
Improved Trace Selection: Starting Locations

Loop header locations
Identified by JIT loop analysis
More accurate than the target-of-backward-branch heuristic
Early exit blocks
Allows trace fragments to be layered
Method prologue
Catches recursive execution

[Figure: foo() and bar() over blocks A, B, C, D, E, M, N, O, P, beside the trace fragment A, B, D, M, O, P, E with exits to C and to N]
16
Improved Trace Selection: Starting Locations

[Figure: the same example with a second fragment N, P, E added, with an exit to A]
17
Improved Trace Selection: Starting Locations

[Figure: foo() with blocks A, B, C, D, beside the fragment A, B, D with exits to the epilogue and to C]
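The difference between loop analysis and the target-of-backward-branch heuristic can be sketched on a small graph. Here loop headers are found as targets of back edges (an edge whose target dominates its source), which is one standard form of loop analysis and not necessarily the one the JikesRVM JIT uses. The example graph and its address layout are hypothetical, chosen so that block C sits below block B in memory without being a loop header.

```python
def dominators(cfg, entry):
    """Iterative dominator sets: dom[n] = {n} | intersection of dom[preds]."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def loop_headers(cfg, entry):
    """Targets of back edges: edges src -> dst where dst dominates src."""
    dom = dominators(cfg, entry)
    return {dst for src, succs in cfg.items() for dst in succs if dst in dom[src]}


def backward_branch_targets(cfg, address):
    """Heuristic: any branch target at a lower (or equal) address."""
    return {dst for src, succs in cfg.items() for dst in succs
            if address[dst] <= address[src]}


# Loop B -> C -> D -> B, with C laid out before B in memory.
cfg = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["B", "E"], "E": []}
address = {"A": 0, "C": 1, "B": 2, "D": 3, "E": 4}

print(loop_headers(cfg, "A"))
print(backward_branch_targets(cfg, address))
```

On this graph the dominator-based analysis reports only B, while the address heuristic also flags C, which would waste a trace head on a block that never starts a loop iteration.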
18
Improved Trace Selection: Stopping Criteria

Cycle
Returned to the loop header
Abutted
Arrived at another loop header
Length-limited (unusual)
128 basic blocks encountered
Rejoined (unusual)
Returned to a basic block already in the trace
Exited (unusual)
Exited the method without meeting the above conditions (identifiable by stack height)

[Figure: foo() and bar() example with fragments A, B, D, M, O, P, E (exits to C and to N) and N, P, E (exit to A)]
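The five stopping criteria can be sketched as a single recording loop. The event format (a block name paired with a stack depth relative to the frame containing the trace head), the function name `record_trace`, and the order in which the checks are applied are assumptions for illustration; only the criteria themselves and the 128-block limit come from the slide.

```python
MAX_BLOCKS = 128  # length limit stated on the slide


def record_trace(head, events, loop_headers, max_blocks=MAX_BLOCKS):
    """Consume (block, stack_depth) events after `head` until a stop criterion.

    Returns (blocks_in_trace, reason). Depth is relative to the frame that
    contains `head`; a negative depth means that frame has returned.
    """
    trace = [head]
    for block, depth in events:
        if depth < 0:
            return trace, "exited"
        if block == head:
            return trace, "cycle"
        if block in loop_headers:
            return trace, "abutted"
        if block in trace:
            return trace, "rejoined"
        trace.append(block)
        if len(trace) >= max_blocks:
            return trace, "length-limited"
    return trace, "incomplete"


events = [("B", 0), ("D", 0), ("M", 1), ("O", 1), ("P", 1), ("E", 0), ("A", 0)]
print(record_trace("A", events, loop_headers={"A"}))
```

With the figure's path, recording starts at A, follows the call into bar() at depth 1, and stops with reason "cycle" when control returns to A.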
20
JIT-Inserted Instrumentation
(a) Assembly of original method code-block (loop header)
(b) Assembly of code-block to be used for tracing
Low-fidelity instrumentation: loop header counters
High-fidelity instrumentation: paths through blocks
[Figure: low-fidelity layout with blocks A, B, C, D plus JUMP_BLOCK, TRACE_HEAD_A, TRACE_HEAD_B, TRAMPOLINE_A, TRAMPOLINE_B; high-fidelity layout with blocks A, B, C, D plus INSTRUM_A through INSTRUM_D and TRAMPOLINE_A through TRAMPOLINE_D]
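A minimal sketch of how the two instrumentation levels might cooperate, assuming that low-fidelity instrumentation only counts loop header executions and that a hot header then switches to high-fidelity path recording. The `TraceHead` class and the threshold value are hypothetical; the slides do not state a threshold.

```python
HOT_THRESHOLD = 50  # hypothetical value; the slides do not give one


class TraceHead:
    """Per-loop-header state: count cheaply, then switch to path recording."""

    def __init__(self, name, threshold=HOT_THRESHOLD):
        self.name = name
        self.threshold = threshold
        self.count = 0
        self.mode = "low-fidelity"

    def on_entry(self):
        """Called each time control reaches the loop header."""
        if self.mode == "low-fidelity":
            self.count += 1
            if self.count >= self.threshold:
                self.mode = "high-fidelity"   # start recording block paths
        return self.mode


head = TraceHead("A", threshold=3)
modes = [head.on_entry() for _ in range(4)]
print(modes)
```

Keeping the common case to a single counter increment is what makes it plausible to leave this instrumentation in compiled code permanently.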

23
JIT-Inserted Instrumentation
[Figure: the same two layouts, with TRACE_HEAD_A no longer present in the low-fidelity layout]

24
Improvement Opportunity
[Figure: basic blocks A, B, D, E, C, M, N, P, O]
25
Improvement Opportunity
[Figure: basic blocks A, B, D, E, C, M, N, P, O placed in a 1 GB virtual address space, from 5B0480C6 (low) to 9BFE8D1F (high)]
26
Trace Layouts in Address Space (_227_mtrt)
[Figure: traces plotted across the 1 GB virtual address space, from 5B0480C6 (low) to 9BFE8D1F (high)]
27
Improvement Opportunity
[Figure: basic blocks A, B, D, E, C, M, N, P, O with gap transitions and fallthrough transitions marked]
28
Trace Continuity: DaCapo and SPECjvm98 Benchmarks

1/3 of traces necessarily fragmented (inter-procedural)
Most intra-procedural traces non-contiguous

29
Transitions between basic blocks

Appropriate fallthrough block 80% of the time
15% misprediction rate for local control flow
20% of all transitions could benefit from trace fragment dispatch
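The fallthrough measurement can be sketched as a pass over an executed block sequence. A transition counts as a fallthrough when the next block starts at the address where the current one ends, and as a gap otherwise. The layout, the block sequence, and the function name are hypothetical and unrelated to the benchmark data.

```python
def classify_transitions(executed, layout):
    """Count fallthrough vs. gap transitions in an executed block sequence.

    `layout` maps block -> (start, end) addresses, end exclusive. A transition
    is a fallthrough when the next block starts where the current one ends.
    """
    counts = {"fallthrough": 0, "gap": 0}
    for cur, nxt in zip(executed, executed[1:]):
        kind = "fallthrough" if layout[nxt][0] == layout[cur][1] else "gap"
        counts[kind] += 1
    return counts


layout = {"A": (0, 10), "B": (10, 20), "C": (20, 30), "D": (30, 40)}
counts = classify_transitions(["A", "B", "D", "A", "B", "C"], layout)
print(counts)
```

In this toy run three of five transitions are fallthroughs; the gap transitions are the ones a trace fragment could turn into sequential execution.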

30
Trace Characteristics

Cycle and abutted traces make up the majority
Few length-limited, rejoined traces
Surprisingly large number of exited traces
Sporadic loops

31
Instrumentation Overhead (Startup)

One-iteration tests (40x)
Mixed slowdown results: 7.4% (jython), -6.5% (_227_mtrt)
Average startup overhead: 1.7%

32
Instrumentation Overhead (Steady State)

40-iteration tests (8x)
Average steady-state overhead: 1.7%

33
Summary

Envision trace fragment dispatch as a feedback-directed optimization
Locality optimizations not addressed by JIT compiler
Adapt to changing behavior without recompilation
More accurate trace selection
Enabled by the co-location with the JIT and VM runtime
Evaluated opportunity and cost
20% of basic block transitions do not use sequential fallthrough
25% of taken branches/calls transfer control flow to locations outside the VM page
Minimal startup and maintenance overhead for trace selection