Trace Fragment Selection within Method-based JVMs

1
Trace Fragment Selection within Method-based JVMs
  • Duane Merrill, Kim Hazelwood

VEE '08
2
Overview
  • Would trace fragment dispatch benefit VMs with
    JITs?
  • Fragment-dispatch as a feedback-directed
    optimization
  • Why?
  • Improve VM performance via better instruction
    layout
  • Overview
  • Motivation
  • New scheme for trace selection
  • Viability in JikesRVM
  • Evaluate opportunities for code improvement
  • Evaluate trace selection overhead

3
Traditional VM Adaptive Code Generation
  • Phase 3: More Advanced JIT Compilation
  • Update Class/TOC dispatch tables, perform OSR
  • Phase 2: JIT Method compilation
  • Compilation Shape: Source Method
  • Dispatch Shape: Corresponding MC Code Array
  • Machine Code Trace Fragment
  • Phase 1: Interpreter
  • Compilation Shape: Source Instruction
  • Dispatch Shape: Corresponding MC Instruction(s)
  • Machine Code Trace Fragment

4
SDT/DBI/Embedded VM Adaptive Code Generation
  • Phase 3: More Advanced JIT Compilation
  • Update Class/TOC dispatch tables, perform OSR
  • Phase 2: JIT Method compilation
  • Compilation Shape: Source Method
  • Dispatch Shape: Corresponding MC Code Array
  • Machine Code Trace Fragment
  • Phase 1: Interpreter
  • Compilation Shape: Source Instruction
  • Dispatch Shape: Corresponding MC Instruction(s)
  • Machine Code Trace Fragment

5
Proposed VM Adaptive Code Generation
  • Phase 3: More Advanced JIT Compilation
  • Update Class/TOC dispatch tables, perform OSR
  • Phase 2: JIT Method compilation
  • Compilation Shape: Source Method
  • Dispatch Shape(s): Corresponding MC Code Array
  • Machine Code Trace Fragment
  • Phase 1: Interpreter
  • Compilation Shape: Source Instruction
  • Dispatch Shape: Corresponding MC Instruction(s)
  • Machine Code Trace Fragment

6
Trace Fragment Dispatch
  • Trace
  • A specific sequence of instructions observed at
    runtime
  • Span
  • Branches
  • Procedure calls and returns
  • Potentially arbitrary number of instructions
  • Trace Fragment
  • A finite, linear sequence of machine code
    instructions
  • Single-entry, multiple-exit (viz. superblock)
  • Cached, linked

[Figure: basic blocks of foo() and bar() (A, B, C, D, E, M, N, O, P) beside a trace fragment A, B, D, M, O, P, E with exits to C and to N]
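
The single-entry, multiple-exit shape of a trace fragment can be sketched in a few lines. This is only an illustration: the slide figure does not list the control-flow edges, so the `CFG` table below is an assumption chosen to be consistent with the exits shown ("to C", "to N"), and `side_exits` is a hypothetical helper name.

```python
# Assumed control-flow edges for the foo()/bar() example. The slide figure
# does not spell them out; this graph is a guess that matches its exits.
CFG = {
    "A": ["B", "C"], "B": ["D"], "C": ["D"],
    "D": ["M"],                 # assumed: D calls bar(), which starts at M
    "M": ["N", "O"], "N": ["P"], "O": ["P"],
    "P": ["E"],                 # assumed: bar() returns to E
    "E": ["A"],                 # assumed loop back edge
}

def side_exits(cfg, trace):
    """Return (block, target) pairs where control can leave the fragment."""
    exits = []
    for i, block in enumerate(trace):
        # The last block is treated as linked back to the fragment's own head.
        on_trace_next = trace[(i + 1) % len(trace)]
        for succ in cfg[block]:
            if succ != on_trace_next:
                exits.append((block, succ))
    return exits

print(side_exits(CFG, list("ABDMOPE")))  # [('A', 'C'), ('M', 'N')]
```

The fragment has one entry (its first block) and one exit for every successor that is not the next block on the trace, which is what makes it superblock-like.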
7
Trace Fragment Dispatch: The Good
  • Location, Location, Location
  • Inlining-like
  • Context sensitive
  • Partial
  • Spatial locality provides most of achieved
    speedup
  • Simple, low-cost local optimizations
  • Redundancy elimination
  • Nimbly adjusts to changing behavior
  • Efficient
  • Lots of early-exits? Discard fragment and
    re-trace

[Figure: basic blocks of foo() and bar() (A, B, C, D, E, M, N, O, P) beside a trace fragment A, B, D, M, O, P, E with exits to C and to N]
8
Trace Fragment Dispatch: The Bad
  • Lacks optimization power
  • Data flow analysis
  • Code motion and loop optimizations
  • Code expansion
  • Tail duplication
  • Exponential growth (if all paths maintained
    indefinitely)

[Figure: basic blocks of foo() and bar() (A, B, C, D, E, M, N, O, P) beside a trace fragment A, B, D, M, O, P, E with exits to C and to N]
9
Trace Fragment Dispatch: The Bad (continued)

[Figure: a second trace fragment, C, D, M, O, P, E, with exits to A and to N, joins the fragment A, B, D, M, O, P, E]
10
Trace Fragment Dispatch: The Bad (continued)

[Figure: a third trace fragment, N, P, E, with an exit to A, joins the fragments A, B, D, M, O, P, E and C, D, M, O, P, E]
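
The code expansion caused by tail duplication can be made concrete by counting how many copies of each block the three fragments from the figures hold. This is a rough sketch: block names come from the slides, while `block_copies` is a hypothetical helper and every block is treated as the same size.

```python
from collections import Counter

def block_copies(fragments):
    """Count how many fragments contain a copy of each basic block."""
    return Counter(block for fragment in fragments for block in fragment)

# The three fragments shown across the "The Bad" slides.
fragments = ["ABDMOPE", "CDMOPE", "NPE"]
copies = block_copies(fragments)

print(copies["E"], copies["P"])   # 3 3  (the shared tail is copied into every fragment)
print(sum(copies.values()))       # 16 block copies in the fragment cache
print(len(copies))                # 9 distinct blocks in the original methods
```

Under that equal-size assumption, three fragments already hold 16 block copies for 9 original blocks, which is why discarding fragments rather than keeping every path matters.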
11
Supplement Method Dispatch with Trace Dispatch
  • Why?
  • Improve VM performance via better instruction
    layout
  • Easily-disposable fragments reflect current
    program behavior
  • How?
  • JIT compiler inserts instrumentation into method
    code arrays
  • Monitor potential hot trace headers
  • Record control flow
  • VM runtime assembles and patches trace fragments
  • Blocks scavenged from compiled code arrays
  • Conditionals adjusted for proper fallthroughs
  • Method code arrays patched to transfer control to
    fragments
  • New fragments linked to existing fragments
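
One way to picture "conditionals adjusted for proper fallthroughs" is the sketch below. It is not the JikesRVM implementation: the `Block` record, the condition codes in `INVERT`, and `assemble_fragment` are all hypothetical names introduced for illustration, and real machine code would need far more detail.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical condition codes and their inverses (not taken from the slides).
INVERT = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt"}

@dataclass
class Block:
    name: str
    cond: Optional[str] = None         # None: the block ends without a conditional
    taken: Optional[str] = None        # target when the condition holds
    fallthrough: Optional[str] = None  # next block in the method code array

def assemble_fragment(blocks, path):
    """Lay `path` out linearly as (block, exit_condition, exit_target) tuples."""
    fragment = []
    for i, name in enumerate(path):
        block = blocks[name]
        on_trace_next = path[i + 1] if i + 1 < len(path) else None
        if block.cond is None:
            fragment.append((name, None, None))
        elif block.taken == on_trace_next:
            # The hot successor was the branch target: invert the test so the
            # hot path falls through and the old fallthrough becomes the exit.
            fragment.append((name, INVERT[block.cond], block.fallthrough))
        else:
            fragment.append((name, block.cond, block.taken))
    return fragment

blocks = {
    "A": Block("A", cond="eq", taken="B", fallthrough="C"),
    "B": Block("B", fallthrough="D"),
    "D": Block("D"),
}
print(assemble_fragment(blocks, ["A", "B", "D"]))
# [('A', 'ne', 'C'), ('B', None, None), ('D', None, None)]
```

In this toy layout, block A originally branched to B and fell through to C; inside the fragment the test is inverted so that B follows A directly and C becomes the side exit.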

12
Easy Fragment Management
  • Improved trace selection
  • JIT to identify trace starting locations
  • VM to determine trace stopping locations
  • Friendly encoding of instructions
  • Patch spots built-in
  • Avoid pesky PC-relative jumps (e.g., switch
    statements)
  • Knowledge of language implementation features
  • Calling conventions
  • Stack layout
  • Virtual method dispatch tables

13
Efficient Fragment Management
  • Mixed-mode scheme
  • Execution in both method code arrays and trace
    fragments
  • Share the same register allocation
  • Control flows off-trace into method code arrays
  • Fewer trace fragments
  • Manageable code expansion
  • JVM control is already built into yield points
  • Disposable trace fragments
  • No need to redo expensive analysis as behavior
    changes

14
Our Work: Trace Fragment Selection
  • Develop new trace selection methodology
  • Leverage JIT global analysis, VM runtime
  • Implement trace selection in JikesRVM and
    evaluate viability
  • Do recorded traces indicate room for code
    improvement?
  • Do the traces exhibit good characteristics?
  • Is instrumentation overhead reasonable?

15
Improved Trace Selection: Starting Locations
  • Loop Header Locations
  • Identified by JIT loop analysis
  • More accurate than target of backward branch
    heuristic
  • Early exit blocks
  • Allows trace fragments to be layered
  • Method prologue
  • Catches recursive execution

[Figure: basic blocks of foo() and bar() (A, B, C, D, E, M, N, O, P) beside a trace fragment A, B, D, M, O, P, E with exits to C and to N]
16
Improved Trace Selection: Starting Locations (continued)

[Figure: a second trace fragment, N, P, E, with an exit to A, is layered beside the fragment A, B, D, M, O, P, E]
17
Improved Trace Selection: Starting Locations (continued)

[Figure: basic blocks of foo() (A, B, C, D) beside a trace fragment A, B, D with exits to Epilogue and to C]
18
Improved Trace Selection: Stopping Criteria
  • Cycle: returned to the loop header
  • Abutted: arrived at another loop header
  • Length-limited (unusual): 128 basic blocks encountered
  • Rejoined (unusual): returned to a basic block already in trace
  • Exited (unusual): exited the method without meeting the above
    conditions (identifiable by stack height)

[Figure: basic blocks of foo() and bar() (A, B, C, D, E, M, N, O, P) beside trace fragments A, B, D, M, O, P, E (exits to C and to N) and N, P, E (exit to A)]
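
The five stopping criteria can be sketched as a single recording loop. The limit of 128 blocks and the criterion names come from the slide; everything else is an assumption made for illustration, including the function name `select_trace`, the `(block, stack_depth)` event format, and the order in which the checks are applied.

```python
def select_trace(head, executed, loop_headers, max_blocks=128):
    """Record a trace starting at `head` and say why recording stopped.

    `executed` yields (block, stack_depth) pairs observed after `head`, where
    stack_depth is relative to the frame that contains `head` (0 = same frame,
    negative = that frame has been popped). This event format is hypothetical.
    """
    trace = [head]
    for block, depth in executed:
        if depth < 0:
            return trace, "exited"          # left the method (stack height)
        if block == head:
            return trace, "cycle"           # returned to the loop header
        if block in loop_headers:
            return trace, "abutted"         # arrived at another loop header
        if block in trace:
            return trace, "rejoined"        # block already in the trace
        trace.append(block)
        if len(trace) >= max_blocks:
            return trace, "length-limited"  # 128 basic blocks encountered
    return trace, "incomplete"              # ran out of observations

events = [("B", 0), ("D", 0), ("M", 1), ("O", 1), ("P", 1), ("E", 0), ("A", 0)]
print(select_trace("A", events, loop_headers={"A"}))
# (['A', 'B', 'D', 'M', 'O', 'P', 'E'], 'cycle')
```

The `incomplete` result is not one of the slide's categories; it only covers the case where the sketch runs out of input before any criterion fires.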
20
JIT-Inserted Instrumentation
[Figure: two code layouts, (a) assembly of the original method code-block, with its loop header, and (b) assembly of the code-block to be used for tracing. Labels: Low-fidelity Instrumentation (loop header counters) and High-fidelity Instrumentation (paths through blocks). First listing: A, JUMP_BLOCK, TRACE_HEAD_A, B, C, D, TRACE_HEAD_B, TRAMPOLINE_A, TRAMPOLINE_B. Second listing: A, INSTRUM_A, B, C, D, INSTRUM_B, TRAMPOLINE_A, TRAMPOLINE_B, INSTRUM_C, TRAMPOLINE_C, TRAMPOLINE_D, INSTRUM_D.]
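
Reading the slide labels as two instrumentation levels (cheap counters at loop headers, then full path recording once a header is hot), the hand-off might look like the sketch below. The class name `HeaderProfiler`, the threshold value, and the rule that recording stops at the next loop header are all assumptions made for illustration; the slides do not give them.

```python
from collections import Counter

class HeaderProfiler:
    """Toy model: count loop-header executions, then record one path."""

    def __init__(self, loop_headers, hot_threshold=3):  # threshold is hypothetical
        self.loop_headers = set(loop_headers)
        self.hot_threshold = hot_threshold
        self.counts = Counter()   # low-fidelity data: loop header counters
        self.recording = None     # high-fidelity data: path being recorded
        self.paths = []

    def on_block(self, block):
        if self.recording is not None:
            if block not in self.loop_headers:
                self.recording.append(block)
                return
            # Simplification: any loop header ends the recording.
            self.paths.append(self.recording)
            self.recording = None
        if block in self.loop_headers:
            self.counts[block] += 1
            if self.counts[block] == self.hot_threshold:
                self.recording = [block]  # switch to path recording

profiler = HeaderProfiler(loop_headers={"A"})
for block in "ABDABDABDA":
    profiler.on_block(block)
print(profiler.paths)        # [['A', 'B', 'D']]
print(profiler.counts["A"])  # 4
```

Only the header counter runs until the third visit to A, so the cost of recording every block is paid for a single iteration.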

23
JIT-Inserted Instrumentation
[Figure: same two layouts as on slide 20, except that TRACE_HEAD_A no longer appears in the first listing]

24
Improvement Opportunity
[Figure: blocks A, B, D, E, C, M, N, P, O]
25
Improvement Opportunity
[Figure: blocks A, B, D, E, C, M, N, P, O placed in the virtual address space (1GB), from 5B0480C6 (low) to 9BFE8D1F (high)]
26
Trace Layouts in Address Space (_227_mtrt)
[Figure: traces plotted across the virtual address space (1GB), from 5B0480C6 (low) to 9BFE8D1F (high)]
27
Improvement Opportunity
[Figure: blocks A, B, D, E, C, M, N, P, O, with transitions labeled as gap transitions or fallthrough transitions]
28
Trace Continuity: DaCapo and SpecJVM98 Benchmarks
  • 1/3 of traces necessarily fragmented
    (inter-procedural)
  • Most intra-procedural traces non-contiguous

29
Transitions between basic blocks
  • Appropriate fallthrough block 80% of the time
  • 15% misprediction rate for local control flow.
  • 20% of all transitions could benefit from trace
    fragment dispatch
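
The gap versus fallthrough distinction can be illustrated with a small classifier over a block layout. The addresses and sizes below are made up, and `classify_transitions` is a hypothetical helper; the sketch only assumes that a transition is a fallthrough when the destination block starts exactly where the source block ends.

```python
def classify_transitions(layout, executed):
    """Count fallthrough and gap transitions along an executed block sequence.

    `layout` maps block name -> (start_address, size_in_bytes).
    """
    counts = {"fallthrough": 0, "gap": 0}
    for src, dst in zip(executed, executed[1:]):
        src_start, src_size = layout[src]
        dst_start, _ = layout[dst]
        kind = "fallthrough" if dst_start == src_start + src_size else "gap"
        counts[kind] += 1
    return counts

# Made-up layout: four 16-byte blocks placed back to back.
layout = {"A": (0, 16), "B": (16, 16), "C": (32, 16), "D": (48, 16)}
print(classify_transitions(layout, list("ABDAB")))
# {'fallthrough': 2, 'gap': 2}
```

Here A to B is a fallthrough, while B to D and D to A are gaps; a fragment laid out as A, B, D would turn the B to D gap into a fallthrough.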

30
Trace Characteristics
  • Cycle and abutted traces make up the majority
  • Few length-limited, rejoined traces
  • Surprisingly large number of exited traces
  • Sporadic loops

31
Instrumentation Overhead (Startup)
  • One-iteration tests. (40x)
  • Mixed slowdown results: 7.4% (jython), -6.5%
    (_227_mtrt)
  • Average startup overhead: 1.7%

32
Instrumentation Overhead (Steady State)
  • 40-iteration tests. (8x)
  • Average steady-state overhead: 1.7%

33
Summary
  • Envision trace fragment dispatch as a
    feedback-directed optimization
  • Locality optimizations not addressed by JIT
    compiler
  • Adapt to changing behavior without recompilation
  • More accurate trace selection
  • Enabled by the co-location with the JIT and VM
    runtime
  • Evaluated opportunity and cost
  • 20% of basic block transitions do not use
    sequential fallthrough.
  • 25% of taken branches/calls transfer control flow
    to locations outside the VM page
  • Minimal startup and maintenance overhead for
    trace selection

34
Questions?
35
Improved Trace Selection: Starting Locations
  • Loop Header Locations
  • Identified by JIT loop analysis
  • More accurate than target of backward branch
    heuristic
  • Early exit blocks
  • Allows trace fragments to be layered
  • Method prologue
  • Catches recursive execution

[Figure: basic blocks of foo() (A, B, C, D) beside a trace fragment B, C with an exit to D]
36
Improved Trace Selection: Starting Locations (continued)

[Figure: basic blocks of foo() (A, B, C, D) beside trace fragments B, C (exit to D) and D, A (exit to A)]
37
Normalized Trace Layouts (_227_mtrt)
[Figure: normalized layouts of the recorded traces]