Title: Just-In-Time Java Compilation for the Itanium Processor
1. Just-In-Time Java Compilation for the Itanium Processor
- Tatiana Shpeisman
- Guei-Yuan Lueh
- Ali-Reza Adl-Tabatabai
- Intel Labs
2. Introduction
- The Itanium processor is a statically scheduled machine
- Aggressive compiler techniques to extract ILP
- Just-In-Time (JIT) compiler must be fast
- Must consider time and space efficiency of optimizations
- Balance compilation time with code quality
- Light-weight compilation techniques
- Use heuristics for modeling the microarchitecture
- Leverage semantics and metadata of the JVM
3. Outline
- Introduction
- Compiler overview
- Register allocation
- Code scheduling
- Other optimizations
- Conclusions
4. Compiler Structure
- Front-end: Prepass, IR construction, Inlining, Global optimizations
- Back-end: Code Selection, Register Allocation, Predication, Code Scheduling, GC Support, Code Emission
5. Register Allocation
- Compilation time vs. code quality tradeoff
- IPF architecture has large register files
- 128 integer, 128 floating-point, 64 predicate, 8 branch
- Register Stack Engine (RSE) provides 96 stack registers to each procedure
- Use linear scan register allocation
- "Linear Scan Register Allocation" by Massimiliano Poletto and Vivek Sarkar
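The linear scan idea can be sketched in a few lines of Java. This is a minimal illustration in the spirit of the Poletto and Sarkar algorithm, not the compiler's actual allocator: the `Interval` class, the `allocate` method, and the "spill the interval that ends last" policy as coded here are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

final class LinearScan {
    /** Live interval [start, end] of one virtual register (hypothetical representation). */
    static final class Interval {
        final String name;
        final int start, end;
        int reg = -1; // assigned physical register, or -1 if spilled
        Interval(String name, int start, int end) {
            this.name = name; this.start = start; this.end = end;
        }
    }

    /** Assigns registers 0..numRegs-1 in one pass over intervals sorted by start point. */
    static void allocate(List<Interval> intervals, int numRegs) {
        List<Interval> byStart = new ArrayList<>(intervals);
        byStart.sort(Comparator.comparingInt((Interval x) -> x.start));
        List<Interval> active = new ArrayList<>();
        Deque<Integer> free = new ArrayDeque<>();
        for (int r = 0; r < numRegs; r++) free.addLast(r);

        for (Interval cur : byStart) {
            // Expire intervals that ended before the current one starts.
            Iterator<Interval> it = active.iterator();
            while (it.hasNext()) {
                Interval a = it.next();
                if (a.end < cur.start) { free.addFirst(a.reg); it.remove(); }
            }
            if (!free.isEmpty()) {
                cur.reg = free.removeFirst();
                active.add(cur);
            } else {
                // No register left: spill whichever candidate ends last.
                Interval victim = cur;
                for (Interval a : active) if (a.end > victim.end) victim = a;
                if (victim != cur) {
                    cur.reg = victim.reg;
                    victim.reg = -1;
                    active.remove(victim);
                    active.add(cur);
                }
            }
        }
    }
}
```

Each interval is visited once, which is what keeps the allocator cheap enough for a JIT; a graph-coloring allocator would instead build and color an interference graph.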
6. Live Range vs. Live Interval
[Figure: live ranges compared with live intervals]
7. Coalescing Algorithm
- Coalesce v and t in v = t iff
- Live interval of t ends at v = t
- Live interval of t does not intersect with live range of v
- Requires one additional reverse pass over IR
- O(NINST NVAR NBB)
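The two coalescing conditions can be written as a small predicate. The sketch below assumes a hypothetical representation: positions are instruction indices, spans are half-open, the live interval of t is a single span, and the live range of v is a list of spans with holes between them. The reverse pass that would compute the live range is not shown.

```java
import java.util.List;

final class CoalesceCheck {
    /** Half-open span [start, end) of instruction positions (hypothetical representation). */
    static final class Span {
        final int start, end;
        Span(int start, int end) { this.start = start; this.end = end; }
        boolean overlaps(Span o) { return start < o.end && o.start < end; }
    }

    /**
     * Decides whether v and t may share a register for the copy at copyPos.
     * tInterval is the live interval of t; vRange lists the segments of v's live range.
     */
    static boolean canCoalesce(int copyPos, Span tInterval, List<Span> vRange) {
        // Condition 1: the live interval of t ends at the copy.
        if (tInterval.end != copyPos) return false;
        // Condition 2: the live interval of t does not intersect the live range of v.
        for (Span seg : vRange) {
            if (seg.overlaps(tInterval)) return false;
        }
        return true;
    }
}
```

Testing against the live range of v rather than its live interval is what lets a hole in v's lifetime be used: v may be live before t starts and again after the copy without blocking the coalesce.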
8. Coalescing Speedup
9. Code Scheduling
- Forward cycle-based list scheduling
- Scheduling unit is extended basic block
- Middle exits are due to run-time exceptions
  (p6,p7) = cmp.eq r35, 0
  (p6) br ThrowNullPointerException
  r10 = r35 + 16
  r11 = ld8 r10
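A forward, cycle-based list scheduler can be sketched as follows. This is an illustrative version under stated assumptions, not the compiler's scheduler: the `Node` class, the critical-path `height` priority, and the single `issueWidth` resource limit are hypothetical simplifications, and the dependence graph is assumed to be acyclic with latencies of at least one cycle.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ListScheduler {
    static final class Node {
        final String name;
        final int latency;                  // cycles until the result is usable (>= 1)
        final List<Node> succs = new ArrayList<>();
        int preds;                          // unscheduled predecessors
        int earliest;                       // first cycle at which all operands are ready
        int height;                         // critical-path length to the end of the block
        int cycle = -1;                     // issue cycle chosen by the scheduler
        Node(String name, int latency) { this.name = name; this.latency = latency; }
    }

    static void addDep(Node from, Node to) { from.succs.add(to); to.preds++; }

    static int height(Node n) {
        if (n.height == 0) {                // not computed yet (heights are always >= 1)
            int below = 0;
            for (Node s : n.succs) below = Math.max(below, height(s));
            n.height = n.latency + below;
        }
        return n.height;
    }

    /** Fills in Node.cycle for every node; returns the last issue cycle plus one. */
    static int schedule(List<Node> nodes, int issueWidth) {
        List<Node> ready = new ArrayList<>();
        for (Node n : nodes) { height(n); if (n.preds == 0) ready.add(n); }
        int cycle = 0, remaining = nodes.size();
        while (remaining > 0) {
            // Candidates are ready nodes whose operands are available in this cycle.
            List<Node> candidates = new ArrayList<>();
            for (Node n : ready) if (n.earliest <= cycle) candidates.add(n);
            candidates.sort(Comparator.comparingInt((Node x) -> -x.height));
            int issued = 0;
            for (Node n : candidates) {
                if (issued == issueWidth) break;
                n.cycle = cycle;
                issued++;
                remaining--;
                ready.remove(n);
                for (Node s : n.succs) {
                    s.earliest = Math.max(s.earliest, cycle + n.latency);
                    if (--s.preds == 0) ready.add(s);
                }
            }
            cycle++;                        // move forward one cycle at a time
        }
        return cycle;
    }
}
```

A real IPF scheduler would also track per-unit resources and templates instead of a single issue width.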
10. Type-based memory disambiguation
- Use JVM metadata to disambiguate memory locations
- Type
- Integer, floating-point, object reference
- Kind
- Object field, array element, virtual table address
- Field id
- putfield 10 vs. putfield 15
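The three criteria translate into a simple may-alias test. The sketch below is a hypothetical model: the `MemRef` class and its fields are invented for illustration, and the rule relies on Java's type safety, under which a location of one type, kind, or field can never be reached through a reference of another.

```java
final class MemDisambiguation {
    enum Type { INT, FLOAT, REF }
    enum Kind { FIELD, ARRAY_ELEMENT, VTABLE }

    /** Metadata attached to one memory reference (hypothetical representation). */
    static final class MemRef {
        final Type type;
        final Kind kind;
        final int fieldId; // constant-pool field index; only meaningful for Kind.FIELD
        MemRef(Type type, Kind kind, int fieldId) {
            this.type = type; this.kind = kind; this.fieldId = fieldId;
        }
    }

    /** Returns false only when the two references provably touch different locations. */
    static boolean mayAlias(MemRef a, MemRef b) {
        if (a.type != b.type) return false;   // e.g. integer vs. object reference
        if (a.kind != b.kind) return false;   // e.g. object field vs. array element
        if (a.kind == Kind.FIELD && a.fieldId != b.fieldId) return false; // putfield 10 vs. 15
        return true;                          // conservatively assume a dependence
    }
}
```

When `mayAlias` returns false the scheduler can omit the memory dependence edge and reorder the two accesses.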
11. Type-Based Disambiguation
12. Exception Dependencies
- Java exceptions are precise
- Naive approach
- Exception checks end basic blocks
- Our approach
- Instruction depends on exception check iff
- Its destination is live at the exception handler, or
- It is an exception check for a different exception type, or
- It is a memory reference that may be guarded by the check
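The three conditions can be expressed as one predicate that the scheduler evaluates when deciding whether an instruction may move above an exception check. The sketch is hypothetical: the `Inst` fields, the way a check names the object it tests, and the conservative treatment of an unknown base object are assumptions made for the example.

```java
import java.util.Set;

final class ExceptionDeps {
    /** Minimal instruction model (hypothetical). */
    static final class Inst {
        final String dest;          // register written, or null
        final String exceptionType; // exception thrown if this is a check, else null
        final boolean isMemRef;     // true for loads and stores
        final String object;        // object register checked or accessed; null if unknown
        Inst(String dest, String exceptionType, boolean isMemRef, String object) {
            this.dest = dest; this.exceptionType = exceptionType;
            this.isMemRef = isMemRef; this.object = object;
        }
    }

    /** True if inst must stay ordered after check. */
    static boolean dependsOn(Inst inst, Inst check, Set<String> liveAtHandler) {
        // 1. The result would be visible in the handler if the exception were thrown.
        if (inst.dest != null && liveAtHandler.contains(inst.dest)) return true;
        // 2. Checks for different exception types must keep their order (precise exceptions).
        if (inst.exceptionType != null && !inst.exceptionType.equals(check.exceptionType)) return true;
        // 3. A memory reference that the check may guard; an unknown object is treated as guarded.
        if (inst.isMemRef && (inst.object == null || inst.object.equals(check.object))) return true;
        return false;
    }
}
```

Any instruction for which the predicate is false can be scheduled across the check, which is what lets the scheduling unit grow beyond a single basic block.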
13. Exception Dependency Example
14. Exception Dependencies
15. IPF Architecture
- Execution (functional) unit types: M, I, F, B
- Instruction (syllable) types: M, A, I, F, B, IL
- Bundles, templates
- .mii .mi_i .mil .mmi .m_mi .mfi .mmf .mib .mbb .bbb .mmb .mfb
- Instruction group: no RAW or WAW dependencies, with some exceptions
  .mi_i
  r10 = ld r15
  r9 = add r8, 1 // stop bit
  r16 = shr r9, r32
16. Template Selection
- Pack instructions into bundles
- Choose slot for each instruction
- Insert NOP instructions
- Assign instructions to functional units
- Problem
- Resource oversubscription
- Inaccurate bypass latencies
17. Algorithm
- Greedy slot assignment
- Sort instructions by syllable type
- M < F < IL < I < A < B
  I1: r20 = sxt r14 (I-type)
  I2: r21 = movl ADDR (IL-type)
  I3: f15 = fadd f10, f11 (F-type)
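The sort-then-place idea can be sketched in Java as below. This is a guess at the overall shape, not the authors' algorithm: the rule "pick the template that places the most instructions" is an assumption, stop-bit template variants are left out, the internal name `mlx` stands for the slide's `.mil` (an IL instruction fills two slots), and the input instructions are assumed to be mutually independent, as when they were scheduled in the same cycle.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TemplatePacker {
    // Declaration order encodes the sort order M < F < IL < I < A < B.
    enum Syl { M, F, IL, I, A, B }

    static final class Inst {
        final String name; final Syl type;
        Inst(String name, Syl type) { this.name = name; this.type = type; }
    }

    // One letter per slot. "mlx" models the .mil template: 'l' plus the following 'x' hold one IL.
    static final String[] TEMPLATES =
        {"mii", "mlx", "mmi", "mfi", "mmf", "mib", "mbb", "bbb", "mmb", "mfb"};

    static boolean fits(char slot, Syl t) {
        switch (t) {
            case M:  return slot == 'm';
            case F:  return slot == 'f';
            case IL: return slot == 'l';
            case I:  return slot == 'i';
            case A:  return slot == 'm' || slot == 'i'; // A-type runs on an M or I unit
            default: return slot == 'b';
        }
    }

    /** Greedily puts each sorted instruction into the first free compatible slot. */
    static List<Inst> fill(String template, List<Inst> sorted, String[] slots) {
        List<Inst> placed = new ArrayList<>();
        for (Inst in : sorted) {
            for (int s = 0; s < 3; s++) {
                if (slots[s] == null && fits(template.charAt(s), in.type)) {
                    slots[s] = in.name;
                    if (in.type == Syl.IL) slots[s + 1] = in.name + ":x";
                    placed.add(in);
                    break;
                }
            }
        }
        return placed;
    }

    /** Packs instructions into bundles, padding unused slots with nop. */
    static List<String> pack(List<Inst> insts) {
        List<Inst> pending = new ArrayList<>(insts);
        pending.sort(Comparator.comparing((Inst x) -> x.type));
        List<String> bundles = new ArrayList<>();
        while (!pending.isEmpty()) {
            String best = null; String[] bestSlots = null; List<Inst> bestPlaced = null;
            for (String t : TEMPLATES) {
                String[] slots = new String[3];
                List<Inst> placed = fill(t, pending, slots);
                if (bestPlaced == null || placed.size() > bestPlaced.size()) {
                    best = t; bestSlots = slots; bestPlaced = placed;
                }
            }
            pending.removeAll(bestPlaced);
            for (int s = 0; s < 3; s++) if (bestSlots[s] == null) bestSlots[s] = "nop";
            bundles.add("." + best + " " + String.join(", ", bestSlots));
        }
        return bundles;
    }
}
```

For the three instructions above (I1 of type I, I2 of type IL, I3 of type F), this sketch places I3 and I1 in one `.mfi` bundle and I2 in a second `.mlx` bundle, with a nop in each M slot.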
18. Template Selection Heuristics
19. Bypass Latency Accuracy
- Phase ordering of functional unit assignment
- Code selection time is too early: underutilizes resources
- Template selection time is too late: inaccurate scheduling latencies
- Solution: assign to functional unit during scheduling
- Assign to M-unit if available, else
- Assign to I-unit and increment latency
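The assignment rule can be sketched as a small helper the scheduler would call for an instruction that can execute on either unit. The class names, the per-cycle counters, and the one-cycle penalty are assumptions for illustration; the actual bypass penalties depend on the processor and on the consuming instruction.

```java
final class UnitAssigner {
    enum Unit { M, I }

    /** Unit chosen for an instruction and the latency the scheduler should model. */
    static final class Assignment {
        final Unit unit; final int latency;
        Assignment(Unit unit, int latency) { this.unit = unit; this.latency = latency; }
    }

    /** Free functional units in the cycle currently being scheduled (hypothetical model). */
    static final class Cycle {
        int freeM, freeI;
        Cycle(int freeM, int freeI) { this.freeM = freeM; this.freeI = freeI; }

        /** Prefers an M-unit; falls back to an I-unit with one extra cycle of latency. */
        Assignment assign(int baseLatency) {
            if (freeM > 0) { freeM--; return new Assignment(Unit.M, baseLatency); }
            if (freeI > 0) { freeI--; return new Assignment(Unit.I, baseLatency + 1); }
            return null; // no unit free: the instruction waits for a later cycle
        }
    }
}
```

Because the unit is chosen while scheduling, the latency seen by dependent instructions already reflects where the producer will execute, which template selection can then honor.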
20. Modeling of Address Computation Latency
21. Other optimizations
- Predication
- Profitability depends on the benchmark
- Performance variations within 2%
- Branch hints
- Up to 50% speedup from using branch hints
- Sign-extension elimination
- 1% potential gain for our compiler
22. Conclusions
- Light-weight optimization techniques for Itanium
- Considering the microarchitecture is important
- Cannot ignore bypass latencies
- Template selection should be resource sensitive
- Language semantics help to improve ILP
- Type-based memory disambiguation
- Exception dependency elimination