Just-In-Time Java Compilation for the Itanium Processor

1
Just-In-Time Java Compilation for the Itanium Processor
  • Tatiana Shpeisman
  • Guei-Yuan Lueh
  • Ali-Reza Adl-Tabatabai
  • Intel Labs

2
Introduction
  • The Itanium processor is a statically scheduled machine
  • Needs aggressive compiler techniques to extract ILP
  • A Just-In-Time (JIT) compiler must be fast
  • Must consider time and space efficiency of optimizations
  • Balance compilation time with code quality
  • Light-weight compilation techniques
  • Use heuristics for modeling the microarchitecture
  • Leverage semantics and metadata of the JVM

3
Outline
  • Introduction
  • Compiler overview
  • Register allocation
  • Code scheduling
  • Other optimizations
  • Conclusions

4
Compiler Structure
  • Front-end: Prepass, IR construction, Inlining, Global optimizations
  • Back-end: Code Selection, Register Allocation, Predication, Code Scheduling, GC Support, Code Emission

5
Register Allocation
  • Compilation time vs. code quality tradeoff
  • IPF architecture has large register files
  • 128 integer, 128 floating-point, 64 predicate, 8 branch registers
  • Register Stack Engine (RSE) provides 96 stack registers to each procedure
  • Use linear scan register allocation
  • "Linear Scan Register Allocation" by Massimiliano Poletto and Vivek Sarkar
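The slide names linear scan but does not show it. As a rough illustration only, here is a simplified sketch of the general linear scan idea from the cited paper: walk live intervals in order of start point, free registers of expired intervals, and spill the interval that ends furthest away when no register is free. The function name, the dictionary-based interval format, and the register numbering are assumptions made for this sketch, not details of the presenters' compiler.

```python
def linear_scan(intervals, num_regs):
    """Simplified linear scan sketch.

    intervals: dict mapping variable name -> (start, end), inclusive.
    num_regs:  number of available registers (numbered 0..num_regs-1).
    Returns (assignment, spilled): a dict name -> register, and a set of
    spilled variable names.
    """
    free = list(range(num_regs))
    active = []          # list of (end, name), kept sorted by end point
    assignment = {}
    spilled = set()

    # Visit intervals in order of increasing start point.
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts.
        while active and active[0][0] < start:
            _, old = active.pop(0)
            free.append(assignment[old])

        if free:
            assignment[name] = free.pop(0)
            active.append((end, name))
        else:
            last_end, last = active[-1]
            if last_end > end:
                # Spill the active interval that ends last; reuse its register.
                assignment[name] = assignment.pop(last)
                spilled.add(last)
                active.pop()
                active.append((end, name))
            else:
                # The new interval itself ends last, so spill it.
                spilled.add(name)
        active.sort()

    return assignment, spilled


example = {"a": (0, 10), "b": (1, 3), "c": (2, 8), "d": (4, 6)}
regs, spills = linear_scan(example, num_regs=2)
```

With two registers, the long-lived `a` is spilled when `c` arrives, and `d` reuses the register freed by `b`.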

6
Live Range vs. Live Interval
(Figure panels: Live Ranges, Live Intervals)

7
Coalescing Algorithm
  • Coalesce v and t in v = t iff
  • Live interval of t ends at v = t
  • Live interval of t does not intersect with the live range of v
  • Requires one additional reverse pass over IR
  • O(NINST NVAR NBB)
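The two coalescing conditions can be written as a small predicate. This is a sketch under assumed data shapes: the live interval of `t` is a single `(start, end)` pair of instruction indices, and the live range of `v` is a set of instruction indices at which `v` is live (so it may contain holes). Those representations and the function name are illustrative choices, not taken from the slides.

```python
def can_coalesce(copy_index, t_interval, v_live_range):
    """Check the two conditions for coalescing v and t at a copy `v = t`.

    copy_index:   instruction index of the copy (assumed representation).
    t_interval:   (start, end) live interval of t, inclusive.
    v_live_range: set of instruction indices where v is live.
    """
    t_start, t_end = t_interval
    # Condition 1: the live interval of t ends at the copy.
    ends_at_copy = (t_end == copy_index)
    # Condition 2: the live interval of t does not intersect v's live range.
    intersects = any(t_start <= point <= t_end for point in v_live_range)
    return ends_at_copy and not intersects


ok = can_coalesce(5, (2, 5), {6, 7, 8})        # v only live after the copy
overlap = can_coalesce(5, (2, 5), {3, 6, 7})   # v live inside t's interval
t_lives_on = can_coalesce(5, (2, 9), {6, 7})   # t still live after the copy
```

Using a live range (with holes) for `v` rather than a live interval is what lets the second case be distinguished from a `v` that merely has an earlier, already-dead definition.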

8
Coalescing Speedup
9
Code Scheduling
  • Forward cycle-based list scheduling
  • Scheduling unit is extended basic block
  • Middle exits are due to run-time exceptions

(p6,p7) = cmp.eq r35, 0
(p6) br ThrowNullPointerException
r10 = r35 + 16
r11 = ld8 r10

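For readers unfamiliar with the term, forward cycle-based list scheduling can be sketched as follows: advance a cycle counter, and in each cycle issue the instructions whose predecessors have already completed, up to the issue width. The dependency edges, latencies, issue width, and the use of program order as the priority are all assumptions chosen for this toy example; they do not model the real Itanium pipeline or the presenters' scheduler.

```python
def list_schedule(instrs, deps, latency, width):
    """Toy forward cycle-based list scheduler.

    instrs:  instruction names in program order (used as priority).
    deps:    dict name -> list of predecessor names.
    latency: dict name -> cycles until the result is available.
    width:   maximum number of instructions issued per cycle.
    Returns a dict name -> issue cycle.
    """
    issue = {}
    remaining = list(instrs)
    cycle = 0
    while remaining:
        issued_now = []
        for ins in remaining:
            if len(issued_now) == width:
                break
            # Ready when every predecessor was issued in an earlier cycle
            # and its latency has elapsed.
            ready = all(
                p in issue and issue[p] + latency[p] <= cycle
                for p in deps.get(ins, [])
            )
            if ready:
                issued_now.append(ins)
        for ins in issued_now:
            issue[ins] = cycle
        remaining = [i for i in remaining if i not in issue]
        cycle += 1
    return issue


# Loosely modeled on the null-check example: the load waits for both the
# address computation and the branch that guards it (assumed edges).
schedule = list_schedule(
    instrs=["cmp", "br", "add", "ld"],
    deps={"br": ["cmp"], "ld": ["add", "br"]},
    latency={"cmp": 1, "br": 1, "add": 1, "ld": 2},
    width=2,
)
```

In this toy run the address computation is hoisted into the same cycle as the compare, while the load stays behind the guarding branch.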
10
Type-based memory disambiguation
  • Use JVM metadata to disambiguate memory locations
  • Type: integer, floating-point, object reference
  • Kind: object field, array element, virtual table address
  • Field id: putfield 10 vs. putfield 15
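The three criteria above suggest a simple may-alias test: two memory references can only refer to the same location if their type, kind, and (for object fields) field id all match. The `MemRef` record and its string labels are hypothetical names for this sketch; the slides do not show how the compiler represents this metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemRef:
    """Hypothetical summary of JVM metadata for one memory reference."""
    type: str                       # "int", "float", or "ref"
    kind: str                       # "field", "array", or "vtable"
    field_id: Optional[int] = None  # only meaningful when kind == "field"


def may_alias(a: MemRef, b: MemRef) -> bool:
    """Return False only when the metadata proves the locations differ."""
    if a.type != b.type:
        return False
    if a.kind != b.kind:
        return False
    if a.kind == "field" and a.field_id != b.field_id:
        return False
    return True


putfield_10 = MemRef("int", "field", 10)
putfield_15 = MemRef("int", "field", 15)
int_array = MemRef("int", "array")
float_array = MemRef("float", "array")
```

Here `putfield_10` and `putfield_15` are reported as independent, so a scheduler would be free to reorder them, while two references to field 10 must be kept in order.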

11
Type-Based Disambiguation
12
Exception Dependencies
  • Java exceptions are precise
  • Naive approach: exception checks end basic blocks
  • Our approach: an instruction depends on an exception check iff
  • Its destination is live at the exception handler, or
  • It is an exception check for a different exception type, or
  • It is a memory reference that may be guarded by the check
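The three conditions can be read as a predicate that decides whether a dependence edge is needed between an exception check and a later instruction. The `Instr` record, its field names, and the `guarded_by` set are assumptions introduced for this sketch; how the real compiler tracks liveness at handlers and guard relationships is not shown on the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    """Hypothetical instruction record for this sketch."""
    name: str
    dest: Optional[str] = None          # destination register, if any
    is_check: bool = False              # True for exception checks
    exc_type: Optional[str] = None      # exception type raised by a check
    is_mem_ref: bool = False            # True for loads and stores
    guarded_by: frozenset = frozenset() # names of checks that may guard it


def depends_on_check(instr, check, live_at_handler):
    """True if `instr` must stay ordered after the exception check `check`."""
    # 1. Its destination is live at the exception handler.
    if instr.dest is not None and instr.dest in live_at_handler:
        return True
    # 2. It is an exception check for a different exception type.
    if instr.is_check and instr.exc_type != check.exc_type:
        return True
    # 3. It is a memory reference that may be guarded by the check.
    if instr.is_mem_ref and check.name in instr.guarded_by:
        return True
    return False


null_check = Instr("nullchk", is_check=True, exc_type="NullPointerException")
add = Instr("add", dest="r10")
load = Instr("ld8", dest="r11", is_mem_ref=True, guarded_by=frozenset({"nullchk"}))
bounds = Instr("boundschk", is_check=True, exc_type="ArrayIndexOutOfBoundsException")
```

With an empty live set at the handler, `add` is free to move above `null_check`, while `load` and `bounds` are not.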

13
Exception Dependency Example
14
Exception Dependencies
15
IPF Architecture
  • Execution (functional) unit types: M, I, F, B
  • Instruction (syllable) types: M, A, I, F, B, IL
  • Bundles, templates
  • .mii .mi_i .mil .mmi .m_mi .mfi .mmf .mib .mbb .bbb .mmb .mfb
  • Instruction group: no RAW or WAW dependencies, with some exceptions

.mii
r10 = ld r15
r9 = add r8, 1 // stop bit
r16 = shr r9, r32

16
Template Selection
  • Pack instructions into bundles
  • Choose slot for each instruction
  • Insert NOP instructions
  • Assign instructions to functional units
  • Problems
  • Resource oversubscription
  • Inaccurate bypass latencies

17
Algorithm
  • Greedy slot assignment
  • Sort instructions by syllable type
  • M < F < IL < I < A < B

I1: r20 = sxt r14 (I-type)
I2: r21 = movl ADDR (IL-type)
I3: f15 = fadd f10, f11 (F-type)

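A minimal sketch of greedy slot assignment for a single bundle, assuming a very simplified slot model: instructions are sorted in the order given on the slide, then each one takes the first empty slot that accepts its syllable type, and unfilled slots become NOPs. The `ACCEPTS` table, the tuple encoding of templates, and the handling of leftovers are assumptions for illustration; the presenters' actual heuristics are on a slide whose figure was not preserved.

```python
# Sort order from the slide: M < F < IL < I < A < B.
PRIORITY = {"M": 0, "F": 1, "IL": 2, "I": 3, "A": 4, "B": 5}

# Assumed slot model: which syllable types each slot kind accepts.
# A-type instructions may be placed in either an M slot or an I slot.
ACCEPTS = {
    "M": {"M", "A"},
    "I": {"I", "A"},
    "F": {"F"},
    "B": {"B"},
    "IL": {"IL"},
}


def greedy_slots(instrs, template):
    """Place instructions into one bundle.

    instrs:   list of (name, syllable_type) pairs.
    template: tuple of slot kinds, e.g. ("M", "F", "I").
    Returns (slots, leftover): slot contents with "nop" for empty slots,
    and the names of instructions that did not fit.
    """
    slots = [None] * len(template)
    leftover = []
    for name, syllable in sorted(instrs, key=lambda ins: PRIORITY[ins[1]]):
        for pos, kind in enumerate(template):
            if slots[pos] is None and syllable in ACCEPTS[kind]:
                slots[pos] = name
                break
        else:
            leftover.append(name)
    return ["nop" if s is None else s for s in slots], leftover


# The three instructions from the slide, tried against an assumed .mfi bundle.
bundle, rest = greedy_slots(
    [("I1", "I"), ("I2", "IL"), ("I3", "F")], ("M", "F", "I")
)
```

One plausible reading of the ordering is that flexible A-type instructions are placed late so they fill whatever M or I slots remain: in `greedy_slots([("a1", "A"), ("m1", "M")], ("M", "I", "I"))` the M-type instruction claims the M slot first and `a1` lands in an I slot.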
18
Template Selection Heuristics
19
Bypass Latency Accuracy
  • Phase ordering of functional unit assignment
  • Code selection time is too early: underutilizes resources
  • Template selection time is too late: inaccurate scheduling latencies
  • Solution: assign to functional unit during scheduling
  • Assign to M-unit if available, else
  • Assign to I-unit and increment latency
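The last two bullets describe a small decision made while scheduling an instruction that can run on either kind of unit (on IPF these would be A-type instructions; that is a reading of the slide, not something it states). A sketch, assuming the scheduler tracks free unit counts per cycle in a dictionary and that the fallback costs exactly one extra cycle:

```python
def assign_unit(free_units, base_latency):
    """Pick a functional unit for an instruction that can use M or I.

    free_units:   dict with the number of free "M" and "I" units in the
                  current cycle; it is updated in place (assumed bookkeeping).
    base_latency: latency when the instruction runs on an M unit.
    Returns (unit, latency), or None if no unit is free this cycle.
    """
    if free_units.get("M", 0) > 0:
        free_units["M"] -= 1
        return "M", base_latency
    if free_units.get("I", 0) > 0:
        free_units["I"] -= 1
        # Falling back to the I unit: model the longer bypass latency.
        return "I", base_latency + 1
    return None


units = {"M": 1, "I": 1}
first = assign_unit(units, base_latency=1)
second = assign_unit(units, base_latency=1)
third = assign_unit(units, base_latency=1)
```

Because the choice is made inside the scheduler, the extra cycle is visible to every dependent instruction scheduled afterwards.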

20
Modeling of Address Computation Latency
21
Other optimizations
  • Predication
  • Profitability depends on the benchmark
  • Performance variations within 2%
  • Branch hints
  • Up to 50% speedup from using branch hints
  • Sign-extension elimination
  • 1% potential gain for our compiler

22
Conclusions
  • Light-weight optimization techniques for Itanium
  • Considering the microarchitecture is important
  • Cannot ignore bypass latencies
  • Template selection should be resource sensitive
  • Language semantics help to improve ILP
  • Type-based memory disambiguation
  • Exception dependency elimination