Just-In-Time Java Compilation for the Itanium Processor

Provided by: tshp5

Learn more at: https://arcb.csc.ncsu.edu

1
Just-In-Time Java Compilation for the Itanium
Processor

Tatiana Shpeisman
Guei-Yuan Lueh
Ali-Reza Adl-Tabatabai
Intel Labs

2
Introduction

Itanium processor is a statically scheduled machine
Aggressive compiler techniques to extract ILP
Just-In-Time (JIT) compiler must be fast
Must consider time and space efficiency of optimizations
Balance compilation time with code quality
Light-weight compilation techniques
Use heuristics for modeling microarchitecture
Leverage semantics and metadata of JVM

3
Outline

Introduction
Compiler overview
Register allocation
Code scheduling
Other optimizations
Conclusions

4
Compiler Structure
Front-end: Prepass, IR construction, Inlining, Global optimizations
Back-end: Code Selection, Register Allocation, Predication, Code Scheduling, GC Support, Code Emission
5
Register Allocation

Compilation time vs. code quality tradeoff
IPF architecture has large register files
128 integer, 128 floating-point, 64 predicate, 8 branch
Register Stack Engine (RSE) provides 96 stack registers to each procedure
Use linear scan register allocation
"Linear Scan Register Allocation" by Massimiliano Poletto and Vivek Sarkar
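
As a rough illustration of why linear scan is cheap, here is a minimal sketch of the allocation loop in the style of Poletto and Sarkar. It is not the JIT's actual code: the `Interval` class, the `linear_scan` function, and the "spill the interval that ends last" choice are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    name: str
    start: int  # first instruction index where the variable is live
    end: int    # last instruction index where the variable is live

def linear_scan(intervals, num_regs):
    """Assign registers in one pass over intervals sorted by start point.

    Returns (reg_of, spilled): a name -> register map and a list of spilled names.
    """
    free = list(range(num_regs))
    active = []   # intervals that currently hold a register
    reg_of = {}
    spilled = []
    for cur in sorted(intervals, key=lambda iv: iv.start):
        # Expire intervals that ended before the current one starts.
        for old in [iv for iv in active if iv.end < cur.start]:
            active.remove(old)
            free.append(reg_of[old.name])
        if free:
            reg_of[cur.name] = free.pop()
            active.append(cur)
            continue
        # No free register: spill whichever interval ends last.
        victim = max(active, key=lambda iv: iv.end, default=None)
        if victim is not None and victim.end > cur.end:
            reg_of[cur.name] = reg_of.pop(victim.name)
            active.remove(victim)
            active.append(cur)
            spilled.append(victim.name)
        else:
            spilled.append(cur.name)
    return reg_of, spilled

example = [Interval("a", 0, 10), Interval("b", 1, 3),
           Interval("c", 4, 6), Interval("d", 5, 20)]
reg_of, spilled = linear_scan(example, num_regs=2)
```

With two registers, `b` and `c` share a register because their intervals do not overlap, and `d` is spilled because it ends last.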

6
Live Range vs. Live Interval
(Figure: two panels, Live Ranges and Live Intervals)
7
Coalescing Algorithm

Coalesce v and t in v = t iff
Live interval of t ends at v = t
Live interval of t does not intersect with live range of v
Requires one additional reverse pass over IR
O(NINST NVAR NBB)
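
The coalescing test can be sketched as a small predicate. This is an illustration only: the half-open `(start, end)` encoding of intervals, the list-of-segments encoding of a live range, and the name `can_coalesce` are assumptions, not the representation used in the JIT.

```python
def can_coalesce(copy_pos, t_interval, v_range):
    """Decide whether v and t may share a register for the copy `v = t`.

    copy_pos   -- instruction index of the copy
    t_interval -- (start, end) of t's live interval, half-open, so end is
                  the index of t's last use
    v_range    -- list of half-open (start, end) segments where v is live
    """
    t_start, t_end = t_interval
    # Condition 1: the live interval of t ends at the copy.
    if t_end != copy_pos:
        return False
    # Condition 2: no segment of v's live range overlaps t's interval.
    return all(seg_end <= t_start or seg_start >= t_end
               for seg_start, seg_end in v_range)

# t is live over [0, 5) and v only becomes live at the copy (index 5).
ok = can_coalesce(5, (0, 5), [(5, 9)])
# v is also live over [2, 4), inside t's interval, so coalescing is unsafe.
blocked = can_coalesce(5, (0, 5), [(2, 4), (5, 9)])
```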

8
Coalescing Speedup
9
Code Scheduling

Forward cycle-based list scheduling
Scheduling unit is extended basic block
Middle exits are due to run-time exceptions

(p6,p7) = cmp.eq r35, 0
(p6) br ThrowNullPointerException
r10 = r35 + 16
r11 = ld8 r10
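
A forward cycle-based list scheduler can be sketched in a few lines. Everything specific here is an assumption for illustration: the function name `list_schedule`, the use of program order as the priority, the issue width of 3, and the latencies in the example are made up and do not model the real Itanium pipeline.

```python
def list_schedule(insts, deps, latency, width):
    """Forward cycle-based list scheduling.

    insts   -- instruction names in program order (used as priority)
    deps    -- dict: instruction -> list of instructions it depends on
               (assumed acyclic, and every predecessor is in insts)
    latency -- dict: instruction -> cycles before its result can be used
    width   -- maximum number of instructions issued per cycle
    Returns a dict: instruction -> issue cycle.
    """
    cycle_of = {}
    remaining = list(insts)
    cycle = 0
    while remaining:
        issued = 0
        for inst in list(remaining):
            if issued == width:
                break
            # Ready when every predecessor has issued and its latency elapsed.
            ready = all(p in cycle_of and cycle_of[p] + latency[p] <= cycle
                        for p in deps.get(inst, []))
            if ready:
                cycle_of[inst] = cycle
                remaining.remove(inst)
                issued += 1
        cycle += 1
    return cycle_of

# The null-check example from the slide: the load waits for the address
# computation and for the check that guards it.
schedule = list_schedule(
    ["cmp", "br", "add", "ld8"],
    deps={"br": ["cmp"], "ld8": ["add", "br"]},
    latency={"cmp": 0, "br": 0, "add": 1, "ld8": 2},  # illustrative values
    width=3,
)
```
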
10
Type-based memory disambiguation

Use JVM metadata to disambiguate memory locations
Type: integer, floating-point, object reference
Kind: object field, array element, virtual table address
Field id: putfield 10 vs. putfield 15
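
The three attributes suggest a simple alias test: two memory references can only overlap if type, kind, and (for fields) field id all match. The sketch below assumes a hypothetical `MemRef` record and `may_alias` function; the string encodings of type and kind are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class MemRef:
    type: str                       # "int", "float", or "ref"
    kind: str                       # "field", "array", or "vtable"
    field_id: Optional[int] = None  # only meaningful when kind == "field"

def may_alias(a, b):
    """Conservatively report whether two references may touch the same location."""
    if a.type != b.type:
        return False
    if a.kind != b.kind:
        return False
    if a.kind == "field" and a.field_id != b.field_id:
        return False
    return True

# putfield 10 vs. putfield 15: same type and kind, different field ids.
independent = not may_alias(MemRef("int", "field", 10), MemRef("int", "field", 15))
```

When `may_alias` returns False, the scheduler is free to reorder the two memory operations.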

11
Type-Based Disambiguation
12
Exception Dependencies

Java exceptions are precise
Naive approach
Exception checks end basic blocks
Our approach
Instruction depends on exception check iff
Its destination is live at the exception handler, or
It is an exception check for a different exception type, or
It is a memory reference that may be guarded by the check
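
The three conditions translate directly into a predicate. The sketch below is an assumption-laden illustration: the `Inst` and `Check` records, and the idea that each memory reference carries a precomputed `guarded_by` set of checked variables, are hypothetical and not taken from the compiler.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Inst:
    dest: Optional[str] = None            # register written, if any
    check_type: Optional[str] = None      # exception type, if this is a check
    guarded_by: frozenset = frozenset()   # checked variables a memory reference relies on

@dataclass(frozen=True)
class Check:
    exc_type: str               # exception type the check may throw
    checked_var: str            # variable being tested
    live_at_handler: frozenset  # variables live at the exception handler

def depends_on_check(inst, check):
    """True if inst must stay after check to keep exceptions precise."""
    if inst.dest is not None and inst.dest in check.live_at_handler:
        return True
    if inst.check_type is not None and inst.check_type != check.exc_type:
        return True
    if check.checked_var in inst.guarded_by:
        return True
    return False

null_check = Check("NullPointerException", "r35", frozenset({"r40"}))
add = Inst(dest="r10")                                  # free to move above the check
load = Inst(dest="r11", guarded_by=frozenset({"r35"}))  # must stay below it
```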

13
Exception Dependency Example
14
Exception Dependencies
15
IPF Architecture

Execution (functional) unit types: M, I, F, B
Instruction (syllable) types: M, A, I, F, B, IL
Bundles, templates
.mii .mi_i .mil .mmi .m_mi .mfi .mmf .mib .mbb .bbb .mmb .mfb
Instruction group: no RAW or WAW dependencies, with some exceptions

.mi_i
r10 = ld r15
r9 = add r8, 1 // stop bit
r16 = shr r9, r32
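
The instruction-group rule can be illustrated by a pass that decides where stop bits are needed. This is a simplified sketch: `insert_stops` and the `(dest, sources)` encoding are hypothetical, and the architectural exceptions to the rule are ignored.

```python
def insert_stops(insts):
    """Return the indices of instructions that must be followed by a stop bit.

    insts -- list of (dest, sources) pairs in program order; dest may be None.
    A new instruction group starts whenever an instruction reads (RAW) or
    rewrites (WAW) a register written earlier in the current group.
    """
    stops = []
    written = set()  # registers written in the current instruction group
    for i, (dest, sources) in enumerate(insts):
        raw = any(src in written for src in sources)
        waw = dest in written
        if raw or waw:
            stops.append(i - 1)  # close the group after the previous instruction
            written = set()
        if dest is not None:
            written.add(dest)
    return stops

# The bundle from the slide: shr reads r9, which add writes.
bundle = [("r10", ["r15"]), ("r9", ["r8"]), ("r16", ["r9", "r32"])]
stops = insert_stops(bundle)
```

For this bundle the only stop lands after the add (index 1), matching the stop-bit comment in the example.
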
16
Template Selection

Pack instructions into bundles
Choose slot for each instruction
Insert NOP instructions
Assign instructions to functional units
Problem
Resource oversubscription
Inaccurate bypass latencies

17
Algorithm

Greedy slot assignment
Sort instructions by syllable type
M < F < IL < I < A < B

I1: r20 = sxt r14 (I-type)
I2: r21 = movl ADDR (IL-type)
I3: f15 = fadd f10, f11 (F-type)
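
One way to picture the sort-then-place step is the sketch below. It is a simplification under stated assumptions: `fill_template`, the `ACCEPTS` table, and the treatment of the long `il` slot as a single entry are invented here, and dependences between instructions in the same bundle are ignored. One plausible reading of the ordering is that A-type instructions sort after M and I so they fill whatever M or I slots remain.

```python
# Sort order from the slide: M < F < IL < I < A < B.
ORDER = {"M": 0, "F": 1, "IL": 2, "I": 3, "A": 4, "B": 5}

# Which syllable types each slot kind accepts (A-type fits M or I slots).
ACCEPTS = {"m": {"M", "A"}, "f": {"F"}, "i": {"I", "A"},
           "b": {"B"}, "il": {"IL"}}

def sort_by_syllable(insts):
    """insts: list of (name, syllable_type) pairs."""
    return sorted(insts, key=lambda inst: ORDER[inst[1]])

def fill_template(insts, slots):
    """Greedily place sorted instructions into the slots of one template.

    Returns (bundle, leftover): bundle lists the instruction name in each
    slot ("nop" where nothing fit), leftover lists unplaced instructions.
    """
    bundle = [None] * len(slots)
    leftover = []
    for name, syllable in sort_by_syllable(insts):
        for k, slot in enumerate(slots):
            if bundle[k] is None and syllable in ACCEPTS[slot]:
                bundle[k] = name
                break
        else:
            leftover.append(name)
    return [name if name is not None else "nop" for name in bundle], leftover

insts = [("I1", "I"), ("I2", "IL"), ("I3", "F")]
order = [name for name, _ in sort_by_syllable(insts)]
bundle, leftover = fill_template(insts, ["m", "f", "i"])
```

For the three instructions on the slide, the sorted order is I3, I2, I1. Placing them into an `.mfi` template leaves the M slot as a NOP and defers the IL-type `movl` to another bundle.
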
18
Template Selection Heuristics
19
Bypass Latency Accuracy

Phase ordering of functional unit assignment
Code selection time is too early: underutilizes resources
Template selection time is too late: inaccurate scheduling latencies
Solution: assign to functional unit during scheduling
Assign to M-unit if available, else
Assign to I-unit and increment latency
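
The rule at the end of this slide can be sketched as a helper called by the scheduler for each instruction that may run on either unit. The function name `assign_unit`, the per-cycle `free_units` counter, and the one-cycle penalty are assumptions for illustration.

```python
def assign_unit(free_units, base_latency):
    """Pick a functional unit for an instruction that can run on M or I.

    free_units   -- dict with the number of free "M" and "I" units in the
                    current cycle; updated in place
    base_latency -- latency of the instruction when it runs on an M-unit
    Returns (unit, latency), or (None, None) if no unit is free this cycle.
    """
    if free_units["M"] > 0:
        free_units["M"] -= 1
        return "M", base_latency
    if free_units["I"] > 0:
        free_units["I"] -= 1
        # Longer bypass from the I-unit: charge one extra cycle (assumed value).
        return "I", base_latency + 1
    return None, None

free = {"M": 1, "I": 1}
first = assign_unit(free, 1)   # takes the M-unit
second = assign_unit(free, 1)  # falls back to the I-unit with a longer latency
```

Because the choice is made inside the scheduler, the latency it uses for dependent instructions already reflects the unit that was picked.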

20
Modeling of Address Computation Latency
21
Other optimizations

Predication
Profitability depends on the benchmark
Performance variations within 2%
Branch hints
Up to 50% speedup from using branch hints
Sign-extension elimination
1% potential gain for our compiler

22
Conclusions

Light-weight optimization techniques for Itanium
Considering microarchitecture is important
Cannot ignore bypass latencies
Template selection should be resource sensitive
Language semantics helps to improve ILP
Type-based memory disambiguation
Exception dependency elimination
