Cross-Architectural Performance Portability of a Java Virtual Machine Implementation

1
Cross-Architectural Performance Portability of a
Java Virtual Machine Implementation
  • Matthias Jacob, Princeton University
  • Keith Randall, Google, Inc.
2
JVM architecture
Java Bytecode
Interpreter
JIT
Native Code
JVM
CPU
4
Compaq FastVM
  • State-of-the-art implementation of JVM on Alpha
  • Real 64-bit implementation
  • Efficient optimization mechanisms
  • Not feedback-based (unlike HotSpot)
  • Can we port the code generator to x86 and
    preserve the performance?

5
Differences: Alpha vs. x86
  • Reduced number of registers
      • 8 registers on x86 versus 31 on Alpha
  • Instructions contain multiple operations
      • A single x86 instruction comprises several Alpha
        instructions
  • Different addressing modes
      • Arithmetic x86 instructions operate on memory
        directly
  • Non-orthogonality of instruction set
      • Different registers require different
        instructions
  • Source registers get overwritten
      • Operand registers are used to store results on x86
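To illustrate the point about x86 instructions operating on memory directly, here is a minimal sketch of how one x86-style "add register to memory" operation expands into separate load, add, and store steps on a load/store architecture such as Alpha. The tuple encoding, the temporary name t0, and the function name are hypothetical and chosen only for this example; they are not FastVM's internal representation.

```python
def expand_memory_add(mem_operand, src_reg, tmp_reg="t0"):
    """Expand an x86-style 'add src_reg into memory' into load/add/store steps.

    The operation names and tuple layout are illustrative assumptions.
    """
    return [
        ("load", tmp_reg, mem_operand),       # tmp <- memory
        ("add", tmp_reg, tmp_reg, src_reg),   # tmp <- tmp + src
        ("store", tmp_reg, mem_operand),      # memory <- tmp
    ]


# One combined operation on x86 becomes three on a load/store machine.
x86_form = [("add_to_memory", "8(%esp)", "eax")]
risc_form = expand_memory_add("8(%esp)", "eax")
```

A code generator ported from Alpha to x86 has to go the other way: recognize such three-step sequences and fold them into one instruction where possible.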

6
Outline
  • Modified Optimizations for x86
      • Register Allocation
      • Instruction Selection
      • Instruction Patching
      • Method Inlining
  • New Optimizations for x86
      • Calling Convention
      • Floating-Point Modes
  • Results
  • Conclusion
  • Conclusion

7
Register Allocation for JIT
  • Traditional optimal register allocation is too
    expensive
      • Graph coloring
  • Use heuristics
      • LMAP structure

8
Register Allocation
  • Java entities: local variables Lx and Java stack
    locations S(y)
  • Assign every Java entity a home location H
  • Temporary location T for intermediate results
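The idea of giving every Java entity a home location can be sketched as follows. This is not the LMAP-based heuristic from the talk, whose details are not given here; it is a simple stand-in that assumes use counts are known and gives the most frequently used entities a register home, with the rest living in stack slots. The entity names, use counts, and register list are made up for the example.

```python
def assign_home_locations(use_counts, h_registers):
    """Give the most-used Java entities a register home; the rest get stack slots.

    use_counts:  dict mapping an entity name (e.g. "L0", "S(1)") to a use count.
    h_registers: list of registers available as home locations.
    """
    # Rank by descending use count; break ties by name so the result is stable.
    ranked = sorted(use_counts, key=lambda e: (-use_counts[e], e))
    homes = {}
    for i, entity in enumerate(ranked):
        if i < len(h_registers):
            homes[entity] = h_registers[i]
        else:
            homes[entity] = f"stack[{i - len(h_registers)}]"
    return homes


homes = assign_home_locations(
    {"L0": 9, "L1": 2, "S(0)": 5, "S(1)": 1},
    ["ebx", "esi"],
)
```

With only two home registers available, L0 and S(0) get registers and the two less-used entities fall back to memory.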

9
Register Allocation
  • Limited number of registers
  • Flexible partitioning H- / T-registers
  • No dedicated registers
  • Thread-local pointer in segment register

10
Register Allocation
  • Instructions limited to certain registers
  • Allocate only subset of registers
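A minimal sketch of allocating from a restricted register subset: on 32-bit x86, a variable shift count must be in cl (the low byte of ecx), the dividend of a divide sits in eax (with edx), and only eax, ebx, ecx, and edx have byte-sized low halves. The operand-kind names and the choice of six allocatable registers (esp and ebp held back for stack management) are assumptions of this example, not FastVM's actual tables.

```python
# Registers the allocator may hand out in this sketch (assumption: esp and
# ebp are reserved for stack management).
ALL_REGS = {"eax", "ebx", "ecx", "edx", "esi", "edi"}

# Operand kinds that may only live in a subset of the registers.
CONSTRAINTS = {
    "shift_count": {"ecx"},                        # variable shifts use cl
    "div_dividend": {"eax"},                       # div/idiv use eax (and edx)
    "byte_operand": {"eax", "ebx", "ecx", "edx"},  # al, bl, cl, dl exist
}


def pick_register(operand_kind, free_regs):
    """Return a free register that is legal for this operand kind, or None."""
    allowed = CONSTRAINTS.get(operand_kind, ALL_REGS)
    candidates = sorted(allowed & set(free_regs))
    return candidates[0] if candidates else None
```

For example, pick_register("byte_operand", {"esi", "edi"}) finds no legal register even though two registers are free, which is the situation the slide describes.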

11
Register Allocation
  • Memory locations as arguments
  • Pick different addressing mode instead of
    allocating register
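Picking a memory addressing mode instead of allocating a register might look like the sketch below: if an entity has no register home, the emitted instruction references its stack slot directly rather than first loading it into a temporary. The helper names, the home and offset tables, and the AT&T-style output strings are assumptions made for illustration.

```python
def operand_for(entity, reg_homes, stack_offsets):
    """Return an operand string: a register if the entity has one, else memory."""
    if entity in reg_homes:
        return f"%{reg_homes[entity]}"
    # No register home: address the stack slot directly.
    return f"{stack_offsets[entity]}(%esp)"


def emit_add(dst_reg, src_entity, reg_homes, stack_offsets):
    """Emit 'dst_reg += src_entity' without a temporary for a memory source."""
    src = operand_for(src_entity, reg_homes, stack_offsets)
    return f"addl {src}, %{dst_reg}"


reg_homes = {"L0": "ebx"}
stack_offsets = {"L1": 8}
from_register = emit_add("eax", "L0", reg_homes, stack_offsets)
from_memory = emit_add("eax", "L1", reg_homes, stack_offsets)
```

The second call produces a single add with a memory operand, so no T-register is consumed for L1.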

12
Register Allocation Speedup
13
Instruction Selection
  • Alpha/RISC
      • ALU operations
      • Memory operations
      • Control operations
  • x86/CISC
      • Instructions can combine ALU, memory, and control
        operations
      • Different addressing modes
      • Limited set of registers per instruction
      • Emulate 64-bit operations
      • Floating-point stack
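Emulating 64-bit operations on a 32-bit machine can be illustrated with addition: add the low halves, then add the high halves plus the carry, which is what an add followed by an add-with-carry does on x86. The sketch below models that arithmetic on plain integers; the function names are made up for the example, and it shows the general technique rather than the code FastVM emits.

```python
MASK32 = 0xFFFFFFFF


def split64(x):
    """Split a 64-bit value into (low, high) 32-bit halves."""
    return x & MASK32, (x >> 32) & MASK32


def join64(lo, hi):
    """Combine (low, high) 32-bit halves into one 64-bit value."""
    return (hi << 32) | lo


def add64(a_lo, a_hi, b_lo, b_hi):
    """64-bit add built from two 32-bit adds with carry propagation."""
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                   # 1 if the low add overflowed
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32    # wraps like 64-bit hardware
    return lo, hi


result = join64(*add64(*split64(0x00000001FFFFFFFF), *split64(1)))
```

Each 64-bit value also occupies two 32-bit registers, which adds to the register pressure noted earlier.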

14
Instruction Patching
  • Patching instructions
  • Class initializers
  • Fix up branches
  • Copying registers
  • Method Inlining
  • Needs to be atomic because of concurrency
  • Alpha: every instruction is 4 bytes
      • A single write instruction is sufficient

15
Instruction Patching on x86
  • Different instruction lengths
      • Patch instructions atomically using
        Compare-and-Exchange
      • Pad with NOPs
  • Difficult to walk back in code for renaming
    registers (as on Alpha)
      • Input registers are often output registers
      • Renaming output registers alone is not sufficient
  • Retargeting by forward-looking heuristic
      • Look for nearest future use of a preferred
        register
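The combination of NOP padding and compare-and-exchange can be sketched as below. The sketch assumes that each patchable instruction sits in its own 8-byte, 8-byte-aligned slot, so that the whole slot can be swapped in one atomic operation; that slot size is an assumption of this example, not something stated on the slide. A Python lock stands in for the hardware's atomic compare-and-exchange.

```python
import threading

NOP = 0x90   # x86 one-byte NOP
SLOT = 8     # assumed size of a patchable slot in bytes
_lock = threading.Lock()  # stand-in for hardware atomicity


def pad_with_nops(instr):
    """Pad a variable-length instruction with NOPs to fill one slot."""
    if len(instr) > SLOT:
        raise ValueError("instruction does not fit in one patch slot")
    return instr + bytes([NOP]) * (SLOT - len(instr))


def compare_and_exchange(code, offset, expected, new):
    """Replace the slot at offset only if it still holds the expected bytes."""
    with _lock:
        if bytes(code[offset:offset + SLOT]) != expected:
            return False
        code[offset:offset + SLOT] = new
        return True


def patch(code, offset, old_instr, new_instr):
    """Swap old_instr for new_instr in an aligned slot; False if already changed."""
    if offset % SLOT != 0:
        raise ValueError("patch slot must be aligned")
    return compare_and_exchange(
        code, offset, pad_with_nops(old_instr), pad_with_nops(new_instr)
    )


old = bytes([0xE8, 0x00, 0x00, 0x00, 0x00])  # call rel32 (5 bytes)
new = bytes([0xE9, 0x10, 0x00, 0x00, 0x00])  # jmp rel32 (5 bytes)
code = bytearray(pad_with_nops(old))
first = patch(code, 0, old, new)
second = patch(code, 0, old, new)  # another thread arriving late
```

The second attempt fails because the slot no longer holds the old bytes, so two threads racing to patch the same site cannot both apply their change.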

16
Method Inlining Speedup
17
Outline
  • Modified Optimizations for x86
      • Register Allocation
      • Instruction Selection
      • Instruction Patching
      • Method Inlining
  • New Optimizations for x86
      • Calling Convention
      • Floating-Point Modes
  • Results
  • Conclusion
  • Conclusion

18
Optimizations for x86
  • Calling Convention on x86
      • Argument passing on stack instead of registers
          • Allocate registers for argument passing
      • Two registers for stack management: frame
        pointer and stack pointer
          • Constant stack frame size
      • Detection of stack overflow is difficult
          • Check at bottom of stack frame in method prolog
      • 8-byte stack operations may be unaligned
          • Align stack frames to 8-byte boundaries

19
Optimized stack frame layout
  • Input arguments
  • Return address
  • Callee-save space
  • Local variables
  • Output stack arguments
  • Callee-save space (4 bytes) (esp points here)
Method prolog
  • subl $24, %esp
  • movl %ebx, (%esp)
Method epilog
  • movl (%esp), %ebx
  • addl $24, %esp
  • ret
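A constant, 8-byte-aligned frame size makes the prolog and epilog easy to generate, as the following sketch shows. It assumes the frame consists of local variables, output stack arguments, and callee-save space, and that ebx is the only callee-saved register to preserve, as in the slide's example; the function names and the exact accounting of what belongs to the frame are assumptions.

```python
def align_up(n, alignment=8):
    """Round n up to the next multiple of alignment (a power of two)."""
    return (n + alignment - 1) & ~(alignment - 1)


def frame_size(local_bytes, out_arg_bytes, callee_save_bytes):
    """Constant frame size, rounded up so frames stay 8-byte aligned."""
    return align_up(local_bytes + out_arg_bytes + callee_save_bytes)


def prolog(size):
    """Allocate the frame, then save ebx at the bottom of it."""
    return [f"subl ${size}, %esp", "movl %ebx, (%esp)"]


def epilog(size):
    """Restore ebx, release the frame, and return."""
    return ["movl (%esp), %ebx", f"addl ${size}, %esp", "ret"]


size = frame_size(local_bytes=10, out_arg_bytes=8, callee_save_bytes=4)
```

Because the size is a compile-time constant, the frame can be addressed through esp alone.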
20
Floating-Point Modes
  • Alpha
      • Floating-point precision is encoded in the
        instruction
  • x86
      • Toggle floating-point precision explicitly
      • Heuristically find default setting
      • Reduce number of toggles
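One way to think about choosing a default precision setting is to count how many mode switches a method would need under each candidate default and pick the cheaper one. The sketch below does exactly that for a straight-line sequence of operations. It assumes two modes, labelled "single" and "double", and assumes the mode must be back at the default when the method ends; the talk does not describe the actual heuristic, so this is only an illustration.

```python
def count_toggles(precisions, default):
    """Count mode switches for a sequence of operations, given a default mode.

    Assumes the mode starts at the default and must be restored at the end.
    """
    toggles = 0
    current = default
    for p in precisions:
        if p != current:
            toggles += 1
            current = p
    if current != default:
        toggles += 1  # switch back before leaving the method
    return toggles


def pick_default(precisions):
    """Choose the default mode that needs the fewest switches."""
    return min(("single", "double"), key=lambda d: count_toggles(precisions, d))


ops = ["double", "double", "single", "double", "double"]
best = pick_default(ops)
```

For this mostly-double sequence, a double default needs two switches around the one single-precision operation, while a single default needs four.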

21
Floating-Point Speedup
22
Overall speedup
23
Results
Average scenario
24
Results
Best-case scenario
25
Conclusion
  • FastVM port to x86 is competitive
  • Fastest JVM implementation on javac and jack
  • Minimal effort on optimizations
  • Pitfalls, but also advantages
  • Instruction selection on x86
  • Generally easier to generate efficient code for
    RISC
  • More architecture-neutral optimizations possible
  • Register allocation