Titanium: A High Performance Java-Based Language

1
Titanium: A High Performance Java-Based Language
Katherine Yelick, Alex Aiken, Phillip Colella,
David Gay, Susan Graham, Paul Hilfinger, Arvind
Krishnamurthy, Ben Liblit, Carleton Miyamoto,
Geoff Pike, Luigi Semenzato,
2
Talk Outline
  • Motivation
  • Extensions for uniprocessor performance
  • Extensions for parallelism
  • A framework for domain-specific languages
  • Status and performance

3
Programming Challenges on Millennium
  • Large scale computations
  • Optimized simulation algorithms are complex
  • Use of hierarchical parallel machine
  • Cost-conscious programming

[Figure labels: minimization algorithms, unstructured meshes, adaptive meshes]
4
Titanium Approach
  • Performance is primary goal
  • High uniprocessor performance
  • Designed for shared and distributed memory
  • Parallelism constructs with programmer control
  • Optimizing compiler for caches, communication
    scheduling, etc.
  • Expressiveness secondary goal
  • Based on safe language Java
  • Safety simplifies programming and compiler
    analysis
  • Framework for domain-specific language extensions

5
New Language Features
  • Immutable classes
  • Multidimensional arrays
  • also points and index sets as first-class values
  • multidimensional iterators
  • Memory management
  • semi-automated zone-based allocation
  • Scalable parallelism
  • SPMD model of execution with global address space
  • Language-level synchronization
  • Support for grid-based computation

6
Java Objects
  • Primitive scalar types: boolean, double, int,
    etc.
  • access is fast
  • Objects: user-defined and from the standard
    library
  • have an implicit level of indirection (pointer to
    the object)
  • arrays are objects
  • all objects support equality checks and a few
    other operations

[Figure labels: primitive values 3 and true; an object with fields r: 7.1 and i: 4.3]
7
Immutable Classes in Titanium
  • For small objects, would sometimes prefer:
  • to avoid level of indirection
  • pass by value
  • extends the idea of primitive values (1, 4.2,
    etc.) to user-defined values
  • Titanium introduces immutable classes:
  • all fields are final (implicitly)
  • cannot inherit from (extend) or be inherited by
    other classes
  • must have a 0-argument constructor, e.g.,
    Complex()
  • Example:
    immutable class Complex { ... }
    Complex c = new Complex(7.1, 4.3);
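Plain Java has no immutable keyword, so the closest analog is a final class with final fields. The sketch below is only that analog, not Titanium code: the field names r and i and the add helper are assumptions for illustration, and a Java object of this kind is still reached through a reference, which is exactly the indirection Titanium's immutable classes are meant to remove.

```java
// Java analog of a Titanium immutable class (illustrative sketch only).
// Plain Java still allocates this object on the heap and accesses it by reference.
final class Complex {
    final double r;   // real part (assumed field name)
    final double i;   // imaginary part (assumed field name)

    // 0-argument constructor, mirroring the requirement on the slide.
    Complex() {
        this(0.0, 0.0);
    }

    Complex(double r, double i) {
        this.r = r;
        this.i = i;
    }

    // Hypothetical helper: returns a new value instead of mutating this one.
    Complex add(Complex other) {
        return new Complex(r + other.r, i + other.i);
    }
}

class ComplexDemo {
    public static void main(String[] args) {
        Complex c = new Complex(7.1, 4.3);
        Complex d = c.add(new Complex(1.0, 1.0));
        System.out.println(d.r + " " + d.i);
    }
}
```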

8
Arrays in Java
  • Arrays in Java are objects
  • Only 1D arrays are directly supported
  • Array bounds are checked (as in Fortran)
  • Multidimensional arrays, built as arrays of arrays,
    are slow and cannot be transformed into contiguous memory
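To make the arrays-of-arrays point concrete, here is a small Java sketch contrasting a nested array with the usual workaround of a flat 1D array and manual index arithmetic. The class and method names are invented for this example.

```java
class FlatArrayDemo {
    // Row-major offset of element (i, j) in a grid with `cols` columns.
    static int index(int i, int j, int cols) {
        return i * cols + j;
    }

    public static void main(String[] args) {
        int rows = 3, cols = 4;

        // Array of arrays: each row is a separate object, so rows need not be
        // adjacent in memory and every access follows two references.
        double[][] nested = new double[rows][cols];

        // Flat alternative: one contiguous block with manual index arithmetic.
        double[] flat = new double[rows * cols];

        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                nested[i][j] = 10 * i + j;
                flat[index(i, j, cols)] = 10 * i + j;
            }
        }
        System.out.println(nested[2][3] == flat[index(2, 3, cols)]); // prints "true"
    }
}
```

The flat version is contiguous but pushes the indexing burden onto the programmer, which is the gap Titanium's multidimensional arrays aim to close.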

9
Titanium Arrays
  • Fast, expressive arrays
  • multidimensional
  • lower bound, upper bound, stride
  • concise indexing: A[p] instead of A(i, j, k)
  • Points
  • tuple of integers as primitive type
  • Domains
  • rectangular sets of points (bounds and stride)
  • arbitrary sets of points
  • Multidimensional iterators

10
Example Point, RectDomain, Array
Point<2> lb = [1, 1];
Point<2> ub = [10, 20];
RectDomain<2> R = [lb : ub : [2, 2]];
double [2d] A = new double[R];
foreach (p in A.domain()) {
    A[p] = B[2 * p];
}
  • Standard optimizations
  • strength reduction
  • common subexpression elimination
  • invariant code motion
  • removing bounds checks from body
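For readers without a Titanium compiler, the strided rectangular domain in the example can be approximated in plain Java. This is a rough sketch under stated assumptions: RectDomain2 is a hypothetical stand-in, not a Titanium API, and it assumes the upper bound is inclusive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java stand-in for a 2D rectangular domain: inclusive lower
// and upper bounds plus a per-dimension stride.
class RectDomain2 {
    final int[] lb, ub, stride;

    RectDomain2(int[] lb, int[] ub, int[] stride) {
        this.lb = lb;
        this.ub = ub;
        this.stride = stride;
    }

    // Enumerates the points of the domain, playing the role of "foreach (p in R)".
    List<int[]> points() {
        List<int[]> out = new ArrayList<>();
        for (int i = lb[0]; i <= ub[0]; i += stride[0]) {
            for (int j = lb[1]; j <= ub[1]; j += stride[1]) {
                out.add(new int[] {i, j});
            }
        }
        return out;
    }
}

class RectDomainDemo {
    public static void main(String[] args) {
        // Mirrors [lb : ub : [2, 2]] with lb = [1, 1] and ub = [10, 20].
        RectDomain2 r = new RectDomain2(
                new int[] {1, 1}, new int[] {10, 20}, new int[] {2, 2});
        System.out.println(r.points().size()); // prints 50
    }
}
```

Materializing the points in a list is only for clarity; a compiler that knows the bounds and stride can instead generate plain nested loops and apply the optimizations listed above.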

11
Memory Management
  • Java implemented with garbage collection
  • Distributed GC too unpredictable
  • Compile-time analysis can improve performance
  • Zone-based memory management
  • extends existing model
  • good performance
  • safe
  • easy to use

12
Zone-Based Memory Management
  • Allocate objects in zones
  • Release zones manually

Zone Z1 = new Zone();
Zone Z2 = new Zone();
T x = new(Z1) T();
T y = new(Z2) T();
x.field = y;
x = y;
delete Z1;
delete Z2; // error
[Figure: zones Z1 and Z2 holding the objects referenced by x and y]
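The slide does not say how the illegal delete is detected. One plausible mechanism, assumed here purely for illustration, is to count the references that point into each zone from outside it and refuse to free a zone while that count is nonzero. The toy Java model below replays the slide's example with manual counting; the Zone class and its methods are hypothetical and are not Titanium's API.

```java
// Toy model of zone-based allocation (hypothetical API, not Titanium's).
// Each zone counts references into it that originate outside the zone;
// delete() refuses to free a zone that is still referenced.
class Zone {
    private int externalRefs = 0;
    private boolean deleted = false;

    void addRef() { externalRefs++; }
    void dropRef() { externalRefs--; }
    boolean isDeleted() { return deleted; }

    void delete() {
        if (externalRefs > 0) {
            throw new IllegalStateException("zone still referenced");
        }
        deleted = true;
    }
}

class ZoneDemo {
    public static void main(String[] args) {
        Zone z1 = new Zone();
        Zone z2 = new Zone();

        z1.addRef();   // T x = new(Z1) T();  variable x points into Z1
        z2.addRef();   // T y = new(Z2) T();  variable y points into Z2
        z2.addRef();   // x.field = y;        an object in Z1 points into Z2
        z1.dropRef();  // x = y;              x stops pointing into Z1 ...
        z2.addRef();   //                     ... and points into Z2 instead

        z1.delete();   // fine: nothing outside Z1 refers to it
        z2.dropRef();  // the Z1 object that referenced Z2 is gone with its zone
        try {
            z2.delete();   // rejected: x and y still point into Z2
        } catch (IllegalStateException e) {
            System.out.println("delete Z2 rejected: " + e.getMessage());
        }
    }
}
```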
13
Sequential Performance
Times in seconds (lower is better).
14
Sequential Performance
On an Ultrasparc
                C/C++/FORTRAN   Java Arrays   Titanium Arrays   Overhead
DAXPY           1.4s            6.8s          1.5s              7%
3D multigrid    12s             -             22s               83%
2D multigrid    5.4s            -             6.2s              15%
EM3D            0.7s            1.8s          1.0s              42%

On a Pentium II
                C/C++/FORTRAN   Java Arrays   Titanium Arrays   Overhead
DAXPY           1.8s            -             2.3s              27%
3D multigrid    23.0s           -             20.0s             -13%
2D multigrid    7.3s            -             5.5s              -25%
EM3D            1.0s            -             1.6s              60%

(Overhead: Titanium Arrays time relative to C/C++/FORTRAN; a dash marks a value not given.)
15
Model of Parallelism

n processes
  • Single Program, Multiple Data
  • fixed number of processes
  • each process has own local data
  • global synchronization (barrier)

[Figure: timeline of each process: start ... barrier ... barrier ... barrier ... end]
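The SPMD pattern of a fixed set of processes separated by barriers can be imitated in Java with threads and java.util.concurrent.CyclicBarrier. This is a sketch, not how Titanium implements it: threads stand in for processes, and slot p of a shared array stands in for process p's local data.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

class SpmdDemo {
    // Runs the same two-phase program on n threads; returns each thread's phase-2 sum.
    static int[] run(int n) {
        int[] local = new int[n];   // slot p stands in for process p's local data
        int[] sums = new int[n];
        CyclicBarrier barrier = new CyclicBarrier(n);
        Thread[] procs = new Thread[n];
        for (int p = 0; p < n; p++) {
            final int me = p;
            procs[p] = new Thread(() -> {
                try {
                    local[me] = me + 1;   // phase 1: write own data
                    barrier.await();      // barrier: all writes are done
                    int s = 0;            // phase 2: read everyone's data
                    for (int v : local) s += v;
                    sums[me] = s;
                    barrier.await();      // barrier: all reads are done
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new RuntimeException(e);
                }
            });
            procs[p].start();
        }
        for (Thread t : procs) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(4))); // prints [10, 10, 10, 10]
    }
}
```

Every thread executes the same code and reaches the same two barriers, which is the property the SPMD model relies on.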
16
Global Address Space
  • Each process has its own heap
  • References can span process boundaries

[Figure: Process 0 and the other processes, each with its own local heap]
class T { ... }
T gv;
T lv = null;
if (thisProc() == 0) {
    lv = new T();           // allocate locally
}
gv = broadcast lv from 0;   // distribute
... gv.field ...
17
Global vs. Local References
  • Global references may be slow
  • distributed memory: overhead of a few
    instructions when using a global reference to
    access a local object
  • shared memory: no performance implications
  • Solution: use local qualifier
  • statically restrict references to local objects
  • example: T local lv = null;
  • use only in performance-critical sections

18
Global Synchronization Analysis
  • In Titanium, processes must synchronize at the
    same textual instances of barrier()

doThis();
barrier();
boolean x = someCondition();
if (x) {
    doThat();
    barrier();
}
doSomeMore();
barrier();
19
Global Synchronization Analysis
  • In Titanium, processes must synchronize at the
    same textual instances of barrier()
  • Singleness analysis statically guarantees
    correctness by restricting the values of
    variables that control program flow

doThis();
barrier();
boolean single x = someCondition();
if (x) {
    doThat();
    barrier();
}
doSomeMore();
barrier();
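Why the branch condition has to agree across processes can be seen with a small counting model. The sketch assumes the second barrier() in the example sits inside the if (x) body, and it simply counts how many barriers one process would execute for its own value of x. All names are invented for illustration; this is not the compiler's analysis.

```java
class SinglenessDemo {
    // Number of barrier() calls one process executes in the example program,
    // given its own value of x (second barrier assumed to be inside the if).
    static int barriersReached(boolean x) {
        int count = 1;      // barrier() after doThis()
        if (x) count++;     // barrier() inside if (x)
        count++;            // final barrier() after doSomeMore()
        return count;
    }

    // True when every process executes the same number of barriers.
    static boolean barriersMatch(boolean[] xPerProcess) {
        int expected = barriersReached(xPerProcess[0]);
        for (boolean x : xPerProcess) {
            if (barriersReached(x) != expected) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(barriersMatch(new boolean[] {true, true, true}));  // prints true
        System.out.println(barriersMatch(new boolean[] {true, false, true})); // prints false
    }
}
```

When x differs between processes, one process waits at the barrier inside the if while another has moved on to the final barrier. Declaring x as single asks the compiler to rule that situation out statically.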
20
Support for Grid-Based Computation
Point<2> lb = [0, 0];
Point<2> ub = [6, 4];
RectDomain<2> R = [lb : ub : [2, 2]];
Domain<2> red = R + (R + [1, 1]);
foreach (p in red) { ... }
[Figure: R spans (0, 0) to (6, 4); R + [1, 1] spans (1, 1) to (7, 5); red, their union, spans (0, 0) to (7, 5)]
Gauss-Seidel relaxation with red-black ordering
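The red set above contains the points whose two coordinates have the same parity, that is, where i + j is even. A plain-Java sketch of one red-black Gauss-Seidel sweep follows. The slide does not name the equation being solved, so the 5-point Laplace stencil, the grid size, and the boundary values are all assumptions made for this example.

```java
class RedBlackDemo {
    // One red-black Gauss-Seidel sweep with a 5-point Laplace stencil (assumed)
    // over the interior of u; boundary values stay fixed.
    static void sweep(double[][] u) {
        for (int color = 0; color < 2; color++) {   // 0 = red (i + j even), 1 = black
            for (int i = 1; i < u.length - 1; i++) {
                for (int j = 1; j < u[i].length - 1; j++) {
                    if ((i + j) % 2 == color) {
                        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                        + u[i][j - 1] + u[i][j + 1]);
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][] u = new double[6][6];
        for (int j = 0; j < 6; j++) u[0][j] = 1.0;   // top edge held at 1, rest at 0
        for (int k = 0; k < 200; k++) sweep(u);
        System.out.println(u[1][1] > u[4][1]);        // prints "true"
    }
}
```

Points of one color depend only on points of the other color, so all red updates are independent of each other, and likewise all black updates. That independence is what makes the ordering attractive for parallel execution.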
21
Implementation
  • Strategy
  • compile Titanium into C (currently C++)
  • Posix threads for SMPs (currently Solaris
    threads)
  • Lightweight Active Messages for communication
  • Status
  • runs on SUN Enterprise 8-way SMP
  • runs on Berkeley NOW
  • trivial ports to half a dozen other architectures
  • tuning for sequential performance

22
Titanium Status
  • Titanium language definition complete.
  • Titanium compiler running.
  • Compiles for uniprocessors, NOW; others soon.
  • Application developments ongoing.
  • Many research opportunities.

23
Applications
  • Three-D AMR Poisson Solver (AMR3D)
  • block-structured grids with multigrid computation
    on each
  • 2000-line program
  • algorithm not yet fully implemented in other
    languages
  • tests performance and effectiveness of language
    features
  • Three-D Electromagnetic Waves (EM3D)
  • unstructured grids
  • Several smaller benchmarks

24
Parallel Performance
  • Numbers from Ultrasparc SMP
  • Parallel efficiency good
  • EM3D (unstructured kernel)
  • 3D AMR limited by algorithm

[Figure: speedup vs. number of processors]
25
New Compiler Analyses for Parallelism
  • Analysis of synchronization
  • finds unmatched barriers, parallel code blocks
  • extends traditional control flow analysis
  • Analysis of communication
  • reorder and pipeline memory operations without
    observed effect
  • extends traditional dependence analysis
  • Analyses extended to domain-specific constructs
  • arrays indexed by domains of points
  • looping constructs provide summary information

26
Future Directions
  • Use of framework for domain-specific languages
  • Fluids and AMR done
  • Unstructured meshes and sparse solvers
  • Better programming tools
  • debuggers, performance analysis
  • Optimizations
  • analysis of parallel code and synchronization
    done
  • optimizations for caches on uniprocessors and
    SMPs underway
  • load balancing on clusters of SMPs

27
Conclusions
  • Performance
  • sequential performance consistently close to
    C/FORTRAN
  • currently 80% slower to 25% faster
  • sequential efficiency very high
  • Expressiveness
  • safety of Java with small set of performance
    features
  • extensible to new application domains
  • Portability, compatibility, etc.
  • no gratuitous departures from Java standard
  • compilation model easily supports new platforms