Title: Titanium: A High-Level Parallel Language
1 Titanium: A High-Level Parallel Language
- Kathy Yelick
- University of California, Berkeley and
- Lawrence Berkeley National Laboratory
2 Global Address Space Languages
- Explicitly parallel model with SPMD parallelism (sketch below)
- Fixed at program start-up, typically 1 thread per processor
- Global address space model of memory
- Allows programmer to directly represent distributed data structures
- Address space is logically partitioned
- Local vs. remote memory (two-level hierarchy)
- Programmer control over performance-critical decisions
- Data layout and communication
- Performance transparency and tunability are goals
- Initial implementation can use fine-grained shared memory
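As a concrete illustration of the SPMD model, here is a minimal Titanium-style sketch (not from the talk); Ti.thisProc(), Ti.numProcs(), and Ti.barrier() are the standard Titanium calls for thread identity, thread count, and global synchronization.

    class Hello {
      public static void main(String[] args) {
        // every one of the fixed set of SPMD threads executes main()
        System.out.println("hello from thread " + Ti.thisProc()
                           + " of " + Ti.numProcs());
        Ti.barrier();                      // all threads synchronize here
        if (Ti.thisProc() == 0) {
          System.out.println("all threads past the barrier");
        }
      }
    }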
3 Global Address Space
[Figure: the global address space is split into a shared region, partitioned among the processors (holding x0, x1, ..., xP), and a private region per processor; private pointers may refer into any processor's shared partition]
- Global address space abstraction
- Shared memory is partitioned by processors
- Remote memory may stay remote: no automatic caching implied
- One-sided communication through reads/writes of shared variables (sketch below)
- Less restricted than the MPI-2 one-sided model
- Both individual and bulk memory copies
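A minimal sketch (illustrative, not from the talk) of one-sided access through the global address space. The Counter class is hypothetical; "broadcast E from p" is Titanium's built-in expression for handing every thread the value computed on thread p.

    class Counter { public int val; }

    class OneSided {
      public static void main(String[] args) {
        Counter mine = new Counter();       // allocated in local memory
        // every thread receives a reference to thread 0's object
        Counter c0 = broadcast mine from 0;
        if (Ti.thisProc() == 1) {
          c0.val = 42;                      // one-sided write to remote memory
        }
        Ti.barrier();
        if (Ti.thisProc() == 0) {
          System.out.println("c0.val = " + c0.val);
        }
      }
    }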
4 Three Related Languages
- Unified Parallel C (UPC)
- Follow-on to the Split-C, AC, and PCP research languages
- UPC supported on Cray and HP machines
- Open source compilers from Intrepid and LBNL
- Co-Array Fortran (CAF)
- Based on Fortran 90
- Supported on Cray machines
- Open source compiler under development at Rice
- Titanium
- Based on Java
- Open source compiler from U.C. Berkeley
5 Titanium
- Based on Java, a cleaner C++
- classes, automatic memory management, etc.
- compiled to C and then native binary (no JVM)
- Same parallelism model as UPC and CAF
- SPMD with a global address space
- Dynamic Java threads are not supported
- Optimizing compiler
- static (compile-time) optimizer, not a JIT
- communication and memory optimizations
- synchronization analysis (e.g. static barrier analysis)
- cache and other uniprocessor optimizations
6 Summary of Features Added to Java
- Scalable parallelism (Java threads replaced)
- Immutable (value) classes
- Multidimensional arrays with unordered iteration
- Checked Synchronization
- Operator overloading
- Templates
- Zone-based memory management (regions)
- Libraries for collective communication, distributed arrays, and bulk I/O
7 Immutable Classes in Titanium
- For small objects, would sometimes prefer
- to avoid a level of indirection
- pass by value (copy entire object)
- especially when immutable -- fields never modified
- Example

    immutable class Complex {
      Complex() { real = 0; imag = 0; }
      Complex operator+(Complex c) { ... }
    }
    ...
    Complex c1 = new Complex(7.1, 4.3);
    c1 = c1 + c1;

- Addresses performance and programmability
- Similar to structs in C (not C++ classes) in terms of performance
- Adds support for complex types
8 Multidimensional Arrays in Titanium
- Index set is a domain with a rich set of operators
- Unordered iteration over domains helps optimization
    aInterior = a.restrict(2);
    foreach (p in aInterior.domain())
        ... aInterior[p] ...
    b.copy(aInterior);

[Figure: two overlapping 2-d grids; a has corners (0,0) and (n,n), b extends to (2n,2n); b.copy(aInterior) fills b from a's interior where the two grids overlap]
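To make the fragment above self-contained, here is a small sketch (names and sizes are illustrative) using Titanium's multidimensional arrays, rectangular domains, and foreach; copy() transfers data only over the intersection of the two arrays' domains, and restrict(2) is used exactly as in the fragment.

    class ArrayDemo {
      public static void main(String[] args) {
        int n = 8;
        double [2d] a = new double [[0,0] : [n,n]];          // grid a
        double [2d] b = new double [[n/2,n/2] : [2*n,2*n]];  // overlaps a
        foreach (p in a.domain()) {
          a[p] = 1.0;              // unordered iteration over the points of a
        }
        double [2d] aInterior = a.restrict(2);   // interior of a, as above
        b.copy(aInterior);         // copies only where the domains overlap
      }
    }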
9 Titanium Compiler Status
- Titanium compiler runs on almost any machine
- Requires a C compiler (and decent C++ to compile the translator)
- Pthreads for shared memory
- Communication layer for distributed memory (or hybrid)
- Recently moved onto GASNet, shared with UPC
- Obtained GM, Elan, and improved LAPI implementations
- Recent language extensions
- Indexed array copy (scatter/gather style)
- Non-blocking array copy under development
- Compiler optimizations
- Cache optimizations, for-loop optimizations
- Communication optimizations for overlap, pipelining, and scatter/gather under development
10 Serial Performance (Pure Java)
- Several optimizations in the Titanium compiler (tc) over the past year
- These codes are all written in pure Java without performance extensions
11 Communication Optimizations
- Possible communication optimizations
- Communication overlap, aggregation, caching
- Effectiveness varies by machine
12 Parallel Performance and Scalability
- Poisson solver using the Method of Local Corrections [Balls, Colella]
- Communication is low (flat)
- Results shown for the IBM SP and Cray T3E
13 NAS MG in Titanium
- Preliminary performance for the MG code on the IBM SP
- Speedups are nearly identical
- About 25% difference in serial performance
14 Applications in Titanium
- Several benchmarks
- Fluid solvers with Adaptive Mesh Refinement (AMR)
- Conjugate Gradient
- 3D Multigrid
- Unstructured mesh kernel (EM3D)
- Dense linear algebra: LU, MatMul
- Tree-structured n-body code
- Finite element benchmark
- Genetics micro-array selection
- SciMark serial benchmarks
- Larger applications
- Heart simulation
- Ocean modeling with AMR (in progress)
15 AMR for Ocean Modeling
- Ocean modeling [Wen, Colella]
- Requires embedded boundaries to model floor/coastline
- Line vs. point relaxation for the aspect ratio (1000 km x 10 km)
- Results in irregular data structures and array accesses
- Currently developing
- Basin-scale AMR circulation model
- Initially non-adaptive
- Compiler and language support design
- Graphics from Titanium AMR Gas Dynamics [McCorquodale, Colella]
16 Simulating Fluid Flow in Biological Systems
- Immersed Boundary Method [Peskin/MacQueen]
- Material (e.g., heart muscles, cochlea structure) modeled by a grid of material points
- Fluid space modeled by a regular lattice
- Irregular material points need to interact with the regular fluid lattice
- Trade-off between load balancing of fibers and minimizing communication
- Memory and communication intensive
- Random array access is the key performance problem (illustrated below)
- Developed compiler optimizations to improve its performance
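A purely illustrative sketch (assumed names, not the heart code itself) of why the fiber-fluid interaction produces random array accesses: each material point updates the fluid cell nearest to it, so the regular lattice is indexed by data-dependent points.

    // fiberCell[k] is the lattice cell nearest to material point k,
    // fiberForce[k] is the force it carries (both assumed precomputed)
    static void spreadForces(Point<3> [1d] fiberCell, double [1d] fiberForce,
                             double [3d] fluidForce) {
      foreach (k in fiberCell.domain()) {
        // the index into fluidForce is data-dependent: a random access
        fluidForce[fiberCell[k]] += fiberForce[k];
      }
    }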
17 Titanium Group
- Susan Graham
- Katherine Yelick
- Paul Hilfinger
- Phillip Colella (LBNL)
- Alex Aiken
- Dan Bonachea
- Kaushik Datta
- Ed Givelberg
- Sabrina Merchant
- Szu-Huey Chuang
- Carol Ho
- Jimmy Su
- Greg Balls (SDSC)
- Peter McCorquodale (LBNL)
- Andrew Begel
- Tyson Condie
- Carrie Fei
- David Gay
- Ben Liblit
- Chang Sun Lin
- Geoff Pike
- Ellen Tsai
- Mike Welcome (LBNL)
- Siu Man Yau
18 The End
- http://upc.nersc.gov
- http://titanium.cs.berkeley.edu/
19 Target Problems
- Many modeling problems in astrophysics, biology, material science, and other areas require
- Enormous range of spatial and temporal scales
- Requires
- Adaptive methods
- Large scale parallel machines
- Titanium supports
- Structured grids
- Locally-structured grids (AMR)
- Unstructured grids (in progress)
20 Java Compiled by Titanium Compiler
21 Java Compiled by Titanium Compiler
22 Parallel Applications
- Genome application
- Heart simulation
- AMR elliptic and hyperbolic solvers
- Scalable Poisson for infinite domains
- Several smaller benchmarks: EM3D, MatMul, LU, FFT, Join
23 MOOSE Application
- Problem: microarray construction
- Used for genome experiments
- Possible medical applications long-term
- Microarray Optimal Oligo Selection Engine (MOOSE)
- A parallel engine for selecting the best oligonucleotide sequences for genetic microarray testing
- Uses dynamic load balancing within Titanium
24 Heart Simulation
- Problem: compute blood flow in the heart
- Modeled as an elastic structure in an incompressible fluid
- The immersed boundary method [Peskin and McQueen]
- 20 years of development in the model
- Many other applications: blood clotting, inner ear, paper making, embryo growth, and more
- Can be used for design of prosthetics
- Artificial heart valves
- Cochlear implants
25 Scalable Poisson Solver
- MLC for Finite-Differences by Balls and Colella
- Poisson equation with infinite boundaries
- Arises in astrophysics, some biological systems, etc.
- Method is scalable
- Low communication
- Performance on
- SP2 (shown) and T3E
- scaled speedups
- nearly ideal (flat)
- Currently 2D and non-adaptive
26 Error on High-Wavenumber Problem
- Charge is
- 1 charge of concentric waves
- 2 star-shaped charges.
- Largest error is where the charge is changing rapidly
- Note
- Discretization error
- Faint decomposition error
- Run on 16 procs
27 AMR Gas Dynamics
- Developed by McCorquodale and Colella
- 2D example (3D supported)
- Mach-10 shock on solid surface at oblique angle
- Future: self-gravitating gas dynamics package
28 Unstructured Mesh Kernel
- EM3D: relaxation on a 3D unstructured mesh (sketch below)
- Speedup on an UltraSPARC SMP
- Simple kernel; mesh not partitioned
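A minimal sketch of an EM3D-style relaxation step (the Node class here is assumed, not the benchmark's actual code): each node recomputes its value as a weighted sum of its neighbors' values, and on a distributed mesh some of those neighbor reads are remote.

    class Node {
      public double value;
      public Node[]   neighbors;  // mesh edges; may reference remote nodes
      public double[] weights;    // one coefficient per edge

      public void relax() {
        double sum = 0.0;
        for (int i = 0; i < neighbors.length; i++) {
          sum += weights[i] * neighbors[i].value;  // possibly remote reads
        }
        value = sum;
      }
    }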
29 Recent Developments
- Interfaces to libraries
- KeLP and (older) PETSc and Metis
- New IBM SP implementation
- Uses LAPI rather than MPI, about 2x performance gain
- New release: IBM, SGI, Cray, Linux cluster, Threads
- Uniprocessor optimizations
- Method inlining, both automated and manual
- Cache optimizations
- Shared pointer analysis
- Support for unstructured computation
- General sub-array copy now with arbitrary points
30 Future Plans
- Merge communication layer with UPC
- Unified Parallel C has broad vendor support.
- Uses the same execution model as Titanium
- Automated communication overlap
- Analysis and refinement of cache optimizations
- Additional support for unstructured grids
- Conjugate gradient and particle methods are motivations
- Better uniprocessor optimizations, possibly new arrays