Titanium: A High-Level Parallel Language

1
Titanium: A High-Level Parallel Language
  • Kathy Yelick
  • University of California, Berkeley, and
    Lawrence Berkeley National Laboratory

2
Global Address Space Languages
  • Explicitly parallel model with SPMD parallelism
      • Fixed at program start-up, typically 1 thread per
        processor
  • Global address space model of memory
      • Allows programmer to directly represent
        distributed data structures
  • Address space is logically partitioned
      • Local vs. remote memory (two-level hierarchy)
  • Programmer control over performance-critical
    decisions
      • Data layout and communication
  • Performance transparency and tunability are goals
      • Initial implementation can use fine-grained
        shared memory

3
Global Address Space
[Figure: the global address space, divided into a shared region
(labels X0, X1, XP) and a private region (labels ptr, ptr, ptr)]
  • Global address space abstraction
  • Shared memory is partitioned by processors
  • Remote memory may stay remote: no automatic
    caching implied
  • One-sided communication through reads/writes of
    shared variables
  • Less restricted than MPI-2 one-sided model
  • Both individual and bulk memory copies
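
The model on the last two slides can be imitated in plain Java as a rough sketch. This is not Titanium code: the class `SpmdSketch`, the method `run`, and the use of Java threads with a `CyclicBarrier` are all assumptions made for illustration. A fixed number of threads is created at start-up, each owns one partition of a logically shared array, and after a barrier each thread reads its right neighbor's partition directly (a one-sided read, with no action by the owner).

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Illustrative only: plain Java threads standing in for SPMD processes.
final class SpmdSketch {
    // Runs `procs` threads; returns the value each thread read from its right neighbor.
    static int[] run(int procs) {
        final int[][] shared = new int[procs][1];   // partition p is "local" to thread p
        final int[] result = new int[procs];
        final CyclicBarrier barrier = new CyclicBarrier(procs);
        Thread[] threads = new Thread[procs];
        for (int p = 0; p < procs; p++) {
            final int me = p;
            threads[p] = new Thread(() -> {
                shared[me][0] = 10 * me;            // local write
                try {
                    barrier.await();                // all writes finish before any remote read
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
                int neighbor = (me + 1) % procs;
                result[me] = shared[neighbor][0];   // one-sided "remote" read
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return result;
    }
}
```

With `SpmdSketch.run(4)`, thread 0 reads the value written by thread 1, and thread 3 wraps around to read from thread 0. Nothing is cached: every read goes to the owning partition, which mirrors the "remote memory may stay remote" point.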

4
Three Related Languages
  • Unified Parallel C (UPC)
      • Follow-on to Split-C, AC, and PCP research
        languages
      • UPC supported on Cray and HP machines
      • Open source compilers from Intrepid and LBNL
  • Co-Array Fortran (CAF)
      • Based on Fortran 90
      • Supported on Cray machines
      • Open source compiler under development at Rice
  • Titanium
      • Based on Java
      • Open source compiler from U.C. Berkeley

5
Titanium
  • Based on Java, a cleaner C++
      • classes, automatic memory management, etc.
      • compiled to C and then native binary (no JVM)
  • Same parallelism model as UPC and CAF
      • SPMD with a global address space
      • Dynamic Java threads are not supported
  • Optimizing compiler
      • static (compile-time) optimizer, not a JIT
      • communication and memory optimizations
      • synchronization analysis (e.g. static barrier
        analysis)
      • cache and other uniprocessor optimizations

6
Summary of Features Added to Java
  • Scalable parallelism (Java threads replaced)
  • Immutable (value) classes
  • Multidimensional arrays with unordered iteration
  • Checked Synchronization
  • Operator overloading
  • Templates
  • Zone-based memory management (regions)
  • Libraries for collective communication,
    distributed arrays, bulk I/O

7
Immutable Classes in Titanium
  • For small objects, would sometimes prefer
      • to avoid level of indirection
      • pass by value (copy entire object)
      • especially when immutable -- fields never
        modified
  • Example
        immutable class Complex {
            Complex () { real = 0; imag = 0; }
            Complex operator+ (Complex c) { ... }
        }
        Complex c1 = new Complex(7.1, 4.3);
        c1 = c1 + c1;
  • Addresses performance and programmability
      • Similar to structs in C (not C++ classes) in
        terms of performance
      • Adds support for complex types
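
For comparison, the Complex example on this slide can be approximated in standard Java. This is a sketch under stated assumptions, not Titanium: Java has no `immutable` keyword and no operator overloading, so the class is made `final` with `final` fields, and the method name `plus` is a hypothetical stand-in for the overloaded operator. Unlike a Titanium immutable class, this Java object is still reached through a reference, so it does not remove the level of indirection the slide mentions.

```java
// Illustrative Java stand-in for a Titanium immutable class.
final class Complex {
    final double real;
    final double imag;

    Complex() { this(0.0, 0.0); }

    Complex(double real, double imag) {
        this.real = real;
        this.imag = imag;
    }

    // Java cannot overload "+", so addition is spelled as a method.
    // Fields are never modified; a new value is returned instead.
    Complex plus(Complex c) {
        return new Complex(real + c.real, imag + c.imag);
    }
}
```

Usage would look like `Complex c1 = new Complex(7.1, 4.3); c1 = c1.plus(c1);`, which rebinds `c1` to a fresh value and leaves the original object untouched.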

8
Multidimensional Arrays in Titanium
  • Index set is a domain with a rich set of
    operators
  • Unordered iteration over domains helps
    optimization

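[Figure: arrays a and b with domain corner labels (0,0), (n,n),
(n+1,n+1), (2n,2n), annotated with these code fragments]
    aInterior = a.restrict(2)
    foreach (p in aInterior.domain())
        aInterior[p]
    b.copy(aInterior)
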
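
The idea of an array indexed by a domain can be sketched in plain Java. Everything here is an assumption for illustration: the classes `Rect` and `Grid` and the methods `shrink`, `intersect`, `restrict`, and `copy` are hypothetical names that only loosely mirror the fragments on this slide, and the copy is assumed to touch only the points where the two domains overlap. Java loops also run in a fixed order, whereas the slide's `foreach` leaves the order unspecified so the compiler can rearrange it.

```java
// Illustrative only: a tiny 2D rectangular index domain.
final class Rect {
    final int x0, y0, x1, y1;                    // inclusive corners

    Rect(int x0, int y0, int x1, int y1) {
        this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
    }

    // Domain with k points trimmed from every side.
    Rect shrink(int k) { return new Rect(x0 + k, y0 + k, x1 - k, y1 - k); }

    Rect intersect(Rect o) {
        return new Rect(Math.max(x0, o.x0), Math.max(y0, o.y0),
                        Math.min(x1, o.x1), Math.min(y1, o.y1));
    }

    int width()  { return Math.max(0, x1 - x0 + 1); }
    int height() { return Math.max(0, y1 - y0 + 1); }
    int size()   { return width() * height(); }
}

// Illustrative only: a grid is a domain plus storage; views share storage.
final class Grid {
    final Rect domain;            // index set of this view
    private final Rect base;      // domain the storage was allocated for
    private final double[] data;

    Grid(Rect domain) { this(domain, domain, new double[domain.size()]); }

    private Grid(Rect domain, Rect base, double[] data) {
        this.domain = domain; this.base = base; this.data = data;
    }

    private int offset(int x, int y) {
        return (y - base.y0) * base.width() + (x - base.x0);
    }

    double get(int x, int y) { return data[offset(x, y)]; }
    void set(int x, int y, double v) { data[offset(x, y)] = v; }

    // A view over a smaller domain that shares storage with this grid.
    Grid restrict(Rect d) { return new Grid(domain.intersect(d), base, data); }

    // Copy from src at every point where the two domains overlap.
    void copy(Grid src) {
        Rect r = domain.intersect(src.domain);
        for (int y = r.y0; y <= r.y1; y++)
            for (int x = r.x0; x <= r.x1; x++)
                set(x, y, src.get(x, y));
    }
}
```

With a 5x5 grid `a`, `a.restrict(a.domain.shrink(1))` gives a 3x3 interior view, and `b.copy(aInterior)` fills only the matching interior points of `b`, leaving its border untouched.
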
9
Titanium Compiler Status
  • Titanium compiler runs on almost any machine
      • Requires a C compiler (and a decent C++ compiler
        to build the translator)
      • Pthreads for shared memory
      • Communication layer for distributed memory (or
        hybrid)
      • Recently moved to GASNet, shared with UPC
      • Obtained GM, Elan, and improved LAPI
        implementation
  • Recent language extensions
      • Indexed array copy (scatter/gather style)
      • Non-blocking array copy under development
  • Compiler optimizations
      • Cache optimizations, for loop optimizations
      • Communication optimizations for overlap,
        pipelining, and scatter/gather under development

10
Serial Performance (Pure Java)
  • Several optimizations in Titanium compiler (tc)
    over the past year
  • These codes are all written in pure Java without
    performance extensions

11
Communication Optimizations
  • Possible communication optimizations
  • Communication overlap, aggregation, caching
  • Effectiveness varies by machine

12
Parallel Performance and Scalability
  • Poisson solver using the Method of Local
    Corrections [Balls, Colella]
  • Communication (flat)
  • IBM SP, Cray T3E

13
NAS MG in Titanium
  • Preliminary Performance for MG code on IBM SP
  • Speedups are nearly identical
  • About 25% serial performance difference

14
Applications in Titanium
  • Several benchmarks
      • Fluid solvers with Adaptive Mesh Refinement (AMR)
      • Conjugate Gradient
      • 3D Multigrid
      • Unstructured mesh kernel: EM3D
      • Dense linear algebra: LU, MatMul
      • Tree-structured n-body code
      • Finite element benchmark
      • Genetics: microarray selection
      • SciMark serial benchmarks
  • Larger applications
      • Heart simulation
      • Ocean modeling with AMR (in progress)

15
AMR for Ocean Modeling
  • Ocean modeling [Wen, Colella]
      • Requires embedded boundaries to model
        floor/coastline
      • Line vs. point relaxation for aspect ratio
        1000km x 10km
      • Results in irregular data structures and array
        accesses
  • Currently developing
      • Basin-scale AMR circulation model
      • Initially non-adaptive
      • Compiler and language support design

Graphics from Titanium AMR Gas Dynamics
[McCorquodale, Colella]

16
Simulating Fluid Flow in Biological Systems
  • Immersed Boundary Method [Peskin/McQueen]
      • Material (e.g., heart muscles, cochlea structure)
        modeled by grid of material points
      • Fluid space modeled by a regular lattice
  • Irregular material points need to interact with
    regular fluid lattice
      • Trade-off between load balancing of fibers and
        minimizing communication
      • Memory and communication intensive
  • Random array access is the key performance
    problem
      • Developed compiler optimizations to improve
        performance
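
To show why the point-to-lattice interaction leads to random array access, here is a deliberately simplified 1D sketch. It is an assumption-laden illustration, not the method used in the heart code: the class `Spread1D`, the method `spread`, and the linear (hat-function) weights are all chosen for brevity, and the real immersed boundary method works in 3D with its own smoothing kernel.

```java
// Illustrative only: 1D spreading of point forces onto a regular lattice
// using linear weights between the two nearest lattice nodes.
final class Spread1D {
    // Each positions[i] must lie in [0, (gridSize - 1) * h).
    static double[] spread(double[] positions, double[] forces, int gridSize, double h) {
        double[] grid = new double[gridSize];
        for (int i = 0; i < positions.length; i++) {
            double s = positions[i] / h;
            int cell = (int) Math.floor(s);      // left lattice neighbor
            double frac = s - cell;              // 0 at the left node, near 1 at the right
            // The target indices depend on the data, so writes land at
            // irregular locations in the lattice array.
            grid[cell]     += (1.0 - frac) * forces[i];
            grid[cell + 1] += frac * forces[i];
        }
        return grid;
    }
}
```

The two weights sum to one, so the total force on the lattice equals the total force on the points. When the lattice is distributed across processors, each of these data-dependent writes may touch remote memory, which is where the load-balancing versus communication trade-off comes from.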

17
Titanium Group
  • Susan Graham
  • Katherine Yelick
  • Paul Hilfinger
  • Phillip Colella (LBNL)
  • Alex Aiken
  • Dan Bonachea
  • Kaushik Datta
  • Ed Givelberg
  • Sabrina Merchant
  • Szu-Huey Chuang
  • Carol Ho
  • Jimmy Su
  • Greg Balls (SDSC)
  • Peter McCorquodale (LBNL)
  • Andrew Begel
  • Tyson Condie
  • Carrie Fei
  • David Gay
  • Ben Liblit
  • Chang Sun Lin
  • Geoff Pike
  • Ellen Tsai
  • Mike Welcome (LBNL)
  • Siu Man Yau

18
  • The End
  • http://upc.nersc.gov
  • http://titanium.cs.berkeley.edu/

19
Target Problems
  • Many modeling problems in astrophysics, biology,
    material science, and other areas require
      • Enormous range of spatial and temporal scales
  • Requires
      • Adaptive methods
      • Large-scale parallel machines
  • Titanium supports
      • Structured grids
      • Locally-structured grids (AMR)
      • Unstructured grids (in progress)

20
Java Compiled by Titanium Compiler

21
Java Compiled by Titanium Compiler

22
Parallel Applications
  • Genome application
  • Heart simulation
  • AMR elliptic and hyperbolic solvers
  • Scalable Poisson for infinite domains
  • Several smaller benchmarks: EM3D, MatMul, LU,
    FFT, Join

23
MOOSE Application
  • Problem: microarray construction
      • Used for genome experiments
      • Possible medical applications long-term
  • Microarray Optimal Oligo Selection Engine (MOOSE)
      • A parallel engine for selecting the best
        oligonucleotide sequences for genetic microarray
        testing
      • Uses dynamic load balancing within Titanium
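
Dynamic load balancing in general can be sketched with a shared work counter. This is not how MOOSE is implemented, which the slides do not describe: the class `DynamicBalance`, the method `run`, and the squaring "work" are hypothetical, and Java threads with an `AtomicInteger` stand in for whatever mechanism the Titanium code uses.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: workers pull task indices from a shared counter,
// so a worker that draws cheap tasks simply comes back for more.
final class DynamicBalance {
    // Writes tasks[i] squared into out[i]; returns how many tasks each worker ran.
    static int[] run(int workers, long[] tasks, long[] out) {
        final AtomicInteger next = new AtomicInteger(0);
        final int[] done = new int[workers];
        Thread[] pool = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            pool[w] = new Thread(() -> {
                int i;
                // getAndIncrement hands each index to exactly one worker.
                while ((i = next.getAndIncrement()) < tasks.length) {
                    out[i] = tasks[i] * tasks[i];   // stand-in for real work
                    done[id]++;
                }
            });
            pool[w].start();
        }
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return done;
    }
}
```

The per-worker counts returned by `run` vary from run to run, but they always sum to the number of tasks, and every task is processed exactly once. A static split would instead fix each worker's share in advance, which balances poorly when task costs differ.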

24
Heart Simulation
  • Problem: compute blood flow in the heart
      • Modeled as an elastic structure in an
        incompressible fluid
      • The immersed boundary method [Peskin and
        McQueen]
      • 20 years of development in model
  • Many other applications: blood clotting, inner
    ear, paper making, embryo growth, and more
  • Can be used for design of prosthetics
      • Artificial heart valves
      • Cochlear implants

25
Scalable Poisson Solver
  • MLC for Finite-Differences by Balls and Colella
  • Poisson equation with infinite boundaries
      • arises in astrophysics, some biological systems,
        etc.
  • Method is scalable
      • Low communication
  • Performance on SP2 (shown) and T3E
      • scaled speedups nearly ideal (flat)
  • Currently 2D and non-adaptive

26
Error on High-Wavenumber Problem
  • Charge is
      • 1 charge of concentric waves
      • 2 star-shaped charges
  • Largest error is where the charge is changing
    rapidly. Note:
      • discretization error
      • faint decomposition error
  • Run on 16 procs

27
AMR Gas Dynamics
  • Developed by McCorquodale and Colella
  • 2D example (3D supported)
      • Mach-10 shock on solid surface at oblique angle
  • Future: self-gravitating gas dynamics package

28
Unstructured Mesh Kernel
  • EM3D: relaxation on a 3D unstructured mesh
  • Speedup on UltraSPARC SMP
  • Simple kernel: mesh not partitioned

29
Recent Developments
  • Interfaces to libraries
      • KeLP and (older) PETSc and Metis
  • New IBM SP implementation
      • Uses LAPI rather than MPI, about 2x performance
        gain
  • New release: IBM, SGI, Cray, Linux cluster,
    Threads, ...
  • Uniprocessor optimizations
      • Method inlining, both automated and manual
      • Cache optimizations
  • Shared pointer analysis
  • Support for unstructured computation
      • General sub-array copy now with arbitrary points

30
Future Plans
  • Merge communication layer with UPC
      • Unified Parallel C has broad vendor support
      • Uses the same execution model as Titanium
  • Automated communication overlap
  • Analysis and refinement of cache optimizations
  • Additional support for unstructured grids
      • Conjugate gradient and particle methods are
        motivations
  • Better uniprocessor optimizations, possibly new
    arrays