
Transcript and Presenter's Notes

Title: Applications, scalability, and technological change


1
Applications, scalability, and technological
change
  • Scott B. Baden, Gregory T. Balls
  • Dept. of Computer Science and Engineering - UCSD
  • Phillip Colella
  • Advanced Numerical Algorithms Group - LBNL

2
Asynchronous computation and a data-driven model
of execution
  • Scott B. Baden
  • Dept. of Computer Science and Engineering
    University of California, San Diego

3
Motivation
  • Petascale architectures motivate the design of
    new algorithms and programming models to cope
    with technological evolution
  • The growing processor-memory gap continues to
    raise the cost of communication
  • Amdahl's law amplifies the cost of resource
    contention
  • Reformulate the algorithm to
  • Reduce the amount of communication
  • Reduce the cost

4
Motivating applications
  • Elliptic solvers
  • High communication overheads due to global
    coupling
  • Low ratio of flops-to-mems
  • Asynchronous algorithms
  • Brownian dynamics for cell microphysiology
  • Dynamic data assimilation

5
Roadmap
  • SCALLOP
  • A highly scalable, infinite domain Poisson solver
  • Written in KeLP
  • Asynchronous algorithms with Tarragon
  • Non-BSP programming model
  • Communication overlap

6
Infinite Domain Poisson Equation
  • SCALLOP is an elliptic solver for constant
    coefficient problems in 3D
  • Free space boundary conditions
  • We consider the Poisson equation ∆φ = ρ(x,y,z)
  • with infinite domain boundary conditions: φ decays
    at infinity like the potential of a point charge of
    strength R
  • where R is the total charge

7
Infinite domain BCs in practice
  • Infinite domain BCs arise in various applications
  • Modeling the human heart [Yelick, Peskin, and
    McQueen]
  • Astrophysics [Colella et al.]
  • Computing infinite domain boundary conditions is
    expensive, especially on a parallel computer
  • Alternatives
  • Extending the domain
  • Periodic boundary conditions

8
Elliptic regularity
  • The Poisson equation ∆φ = ρ(x,y,z)
  • Let's assume that ρ = f(x,y,z) for (x,y,z) ∈ Ω
  • Ω is the set of points where ρ ≠ 0, i.e. supp(ρ)
  • The solution φ ∈ C∞ outside of Ω
  • We can represent φ at a lower numerical
    resolution outside Ω than we can inside Ω

[Figure: computational domain D containing the charge support Ω]
9
Elliptic regularity
  • Superposition and linearity
  • We can divide D into D1 ∪ D2 ∪ ... ∪ Dn = D
  • To get the solution over D, we sum the solutions
    over the Di due to the charges ρi in each Di
  • The solution φi ∈ C∞ outside Ωi
  • We can represent each φi at a lower numerical
    resolution outside Ωi than we can inside Ωi

[Figure: the domain D, a subdomain Di, and the charge support Ω]
10
SCALLOP
  • Exploits elliptic regularity to reduce
    communication costs significantly
  • Barnes-Hut (1986), Anderson's MLC (1986), FMM
    (1987), Bank-Holst (2000), Balls and Colella
    (2002)
  • Our contribution: extension of these ideas to
    finite-difference problems in three dimensions

11
Domain Decomposition Strategy
  • Divide problem into subdomains
  • Use a reduced description of far-field effects
  • Stitch solutions together

12
Comparison with Traditional Domain Decomposition
Methods
  • E.g. Smith and Widlund
  • Multiple iterations between local and nonlocal
    domains
  • Multiple communication steps
  • SCALLOP employs a fixed number (2) of
    communication steps

13
Comparison with Traditional Domain Decomposition
Methods
  • Construct a dense linear system for degrees of
    freedom on the boundaries between subdomains
    using a Schur complement (Smith and Widlund)
  • Multiple iterations between local and nonlocal
    domains
  • Multiple communication steps
  • SCALLOP employs a fixed number of communication
    steps

14
SCALLOP in Context
  • Finite element methods
  • Bank-Holst (2000)
  • Particle Methods
  • Fast Multipole Method [Greengard and Rokhlin,
    1987]
  • Users pay a computational premium in exchange for
    parallelism
  • Method of Local Corrections [Anderson, 1986]
  • Not well-suited to finite-difference
    calculations: difficult to generate suitable
    derivatives

15
Domain Decomposition Definitions
  • N³ is the global problem size
  • Divided into q³ subdomains
  • The number of processors must divide q³ evenly
  • Local mesh of size (N/q)³
  • Coarse mesh of size (N/C)³
  • C is the coarsening factor
  • In this 2-D slice, N = 16, q = 2, and C = 4
  • 16³ mesh split over 8 processors
  • Local mesh: 8³
  • Coarse mesh: 4³

16
The Scallop Domain Decomposition Algorithm
  • Five-step algorithm, 2 communication steps
  • Serial building blocks:
  • Dirichlet solver (FFTW)
  • Infinite domain solver (built on the Dirichlet
    solver): two complete Dirichlet solutions on
    slightly enlarged domains
  • The infinite domain boundary calculation consumes
    most of the running time

17
Domain Decomposition Algorithm
  • 1. On each subdomain, solve an infinite domain
    problem, ignoring all other subdomains, and
    create a coarse representation of the charge.
  • O((N/q)³) parallel running time, no
    communication

18
Domain Decomposition Algorithm
  • 2. Aggregate all the coarse charge fields into
    one global charge.
  • O((Nq/C)³), all-to-all communication

19
Domain Decomposition Algorithm
  • 3. Calculate the global infinite domain
    solution. (Duplicate solves on all processors)
  • O((N/C)³) running time, no communication

20
Domain Decomposition Algorithm
  • 4. Compute boundary conditions for the final local
    solve: neighbors exchange boundary data of local
    solutions and combine local fine grids with the
    global coarse grid
  • O(1) running time, nearest-neighbor
    communication

21
Domain Decomposition Algorithm
  • 5. Solve a Dirichlet problem on each subdomain to
    obtain the local portion of the infinite domain
    solution. O((N/q)³) running time, no
    communication

22
Domain Decomposition Algorithm
  • 1. Initial solution: O((N/q)³)
  • 2. Aggregation: O((Nq/C)³)
  • 3. Global coarse solution: O((N/C)³)
  • 4. Local correction: O((N/q)³) (less than the ID
    solution)
  • 5. Final calculation: O(1)

Overall O((N/q)³ + (N/C)³) work, two
communication steps (sketched in the code below)
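
A minimal C++ sketch of the five-step driver follows. It is illustrative only,
not the SCALLOP/KeLP code: the serial kernels solve_infinite_domain,
solve_dirichlet, coarsen, and interpolate_boundary are hypothetical stand-ins
for the FFTW-based building blocks (left as declarations), and step 2 is shown
as a simple MPI_Allreduce over a zero-padded global coarse grid.

    // Sketch of the five-step driver; helper routines are hypothetical.
    #include <mpi.h>
    #include <vector>

    struct Grid { int n = 0; std::vector<double> v; };   // an n^3 mesh

    // Assumed serial building blocks (FFTW-based in SCALLOP itself).
    Grid solve_infinite_domain(const Grid& rho);
    Grid solve_dirichlet(const Grid& rho, const Grid& bc);
    Grid coarsen(const Grid& fine, int C);
    Grid interpolate_boundary(const Grid& global_coarse_phi, const Grid& local_phi);

    Grid scallop_solve(const Grid& local_rho, int C, MPI_Comm comm) {
      // 1. Local infinite domain solve, ignoring all other subdomains.
      //    O((N/q)^3) work, no communication.
      Grid local_phi = solve_infinite_domain(local_rho);

      // 2. Aggregate coarse charge fields into one global coarse charge.
      //    Global communication on the reduced description only.
      Grid coarse_rho = coarsen(local_rho, C);    // zero-padded to the global coarse grid
      Grid global_rho = coarse_rho;
      MPI_Allreduce(coarse_rho.v.data(), global_rho.v.data(),
                    static_cast<int>(coarse_rho.v.size()),
                    MPI_DOUBLE, MPI_SUM, comm);

      // 3. Global coarse infinite domain solve, duplicated on every rank.
      //    O((N/C)^3) work, no communication.
      Grid global_phi = solve_infinite_domain(global_rho);

      // 4. Combine neighbors' local solutions with the coarse solution to
      //    form Dirichlet boundary values (nearest-neighbor exchange elided).
      Grid bc = interpolate_boundary(global_phi, local_phi);

      // 5. Final local Dirichlet solve.  O((N/q)^3) work, no communication.
      return solve_dirichlet(local_rho, bc);
    }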
23
Computational Tradeoffs
  • Accuracy is only weakly dependent on C
  • Goal: minimize the cost of the global coarse-grid solve
  • C ≥ 2q
  • Global coarse work is less than 1/8 of the local fine work
  • O( (N/q)³ + (N/C)³ ) ≈ O( (N/q)³ )
  • For the current implementation, large C leads to
    extra local fine grid work

24
Subdomain Overlap
  • In order to ensure smooth solutions and accurate
    interpolation, the local (fine) domains need to
    overlap
  • The overlap is measured in coarse grid spacing
  • For large refinement ratios, the overlap (in
    terms of fine grid points) gets very large
  • Here we see the domain of influence of a fine
    mesh cell

25
Overheads
  • SCALLOP performs 3 solves on slightly enlarged
    local domains
  • Communication in a fixed number of steps
  • Infinite domain BC computation performs global
    communication on a reduced description of the data

26
Analytic performance model for computational
overheads
  • Let TSID(N) = the time for a serial infinite domain
    solve on an N³ mesh (step 1); the ID BCs account for
    92% of this time
  • Global coarse-grid solve: TSID(N/C), which for C ≥ 2
    is at most (1/8)TSID(N), since the cost scales as the
    cube of the mesh dimension
  • Final solve: 0.08 TSID (we don't need to
    compute the ID BCs!)
  • Total cost: roughly 1.2 TSID (1 + 1/8 + 0.08 ≈ 1.2)
  • If we could reduce the cost of the ID BC
    computation to zero, the total is at worst 2.0 TSID

27
Computational Overhead
  • Global coarse-grid solve on a grid of size
    (N/C)³: small if N/C is small relative to N/q
  • Extra computation due to overlap of the fine-grid
    domains: small if C is reasonably small
  • Two fine-grid calculations (complete solutions,
    not just smoothing steps or V-cycles):
    unavoidable, but the final Dirichlet solution is
    less costly than a full infinite domain solution

28
Limitations of Current Implementation
  • Earlier we mentioned that we require C ≥ 2q
  • For interpolation in step 4, we require a border
    of 2 coarse grid cells around each subdomain
  • To obtain those coarse values, we currently use
    the local fine grid ID solution from step 1
  • We thus require a local mesh of size Nf,G = Nf +
    4C
  • Our analytic performance model assumes that the
    local extended mesh size is Nf,G ≤ 1.2 Nf
  • But as q grows, Nf + 4C > 1.2 Nf
  • Computational work is then not strictly O(N³)

29
Limitations of Current Implementation
  • How does 1.2 Nf ≥ Nf + 4C constrain us?
  • Take, as before
  • C = 2q
  • 1.2 Nf ≥ Nf + 8q
  • 1.2 ≥ 1 + 8q/Nf
  • q ≤ Nf/40
  • For Nf = 160, q ≤ 4
  • Without some tradeoffs, we're limited to q³ = 64
    procs

30
Alternate Implementation
  • The necessary coarse grid values can be computed
    during the infinite domain boundary calculation,
    without calculating the corresponding fine grid
    values
  • Local domain sizes are kept reasonably small
  • No longer Nf,G = max(1.2 Nf, Nf + 4C),
  • just Nf,G = 1.2 Nf
  • All computational costs are strictly O(N³)
  • A new serial ID solver has been tested; the parallel
    implementation is underway

31
Limit of Parallelism
  • We are limited only by the maximum coarsening factor:
    at least 1 coarse cell per local domain.
  • Nc ≥ q, or N/C ≥ q
  • If we take C = 2q and Nf = N/q, as before,
  • Nf ≥ 2q
  • Total problem size and parallelism are now a
    function of the local memory available
  • For Nf = 128:
  • q ≤ 64, q³ = 262,144 processors

32
Experiments
  • Ran on two SP systems with Power 3 CPUs
  • NPACI's Blue Horizon
  • NERSC's Seaborg
  • Used a serial FFT solver implemented with FFTW
  • Compiled with -O2, standard environment

33
Scaled Speed-up
  • Try to maintain constant work per processor
  • Number of processors, P, proportional to
    N³, q³, C³
  • We report performance in terms of grind time
  • Ideally should be a constant

Tgrind = T / N³
34
Results - Seaborg
[Charts: grind times and communication percentage vs. processor count]
  • Grind time increases by a factor of 2.4 over a
    range of 16 - 1024 processors on Seaborg.
  • Communication takes less than 12% of the running
    time.

35
Implementation
  • SCALLOP was implemented with KeLP
  • A rapid development infrastructure for
    distributed memory machines
  • KeLP simplifies the expression of coarse-to-fine
    grid communication
  • Bookkeeping
  • Domains of dependence
  • KeLP provides useful abstractions (see the sketch
    after this list)
  • Set operations on geometric domains (FIDIL,
    BoxLib, Titanium)
  • Express communication in geometric terms
  • Separation of concerns
  • KeLP is unaware of the representation of user
    data structures
  • The user is unaware of the low-level details involved
    in moving data
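
To make the idea of set operations on geometric domains concrete, here is a
small self-contained C++ sketch in the spirit of KeLP/FIDIL/BoxLib, but not
their actual APIs: intersecting a ghost-extended box with a neighbor's box
yields exactly the regular section of data that must be communicated.

    // Minimal illustration of a geometric region calculus (not the KeLP API).
    #include <algorithm>
    #include <iostream>

    struct Box {
      int lo[3], hi[3];                       // inclusive bounds in Z^3
      bool empty() const {
        return lo[0] > hi[0] || lo[1] > hi[1] || lo[2] > hi[2];
      }
    };

    // Grow a box by g cells in every direction (its ghost/halo extent).
    Box grow(Box b, int g) {
      for (int d = 0; d < 3; ++d) { b.lo[d] -= g; b.hi[d] += g; }
      return b;
    }

    // Intersection of two boxes: the regular section they share.
    Box intersect(const Box& a, const Box& b) {
      Box r;
      for (int d = 0; d < 3; ++d) {
        r.lo[d] = std::max(a.lo[d], b.lo[d]);
        r.hi[d] = std::min(a.hi[d], b.hi[d]);
      }
      return r;
    }

    int main() {
      Box mine  = {{0, 0, 0}, {7, 7, 7}};     // my patch
      Box neigh = {{8, 0, 0}, {15, 7, 7}};    // neighbor's patch
      // Data I must receive: neighbor's cells that fall in my ghost region.
      Box recv = intersect(grow(mine, 1), neigh);
      std::cout << "receive x-range: " << recv.lo[0] << ".." << recv.hi[0] << "\n";
      return 0;                               // prints "receive x-range: 8..8"
    }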

36
The KeLP Data Motion Model
  • User defines persistent communication objects
    customized for regular section communication
  • Replace low level point-to-point messages with
    high level geometric descriptions of data
    dependences
  • Optimizations
  • Execute asynchronously to overlap with
    computation
  • Modify the dependencies with meaning-preserving
    transformations that improve performance

37
KeLP's view of communication
  • Communication exhibits collective behavior, even
    if all pairs of processors aren't communicating
  • The data dependencies have an intuitive geometric
    structure involving regular section data motion
    within a global coordinate system

38
KeLP's Structural Abstractions
  • Distributed patches of storage living in a global
    coordinate system, each with its own origin
  • Geometric meta-data describing the structure of
    blocked patches of data and of data dependences
  • A geometric calculus for manipulating the
    meta-data
  • Unit of dependence is a regular section

39
Abstract representation
  • The dependence structure and the data are
    abstract
  • KeLP doesn't say how the data are represented or
    how the data will be moved
  • The user provides rules to instantiate and
    flatten a subspace of Zⁿ

40
Examples
  • Define a grid over an irregular subset of a
    bounding rectangle (Colella and van Straalen,
    LBNL)
  • Particles
  • We might represent these internally with trees,
    hash tables, etc.
  • KeLP enforces the model that we move data lying
    within rectangular subspaces

41
Summing up Scallop
  • A philosophy for designing algorithms that
    embraces technological change
  • Sophisticated algorithms that replace (expensive)
    communication with (cheaper)
    computation
  • To develop these algorithms, we need appropriate
    infrastructure (KeLP is another talk)
  • Scaling to larger problems is underway
  • Reducing the effective cost of domain overlap
  • Reducing the cost of the infinite domain boundary
    calculation
  • Extension to adaptive mesh refinement algorithm

42
Roadmap
  • SCALLOP
  • A highly scalable, infinite domain Poisson solver
  • Written in KeLP
  • Asynchronous algorithms with Tarragon
  • Non-BSP programming model
  • Communication overlap

43
Roadmap
  • SCALLOP A highly scalable, infinite domain
    Poisson solver written in KeLP
  • Asynchronous algorithms with Tarragon
  • Communication overlap
  • Monte Carlo simulation of cell microphysiology

44
Performance Robustness in the presence of
technological change
  • The recipe for writing high-quality application
    software changes over time
  • Either the application must be capable of
    responding to change
  • Or it will have to be reformulated
  • We've just looked at a numerical technique for
    dealing with approach 2
  • Now let's consider a non-numerical approach
  • Application: overlapping communication with
    computation

45
Canonical variants
  • Many techniques are aimed at enhancing memory
    locality within a single address space
  • ATLAS [Dongarra et al. 98], PHiPAC [Demmel et
    al. 96], Sparsity [Demmel & Yelick 99], FFTW
    [Frigo & Johnson 98]
  • Architectural cognizance [Gatlin & Carter 99]
  • DESOBLAS [Beckmann and Kelly, LCPC 99]: delayed
    evaluation of task graphs
  • But the rising cost of data transfer is also a
    concern
  • We'll explore a canonical variant for overlapping
    computation with interprocessor communication in
    MIMD architectures

46
What's difficult about hiding communication?
  • The programmer must hard code the overlap
    technique into the application software
  • The required knowledge is beyond the experience
    of many application programmers
  • The specific technique is sensitive to the
    technology and the application, hence the code is
    not robust

47
Motivating application
  • Iterative solver for Poisson's equation in 3
    dimensions
  • Jacobi's method, 7-pt stencil
  • for (i,j,k) in 1..N x 1..N x 1..N
  •   u'[i,j,k] = ( u[i-1,j,k] + u[i+1,j,k] +
  •                 u[i,j-1,k] + u[i,j+1,k] +
  •                 u[i,j,k+1] + u[i,j,k-1] ) / 6

48
Traditional SPMD implementation
  • Decompose the domain into subregions, one per
    process
  • Transmit halo regions between processes
  • Compute the inner region after communication
    completes (see the MPI sketch below)
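
A minimal MPI sketch of this traditional pattern, assuming a 1-D slab
decomposition of the N³ mesh (illustrative only, not the KeLP code): halo
planes are exchanged with blocking calls, and the Jacobi update runs only
after communication completes.

    // Traditional SPMD Jacobi step: halo exchange first, then compute.
    #include <mpi.h>
    #include <vector>

    void jacobi_step(std::vector<double>& u, std::vector<double>& unew,
                     int nx, int N, MPI_Comm comm) {
      int rank, nprocs;
      MPI_Comm_rank(comm, &rank);
      MPI_Comm_size(comm, &nprocs);
      int up = (rank + 1 < nprocs) ? rank + 1 : MPI_PROC_NULL;
      int dn = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
      int plane = (N + 2) * (N + 2);             // one i-plane, including ghosts
      auto idx = [&](int i, int j, int k) { return (i * (N + 2) + j) * (N + 2) + k; };

      // 1. Transmit halo planes between neighboring processes (blocking).
      MPI_Sendrecv(&u[idx(nx, 0, 0)], plane, MPI_DOUBLE, up, 0,
                   &u[idx(0, 0, 0)],  plane, MPI_DOUBLE, dn, 0, comm, MPI_STATUS_IGNORE);
      MPI_Sendrecv(&u[idx(1, 0, 0)],  plane, MPI_DOUBLE, dn, 1,
                   &u[idx(nx + 1, 0, 0)], plane, MPI_DOUBLE, up, 1, comm, MPI_STATUS_IGNORE);

      // 2. Compute only after communication completes (no overlap).
      for (int i = 1; i <= nx; ++i)
        for (int j = 1; j <= N; ++j)
          for (int k = 1; k <= N; ++k)
            unew[idx(i, j, k)] = (u[idx(i - 1, j, k)] + u[idx(i + 1, j, k)] +
                                  u[idx(i, j - 1, k)] + u[idx(i, j + 1, k)] +
                                  u[idx(i, j, k + 1)] + u[idx(i, j, k - 1)]) / 6.0;
    }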

49
Multi-tier Computers
  • High opportunity cost of communication
  • Hierarchical organization amplifies node
    performance relative to the interconnect
  • Trends: more processors per node, faster
    processors
  • r∞ = DGEMM floating point rate per node, MFLOP/s
  • β∞ = peak pt-to-pt MPI message BW, MBYTE/s
  • IBM SP2/Power2SC: r∞ = 640, β∞ =
    100
  • NPACI Blue Horizon: r∞ = 14,000, β∞ = 400
  • NPACI Data Star: r∞ = 48,000, β∞ ≈
    800

50
Overlapped variant
  • Reformulate the algorithm
  • Isolate the inner region from the halo
  • Execute communication concurrently with
    computation on the inner region
  • Compute on the annulus when the halo finishes
    (see the sketch below)
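
The same step reorganized for overlap, again as an illustrative MPI sketch
with a 1-D slab decomposition (not the KeLP code): nonblocking sends and
receives are posted first, the inner planes that do not depend on the halo are
updated while the messages are in flight, and the two boundary planes are
updated after the wait. The KeLP2 version on the next slide expresses the same
pattern at a higher level.

    // Overlapped Jacobi step: nonblocking halo exchange runs concurrently
    // with the inner-region update; the annulus is computed afterwards.
    #include <mpi.h>
    #include <vector>

    void jacobi_step_overlap(std::vector<double>& u, std::vector<double>& unew,
                             int nx, int N, MPI_Comm comm) {
      int rank, nprocs;
      MPI_Comm_rank(comm, &rank);
      MPI_Comm_size(comm, &nprocs);
      int up = (rank + 1 < nprocs) ? rank + 1 : MPI_PROC_NULL;
      int dn = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
      int plane = (N + 2) * (N + 2);
      auto idx = [&](int i, int j, int k) { return (i * (N + 2) + j) * (N + 2) + k; };
      auto update = [&](int i, int j, int k) {
        unew[idx(i, j, k)] = (u[idx(i - 1, j, k)] + u[idx(i + 1, j, k)] +
                              u[idx(i, j - 1, k)] + u[idx(i, j + 1, k)] +
                              u[idx(i, j, k + 1)] + u[idx(i, j, k - 1)]) / 6.0;
      };

      // Start the halo exchange asynchronously.
      MPI_Request req[4];
      MPI_Irecv(&u[idx(0, 0, 0)],      plane, MPI_DOUBLE, dn, 0, comm, &req[0]);
      MPI_Irecv(&u[idx(nx + 1, 0, 0)], plane, MPI_DOUBLE, up, 1, comm, &req[1]);
      MPI_Isend(&u[idx(nx, 0, 0)],     plane, MPI_DOUBLE, up, 0, comm, &req[2]);
      MPI_Isend(&u[idx(1, 0, 0)],      plane, MPI_DOUBLE, dn, 1, comm, &req[3]);

      // Update the inner region, which does not touch the halo planes.
      for (int i = 2; i <= nx - 1; ++i)
        for (int j = 1; j <= N; ++j)
          for (int k = 1; k <= N; ++k) update(i, j, k);

      // Wait for the halo, then update the annulus (planes i = 1 and i = nx).
      MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
      for (int j = 1; j <= N; ++j)
        for (int k = 1; k <= N; ++k) { update(1, j, k); update(nx, j, k); }
    }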

51
Overlapped code (KeLP2)
  • Relax(Distributed_Data X, Mover Communication)
  •   Communication.start()   // begin the halo exchange asynchronously
  •   for each subdomain x in X
  •     Update x              // inner-region work overlaps communication
  •   Communication.wait()    // halo data has now arrived
  •   // Repeat the update over the annulus
  • Implemented with KeLP2 [Fink 98, SC99]
  • KeLP2 implements a message proxy to realize
    overlap
  • It also provides hierarchical control flow

52
Performance on 8 nodes of Blue Horizon
With KeLP2 [Fink 98, SC99]
[Bar chart: running times for the HAND, ST, MT(8), MTV(7), and MTV(7) OPT
variants; values shown include 732, 713, 655, and 626 (14)]
53
Observations
  • We had to hard code the overlap strategy as well
    as the parallel control flow into the application
  • Split-phase communication, scheduling,
    complicated partitioning
  • Optimal ordering of communication and computation
    varies across generational changes in technology
  • The characteristic communication delays increase
    relative to that of computation
  • The costs may be irregular
  • The hard coded strategy imposes unnecessary
    constraints
  • Computation and communication are partially
    ordered if you decrease the granularity of the
    computation
  • Applications are rich in potential parallelism

54
Tarragon: an alternative approach
  • Testbed for exploring communication tolerant
    algorithms
  • Asynchronous task graph model of execution
  • Data-driven: departs from the traditional bulk
    synchronous model
  • Communication and computation do not execute as
    distinct phases but are coupled activities
  • Tolerate unpredictable or irregular task and
    communication latencies

55
Data driven execution
  • Overdecompose the problem so that each process
    owns several tasks
  • Construct a task graph indicating the data
    dependences
  • A task suspends until the required communication
    completes, at which point the task is runnable
  • The Tarragon run time system schedules runnable tasks
    according to the flow of data in the task graph
    (see the sketch below)
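
A minimal single-node C++ sketch of this data-driven pattern (illustrative,
not the Tarragon API): each graph vertex carries a count of inputs it is still
missing, delivering an output decrements its consumers' counts, and a
dispatcher loop runs whatever has become runnable.

    // Toy data-driven task execution: tasks run when all inputs have arrived.
    #include <functional>
    #include <iostream>
    #include <queue>
    #include <vector>

    struct Task {
      std::function<void()> run;     // the computation at this graph vertex
      std::vector<int> consumers;    // edges: tasks that depend on our output
      int missing_inputs = 0;        // inputs not yet delivered
    };

    void execute(std::vector<Task>& graph) {
      std::queue<int> ready;
      for (int t = 0; t < (int)graph.size(); ++t)
        if (graph[t].missing_inputs == 0) ready.push(t);    // initially runnable

      while (!ready.empty()) {
        int t = ready.front(); ready.pop();
        graph[t].run();                                      // run a runnable task
        for (int c : graph[t].consumers)                     // "deliver" its output
          if (--graph[c].missing_inputs == 0) ready.push(c); // consumer now runnable
      }
    }

    int main() {
      // Two producers feeding one consumer: 0 -> 2 and 1 -> 2.
      std::vector<Task> g(3);
      g[0].run = []{ std::cout << "task 0\n"; }; g[0].consumers = {2};
      g[1].run = []{ std::cout << "task 1\n"; }; g[1].consumers = {2};
      g[2].run = []{ std::cout << "task 2\n"; }; g[2].missing_inputs = 2;
      execute(g);
      return 0;
    }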

56
Tarragon in Context
  • Data driven techniques used in DataFlow,
    databases and data intensive applications (Data
    Cutter, ADR)
  • Charm Kale 93
  • Parallelism expressed across object collections
    by making remote method invocations (message
    passing)
  • Global name space
  • Tarragon
  • Functions operate on local data only; data motion
    is explicit
  • Tune performance by adjusting task granularity
    and by decorating the graph with performance
    metadata

57
Tarragon API
  • We express parallelism in an abstract form
  • A task graph describes the partial ordering of
    tasks
  • Vertices → computation
  • Edges → dependences
  • A background thread called the mover-dispatcher
    provides available tasks and processes completions

58
The Mover-Dispatcher
  • Processes incoming and outgoing communication
  • Determines when tasks are ready
  • Calls a scheduler to determine the order of ready
    task execution
  • Labels on the taskGraph guide the scheduling
    process
  • Processes completions
  • The completion handler is a user-defined callback
    that invokes single-sided communication

59
A look inside the Run Time System
  • E: execution engine
  • M: mover/dispatcher

[Diagram: two nodes, NA and NB, each with several execution engines (E), a
mover/dispatcher (M), and Run, Rdy, and Done task queues]
60
Benefits
  • Tolerate unpredictable or irregular latencies at
    different scales
  • Communication and computation are coupled
    activities rather than distinct phases
  • Tune slackness to improve communication
    pipelining
  • Flexible Scheduling
  • Schedulers may be freely substituted, and may be
    application-specific [AppLeS, Berman]
  • Performance meta data enable us to alter the
    execution order without having to change the
    scheduler
  • Run time system optimizes execution ordering
    without entailing heroic reprogramming

61
Slackness
  • Multiple tasks per processing module
  • Improve communication pipelining: communication
    occurs incrementally and in parallel with
    computation
  • Tolerate irregular communication delays
  • Treat load balancing as a scheduling activity
    (migration)

62
First steps
  • KeLP2: 4 applications formulated for overlap
    [Fink and Baden 1997, Baden and Fink 1998]
  • Quantum KeLP: F. David Sacerdoti (MS 02,
    SIAM 03)
  • Overdecomposed workload
  • Load balancer migrates work grains between
    processors

63
Summary
  • Asynchronous task graph execution model
  • Non-bulk-synchronous execution model
  • communication and computation are coupled
    activities rather than distinct phases
  • Tolerate unpredictable or irregular task and
    communication latencies
  • Performance meta data decorate the graph to
    provide scheduling hints
  • Generalizations
  • Very long latencies (on the order of
    milliseconds)
  • Application coupling
  • Incorporate dynamic data sources into ongoing
    computations

64
Roadmap
  • SCALLOP A highly scalable, infinite domain
    Poisson solver written in KeLP
  • Asynchronous algorithms with Tarragon
  • Communication overlap
  • Monte Carlo simulation of cell microphysiology

65
MCELL
  • Monte Carlo simulator of cellular microphysiology
  • Biochemical reaction dynamics in realistic 3D
    microenvironments
  • Brownian dynamics of individual molecules and
    their chemical interactions
  • Developed at the Salk Institute and Pittsburgh
    Supercomputing Center by Tom Bartol and Joel
    Stiles
  • 100 users (first released in 1997)
  • MCell-K: a parallel variant implemented with KeLP
    (with Tom Bartol and Terrence Sejnowski, Salk)

66
Cell microphysiology simulation
  • Collaboration with Tom Bartol, Tilman Kispersky,
    Terrence Sejnowski (Salk Institute), Joel Stiles
    (PSC)
  • MCell: a general Monte Carlo simulator of
    cellular microphysiology
  • Brownian dynamics: random walk of individual
    molecules and chemical interactions
  • 100 users (first released in 1997)
  • MCell-K: a parallel variant implemented with KeLP

67
Motivating application
  • Cerebellar Glomerulus
  • 2 CPU-months on a single processor
  • 24 GB of RAM
  • 20 million Ca2+ ions, 10 million polygons
  • With serial MCell
  • Run 1/8 of the domain of the problem on a single
    processor
  • Reduced resolution
  • Scalable KeLP version MCell-K
  • Running on up to 128 processors on Blue Horizon
  • Collaboration involving Greg Balls (UCSD),
    Srinivas Turaga, Tilman Kispersky (UCSD/Salk),
    Tom Bartol (Salk), Terry Sejnowski (Salk)

68
Animation
  • Simulation of a chick ciliary ganglion synapse
  • A real-world problem
  • 400,000 polygons in the surface
  • Approximately 40,000 molecules diffusing
  • Approximately 500,000 surface receptors

69
Chick ciliary ganglion synapse
Receptors
Chick ciliary ganglion synapse courtesy of Darwin
Berg, Jay Coggan, Mark Ellisman, Eduardo
Esquenazi, Terry Sejnowski, Tom Bartol
70
Chick ciliary ganglion synapse
Ligands
71
Diffusion and Interactions
  • Ligands: neurotransmitter molecules
  • Bind to sites under constraints
  • Bounce off of surfaces
  • Uneven distributions in space and time

release sites
72
On a parallel computer
  • Partition boundary splits up the problem over
    multiple processors
  • As ligands cross a processor boundary, we color
    them yellow

Bound ligands
73
Animation
74
Movie
75
Issues in Parallelization
  • Particles move over a sequence of timesteps
  • React with embedded 2D surfaces - cell membranes
  • Processor boundaries introduce uncertainties in
    handling communication and the need to detect
    termination
  • A particle may bounce among processors owning
    nearby regions of space

76
Two questions
  • How do we know when the current timestep has
    completed?
  • How and when do we transmit particles among
    processors?

77
Parallelization Strategy
  • To detect termination we divide each timestep
    into sub-timesteps
  • We continue to the next time step only when there
    are no more ligands to update or communicate
    (see the sketch below)
  • Currently implemented with a barrier
  • Aggregate communication of ligands to amortize
    message starts
  • Buffers and message lengths are scaled automatically
  • Uniform static decomposition
  • Work on dynamic load balancing is underway
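
A minimal MPI sketch of the sub-timestep loop (illustrative, not the MCell-K
code; advance_local_ligands and exchange_boundary_ligands are hypothetical
helpers left as declarations): each sub-timestep ends with a global reduction,
and the timestep completes only when no process has ligands left to update or
communicate.

    // Termination detection for one timestep via repeated sub-timesteps.
    #include <mpi.h>

    // Hypothetical helpers: advance local ligands, returning how many still
    // need processing, and exchange ligands that crossed subdomain boundaries.
    long advance_local_ligands();
    long exchange_boundary_ligands();

    void do_timestep(MPI_Comm comm) {
      while (true) {                               // sub-timesteps
        long local_pending = advance_local_ligands();
        local_pending += exchange_boundary_ligands();

        long global_pending = 0;                   // acts as the synchronization point
        MPI_Allreduce(&local_pending, &global_pending, 1, MPI_LONG, MPI_SUM, comm);
        if (global_pending == 0) break;            // timestep complete everywhere
      }
    }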

78
Software infrastructure: Abstract KeLP
  • A rapid development infrastructure for
    distributed memory machines
  • Implemented as a C++ class library layered on MPI
  • Communication orchestration
  • Manage communication in terms of geometric set
    operations
  • KeLP doesn't need to know how the user
    represents application data structures; the user
    doesn't need to know about the low-level details of
    moving data
  • User-defined container classes (see the sketch below)
  • Wrote a special-purpose molecule class
  • Callbacks to handle data packing and unpacking
  • Simple interface
  • Clean separation of parallelism from other code
  • Small change from the original serial code
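
A hypothetical sketch of such a user-defined container (not the actual
MCell-K/KeLP molecule class): the pack callback serializes outgoing molecules
into a flat buffer that the communication layer can ship without knowing the
representation, and the unpack callback appends received molecules.

    // Illustrative molecule container with pack/unpack callbacks.
    #include <cstring>
    #include <vector>

    struct Molecule { double x, y, z; int species; };

    class MoleculeContainer {
    public:
      void add(const Molecule& m) { mols_.push_back(m); }

      // Pack callback: serialize outgoing molecules into a flat byte buffer.
      std::vector<char> pack() const {
        std::vector<char> buf(mols_.size() * sizeof(Molecule));
        std::memcpy(buf.data(), mols_.data(), buf.size());
        return buf;
      }

      // Unpack callback: append molecules received from another process.
      void unpack(const std::vector<char>& buf) {
        size_t n = buf.size() / sizeof(Molecule);
        size_t old = mols_.size();
        mols_.resize(old + n);
        std::memcpy(mols_.data() + old, buf.data(), n * sizeof(Molecule));
      }

    private:
      std::vector<Molecule> mols_;
    };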

79
Computational results
  • Chick ciliary ganglion
  • 400k surface triangles
  • 192 release sites (max of 550)
  • Each site releases 5000 ligands at t = 0 (960k
    total)
  • 2500 time steps
  • Persistent ligand case
  • Enzymes that destroy ligands are made less
    effective
  • Most ligands are present at the end of the
    simulation
  • Report summary statistics in epochs of 100 time
    steps
  • Ran on NPACI Blue Horizon with 16, 32, and 64 processors

80
Performance on NPACI Blue Horizon
  • Running times scale well

81
Parallel Efficiency
  • Communication costs for this algorithm are low on
    Blue Horizon
  • Communicating a few thousand molecules
  • An all-reduce takes a few hundred µs for 64 procs
  • Each time step requires 1 s of computation

82
Performance Prediction
  • Running times are predicted well by maximum
    ligands per processor

83
Load imbalance
  • Maximum load close to 2x average load.

84
Uneven workload distributions
  • Loads vary significantly, dynamically

85
Load Balancing
  • 1 release site, 10,000 molecules
  • 8 simulated processors

86
Load Balancing - Ganglion
  • 18 release sites, 1000 molecules each
  • 8 simulated processors

87
The Future
  • Ligands may bounce across processor boundaries
  • Detecting termination is expensive
  • Lost opportunities due to load imbalance
  • Motivates asynchronous execution and novel
    scheduling
  • New project: Tarragon, NSF ITR

Wire frame view of rat diaphragm synapse courtesy
Tom Bartol and Joel Stiles
88
Non-BSP programming with Tarragon
  • Tarragon employs a task graph model of execution
  • Couples task completion with communication
  • Tolerates unpredictable or irregular task and
    communication latencies
  • Different from traditional BSP programming
  • Arriving data triggers computation
  • Task completion triggers communication
  • Testbed for exploring communication tolerant
    algorithms (linear algebra, data assimilation)

89
Asynchronous computation with Tarragon
  • ITR: Asynchronous execution for scalable
    simulation of cell physiology
  • Cleaner treatment of migrating particles
  • Change owners dynamically
  • Avoid sub-timestepping, which exacerbates load
    imbalance
  • Many-to-one task assignments
  • Automated load balancing via workload migration
  • Finer-grained intermittent communication

90
Current and Future Work
  • Asynchronous computation
  • Large scale simulations
  • Load balancing
  • Predictive modeling (U. Rao Venkata)
  • Parameter sweep

91
Dynamic data driven applications
  • Using Tarragon's data-driven programming model,
    we can couple external data sources into ongoing
    computation
  • Work in progress, 2 applications
  • Dynamic clamping of neurons (Bartol, Sejnowski,
    Salk)
  • Feedback MCell simulations of neural
    microphysiology into living neurons in vitro via
    patch clamping
  • Living and simulated neurons are (virtually) part
    of the same circuit
  • Interactive ray tracing of dynamic scenes (H.
    Jensen, UCSD)
  • Change the lighting, scene, camera angle, ...
  • Need an interactive feel

92
Dynamic scenes
  • Original scene (courtesy Henrik Jensen)

93
Dynamic scenes
  • Adding an object to the scene

94
Dynamic scenes
  • Changing the camera angle

95
Conclusions and the Future
  • We've seen two techniques for coping with
    technological change
  • SCALLOP: a new numerical algorithm
  • Tarragon: a new execution model (partitioning and
    scheduling)
  • Each technique requires appropriate
    infrastructure to contend with the rising costs
    of data motion
  • Generalizations
  • Very long latencies (on the order of
    milliseconds)
  • Application coupling
  • Incorporate dynamic data sources into ongoing
    computation
  • What are the roles of libraries and programming
    languages?

96
Conclusions
  • We've looked at an asynchronous data-driven
    programming model with motivating applications
  • Communication tolerance
  • Dynamic data-driven applications that couple
    simulations with the real world
  • An appropriate programming model simplifies the
    design
  • Scheduling is important, too
  • What are the roles of libraries and programming
    languages?


98
Acknowledgements and support
  • Support
  • NSF ACI-0326013, ACI-9619020, IBN-9985964
  • Howard Hughes Medical Institute
  • University of California, San Diego
  • San Diego Supercomputer Center (SDSC)
  • Cal-(IT)2
  • DoE (ISCR, CASC)
  • EPSRC (visits to Imperial College, UK)
  • Papers and software: http://www-cse.ucsd.edu/groups/hpcl/scg/

99
Technology transitions
  • The KeLP technology is employed in the CHOMBO
    structured adaptive mesh refinement (SAMR)
    infrastructure (P. Colella, LBNL)
  • The technology is also employed in the SAMRAI
    infrastructure for SAMR (S. Kohn, R. Horning,
    LLNL)

100
Applications
  • MCell: cell microphysiology (T. Sejnowski, T.
    Bartol (Salk), J. R. Stiles (PSC))
  • First-principles simulations of real materials
    using structured adaptive mesh refinement (J.
    Weare et al.)
  • Mortar space method for subsurface modeling (M.
    F. Wheeler, TICAM); production code called
    UTPROJ3D
  • Data management
  • Compression in direct numerical simulation of
    turbulence (K. K. Nomura, P. Diamessis, W.
    Kerney)
  • Querying structured adaptive mesh refinement
    datasets (J. Saltz, T. Kurc, OSU; P. Colella,
    LBNL)
  • KeLP I/O: target for the telescoping compiler (B.
    Broom, R. Fowler, K. Kennedy, Rice)

101
A cast of many
  • Scott Kohn
  • Stephen J. Fink
  • Frederico Sacerdoti
  • Daniel Shalit
  • Urvashi Rao Venkata
  • Jake Sorensen
  • Pietro Cicotti
  • Paul Kelly (Imperial College)