Title: Technologies and Tools for High-Performance Distributed Computing
1. Scientific Discovery through Advanced Computing (SciDAC)
The Pennsylvania State University, 28 April 2003
David E. Keyes
Center for Computational Science, Old Dominion University
Institute for Scientific Computing Research, Lawrence Livermore National Laboratory
2. Happy Gödel's Birthday!
- Born 28 April 1906, Brünn, Austria-Hungary
- Published the Incompleteness Theorem, 1931
- Fellow, Royal Society, 1968
- National Medal of Science, 1974
- Died 14 January 1978, Princeton, NJ
"Gave a formal demonstration of the inadequacy of formal demonstrations." - anon.
"A consistency proof for any system can be carried out only by modes of inference that are not formalized in the system itself." - Kurt Gödel
3. Remarks
- This talk is
  - a personal perspective, not an official statement of the U.S. Department of Energy
  - a project panorama more than a technical presentation
- For related technical presentations
  - Tuesday 2:30 pm, 116 McAllister Building
- Personal homepage on the web (www.math.odu.edu/keyes)
- SciDAC project homepage on the web (www.tops-scidac.org)
4. Computational Science & Engineering
- A multidiscipline on the verge of full bloom
  - Envisioned by von Neumann and others in the 1940s
  - Undergirded by theory (numerical analysis) for the past fifty years
  - Empowered by spectacular advances in computer architecture over the last twenty years
  - Enabled by powerful programming paradigms in the last decade
- Adopted in industrial and government applications
  - Boeing 777's computational design a renowned milestone
  - DOE NNSA's ASCI (motivated by CTBT)
  - DOE SC's SciDAC (motivated by Kyoto, etc.)
5. Niche for computational science
- Has theoretical aspects (modeling)
- Has experimental aspects (simulation)
- Unifies theory and experiment by providing a common immersive environment for interacting with multiple data sets of different sources
- Provides universal tools, both hardware and software
  - Telescopes are for astronomers, microarray analyzers are for biologists, spectrometers are for chemists, and accelerators are for physicists, but computers are for everyone!
- Costs going down, capabilities going up every year
6-12. Terascale simulation has been sold
[One slide shown as successive animation builds: a collage of application areas annotated with reasons that experiments alone do not suffice.]
- Applied Physics: radiation transport, supernovae
- Environment: global climate, contaminant transport
- Annotations: experiments prohibited or impossible, dangerous, difficult to instrument, controversial, expensive
In these, and many other areas, simulation is an important complement to experiment.
However (final build), simulation is far from proven! To meet expectations, we need to handle problems of multiple physical scales.
13.
- Enabling technologies groups to develop reusable software and partner with application groups
- Since start-up in 2001, 51 projects share $57M per year
  - Approximately one-third for applications
  - A third for integrated software infrastructure centers
  - A third for grid infrastructure and collaboratories
- Plus, two new 10 Tflop/s IBM SP machines available for SciDAC researchers
14. SciDAC project characteristics
- Affirmation of importance of simulation
  - for new scientific discovery, not just for fitting experiments
- Recognition that leading-edge simulation is interdisciplinary
  - no independent support for physicists and chemists to write their own software infrastructure; must collaborate with math/CS experts
- Commitment to distributed hierarchical memory computers
  - new code must target this architecture type
- Requirement of lab-university collaborations
  - complementary strengths in simulation
  - 13 laboratories and 50 universities in first round of projects
15. Major DOE labs
[Map of laboratory sites, with Old Dominion University marked]
16. Large platforms provided for ASCI
- ASCI roadmap is to go to 100 Teraflop/s by 2006
- Use variety of vendors
  - Compaq
  - Cray
  - Intel
  - IBM
  - SGI
- Rely on commodity processor/memory units, with tightly coupled network
- Massive software project to rewrite physics codes for distributed shared memory
17. ... and now for SciDAC
Berkeley:
- IBM Power3 SMP
- 16 procs per node
- 208 nodes
- 24 Gflop/s per node
- 5 Tflop/s (upgraded to 10, Feb 2003)
Oak Ridge:
- IBM Power4 Regatta
- 32 procs per node
- 24 nodes
- 166 Gflop/s per node
- 4 Tflop/s (10 in 2003)
(A quick arithmetic check of these peaks appears below.)
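As a sanity check on the quoted peaks (node counts and per-node rates are from the slide; the multiplication is mine):
\[ 208 \text{ nodes} \times 24~\mathrm{Gflop/s} \approx 5.0~\mathrm{Tflop/s}, \qquad 24 \text{ nodes} \times 166~\mathrm{Gflop/s} \approx 4.0~\mathrm{Tflop/s}. \]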
18. New architecture on horizon: QCDOC
- System-on-a-chip architecture
- Designed for Columbia University and Brookhaven National Lab by IBM, using Power technology
- Special-purpose machine for Lattice Gauge Theory Quantum Chromodynamics
  - very fast conjugate gradient machine with small local memory
- 10 Tflop/s total; copies ordered for UK and Japan QCD research groups
To be delivered August 2003
19. New architecture on horizon: Blue Gene/L
- 180 Tflop/s configuration (65,536 dual-processor chips)
- Closely related to QCDOC prototype (IBM system-on-a-chip)
- Ordered for LLNL institutional computing (not ASCI)
To be delivered 2004
20. New architecture just arrived: Cray X1
- Massively parallel-vector machine highly desired by the global climate simulation community
- 32-processor prototype ordered for evaluation
- Scale-up to 100 Tflop/s peak planned, if prototype proves successful
Delivered to ORNL 18 March 2003
21. Boundary conditions from architecture
- Algorithms must run on physically distributed
memory units connected by message-passing
network, each serving one or more processors with
multiple levels of cache
22. Following the platforms
- Algorithms must be
  - highly concurrent and straightforward to load balance
  - not communication bound
  - cache friendly (temporal and spatial locality of reference)
  - highly scalable (in the sense of convergence)
- Goal for algorithmic scalability: fill up the memory of arbitrarily large machines while preserving nearly constant (or logarithmically growing) running times with respect to a proportionally smaller problem on one processor (a sketch of this weak-scaling criterion follows)
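One way to write this goal down (my formalization, not from the slide): let T(N, P) be the run time for a problem of size N on P processors, and grow the problem with the machine, N = P N_0. The weak-scaling efficiency
\[ E(P) \;=\; \frac{T(N_0, 1)}{T(P N_0, P)} \]
should stay near 1 (or degrade no worse than logarithmically in P), which requires both the parallel overhead per step and the number of solver iterations to be nearly independent of P.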
23. Official SciDAC goals
- Create a new generation of scientific simulation codes that take full advantage of the extraordinary computing capabilities of terascale computers.
- Create the mathematical and systems software to enable the scientific simulation codes to effectively and efficiently use terascale computers.
- Create a collaboratory software environment to enable geographically separated scientists to effectively work together as a team and to facilitate remote access to both facilities and data.
24. Four science programs involved
14 projects will advance the science of climate
simulation and prediction. These projects involve
novel methods and computationally efficient
approaches for simulating components of the
climate system and work on an integrated climate
model.
10 projects will address quantum chemistry and
fluid dynamics, for modeling energy-related
chemical transformations such as combustion,
catalysis, and photochemical energy conversion.
The goal of these projects is efficient
computational algorithms to predict complex
molecular structures and reaction rates with
unprecedented accuracy.
25. Four science programs involved
4 projects in high energy and nuclear physics
will explore the fundamental processes of nature.
The projects include the search for the explosion
mechanism of core-collapse supernovae,
development of a new generation of accelerator
simulation codes, and simulations of quantum
chromodynamics.
5 projects are focused on developing and
improving the physics models needed for
integrated simulations of plasma systems to
advance fusion energy science. These projects
will focus on such fundamental phenomena as
electromagnetic wave-plasma interactions, plasma
turbulence, and macroscopic stability of
magnetically confined plasmas.
26. SciDAC per-year portfolio: $57M
[Pie chart of the allocation; the highlighted share is for Math, Information and Computer Sciences]
27. Data grids and collaboratories
- National data grids
  - Particle physics grid
  - Earth system grid
  - Plasma physics for magnetic fusion
  - DOE Science Grid
- Middleware
  - Security and policy for group collaboration
  - Middleware technology for science portals
- Network research
  - Bandwidth estimation, measurement methodologies and application
  - Optimizing performance of distributed applications
  - Edge-based traffic processing
  - Enabling technology for wide-area data-intensive applications
28. Computer Science ISICs
- Scalable Systems Software
  - Provide software tools for management and utilization of terascale resources.
- High-end Computer System Performance: Science and Engineering
  - Develop a science of performance prediction based on concepts of program signatures, machine signatures, detailed profiling, and performance simulation, and apply to complex DOE applications. Develop tools that assist users to engineer better performance.
- Scientific Data Management
  - Provide a framework for efficient management and data mining of large, heterogeneous, distributed data sets.
- Component Technology for Terascale Software
  - Develop software component technology for high-performance parallel scientific codes, promoting reuse and interoperability of complex software, and assist application groups to incorporate component technology into their high-value codes.
29-31. Applied Math ISICs
- Terascale Simulation Tools and Technologies
  - Develop framework for use of multiple mesh and discretization strategies within a single PDE simulation. Focus on high-quality hybrid mesh generation for representing complex and evolving domains, high-order discretization techniques, and adaptive strategies for automatically optimizing a mesh to follow moving fronts or to capture important solution features.
- Algorithmic and Software Framework for Partial Differential Equations
  - Develop framework for PDE simulation based on locally structured grid methods, including adaptive meshes for problems with multiple length scales; embedded boundary and overset grid methods for complex geometries; and efficient and accurate methods for particle and hybrid particle/mesh simulations.
- Terascale Optimal PDE Simulations
  - Develop an integrated toolkit of near-optimal-complexity solvers for nonlinear PDE simulations. Focus on multilevel methods for nonlinear PDEs, PDE-based eigenanalysis, and optimization of PDE-constrained systems. Packages sharing the same distributed data structures include adaptive time integrators for stiff systems, nonlinear implicit solvers, optimization, linear solvers, and eigenanalysis.
32. Exciting time for enabling technologies
SciDAC application groups have been chartered to
build new and improved COMMUNITY CODES. Such
codes, such as NWCHEM, consume hundreds of
person-years of development, run at hundreds of
installations, are given large fractions of
community compute resources for decades, and
acquire an authority that can enable or limit
what is done and accepted as science in their
respective communities. Except at the beginning,
it is difficult to promote major algorithmic
ideas in such codes, since change is expensive
and sometimes resisted.
ISIC groups have a chance, due to the
interdependence built into the SciDAC program
structure, to simultaneously influence many of
these codes, by delivering software incorporating
optimal algorithms that may be reused across many
applications. Improvements driven by one
application will be available to all. While
they are building community codes, this is our
chance to build a CODE COMMUNITY!
33. SciDAC themes
- Chance to do community codes right
- Meant to set new paradigm for other DOE programs
  - new 2003 nanoscience modeling initiative
  - possible new 2004 fusion simulation initiative
- Cultural barriers to interdisciplinary research acknowledged up front
- Accountabilities constructed in order to force the mixing of scientific cultures (physicists/biologists/chemists/engineers with mathematicians/computer scientists)
34. Opportunity: nanoscience modeling
- July 2002 report to DOE
- Proposes a $5M/year theory and modeling initiative to accompany the existing $50M/year experimental initiative in nanoscience
- Report lays out research in numerical algorithms and optimization methods on the critical path to progress in nanotechnology
35. Opportunity: integrated fusion modeling
- December 2002 report to DOE
- Currently DOE supports 52 codes in Fusion Energy Sciences
- US contribution to ITER will "major" in simulation
- Initiative proposes to use advanced computer science techniques and numerical algorithms to improve the US code base in magnetic fusion energy and allow codes to interoperate
36. What's new in SciDAC library software?
- Philosophy of library usage
  - large codes interacting as peer applications, with complex calling patterns (e.g., physics code calls implicit solver code, which calls a subroutine automatically generated from the original physics code to supply the Jacobian of the physics code's residual)
  - extensibility
  - polyalgorithmic adaptivity
- Resources for development, long-term maintenance, and support
  - not just for dissertation-scope ideas
- Experience on terascale computers
37. Introducing the Terascale Optimal PDE Simulations (TOPS) ISIC
Nine institutions, $17M, five years, 24 co-PIs
38. [Program diagram: 34 apps groups (BER, BES, FES, HENP), 7 ISIC groups (4 CS, 3 Math), and 10 grid/data collaboratory groups, connected through software integration and performance optimization]
39. Who we are
the PETSc and TAO people
the Hypre and Sundials people
the SuperLU and PARPACK people
as well as the builders of other widely used
packages
40. Plus some university collaborators
Demmel et al.
Manteuffel et al.
Dongarra et al.
Ghattas et al.
Widlund et al.
Keyes et al.
Our DOE lab collaborations predate SciDAC by many
years.
41-42. You may know the on-line Templates guides
- www.netlib.org/templates (124 pp.)
- www.netlib.org/etemplates (410 pp.)
These are good starts, but not adequate for SciDAC scales!
43. Scope for TOPS
- Design and implementation of solvers
  - Time integrators (w/ sensitivity analysis)
  - Nonlinear solvers
  - Optimizers (w/ sensitivity analysis)
  - Linear solvers
  - Eigensolvers
- Software integration
- Performance optimization
44. The power of optimal algorithms
- Advances in algorithmic efficiency rival advances in hardware architecture
- Consider Poisson's equation on a cube of size N = n^3
- If n = 64, this implies an overall reduction in flops of about 16 million (worked out below)

Year  Method        Reference                 Storage  Flops
1947  GE (banded)   Von Neumann & Goldstine   n^5      n^7
1950  Optimal SOR   Young                     n^3      n^4 log n
1971  CG            Reid                      n^3      n^3.5 log n
1984  Full MG       Brandt                    n^3      n^3
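The factor of 16 million follows directly from the table (the arithmetic, spelled out):
\[ \frac{\text{flops(banded GE)}}{\text{flops(full multigrid)}} \;=\; \frac{n^{7}}{n^{3}} \;=\; n^{4} \;=\; 64^{4} \;=\; 2^{24} \;\approx\; 1.7\times 10^{7} . \]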
45. Algorithms and Moore's Law
- This advance took place over a span of about 36 years, or 24 doubling times for Moore's Law
- 2^24 ≈ 16 million, i.e., the same factor as from algorithms alone!
46. The power of optimal algorithms
- Since O(N) is already optimal, there is nowhere further upward to go in efficiency, but one must extend optimality outward, to more general problems
- Hence, for instance, algebraic multigrid (AMG), obtaining O(N) in indefinite, anisotropic, inhomogeneous problems
[Figure: algebraically smooth error]
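For readers outside the multigrid community (standard AMG terminology, not defined on the slide): an error component e is called algebraically smooth when simple relaxation (e.g., Gauss-Seidel) barely reduces it, i.e., its residual is small relative to the error itself,
\[ \|A e\| \ll \|A\|\,\|e\| , \]
so such components must be handled by the coarse-grid correction, which AMG constructs from the matrix entries alone rather than from grid geometry.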
47. Gordon Bell Prize performance
48. Gordon Bell Prize outpaces Moore's Law
[Chart comparing Gordon Bell Prize performance with Moore's Law; the gap is attributed to CONCURRENCY!]
49. SciDAC application: Center for Extended Magnetohydrodynamic Modeling (CEMM)
Simulate plasmas in tokamaks, leading to understanding of plasma instability and (ultimately) new energy sources
Joint work between ODU, Argonne, LLNL, and PPPL
50. Optimal solvers
- Convergence rate nearly independent of discretization parameters
- Multilevel schemes for linear and nonlinear problems
- Newton-like schemes for quadratic convergence of nonlinear problems (recalled below)
AMG shows perfect iteration scaling, above, in contrast to ASM, but still needs performance work to achieve temporal scaling, below, on the CEMM fusion code M3D, though time is halved (or better) for large runs (all runs 4K dofs per processor)
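As a reminder of what quadratic convergence buys (a textbook statement, not specific to TOPS): for a nonlinear system F(u) = 0 with a sufficiently good initial guess, the Newton iteration
\[ u^{k+1} \;=\; u^{k} - F'(u^{k})^{-1} F(u^{k}) \]
satisfies
\[ \| u^{k+1} - u^{*} \| \;\le\; C \, \| u^{k} - u^{*} \|^{2} , \]
so the number of correct digits roughly doubles per step. Each step in turn requires a scalable linear solve (e.g., a multilevel-preconditioned Krylov method), which is where the iteration-scaling results on this slide enter.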
51. Solver interoperability accomplishments
- Hypre in PETSc
  - codes with a PETSc interface (like CEMM's M3D) can invoke Hypre routines as solvers or preconditioners with a command-line switch (illustrated below)
- SuperLU_DIST in PETSc
  - as above, with SuperLU_DIST
- Hypre in AMR Chombo code
  - so far, Hypre is level-solver only; its AMG will ultimately be useful as a bottom-solver, since it can be coarsened indefinitely without attention to loss of nested geometric structure; also, FAC is being developed for AMR uses, like Chombo
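A sketch of what such switches look like, using present-day PETSc option names (the exact spellings in the 2003-era releases differed, so treat these as illustrative rather than historical):

    # run an existing PETSc-based code with hypre's BoomerAMG as preconditioner
    mpiexec -n 64 ./m3d_like_app -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg

    # or hand the whole linear solve to SuperLU_DIST as a parallel direct solver
    mpiexec -n 64 ./m3d_like_app -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type superlu_dist

Here ./m3d_like_app is a placeholder for any code that calls the PETSc options routines (KSPSetFromOptions/SNESSetFromOptions); no recompilation is needed to switch solvers.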
52. Background of PETSc Library
- Developed at Argonne to support research, prototyping, and production parallel solutions of operator equations in message-passing environments; now joined by four additional staff under SciDAC
- Distributed data structures as fundamental objects
  - index sets, vectors/gridfunctions, and matrices/arrays
- Iterative linear and nonlinear solvers, combinable modularly, recursively, and extensibly
- Portable, and callable from C, C++, Fortran
- Uniform high-level API, with multi-layered entry
- Aggressively optimized: copies minimized, communication aggregated and overlapped, caches and registers reused, memory chunks preallocated, inspector-executor model for repetitive tasks (e.g., gather/scatter)
See http://www.mcs.anl.gov/petsc (a minimal usage sketch follows)
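To make the preceding bullets concrete, here is a minimal sketch of PETSc usage: assemble a distributed 1-D Laplacian and solve it with a run-time-selectable Krylov method. It uses the present-day KSP interface and PetscCall() error checking (the deck's SLES layer was later folded into KSP), so it illustrates the design rather than reproducing 2003-era code.

    /* Minimal PETSc sketch: distributed 1-D Laplacian solved by a Krylov
       method chosen at run time (e.g., -ksp_type cg -pc_type hypre). */
    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      Mat      A;
      Vec      x, b;
      KSP      ksp;
      PetscInt i, rstart, rend, n = 100;

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

      /* distributed matrix; parallel layout chosen by PETSc */
      PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
      PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
      PetscCall(MatSetFromOptions(A));
      PetscCall(MatSetUp(A));
      PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
      for (i = rstart; i < rend; i++) {   /* tridiagonal [-1 2 -1] stencil */
        PetscInt    cols[3];
        PetscScalar vals[3];
        PetscInt    nc = 0;
        if (i > 0)     { cols[nc] = i - 1; vals[nc++] = -1.0; }
        cols[nc] = i; vals[nc++] = 2.0;
        if (i < n - 1) { cols[nc] = i + 1; vals[nc++] = -1.0; }
        PetscCall(MatSetValues(A, 1, &i, nc, cols, vals, INSERT_VALUES));
      }
      PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
      PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

      PetscCall(MatCreateVecs(A, &x, &b));
      PetscCall(VecSet(b, 1.0));

      /* solver composed and configured entirely from the options database */
      PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
      PetscCall(KSPSetOperators(ksp, A, A));
      PetscCall(KSPSetFromOptions(ksp));
      PetscCall(KSPSolve(ksp, b, x));

      PetscCall(KSPDestroy(&ksp));
      PetscCall(VecDestroy(&x));
      PetscCall(VecDestroy(&b));
      PetscCall(MatDestroy(&A));
      PetscCall(PetscFinalize());
      return 0;
    }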
53-54. User Code/PETSc Library Interactions
[Block diagram, shown in two builds: the user's code supplies the Main Routine, Application Initialization, Function Evaluation, Jacobian Evaluation, and Post-Processing; PETSc supplies the Timestepping Solvers (TS), Nonlinear Solvers (SNES), and Linear Solvers (SLES) with their PC and KSP components; the second build marks the Jacobian Evaluation box as "to be AD code". A sketch of this calling pattern follows.]
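The following sketch illustrates the division of labor in the diagram: the user writes residual and Jacobian callbacks (the Jacobian being the piece an AD tool could eventually generate) and PETSc's SNES drives them. It solves the toy scalar equation x^2 - 4 = 0 and uses current PETSc conventions, so it is an illustration of the structure, not project code.

    #include <petscsnes.h>

    /* user code: residual F(x) = x^2 - 4 */
    static PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ctx)
    {
      const PetscScalar *xx;
      PetscScalar       *ff;
      PetscCall(VecGetArrayRead(x, &xx));
      PetscCall(VecGetArray(f, &ff));
      ff[0] = xx[0] * xx[0] - 4.0;
      PetscCall(VecRestoreArrayRead(x, &xx));
      PetscCall(VecRestoreArray(f, &ff));
      return 0;
    }

    /* user code: Jacobian dF/dx = 2x (hand-coded here; "to be AD code") */
    static PetscErrorCode FormJacobian(SNES snes, Vec x, Mat J, Mat P, void *ctx)
    {
      const PetscScalar *xx;
      PetscScalar        v;
      PetscInt           row = 0;
      PetscCall(VecGetArrayRead(x, &xx));
      v = 2.0 * xx[0];
      PetscCall(VecRestoreArrayRead(x, &xx));
      PetscCall(MatSetValues(P, 1, &row, 1, &row, &v, INSERT_VALUES));
      PetscCall(MatAssemblyBegin(P, MAT_FINAL_ASSEMBLY));
      PetscCall(MatAssemblyEnd(P, MAT_FINAL_ASSEMBLY));
      return 0;
    }

    int main(int argc, char **argv)
    {
      SNES snes;
      Vec  x, r;
      Mat  J;
      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      PetscCall(VecCreateSeq(PETSC_COMM_SELF, 1, &x));
      PetscCall(VecDuplicate(x, &r));
      PetscCall(MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 1, 1, NULL, &J));
      /* PETSc code: Newton-type solver calls back into the user routines */
      PetscCall(SNESCreate(PETSC_COMM_SELF, &snes));
      PetscCall(SNESSetFunction(snes, r, FormFunction, NULL));
      PetscCall(SNESSetJacobian(snes, J, J, FormJacobian, NULL));
      PetscCall(SNESSetFromOptions(snes));
      PetscCall(VecSet(x, 1.0));                  /* initial guess */
      PetscCall(SNESSolve(snes, NULL, x));        /* converges to x = 2 */
      PetscCall(VecView(x, PETSC_VIEWER_STDOUT_SELF));
      PetscCall(SNESDestroy(&snes));
      PetscCall(MatDestroy(&J));
      PetscCall(VecDestroy(&x));
      PetscCall(VecDestroy(&r));
      PetscCall(PetscFinalize());
      return 0;
    }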
55. Background of Hypre Library (to be combined with PETSc under SciDAC)
- Developed by Livermore to support research, prototyping, and production parallel solutions of operator equations in message-passing environments; now joined by seven additional staff under ASCI and SciDAC
- Object-oriented design similar to PETSc
- Concentrates on linear problems only
- Richer in preconditioners than PETSc, with focus on algebraic multigrid
- Includes other preconditioners, including sparse approximate inverse (ParaSails) and parallel ILU (Euclid)
See http://www.llnl.gov/CASC/hypre/
56. Hypre's Conceptual Interfaces
[Diagram of hypre's conceptual interfaces; slide c/o R. Falgout, LLNL]
57. Eigensolvers for Accelerator Design
- Stanford's Omega3P is using TOPS software to find EM modes of accelerator cavities
- Methods: Exact Shift-and-Invert Lanczos (ESIL), combining PARPACK with SuperLU when there is sufficient memory, and Jacobi-Davidson otherwise (the transformation is recalled below)
- Current high-water marks
  - 47-cell chamber, finite element discretization of Maxwell's eqs.
  - System dimension 1.3 million
  - 20 million nonzeros in system, 350 million in LU factors
  - halved analysis time on 48 processors, scalable to many hundreds
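For context (standard shift-and-invert background, not from the slide): the cavity modes come from a generalized eigenproblem K x = λ M x, and the shift-and-invert spectral transformation works with the operator
\[ S \;=\; (K - \sigma M)^{-1} M, \qquad S x \;=\; \frac{1}{\lambda - \sigma}\, x , \]
so eigenvalues nearest the chosen shift σ become the dominant ones, which Lanczos/PARPACK finds quickly. Each application of S requires solving a linear system with K - σM, which is where the SuperLU factorization (and its 350-million-nonzero factors) enters; when that factorization does not fit in memory, the slide's Jacobi-Davidson alternative is used instead.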
58. Optimizers
- Unconstrained or bound-constrained optimization
  - TAO (powered by PETSc, interfaced in the CCTTSS component framework) used in quantum chemistry energy minimization
- PDE-constrained optimization (generic form below)
  - Veltisto (powered by PETSc) used in a flow control application, to straighten out a wingtip vortex by wing-surface blowing and suction
- Best technical paper at SC2002 went to TOPS team
  - PETSc-powered inverse wave propagation employed to infer hidden geometry
[Figure captions: 4,000 controls, 128 procs; 2 million controls, 256 procs]
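The generic problem class behind both examples (a standard formulation, supplied here for readers new to the area): choose design/control variables d and state variables u to
\[ \min_{u,\,d}\; J(u, d) \quad \text{subject to} \quad C(u, d) = 0 , \]
where C(u, d) = 0 is the discretized PDE (the state equation). The "controls" counted on the slide are the components of d; every optimization iteration requires PDE solves (state and adjoint), which is why scalable PDE solvers sit on the critical path.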
59. Performance
- TOPS is tuning sparse kernels
  - (Jacobian) matrix-vector multiplication
  - sparse factorization
  - multigrid relaxation
- Running on dozens of apps/platform combinations
  - Power3 (NERSC) and Power4 (ORNL)
  - factors of 2 on structured (CMRS) and unstructured (CEMM) fusion apps
- Best student paper at ICS2002 went to TOPS team
  - theoretical model and experiments on effects of register blocking for sparse mat-vec (sketched below)
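To show what register blocking means here (an illustrative sketch of the technique, not TOPS code): store the matrix as small dense blocks (here 2x2, block-CSR) so that vector entries and block entries are reused out of registers instead of being re-fetched per nonzero.

    #include <stddef.h>

    /* y = A*x for a block-CSR matrix with 2x2 blocks.
       nbrows      : number of block rows (matrix dimension is 2*nbrows)
       brow_ptr[i] : start of block row i in bcol_idx/vals (length nbrows+1)
       bcol_idx[k] : block-column index of the k-th stored block
       vals        : the blocks, 4 entries each, row-major, stored contiguously */
    void bcsr22_matvec(size_t nbrows, const size_t *brow_ptr,
                       const size_t *bcol_idx, const double *vals,
                       const double *x, double *y)
    {
      for (size_t ib = 0; ib < nbrows; ib++) {
        double y0 = 0.0, y1 = 0.0;              /* accumulators live in registers */
        for (size_t k = brow_ptr[ib]; k < brow_ptr[ib + 1]; k++) {
          const double *a  = &vals[4 * k];
          const double  x0 = x[2 * bcol_idx[k]];
          const double  x1 = x[2 * bcol_idx[k] + 1];
          y0 += a[0] * x0 + a[1] * x1;          /* first row of the 2x2 block */
          y1 += a[2] * x0 + a[3] * x1;          /* second row of the 2x2 block */
        }
        y[2 * ib]     = y0;
        y[2 * ib + 1] = y1;
      }
    }

Relative to plain CSR, each loaded pair (x0, x1) is used twice and the index arrays are a quarter of the length, which is the source of the factor-of-two-class speedups quoted above (at the cost of storing explicit zeros when the true block structure is smaller than 2x2).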
60. Lessons to date
- Working with the same code on the same machine vastly speeds collaboration, as opposed to ftp'ing matrices around the country, etc.
- Exchanging code templates better than exchanging papers, etc.
- Version control systems essential to having any lasting impact or insertion path for solver improvements
- Doing physics more fun than doing driven cavities
61. Abstract Gantt Chart for TOPS
Each colored module represents an algorithmic research idea on its way to becoming part of a supported community software tool. At any moment (vertical time slice), TOPS has work underway at multiple levels. While some codes are in applications already, they are being improved in functionality and performance as part of the TOPS research agenda.
[Chart rows, top to bottom: Dissemination; Applications Integration; Hardened Codes (e.g., PETSc); Research Implementations (e.g., TOPSLib); Algorithmic Development (e.g., ASPIN). Horizontal axis: time.]
62. Goals/Success Metrics
TOPS users:
- Understand range of algorithmic options and their tradeoffs (e.g., memory versus time)
- Can try all reasonable options easily without recoding or extensive recompilation
- Know how their solvers are performing
- Spend more time in their physics than in their solvers
- Are intelligently driving solver research, and publishing joint papers with TOPS researchers
- Can simulate truly new physics, as solver limits are steadily pushed back
63. Expectations TOPS has of Users
- Be willing to experiment with novel algorithmic choices; optimality is rarely achieved beyond model problems without interplay between physics and algorithmics!
- Adopt flexible, extensible programming styles in which algorithms and data structures are not hardwired
- Be willing to let us play with the real code you care about, but be willing, as well, to abstract out relevant compact tests
- Be willing to make concrete requests, to understand that requests must be prioritized, and to work with us in addressing the high-priority requests
- If possible, profile, profile, profile before seeking help
64. For more information ...
dkeyes@odu.edu
http://www.tops-scidac.org
65. Related URLs
- Personal homepage: papers, talks, etc.
  - http://www.math.odu.edu/keyes
- SciDAC initiative
  - http://www.science.doe.gov/scidac
- TOPS project
  - http://www.math.odu.edu/keyes/scidac
- PETSc project
  - http://www.mcs.anl.gov/petsc
- Hypre project
  - http://www.llnl.gov/CASC/hypre
- ASCI platforms
  - http://www.llnl.gov/asci/platforms
- ISCR annual report, etc.
  - http://www.llnl.gov/casc/iscr