Title: Scalable Scientific Applications: Characteristics and Future Directions

1. Scalable Scientific Applications: Characteristics and Future Directions

- Douglas B. Kothe
- with Richard Barrett, Ricky Kendall, Bronson Messer, and Trey White
- Leadership Computing Facility
- National Center for Computational Sciences
- Oak Ridge National Laboratory
2. Science Teams Have Specific PF Objectives
| Application Area | Science Driver | Science Objective | Impact |
|---|---|---|---|
| Combustion (S3D) | Predictive engineering: simulation tool for new engine design | Understand flame stabilization in lifted, autoigniting diesel fuel jets relevant to low-temperature combustion for engine design at realistic operating conditions | Potential for 50% increase in efficiency and 20% savings in petroleum consumption with lower-emission, leaner-burning engines |
| Fusion (GTC) | Understand and quantify physics and properties of ITER scaling and H-mode confinement | Strongly coupled and consistent wall-to-edge-to-core modeling of ITER plasmas to attain a realistic assessment of ignition margins | ITER design and operation |
| Chemistry (MADNESS) | Computational catalysis | Describe large systems accurately with modern hybrid and meta density functional theory functionals | Generate quantitative catalytic reaction rates and guide small-system calibration |
| Nanoscale Science (DCA) | Material-specific understanding of high-temperature superconductivity theory | Understand the quantitative differences in the transition temperatures of high-temperature superconductors | Macroscopic quantum effect at elevated temperatures (>150 K); new materials for power transmission and oxide electronics |
| Climate (POP) | Accurate representation of ocean circulation | Fully coupled eddy-resolving ocean and sea ice model to reduce the coupled-model biases where ice and deep-water parameters are governed by the accurate representation of current systems | Reduce current uncertainties in the coupled ocean-sea ice system model |
| Geoscience (PFLOTRAN) | Perform multiscale, multiphase, multi-component modeling of a 3-D field CO2 injection scenario | Include an oil phase and a four-phase liquid-gas-aqueous-oil system to describe dissipation of the supercritical CO2 phase and escape of CO2 to the surface | Demonstrate viability of and potential for sequestration of anthropogenic CO2 in deep geologic formations |
| Astrophysics (CHIMERA) | Understand the core-collapse supernova mechanism for a range of progenitor star masses | Perform core-collapse simulations with sophisticated spectral neutrino transport, detailed nuclear burning, and general relativistic gravity | Understand the origin of many elements in the Periodic Table and the creation of neutron stars and black holes |
3. Application Requirements at the PF
- Application categories analyzed
- Science motivation and impact
- Science quality and productivity
- Application models, algorithms, software
- Application footprint on platform
- Data management and analysis
- Early access science-at-scale scenarios
- Results
- 100-page Application Requirements Document published in July 2007
- New methods for categorizing platform and application attributes devised and utilized in analysis guiding tactical infrastructure purchase and deployment
- But still too qualitative! More work to do.
4. Application Codes in 2008: An Incomplete List

- Astrophysics: CHIMERA, GenASiS, 3DHFEOS, Hahndol, SNe, MPA-FT, SEDONA, MAESTRO, AstroGK
- Biology: NAMD, LAMMPS
- Chemistry: CPMD, CP2K, MADNESS, NWChem, Parsec, Quantum Espresso, RMG, GAMESS
- Nuclear Physics: ANGFMC, MFDn, NUCCOR, HFODD
- Engineering: Fasel, S3D, Raptor, MFIX, Truchas, BCFD, CFL3D, OVERFLOW, MDOPT
- High Energy Physics: CPS, Chroma, MILC
- Fusion: AORSA, GYRO, GTC, XGC
- Materials Science: VASP, LS3DF, DCA, QMCPACK, RMG, WL-LSMS, WL-AMBER, QMC
- Accelerator Physics: Omega3P, T3P
- Atomic Physics: TDCC, RMPS, TDL
- Space Physics: Pogorelov
- Climate and Geosciences: MITgcm, PFLOTRAN, POP, CCSM (CAM, CICE, CLM, POP)
- Computer Science (Tools): Active Harmony, IPM, KOJAK, mpiP, PAPI, PMaC, Sca/LAPACK, SvPablo, TAU
5. Apps Teams Are Reasonably Adept at Using Our Current Systems

Is the "field of dreams" approach inadequate (too little, too late)? What is effective utilization? Scaling? Percent of peak (Jacobi vs. MG)? Current SC apps range from 2-70% of peak; what's the goal? Remember, we improve what we measure, so let's have the right metrics and measures. My $0.02: the science and engineering achievements on these systems are the legacy.
6. Science Workload: Job Sizes and Resource Usage of Key Applications

| Code | 2007 Resource Utilization (M core-hours) | Projected 2008 Resource Utilization (M core-hours) | Typical Job Size in 2006-2007 (K cores) | Anticipated Job Size in 2008 (K cores) |
|---|---|---|---|---|
| CHIMERA | 2 (under development) | 16 | 0.25 (under development) | >10 |
| GTC | 8 | 7 | 8 | 12 |
| S3D | 6.5 | 18 | 8-12 | >15 |
| POP | 4.8 | 4.7 | 4 | 8 |
| MADNESS | 1 (under development) | 4 | 0.25 (under development) | >8 |
| DCA | N/A (under development) | 3-8 | N/A (under development) | 4-16 (w/o disorder), >40 (with disorder) |
| PFLOTRAN | 0.37 (under development) | >2 | 1-2 (under development) | >10 |
| AORSA | 0.61 | 1 | 15-20 | >20 |
7. Preparing for the Exascale: Long-Term Science Drivers and Requirements

- We have recently surveyed, analyzed, and documented the science drivers and application requirements envisioned for exascale leadership systems in the 2020 timeframe
- These studies help to:
  - Provide a roadmap for the ORNL Leadership Computing Facility
  - Uncover application needs and requirements
  - Focus our efforts on those disruptive technologies and research areas in need of our and the HPC community's attention
8. What Will an EF System Look Like?

- All projections are daunting
- Based on projections of existing technology, both with and without disruptive technologies
- Assumed to arrive in the 2016-2020 timeframe
- Example 1: 115K nodes @ 10 TF per node, 50-100 PB, optical interconnect, 150-200 GB/s injection B/W per node, 50 MW
- Examples 2-4 (DOE Townhall report): www.er.doe.gov/ASCR/ProgramDocuments/TownHall.pdf
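A quick back-of-the-envelope check (assuming the 10 TF figure is per-node peak, which the slide does not state explicitly) shows why Example 1 qualifies as an EF system:

```latex
115{,}000 \ \text{nodes} \times 10 \ \tfrac{\text{TF}}{\text{node}}
  = 1.15 \times 10^{6} \ \text{TF} \approx 1 \ \text{EF}
```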
9. Science Prospects and Benefits with High-End Computing (EF?) in the Next Decade

| Opportunity | Key application areas | Goal and benefit |
|---|---|---|
| Materials science | Nanoscale science, manufacturing, and material lifecycles, response, and failure | Design, characterize, and manufacture materials, down to the nanoscale, tailored and optimized for specific applications |
| Earth science | Weather, carbon management, climate change mitigation and adaptation, environment | Understand the complex biogeochemical cycles that underpin global ecosystems and control the sustainability of life on Earth |
| Energy assurance | Fossil, fusion, combustion, nuclear fuel cycle, chemical catalysis, renewables (wind, solar, hydro), bioenergy, energy efficiency, power grid, transportation, buildings | Attain, without costly disruption, the energy required by the United States in guaranteed and economically viable ways to satisfy residential, commercial, and transportation requirements |
| Fundamental science | High energy physics, nuclear physics, astrophysics, accelerator physics | Decipher and comprehend the core laws governing the Universe and unravel its origins |
| Biology and medicine | Proteomics, drug design, systems biology | Understand connections from individual proteins through whole cells into ecosystems and environments |
| National security | Disaster management, homeland security, defense systems, public policy | Analyze, design, stress-test, and optimize critical systems such as communications, homeland security, and defense systems; understand and uncover human behavioral systems underlying asymmetric operational environments |
| Engineering design | Industrial and manufacturing processes | Design, deploy, and operate safe and economical structures, machines, processes, and systems with reduced concept-to-deployment time |
10. Science Case: Climate

Mitigation: evaluate strategies and inform policy decisions for climate stabilization (100-1000 year simulations). Adaptation: decadal forecasts of regional impacts to prepare for committed climate change (10-100 year simulations).

- 250 TF
  - Mitigation: Initial simulations with dynamic carbon cycle and limited chemistry
  - Adaptation: Decadal simulations with high-resolution (1/10°) ocean
- 1 PF
  - Mitigation: Full chemistry, carbon/nitrogen/sulfur cycles, ice-sheet model, multiple ensembles
  - Adaptation: High-resolution (1/4°) atmosphere, land, and sea ice, as well as ocean
- Sustained PF
  - Mitigation: Increased resolution, longer simulations, more ensembles for reliable projections; coupling with socio-economic and biodiversity models
  - Adaptation: Limited cloud-resolving simulations, large-scale data assimilation
- 1 EF
  - Mitigation: Multi-century ensemble projections for detailed comparisons of mitigation strategies
  - Adaptation: Full cloud-resolving simulations, decadal forecasts of regional impacts, and extreme-event statistics

Resolve clouds, forecast weather and extreme events, and provide quantitative mitigation strategies.
11. Barriers in Ultrascale Climate Simulation: Attacking the Fourth Dimension (Parallel in Time)

- Problem
  - Climate models use explicit time stepping
  - The time step must go down as resolution goes up
  - Time stepping is serial
  - Single-process performance is stagnating
  - More parallel processes do not help!
- Possible complementary solutions
  - Implicit time stepping
  - High-order methods in time
  - Fast bases: curvelets and multi-wavelets
  - Parareal: parallel in time (see the sketch below)
- Progress
  - Implicit version of HOMME for the global shallow-water equations: 10x speedup for a steady-state test case
  - High-order single-step time integration
  - Single-cycle multigrid linear solver for 1D
  - Pure advection with curvelets and multi-wavelets
- Near-term plans
  - Scale, tune, and precondition implicit HOMME
  - Single-cycle multigrid linear solver for 2D
  - Parareal for the Burgers equation (1D nonlinear)

Ref: Trey White (ORNL)
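To make the parareal idea concrete, here is a minimal serial sketch (not from the original slides) for the model problem du/dt = -u. The coarse propagator G takes one forward-Euler step per time slice and the fine propagator F takes many small steps; in a real implementation, the independent fine solves would run concurrently, one slice per processor.

```c
/* Parareal sketch for du/dt = -u on [0, T]:
   G = one forward-Euler step per time slice (cheap, serial),
   F = many small forward-Euler steps per slice (expensive; these
   mutually independent solves are the parallel dimension). */
#include <stdio.h>
#include <math.h>

#define NSLICE 10   /* time slices (one per process in parallel) */
#define NITER  5    /* parareal iterations */

static double f(double u) { return -u; }

static double G(double u, double dt) { return u + dt * f(u); }

static double F(double u, double dt, int m) {
    double h = dt / m;
    for (int i = 0; i < m; i++) u += h * f(u);
    return u;
}

int main(void) {
    const double T = 2.0, u0 = 1.0, dt = T / NSLICE;
    double U[NSLICE + 1], Unew[NSLICE + 1];
    double Gold[NSLICE + 1], Fval[NSLICE + 1];

    /* initial serial coarse sweep */
    U[0] = u0;
    for (int n = 0; n < NSLICE; n++)
        U[n + 1] = Gold[n + 1] = G(U[n], dt);

    for (int k = 0; k < NITER; k++) {
        /* independent fine solves over each slice (the parallel step) */
        for (int n = 0; n < NSLICE; n++)
            Fval[n + 1] = F(U[n], dt, 100);

        /* serial correction sweep:
           U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k) */
        Unew[0] = u0;
        for (int n = 0; n < NSLICE; n++) {
            double Gnew = G(Unew[n], dt);
            Unew[n + 1] = Gnew + Fval[n + 1] - Gold[n + 1];
            Gold[n + 1] = Gnew;
        }
        for (int n = 0; n <= NSLICE; n++) U[n] = Unew[n];
    }

    printf("parareal u(T) = %.6f  (exact e^{-T} = %.6f)\n",
           U[NSLICE], exp(-T));
    return 0;
}
```

The correction sweep is serial but cheap; all of the expensive fine propagation happens concurrently, which is what makes time a usable parallel dimension.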
12. Science Case: Astrophysics

- 250 TF
  - The interplay of several important phenomena: hydrodynamic instabilities, the role of nuclear burning, neutrino transport
- 1 PF
  - Determine the nature of the core-collapse supernova explosion mechanism
  - Fully integrated, 3D neutrino radiation hydrodynamics simulations with nuclear burning
- Sustained PF
  - Detailed nucleosynthesis (element production) from core-collapse SNe
  - Large nuclear network capable of isotopic prediction (along with energy production)
- 1 EF
  - Precision prediction of the complete observable set from core-collapse SNe: nucleosynthesis, gravitational waves, neutrino signatures, light output
  - Tests general relativity and informs the dense-matter equation of state, along with detailed knowledge of stellar evolution
  - Full 3D Boltzmann neutrino transport, 3D MHD/RHD, nuclear burning

Explanation and prediction of core-collapse SNe put general relativity, the dense EOS, and stellar evolution theories to the test.
13. Requirements Gathering

- Consult literature and existing documentation
- Construct a survey eliciting speculative requirements for scientific applications on HPC platforms in 2010-2020
- Pass the survey to leading computational scientists in a broad range of scientific domains
- Analyze and validate the survey results (hard)
- Make informed decisions and take action
14. Survey Questions

- What are some possible science drivers and urgent problems that would require Leadership Computing in 2010-2020?
- What are some looming computational challenges that will need resolution in 2010-2020?
- What are some science objectives and outcomes that Leadership Computing could enable in 2010-2020?
- What are some improvement goals for science-simulation fidelity that Leadership Computing could enable in 2010-2020?
- What are some possible changes in physical model attributes for Leadership-Computing applications in 2010-2020?
- What major software-development projects could occur in your application area in 2010-2020?
- What major algorithm changes could occur for your applications in 2010-2020?
- What libraries and development tools may need to be developed or significantly improved for Leadership Computing in 2010-2020?
- How might system-attribute priorities change for Leadership Computing for your application?
- In what ways might or should your workflow in 2010-2020 be different from today?
- Are there any disruptive technologies that might affect your applications?
15-19. (Figures only; no transcript available)
20. Findings in Models and Algorithms

- The seven algorithm types are scattered broadly among science domains, with no one particular algorithm being ubiquitous and no algorithm going unused
- Structured grids and dense linear algebra continue to dominate, but other algorithm categories will become more common
- Compared to the Seven Dwarfs for current applications, we project a significant increase in Monte Carlo and increases in unstructured grids, sparse linear algebra, and particle methods, as well as a relative decrease in FFTs
- These projections reflect the expectation of much greater parallelism in architectures and the resulting need for very high scalability
- Load balancing, scalable sparse solvers, and random-number-generator algorithms will be more important
- Some important algorithms are not captured in the Seven Dwarfs
  - Categories expected by application scientists to be of growing importance in 2010-2020 include adaptive mesh refinement, implicit nonlinear systems, data assimilation, agent-based methods, parameter continuation, and optimization
21. Findings in Software

- "Hero developer" mode is fatalistic
  - It does not scale, and no single person can adequately understand the breadth and depth of the issues
  - Progress comes only from computer scientists, algorithm developers, application developers, and end-user scientists working together in a tightly integrated manner
- Must develop a means of interface between the heterogeneous computer, the developer, and the end-user scientist
- Must raise the level of abstraction
  - The current approach, based on low-level constructs, places constraints on performance and over-constrains the compiler and runtime system
  - Raising the abstraction level allows increased algorithm experimentation, incorporation of intent in data structures, flexible memory organization, and inclusion of fault-tolerance constructs
  - Enables exploration of power-aware algorithms
  - Frees heroic software efforts from having to be the norm
22. Findings in Software (continued)

- Application development and maintenance tools and practices need to change fundamentally
- Productivity improvement is an important metric and guide for tool and software choices
- Fault tolerance and V&V software components must be used to improve the reliability and robustness of application software
- Knowledge-discovery techniques and tools should be explored to help with bug detection, simulation steering, and data feature extraction and correlation
- A holistic view of application data (from input to archival) is needed to most effectively deliver tools for the end-to-end workflow performed by scientists
23-27. (Figures only; no transcript available)
28. Applications Analyzed

- CHIMERA: Astrophysics; the core-collapse supernova explosion mechanism
- S3D: Turbulent combustion; lifted flame stabilization in diesel and gas turbine engines
- GTC: Fusion; analyze and validate CTEM and ETG core turbulence
- POP: Global ocean circulation; eddy-resolved flow with biogeochemistry
- DCA: High-temperature superconductivity; the effect of charge and spin inhomogeneities in the Hubbard model superconducting state
- MADNESS: Chemistry; neutron and x-ray spectra of cuprates, dynamics of few-electron systems, metal oxide surfaces in catalytic processes
- PFLOTRAN: Reactive flows in porous media; uranium migration and CO2 sequestration in subsurface geologic formations
29. Application Requirements and Workload Reinforce a Balanced-System Assertion

[Figure: applications plotted by computation vs. communication fraction (0-100% on each axis), with points for CHIMERA, POP, GTC, MADNESS, S3D, PFLOTRAN, and DCA. Distribution in this space depends upon the applications and the problem being simulated for a given application.]
- Applications analyzed represent almost one half of our 2008 allocation
- A broad range of compute/communicate workloads must be supported
  - Depends upon the science, the application within that science, and the problem tackled by the application
- Application requirements call for breadth in models, algorithms, software, and scaling type
  - Physical models: coupled continuum conservation laws, radiation transport, many-body Schrödinger equation, plasma physics, Maxwell's equations, turbulence
  - Numerical algorithms: each of the Seven Dwarfs is required
  - Software implementation: all popular languages are required
  - Science drivers: strong scaling (time to solution) and weak scaling (bigger problem)
- Application readiness action plans are in place and being followed
30. Resource Utilization by Science Applications: Science Dictates the Requirements
31. Example: PF Performance Observations and Readiness Plans for Some of Our Key Apps

| Code | Science Scaling Needs | Performance Observations and Readiness Plan |
|---|---|---|
| S3D | Larger problem | Compute-bound with minimal communication overhead. Reduce memory contention with hybrid parallelism; increase cache reuse |
| GTC | Larger problem | Compute-bound with minimal communication overhead. Use radial domain decomposition to eliminate cross-core collective calls; reduce the size of the problem per core for better cache reuse; increase SSE factor |
| DCA | Solution time | Heavily compute-bound, benefitting from Level-3 BLAS routines (DGEMM, ZGEMM). Very good use of SSE (50%) with no changes. Including the disorder model adds a level of parallelism (10x need for more processors); multithreaded linear algebra will allow additional parallelism at a lower level |
| MADNESS | Solution time | Fully asynchronous algorithm with communication hidden by the model. Nicely positioned to exploit Gemini. Good SSE factor, but still room for improvement |
| POP | Solution time | Sizeable communication component. Reduce memory contention and increase SSE factor; minimize synchronous behavior; better cache blocking. New physics (biogeochemistry) increases the compute fraction |
| CHIMERA | Larger problem | Communication dominated by collectives. Production-level physics increases the compute fraction. Reasonable SSE factor but room for improvement; 20% raw speedup from Gemini without enhancements |
| PFLOTRAN | Solution time | Communication dominated by collectives. Poor SSE factor, some room for improvement. Additional phases and chemical species will reduce memory contention (the natural block structure of the Jacobian enables more efficient use of the memory hierarchy) |
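Several of the readiness plans above (S3D in particular) lean on hybrid parallelism to cut per-node memory contention. As an illustration only, not code from any of these applications, the basic MPI+OpenMP pattern puts one MPI rank per node for communication while OpenMP threads share that rank's memory:

```c
/* Minimal MPI+OpenMP hybrid pattern: MPI ranks across nodes, OpenMP
   threads within a node sharing one address space, so fewer rank-private
   buffers compete for each node's memory. Compile with, e.g.,
   mpicc -fopenmp hybrid.c */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NLOCAL 1000000L

int main(int argc, char **argv) {
    int provided, rank;
    /* FUNNELED: only the master thread makes MPI calls */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *u = malloc(NLOCAL * sizeof(double));
    for (long i = 0; i < NLOCAL; i++) u[i] = 1.0 / (double)(i + 1 + rank);

    double local = 0.0;
    /* thread-parallel compute phase inside the node */
    #pragma omp parallel for reduction(+:local)
    for (long i = 0; i < NLOCAL; i++)
        local += u[i] * u[i];

    /* rank-parallel communication phase between nodes */
    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) printf("global sum of squares = %.6f\n", global);

    free(u);
    MPI_Finalize();
    return 0;
}
```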
32. Accelerating Development Readiness

- Automated diagnostics
  - Drivers: performance analysis, application verification, S/W debugging, H/W-fault detection and correction, failure prediction and avoidance, system tuning, and requirements analysis
- Hardware latency
  - Won't improve nearly as much as flop rate, parallelism, and B/W in the coming years
  - Can S/W strategies mitigate high H/W latencies?
- Hierarchical algorithms
  - Applications will require algorithms aware of the system hierarchy (compute/memory)
  - In addition to hybrid data parallelism and file-based checkpointing, algorithms may need to include dynamic decisions between recomputing and storing, fine-scale task-data hybrid parallelism, and in-memory checkpointing (sketched below)
- Parallel programming models
  - Improved programming models are needed to allow the developer to identify an arbitrary number of levels of parallelism and map them onto hardware hierarchies at runtime
  - Models continue to be coupled into larger models, driving the need for arbitrary hierarchies of task and data parallelism
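One common reading of "in-memory checkpointing" is partner (buddy) checkpointing: pairs of ranks hold copies of each other's state in RAM so a failed rank can be restored without touching the file system. The slides do not prescribe a scheme, so this is a hypothetical sketch of just the exchange step, with failure detection and restart omitted:

```c
/* Partner (buddy) in-memory checkpointing sketch: even/odd rank pairs
   exchange copies of their state so each holds its partner's checkpoint
   in RAM, avoiding a trip to the parallel file system. */
#include <mpi.h>
#include <stdio.h>

#define NSTATE 4

int main(int argc, char **argv) {
    int rank, nranks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double state[NSTATE], partner_copy[NSTATE];
    for (int i = 0; i < NSTATE; i++) state[i] = rank + 0.1 * i;

    int partner = (rank % 2 == 0) ? rank + 1 : rank - 1;
    if (partner >= 0 && partner < nranks) {
        /* simultaneous send/receive: each rank ships its state to its
           partner and stores the partner's state locally */
        MPI_Sendrecv(state, NSTATE, MPI_DOUBLE, partner, 0,
                     partner_copy, NSTATE, MPI_DOUBLE, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* if `partner` fails, its state would be recovered from
           partner_copy on this rank instead of re-read from disk */
        printf("rank %d holds in-memory checkpoint for rank %d\n",
               rank, partner);
    }
    MPI_Finalize();
    return 0;
}
```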
33. Accelerating Development Readiness (continued)

- Solver technology and innovative solution techniques
  - Global communication operations across 10^6-10^8 processors will be prohibitively expensive; solvers will have to eliminate global communication where feasible and mitigate its effects where it cannot be avoided. Research on more effective local preconditioners will become a very high priority (see the Chebyshev sketch below)
  - If increases in memory B/W continue to lag the number of cores added to each socket, further research is needed into ways to effectively trade flops for memory loads/stores
- Accelerated time integration
  - Are we ignoring the time dimension along which to exploit parallelism? (Example: climate)
- Model coupling
  - Coupled models require effective methods to implement, verify, and validate the couplings, which can occur across wide spatial and temporal scales. The coupling requirements drive the need for robust methods for downscaling, upscaling, and coupled nonlinear solving
  - Evaluation of the accuracy and importance of couplings drives the need for methods for validation, uncertainty analysis, and sensitivity analysis of these complex models
- Maintaining current libraries
  - The reliance of current HPC applications on libraries will grow
  - Libraries must perform as HPC systems grow in parallelism and complexity
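One classical way to remove global reductions from an iterative solve, offered here as an illustration rather than anything proposed in the slides, is Chebyshev iteration: given eigenvalue bounds for an SPD matrix, it converges with no inner products at all, so the iteration loop needs only the nearest-neighbor communication of the matvec (recurrence as in Saad, *Iterative Methods for Sparse Linear Systems*):

```c
/* Chebyshev iteration for an SPD system: the iteration loop contains
   NO inner products, hence no global reductions. Model problem:
   1D Laplacian tridiag(-1, 2, -1); its eigenvalue bounds are known
   analytically here and would be estimated in practice. */
#include <stdio.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N 64
#define ITERS 200

/* y = A*x for A = tridiag(-1, 2, -1) */
static void matvec(const double *x, double *y) {
    for (int i = 0; i < N; i++) {
        y[i] = 2.0 * x[i];
        if (i > 0)     y[i] -= x[i - 1];
        if (i < N - 1) y[i] -= x[i + 1];
    }
}

int main(void) {
    double b[N], x[N] = {0.0}, r[N], d[N], Ad[N];
    for (int i = 0; i < N; i++) b[i] = 1.0;

    double lmin = 2.0 - 2.0 * cos(M_PI / (N + 1));  /* smallest eigenvalue */
    double lmax = 2.0 + 2.0 * cos(M_PI / (N + 1));  /* largest eigenvalue  */
    double theta = 0.5 * (lmax + lmin), delta = 0.5 * (lmax - lmin);
    double sigma = theta / delta, rho = 1.0 / sigma;

    matvec(x, r);                                   /* r = b - A*x */
    for (int i = 0; i < N; i++) { r[i] = b[i] - r[i]; d[i] = r[i] / theta; }

    for (int k = 0; k < ITERS; k++) {
        /* no dot products here: just a matvec (nearest-neighbor
           communication in parallel) and local vector updates */
        for (int i = 0; i < N; i++) x[i] += d[i];
        matvec(d, Ad);
        for (int i = 0; i < N; i++) r[i] -= Ad[i];
        double rho_next = 1.0 / (2.0 * sigma - rho);
        for (int i = 0; i < N; i++)
            d[i] = rho_next * rho * d[i] + (2.0 * rho_next / delta) * r[i];
        rho = rho_next;
    }

    double rr = 0.0;                /* one reduction, for reporting only */
    for (int i = 0; i < N; i++) rr += r[i] * r[i];
    printf("residual norm after %d Chebyshev iterations: %.3e\n",
           ITERS, sqrt(rr));
    return 0;
}
```

The trade-off is that Chebyshev needs spectral bounds up front (e.g., from a few Lanczos steps), whereas CG discovers them implicitly at the cost of two global reductions per iteration.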
34. PF Survey Findings (with some opinion)

- A rigorous, evolving apps-requirements process pays dividends
  - Needs to be quantitative; apps cannot lie with performance analysis
- Algorithm development is evolutionary
  - Can we break this mold? Example: explore new parallel dimensions (time, energy)
- Hybrid/multi-level programming models are virtually nonexistent
- No algorithm sweet spots (one size fits all)
  - But algorithm footprints share characteristics
- V&V and SQA are not in good standing
  - Ramifications for compute systems as well as for the apps results generated
- No one is really clamoring for new languages
  - MPI until the water gets too hot (the frog analogy)
- Apps lifetimes are >3-5x machine lifetimes
  - Refactoring is a way of life
- Fault tolerance via defensive checkpointing is the de facto standard
  - Won't this eventually bite us? It artificially drives I/O demands
- Weak or strong scaling or both (no winner)
- The data analytics paradigm must change
- The middleware layer is surprisingly stable and agnostic across apps (and should expand!)
35. Summary Recommendations: EF Survey

- We are in danger of failing because of a software crisis unless concerted investments are undertaken to close the H/W-S/W gap
  - H/W has gotten way ahead of the S/W (same ole, same ole?)
- Structured grids and dense linear algebra continue to dominate, but...
  - Increases are projected for Monte Carlo algorithms, unstructured grids, sparse linear algebra, and particle methods (relative decrease in FFTs)
  - Increasing importance for AMR, implicit nonlinear systems, data assimilation, agent-based methods, parameter continuation, and optimization
- Priority of computing system attributes
  - Increase: interconnect bandwidth, memory bandwidth, mean time to interrupt, memory latency, and interconnect latency
    - Reflects the desire to increase computational efficiency to use peak flops
  - Decrease: disk latency, archival storage capacity, disk bandwidth, wide-area network bandwidth, and local storage capacity
    - Reflects the expectation that computational efficiency will not increase
- Per-core requirements are relatively static, while aggregate requirements will grow with the system
36. Summary Recommendations: EF Survey (continued)

- System software must possess more stability, reliability, and fault tolerance during application execution
  - New fault-tolerance paradigms must be developed and integrated into applications
- Job management and efficient scheduling of resources will be a major obstacle faced by computing centers
- Systems must be much better science producers
  - Strong software engineering practices must be applied to systems to ensure good end-to-end productivity
- Data analytics must empower scientists to ask what-if questions, providing a S/W and H/W infrastructure capable of answering these questions in a timely fashion (a "Google desktop")
- Strong data management will become an absolute at the exascale
- Just as H/W requires disruptive technologies to accelerate its natural evolutionary path, so too will algorithm, software, and physical-model development efforts need disruptive technologies (invest now!)
37. Fusion Simulation Project: Where to Find 12 Orders in 10 Years?

- 1.5 orders: increased processor speed and efficiency
- 1.5 orders: increased concurrency
- 1 order: higher-order discretizations
  - The same accuracy can be achieved with many fewer elements
- 1 order: flux-surface-following gridding
  - Less resolution is required along field lines than across them
- 4 orders: adaptive gridding
  - Zones requiring refinement are <1% of the ITER volume, and resolution requirements away from them are 10^2 less severe
- 3 orders: implicit solvers
  - Mode growth time is 9 orders longer than the Alfven-limited CFL time step
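The 12-order headline is simply the sum of these exponents:

```latex
1.5 + 1.5 + 1 + 1 + 4 + 3 = 12
  \quad\Longrightarrow\quad \text{a combined speedup of } 10^{12}
```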
38. A View from Berkeley (John Shalf)

- Need better benchmarks and better performance models
  - For reliable extrapolated code requirements
- Power is driving daunting concurrency
- Scalable programming models
  - Need to exploit hierarchical machine architecture
- Hybrid processors
  - More concurrency needs a more generalized approach
- Apps must deal with platform reliability
- Don't forget autotuning
  - Shows the value of good compilers and associated R&D
- Fast, robust I/O is hard
- Scaling and concurrency are outstripping our ability to do rigorous V&V
- Application code complexity has outgrown available tools
- Frameworks and community codes can work, but with certain rules of engagement

ASCAC Fusion Simulation Project Review panel presentation (4/30/08)
39. Questions?

Doug Kothe (kothe@ornl.gov)