Title: Technologies for Computational Science
1Technologies for Computational Science
- Boyana Norris
- Argonne National Laboratory
- http://www.mcs.anl.gov/norris
2Outline
- Automatic differentiation
- Applications in optimization
- How AD works
- Components for scientific computing
- Performance evaluation and modeling
- Bringing it all together
3What is automatic differentiation?
- Automatic Differentiation (AD): a technology for
automatically augmenting computer programs,
including arbitrarily complex simulations, with
statements for the computation of derivatives,
also known as sensitivities.
The Computational Differentiation Project at
Argonne National Laboratory
4What is it good for?
- The need to accurately and efficiently compute
derivatives of complicated simulation codes
arises regularly in:
- Optimization (finding a minimum)
- Solving nonlinear differential equations
- Sensitivity and uncertainty analysis
- Inverse problems, including
- Data assimilation
- Parameter identification
- AD tools automate the generation of derivative
code without precluding the exploitation of
high-level knowledge.
5Sensitivity Analysis
MM5 (a mesoscale weather model, NCAR and Penn
State)
Impact of perturbations of initial temperature on
temperature in the system: low-amplitude
supersonic waves are clearly visible with AD (left),
but not visible with divided difference
approximations of derivatives (right).
6Parameter Tuning
Sea Ice Model (Todd Arbetter, University of
Colorado)
Ice thickness for the standard (left) and tuned
(right) parameter values, with actual
observations at two locations indicated.
7Optimization Problems
- Often we look for extreme, or optimum, values
that a function has on a given domain. More
formally:
- Unconstrained minimization problems are ones in
which we seek $\min_{x \in \mathbb{R}^n} f(x)$, with no constraints on $x$.
- Note: Since a maximum of f is a minimum of -f, we
need only look for the minimum.
8Newton's Method
- Method for finding x such that f(x) = 0
- For optimization, we want $\nabla f(x) = 0$, so
iterate $x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$
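A minimal one-dimensional sketch of this iteration, where the Hessian reduces to the second derivative. The objective f(x) = exp(x) - 2x, the function name newton_minimize, and the tolerance are illustrative assumptions, not taken from the slides; the derivatives are supplied by hand here, which is exactly the part AD would automate.

```python
import math

def newton_minimize(df, d2f, x0, tol=1e-12, max_iter=50):
    """Apply Newton's method to f'(x) = 0 to find a stationary point of f."""
    x = x0
    for iteration in range(max_iter):
        g = df(x)
        if abs(g) < tol:          # gradient is (numerically) zero: done
            return x, iteration
        x = x - g / d2f(x)        # Newton step: x_{k+1} = x_k - f'(x_k) / f''(x_k)
    return x, max_iter

# Illustrative objective (an assumption): f(x) = exp(x) - 2x, minimized at x = ln 2
x_min, iters = newton_minimize(
    df=lambda x: math.exp(x) - 2.0,   # f'(x)
    d2f=lambda x: math.exp(x),        # f''(x)
    x0=0.0,
)
print(x_min, iters)
```

Because the step divides by the second derivative, inaccurate derivatives directly degrade the step, which is one reason derivative accuracy matters for convergence.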
9Example: Minimum Surface
Objective: Find a surface with the minimal area
that satisfies Dirichlet boundary conditions and
is constrained to lie above a solid plate.
[Figure panels: Solution, Error]
10Example: Minimum Surface (Cont.)
11We can compute derivatives via
- Analytic code
- By hand
- Automatic differentiation
- Numerical approximation: finite differencing
(FD). For finite differences, recall
$f'(x) \approx \frac{f(x + h) - f(x)}{h}$ for a small step size $h$.
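A small sketch of why finite differences are sensitive to the step size: a large step suffers from truncation error, a very small one from floating-point cancellation. The test function sin, the evaluation point 1.0, and the chosen step sizes are assumptions made for illustration.

```python
import math

def forward_difference(f, x, h):
    """One-sided finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x)) / h

exact = math.cos(1.0)   # exact derivative of sin at x = 1
errors = {}
for h in (1e-1, 1e-4, 1e-8, 1e-12):
    errors[h] = abs(forward_difference(math.sin, 1.0, h) - exact)
    print(f"h = {h:.0e}   error = {errors[h]:.3e}")
```

The error first shrinks as h decreases and then grows again once roundoff dominates, so no single step size gives derivatives accurate to machine precision.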
12Why use AD?
- Compared with other methods (numerical
differentiation via finite differences, hand
coding, etc.), AD offers a number of advantages:
- Accuracy
- Performance
- Reduced effort
- Algorithm-awareness
13More accurate derivatives faster convergence
Application: modeling transonic flow over an
ONERA M6 airplane wing.
14Who uses it?
- AD has been successfully employed in applications
in:
- Atmospheric chemistry
- Breast cancer modeling
- Computational fluid dynamics
- Mesoscale climate modeling
- Network Enabled Optimization System
- Semiconductor device modeling
- And also groundwater remediation,
multidisciplinary design optimization, reactor
engineering, super-conductor simulation,
multibody simulations, molecular dynamics
simulations, power system analysis, water
reservoir simulation, and storm modeling.
15How AD Works
- Every programming language provides a limited
number of elementary mathematical functions,
e.g., +, -, *, /, sin, cos, ...
- Thus, every function computed by a program may be
viewed as the composition of these so-called
intrinsic functions
- Derivatives for the intrinsic functions are known
and can be combined using the chain rule of
differential calculus
16A Simple Example (Fortran)
Original program:
      x = 3.14159265/4.0
      a = sin(x)
      b = cos(x)
      t = a/b
Differentiated program:
      x = 3.14159265/4.0
      dxdx = 1.0                       ! Initialize seed matrix
      a = sin(x)
      dadx = cos(x)*dxdx               ! TL/CR
      b = cos(x)
      dbdx = -sin(x)*dxdx              ! TL/CR
      t = a/b
      dtda = 1.0/b                     ! TL
      dtdb = -a/(b*b)                  ! TL
      dtdx = dtda*dadx + dtdb*dbdx     ! CR
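The same computation can be sketched in Python by carrying a derivative alongside each value. The Dual class and the helper names below are illustrative assumptions rather than the interface of any AD tool; only division, sin, and cos are implemented, since that is all this example needs.

```python
import math

class Dual:
    """A value together with its derivative with respect to the seeded input."""
    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __truediv__(self, other):
        # Quotient rule: (a/b)' = (a'b - ab') / b^2
        return Dual(self.val / other.val,
                    (self.der * other.val - self.val * other.der)
                    / (other.val * other.val))

def sin(u):
    # Known derivative of the intrinsic, combined via the chain rule
    return Dual(math.sin(u.val), math.cos(u.val) * u.der)

def cos(u):
    return Dual(math.cos(u.val), -math.sin(u.val) * u.der)

x = Dual(math.pi / 4.0, 1.0)   # seed: dx/dx = 1
a = sin(x)
b = cos(x)
t = a / b                      # t = tan(x), so dt/dx = 1 / cos(x)^2
print(t.val, t.der)
```

At x = pi/4 the value is tan(pi/4) = 1 and the derivative is 1/cos^2(pi/4) = 2, up to rounding.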
17Modes of AD
- Forward mode
- Mode used in simple example
- Propagates derivative vectors, often denoted ∇u
or g_u
- Derivative vector ∇u contains derivatives of u
with respect to independent variables
- Time and storage proportional to vector length
(# indeps)
- Reverse (or adjoint) mode
- Propagates adjoints, denoted ū or u_bar
- Adjoint ū contains derivatives of dependent
variables with respect to u
- Propagation starts with dependent variables; must
reverse flow of computation
- Time proportional to adjoint vector length
(# dependents)
- Storage proportional to number of operations
- Because of this limitation, often applied to
subprograms
18Another Simple Example (C code)
Original code: y = x1*x2*x3*x4
DERIV_val(y): value of program variable y
DERIV_grad(y): derivative object associated with y
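For contrast with the forward-mode derivative objects, here is a hand-written reverse (adjoint) sweep for the same product. This is a sketch of the technique under the assumption of a fixed three-multiplication evaluation order; the function and variable names are hypothetical and this is not the code that ADIC generates.

```python
def product_with_gradient(x1, x2, x3, x4):
    """Return y = x1*x2*x3*x4 and its gradient using one reverse sweep."""
    # Forward sweep: evaluate and keep the intermediates
    t1 = x1 * x2
    t2 = t1 * x3
    y = t2 * x4

    # Reverse sweep: start from the dependent variable and walk backwards
    y_bar = 1.0
    t2_bar = y_bar * x4     # y = t2 * x4
    x4_bar = y_bar * t2
    t1_bar = t2_bar * x3    # t2 = t1 * x3
    x3_bar = t2_bar * t1
    x1_bar = t1_bar * x2    # t1 = x1 * x2
    x2_bar = t1_bar * x1
    return y, (x1_bar, x2_bar, x3_bar, x4_bar)

y, grad = product_with_gradient(1.0, 2.0, 3.0, 4.0)
print(y, grad)
```

All four partial derivatives come out of a single backward pass, at the price of keeping the intermediates t1 and t2 alive until the reverse sweep uses them.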
19The AD Process
Application Code
AD Tool
Code with Derivatives
Control Files
AD Support Libraries
Compile & Link
User's Derivative Driver
Derivative Program
20Ways of Implementing AD
- Operator Overloading
- Use language features to generate trace (tape)
of computation -> implicit computational graph
- Easy to implement; hard to optimize
- Examples: ADOL-C
- Source Transformation (ST)
- Relies on compiler technology
- Hard to implement; more powerful
- Examples: ADIFOR, ADIC, ODYSSEE, TAMC
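A toy sketch of the operator-overloading approach: overloaded operators record each operation on a tape, and the tape is then replayed backwards to accumulate adjoints. The Tape and Var classes and the gradient function are hypothetical names for illustration and do not reproduce the ADOL-C interface.

```python
import math

class Tape:
    """Ordered record of operations: (output, [(input, local partial), ...])."""
    def __init__(self):
        self.entries = []

class Var:
    def __init__(self, value, tape):
        self.value = value
        self.tape = tape
        self.adjoint = 0.0

    def _record(self, value, partials):
        out = Var(value, self.tape)
        self.tape.entries.append((out, partials))
        return out

    def __add__(self, other):
        return self._record(self.value + other.value,
                            [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return self._record(self.value * other.value,
                            [(self, other.value), (other, self.value)])

def sin(u):
    return u._record(math.sin(u.value), [(u, math.cos(u.value))])

def gradient(output, tape):
    """Replay the tape in reverse, accumulating adjoints on every variable."""
    output.adjoint = 1.0
    for out, partials in reversed(tape.entries):
        for inp, partial in partials:
            inp.adjoint += out.adjoint * partial

tape = Tape()
x = Var(2.0, tape)
y = Var(3.0, tape)
z = x * y + sin(x)      # the overloaded operators fill the tape as a side effect
gradient(z, tape)
print(x.adjoint, y.adjoint)   # dz/dx = y + cos(x), dz/dy = x
```

The user code (the line computing z) is unchanged apart from the variable types, which is what makes this approach easy to apply; the tape hides the program structure, which is what makes it hard to optimize.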
21Example AD Tool Architecture (ST)
- AD engine isolated from front and back ends via XAIF
(XML AD Interface Format)
- XML representation of the computational graph
- Unifies relevant Fortran and C constructs
- Implements abstractions, e.g., derivative object
- Shared plug-in differentiation modules
22XAIF Representation
23XAIF - Abstraction of the Program at AD-Level
- Only the core structure of the program is
reflected in XAIF
- Control flow
- Variable information for active variables
- Basic blocks
- Expression DAGs
Expression example (DAG nodes): var_1, const, var_2, var_3
24Estimates of Incremental Computational Costs
25Hessian Module
- The Hessian module can compute H, HV, V^T H V,
W^T H V, as well as arbitrary elements of the
Hessian (e.g., diagonal, n predetermined
entries).
- Tradeoffs in code generation between source
expansion and speed.
[Figure: Hessian/Function Ratio]
26Techniques for Improving Performance of AD Code
- Exploit sparsity (SparsLinC and/or coloring)
- Exploit parallelism
- data: stripmine derivative computation
- task: multithread independent loops
- time: break computation into phases; pipeline
derivative computations
- Exploit interface contractions
- For computations of the form f(g(x))
- Compute dg/dx, df/dg, multiply to form df/dx
- Exploit mathematics (e.g., differentiating
through linear/nonlinear equation solvers)
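A sketch of the interface-contraction idea: when a computation narrows to a few intermediate values g before producing f, the two small Jacobians can be computed separately and multiplied. The functions g and f, their hand-coded Jacobians (standing in for AD-generated code), and the matmul helper are all assumptions made for illustration.

```python
def g(x):
    """Inner stage: n inputs contract to 2 intermediate values."""
    return [sum(x), sum(xi * xi for xi in x)]

def dg_dx(x):
    """2 x n Jacobian of g."""
    return [[1.0 for _ in x], [2.0 * xi for xi in x]]

def f(u):
    """Outer stage: depends on x only through the 2 intermediates."""
    return u[0] * u[1]

def df_dg(u):
    """1 x 2 Jacobian of f with respect to the intermediates."""
    return [[u[1], u[0]]]

def matmul(a, b):
    """Plain dense matrix product for small lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

x = [1.0, 2.0, 3.0]
u = g(x)
df_dx = matmul(df_dg(u), dg_dx(x))   # chain rule: df/dx = df/dg * dg/dx
print(df_dx)
```

Propagating derivatives with respect to the two intermediates through f, rather than with respect to all n inputs, is where the saving comes from when f is the expensive stage.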
27ANL Tools for AD
- ADIFOR was developed in collaboration with Rice
University
- full support for Fortran 77
- support for parallelism via MPI and PVM
- support for sparse Jacobians
- ADIC is the first and only compiler-based AD tool
for ANSI C
- support for the complete ANSI standard
- will soon support a large subset of C++
- www.mcs.anl.gov/adic, www.mcs.anl.gov/adicserver
- XAIF specification and differentiation modules
(OpenAD project)
- http://www-unix.mcs.anl.gov/utke/OpenAD
28AD in Numerical Toolkits
- NEOS: Network-Enabled Optimization Server
- http://neos.mcs.anl.gov
- Efficient computation of gradients for large
problems, where the objective function has the
form
- PETSc (Portable Extensible Toolkit for Scientific
Computation) solvers (work in progress)
- User only needs to provide the sequential
subdomain update function in F77 or ANSI C.
- Differentiated version of toolkit enables
optimization/sensitivity analysis of models based
on PETSc
- www.mcs.anl.gov/petsc
29Optimization Solution (PETSc TAO)
Main Routine
Nonlinear Solvers (SNES)
Gradient Evaluation
30Using AD with the Toolkit for Advanced
Optimization (TAO)
Global-to-local scatter of ghost values
Local Function computation
Local Min.Function computation
Parallel function assembly
Script file
Global-to-local scatter of ghost values
ADIFOR or ADIC
Coded manually; can be automated
Seed matrix initialization
Local Hessian computation
Local Hessian computation
Parallel Hessian assembly
31Outline
- Automatic differentiation
- Components for scientific computing
- Introduction
- Example applications
- Performance evaluation and modeling
- Summary
CCA
Common Component Architecture
32Software development approaches
Architectures
Components
Object-oriented libraries: collections of classes
Libraries: collections of subroutines
Unstructured code (everything in main)
33Components
- Working definition: a component is a piece of
software that can be composed with other
components within a framework; composition can be
either static (at link time) or dynamic (at run
time)
- plug-and-play model for building applications
- For more info: C. Szyperski, Component Software:
Beyond Object-Oriented Programming, ACM Press,
New York, 1998
- Components enable
- Software and tool interoperability
- Automation of performance instrumentation/monitoring
- Application adaptivity (automated or user-guided)
- Pictorial intro
34Object-oriented vs component-oriented development
- Component-oriented development can be viewed as
augmenting OOD with certain policies, e.g.,
require that certain abstract interfaces be
implemented
- Components, once compiled, require a special
execution environment
- OO techniques are useful for building individual
components by relatively small teams; component
technologies facilitate sharing of code developed
by different groups by addressing issues in
- Language interoperability
- Via interface definition language (IDL)
- Well-defined abstract interfaces
- Enable plug-and-play
- Dynamic composability
- Components can discover information about their
environment (e.g., interface discovery) from
framework and connected components
- Can convert from an object orientation to a
component orientation
- Automatic tools can help with conversion (ongoing
work by C. Rasmussen and M. Sottile, LANL)
35Motivating scientific applications
Physics
Adaptive Solution
Optimization
Meshes
Derivative Computation
Discretization
Molecular structures
Astrophysics
Data Redistribution
Parallel I/O
Aerodynamics
Fusion
36Motivation For Application Developers and Users
- You have difficulty managing multiple third-party
libraries in your code
- You (want to) use more than two languages in your
application
- Your code is long-lived and different pieces
evolve at different rates
- You want to be able to swap competing
implementations of the same idea and test without
modifying any of your code
- You want to compose your application with some
other(s) that weren't originally designed to be
combined
37The model for scientific component programming
CCA
38CCA Delivers Performance
- Local
- No CCA overhead within components
- Small overhead between components
- Small overhead for language interoperability
- Be aware of costs; design with them in mind
- Small costs, easily amortized
- Parallel
- No CCA overhead on parallel computing
- Use your favorite parallel programming model
- Supports SPMD and MPMD approaches
- Distributed (remote)
- No CCA overhead; performance depends on
networks, protocols
- CCA frameworks support OGSA/Grid Services/Web
Services and other approaches
39Overhead from Component Invocation
- Invoke a component with different arguments
- Array
- Complex
- Double Complex
- Compare with f77 method invocation
- Environment
- 500 MHz Pentium III
- Linux 2.4.18
- GCC 2.95.4-15
- Components took 3X longer
- Ensure granularity is appropriate!
- Paper by Bernholdt, Elwasif, Kohl and Epperly
Function arg type    f77      Component
Array                80 ns    224 ns
Complex              75 ns    209 ns
Double complex       86 ns    241 ns
40Language interoperability: what is so hard?
Native, cfortran.h, SWIG, JNI, Siloon, Chasm, Platform Dependent
f77
f90
C
C++
Python
Java
41SIDL/Babel makes all supported languages peers
f77
This is not a Lowest Common Denominator Solution!
C
f90
C++
Python
Java
42CCA Concepts Components and Ports
- Components provide or use one or more ports
- Components include some code which interacts with
a CCA framework
- Frameworks provide services, such as component
instantiation and port connection
FunctionPort
FunctionPort
OptimizerPort
GradientPort
Objective Function
HessianPort
GradientPort
Optimization Algorithm
Function Gradient
HessianPort
- Implementation details
- CCA components
- Inherit from gov.cca.Component
- Implement setServices method to register ports
this component will provide and use
- Implement the ports they provide
- Use ports on other components
- Call getPort/releasePort methods of framework
Services object
- Ports (interfaces) extend the gov.cca.Port
interface
Function Hessian
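The provides/uses pattern can be imitated in a few lines of Python. This is a toy stand-in and not the real CCA API or a real framework: the method names setServices, getPort, and releasePort come from the slide, while the Services class internals, addProvidesPort, registerUsesPort, the connect helper, and the two example components are assumptions made for illustration.

```python
class Services:
    """Toy stand-in for the framework Services object (not the real CCA API)."""
    def __init__(self):
        self.provided = {}      # port name -> object implementing the port
        self.connections = {}   # uses-port name -> connected provider (or None)

    def addProvidesPort(self, port, name):
        self.provided[name] = port

    def registerUsesPort(self, name):
        self.connections.setdefault(name, None)

    def getPort(self, name):
        port = self.connections.get(name)
        if port is None:
            raise RuntimeError(f"uses port '{name}' is not connected")
        return port

    def releasePort(self, name):
        pass  # a real framework would track port usage here

class ObjectiveFunction:
    """Component that provides a FunctionPort and a GradientPort."""
    def setServices(self, services):
        services.addProvidesPort(self, "FunctionPort")
        services.addProvidesPort(self, "GradientPort")

    def evaluate(self, x):
        return (x - 3.0) ** 2

    def gradient(self, x):
        return 2.0 * (x - 3.0)

class OptimizationAlgorithm:
    """Component that uses a GradientPort supplied by some other component."""
    def setServices(self, services):
        self.services = services
        services.registerUsesPort("GradientPort")

    def solve(self, x, step=0.25, iterations=60):
        grad_port = self.services.getPort("GradientPort")
        for _ in range(iterations):
            x -= step * grad_port.gradient(x)   # plain gradient descent
        self.services.releasePort("GradientPort")
        return x

def connect(user_services, uses_name, provider_services, provides_name):
    """What a framework does when the user wires two ports together."""
    user_services.connections[uses_name] = provider_services.provided[provides_name]

f_services, opt_services = Services(), Services()
objective, optimizer = ObjectiveFunction(), OptimizationAlgorithm()
objective.setServices(f_services)
optimizer.setServices(opt_services)
connect(opt_services, "GradientPort", f_services, "GradientPort")
x_star = optimizer.solve(0.0)
print(x_star)
```

The optimizer never names the objective component; it only asks the framework for whatever is connected to its GradientPort, which is what allows implementations to be swapped without code changes.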
43Example: Unconstrained Minimization Problem
- Given a rectangular 2-dimensional domain and
boundary values along the edges of the domain
- Find the surface with minimal area that satisfies
the boundary conditions, i.e., compute
- min f(x), where $f: \mathbb{R}^n \to \mathbb{R}$
- Solve using optimization
components based on
TAO (ANL)
44Unconstrained Minimization Using a Structured Mesh
Reused TAO
Solver Driver/Physics
45Computational Chemistry Molecular Optimization
- Investigators: Yuri Alexeev (PNNL), Steve Benson
(ANL), Curtis Janssen (SNL), Joe Kenny (SNL),
Manoj Krishnan (PNNL), Lois McInnes (ANL), Jarek
Nieplocha (PNNL), Jason Sarich (ANL), Theresa
Windus (PNNL)
- Goals: Demonstrate interoperability among
software packages, develop experience with large
existing code bases, seed interest in chemistry
domain
- Problem Domain: Optimization of molecular
structures using quantum chemical methods
46Molecular Optimization Overview
- Decouple geometry optimization from electronic
structure
- Demonstrate interoperability of electronic
structure components
- Build towards more challenging optimization
problems, e.g., protein/ligand binding studies
Components in gray can be swapped in to create
new applications with different capabilities.
47Wiring Diagram for Molecular Optimization
- Electronic structure components
- MPQC (SNL)
- http://aros.ca.sandia.gov/cljanss/mpqc
- NWChem (PNNL)
- http://www.emsl.pnl.gov/pub/docs/nwchem
- Optimization components: TAO (ANL)
http://www.mcs.anl.gov/tao
- Linear algebra components
- Global Arrays (PNNL)
http://www.emsl.pnl.gov:2080/docs/global/ga.html
- PETSc (ANL)
- http://www.mcs.anl.gov/petsc
48Outline
- Automatic differentiation
- Components for scientific computing
- Performance evaluation and modeling
- Performance evaluation challenges
- Component-based approach
- Motivating example: adaptive linear system
solution
- A component infrastructure for performance
monitoring and adaptation of applications
- Summary
49Why Performance Model?
- Performance models enable understanding of the
factors that affect performance
- Inform the tuning process (of application and
machine)
- Identify bottlenecks
- Identify underperforming components
- Guide applications to the best machine
- Enable applications-driven architecture design
- Extrapolate the performance of future systems
50Challenges in performance evaluation
- Many tools for performance data gathering and
analysis
- PAPI, TAU, SvPablo, Kojak, ...
- Various interfaces, levels of automation, and
approaches to information presentation
- User's point of view
- What do the different tools do? Which is most
appropriate for a given application?
- (How) can multiple tools be used in concert?
- I have tons of performance data, now what?
- What automatic tuning tools are available, what
exactly do they do?
- How hard is it to install/learn/use tool X?
- Is instrumented code portable? What's the
overhead of instrumentation? How does code
evolution affect the performance analysis process?
51Incomplete list of tools
- Source instrumentation: TAU/PDT, KOJAK
(MPI/OpenMP), SvPablo, Performance Assertions, ...
- Binary instrumentation: HPCToolkit, Paradyn,
DyninstAPI, ...
- Performance monitoring: MetaSim Tracer (memory),
PAPI, HPCToolkit, Sigma (memory), DPOMP
(OpenMP), mpiP, gprof, psrun, ...
- Modeling/analysis/prediction: MetaSim Convolver
(memory), DIMEMAS (network), SvPablo
(scalability), Paradyn, Sigma, ...
- Source/binary optimization: Automated Empirical
Optimization of Software (ATLAS), OSKI, ROSE
- Runtime adaptation: ActiveHarmony, SALSA
54Challenges (where is the complexity?)
- More effective use -> integration
- Tool developer's perspective
- Overhead of initially implementing one-to-one
interoperability
- Ongoing management of dependencies on other tools
- Individual scientist's perspective
- Learning curve for performance tools -> less time
to focus on own research (modeling, physics,
mathematics, optimization)
- Potentially significant time investment needed to
find out whether/how using someone else's tool
would improve performance -> tend to do own
hand-coded optimizations (time-consuming,
non-reusable)
- Lack of tools that automate (at least partially)
algorithm discovery, assembly, configuration, and
enable runtime adaptivity
55What can be done
- How to manage complexity? Provide
- Performance tools that are truly interoperable
- Uniform easy access to tools
- Component implementations of software, esp.
supporting numerical codes, such as linear
algebra algorithms
- New algorithms (e.g., interactive/dynamic
techniques, algorithm composition)
- Implementation approach: components, both for
tools and the application software
56Performance Evaluation Research Center
(http://perc.nersc.gov)
57What is being done
- No integrated environment for performance
monitoring, analysis, and optimization (yet)
- Most past efforts
- One-to-one tool interoperability
- More recently
- OSPAT (initial meeting at SC04), focus on common
data representation and interfaces
- Tool-independent performance databases: PerfDMF
- Eclipse parallel tools project (LANL)
58OSPAT
- The following areas were recommended for OSPAT to
investigate
- A common instrumentation API for source level,
compiler level, library level, binary
instrumentation
- A common probe interface for routine entry and
exit events
- A common profile database schema
- An API to walk the callstack and examine the heap
memory
- A common API for thread creation and fork
interface
- Visualization components for drawing histograms
and hierarchical displays typically used by
performance tools
59Example component infrastructure for multimethod
linear solvers
- Goal: provide a framework for
- Performance monitoring of numerical components
- Dynamic adaptivity, based on
- Off-line analyses of past performance information
- Online analysis of current execution performance
information
- Motivating application examples
- Driven cavity flow (Coffey et al., 2003),
nonlinear PDE solution
- FUN3D: incompressible and compressible Euler
equations
- Prior work in multimethod linear solvers
- McInnes et al., 03; Bhowmick et al., 03 and 05;
Norris et al., 05.
60Adaptive Linear System Solution
- Motivation
- Approximately 80% of total solution time devoted
to linear system solution
- Multi-phase nonlinear solution method, requiring
the solution of linear systems with varying
levels of ill-conditioning (Kelley and Keyes,
1998)
- New approach aiming to reduce overall time to
solution
- Combine more robust (but more costly) methods
when needed in some phases with faster (but less
powerful) methods in other phases
- Dynamically select a new preconditioner in each
phase based on CFL number
61Example: driven cavity flow
- Linear solver: GMRES(30), vary only fill level of
ILU preconditioner
- Adaptive heuristic based on
- Previous linear solution convergence rate,
nonlinear solution convergence rate, rate of
increase of linear solution iterations
- 96x96 mesh, Grashof number 10^5, lid velocity 100
- Intel P4 Xeon, dual 2.2 GHz, 4GB RAM
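One way such a heuristic could be structured is sketched below. The rule, its thresholds (1.5, 0.75, 0.9, 0.5), the fill-level bounds, and the sample history are all hypothetical; the slides name the inputs to the heuristic but not its exact form. Here nonlinear_reduction is assumed to be the ratio of successive nonlinear residual norms, so smaller means faster convergence.

```python
def choose_fill_level(current_fill, linear_its, prev_linear_its,
                      nonlinear_reduction, min_fill=0, max_fill=3):
    """Hypothetical rule for picking the ILU fill level of the next linear solve."""
    growth = linear_its / max(prev_linear_its, 1)   # rate of increase of linear iterations
    if growth > 1.5 or nonlinear_reduction > 0.9:
        # Linear solves are getting harder or the nonlinear solve is stalling:
        # switch to a more robust (more costly) preconditioner.
        return min(current_fill + 1, max_fill)
    if growth < 0.75 and nonlinear_reduction < 0.5:
        # Things are going well: fall back to a cheaper preconditioner.
        return max(current_fill - 1, min_fill)
    return current_fill

# Made-up history of (linear iterations, nonlinear residual reduction) per step
history = [(20, 0.3), (45, 0.6), (50, 0.95), (30, 0.4), (15, 0.2)]
fill, prev_its = 0, None
fills = []
for its, reduction in history:
    if prev_its is not None:
        fill = choose_fill_level(fill, its, prev_its, reduction)
    fills.append(fill)
    prev_its = its
print(fills)
```

In a component setting, a rule like this would sit between the nonlinear solver and a set of interchangeable linear solver components, using monitored performance data as its input.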
62Bringing it all together
- Integration of ongoing efforts in
- Performance tools: common interfaces and data
representation (leverage OSPAT, PerfDMF, TAU
performance interfaces, and similar efforts)
- Numerical components: emerging common interfaces
(e.g., TOPS solver interfaces) increase choice of
solution method -> automated composition and
adaptation strategies
- Code generation, e.g., AD
- Long term
- Is a more organized (but not too restrictive)
environment for scientific software lifecycle
development possible/desirable?
63Multimethod linear solver components
Adaptive Heuristic
Linear Solver B
Linear Solver C
64AD as Component Factory
- Both NEOS and PETSc rely on a well-defined
function interface in order to provide
derivatives via AD
- Extend this idea to components
Function
AD Tool
Jacobian
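The Function -> AD Tool -> Jacobian picture can be sketched as a factory that takes a function with a known interface and returns a new callable for its Jacobian. The names jacobian_factory, Dual, and residual are hypothetical, and forward-mode dual numbers stand in here for a real AD tool such as ADIFOR or ADIC.

```python
class Dual:
    """Value plus derivative with respect to one seeded input (forward mode)."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def jacobian_factory(function):
    """Given a function of a list of inputs, return a callable for its Jacobian."""
    def jacobian(x):
        columns = []
        for j in range(len(x)):
            # Seed input j with derivative 1 and all other inputs with 0
            seeded = [Dual(xi, 1.0 if i == j else 0.0) for i, xi in enumerate(x)]
            columns.append([out.der for out in function(seeded)])
        # Each pass produced one column; transpose into rows
        return [list(row) for row in zip(*columns)]
    return jacobian

def residual(x):
    """Example 'Function' component: F(x) = (x0*x1 + 2*x0, x1*x1 + x0)."""
    return [x[0] * x[1] + 2.0 * x[0], x[1] * x[1] + x[0]]

jac = jacobian_factory(residual)   # the generated 'Jacobian' component
print(jac([1.0, 3.0]))
```

The only contract between the two sides is the function interface (a list in, a list out), which mirrors how a well-defined port would let an AD tool generate a derivative component automatically.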
65Summary
- Automation at all levels of the application
development process can simplify and speed up
application development and result in better
software quality and performance
- AD addresses the widespread need for accurate
and efficient derivative computations
- CCA defines a high-performance component model,
enabling large-scale software development
- A growing array of performance tools and
methodologies aid in understanding and
fine-tuning application performance
- Current and future work: bringing these
technologies together in a coherent way, making
large-scale scientific application development as
easy as possible
66Acknowledgments
- Paul Hovland, Jean Utke, Lois Curfman McInnes
(ANL)
- Sanjukta Bhowmick (ANL/Columbia)
- Ivana Veljkovic, Padma Raghavan (Penn State)
- Sameer Shende, Al Malony (U. Oregon)
- CCA and PERC members
- Funding: DOE and NSF
67For More Information
- Automatic differentiation
- Andreas Griewank. Evaluating Derivatives:
Principles and Techniques of Algorithmic
Differentiation, SIAM, 2000.
- www.autodiff.org: publications, tools, etc.
- www.mcs.anl.gov/adicserver: ADIC server
- neos.mcs.anl.gov: NEOS server
- Common component architecture
- www.cca-forum.org
- Performance tools
- perc.nersc.gov
- Student opportunities at MCS/ANL
- www-fp.mcs.anl.gov/division/information/educational_programs/studentopps.html
- Boyana Norris
- Email: norris_at_mcs.anl.gov, Web:
www.mcs.anl.gov/norris