Title: Automatic Differentiation: Introduction
1. Automatic Differentiation: Introduction
- Automatic differentiation (AD) is a technology for transforming a subprogram that computes some function into a subprogram that computes the derivatives of that function
- Derivatives are used in optimization, nonlinear solvers, sensitivity analysis, and uncertainty quantification
- The forward mode of AD is efficient for problems with few independent variables or for Jacobian-vector products (see the sketch after this list)
- The reverse mode of AD is efficient for problems with few dependent variables or for transposed Jacobian-vector (J^T v) products
- The efficiency of the generated code depends on the sophistication of the underlying compiler analysis and combinatorial algorithms
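To make the two modes concrete, here is a minimal, illustrative forward-mode sketch in Python using dual numbers. It is not drawn from any of the tools discussed below; the class and the function f are made-up examples. One forward sweep per independent variable yields one column of the Jacobian, which is why the mode pays off when independents are few.

    # Minimal forward-mode AD via dual numbers (illustrative sketch only).
    # Each value carries (primal value, derivative with respect to the seed).
    class Dual:
        def __init__(self, val, dot=0.0):
            self.val, self.dot = val, dot

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val + other.val, self.dot + other.dot)

        __radd__ = __add__

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            # Product rule: (uv)' = u'v + uv'
            return Dual(self.val * other.val,
                        self.dot * other.val + self.val * other.dot)

        __rmul__ = __mul__

    def f(x, y):
        return x * x * y + y + 1.0

    # Seed dx = 1 to get df/dx at (3, 2); one sweep per independent variable.
    x, y = Dual(3.0, 1.0), Dual(2.0, 0.0)
    out = f(x, y)
    print(out.val, out.dot)   # f(3,2) = 21.0, df/dx = 2*x*y = 12.0

Reverse mode instead records the computation and propagates adjoints from outputs back to inputs, so its cost scales with the number of dependent variables.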
2. AD: Current Capabilities
- Fortran 77: ADIFOR 2.0/3.0
  - Robust, mature tool with excellent language coverage
  - Excellent compiler analysis
  - Efficient forward mode (small number of independents)
  - Adequate reverse mode (small number of dependents)
- C/C++: ADIC 2.0
  - Semi-mature tool with full C language coverage
  - Sophisticated differentiation algorithms
  - Efficient forward mode
- Fortran 90: OpenAD/F
  - New tool with partial language coverage
  - Sophisticated differentiation algorithms
  - Accurate and novel compiler analysis
  - Innovative templating mechanism
  - Efficient forward and reverse modes
3. AD: Application Highlight
Sensitivity of flow through the Drake Passage to bottom topography, using the MIT shallow water model:

                             Runtime (ms)    Ratio    Memory
    Simulation alone              220          1.0        --
    Basic adjoint              14,337         61.6     6.87M
    Improved checkpointing     14,120         60.6    21.44M
    Add compiler analysis       2,151          9.4     3.17M
    Finite differences        23 days       14,400        --
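The "improved checkpointing" row trades recomputation for taped state. Below is a minimal sketch of the idea in Python, with a made-up one-variable time step standing in for the shallow water model; K, step, and all other names are hypothetical, not the tool's API.

    import math

    K = 10   # checkpoint interval (hypothetical choice)

    def step(s):                     # stand-in for one model time step
        return math.sin(s) + 0.1

    def forward(s0, nsteps):
        checkpoints = {0: s0}
        s = s0
        for i in range(nsteps):
            s = step(s)
            if (i + 1) % K == 0:     # tape only every K-th state
                checkpoints[i + 1] = s
        return s, checkpoints

    def reverse(checkpoints, nsteps, sbar=1.0):
        # Assumes nsteps is a multiple of K.
        for seg_end in range(nsteps, 0, -K):
            seg_start = seg_end - K
            # Recompute the segment's intermediate states from its checkpoint...
            states = [checkpoints[seg_start]]
            for _ in range(seg_start, seg_end):
                states.append(step(states[-1]))
            # ...then run the adjoint of each step in reverse order.
            for i in range(K - 1, -1, -1):
                sbar *= math.cos(states[i])   # d(step)/d(s) at recomputed state
        return sbar   # d(final state)/d(initial state)

    final, cps = forward(0.5, 100)
    print(reverse(cps, 100))

Storing every K-th state bounds memory at roughly nsteps/K checkpoints plus one segment of intermediate states, at the cost of recomputing each segment once during the reverse sweep.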
4. AD: Future Capabilities
- C/C++: ADIC 2.x
  - Enhanced support for C++ (basic templating, operator overloading)
- Fortran 90: OpenAD/F
  - Improved language coverage (user-defined types, pointers, etc.)
- Both tools
  - New differentiation algorithms
  - New checkpointing mechanisms
  - Advanced compiler analysis
  - Efficient forward and reverse modes
  - Integration with CSCAPES coloring algorithms
  - Ease of use through integration with the PETSc and Zoltan toolkits
5. Load Balancing: Introduction
- Goal: provide software and algorithms for load balancing (partitioning) that can easily be used by parallel applications
- Load balancing: distribute work evenly among processors while minimizing communication cost, thereby reducing parallel run time (a toy balancing sketch follows this list)
- Static load balancing (often called partitioning)
  - Application computation and communication patterns do not change
  - Partition and distribute data once
- Dynamic load balancing
  - In dynamic or adaptive applications, computation and communication change over time
  - Load balancing should be invoked at appropriate intervals
  - Try to reduce data migration (the application data that must move)
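As a point of reference for "distribute work evenly", the sketch below shows the simplest possible static balancer, a greedy longest-processing-time (LPT) assignment in Python (all names hypothetical). It balances load but ignores communication and migration, which is exactly the gap the methods on the next slide address.

    import heapq

    def lpt_partition(work, nprocs):
        # Heap of (current load, processor id); pop gives the least-loaded one.
        heap = [(0.0, p) for p in range(nprocs)]
        heapq.heapify(heap)
        part = {}
        for task, w in sorted(work.items(), key=lambda kv: -kv[1]):
            load, p = heapq.heappop(heap)        # least-loaded processor
            part[task] = p
            heapq.heappush(heap, (load + w, p))
        return part

    print(lpt_partition({"a": 5.0, "b": 4.0, "c": 3.0, "d": 3.0}, 2))
    # {'a': 0, 'b': 1, 'c': 1, 'd': 0}  -> loads 8.0 and 7.0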
6. Load Balancing: Current Capabilities
- Zoltan: a software toolkit for parallel data management and load balancing
  - Available at http://www.cs.sandia.gov/Zoltan
- Collection of many load-balancing methods
  - Geometric: RCB, space-filling curves (an RCB sketch follows this list)
  - Graph and hypergraph partitioning
- Data-structure-neutral interface
  - Call-back functions
  - Single, common interface for many methods
  - Allows applications to plug and play
- Portable, parallel code (MPI)
- Used in many DOE and Sandia applications
- Can run on thousands of processors
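A serial, illustrative sketch of RCB (recursive coordinate bisection), the first geometric method listed above: cut at the point median along the longest coordinate axis, then recurse on each half. Zoltan's version is parallel and obtains coordinates through the call-back interface; the Python below sketches only the geometry, and all names are hypothetical.

    def rcb(points, ids, nparts):
        if nparts == 1:
            return {i: 0 for i in ids}
        # Choose the axis with the largest extent over these points.
        dims = len(points[ids[0]])
        axis = max(range(dims),
                   key=lambda d: max(points[i][d] for i in ids)
                               - min(points[i][d] for i in ids))
        order = sorted(ids, key=lambda i: points[i][axis])
        # Cut proportionally to the number of parts on each side.
        half = len(order) * (nparts // 2) // nparts
        left = rcb(points, order[:half], nparts // 2)
        right = rcb(points, order[half:], nparts - nparts // 2)
        shift = nparts // 2
        return {**left, **{i: p + shift for i, p in right.items()}}

    pts = {0: (0.0, 0.0), 1: (9.0, 1.0), 2: (1.0, 8.0), 3: (8.0, 9.0)}
    print(rcb(pts, list(pts), 4))   # each point lands in its own part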
7. Load Balancing: Applications
- A large variety of applications, requirements, and data structures.
8. Load Balancing: Future Capabilities
- Scalable hypergraph partitioning
  - Hypergraphs accurately model communication volume (the metric is sketched after this list)
  - We aim to improve scalability to thousands of processors
- 2D matrix partitioning
  - Reduces communication compared to a standard 1D distribution
- Multiconstraint partitioning
  - For multi-physics simulation
- Partitioning with complex objectives
  - E.g., simultaneously balance computation and memory
- Parallel sparse matrix ordering (nested dissection)
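The claim that hypergraphs model communication volume exactly fits in a few lines: for a parallel sparse matrix-vector product, the volume equals the connectivity-minus-one cost of the partition, summed over hyperedges. A sketch of that metric (hypothetical names):

    def comm_volume(hyperedges, part):
        vol = 0
        for pins in hyperedges:                 # each hyperedge = set of vertices
            parts = {part[v] for v in pins}     # parts the hyperedge spans
            vol += len(parts) - 1               # (lambda - 1) words communicated
        return vol

    # Two hyperedges over four vertices split across two parts:
    print(comm_volume([{0, 1, 2}, {2, 3}], {0: 0, 1: 0, 2: 1, 3: 1}))  # -> 1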
9. Reordering Transformations: Introduction
- Irregular memory access patterns make performance sensitive to data and iteration orders
- Run-time reordering transformations schedule data accesses and iterations to maximize performance (an inspector/executor sketch follows this list)
- Preliminary work on reordering heuristics shows that hypergraph models outperform graph models
- Full sparse tiling: a new inspector/executor strategy that exploits inter-iteration locality
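A toy inspector/executor pair in Python showing the overall shape of these transformations; the inspector here uses consecutive packing (Cpack), which reappears on the next slide. All names are hypothetical, not the package's API.

    def inspector_cpack(index_array, ndata):
        # Inspector: examine the index array at run time and build a data
        # permutation that orders data by first use (consecutive packing).
        new_id, perm = 0, [-1] * ndata
        for i in index_array:            # visit accesses in loop order
            if perm[i] == -1:            # first touch of datum i
                perm[i] = new_id
                new_id += 1
        for i in range(ndata):           # data never touched go at the end
            if perm[i] == -1:
                perm[i] = new_id
                new_id += 1
        return perm

    def executor(index_array, data, perm):
        # Executor: run the loop on the remapped data.
        packed = [0.0] * len(data)
        for i, p in enumerate(perm):
            packed[p] = data[i]
        return sum(packed[perm[i]] for i in index_array)

    idx = [3, 0, 3, 2]
    perm = inspector_cpack(idx, 4)       # [1, 3, 2, 0]
    print(executor(idx, [10.0, 20.0, 30.0, 40.0], perm))  # 40+10+40+30 = 120.0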
10. RT: Current Capabilities
- Open-source package (Data_N_Comp_Reorder) implementing several data and iteration reordering heuristics
- Data reordering heuristics
  - Breadth-first search (graph-based)
  - Consecutive packing
  - Partitioning (graph-based)
  - Breadth-first search (hypergraph-based)
  - Consecutive packing (hypergraph-based)
  - Partitioning (hypergraph-based)
- Iteration reordering heuristics (lexicographical sorting is sketched after this list)
  - Breadth-first search (hypergraph-based)
  - Lexicographical sorting and various approximations
  - Consecutive packing (hypergraph-based)
  - Partitioning (hypergraph-based)
- Full sparse tiling implementation for model problems
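One of the iteration reordering heuristics above, lexicographical sorting, fits in a few lines: order iterations by the sorted tuple of data indices they touch, so iterations sharing data execute close together in time. An illustrative sketch with hypothetical names:

    def lexsort_iterations(accesses):
        # accesses[j] = indices touched by iteration j
        return sorted(range(len(accesses)),
                      key=lambda j: tuple(sorted(accesses[j])))

    acc = [(5, 9), (0, 1), (5, 8), (0, 2)]
    print(lexsort_iterations(acc))   # [1, 3, 2, 0]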
11. RT: Application Highlight
- Reordering for a mesh-quality improvement code (FeasNewt, T. Munson)
- Hypergraph-BFS data reordering coupled with Cpack iteration reordering offers the best performance
- Reordering leads to performance within 90% of the memory-bandwidth limit for sparse matrix-vector products
12. RT: Future Capabilities
- New hypergraph-based run-time reordering transformations
- Comparison between hypergraph-based and bipartite-graph-based run-time reordering transformations
- Hypergraph partitioners for load balancing, modified to work well for reordering transformations
- Hierarchical full sparse tiling for hierarchical parallel systems
13. Graph Coloring and Matching: Introduction
- Graph coloring deals with partitioning a set of binary-related objects into a few groups of independent objects
- Sparsity exploitation in the computation of Jacobians and Hessians leads to a variety of graph coloring problems. Sources of problem variation:
  - Unsymmetric vs. symmetric matrix
  - Direct vs. substitution method
  - Uni- vs. bi-directional partitioning
    Matrix     Method         1d partition          2d partition
    Jacobian   Direct         Distance-2 coloring   Star bicoloring
    Hessian    Direct         Star coloring         NA
    Jacobian   Substitution   NA                    Acyclic bicoloring
    Hessian    Substitution   Acyclic coloring      NA
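For the Jacobian/direct/1d entry of the table, the idea in miniature: columns with no common nonzero row are structurally orthogonal, can share a color, and can be evaluated together in one AD pass. A greedy Python sketch over column sparsity sets (hypothetical names; real implementations color a graph rather than intersecting sets pairwise):

    def color_columns(col_rows):
        # col_rows[j] = set of rows with a nonzero in column j
        colors = {}
        for j in sorted(col_rows):
            forbidden = {colors[k] for k in colors
                         if col_rows[j] & col_rows[k]}   # columns sharing a row
            c = 0
            while c in forbidden:
                c += 1
            colors[j] = c
        return colors

    # Arrow-shaped sparsity: column 0 hits every row, columns 1..3 are diagonal.
    J = {0: {0, 1, 2, 3}, 1: {1}, 2: {2}, 3: {3}}
    print(color_columns(J))   # {0: 0, 1: 1, 2: 1, 3: 1} -> 2 colors, 2 passes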
- Matching deals with finding a large set of independent edges in a graph (a greedy sketch follows this list)
- Variant matching problems occur in load balancing, process scheduling, linear solvers, preconditioners, etc.
- Orthogonal sources of variation in matching problems:
  - Bipartite vs. general graphs
  - Cardinality vs. weighted problems
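To show the flavor of the approximation algorithms discussed on the next slide, here is the classic greedy 1/2-approximation for weighted matching: scan edges by decreasing weight and keep an edge whenever both endpoints are still free. Near-linear time after sorting, and relatively easy to localize. All names hypothetical.

    def greedy_matching(edges):
        # edges: list of (weight, u, v)
        matched, matching = set(), []
        for w, u, v in sorted(edges, reverse=True):
            if u not in matched and v not in matched:
                matching.append((u, v))
                matched.update((u, v))
        return matching

    print(greedy_matching([(5.0, 0, 1), (4.0, 1, 2), (3.0, 2, 3)]))
    # [(0, 1), (2, 3)]; greedy is guaranteed within 1/2 of the optimal weight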
14. GCM: Current Capabilities
- Coloring
  - Serial: developed novel greedy algorithms for the distance-1, distance-2, star, and acyclic coloring problems; a package implementing these algorithms and the corresponding variant ordering routines is available (a minimal greedy sketch follows this list)
  - Parallel: developed a scheme for parallelizing greedy coloring algorithms on distributed-memory computers; MPI implementations of distance-1 and distance-2 coloring are available via Zoltan
- Matching
  - Algorithms that compute optimal solutions to matching problems run in polynomial time, but they are slow and difficult to parallelize
  - High-quality approximate solutions can be computed in (near) linear time, and approximation techniques make parallelization easier
  - Developed fast approximation algorithms for several matching problems
  - Efficient implementations of exact matching algorithms are available
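The greedy scheme underlying the serial algorithms above, shown for the simplest case, distance-1 coloring: visit vertices in some order and assign the smallest color not used by an already-colored neighbor. Roughly speaking, the parallel variant colors speculatively and then repairs conflicts on processor-boundary vertices. A minimal Python sketch (hypothetical names):

    def greedy_d1_coloring(adj, order=None):
        colors = {}
        for v in order or sorted(adj):
            used = {colors[u] for u in adj[v] if u in colors}
            c = 0
            while c in used:          # smallest color absent among neighbors
                c += 1
            colors[v] = c
        return colors

    # 4-cycle: two colors suffice.
    cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(greedy_d1_coloring(cycle))   # {0: 0, 1: 1, 2: 0, 3: 1}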
15. GCM: Application Highlights
- Coloring
  - Automatic differentiation (sparse Jacobians and Hessians)
  - Parallel computation (discovery of concurrency, data migration)
  - Frequency allocation
  - Register allocation in compilers, etc.
- Matching
  - Numerical preprocessing of sparse linear systems: permute a matrix so that its diagonal or diagonal blocks are heavy
  - Block triangular decomposition of sparse linear systems: decompose a system of equations into smaller subsystems
  - Graph partitioning: guide the coarsening phase of multilevel graph partitioning methods
16. GCM: Future Capabilities
- Develop and implement star and acyclic bicoloring algorithms for Jacobian computation
- Develop parallel algorithms that scale to thousands of processors for the various coloring problems (distance-1, distance-2, star, acyclic)
- Integrate coloring software with automatic differentiation tools
- Develop petascale parallel matching algorithms based on approximation techniques